fix unicode content parsing #415
Fixed an exception when parsing symbols such as the em dash or Cyrillic letters. Tests are included.
|
Your test compiles fine without applying the patch. |
|
Sorry, I forgot to explain when it happens. When it does, you get an exception: "parse error: failed at `content: '—';". |
|
The mbstring.func_overload ini setting can't be changed at run-time; also, I've run into systems where the mbstring extension isn't enabled. How about prefixing strlen() and substr() calls with '_', and then adding these helper functions? |
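A minimal sketch of what such byte-safe helpers could look like. The names `_strlen()`, `_substr()`, and `_mb_overloaded()` are hypothetical and not taken from the patch. The sketch assumes that bit 2 of `mbstring.func_overload` is what redirects the string functions, and it falls back to the native functions when the mbstring extension is missing.

```php
<?php
// Hypothetical helpers, for illustration only; not the code from this pull request.

// True when mbstring is loaded and string-function overloading (bit 2) is on.
function _mb_overloaded() {
    return extension_loaded('mbstring')
        && (((int) ini_get('mbstring.func_overload')) & 2) !== 0;
}

// Byte length of $str, regardless of overloading.
function _strlen($str) {
    return _mb_overloaded() ? mb_strlen($str, '8bit') : strlen($str);
}

// Byte-based substring, regardless of overloading.
function _substr($str, $start, $length = null) {
    if ($length === null) {
        // Long enough to reach the end of the string from any valid offset.
        $length = _strlen($str);
    }
    return _mb_overloaded()
        ? mb_substr($str, $start, $length, '8bit')
        : substr($str, $start, $length);
}
```

With helpers like these, offsets computed by the parser would stay in bytes and therefore agree with the `$buffer[x]` accessor, whatever the ini setting is.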
|
The language itself doesn't use unicode so the parser doesn't need to be aware of it (no unicode keywords or anything). The buffer can be treated as a string of 8 bit chars and still have unicode characters pass through it fine. I'm guessing the problem is that mbstring.func_overload causes the string functions to work differently than the $buffer[x] character accessor which causes a mismatch and the parser to fail? Does changing the ini setting with ini_set just for the parse work? If that doesn't work then another option is to replace all instances of $this->buffer[] with substr($this->buffer, ...). There's probably a noticeable performance hit to this change though. |
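The suspected mismatch can be illustrated with a small sketch. Here `mb_strlen()` stands in for an overloaded `strlen()`, an assumption made so the example runs without touching php.ini, and `scan()` is a hypothetical helper, not parser code. It requires the mbstring extension.

```php
<?php
// Hypothetical helper: read the first $length units of $buffer through the
// [] accessor, which always indexes single bytes.
function scan($buffer, $length) {
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $buffer[$i];
    }
    return $out;
}

$buffer = "content: '\xE2\x80\x94';";      // em dash: 1 character, 3 bytes in UTF-8
$byteLength = strlen($buffer);              // byte count when strlen() is not overloaded
$charLength = mb_strlen($buffer, 'UTF-8');  // character count, as an overloaded strlen() would report

$full = scan($buffer, $byteLength);   // covers the whole buffer
$short = scan($buffer, $charLength);  // stops two bytes early, before the closing quote and semicolon
```

A length measured in characters combined with byte-wise indexing leaves the tail of the buffer unread, which is consistent with the parser failing at the `content` declaration.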
Yes, you are right. Changing the compile method to this helps, and parsing goes well even if
The parse error appears only if mbstring.internal_encoding is set to UTF-8.
|
Thanks for the fix. |