Fix halfwidth katakana voiced and semi-voiced sound marks in width calculations #89
Open
tats-u wants to merge 1 commit into
Manishearth
approved these changes
May 26, 2026
Comment on lines +101 to +102
//! with the [`Grapheme_Extend`] property, except [`'\u{FF9E}'` HALFWIDTH KATAKANA VOICED SOUND MARK](https://util.unicode.org/UnicodeJsps/character.jsp?a=FF9E)
//! and [`'\u{FF9F}'` HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK](https://util.unicode.org/UnicodeJsps/character.jsp?a=FF9F).
Contributor
Instead of making this a carve-out in rule 4, I think these should be added to rule 1 above. In other words, replace "U+2D7F has width 1" above with "U+2D7F, U+FF9E, and U+FF9F have width 1".
Related: microsoft/terminal#18087
The halfwidth katakana (semi-)voiced sound marks U+FF9E and U+FF9F are the only Grapheme Extenders that belong to the Letter category (Lm). They are typical edge cases when you consider grapheme clusters.
They should be counted as 1 (their EAW is H), not 0.
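The intended behaviour can be illustrated with a minimal sketch. This is not the crate's implementation: `is_grapheme_extend_sample`, `sketch_width`, and `sketch_str_width` are hypothetical names, and the `Grapheme_Extend` lookup is a hand-picked stand-in covering only the code points used here. It only shows the ordering idea: the width-1 rule for U+FF9E and U+FF9F is checked before the zero-width rule for `Grapheme_Extend`.

```rust
/// Hypothetical stand-in for a real `Grapheme_Extend` table lookup.
/// It covers only a handful of code points for illustration.
fn is_grapheme_extend_sample(c: char) -> bool {
    matches!(
        c,
        '\u{0301}'   // COMBINING ACUTE ACCENT
        | '\u{3099}' // COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
        | '\u{309A}' // COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
        | '\u{FF9E}' // HALFWIDTH KATAKANA VOICED SOUND MARK
        | '\u{FF9F}' // HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
    )
}

/// Simplified width function (assumption: only the rules needed here).
fn sketch_width(c: char) -> usize {
    match c {
        // Explicit width-1 rule, checked before the zero-width rule.
        '\u{FF9E}' | '\u{FF9F}' => 1,
        // Remaining Grapheme_Extend characters are zero-width.
        _ if is_grapheme_extend_sample(c) => 0,
        // Fullwidth katakana block is wide.
        '\u{30A0}'..='\u{30FF}' => 2,
        // Everything else is treated as narrow in this sketch.
        _ => 1,
    }
}

/// Sum of per-character widths.
fn sketch_str_width(s: &str) -> usize {
    s.chars().map(sketch_width).sum()
}
```

Under these assumptions, the halfwidth spelling of "pug", `"\u{FF8A}\u{FF9F}\u{FF78}\u{FF9E}"`, sums to 4 because each of its four code points counts as 1, while a true combining mark such as U+0301 still counts as 0.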
English translation of the initial prompt (Copilot + GPT-5.4 high):
Add test_halfwidth_katakana: パグ → 4 characters
English translation of additional handling prompt after discovering that the added test fails (because GPT started flailing around trying to fix it):
The half-width Kana voiced mark test failed because they're Grapheme Extenders despite being Letters. The current spec sucks, so please fix it together
"pug dog" was hand-written. The rest of that comment was completed by Copilot line completion.