Fix halfwidth katakana voiced and semi-voiced sound marks in width calculations #89

Open
tats-u wants to merge 1 commit into unicode-rs:master from tats-u:fix-half-voiced

Conversation


@tats-u tats-u commented May 26, 2026

Related: microsoft/terminal#18087

Halfwidth katakana (semi-)voiced sound marks U+FF9E and U+FF9F are the only Grapheme Extenders whose General_Category is a Letter (Lm). They are typical edge cases when you consider grapheme clusters.

They should be counted as width 1 (their East_Asian_Width is H), not 0.
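To make the difference concrete, here is a minimal sketch that compares the two policies on the halfwidth spelling of "pug". The helpers `halfwidth_kana_width` and `total_width` are hypothetical names written for this illustration; they are not the crate's API and only cover the range U+FF61..=U+FF9F.

```rust
/// Illustrative width for the halfwidth CJK punctuation / katakana range
/// U+FF61..=U+FF9F only. Returns `None` for any other character.
/// `marks_are_zero` selects the policy for U+FF9E and U+FF9F:
/// `true` treats them as zero-width, `false` treats them as width 1.
fn halfwidth_kana_width(c: char, marks_are_zero: bool) -> Option<usize> {
    match c {
        // The two sound marks, under the "zero-width" policy.
        '\u{FF9E}' | '\u{FF9F}' if marks_are_zero => Some(0),
        // Everything else in the range (and the marks under the
        // "width 1" policy) occupies one column.
        '\u{FF61}'..='\u{FF9F}' => Some(1),
        // Outside the scope of this sketch.
        _ => None,
    }
}

/// Sums the per-character widths; `None` if any character is out of range.
fn total_width(s: &str, marks_are_zero: bool) -> Option<usize> {
    s.chars()
        .map(|c| halfwidth_kana_width(c, marks_are_zero))
        .sum()
}
```

With the string `"\u{FF8A}\u{FF9F}\u{FF78}\u{FF9E}"` (HA, semi-voiced mark, KU, voiced mark), `total_width(s, false)` gives `Some(4)`, while `total_width(s, true)` gives `Some(2)`.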


English translation of the initial prompt (Copilot + GPT-5.4 high):

Add test_halfwidth_katakana:
ﾊﾟｸﾞ→4 characters


English translation of the follow-up prompt, given after the added test turned out to fail (because GPT started flailing around trying to fix it):

The half-width Kana voiced mark test failed because they're Grapheme Extenders despite being Letters. The current spec sucks, so please fix it together


The "pug dog" part was hand-written. The rest of that comment was completed by Copilot line completion.

Comment thread src/lib.rs
Comment on lines +101 to +102
//! with the [`Grapheme_Extend`] property, except [`'\u{FF9E}'` HALFWIDTH KATAKANA VOICED SOUND MARK](https://util.unicode.org/UnicodeJsps/character.jsp?a=FF9E)
//! and [`'\u{FF9F}'` HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK](https://util.unicode.org/UnicodeJsps/character.jsp?a=FF9F).
Contributor

@Jules-Bertholet Jules-Bertholet May 27, 2026


Instead of making this a carve-out in rule 4, I think these should be added to rule 1 above. In other words, replace "U+2D7F has width 1" above with "U+2D7F, U+FF9E, and U+FF9F have width 1".
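As a rough illustration of this suggestion, the sketch below models the rules as an ordered lookup in which earlier rules win, so listing the three code points under rule 1 means the `Grapheme_Extend` rule needs no exception. It assumes rule 4 assigns width 0 to `Grapheme_Extend` characters. Both `toy_grapheme_extend` (a tiny hand-picked subset of the property) and `toy_width` are hypothetical names; this is not the crate's implementation.

```rust
/// Toy stand-in for the Grapheme_Extend property.
/// Assumption: a small hand-picked subset, not the real Unicode data.
fn toy_grapheme_extend(c: char) -> bool {
    matches!(
        c,
        '\u{0300}'..='\u{036F}' | '\u{2D7F}' | '\u{FF9E}' | '\u{FF9F}'
    )
}

/// Hypothetical ordered width lookup; the first matching rule wins.
fn toy_width(c: char) -> usize {
    // "Rule 1": explicit width-1 code points, checked first.
    if matches!(c, '\u{2D7F}' | '\u{FF9E}' | '\u{FF9F}') {
        return 1;
    }
    // "Rule 4": any remaining Grapheme_Extend character is zero-width.
    // No carve-out is needed here because rule 1 already returned.
    if toy_grapheme_extend(c) {
        return 0;
    }
    // Everything else counts as one column in this toy model.
    1
}
```

Under this ordering, `toy_width('\u{FF9E}')` is 1 and `toy_width('\u{0301}')` is 0, without rule 4 having to name the exceptions.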
