AtlasEngine: Enable RTL text shaping via bidi analysis (#20156)#20159
AtlasEngine: Enable RTL text shaping via bidi analysis (#20156)#20159Qmoony wants to merge 2 commits into
Conversation
Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were called with isRightToLeft=0 for all text, which prevented DirectWrite from performing Arabic contextual shaping (cursive joining) and produced glyphs in LTR order for RTL text. This change: - Calls AnalyzeBidi() to determine text direction per run - Passes the correct isRightToLeft value to GetGlyphs/GetGlyphPlacements - Reverses the resulting RTL glyph ordering to LTR for the backends - Builds ShapedRow directly for RTL runs, bypassing the column loop which assumes ascending clusterMap values
|
@microsoft-github-policy-service agree |
|
Do you have a screenshot of this working? |
| { | ||
| // Determine if this script run is RTL by intersecting with bidi runs. | ||
| // If all overlapping bidi runs agree on direction → use that. | ||
| // If mixed (RTL+LTR within the same script run) → fall back to LTR. |
There was a problem hiding this comment.
I believe the technically correct approach would be to refactor the text segmentation algorithm. This code currently looks somewhat like this:
for (each text attribute) { // aka: UpdateDrawingBrushes
for (each font fallback) { // aka: _flushBufferLine
for (each script run) {
for (each glyph) {
place glyph
}
}
}
}The correct approach should look somewhat like this:
// aka: UpdateDrawingBrushes, etc., calls
accumulate all text and attributes in a line
// ...then at the end of each line during flush()
LinkedList runs;
for (each text attribute) {
for (each font fallback) {
split runs (per font)
}
}
perform glyph analysis
for (each analysis result) {
split runs (per script)
}
// (because DWrite can split runs by accident)
merge adjacent identical runs again
for (each run) {
map to glyphs
foreach (glyph) {
place glyph
}
}Among others, this solves the issue that your code only performs correct RTL if it is contained within an entire span. Making a part of RTL text bold for instance should break your code.
|
|
||
| // RTL run fully handled. Skip the normal column loop below. | ||
| continue; | ||
| } |
There was a problem hiding this comment.
The block above is some nice reference code to have! Thank you for working on it!
We should probably think about unifying it with the existing logic, however.
BTW judging by the lack of {} around conditions, I bet it was done by Claude though, right? 😄
There was a problem hiding this comment.
Thanks for the detailed review and the architecture outline!
You're right on both counts:
- The current approach only handles RTL correctly within a single script run — as soon as a space or attribute change splits the span, it falls back to LTR. That matches what @Avid29 observed with
זאת בדיקה. - And yes, Claude wrote it :) The missing braces gave it away? I did review and
test before pushing though.
Your proposed refactoring makes sense — accumulate line state first, then
split/merge runs, then glyph placement last. That would also fix the mixed-direction
span case naturally.
I see two paths forward:
- Option A: Keep this PR as a working proof-of-concept / stepping stone, and
iterate on the full refactor in a follow-up. - Option B: Close this and do the full refactor from scratch, using the RTL
reversal code here as reference.
I'm leaning toward Option A since it already improves the common case (pure RTL
text without mixed formatting), but I'll defer to your preference on this.
Happy to take this to a discussion issue first if you'd rather align on the design
up front.
There was a problem hiding this comment.
Claude is bad at implicit research, e.g. checking for existing code styles, if you don't tell it to. That's perhaps also why today's system prompts are like 10000 lines long. "Alignment" my 🍑. lol
Jokes aside, I'd prefer option B, because we (the maintainers) are always pressed on time. If we don't do it right now, we won't ever have any incentive to fix it later. So, if we care to make it good, we have to make it good from the get go.
There was a problem hiding this comment.
Went with B — pushed 13f06b9. The pipeline now does analysis → split runs:
AnalyzeScript+AnalyzeBidirun once on the whole buffer line in_flushBufferLine_mapRegularTextclamps font fallback andGetTextComplexityat script/bidi boundaries via a new_nextAnalysisBoundaryhelper, so every range entering_mapComplexis guaranteed uniform in direction and script_mapComplexdropped its inlineAnalyzeBidi/AnalyzeScriptcalls, the per-script-run loop, and the "mixed → LTR" fallback (structurally impossible now)
Verified locally:
זאת בדיקה(the Hebrew + space case @Avid29 reported)Hello זאת/abc العربية xyz(mixed LTR+RTL on one line)بسم الله الرحمن الرحيم(multi-word Arabic, per-word cursive joining intact)
No visible regressions in ASCII, CJK, emoji, combining marks, ligatures, or built-in box-drawing glyphs.
Still out of scope:
- Full line-level visual reordering for mixed-direction text — the terminal cell grid is LTR; separate change
- Cursive joining across direction/font boundaries — inherent limitation when text gets split into separate shaping runs
|
Yeah this is one example for what I meant in my review above. It's somewhat annoying to fix this because it necessitates a refactor. The flip-side is that this fixes various non-RTL issues too. |
|
@Avid29 Good catch, thanks for testing! The space character has bidi class WS |
|
@DHowett Screenshot added below. For a quick test case, |
This comment has been minimized.
This comment has been minimized.
29044b4 to
ff22ca5
Compare
This comment has been minimized.
This comment has been minimized.
a89d424 to
ff22ca5
Compare
The previous RTL implementation ran AnalyzeBidi and AnalyzeScript inside _mapComplex, after font fallback and complexity analysis had already segmented the text. When those upstream segmenters split a direction- uniform region (e.g. at a space or attribute change), each fragment re-analyzed in isolation and fell back to LTR. This commit inverts the order so analysis is authoritative: - Run AnalyzeScript + AnalyzeBidi once on the whole buffer line in _flushBufferLine. - Add _nextAnalysisBoundary so font fallback and GetTextComplexity clamp every segment at the next script/bidi run end. - _mapComplex receives ranges guaranteed uniform in script and direction; the inline analysis call, the per-script-run loop, and the "mixed -> LTR" fallback are removed. Hebrew text with embedded spaces and mixed LTR/RTL lines now shape correctly.
13f06b9 to
25aa9c7
Compare


Summary of the Pull Request
This fixes Persian/Arabic text rendering in AtlasEngine by enabling DirectWrite's bidirectional (bidi) text analysis and passing the correct isRightToLeft flag to GetGlyphs() and GetGlyphPlacements(). Previously these were hardcoded to 0 (LTR), which prevented Arabic contextual shaping and produced glyphs in reversed order.
References and Relevant Issues
Closes #20156
Detailed Description of the Pull Request / Additional comments
Previously, IDWriteTextAnalyzer::GetGlyphs and GetGlyphPlacements were called with isRightToLeft=0 for all text, which prevented DirectWrite from performing Arabic contextual shaping (cursive joining) and produced glyphs in LTR order for RTL text.
This change:
Validation Steps Performed
PR Checklist