Skip to content

Strip Unicode Cf characters in PrintableString#4593

Open
tnull wants to merge 1 commit intolightningdevkit:mainfrom
tnull:2026-05-printable-string-bidi
Open

Strip Unicode Cf characters in PrintableString#4593
tnull wants to merge 1 commit intolightningdevkit:mainfrom
tnull:2026-05-printable-string-bidi

Conversation

@tnull
Copy link
Copy Markdown
Contributor

@tnull tnull commented May 5, 2026

PrintableString is the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text, UntrustedString, LSPS messages, lightning-invoice descriptions) to logs and UI. It only replaced char::is_control matches (Unicode general category Cc) with U+FFFD, leaving the entire Cf (Format) category untouched.

That is the exact category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574): a peer can set its alias / invoice description / offer fields to e.g. safe\u{202E}cipsxe.exe, which previously passed through verbatim while a human reader sees safeexe.cips — defeating the threat model PrintableString exists to defend against.

Replace Cf codepoints alongside Cc ones. The Cf ranges are inlined as a matches! table sourced from Unicode 16.0 to keep the change no_std-friendly with no new dependencies.

Co-Authored-By: HAL 9000

`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.

That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.

Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.

Co-Authored-By: HAL 9000
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 5, 2026

I've assigned @valentinewallace as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

| 0x110BD
| 0x110CD
| 0x13430..=0x13440
| 0x1BCA0..=0x1BCA3
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Off-by-one — U+13440 EGYPTIAN HIEROGLYPH MIRROR HORIZONTALLY is general category Mn (Mark, Nonspacing) in Unicode 16.0, not Cf. The correct upper bound for the Egyptian Hieroglyph Format Controls Cf range is U+1343F.

This causes a valid combining mark to be incorrectly replaced with U+FFFD.

Suggested change
| 0x1BCA0..=0x1BCA3
| 0x13430..=0x1343F

@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

Review Summary

Inline comments posted

  • lightning-types/src/string.rs:71 — Off-by-one in Egyptian Hieroglyph Cf range: 0x13430..=0x13440 should be 0x13430..=0x1343F (U+13440 is Mn, not Cf in Unicode 16.0).

Minor observations (not bugs, no inline comment needed)

  • The PrintableString struct docstring (line 25-26) still says "replacing control characters" but now also replaces format characters. Consider updating to mention both categories.

use core::fmt::Write;
for c in self.0.chars() {
let c = if c.is_control() { core::char::REPLACEMENT_CHARACTER } else { c };
let c = if c.is_control() || is_format_char(c) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it make sense to simplify this to just use the replacement character if !is_alphanumeric && !is_whitespace, or similar?

@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.24%. Comparing base (1a26867) to head (c1c66e7).
⚠️ Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
lightning-types/src/string.rs 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4593      +/-   ##
==========================================
- Coverage   86.84%   86.24%   -0.60%     
==========================================
  Files         161      159       -2     
  Lines      109260   109192      -68     
  Branches   109260   109192      -68     
==========================================
- Hits        94882    94170     -712     
- Misses      11797    12411     +614     
- Partials     2581     2611      +30     
Flag Coverage Δ
fuzzing-fake-hashes ?
fuzzing-real-hashes ?
tests 86.24% <95.65%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants