
meta: fix non-leftmost match in reverse-suffix/inner optimizations#1346

Open: stefanobaghino wants to merge 1 commit into `rust-lang:master` from `stefanobaghino:fix/meta-leftmost-1345`



@stefanobaghino

Closes #1345.

`ReverseSuffix::try_search_half_start` and `ReverseInner::try_search_full` returned on the first successful reverse search, which assumed that the leftmost occurrence of the suffix (resp. inner) literal corresponds to the end of the leftmost-first match. That assumption breaks when the regex prefix is non-monotonic, e.g. an optional group whose body can absorb the literal between consecutive occurrences, so a strictly later literal occurrence may have a strictly earlier overall match start. The minimal repro filed in #1345 is `[^()]*(?:\([^()]*\))?[^()]*:` on `$(:):`: `meta` was returning `2..3`, while the leftmost-first match is `0..5` (the optional group absorbs `(:)`, so the match ends at the second `:`).
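To make the failure concrete, here is a minimal std-only sketch that models the old and the correct behavior on the repro haystack. It does not use the `regex` crate: `is_full_match` is a hypothetical helper hand-written for this one pattern, and `reverse_start` stands in for an unbounded reverse search by returning the smallest start of a match that ends at a given offset. `naive_start` stops at the first `:` occurrence that yields a match, as the old loops did, while `leftmost_start` takes the minimum over all occurrences.

```rust
/// Hypothetical helper, hand-written for this one pattern (not a regex engine):
/// true if all of `hay` is matched by `[^()]*(?:\([^()]*\))?[^()]*:`.
/// The last byte must be `:`, and the rest is either paren-free or contains
/// exactly one `(` followed later by exactly one `)`.
fn is_full_match(hay: &[u8]) -> bool {
    let body = match hay.split_last() {
        Some((&b':', body)) => body,
        _ => return false,
    };
    // Collect the positions of a given byte in the body.
    let pos = |needle: u8| -> Vec<usize> {
        body.iter().enumerate().filter(|&(_, &b)| b == needle).map(|(i, _)| i).collect()
    };
    match (pos(b'(').as_slice(), pos(b')').as_slice()) {
        ([], []) => true,
        ([open], [close]) => open < close,
        _ => false,
    }
}

/// End offsets of every occurrence of the suffix literal `:`.
fn suffix_ends(hay: &[u8]) -> impl Iterator<Item = usize> + '_ {
    hay.iter().enumerate().filter(|&(_, &b)| b == b':').map(|(i, _)| i + 1)
}

/// Model of an unbounded reverse search: smallest start of a match ending at `end`.
fn reverse_start(hay: &[u8], end: usize) -> Option<usize> {
    (0..end).find(|&s| is_full_match(&hay[s..end]))
}

/// Old behavior: return the first suffix occurrence that yields a match.
fn naive_start(hay: &[u8]) -> Option<usize> {
    suffix_ends(hay).find_map(|end| reverse_start(hay, end))
}

/// Correct leftmost start: the minimum over all suffix occurrences.
fn leftmost_start(hay: &[u8]) -> Option<usize> {
    suffix_ends(hay).filter_map(|end| reverse_start(hay, end)).min()
}

fn main() {
    let hay: &[u8] = b"$(:):";
    println!("naive:    {:?}", naive_start(hay)); // Some(2)
    println!("leftmost: {:?}", leftmost_start(hay)); // Some(0)
}
```

The first `:` (ending at offset 3) only admits a match starting at 2, while the second `:` admits one starting at 0, which is why stopping early reports the wrong start.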

Both loops now track the best (smallest-start) candidate and bound each subsequent reverse search by `best.offset()`. The existing limited reverse DFA in `regex-automata/src/meta/limited.rs` returns `RetryError::Quadratic` the moment scanning past the bound is required, which propagates through `?` and triggers the existing `Core` fallback in `Strategy::search`. The loops also exit early when the candidate already starts at `input.start()`, so the common (monotonic-prefix) cases stay one-shot.
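The fixed loop can be modeled in the same simplified, std-only way. In this sketch all names are hypothetical and are not the crate's internals: `reverse_search_limited` plays the role of the limited reverse DFA, the bound is the best start found so far, and `core_leftmost_start` stands in for the `Core` fallback. One simplification to keep in mind: the model raises `Quadratic` only when a strictly earlier start actually exists, whereas the real limited DFA gives up whenever it would have to scan past the bound, match or not.

```rust
/// Hypothetical stand-in for the limited reverse DFA's retry error.
#[derive(Debug, PartialEq)]
enum RetryError {
    Quadratic,
}

/// Hypothetical full-match check for `[^()]*(?:\([^()]*\))?[^()]*:`,
/// hand-written for this one pattern (not a regex engine).
fn is_full_match(hay: &[u8]) -> bool {
    let body = match hay.split_last() {
        Some((&b':', body)) => body,
        _ => return false,
    };
    let pos = |needle: u8| -> Vec<usize> {
        body.iter().enumerate().filter(|&(_, &b)| b == needle).map(|(i, _)| i).collect()
    };
    match (pos(b'(').as_slice(), pos(b')').as_slice()) {
        ([], []) => true,
        ([open], [close]) => open < close,
        _ => false,
    }
}

/// End offsets of every occurrence of the suffix literal `:`.
fn suffix_ends(hay: &[u8]) -> impl Iterator<Item = usize> + '_ {
    hay.iter().enumerate().filter(|&(_, &b)| b == b':').map(|(i, _)| i + 1)
}

/// Simplified model of a bounded reverse search ending at `end`: returns the
/// smallest match start, or `Quadratic` if that start lies before `min_start`.
fn reverse_search_limited(
    hay: &[u8],
    end: usize,
    min_start: usize,
) -> Result<Option<usize>, RetryError> {
    match (0..end).find(|&s| is_full_match(&hay[s..end])) {
        Some(start) if start < min_start => Err(RetryError::Quadratic),
        other => Ok(other),
    }
}

/// Fixed loop: keep the smallest-start candidate instead of returning the first.
fn search_half_start(hay: &[u8]) -> Result<Option<usize>, RetryError> {
    let mut best: Option<usize> = None;
    for end in suffix_ends(hay) {
        // Bound by the best start so far (the input start, 0, before any candidate).
        let bound = best.unwrap_or(0);
        if let Some(start) = reverse_search_limited(hay, end, bound)? {
            best = Some(best.map_or(start, |b| b.min(start)));
            // Early exit: no match can start before the input start.
            if start == 0 {
                break;
            }
        }
    }
    Ok(best)
}

/// Stand-in for the `Core` fallback: exhaustive leftmost start.
fn core_leftmost_start(hay: &[u8]) -> Option<usize> {
    suffix_ends(hay)
        .filter_map(|end| (0..end).find(|&s| is_full_match(&hay[s..end])))
        .min()
}

/// On `Quadratic`, retry with the fallback, mirroring the strategy described above.
fn find_start(hay: &[u8]) -> Option<usize> {
    match search_half_start(hay) {
        Ok(found) => found,
        Err(RetryError::Quadratic) => core_leftmost_start(hay),
    }
}

fn main() {
    let hay: &[u8] = b"$(:):";
    println!("{:?}", search_half_start(hay)); // Err(Quadratic)
    println!("{:?}", find_start(hay)); // Some(0)
}
```

On `$(:):` the first candidate starts at 2, the reverse search from the second `:` would need to go past that bound, and the fallback then yields start 0. On a monotonic haystack such as `ab:cd:`, the first candidate already starts at 0 and the loop stops after a single reverse search.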

The bug was reduced from real-world `.sublime-syntax` grammars driven through fancy-regex's seek-prefilter approximation; see fancy-regex/fancy-regex#249 for the downstream context.

`testdata/regression.toml` gets the minimal repro as `non-monotonic-reverse-suffix`; before this change, it failed against `meta` and passed against `pikevm` and `hybrid`.

The reverse-suffix and reverse-inner meta strategies returned on the first successful reverse search, which assumed that the leftmost suffix (resp. inner literal) occurrence corresponds to the leftmost-first match. That assumption fails when the regex prefix is non-monotonic — e.g. an optional group that can absorb the suffix character between consecutive occurrences — so a strictly later literal occurrence may have a strictly earlier overall match start.

Both loops now track the best (smallest-start) candidate and bound each subsequent reverse search by `best.offset()`. The existing limited reverse DFA returns `Quadratic` the moment scanning past that bound is needed, which propagates to a `Core` fallback. An early-exit triggers as soon as the candidate already starts at `input.start()`.

Adds a regression test covering the minimal repro from rust-lang#1345 (`[^()]*(?:\([^()]*\))?[^()]*:` on `$(:):`).

Closes rust-lang#1345.

Development

Successfully merging this pull request may close these issues.

meta::Regex returns non-leftmost match for pattern with negated class around optional group
