perf: Avoid re-canonicalizing the entire IntervalSet on push by Marwes · Pull Request #1308 · rust-lang/regex

Marwes · 2025-10-20T14:49:11Z

Canonicalize is taking up a significant amount of time due to a regex with a huge number of character ranges (generated by [lalrpop](https://github.com/lalrpop/lalrpop)'s lexer expanding multiple `\w` in a token). While this could perhaps be fixed in lalrpop, I noticed the TODO in the code and addressed it: we now automatically union and compress on each push instead of re-canonicalizing the whole set on every push, and that fixed the performance problem.

I did see the earlier attempt at this in #1051. It seems that it was reverted and regression tests were added, so I hope those and the existing tests are enough (I don't have a clear idea of what tests might be missing).
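To illustrate the idea of "union and compress on each push", here is a minimal sketch. It is not the code from this PR or from `regex-syntax/src/hir/interval.rs`: the `Range` and `IntervalSet` types, their fields, and the `canonicalize` helper are hypothetical stand-ins over `u32` bounds. The assumption is that the set is always kept canonical (sorted, with no overlapping or adjacent ranges), so a push only needs to find the ranges the new one touches and merge them.

```rust
/// Hypothetical inclusive range; a stand-in, not regex-syntax's interval type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Range {
    lo: u32,
    hi: u32,
}

/// Hypothetical set whose `ranges` are always canonical:
/// sorted by `lo`, with no two ranges overlapping or adjacent.
#[derive(Debug, Default)]
struct IntervalSet {
    ranges: Vec<Range>,
}

impl IntervalSet {
    /// Insert `new`, merging it with every range it overlaps or touches.
    fn push(&mut self, new: Range) {
        assert!(new.lo <= new.hi);
        // Ranges before `start` end strictly before `new.lo - 1`, so they cannot merge.
        let start = self
            .ranges
            .partition_point(|r| r.hi.saturating_add(1) < new.lo);
        // Ranges from `end` onwards begin strictly after `new.hi + 1`, so they cannot merge.
        let end = self
            .ranges
            .partition_point(|r| r.lo <= new.hi.saturating_add(1));
        let mut merged = new;
        if start < end {
            merged.lo = merged.lo.min(self.ranges[start].lo);
            merged.hi = merged.hi.max(self.ranges[end - 1].hi);
        }
        // Replace the touched ranges (possibly none) with the single merged range.
        // Dropping the returned iterator applies the replacement.
        self.ranges.splice(start..end, std::iter::once(merged));
    }
}

/// Baseline for comparison: sort everything, then merge neighbours in one pass.
fn canonicalize(mut ranges: Vec<Range>) -> Vec<Range> {
    ranges.sort();
    let mut out: Vec<Range> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(last) = out.last_mut() {
            if r.lo <= last.hi.saturating_add(1) {
                last.hi = last.hi.max(r.hi);
                continue;
            }
        }
        out.push(r);
    }
    out
}

fn main() {
    let input = [
        Range { lo: 10, hi: 20 },
        Range { lo: 1, hi: 3 },
        Range { lo: 21, hi: 25 }, // adjacent to 10..=20
        Range { lo: 15, hi: 30 }, // overlaps the merged range
        Range { lo: 5, hi: 6 },
        Range { lo: 4, hi: 4 }, // bridges 1..=3 and 5..=6
    ];
    let mut set = IntervalSet::default();
    for r in input {
        set.push(r);
    }
    // Incremental pushes should agree with a full sort-and-merge of the same input.
    assert_eq!(set.ranges, canonicalize(input.to_vec()));
    println!("{:?}", set.ranges);
}
```

In this sketch each push costs two binary searches plus whatever element shifting `splice` needs, rather than a sort-and-merge of the whole set, which is where the saving would come from when many ranges are pushed one at a time.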


Marwes · 2026-05-21T11:34:36Z

I have done some optimization around the code where this occurs, which brings canonicalize up to about 5% in our benchmarks. Is there any chance this could be merged, or what needs to be added to get it over the line?

BurntSushi

Hi! Yes, thank you and apologies for the late reply.

The change here itself looks good to me. Thank you for working through this subtle optimization. :-) I have only minor nits on this PR and a request.

Could you add a benchmark to rebar? I think probably this directory is the place to add it. And if you could show a before-and-after on the rebar results here, that would be great.

Just noticed the comment while doing rust-lang#1308, but I don't have any actual case where this shows up as a performance problem. I would argue the code is simpler though, so perhaps simpler code and the removal of a "TODO" comment are good enough.


Marwes force-pushed the interval_set_fast_push branch from b3ea20f to 3d44573 · May 12, 2026 15:08

BurntSushi requested changes May 21, 2026


Comment thread regex-syntax/src/hir/interval.rs Outdated

Comment thread regex-syntax/src/hir/interval.rs Outdated

Comment thread regex-syntax/src/hir/interval.rs Outdated

refactor: Review comments

65842e8

Marwes mentioned this pull request May 21, 2026

feat: Add compile benchmark for regex#1308 BurntSushi/rebar#32

Open

Marwes mentioned this pull request May 26, 2026

perf: Run canonicalize in-place #1352

Open

Marwes wants to merge 2 commits into rust-lang:master from Marwes:interval_set_fast_push
