Fail fast on invalid config regexes and enabled config#461
Fail fast on invalid config regexes and enabled config#461
Conversation
ChristianPavilonis
left a comment
There was a problem hiding this comment.
PR Review: "Fail fast on invalid config regexes and enabled config"
Verdict: Looks good — no blockers.
Findings
P2 — Medium (Informational)
-
RscUrlRewriternow uses a generic regex (crates/common/src/integrations/nextjs/shared.rs:22-27) — The static regex matches all URLs instead of per-origin URLs, so every URL in an RSC payload goes through the closure. The existing early-exit check (input.contains(&self.origin_host)at line 139) short-circuits the common case, so performance impact is negligible in practice. No action needed. -
Prebid
server_urlvalidation behavioral change (crates/common/src/integrations/prebid.rs:210-219) — Emptyserver_urlnow causes a hard error instead of silently disabling. This is the stated intent of the PR and is covered by the newempty_prebid_server_url_fails_orchestrator_startuptest. Acceptable.
Highlights
is_explicitly_disabledshort-circuit — Inspects raw JSON before deserialization soenabled: falseconfigs with otherwise invalid fields don't cause spurious startup failures. Smart design, well-tested.- Comprehensive test coverage — Invalid handler regexes (TOML + env var), disabled+invalid config short-circuits for integrations/providers/auction providers, enabled+invalid startup failures,
__NEXT_DATA__rewrite fixtures with ports/metacharacters/whitespace/hostname boundaries. - Consistent fallible builder pattern — Every integration's
build()uniformly returnsResult<Option<T>, Report<TrustedServerError>>, making the codebase very predictable. - Lockr regex moved to
Lazy<Regex>static (crates/common/src/integrations/lockr.rs:37-42) — Avoids recompiling on every request. strip_origin_host_with_optional_portboundary helper (crates/common/src/integrations/nextjs/shared.rs:67-100) — Properly preventsorigin.example.com.evilfrom matching while allowingorigin.example.com:8443/path. Edge cases handled correctly and tested.
All CI checks pass. High-quality hardening PR with thorough, consistent changes across all 25 files.
aram356
left a comment
There was a problem hiding this comment.
Summary
Solid defensive hardening that converts config-derived regex compilation and integration registration from panic/log-and-swallow to Result-based propagation with fail-fast at startup. The is_explicitly_disabled short-circuit is a well-targeted addition. Consistent application across all 10+ integrations with thorough test coverage.
Non-blocking
🤔 thinking
- Inconsistent parameter storage across rewriters:
RscUrlRewriterstoresorigin_host,UrlRewriteris stateless — sibling types with different conventions (shared.rs:118,script_rewriter.rs:113) restrict_accept_encodingunconditionally overwrites client preferences: prevents unsupported encodings from origin but does not honor clients that only acceptidentityor a subset (publisher.rs:19)
♻️ refactor
rewrite_nextjs_values_with_rewriteris a pass-through wrapper: delegates torewriter.rewrite_embedded()with no added logic — can be inlined (script_rewriter.rs:74)validate_pathcompiles regex redundantly:from_tomlnow callsprepare_runtime()which compiles viaOnceLock, making the validation-time compile redundant (settings.rs:473)- Redundant
prepare_runtime()call inget_settings:from_tomlalready calls it internally (settings_data.rs:37)
🌱 seedling
once_cell::sync::Lazy→std::sync::LazyLock: project uses Rust 2024 edition whereLazyLockis stable — opportunity to drop theonce_celldep in a follow-upstrip_origin_host_with_optional_portdoes not handle IPv6:RSC_URL_PATTERNcaptures bracketed IPv6 but this helper cannot match them — worth a doc comment (shared.rs:67)
👍 praise
is_explicitly_disabledshort-circuit: checking raw JSON before deserialization means disabled integrations with placeholder config do not blow up startup (settings.rs:123)- Excellent hardening test matrix: disabled-invalid-skips, enabled-invalid-fails, provider configs, registry/orchestrator startup, handler regex, env var overrides, and a real-world-shaped
__NEXT_DATA__fixture - Consistent
Result-based registration: uniform pattern across all integrations with no stragglers
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- CodeQL/Analyze: PASS
- format-docs: PASS
- format-typescript: PASS
There was a problem hiding this comment.
Review: Approve
Systematically converts all integration/provider registration from "log-and-skip" to "fail-fast" error propagation, eagerly validates config-derived regexes at startup, replaces once_cell with std::sync::LazyLock, and adds comprehensive regression tests. The design is sound: invalid enabled config fails startup, disabled invalid config is silently skipped via is_explicitly_disabled, and handler regexes are eagerly prepared.
No blocking issues.
Findings
P2 — RscUrlRewriter regex now matches all URL-like patterns, not just origin
shared.rs:23-28 — The old code compiled a per-instance regex with the origin host baked in (escape(origin_host) in the pattern), so the regex engine only ever fired the callback for origin URLs. The new static RSC_URL_PATTERN matches any URL-like pattern and the callback checks strip_origin_host_with_optional_port to decide whether to rewrite. Rewriting output is identical, but non-origin URLs now pay a callback invocation + .to_string() allocation before being returned unchanged. Mitigated by the !input.contains(origin_host) fast-path at line 137 and the fact that per-instance regex compilation was the larger cost. Net performance win for the common case — just noting the tradeoff.
P2 — Cell<bool> in script rewriter
script_rewriter.rs:166 — Tracking mutation inside replace_all via Cell<bool>. Could be simplified to checking matches!(result, Cow::Borrowed(_)) since replace_all returns Cow::Borrowed when the callback always returns the original text. Current approach is correct for single-threaded wasm32-wasip1 — minor style nit.
Highlights
- Excellent regression test coverage: disabled invalid config bypass, enabled invalid config failure, handler regex validation,
__NEXT_DATA__fixture rewrites, explicit port URL handling, publisher encoding fallback is_explicitly_disabledearly exit before deserialization/validation is a thoughtful design- Consistent
Ok(None)/Ok(Some(...))/Err(...)pattern across all integrations RscUrlRewritersimplification from per-instance compiled regex to zero-sized type with sharedLazyLock<Regex>eliminates per-request regex compilationrestrict_accept_encodingprevents the origin from responding with encodings the rewrite pipeline can't handle
restrict_accept_encoding was unconditionally setting Accept-Encoding: gzip, deflate, br on every proxied request, even when the client sent no preference. This caused compression-capable origins (Next.js) to respond with gzip, which the pipeline preserved in Content-Encoding. Test clients without a decompression feature then received raw gzip bytes instead of HTML, breaking the HtmlInjection integration test. Only restrict the header when the client already sent one.
After restrict_accept_encoding was fixed to early-return when no Accept-Encoding header is present, SUPPORTED_ENCODINGS became dead code since it was only used in the None fallback path. Inline the literal in the unit test and drop the constant.
aram356
left a comment
There was a problem hiding this comment.
Summary
This branch hardens config-derived regex compilation and integration startup error handling. The core change converts panic-prone expect() on config-derived regex into fallible Result returns, and converts every integration register/register_providers function from silently swallowing errors (log::error! + return None) to propagating them. It also migrates once_cell::sync::Lazy to std::sync::LazyLock, adds Accept-Encoding filtering for the publisher proxy, and refactors the Next.js UrlRewriter/RscUrlRewriter to separate config-time construction from request-time parameters.
Well-structured change overall with excellent test coverage of the enabled/disabled x valid/invalid config matrix.
Blocking
🔧 wrench
uncovered_admin_endpointssilently swallows regex errors:.unwrap_or(false)treats invalid regex as "no match" rather than surfacing the real error. Currently safe becauseprepare_runtime()catches it first, but fragile. (crates/common/src/settings.rs:537)- Performance regression in RSC URL rewriting: Global
RSC_URL_PATTERNnow matches every URL in the payload (not just origin URLs), then filters in the closure. For large RSC payloads with many non-origin URLs, this is slower than the previous per-request origin-specific regex. (crates/common/src/integrations/nextjs/shared.rs:23)
❓ question
lol_htmlversion downgrade: 2.7.2 -> 2.7.1 in workspaceCargo.toml— intentional or lockfile artifact? (Cargo.toml:72)
Non-blocking
🤔 thinking
RscUrlRewriteris now a zero-sized struct: All state was moved to function parameters. Free functions would be simpler unless config-time state is planned. (crates/common/src/integrations/nextjs/shared.rs:123)is_explicitly_disablededge case: Only fires for literal JSONfalse, not string"false". This is correct behavior but worth noting. (crates/common/src/settings.rs:139)
♻️ refactor
accept_encoding_qvaluereturns last match, not first: Most HTTP implementations use first-match semantics for duplicate tokens. (crates/common/src/publisher.rs:89)select_supported_accept_encoding(None)is dead code: The only caller returns early when header is absent. (crates/common/src/publisher.rs:37)
🌱 seedling
- Permissive host capture in
RSC_URL_PATTERN: The character class is broad. Safe today due tostrip_origin_host_with_optional_port, but watch for evolution. (crates/common/src/integrations/nextjs/shared.rs:25)
Make uncovered_admin_endpoints return Result so invalid handler regexes surface as errors rather than silently treating unmatched paths as uncovered. Also add prepare_runtime() to from_toml so invalid regexes are caught eagerly at parse time, closing the gap with from_toml_and_env. Fix the RSC URL rewriting performance regression by replacing the broad static RSC_URL_PATTERN with a per-request build_origin_url_pattern that embeds the escaped origin host directly, so only origin URLs are ever matched by replace_all. Fix accept_encoding_qvalue to use first-match semantics and remove the unreachable None branch from select_supported_accept_encoding. Restore lol_html to 2.7.2 — the 2.7.1 value was a merge artifact.
aram356
left a comment
There was a problem hiding this comment.
Summary
This PR systematically eliminates runtime panics from invalid config-derived regex patterns and silently-disabled integrations by compiling regexes eagerly at startup, making integration registration fallible, and short-circuiting validation for explicitly disabled configs. The Next.js rewriter refactor to separate config-time from request-time concerns is a solid architectural improvement. Two issues need addressing before merge.
Blocking
🔧 wrench
expect()in per-request code path:build_origin_url_pattern()uses.expect()insideRscUrlRewriter::rewrite(), which runs per-request. The origin host comes from config, making this the same class of bug the PR aims to fix. (crates/trusted-server-core/src/integrations/nextjs/shared.rs:112)once_cellversion relaxed without justification: Minimum version dropped from"1.21"to"1". Should be restored or the dependency removed if fully migrated toLazyLock. (Cargo.toml:78)
❓ question
- Redundant
prepare_runtime()insettings_data.rs:from_toml()already callsprepare_runtime(). The comment claims otherwise. Is this intentional defense-in-depth or a leftover? (crates/trusted-server-core/src/settings_data.rs:37)
Non-blocking
🤔 thinking
unwrap_or_else+if let Somein main.rs: ConflatesResultandOptioncontrol flow in one expression. A three-armmatchwould be clearer. (crates/trusted-server-adapter-fastly/src/main.rs:94)try_foldfor "any handler matches": Intent obscured by functional style. Aforloop with earlybreakwould be clearer and also short-circuit. (crates/trusted-server-core/src/settings.rs:541)
♻️ refactor
- Per-request regex compilation in
RscUrlRewriter::rewrite():build_origin_url_pattern()compiles a regex every request. Could be cached since origin host typically doesn't change. Not blocking due to thecontains()short-circuit. (crates/trusted-server-core/src/integrations/nextjs/shared.rs:159)
⛏ nitpick
- PR body checklist says "Uses
tracingmacros": This project uses thelogcrate, nottracing. Minor template issue.
CI Status
- fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-docs: PASS
- format-typescript: PASS
| /// that a broad static pattern would incur for every URL-like token in the payload. | ||
| fn build_origin_url_pattern(origin_host: &str) -> Regex { | ||
| let escaped = regex::escape(origin_host); | ||
| Regex::new(&format!( |
There was a problem hiding this comment.
🔧 wrench — expect() in per-request code path
build_origin_url_pattern() is called per-request inside RscUrlRewriter::rewrite() and uses .expect("should compile origin-specific RSC URL regex"). While the origin host is escaped via regex::escape(), the entire purpose of this PR is to eliminate panic-prone patterns from config-derived data. The origin host comes from config (origin_host in integration settings), making this the same class of bug the PR aims to fix.
Suggested fix: Return Result and propagate the error, or at minimum document why this particular expect() is safe (the regex::escape() guarantee) with a comment and a test using adversarial origin host inputs.
| lol_html = "2.7.2" | ||
| once_cell = "1.21" | ||
| matchit = "0.9" | ||
| once_cell = "1" |
There was a problem hiding this comment.
🔧 wrench — once_cell version relaxed from "1.21" to "1" without justification
The minimum version was dropped from 1.21 to 1, which allows any 1.x version. This is unrelated to the PR's purpose.
Suggested fix: Restore once_cell = "1.21" or remove the dependency entirely if all usages have been migrated to std::sync::LazyLock.
| message: "Failed to validate configuration".to_string(), | ||
| })?; | ||
|
|
||
| // `from_toml` only deserializes the embedded runtime TOML. Startup still |
There was a problem hiding this comment.
❓ question — Is this prepare_runtime() call intentionally redundant?
from_toml() already calls prepare_runtime() at settings.rs:408. This call is redundant (OnceLock makes it harmless but misleading). The comment says "from_toml only deserializes the embedded runtime TOML" — but that's not accurate since from_toml does call prepare_runtime(). Is this intentional belt-and-suspenders, or was it added during iteration and should be removed?
| // Keep this fallback so manually-constructed or otherwise unprepared | ||
| // settings still become an error response instead of panicking. | ||
| if let Some(mut response) = enforce_basic_auth(settings, &req).unwrap_or_else(|e| { | ||
| log::error!("Failed to evaluate basic auth: {:?}", e); |
There was a problem hiding this comment.
🤔 thinking — unwrap_or_else + if let Some conflates two levels of control flow
if let Some(mut response) = enforce_basic_auth(settings, &req).unwrap_or_else(|e| {
log::error!("Failed to evaluate basic auth: {:?}", e);
Some(to_error_response(&e))
}) {A match with three arms (Ok(Some(...)), Ok(None), Err(...)) would be clearer and more idiomatic for a request entry point:
match enforce_basic_auth(settings, &req) {
Ok(Some(mut response)) => {
finalize_response(settings, geo_info.as_ref(), &mut response);
return Ok(response);
}
Ok(None) => { /* continue */ }
Err(e) => {
log::error!("Failed to evaluate basic auth: {:?}", e);
let mut response = to_error_response(&e);
finalize_response(settings, geo_info.as_ref(), &mut response);
return Ok(response);
}
}| for &path in Self::ADMIN_ENDPOINTS { | ||
| let covered = self | ||
| .handlers | ||
| .iter() |
There was a problem hiding this comment.
🤔 thinking — try_fold obscures the intent "does any handler match?"
let covered = self.handlers.iter()
.try_fold(false, |acc, h| h.matches_path(path).map(|m| acc || m))?;A for loop with early return would be clearer and also short-circuit (avoiding regex evaluation on remaining handlers after a match):
let mut covered = false;
for h in &self.handlers {
if h.matches_path(path)? {
covered = true;
break;
}
}| let Some(host_suffix) = strip_origin_host_with_optional_port(host, origin_host) else { | ||
| return caps | ||
| .get(0) | ||
| .expect("should capture the matched RSC URL") |
There was a problem hiding this comment.
♻️ refactor — Per-request regex compilation could be cached
build_origin_url_pattern(origin_host) compiles a regex on every request. The comment argues compilation is cheap relative to replace_all, which is reasonable for now. But if the origin host is typically the same across requests (it comes from config), caching the compiled regex in the rewriter would be a natural optimization. Not blocking since the !input.contains(origin_host) short-circuit avoids compilation when there's nothing to rewrite.
Summary
TrustedServerErrorresponses.__NEXT_DATA__rewrites, and publisher fallback encoding behavior.Changes
.claude/agents/pr-creator.md.claude/agents/pr-reviewer.mdcrates/common/src/auction/mod.rscrates/common/src/auth.rscrates/common/src/integrations/adserver_mock.rscrates/common/src/integrations/aps.rscrates/common/src/integrations/datadome.rscrates/common/src/integrations/didomi.rscrates/common/src/integrations/google_tag_manager.rscrates/common/src/integrations/gpt.rscrates/common/src/integrations/lockr.rscrates/common/src/integrations/mod.rscrates/common/src/integrations/nextjs/html_post_process.rscrates/common/src/integrations/nextjs/mod.rs__NEXT_DATA__coverage.crates/common/src/integrations/nextjs/rsc.rscrates/common/src/integrations/nextjs/script_rewriter.rsexpect()regex compilation with fallible rewriters and add hostname, port, whitespace, and metacharacter regressions.crates/common/src/integrations/nextjs/shared.rscrates/common/src/integrations/permutive.rscrates/common/src/integrations/prebid.rsserver_url, and return fallible provider builders.crates/common/src/integrations/registry.rscrates/common/src/integrations/testlight.rscrates/common/src/publisher.rsAccept-Encodingto codecs the rewrite pipeline supports and add regression coverage.crates/common/src/settings.rsenabled = falseconfigs, and add startup hardening regressions.crates/common/src/settings_data.rscrates/fastly/src/main.rsHardening note
Invalid
handlers[].pathregexes now fail during startup preparation and still return descriptive configuration errors if request-time code encounters unprepared settings. Enabled integrations and auction providers now fail registry/orchestrator startup instead of logging and disabling themselves, while rawenabled = falseconfigs still short-circuit before validation. Regression coverage includes invalid handler TOML and env overrides, disabled-invalid config bypasses, enabled-invalid integration/provider startup failures, emptyprebid.server_url, Next.js__NEXT_DATA__and RSC hostname/port/metacharacter rewrites, and the publisher fallback encoding path.Closes
Closes #403
Test plan
cargo test --workspacecargo clippy --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --bin trusted-server-fastly --release --target wasm32-wasip1fastly compute servenpx vitest runwas skipped per explicit user instruction because of the unrelated repo-wideERR_REQUIRE_ESMfailure inhtml-encoding-sniffer.Checklist
unwrap()in production code — useexpect(\"should ...\")tracingmacros (notprintln!)