Skip to content

Add exception metric emission filter#5590

Open
feordin wants to merge 7 commits into
mainfrom
feordin/skip-auth-failure-metrics
Open

Add exception metric emission filter#5590
feordin wants to merge 7 commits into
mainfrom
feordin/skip-auth-failure-metrics

Conversation

@feordin
Copy link
Copy Markdown
Contributor

@feordin feordin commented May 27, 2026

Summary

Adds a pluggable IExceptionMetricEmissionFilter interface and a default AuthenticationFailureExceptionMetricEmissionFilter that suppresses the fhir/failures/exceptions metric for authentication exceptions (SecurityTokenException or one in the inner-exception chain) that produced a 401 response.

This addresses the auth-failure flood incident that caused the shared Geneva metric account to throttle.

Extensibility for fhir-paas

The filter chain is designed so that fhir-paas can extend or override without forking fhir-server:

  • Add a new rule — register an additional IExceptionMetricEmissionFilter. The enricher AND-combines all registered filters; the default registration uses AddSingleton (not TryAddSingleton) so additions compose.
  • Extend what counts as an authentication exception — subclass AuthenticationFailureExceptionMetricEmissionFilter and override the protected virtual IsAuthenticationException method, then replace the registration with RemoveAll + AddSingleton.
  • Disable entirelyservices.RemoveAll<IExceptionMetricEmissionFilter>().

Documented in detail in ADR-2605.

Testing

dotnet test src/Microsoft.Health.Fhir.Api.UnitTests/Microsoft.Health.Fhir.R4.Api.UnitTests.csproj — 1315 passed, 0 failed (net8.0 and net9.0).

feordin and others added 4 commits May 26, 2026 10:02
Geneva-side filtering of auth-failure metrics is not supported, so we

must stop the flood at the source. ApiNotificationMiddleware now skips

publishing ApiResponseNotification for 401/403 responses, preventing

auth-failure bursts (e.g., expired-token retry storms) from overwhelming

the shared metric account.

ADR adr-2605 is rewritten to reflect this decision and document why the

earlier rate-limiting designs were superseded.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@feordin feordin requested a review from a team as a code owner May 27, 2026 17:24
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.36%. Comparing base (fdf6900) to head (077e3a8).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5590      +/-   ##
==========================================
+ Coverage   77.28%   77.36%   +0.08%     
==========================================
  Files         995      998       +3     
  Lines       36467    36553      +86     
  Branches     5520     5545      +25     
==========================================
+ Hits        28184    28280      +96     
+ Misses       6924     6902      -22     
- Partials     1359     1371      +12     

see 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Reverts the broad 401/403 suppression in ApiNotificationMiddleware and

replaces it with a targeted, pluggable IExceptionMetricEmissionFilter

applied at the AzureMonitorOpenTelemetryLogEnricher chokepoint where

the fhir/failures/exceptions metric is actually emitted.

The default filter (AuthenticationFailureExceptionMetricEmissionFilter)

suppresses emission only when both an auth exception (SecurityToken

Exception or one in the inner chain) AND a 401 response are present —

the exact signature of the incident. 403s, validation errors, and other

failures continue to emit metrics.

Downstream consumers (fhir-paas) can compose additional filters via DI,

subclass the default filter (IsAuthenticationException is virtual), or

replace it entirely with RemoveAll. ADR-2605 rewritten to capture the

new design and the extension patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@feordin feordin changed the title Suppress 401/403 from FHIR error metric at emission point Add pluggable exception metric emission filter to suppress auth-failure floods May 27, 2026
Comment on lines +89 to +95
foreach (var filter in _exceptionMetricEmissionFilters)
{
if (!filter.ShouldEmit(data.Exception, httpContext))
{
return;
}
}
… docs

Auth filter: The existing SecurityTokenException inner-chain match

already covers SecurityTokenExpiredException, SecurityTokenInvalid

AudienceException, and SecurityTokenInvalidIssuerException (all derive

from SecurityTokenException). XML doc updated to call this out.

New SecurityAbuseExceptionMetricEmissionFilter suppresses metric

emission for ServerSideRequestForgeryException (defined in this repo).

Match is exception-type-only — no status-code condition — because these

exceptions are constructed only inside FHIR's abuse-rejection paths and

always produce 4xx responses.

Downstream consumers add paas-only exception types (AntiSSRFException,

S2SAuthenticationException) by subclassing the relevant default filter

and overriding the protected virtual Is...Exception method, then

re-registering via RemoveAll + AddSingleton. Documented in the ADR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@feordin feordin changed the title Add pluggable exception metric emission filter to suppress auth-failure floods Add exception metric emission filter May 27, 2026
@feordin feordin added this to the FY26\Q4\2Wk\2Wk24 milestone May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants