Skip to content

Fix transaction bundle returning HTTP 408 instead of 4xx for validation errors#5582

Open
rbans96 wants to merge 1 commit into
mainfrom
personal/ribans/fix-transaction-408-status-code
Open

Fix transaction bundle returning HTTP 408 instead of 4xx for validation errors#5582
rbans96 wants to merge 1 commit into
mainfrom
personal/ribans/fix-transaction-408-status-code

Conversation

@rbans96
Copy link
Copy Markdown
Contributor

@rbans96 rbans96 commented May 22, 2026

Transaction bundle returns HTTP 408 instead of 400 when entry fails validation during parallel execution. In TransactionExceptionHandler.ThrowTransactionException, when isCancelled is true, the code unconditionally throws FhirTransactionCancelledException (hardcoded to 408), ignoring the actual entry error status. During parallel bundle execution, when one entry fails validation, the catch handler cancels the linked CancellationTokenSource. Other entries with validation errors that complete after the token is cancelled see IsBundleCancelledByClient() == true and get misclassified as client cancellations (408) instead of validation failures (400). Since await Task.WhenAll propagates the first faulted task by array position (not chronological order), the 408 is returned to the caller. Callers correctly treat 408 as transient/retryable per HTTP semantics, causing infinite retry loops and massive request amplification (6-12x) for permanently failing bundles.

Fix: When an entry has a concrete client error (4xx, excluding 408), that status code takes priority over the cancellation signal. HTTP 408 is now reserved for genuine timeout/cancellation scenarios only. This aligns with the FHIR specification: 'return an HTTP 400 or 500 type response' for failed transactions.

Addresses : https://microsofthealth.visualstudio.com/Health/_workitems/edit/192974

@rbans96 rbans96 requested a review from a team as a code owner May 22, 2026 17:36
@rbans96 rbans96 added Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs No-ADR ADR not needed Bug Bug bug bug. labels May 22, 2026
@rbans96 rbans96 added this to the CY25Q3/2Wk07 milestone May 22, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.33%. Comparing base (ef9bf03) to head (c9eb4cc).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5582      +/-   ##
==========================================
- Coverage   77.64%   77.33%   -0.31%     
==========================================
  Files         993      996       +3     
  Lines       36438    36535      +97     
  Branches     5517     5539      +22     
==========================================
- Hits        28291    28255      -36     
- Misses       6789     6911     +122     
- Partials     1358     1369      +11     

see 16 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

// by a previous entry's failure in parallel execution, not by the client disconnecting.
// Returning 408 for a validation error (e.g., 400) misleads callers into retrying
// a permanently failing request.
bool isClientError = statusCode >= HttpStatusCode.BadRequest
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the suggestion is to use a flag to detect when a FhirTransactionFailedException happened due a customer caused issue. And then when detecting if the error was caused by a client cancellation, then mark it as a customer issue.

That should remove the need to compare the precedence between BadRequest, InternalServerError and RequestTimeout.

Image

// Even though the cancellation token is triggered, the 4xx status from the entry
// takes priority over 408 to avoid misleading callers into retrying permanently failing requests.
FhirTransactionFailedException fhirTfe = await Assert.ThrowsAsync<FhirTransactionFailedException>(async () => await _bundleHandler.Handle(bundleRequest, cancellationToken));
Assert.True(fhirTfe.ResponseStatusCode == System.Net.HttpStatusCode.NotFound);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not expecting this operation to return HTTP404. If that's the case, then we need to change this test to proper submit an Observation that would be ingested if the cancellation did not happen.

@rbans96 rbans96 force-pushed the personal/ribans/fix-transaction-408-status-code branch from bc3b76c to c9eb4cc Compare May 26, 2026 22:21
operationOutcome,
isCancelled: true));

Assert.Equal(HttpStatusCode.RequestTimeout, tce.ResponseStatusCode);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the impression that the results should be different than Timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs Bug Bug bug bug. No-ADR ADR not needed No-PaaS-breaking-change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants