Skip to content

fix: set connection401Error before invoking resultCallback in onError#332

Draft
kinyoklion wants to merge 7 commits intomainfrom
devin/1773960729-fix-flaky-streaming-test
Draft

fix: set connection401Error before invoking resultCallback in onError#332
kinyoklion wants to merge 7 commits intomainfrom
devin/1773960729-fix-flaky-streaming-test

Conversation

@kinyoklion
Copy link
Member

@kinyoklion kinyoklion commented Mar 19, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

Supersedes #329, which worked around the same race condition in the startWithHttp401ShutsDownSink test with a polling loop.

Describe the solution you've provided

In StreamingDataSource.onError(), resultCallback.onError() was called before connection401Error = true. This meant that any thread unblocked by the callback could observe stale state — e.g. start() would not see connection401Error and would proceed with a second connection attempt.

The fix sets connection401Error = true before resultCallback.onError() is invoked, while keeping dataSourceUpdateSink.shutDown() after the callback to preserve the original status listener notification ordering. This splits the original if (code == 401) block into two: one before the callback (to set the flag) and one after (to trigger shutdown).

The startWithHttp401ShutsDownSink test retains its polling workaround for shutDownCalled since shutDown() still executes after the callback. The startWithHttp401PreventsSubsequentStart test no longer needs any workaround since connection401Error is set before the callback unblocks.

Describe alternatives you've considered

  1. Move both connection401Error and shutDown() before the callback (earlier revision of this PR): This eliminated the race entirely but changed the status listener notification ordering — listeners would see SHUTDOWN before the error, rather than the original error-then-SHUTDOWN sequence.
  2. Polling workarounds in tests only (test: poll for shutDownCalled in startWithHttp401ShutsDownSink test (SDK-2028) #329 and earliest revision of this PR): Added polling loops to wait for state changes after awaitError(). This worked but masked the underlying ordering issue in production code.

Additional context

  • connection401Error is not declared volatile (unlike running). The current fix is safe because onError runs on a single background thread and start() is called from the test thread only after awaitError() returns — which provides a happens-before relationship via the BlockingQueue inside TrackingCallback. However, a reviewer may want to confirm whether volatile should be added for defensive safety.
  • The if (code == 401) condition is now checked twice (before and after the callback). This is intentional to keep the flag-setting and shutdown as separate concerns with the callback in between.

Human review checklist

  • Verify the split if (code == 401) blocks are correct — flag set before callback, shutdown after
  • Verify no production callers depend on the previous ordering where resultCallback.onError() fired before connection401Error was set
  • Consider whether connection401Error should be declared volatile for consistency with running

Link to Devin session: https://app.devin.ai/sessions/e45834672c2345108fb56b5fd044a400
Requested by: @kinyoklion


Open with Devin

…art test

The startWithHttp401PreventsSubsequentStart test is flaky due to a race
condition. In StreamingDataSource.onError(), the production code calls
resultCallback.onError() (which unblocks the test's callback.awaitError())
before setting connection401Error and calling dataSourceUpdateSink.shutDown().
The test thread can resume and call the second sds.start() before
connection401Error is set, causing the second start to proceed and produce
a callback.

This adds a short polling loop (up to 1 second, checking every 10ms) to
wait for shutDownCalled to become true before testing the second start,
matching the pattern used in the startWithHttp401ShutsDownSink fix.

Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@devin-ai-integration devin-ai-integration bot added the devin-pr PRs created by Devin label Mar 19, 2026
devin-ai-integration bot and others added 2 commits March 19, 2026 23:03
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

while (!dataSourceUpdateSink.shutDownCalled && System.currentTimeMillis() < deadline) {
Thread.sleep(10);
}
assertTrue(dataSourceUpdateSink.shutDownCalled);
Copy link
Member Author

@kinyoklion kinyoklion Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dataSourceUpdateSink.shutDownCalled thread-safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — it's declared as volatile in MockComponents.java line 141:

public volatile boolean shutDownCalled = false;

The volatile guarantees visibility across threads, so the polling loop in the test thread will see the write from the background thread without any additional synchronization. This is the same field and pattern used by the sibling test startWithHttp401ShutsDownSink (from #329).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bug we should fix instead of a test change? Should connection401Error be set before the callback is invoked?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — I've updated the PR to fix this in the production code instead. Moved connection401Error = true and dataSourceUpdateSink.shutDown() before resultCallback.onError() in StreamingDataSource.onError(), so the 401 state is fully established before the callback unblocks any waiting threads.

This also let me revert the polling workarounds from both this test and the one from #329 (startWithHttp401ShutsDownSink), since they're no longer needed.

devin-ai-integration bot and others added 2 commits March 24, 2026 22:41
Move connection401Error and shutDown() before resultCallback.onError()
so that the 401 state is fully established before the callback unblocks
any waiting threads. This eliminates the race condition at the source
rather than working around it in tests.

Also reverts the polling workarounds from both startWithHttp401ShutsDownSink
(#329) and startWithHttp401PreventsSubsequentStart since they are no longer
needed.

Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
…ming-test' into devin/1773960729-fix-flaky-streaming-test
@devin-ai-integration devin-ai-integration bot changed the title test: poll for shutDownCalled in startWithHttp401PreventsSubsequentStart test fix: set connection401Error before invoking resultCallback in onError Mar 24, 2026
@kinyoklion
Copy link
Member Author

Check for any other side effects from this change. For example any problems or changes in event ordering from calling shutdown on the event source before emitting the error.

@devin-ai-integration
Copy link
Contributor

I traced through the production call chain to check for side effects. Here's the analysis:

Original order (before this PR):

  1. resultCallback.onError()ConnectivityManager callback → updateConnectionInfoForError(currentMode, error) → notifies status listeners with error + current mode (e.g. STREAMING)
  2. dataSourceUpdateSink.shutDown()ConnectivityManager.shutDown() → sets closed=true, stops data source, removes platform listeners → setStatus(SHUTDOWN, null)updateConnectionInfoForSuccess(SHUTDOWN) → notifies status listeners with SHUTDOWN mode

New order (this PR):

  1. dataSourceUpdateSink.shutDown() → same as above: sets closed=true, stops data source, removes platform listeners → notifies status listeners with SHUTDOWN mode
  2. resultCallback.onError()updateConnectionInfoForError(connectionInformation.getConnectionMode(), error) → reads mode as SHUTDOWN (already set by step 1) → notifies status listeners with error + SHUTDOWN mode

Differences:

  • Status listener event ordering changes: Originally listeners see error-with-STREAMING first, then SHUTDOWN. Now they see SHUTDOWN first, then error-with-SHUTDOWN. The final connectionInformation state is equivalent either way (mode=SHUTDOWN, lastFailure set).
  • ConnectivityManager.shutDown() runs earlier: This calls oldDataSource.stop() (spawns a thread to close the EventSource) and removes foreground/connectivity listeners. Since we're still on the SSE handler thread, this is safe — the callback doesn't depend on those listeners or the EventSource being open. The subsequent stop(null) at line 141 also spawns a close thread, but stopSync() is synchronized and nulls es, so the second call is a no-op.
  • closed is set before callback fires: After shutDown(), ConnectivityManager.closed=true, so updateDataSource() and startUp() would return early. But the callback only calls updateConnectionInfoForError() and onCompletion.onSuccess(), neither of which check closed.

Assessment: The behavioral change is the status listener notification ordering. If any consumer relies on seeing the error notification before the SHUTDOWN notification, this could be a breaking change. However, the new order arguably makes more sense — the system transitions to SHUTDOWN state first, then reports why.

Let me know if you'd like me to preserve the original listener notification ordering, or if you see any other concerns I missed.

@@ -133,11 +133,11 @@ public void onError(Throwable t) {
if (code >= 400 && code < 500) {
logger.error("Encountered non-retriable error: {}. Aborting connection to stream. Verify correct Mobile Key and Stream URI", code);
running = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep the original callback order, but we should set connection401Error before the callbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated — connection401Error is now set before the callback, while dataSourceUpdateSink.shutDown() remains after the callback to preserve the original status listener notification ordering.

The startWithHttp401ShutsDownSink test retains the polling workaround for shutDownCalled since shutDown() still executes after the callback returns. The startWithHttp401PreventsSubsequentStart test no longer needs any workaround since connection401Error is set before the callback unblocks.

…rdering

Set connection401Error = true before resultCallback.onError() so the
flag is visible when the callback unblocks waiting threads. Keep
dataSourceUpdateSink.shutDown() after the callback to preserve the
original status listener notification ordering.

The startWithHttp401ShutsDownSink test retains its polling workaround
since shutDown() still executes after the callback returns.

Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devin-pr PRs created by Devin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant