fix: set connection401Error before invoking resultCallback in onError#332
fix: set connection401Error before invoking resultCallback in onError#332kinyoklion wants to merge 7 commits intomainfrom
Conversation
…art test The startWithHttp401PreventsSubsequentStart test is flaky due to a race condition. In StreamingDataSource.onError(), the production code calls resultCallback.onError() (which unblocks the test's callback.awaitError()) before setting connection401Error and calling dataSourceUpdateSink.shutDown(). The test thread can resume and call the second sds.start() before connection401Error is set, causing the second start to proceed and produce a callback. This adds a short polling loop (up to 1 second, checking every 10ms) to wait for shutDownCalled to become true before testing the second start, matching the pattern used in the startWithHttp401ShutsDownSink fix. Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
| while (!dataSourceUpdateSink.shutDownCalled && System.currentTimeMillis() < deadline) { | ||
| Thread.sleep(10); | ||
| } | ||
| assertTrue(dataSourceUpdateSink.shutDownCalled); |
There was a problem hiding this comment.
Is dataSourceUpdateSink.shutDownCalled thread-safe?
There was a problem hiding this comment.
Yes — it's declared as volatile in MockComponents.java line 141:
public volatile boolean shutDownCalled = false;The volatile guarantees visibility across threads, so the polling loop in the test thread will see the write from the background thread without any additional synchronization. This is the same field and pattern used by the sibling test startWithHttp401ShutsDownSink (from #329).
There was a problem hiding this comment.
Is this a bug we should fix instead of a test change? Should connection401Error be set before the callback is invoked?
There was a problem hiding this comment.
Good call — I've updated the PR to fix this in the production code instead. Moved connection401Error = true and dataSourceUpdateSink.shutDown() before resultCallback.onError() in StreamingDataSource.onError(), so the 401 state is fully established before the callback unblocks any waiting threads.
This also let me revert the polling workarounds from both this test and the one from #329 (startWithHttp401ShutsDownSink), since they're no longer needed.
Move connection401Error and shutDown() before resultCallback.onError() so that the 401 state is fully established before the callback unblocks any waiting threads. This eliminates the race condition at the source rather than working around it in tests. Also reverts the polling workarounds from both startWithHttp401ShutsDownSink (#329) and startWithHttp401PreventsSubsequentStart since they are no longer needed. Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
…ming-test' into devin/1773960729-fix-flaky-streaming-test
|
Check for any other side effects from this change. For example any problems or changes in event ordering from calling shutdown on the event source before emitting the error. |
|
I traced through the production call chain to check for side effects. Here's the analysis: Original order (before this PR):
New order (this PR):
Differences:
Assessment: The behavioral change is the status listener notification ordering. If any consumer relies on seeing the error notification before the SHUTDOWN notification, this could be a breaking change. However, the new order arguably makes more sense — the system transitions to SHUTDOWN state first, then reports why. Let me know if you'd like me to preserve the original listener notification ordering, or if you see any other concerns I missed. |
| @@ -133,11 +133,11 @@ public void onError(Throwable t) { | |||
| if (code >= 400 && code < 500) { | |||
| logger.error("Encountered non-retriable error: {}. Aborting connection to stream. Verify correct Mobile Key and Stream URI", code); | |||
| running = false; | |||
There was a problem hiding this comment.
I think we should keep the original callback order, but we should set connection401Error before the callbacks.
There was a problem hiding this comment.
Updated — connection401Error is now set before the callback, while dataSourceUpdateSink.shutDown() remains after the callback to preserve the original status listener notification ordering.
The startWithHttp401ShutsDownSink test retains the polling workaround for shutDownCalled since shutDown() still executes after the callback returns. The startWithHttp401PreventsSubsequentStart test no longer needs any workaround since connection401Error is set before the callback unblocks.
…rdering Set connection401Error = true before resultCallback.onError() so the flag is visible when the callback unblocks waiting threads. Keep dataSourceUpdateSink.shutDown() after the callback to preserve the original status listener notification ordering. The startWithHttp401ShutsDownSink test retains its polling workaround since shutDown() still executes after the callback returns. Co-Authored-By: rlamb@launchdarkly.com <kingdewman@gmail.com>
Requirements
Related issues
Supersedes #329, which worked around the same race condition in the
startWithHttp401ShutsDownSinktest with a polling loop.Describe the solution you've provided
In
StreamingDataSource.onError(),resultCallback.onError()was called beforeconnection401Error = true. This meant that any thread unblocked by the callback could observe stale state — e.g.start()would not seeconnection401Errorand would proceed with a second connection attempt.The fix sets
connection401Error = truebeforeresultCallback.onError()is invoked, while keepingdataSourceUpdateSink.shutDown()after the callback to preserve the original status listener notification ordering. This splits the originalif (code == 401)block into two: one before the callback (to set the flag) and one after (to trigger shutdown).The
startWithHttp401ShutsDownSinktest retains its polling workaround forshutDownCalledsinceshutDown()still executes after the callback. ThestartWithHttp401PreventsSubsequentStarttest no longer needs any workaround sinceconnection401Erroris set before the callback unblocks.Describe alternatives you've considered
connection401ErrorandshutDown()before the callback (earlier revision of this PR): This eliminated the race entirely but changed the status listener notification ordering — listeners would see SHUTDOWN before the error, rather than the original error-then-SHUTDOWN sequence.awaitError(). This worked but masked the underlying ordering issue in production code.Additional context
connection401Erroris not declaredvolatile(unlikerunning). The current fix is safe becauseonErrorruns on a single background thread andstart()is called from the test thread only afterawaitError()returns — which provides a happens-before relationship via theBlockingQueueinsideTrackingCallback. However, a reviewer may want to confirm whethervolatileshould be added for defensive safety.if (code == 401)condition is now checked twice (before and after the callback). This is intentional to keep the flag-setting and shutdown as separate concerns with the callback in between.Human review checklist
if (code == 401)blocks are correct — flag set before callback, shutdown afterresultCallback.onError()fired beforeconnection401Errorwas setconnection401Errorshould be declaredvolatilefor consistency withrunningLink to Devin session: https://app.devin.ai/sessions/e45834672c2345108fb56b5fd044a400
Requested by: @kinyoklion