Implement find() wait and RECONNECT structure event#24
Implement find() wait and RECONNECT structure event#24stefanrammo wants to merge 1 commit intomainfrom
Conversation
0d5d0ef to
53d5a07
Compare
|
|
||
| - ``studio.api.structure.ADD`` (1) — An application appeared for the first time | ||
| - ``studio.api.structure.REMOVE`` (0) — An application went offline | ||
| - ``studio.api.structure.RECONNECT`` (2) — An application restarted (was seen before, went offline, came back) |
There was a problem hiding this comment.
This doc makes it look like subscribeToStructure is only meant to detect application starts and stops but this is only one of the use cases. It can also detect when nodes (e.g. CDPOperators) are added and deleted during runtime
README.rst
Outdated
| path - Path of the object to look for. | ||
| path - Dot-separated path to target node (e.g. ``'App2.CPULoad'``). | ||
|
|
||
| options - Optional object. ``{ timeout: milliseconds }`` to limit wait time. |
There was a problem hiding this comment.
I'd mention right here that default is infinite wait
README.rst
Outdated
| - Behavior | ||
|
|
||
| The requested node must reside in the application client was connected to. | ||
| Waits indefinitely for the target application to appear. If the application is already |
There was a problem hiding this comment.
Should add that it only applies when timeout arg is missing
| if (nodeTimestamp !== undefined) { | ||
| var ts = Number(nodeTimestamp); | ||
| if (ts > 0) { | ||
| if (lastServerTimestamp !== null && ts < lastServerTimestamp) |
There was a problem hiding this comment.
Also, to prevent duplicate notification, maybe could just check if the received value equals node's current value? Unlike events, there is no server-side history for value changes.
There was a problem hiding this comment.
Deduplication is per node. Added component tests to verify: two sines in different scheduling group threads and a concurrent setValue on a routed signal.
Tested value equality and had many test failures since the server sends values at the subscribed sampling rate regardless of value changes.
There was a problem hiding this comment.
"Tested value equality and had many test failures since the server sends values at the subscribed sampling rate regardless of value changes." - if this is true, it sounds like a server-side bug. StudioAPI protocol should be optimized to only send values when they actually change. Maybe you should report an issue to Jira.
There was a problem hiding this comment.
My mistake, not a server bug. setValue sets lastValue locally before the server confirms, and cached nodes retain it across re-subscribe. Value equality would suppress the initial delivery when it matches. Timestamp-based handling avoids that. The test failures were from this, not from the server sending unchanged values periodically.
| // Trim when map grows large (30s sliding window in nanoseconds) | ||
| if (recentEventIds.size > 200 && ts > 0) { | ||
| var cutoff = ts - 30e9; | ||
| recentEventIds.forEach(function(storedTs, evId) { |
There was a problem hiding this comment.
This looks like a potential performance issue if for every event, it will iterate over all events in the map. Should try the code against a system with 100 000 events.
There was a problem hiding this comment.
Note, the C++ client had several iterations to get this right and eventually settled on a std::multimap which is a tree-based structure that keeps its items sorted by timestamp and that allows it to do binary search (with equal_range and lower_bound) which is more efficient than iterating with forEach. See https://bitbucket.seaonics.com/projects/CS/repos/studioapiclientlib/browse/src/studioapi/cache/eventduplicatechecker.cpp#27,43
| var hasNotifiedDisconnect = false; // Guard for onDisconnected lifecycle callback | ||
| var hasConnectedBefore = false; // Distinguishes initial connect from reconnect | ||
| var SERVICES_TIMEOUT_MS = (INACTIVITY_RESEND_INTERVAL_S + 30) * 1000; | ||
| var reconnectDelayMs = INITIAL_RECONNECT_DELAY_MS; |
There was a problem hiding this comment.
Why are some variables UPPER_CASE and some camelCase? Should check which style is more common in JS and use that.
|
|
||
| // Stall detection: force-close socket if no server messages for STALL_TIMEOUT_MS | ||
| // while there are active subscriptions expecting data. Without subscriptions, | ||
| // silence is expected and not a stall. |
There was a problem hiding this comment.
In C++ client, there is always timesync protocol running, so there is always someone expecting data and keeping the connection alive.
index.js
Outdated
|
|
||
| // Stall detection runs only on the primary connection. In direct mode, sibling | ||
| // stalls are detected via REMOVE events when the primary's structure updates. | ||
| // In proxy mode, sibling traffic flows through the primary, so one timer covers all. |
There was a problem hiding this comment.
Should still detect stalls to sibling connections, I think. The primary does not verify the messages it proxies actually reach the siblings. So, in case of networking issues, some connections might still stall.
In both direct and proxy mode, the primary will notify of sibling app start/stop via structure invalidation on root node (structure_change_response). And in both modes, connections can stall if there are networking issues. The only difference is that in proxy mode there will be a services_notification in addition to structure_change_response
index.js
Outdated
| studio.api = (function(internal) { | ||
| var obj = {}; | ||
| // No default timeout — find() waits indefinitely for the app to appear. | ||
| // Use { timeout: 30000 } for explicit timeout, { timeout: 0 } for immediate fail. |
There was a problem hiding this comment.
Why is that comment in this place here?
index.js
Outdated
| // In direct mode, reject if the app was previously connected but is now | ||
| // disconnected. This prevents returning stale nodes whose connection is down. | ||
| // In proxy mode, the node tree is maintained by the primary connection so | ||
| // stale access is not an issue — the primary connection handles structure. |
There was a problem hiding this comment.
I don't think there is any difference between direct and proxy connections here. Proxy is just a websocket proxy, it doesn't maintain anything in StudioAPI protocol level.
find() now waits for the target app to appear instead of failing
immediately. subscribeToStructure fires RECONNECT (2) when an app
reappears after being removed, distinguishing restarts from first-time
appearances.
Changes:
- find() waits indefinitely by default; { timeout: N } limits the wait;
{ timeout: 0 } preserves the old immediate-fail behavior
- find() works without prior root() call (triggers connection internally)
- close() is terminal: sets isClosed flag, rejects pending find() waiters,
prevents reconnection after close
- RECONNECT constant (2) added alongside ADD (1) and REMOVE (0)
- Per-app lifecycle via announceApp/unannounceApp: each app is tracked
independently, fixing the bug where one sibling disconnecting lost
all sibling connections
- Subscription keepalive via inactivityResendInterval field (client
requests server resend values/events after 120s of no changes,
matching C++ client ServicesProtocol::InactivityResendIntervalSec)
- Value dedup via lastServerTimestamp and event dedup via recentEventIds
to handle server replay after reconnection
- Exponential backoff with jitter for reconnection (1s to 30s max)
- Stall detection: force reconnect after 150s of server silence with
active subscriptions (must exceed 120s keepalive interval)
- Direct mode sibling discovery uses server-pushed
eStructureChangeResponse instead of client-side polling
- CDP_FORCE_DIRECT_MODE=1 env var to force direct mode on proxy-capable
servers (for testing)
- Cache key fix: use join('.') instead of toString() for correct
invalidateApp prefix matching on app structure changes
- Remove unused serviceId storage on proxy connections
- Fix instanceId serialization: pass through as-is instead of defaulting
falsy values to 0
- appAddress() helper for DRY server address construction
- README updated with Structure Events section and find() options
- Version bump to 3.0.0 (breaking: find() default changed from
immediate fail to indefinite wait)
CDP-6069
53d5a07 to
fd47e9f
Compare
| // Trim when map grows large (30s sliding window in nanoseconds) | ||
| if (recentEventIds.size > 200 && ts > 0) { | ||
| var cutoff = ts - 30e9; | ||
| recentEventIds.forEach(function(storedTs, evId) { |
There was a problem hiding this comment.
Note, the C++ client had several iterations to get this right and eventually settled on a std::multimap which is a tree-based structure that keeps its items sorted by timestamp and that allows it to do binary search (with equal_range and lower_bound) which is more efficient than iterating with forEach. See https://bitbucket.seaonics.com/projects/CS/repos/studioapiclientlib/browse/src/studioapi/cache/eventduplicatechecker.cpp#27,43
find() now waits for the target app to appear instead of failing immediately. subscribeToStructure fires RECONNECT (2) when an app reappears after being removed, distinguishing restarts from first-time appearances.
Changes:
CDP-6069