Skip to content

Implement find() wait and RECONNECT structure event#24

Open
stefanrammo wants to merge 1 commit intomainfrom
impl/add-find-wait-and-reconnect-event
Open

Implement find() wait and RECONNECT structure event#24
stefanrammo wants to merge 1 commit intomainfrom
impl/add-find-wait-and-reconnect-event

Conversation

@stefanrammo
Copy link
Copy Markdown
Collaborator

find() now waits for the target app to appear instead of failing immediately. subscribeToStructure fires RECONNECT (2) when an app reappears after being removed, distinguishing restarts from first-time appearances.

Changes:

  • find() waits indefinitely by default; { timeout: N } limits the wait; { timeout: 0 } preserves the old immediate-fail behavior
  • find() works without prior root() call (triggers connection internally)
  • close() is terminal: sets isClosed flag, rejects pending find() waiters, prevents reconnection after close
  • RECONNECT constant (2) added alongside ADD (1) and REMOVE (0)
  • Per-app lifecycle via announceApp/unannounceApp: each app is tracked independently, fixing the bug where one sibling disconnecting lost all sibling connections
  • Subscription keepalive via inactivityResendInterval field (client requests server resend values/events after 120s of no changes, matching C++ client ServicesProtocol::InactivityResendIntervalSec)
  • Value dedup via lastServerTimestamp and event dedup via recentEventIds to handle server replay after reconnection
  • Exponential backoff with jitter for reconnection (1s to 30s max)
  • Stall detection: force reconnect after 150s of server silence with active subscriptions (must exceed 120s keepalive interval)
  • Direct mode sibling discovery uses server-pushed eStructureChangeResponse instead of client-side polling
  • CDP_FORCE_DIRECT_MODE=1 env var to force direct mode on proxy-capable servers (for testing)
  • Cache key fix: use join('.') instead of toString() for correct invalidateApp prefix matching on app structure changes
  • Remove unused serviceId storage on proxy connections
  • Fix instanceId serialization: pass through as-is instead of defaulting falsy values to 0
  • appAddress() helper for DRY server address construction
  • README updated with Structure Events section and find() options
  • Version bump to 3.0.0 (breaking: find() default changed from immediate fail to indefinite wait)

CDP-6069

@stefanrammo stefanrammo force-pushed the impl/add-find-wait-and-reconnect-event branch 2 times, most recently from 0d5d0ef to 53d5a07 Compare March 24, 2026 14:04
@stefanrammo stefanrammo requested review from Karmo7 and nuubik March 24, 2026 14:37

- ``studio.api.structure.ADD`` (1) — An application appeared for the first time
- ``studio.api.structure.REMOVE`` (0) — An application went offline
- ``studio.api.structure.RECONNECT`` (2) — An application restarted (was seen before, went offline, came back)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc makes it look like subscribeToStructure is only meant to detect application starts and stops but this is only one of the use cases. It can also detect when nodes (e.g. CDPOperators) are added and deleted during runtime

README.rst Outdated
path - Path of the object to look for.
path - Dot-separated path to target node (e.g. ``'App2.CPULoad'``).

options - Optional object. ``{ timeout: milliseconds }`` to limit wait time.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd mention right here that default is infinite wait

README.rst Outdated
- Behavior

The requested node must reside in the application client was connected to.
Waits indefinitely for the target application to appear. If the application is already
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add that it only applies when timeout arg is missing

if (nodeTimestamp !== undefined) {
var ts = Number(nodeTimestamp);
if (ts > 0) {
if (lastServerTimestamp !== null && ts < lastServerTimestamp)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that CDP is multi-threaded, is it even guaranteed that values from different components with different scheduling groups, arrive with increasing timestamps? Does @nuubik or @martlaak know? Otherwise, this code might miss value changes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to prevent duplicate notification, maybe could just check if the received value equals node's current value? Unlike events, there is no server-side history for value changes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplication is per node. Added component tests to verify: two sines in different scheduling group threads and a concurrent setValue on a routed signal.
Tested value equality and had many test failures since the server sends values at the subscribed sampling rate regardless of value changes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Tested value equality and had many test failures since the server sends values at the subscribed sampling rate regardless of value changes." - if this is true, it sounds like a server-side bug. StudioAPI protocol should be optimized to only send values when they actually change. Maybe you should report an issue to Jira.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, not a server bug. setValue sets lastValue locally before the server confirms, and cached nodes retain it across re-subscribe. Value equality would suppress the initial delivery when it matches. Timestamp-based handling avoids that. The test failures were from this, not from the server sending unchanged values periodically.

// Trim when map grows large (30s sliding window in nanoseconds)
if (recentEventIds.size > 200 && ts > 0) {
var cutoff = ts - 30e9;
recentEventIds.forEach(function(storedTs, evId) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a potential performance issue if for every event, it will iterate over all events in the map. Should try the code against a system with 100 000 events.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, the C++ client had several iterations to get this right and eventually settled on a std::multimap which is a tree-based structure that keeps its items sorted by timestamp and that allows it to do binary search (with equal_range and lower_bound) which is more efficient than iterating with forEach. See https://bitbucket.seaonics.com/projects/CS/repos/studioapiclientlib/browse/src/studioapi/cache/eventduplicatechecker.cpp#27,43

var hasNotifiedDisconnect = false; // Guard for onDisconnected lifecycle callback
var hasConnectedBefore = false; // Distinguishes initial connect from reconnect
var SERVICES_TIMEOUT_MS = (INACTIVITY_RESEND_INTERVAL_S + 30) * 1000;
var reconnectDelayMs = INITIAL_RECONNECT_DELAY_MS;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are some variables UPPER_CASE and some camelCase? Should check which style is more common in JS and use that.


// Stall detection: force-close socket if no server messages for STALL_TIMEOUT_MS
// while there are active subscriptions expecting data. Without subscriptions,
// silence is expected and not a stall.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ client, there is always timesync protocol running, so there is always someone expecting data and keeping the connection alive.

index.js Outdated

// Stall detection runs only on the primary connection. In direct mode, sibling
// stalls are detected via REMOVE events when the primary's structure updates.
// In proxy mode, sibling traffic flows through the primary, so one timer covers all.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should still detect stalls to sibling connections, I think. The primary does not verify the messages it proxies actually reach the siblings. So, in case of networking issues, some connections might still stall.

In both direct and proxy mode, the primary will notify of sibling app start/stop via structure invalidation on root node (structure_change_response). And in both modes, connections can stall if there are networking issues. The only difference is that in proxy mode there will be a services_notification in addition to structure_change_response

index.js Outdated
studio.api = (function(internal) {
var obj = {};
// No default timeout — find() waits indefinitely for the app to appear.
// Use { timeout: 30000 } for explicit timeout, { timeout: 0 } for immediate fail.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that comment in this place here?

index.js Outdated
// In direct mode, reject if the app was previously connected but is now
// disconnected. This prevents returning stale nodes whose connection is down.
// In proxy mode, the node tree is maintained by the primary connection so
// stale access is not an issue — the primary connection handles structure.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any difference between direct and proxy connections here. Proxy is just a websocket proxy, it doesn't maintain anything in StudioAPI protocol level.

find() now waits for the target app to appear instead of failing
immediately. subscribeToStructure fires RECONNECT (2) when an app
reappears after being removed, distinguishing restarts from first-time
appearances.

Changes:
- find() waits indefinitely by default; { timeout: N } limits the wait;
  { timeout: 0 } preserves the old immediate-fail behavior
- find() works without prior root() call (triggers connection internally)
- close() is terminal: sets isClosed flag, rejects pending find() waiters,
  prevents reconnection after close
- RECONNECT constant (2) added alongside ADD (1) and REMOVE (0)
- Per-app lifecycle via announceApp/unannounceApp: each app is tracked
  independently, fixing the bug where one sibling disconnecting lost
  all sibling connections
- Subscription keepalive via inactivityResendInterval field (client
  requests server resend values/events after 120s of no changes,
  matching C++ client ServicesProtocol::InactivityResendIntervalSec)
- Value dedup via lastServerTimestamp and event dedup via recentEventIds
  to handle server replay after reconnection
- Exponential backoff with jitter for reconnection (1s to 30s max)
- Stall detection: force reconnect after 150s of server silence with
  active subscriptions (must exceed 120s keepalive interval)
- Direct mode sibling discovery uses server-pushed
  eStructureChangeResponse instead of client-side polling
- CDP_FORCE_DIRECT_MODE=1 env var to force direct mode on proxy-capable
  servers (for testing)
- Cache key fix: use join('.') instead of toString() for correct
  invalidateApp prefix matching on app structure changes
- Remove unused serviceId storage on proxy connections
- Fix instanceId serialization: pass through as-is instead of defaulting
  falsy values to 0
- appAddress() helper for DRY server address construction
- README updated with Structure Events section and find() options
- Version bump to 3.0.0 (breaking: find() default changed from
  immediate fail to indefinite wait)

CDP-6069
@stefanrammo stefanrammo force-pushed the impl/add-find-wait-and-reconnect-event branch from 53d5a07 to fd47e9f Compare March 30, 2026 12:54
// Trim when map grows large (30s sliding window in nanoseconds)
if (recentEventIds.size > 200 && ts > 0) {
var cutoff = ts - 30e9;
recentEventIds.forEach(function(storedTs, evId) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, the C++ client had several iterations to get this right and eventually settled on a std::multimap which is a tree-based structure that keeps its items sorted by timestamp and that allows it to do binary search (with equal_range and lower_bound) which is more efficient than iterating with forEach. See https://bitbucket.seaonics.com/projects/CS/repos/studioapiclientlib/browse/src/studioapi/cache/eventduplicatechecker.cpp#27,43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants