Skip to content

[SEA-NodeJS] Accept-and-ignore unsupported per-statement options; drive sync execute to terminal#419

Open
msrathore-db wants to merge 1 commit into
mainfrom
msrathore/sea-eager-execute-and-option-noops
Open

[SEA-NodeJS] Accept-and-ignore unsupported per-statement options; drive sync execute to terminal#419
msrathore-db wants to merge 1 commit into
mainfrom
msrathore/sea-eager-execute-and-option-noops

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

@msrathore-db msrathore-db commented Jun 5, 2026

Summary

Two correctness fixes to the SEA (kernel) backend, aligning it with the Thrift backend and the Python connector's kernel backend (KernelDatabricksClient). Only SeaSessionBackend changes; SeaOperationBackend is untouched.

1. Accept-and-ignore unsupported per-statement options (was: hard failure)

SeaSessionBackend.executeStatement threw on options the kernel does not expose at per-statement granularity, breaking callers that set them globally. They are now accepted and ignored (no-op), matching KernelDatabricksClient.execute_command, which takes the same flags and never reads them:

Option Why no-op
useCloudFetch CloudFetch is governed by the session-level kernel ResultConfig.cloudfetch_enabled; no per-statement override.
useLZ4Compression The kernel decodes whatever compression the server returns (manifest.result_compression); no request knob.
stagingAllowedLocalPath The kernel has no Volume (PUT/GET/REMOVE) API yet; non-staging queries that set it run normally.
queryTimeout No-op + TODO. It must not be mapped onto the SEA wait_timeout wire field (valid range {0} ∪ [5,50]s — a different concept from a statement timeout; out-of-range values fail HTTP 400). The correct mechanism is the STATEMENT_TIMEOUT session config.

2. Drive the default (sync) executeStatement to terminal

Previously the sync path executed lazily (only at first fetch), so a side-effecting statement run via executeStatement(...) then close() with no fetch was silently dropped, and dependent statements raced (SEA doesn't serialize statements per session). The sync path now awaits the statement to terminal in executeStatement — matching JDBC (DatabricksSdkClient polls to terminal in execute), ADBC C# (HiveServer2Statement polls to terminal), and Python use_kernel (blocking stmt.execute()). By the time it resolves the side effect is committed and the result is materialised, so:

  • fire-and-forget DDL/DML just works;
  • dependent statements (CREATE then INSERT then SELECT) just work;
  • close() is a trivial, non-blocking cleanup (no draining, no cancel-on-close).

Mid-run cancellation of a long query is done via runAsync: true (submit → cancellable handle → poll), mirroring JDBC's sync-vs-async split. This is the one deliberate behavior change: the default sync path no longer offers mid-run cancel (you couldn't hold the handle during a blocking execute anyway).

This needs no change to SeaOperationBackend — origin/main's close() (blockingStatement ? close : cancel) and status() already do the right thing once the statement is driven to terminal in execute.

Why this design (vs draining in close)

Every other Databricks driver puts the side-effect guarantee in execute and keeps close() a non-blocking cancel/cleanup — none block in close. Earlier iterations that drained in close (blocking) or made close non-blocking-but-non-draining (which broke dependent ordering) were rejected after testing. Option A matches the reference drivers and has no broken common case.

Performance

Common execute → fetch → close is at parity with Thrift — the latency just front-loads from fetch into execute; total is unchanged (SELECT 1 ≈ parity; 100k rows ≈ parity, verified with QRC disabled across multiple runs). No extra round-trips.

Testing

  • Unit: 249 SEA unit tests pass (no-op option tests; queryTimeout no-op; sync-drives-to-terminal; runAsync returns a pending cancellable handle).
  • E2E (SQL warehouse, Thrift vs SEA, repeated runs): fire-and-forget + dependent CREATE/INSERT/UPDATE/DELETE persist on SEA with correct affected-row counts; mid-run cancel via runAsync interrupts a running query; normal read path unchanged.

Follow-up (kernel)

A bounded-inline-wait execute (submit with wait_timeout≈10s returning a terminal-or-running handle — the JDBC/Python-native-SEA model) would let the default path stay non-blocking for fast statements and keep mid-run cancel for slow ones. A throwaway kernel patch confirmed fast statements come back terminal inline, but also that the current submit/close path (built for wait_timeout=0s) aborts an unconsumed terminal-from-submit statement — so it needs proper kernel handling (capture terminal/closed state + driver materialization), not a one-liner. Tracked as a separate kernel ask.

This pull request and its description were written by Isaac.

…ve sync execute to terminal

Two correctness fixes to the SEA (kernel) backend, aligning it with the Thrift
backend and the Python connector's kernel backend.

1) Accept-and-ignore unsupported per-statement options (was: hard failure).
   `SeaSessionBackend.executeStatement` threw on options the kernel does not
   expose at per-statement granularity, breaking callers that set them globally.
   `useCloudFetch`, `useLZ4Compression`, and `stagingAllowedLocalPath` are now
   accepted and ignored (no-op), matching `KernelDatabricksClient.execute_command`
   which takes the same flags and never reads them (CloudFetch is the session-level
   `ResultConfig`, compression is decoded from the result manifest, and the kernel
   has no Volume API yet). `queryTimeout` is also a no-op (+TODO): it must not be
   mapped onto the SEA `wait_timeout` wire field (valid range {0} ∪ [5,50]s — a
   different concept from a statement timeout; out-of-range values fail HTTP 400).
   The correct mechanism is the `STATEMENT_TIMEOUT` session config.

2) Drive the default (sync) executeStatement to terminal before returning.
   Previously the sync path executed lazily (only at first fetch), so a
   side-effecting statement run via `executeStatement(...)` then `close()` with no
   fetch was silently dropped, and dependent statements raced. The sync path now
   awaits the statement to terminal in `executeStatement` (matching JDBC, ADBC C#,
   and Python `use_kernel`), so by the time it resolves the side effect is
   committed and the result is materialised — fire-and-forget DDL/DML and dependent
   statements just work, and `close()` is a trivial non-blocking cleanup. Mid-run
   cancellation of a long-running query is done via `runAsync: true` (submit +
   poll → cancellable handle), mirroring JDBC's sync-vs-async split. This needs no
   change to `SeaOperationBackend`.

Verified: 249 SEA unit tests pass; e2e on a SQL warehouse confirms fire-and-forget
+ dependent CREATE/INSERT/UPDATE/DELETE persist on SEA, mid-run cancel via
runAsync works, and the common execute/fetch/close path is at parity with Thrift
(latency front-loads from fetch into execute; total unchanged).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-eager-execute-and-option-noops branch from 8bb7def to e81bf9b Compare June 5, 2026 01:45
@msrathore-db msrathore-db changed the title [SEA-NodeJS] Accept-and-ignore unsupported per-statement options; eager-execute for fire-and-forget DDL/DML [SEA-NodeJS] Accept-and-ignore unsupported per-statement options; drive sync execute to terminal Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant