[SEA-NodeJS] Accept-and-ignore unsupported per-statement options; drive sync execute to terminal#419
Open
msrathore-db wants to merge 1 commit into
Open
Conversation
…ve sync execute to terminal
Two correctness fixes to the SEA (kernel) backend, aligning it with the Thrift
backend and the Python connector's kernel backend.
1) Accept-and-ignore unsupported per-statement options (was: hard failure).
`SeaSessionBackend.executeStatement` threw on options the kernel does not
expose at per-statement granularity, breaking callers that set them globally.
`useCloudFetch`, `useLZ4Compression`, and `stagingAllowedLocalPath` are now
accepted and ignored (no-op), matching `KernelDatabricksClient.execute_command`
which takes the same flags and never reads them (CloudFetch is the session-level
`ResultConfig`, compression is decoded from the result manifest, and the kernel
has no Volume API yet). `queryTimeout` is also a no-op (+TODO): it must not be
mapped onto the SEA `wait_timeout` wire field (valid range {0} ∪ [5,50]s — a
different concept from a statement timeout; out-of-range values fail HTTP 400).
The correct mechanism is the `STATEMENT_TIMEOUT` session config.
2) Drive the default (sync) executeStatement to terminal before returning.
Previously the sync path executed lazily (only at first fetch), so a
side-effecting statement run via `executeStatement(...)` then `close()` with no
fetch was silently dropped, and dependent statements raced. The sync path now
awaits the statement to terminal in `executeStatement` (matching JDBC, ADBC C#,
and Python `use_kernel`), so by the time it resolves the side effect is
committed and the result is materialised — fire-and-forget DDL/DML and dependent
statements just work, and `close()` is a trivial non-blocking cleanup. Mid-run
cancellation of a long-running query is done via `runAsync: true` (submit +
poll → cancellable handle), mirroring JDBC's sync-vs-async split. This needs no
change to `SeaOperationBackend`.
Verified: 249 SEA unit tests pass; e2e on a SQL warehouse confirms fire-and-forget
+ dependent CREATE/INSERT/UPDATE/DELETE persist on SEA, mid-run cancel via
runAsync works, and the common execute/fetch/close path is at parity with Thrift
(latency front-loads from fetch into execute; total unchanged).
Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
8bb7def to
e81bf9b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two correctness fixes to the SEA (kernel) backend, aligning it with the Thrift backend and the Python connector's kernel backend (
KernelDatabricksClient). OnlySeaSessionBackendchanges;SeaOperationBackendis untouched.1. Accept-and-ignore unsupported per-statement options (was: hard failure)
SeaSessionBackend.executeStatementthrew on options the kernel does not expose at per-statement granularity, breaking callers that set them globally. They are now accepted and ignored (no-op), matchingKernelDatabricksClient.execute_command, which takes the same flags and never reads them:useCloudFetchResultConfig.cloudfetch_enabled; no per-statement override.useLZ4Compressionmanifest.result_compression); no request knob.stagingAllowedLocalPathqueryTimeoutwait_timeoutwire field (valid range{0} ∪ [5,50]s— a different concept from a statement timeout; out-of-range values fail HTTP 400). The correct mechanism is theSTATEMENT_TIMEOUTsession config.2. Drive the default (sync)
executeStatementto terminalPreviously the sync path executed lazily (only at first fetch), so a side-effecting statement run via
executeStatement(...)thenclose()with no fetch was silently dropped, and dependent statements raced (SEA doesn't serialize statements per session). The sync path now awaits the statement to terminal inexecuteStatement— matching JDBC (DatabricksSdkClientpolls to terminal in execute), ADBC C# (HiveServer2Statementpolls to terminal), and Pythonuse_kernel(blockingstmt.execute()). By the time it resolves the side effect is committed and the result is materialised, so:CREATEthenINSERTthenSELECT) just work;close()is a trivial, non-blocking cleanup (no draining, no cancel-on-close).Mid-run cancellation of a long query is done via
runAsync: true(submit → cancellable handle → poll), mirroring JDBC's sync-vs-async split. This is the one deliberate behavior change: the default sync path no longer offers mid-run cancel (you couldn't hold the handle during a blocking execute anyway).This needs no change to
SeaOperationBackend— origin/main'sclose()(blockingStatement ? close : cancel) andstatus()already do the right thing once the statement is driven to terminal inexecute.Why this design (vs draining in
close)Every other Databricks driver puts the side-effect guarantee in
executeand keepsclose()a non-blocking cancel/cleanup — none block inclose. Earlier iterations that drained inclose(blocking) or madeclosenon-blocking-but-non-draining (which broke dependent ordering) were rejected after testing. Option A matches the reference drivers and has no broken common case.Performance
Common
execute → fetch → closeis at parity with Thrift — the latency just front-loads fromfetchintoexecute; total is unchanged (SELECT 1≈ parity; 100k rows ≈ parity, verified with QRC disabled across multiple runs). No extra round-trips.Testing
queryTimeoutno-op; sync-drives-to-terminal;runAsyncreturns a pending cancellable handle).CREATE/INSERT/UPDATE/DELETEpersist on SEA with correct affected-row counts; mid-run cancel viarunAsyncinterrupts a running query; normal read path unchanged.Follow-up (kernel)
A bounded-inline-wait execute (
submitwithwait_timeout≈10sreturning a terminal-or-running handle — the JDBC/Python-native-SEA model) would let the default path stay non-blocking for fast statements and keep mid-run cancel for slow ones. A throwaway kernel patch confirmed fast statements come back terminal inline, but also that the currentsubmit/closepath (built forwait_timeout=0s) aborts an unconsumed terminal-from-submit statement — so it needs proper kernel handling (capture terminal/closed state + driver materialization), not a one-liner. Tracked as a separate kernel ask.This pull request and its description were written by Isaac.