Skip to content

Commit 6547f19

Browse files
feat(kernel): statement query tags + http_headers/user_agent_entry on use_kernel (#823)
* feat(kernel): wire statement query tags + http_headers/user_agent on use_kernel Consume the kernel surface from kernel PR (query tags + custom HTTP headers) on the use_kernel path. Bumps KERNEL_REV. Query tags (statement-level): - execute_command no longer raises NotSupportedError for query_tags; it calls stmt.set_query_tags(query_tags) after set_sql. The connector already passes Dict[str, Optional[str]], which the kernel accepts (None value -> bare key in the SEA query_tags conf). http_headers + user_agent_entry: - The kernel client now forwards http_headers to the kernel Session as the `http_headers` kwarg (was accept-and-ignore). session.py already passes all_headers, which carries the connector's composed User-Agent (PyDatabricksSqlConnector/x (entry)) + caller headers + SPOG org-id. - The kernel applies them per request: its own Authorization / org-id win; a caller User-Agent is APPENDED to the kernel base UA (the base carries the DatabricksJDBCDriverOSS token that gates the SEA result disposition, so it's never replaced). So user_agent_entry is honored end-to-end via the existing http_headers forwarding — no separate kwarg needed. Tests: - unit: query_tags forwarded to set_query_tags (was: rejection test); http_headers forwarded to the kernel Session (and omitted when empty). - e2e (test_kernel_backend.py): a query_tagged query and a connection with user_agent_entry + a custom http_header both round-trip green against a dogfood warehouse. The UA case specifically guards the append behavior (replacing the base UA would 400 on the result disposition). KERNEL_REV -> c2053f68b75fef4a29425096dc6bbafb774d8b83 (kernel PR #119 branch HEAD; re-pin to the squash-merge SHA once #119 lands). 194 kernel unit tests pass; black + mypy clean; 3 e2e pass live. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): address review — secret-scrub ordering, drop reserved headers Addresses @gopalldb's review on #823. P1.1 — secret scrub now covers a kwarg-build failure: - open_session built auth/tls/retry/http_headers kwargs OUTSIDE the try whose finally scrubs oauth_client_secret. A raise in kernel_auth_kwargs (e.g. OAuth token-exchange failure with the M2M secret in hand) would skip the scrub, leaking the secret onto the long-lived connector. Moved all kwarg builds inside the try (auth_kwargs / tls_kwargs pre-declared empty so the finally can reference them on an early raise). P1.3 / P2 — don't forward kernel-managed headers: - The connector now drops Authorization and x-databricks-org-id from http_headers before forwarding to the kernel (new _KERNEL_MANAGED_HEADERS set). The kernel manages both (auth from the provider; org-id re-derived from ?o= in http_path), so forwarding is redundant — and the connector always injects the SPOG org-id, which the kernel skips-and-warns per request, so this also kills the WARN spam. Double-walls the kernel's own reserved-name skip. Tests: - unit: Authorization / x-databricks-org-id dropped before forwarding; only-reserved-headers omits the kwarg entirely (test_kernel_client). - P1.2: user_agent_entry reaches the kernel client's http_headers on use_kernel=True (test_session) — guards a regression where session.py stops folding the entry into the composed User-Agent. CHANGELOG: deferred to release-cut per repo convention (entries are added at version bump, not per-PR; #819/#820 followed the same). 197 kernel unit tests pass; black + mypy clean; 3 e2e pass live. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: re-pin KERNEL_REV to merged kernel main (101aa46) #119 (statement query tags + custom HTTP headers) is now merged to kernel main. Re-pin from the orphaned branch HEAD (c2053f6) to the current merged main HEAD (101aa46), which contains #119's merge commit (df8302f) and is reachable from main — no orphan-SHA risk. Verified end-to-end against a wheel built from 101aa46: kernel e2e (query_tags_round_trip, user_agent_entry + http_headers round_trip, select_one) all pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
1 parent 5292fc1 commit 6547f19

5 files changed

Lines changed: 246 additions & 35 deletions

File tree

KERNEL_REV

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
b4d88220cdfad8dba1cfa89892269342ae26feeb
1+
101aa465e71991eec98102bba77aad2f7ad8faed

src/databricks/sql/backend/kernel/client.py

Lines changed: 63 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,14 @@
6363

6464
logger = logging.getLogger(__name__)
6565

66+
# Headers the kernel manages itself and that the connector must NOT
67+
# forward via ``http_headers`` (lower-cased for case-insensitive match):
68+
# ``authorization`` (the kernel applies the auth provider's token) and
69+
# ``x-databricks-org-id`` (the kernel re-derives it from the ``?o=`` in
70+
# http_path). Forwarding either is redundant and trips the kernel's
71+
# per-request skip-and-warn.
72+
_KERNEL_MANAGED_HEADERS = frozenset({"authorization", "x-databricks-org-id"})
73+
6674

6775
# ─── Client ─────────────────────────────────────────────────────────────────
6876

@@ -91,13 +99,19 @@ def __init__(
9199
):
92100
# ``ssl_options`` is translated to the kernel's ``tls_*``
93101
# Session kwargs in ``open_session`` (custom CA, verify
94-
# toggles, mTLS client cert/key). ``http_headers`` /
95-
# ``http_client`` / ``port`` are still accept-and-ignore — the
96-
# kernel manages its own HTTP stack.
102+
# toggles, mTLS client cert/key). ``http_headers`` is forwarded
103+
# to the kernel as custom request headers (it carries the
104+
# connector's composed ``User-Agent`` + any caller headers + the
105+
# SPOG ``x-databricks-org-id``). ``http_client`` / ``port`` are
106+
# still accept-and-ignore — the kernel manages its own HTTP
107+
# stack.
97108
self._server_hostname = server_hostname
98109
self._http_path = http_path
99110
self._auth_provider = auth_provider
100111
self._ssl_options = ssl_options
112+
# Caller / connector HTTP headers (list of (name, value) pairs).
113+
# Forwarded to the kernel Session in ``open_session``.
114+
self._http_headers = http_headers or []
101115
# Raw auth-relevant connect() kwargs (auth_type,
102116
# oauth_client_id/secret, redirect port, credentials_provider).
103117
# The kernel auth bridge needs these to build OAuth kwargs — the
@@ -175,19 +189,45 @@ def open_session(
175189
session_conf: Optional[Dict[str, str]] = None
176190
if session_configuration:
177191
session_conf = {k: str(v) for k, v in session_configuration.items()}
178-
# Build auth kwargs here (not in ``__init__``) so the bearer
179-
# token has the shortest possible in-process lifetime: a
180-
# local kwargs dict is GC-eligible the moment this method
181-
# returns, regardless of whether the kernel ``Session()``
182-
# call succeeded or raised.
183-
auth_kwargs = kernel_auth_kwargs(self._auth_provider, self._auth_options)
184-
# Translate the connector's SSLOptions into the kernel's
185-
# ``tls_*`` Session kwargs. Empty when TLS is left at defaults.
186-
tls_kwargs = _kernel_tls_kwargs(self._ssl_options)
187-
# Translate the connector's ``_retry_*`` kwargs into the kernel's
188-
# ``retry_*`` Session kwargs. Empty when retry is left at defaults.
189-
retry_kwargs = _kernel_retry_kwargs(self._retry_options)
192+
# The kwarg builds run INSIDE the try so the ``finally`` scrub
193+
# below always fires — including when ``kernel_auth_kwargs``
194+
# itself raises mid-build (e.g. an OAuth token-exchange failure
195+
# while the M2M secret is in hand). Pre-declared empty so the
196+
# ``finally`` can reference them unconditionally even on an early
197+
# raise. Building here (not in ``__init__``) keeps the bearer
198+
# token's in-process lifetime as short as possible.
199+
auth_kwargs: Dict[str, Any] = {}
200+
tls_kwargs: Dict[str, Any] = {}
190201
try:
202+
auth_kwargs = kernel_auth_kwargs(self._auth_provider, self._auth_options)
203+
# Translate the connector's SSLOptions into the kernel's
204+
# ``tls_*`` Session kwargs. Empty when TLS is at defaults.
205+
tls_kwargs = _kernel_tls_kwargs(self._ssl_options)
206+
# Translate the connector's ``_retry_*`` kwargs into the
207+
# kernel's ``retry_*`` kwargs. Empty when at defaults.
208+
retry_kwargs = _kernel_retry_kwargs(self._retry_options)
209+
# Forward caller / connector HTTP headers. The kernel applies
210+
# them on every request; a caller ``User-Agent`` is appended
211+
# to the kernel's base UA. Only pass the kwarg when there's
212+
# something to send.
213+
#
214+
# We drop ``Authorization`` and ``x-databricks-org-id`` here,
215+
# before they reach the kernel, for two reasons: (1) the
216+
# kernel manages both itself (auth from the provider; org-id
217+
# re-derived from the ``?o=`` in http_path), so forwarding
218+
# them is redundant; (2) the kernel skips-and-warns those two
219+
# names on every request, so forwarding the SPOG org-id the
220+
# connector always injects would spam a warning per request.
221+
# This double-walls the kernel's own reserved-name skip.
222+
http_headers_kwargs: Dict[str, Any] = {}
223+
if self._http_headers:
224+
forwarded = [
225+
(str(k), str(v))
226+
for k, v in self._http_headers
227+
if str(k).lower() not in _KERNEL_MANAGED_HEADERS
228+
]
229+
if forwarded:
230+
http_headers_kwargs["http_headers"] = forwarded
191231
self._kernel_session = _kernel.Session(
192232
host=self._server_hostname,
193233
http_path=self._http_path,
@@ -208,6 +248,7 @@ def open_session(
208248
**auth_kwargs,
209249
**tls_kwargs,
210250
**retry_kwargs,
251+
**http_headers_kwargs,
211252
)
212253
except Exception as exc:
213254
raise _wrap_kernel_exception("open_session", exc) from exc
@@ -304,10 +345,6 @@ def execute_command(
304345
) -> Union["ResultSet", None]:
305346
if self._kernel_session is None:
306347
raise InterfaceError("Cannot execute_command without an open session.")
307-
if query_tags:
308-
raise NotSupportedError(
309-
"Statement-level query_tags are not yet supported on the kernel backend."
310-
)
311348

312349
try:
313350
stmt = self._kernel_session.statement()
@@ -321,6 +358,13 @@ def execute_command(
321358
try:
322359
try:
323360
stmt.set_sql(operation)
361+
if query_tags:
362+
# Per-statement query tags. The kernel serialises the
363+
# dict (None value -> bare key) into the SEA
364+
# `query_tags` statement conf. ``query_tags`` is
365+
# already ``Dict[str, Optional[str]]`` from the
366+
# connector, which the kernel accepts directly.
367+
stmt.set_query_tags(query_tags)
324368
if parameters:
325369
bind_tspark_params(stmt, parameters)
326370
if async_op:

tests/e2e/test_kernel_backend.py

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -332,3 +332,32 @@ def test_parameterized_query_decimal(conn):
332332
rows = cur.fetchall()
333333
# Server echoes back as decimal.Decimal.
334334
assert str(rows[0][0]) == "-123.45"
335+
336+
337+
def test_query_tags_round_trip(kernel_conn_params):
338+
"""Per-statement query_tags are forwarded to the kernel and accepted
339+
by the server. Smoke-level: a malformed query_tags conf would fail
340+
the execute. (query.history ingestion lag makes a sync tag-readback
341+
assertion infeasible.)"""
342+
with sql.connect(**kernel_conn_params) as c:
343+
with c.cursor() as cur:
344+
cur.execute(
345+
"SELECT 1 AS n",
346+
query_tags={"team": "platform", "production": None},
347+
)
348+
assert cur.fetchall()[0][0] == 1
349+
350+
351+
def test_user_agent_entry_and_http_headers_round_trip(kernel_conn_params):
352+
"""A connection with user_agent_entry (folded into the connector's
353+
User-Agent, then appended to the kernel base UA) and a custom HTTP
354+
header opens and queries cleanly. Replacing the kernel base UA would
355+
break the SEA result disposition (HTTP 400); appending preserves it
356+
— this exercises that end-to-end."""
357+
params = dict(kernel_conn_params)
358+
params["user_agent_entry"] = "kernel-e2e-app"
359+
params["http_headers"] = [("X-Kernel-E2E", "yes")]
360+
with sql.connect(**params) as c:
361+
with c.cursor() as cur:
362+
cur.execute("SELECT 1 AS n")
363+
assert cur.fetchall()[0][0] == 1

tests/unit/test_kernel_client.py

Lines changed: 104 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -332,26 +332,43 @@ def test_execute_command_forwards_parameters_to_bind_param():
332332
assert stmt.execute.called
333333

334334

335-
def test_execute_command_rejects_query_tags():
335+
def test_execute_command_forwards_query_tags():
336+
"""Statement-level query_tags are forwarded to the kernel statement
337+
via set_query_tags (the kernel serialises them into the SEA
338+
query_tags conf). Previously rejected with NotSupportedError; now
339+
wired (kernel PR adding Statement.set_query_tags)."""
336340
c = _make_client()
337341
c._kernel_session = MagicMock()
338342
cursor = MagicMock()
339343
cursor.arraysize = 100
340344
cursor.buffer_size_bytes = 1024
341-
with pytest.raises(NotSupportedError, match="query_tags"):
342-
c.execute_command(
343-
operation="SELECT 1",
344-
session_id=MagicMock(),
345-
max_rows=1,
346-
max_bytes=1,
347-
lz4_compression=False,
348-
cursor=cursor,
349-
use_cloud_fetch=False,
350-
parameters=[],
351-
async_op=False,
352-
enforce_embedded_schema_correctness=False,
353-
query_tags={"team": "x"},
354-
)
345+
346+
stmt = MagicMock()
347+
stmt.set_sql = MagicMock()
348+
stmt.set_query_tags = MagicMock()
349+
stmt.execute.return_value = MagicMock(
350+
statement_id="stmt-id",
351+
arrow_schema=MagicMock(return_value=pa.schema([("x", pa.int64())])),
352+
)
353+
c._kernel_session.statement.return_value = stmt
354+
355+
tags = {"team": "platform", "production": None}
356+
c.execute_command(
357+
operation="SELECT 1",
358+
session_id=MagicMock(),
359+
max_rows=1,
360+
max_bytes=1,
361+
lz4_compression=False,
362+
cursor=cursor,
363+
use_cloud_fetch=False,
364+
parameters=[],
365+
async_op=False,
366+
enforce_embedded_schema_correctness=False,
367+
query_tags=tags,
368+
)
369+
370+
stmt.set_query_tags.assert_called_once_with(tags)
371+
assert stmt.execute.called
355372

356373

357374
def test_get_columns_accepts_none_catalog():
@@ -1015,3 +1032,75 @@ def test_retry_delay_default_has_no_mapping(self):
10151032
# recognised key here — it has no kernel equivalent.
10161033
out = kernel_client._kernel_retry_kwargs({"retry_delay_default": 5.0})
10171034
assert out == {}
1035+
1036+
1037+
class TestKernelHttpHeadersForwarding:
1038+
"""http_headers (the connector's caller headers + composed
1039+
User-Agent + SPOG org-id) are forwarded to the kernel Session as the
1040+
``http_headers`` kwarg. The kernel applies them per request (its own
1041+
Authorization / org-id win; a caller User-Agent is appended to the
1042+
kernel base UA)."""
1043+
1044+
def _open_capturing(self, monkeypatch, http_headers):
1045+
captured = {}
1046+
1047+
def fake_session(**kw):
1048+
captured.update(kw)
1049+
sess = MagicMock()
1050+
sess.session_id = "sess-id"
1051+
return sess
1052+
1053+
monkeypatch.setattr(kernel_client._kernel, "Session", fake_session)
1054+
c = kernel_client.KernelDatabricksClient(
1055+
server_hostname="example.cloud.databricks.com",
1056+
http_path="/sql/1.0/warehouses/abc",
1057+
auth_provider=AccessTokenAuthProvider("dapi-test"),
1058+
ssl_options=None,
1059+
http_headers=http_headers,
1060+
)
1061+
c.open_session(session_configuration=None, catalog=None, schema=None)
1062+
return captured
1063+
1064+
def test_http_headers_forwarded_to_kernel_session(self, monkeypatch):
1065+
headers = [
1066+
("User-Agent", "PyDatabricksSqlConnector/4.0 (myentry)"),
1067+
("X-Custom", "v1"),
1068+
]
1069+
captured = self._open_capturing(monkeypatch, headers)
1070+
assert captured.get("http_headers") == [
1071+
("User-Agent", "PyDatabricksSqlConnector/4.0 (myentry)"),
1072+
("X-Custom", "v1"),
1073+
]
1074+
1075+
def test_no_http_headers_omits_kwarg(self, monkeypatch):
1076+
# Empty/none headers → the kwarg isn't passed at all (kernel
1077+
# keeps its defaults).
1078+
captured = self._open_capturing(monkeypatch, [])
1079+
assert "http_headers" not in captured
1080+
1081+
def test_authorization_and_org_id_dropped_before_forwarding(self, monkeypatch):
1082+
# The connector must NOT forward Authorization / x-databricks-org-id
1083+
# to the kernel — the kernel manages both (and warns per request
1084+
# if it sees them). Double-walls the kernel's own skip.
1085+
headers = [
1086+
("Authorization", "Bearer should-not-forward"),
1087+
("X-Databricks-Org-Id", "12345"),
1088+
("User-Agent", "PyDatabricksSqlConnector/4.0 (e)"),
1089+
("X-Keep", "yes"),
1090+
]
1091+
captured = self._open_capturing(monkeypatch, headers)
1092+
fwd = captured.get("http_headers")
1093+
names = {n.lower() for n, _ in fwd}
1094+
assert "authorization" not in names
1095+
assert "x-databricks-org-id" not in names
1096+
# Non-reserved headers (incl. User-Agent) still forwarded.
1097+
assert ("User-Agent", "PyDatabricksSqlConnector/4.0 (e)") in fwd
1098+
assert ("X-Keep", "yes") in fwd
1099+
1100+
def test_only_reserved_headers_omits_kwarg(self, monkeypatch):
1101+
# If the only headers are reserved ones, nothing is forwarded
1102+
# and the kwarg is omitted entirely.
1103+
captured = self._open_capturing(
1104+
monkeypatch, [("Authorization", "Bearer x"), ("x-databricks-org-id", "1")]
1105+
)
1106+
assert "http_headers" not in captured

tests/unit/test_session.py

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -476,3 +476,52 @@ def test_retry_kwargs_threaded_into_kernel_client(self):
476476
assert opts["retry_stop_after_attempts_duration"] == 600.0
477477
finally:
478478
conn.close()
479+
480+
481+
class TestKernelUserAgentForwarding:
482+
"""user_agent_entry must reach the kernel on the use_kernel path —
483+
session.py folds it into the composed User-Agent and includes it in
484+
all_headers, which is passed to the kernel client as http_headers.
485+
Guards against a regression where session.py stops folding it under
486+
use_kernel=True (which would silently drop partner attribution)."""
487+
488+
PACKAGE = "databricks.sql"
489+
490+
def test_user_agent_entry_reaches_kernel_client_http_headers(self):
491+
import sys
492+
import types
493+
494+
pytest.importorskip(
495+
"pyarrow", reason="kernel client module imports pyarrow at load"
496+
)
497+
498+
fake = types.ModuleType("databricks_sql_kernel")
499+
fake.KernelError = type("KernelError", (Exception,), {})
500+
fake.Session = MagicMock()
501+
502+
with patch.dict(sys.modules, {"databricks_sql_kernel": fake}), patch(
503+
"databricks.sql.backend.kernel.client.KernelDatabricksClient"
504+
) as mock_kernel_client, patch(
505+
"%s.session.get_python_sql_connector_auth_provider" % self.PACKAGE
506+
):
507+
instance = mock_kernel_client.return_value
508+
instance.open_session.return_value = SessionId(
509+
BackendType.SEA, "sess-id", None
510+
)
511+
512+
conn = databricks.sql.connect(
513+
server_hostname="foo",
514+
http_path="/sql/1.0/warehouses/abc",
515+
use_kernel=True,
516+
access_token="dapi-xyz",
517+
enable_telemetry=False,
518+
user_agent_entry="my-partner-app",
519+
)
520+
try:
521+
_, kwargs = mock_kernel_client.call_args
522+
# http_headers carries a User-Agent that embeds the entry.
523+
headers = dict(kwargs["http_headers"])
524+
ua = headers.get("User-Agent", "")
525+
assert "my-partner-app" in ua, f"UA was {ua!r}"
526+
finally:
527+
conn.close()

0 commit comments

Comments
 (0)