feat: Run leading USE WAREHOUSE / SET / ALTER SESSION on the same connection as the main query #96
robertlacok wants to merge 5 commits into main from
Conversation
…-51) Cache the SQLAlchemy engine per IPython cell execution so that two back-to-back `_dntk.execute_sql*` calls in the same generated cell share the same physical DBAPI connection. This lets Snowflake session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. persist onto the next query, which unblocks supporting `USE WAREHOUSE abc; SELECT 123` in Snowflake SQL blocks.

Cached engines are created with `pool_size=1, max_overflow=0` so `engine.begin()` always checks out the same connection. They are disposed via a `post_run_cell` IPython hook. Caching is skipped for SSH-tunneled engines, user-supplied custom pool config, and when no IPython shell is available (script/CLI use); all of those fall back to the previous create-then-dispose-per-call behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
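The effect of a size-one pool can be illustrated without SQLAlchemy. The toy pool below (class and names are illustrative, not the PR's code) always hands back the same sqlite3 connection, which is why connection-scoped session state set by one call is still in effect for the next:

```python
import sqlite3


class OneConnectionPool:
    """Toy stand-in for pool_size=1, max_overflow=0: every checkout
    returns the same physical DBAPI connection, so connection-scoped
    session state survives from one call to the next."""

    def __init__(self, database: str) -> None:
        self._conn = sqlite3.connect(database)

    def begin(self) -> sqlite3.Connection:
        return self._conn  # always the same connection


pool = OneConnectionPool(":memory:")

# "Call 1" sets connection-scoped state (a temp table here stands in for
# what USE WAREHOUSE / SET do in a Snowflake session).
pool.begin().execute("CREATE TEMP TABLE session_state (stmt TEXT)")
pool.begin().execute("INSERT INTO session_state VALUES ('USE WAREHOUSE abc')")

# "Call 2" still sees that state, because it got the same connection back.
rows = pool.begin().execute("SELECT stmt FROM session_state").fetchall()
```

With a pool larger than one, "call 2" could check out a different connection and the temp table (or warehouse setting) would be gone, which is exactly the failure mode the `pool_size=1, max_overflow=0` configuration avoids.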
📝 Walkthrough
This PR adds extraction and execution of leading session-setup statements. A new parser extracts allowlisted setup statements from the start of a rendered query. Extracted statements are combined with any user-supplied setup statements.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Parser as SetupStatementParser
participant Service as SQLExecutor
participant Engine as DBEngine
Client->>Service: submit SQL (+ optional setup_statements)
Service->>Parser: extract_setup_statements(rendered_query)
Parser-->>Service: (extracted_setup_statements, remaining_query)
Service->>Service: combine extracted + provided -> final_setup_statements
Service->>Engine: open connection
alt SQLAlchemy-backed engine
Service->>Engine: exec_driver_sql(setup stmt 1)
Engine-->>Service: OK / Error
Service->>Engine: exec_driver_sql(setup stmt N)
Engine-->>Service: OK / Error
Service->>Engine: pandas.read_sql_query(main query)
Engine-->>Service: result rows
else DuckDB path
Service->>Engine: execute_duckdb_sql(setup stmt 1)
Engine-->>Service: OK / Error
Service->>Engine: execute_duckdb_sql(setup stmt N)
Engine-->>Service: OK / Error
Service->>Engine: execute_duckdb_sql(main query)
Engine-->>Service: result rows
end
Service-->>Client: results or error
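The extraction step in the diagram can be sketched as a small allowlist-based splitter. The regex and function name below are illustrative; the PR's actual parser may differ in how it tokenizes statements:

```python
import re

# Allowlisted leading session-setup statements (per the PR title).
SETUP_PREFIXES = re.compile(
    r"^\s*(USE\s+WAREHOUSE|USE\s+ROLE|SET|ALTER\s+SESSION)\b", re.IGNORECASE
)


def extract_setup_statements(query: str) -> tuple[list[str], str]:
    """Peel allowlisted session-setup statements off the front of *query*.

    Returns (extracted_setup_statements, remaining_query).
    """
    statements = [s.strip() for s in query.split(";")]
    setup: list[str] = []
    while statements and SETUP_PREFIXES.match(statements[0]):
        setup.append(statements.pop(0))
    remaining = "; ".join(s for s in statements if s)
    return setup, remaining


setup, main = extract_setup_statements("USE WAREHOUSE abc; SELECT 123")
```

A query with no leading setup statements passes through unchanged, so the "extracted + provided" combination in the diagram degenerates to just the user-supplied list.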
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
📦 Python package built successfully!
Codecov Report
❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #96 +/- ##
==========================================
+ Coverage 74.32% 74.75% +0.43%
==========================================
Files 94 95 +1
Lines 5535 5681 +146
Branches 824 866 +42
==========================================
+ Hits 4114 4247 +133
- Misses 1155 1158 +3
- Partials 266 276 +10
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepnote_toolkit/ipython_utils.py (1)
6-34: 🛠️ Refactor suggestion | 🟠 Major
Type the hook registry and callback explicitly.
Lines 6-18 add a new hook API, but both `_registered_post_run_cell_callbacks` and `callback` are untyped. That weakens static checking around event registration.
Suggested fix

+from collections.abc import Callable
+from typing import Any
+
-_registered_post_run_cell_callbacks: set = set()
+_registered_post_run_cell_callbacks: set[Callable[..., Any]] = set()
-
-def register_post_run_cell_hook(callback) -> bool:
+def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool:

As per coding guidelines, "Use explicit type hints for function parameters and return values" and "Use type hints consistently".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/ipython_utils.py` around lines 6 - 34, Add explicit type hints for the hook registry and callback: import typing symbols (Set, Callable, Any) and change the module-level variable to _registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update the register_post_run_cell_hook signature to def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static type checkers know the stored callable shape when registering with ip.events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 465-485: The cache key in _compute_engine_cache_key currently uses
only ssh_options, pool-related params, integration_id, or url which ignores
other engine-defining options; update it to incorporate a deterministic
serialization of the relevant connection params (e.g., the params dict and any
connect_args/auth headers present in sql_alchemy_dict) into the returned key
while preserving the existing early returns for ssh_options-enabled and explicit
pool config; specifically, when integration_id exists include it plus a stable
encoding (sorted/JSON) of params/connect_args in the key (and if no
integration_id, include the URL plus the same stable encoding) so that differing
params produce distinct cache keys.
---
Outside diff comments:
In `@deepnote_toolkit/ipython_utils.py`:
- Around line 6-34: Add explicit type hints for the hook registry and callback:
import typing symbols (Set, Callable, Any) and change the module-level variable
to _registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update
the register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b6e1c41b-ba25-4588-aa46-4a16f4340388
📒 Files selected for processing (4)
deepnote_toolkit/ipython_utils.py
deepnote_toolkit/sql/sql_execution.py
tests/unit/test_ipython_utils.py
tests/unit/test_sql_execution_internal.py
def _compute_engine_cache_key(sql_alchemy_dict) -> Optional[str]:
    """Stable key identifying the engine for *sql_alchemy_dict*, or None when
    this connection should not be cached.

    SSH-tunneled engines can't be cached because the tunnel only lives inside
    `_create_sql_ssh_uri`'s context manager. Connections that already specify
    a custom pool config in user `params` are also skipped to avoid clashing
    with the `pool_size=1` we set on cached engines.
    """
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None

    params = sql_alchemy_dict.get("params") or {}
    if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
        return None

    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}"

    return f"url:{sql_alchemy_dict['url']}"
Include engine-defining params in the cache key.
Line 481 collapses the key to integration_id, and Line 485 falls back to just the raw URL. create_engine() also depends on params, so two calls in one cell can incorrectly share an engine even when connect_args, auth headers, or other connection options differ. That can silently run the second query with the wrong connection/session configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 465 - 485, The cache key
in _compute_engine_cache_key currently uses only ssh_options, pool-related
params, integration_id, or url which ignores other engine-defining options;
update it to incorporate a deterministic serialization of the relevant
connection params (e.g., the params dict and any connect_args/auth headers
present in sql_alchemy_dict) into the returned key while preserving the existing
early returns for ssh_options-enabled and explicit pool config; specifically,
when integration_id exists include it plus a stable encoding (sorted/JSON) of
params/connect_args in the key (and if no integration_id, include the URL plus
the same stable encoding) so that differing params produce distinct cache keys.
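The fix described above amounts to folding a stable serialization of the engine-defining params into the key. A minimal sketch of that idea, assuming a dict-shaped `sql_alchemy_dict` like the one in the snippet above (the function name mirrors the original; the fingerprinting is the reviewer's proposal, not the PR's code):

```python
import hashlib
import json
from typing import Any, Optional


def compute_engine_cache_key(sql_alchemy_dict: dict[str, Any]) -> Optional[str]:
    """Cache key that distinguishes connections by their params, not just id/URL."""
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None  # tunnel lifetime is bound to a single call
    params = sql_alchemy_dict.get("params") or {}
    if {"pool_size", "max_overflow", "poolclass"} & params.keys():
        return None  # user pool config would clash with pool_size=1
    # Deterministic fingerprint of every engine-defining option.
    fingerprint = hashlib.sha256(
        json.dumps(params, sort_keys=True, default=str).encode()
    ).hexdigest()[:12]
    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}:{fingerprint}"
    return f"url:{sql_alchemy_dict['url']}:{fingerprint}"
```

Two calls for the same integration but different `connect_args` now produce distinct keys, so they no longer share an engine.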
🚀 Review App Deployment Started
Drops the implicit cache + IPython post_run_cell hook in favor of an explicit `_dntk.sql_session()` context manager. Inside the with-block each unique connection (keyed by integration_id, or URL when not present) opens its SSH tunnel and SQLAlchemy engine once and reuses both for every `execute_sql*` call inside the block. On exit the session disposes the engine and closes the tunnel together.

This makes SSH tunnels work with SAL-51 reuse: the previous cache had to skip SSH because the tunnel's lifetime was bound to a single call by `_create_sql_ssh_uri`'s context manager. With the session owning both, the tunnel's lifetime now matches the engine's.

Other gains:
- No IPython coupling. Lifetime is whatever the caller (deepnote-internal codegen) wraps in the `with` block.
- No "magic" cache key vs. live state divergence; outside a session, behavior is exactly the previous per-call create+dispose.
- Nested `sql_session()` blocks no-op on the inner block so the outer retains ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
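The session-scoped reuse and the nested no-op behavior described in the commit message can be sketched as follows. This is an illustrative reconstruction, not the PR's code; `FakeEngine` stands in for a SQLAlchemy engine (the real session also owns an SSH tunnel it closes alongside `dispose()`):

```python
import contextlib
from typing import Any, Callable, Dict, Iterator, Optional

# Module-level registry: non-None only while a session block is active.
_active_session: Optional[Dict[str, Any]] = None


@contextlib.contextmanager
def sql_session() -> Iterator[None]:
    global _active_session
    if _active_session is not None:
        yield  # nested block: no-op, outer session retains ownership
        return
    _active_session = {}
    try:
        yield
    finally:
        for engine in _active_session.values():
            engine.dispose()  # the real session also closes the SSH tunnel here
        _active_session = None


def acquire_engine(key: str, factory: Callable[[], Any]) -> Any:
    """Reuse one engine per key inside a session; create fresh outside."""
    if _active_session is None:
        return factory()
    if key not in _active_session:
        _active_session[key] = factory()
    return _active_session[key]


class FakeEngine:
    def __init__(self) -> None:
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


with sql_session():
    first = acquire_engine("integration:a", FakeEngine)
    with sql_session():  # nested: inner block must not take ownership
        second = acquire_engine("integration:a", FakeEngine)
reused = first is second
disposed_on_exit = first.disposed
```

Outside any `with sql_session():` block, `acquire_engine` just calls the factory, matching the stated fallback to per-call create+dispose.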
Actionable comments posted: 3
♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)
475-488: ⚠️ Potential issue | 🟠 Major
Expand the session cache key beyond integration_id / URL.
`_session_resource_key()` still aliases every call for the same integration or raw URL. Inside one `sql_session()`, that lets a later call reuse an engine and SSH tunnel created with different `params`, auth headers, or SSH settings.
Possible fix

 def _session_resource_key(sql_alchemy_dict) -> Optional[str]:
     params = sql_alchemy_dict.get("params") or {}
     if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
         return None
-
-    integration_id = sql_alchemy_dict.get("integration_id")
-    if integration_id:
-        return f"integration:{integration_id}"
-
-    return f"url:{sql_alchemy_dict['url']}"
+    key_payload = {
+        "integration_id": sql_alchemy_dict.get("integration_id"),
+        "url": sql_alchemy_dict["url"],
+        "params": params,
+        "ssh_options": sql_alchemy_dict.get("ssh_options"),
+    }
+    return json.dumps(key_payload, sort_keys=True, default=str)
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/sql/sql_execution.py` around lines 475 - 488, _session_resource_key currently returns the same key for all connections sharing an integration_id or URL, allowing reuse of engines/tunnels with different connection-specific settings; change _session_resource_key(sql_alchemy_dict) to include a stable fingerprint of the connection-specific attributes (e.g., serialize and include params, auth headers, and SSH settings such as sql_alchemy_dict.get("params"), sql_alchemy_dict.get("headers") or auth token field, and sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing the key (but keep the existing early return None for explicit pool_size/max_overflow/poolclass). Use a deterministic serialization or short hash of those fields to build the returned string (e.g., f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so engines/tunnels are only reused when all relevant settings match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 365-399: Change _open_ssh_tunnel and _close_ssh_tunnel to have
explicit type hints and raise a specific exception type instead of a bare
Exception: add a small custom exception class (e.g., SSHTunnelError or
SSHTunnelSetupError) and use it when PRIVATE_SSH_KEY_BLOB is missing or
create_ssh_tunnel fails; annotate _open_ssh_tunnel signature to return
Tuple[Any, URL] (or the concrete tunnel type if available) and accept
sql_alchemy_dict: Mapping[str, Any], and annotate _close_ssh_tunnel(server:
Optional[Any]) -> None; update docstrings to state ownership/closing contract
and failure modes; keep references to the existing symbols (_open_ssh_tunnel,
_close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env, make_url, URL) so reviewers
can find and validate the changes.
- Around line 561-572: The call to create_engine currently passes two kwarg
dicts separately which can cause a TypeError if sql_alchemy_dict["params"]
contains overlapping keys like pool_pre_ping; before calling create_engine (in
the block using suppress_third_party_deprecation_warnings and the in_session
logic), merge sql_alchemy_dict["params"] and extra_engine_params into a single
dict (e.g., copy sql_alchemy_dict["params"] then use setdefault or update
carefully) so extra_engine_params values only apply when keys are absent, then
call create_engine(url, **merged_params) instead of passing two dicts.
In `@tests/unit/test_sql_execution_internal.py`:
- Around line 380-386: Update the helper _make_sql_alchemy_dict to add explicit
type hints: annotate parameters url and params as Optional[str] and
Optional[Dict[str, Any]] (or appropriate param dict type) respectively, and
declare the function return type as Dict[str, Any] (or a more specific mapping
type used in tests). Adjust the function signature and any necessary imports
(from typing import Optional, Dict, Any) so the function reads e.g. def
_make_sql_alchemy_dict(integration_id: str = "integration_a", url: Optional[str]
= None, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any].
---
Duplicate comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 475-488: _session_resource_key currently returns the same key for
all connections sharing an integration_id or URL, allowing reuse of
engines/tunnels with different connection-specific settings; change
_session_resource_key(sql_alchemy_dict) to include a stable fingerprint of the
connection-specific attributes (e.g., serialize and include params, auth
headers, and SSH settings such as sql_alchemy_dict.get("params"),
sql_alchemy_dict.get("headers") or auth token field, and
sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing the key (but
keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 22352dcb-ac84-4524-b98f-c797b6e3f455
📒 Files selected for processing (3)
deepnote_toolkit/__init__.py
deepnote_toolkit/sql/sql_execution.py
tests/unit/test_sql_execution_internal.py
def _open_ssh_tunnel(sql_alchemy_dict) -> tuple[Any, Any]:
    """Open an SSH tunnel for *sql_alchemy_dict* and return ``(server, url)``.

    Caller is responsible for eventually closing the returned server with
    ``_close_ssh_tunnel``.
    """
    base64_encoded_key = dnenv.get_env("PRIVATE_SSH_KEY_BLOB")
    if not base64_encoded_key:
        raise Exception(
            "The private key needed to establish the SSH connection is missing. Please try again or contact support."
        )
    original_url = make_url(sql_alchemy_dict["url"])
    server = create_ssh_tunnel(
        ssh_host=sql_alchemy_dict["ssh_options"]["host"],
        ssh_port=int(sql_alchemy_dict["ssh_options"]["port"]),
        ssh_user=sql_alchemy_dict["ssh_options"]["user"],
        remote_host=original_url.host,
        remote_port=int(original_url.port),
        private_key=base64.b64decode(base64_encoded_key).decode("utf-8"),
    )
    url = URL.create(
        drivername=original_url.drivername,
        username=original_url.username,
        password=original_url.password,
        host=server.local_bind_host,
        port=server.local_bind_port,
        database=original_url.database,
        query=original_url.query,
    )
    return server, url


def _close_ssh_tunnel(server) -> None:
    if server is not None and server.is_active:
        server.close()
🛠️ Refactor suggestion | 🟠 Major
Type the new SSH/session helpers and use a specific exception.
These helpers are new control-path code, but their signatures are still untyped and _open_ssh_tunnel() raises a bare Exception. Please make the contract explicit so callers can reason about ownership and failures.
As per coding guidelines, "Use explicit type hints for function parameters and return values", "Use appropriate exception types for error handling", and "Document functions and classes with docstrings".
Also applies to: 475-581
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 373-375: Create your own exception
(TRY002)
[warning] 373-375: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 365 - 399, Change
_open_ssh_tunnel and _close_ssh_tunnel to have explicit type hints and raise a
specific exception type instead of a bare Exception: add a small custom
exception class (e.g., SSHTunnelError or SSHTunnelSetupError) and use it when
PRIVATE_SSH_KEY_BLOB is missing or create_ssh_tunnel fails; annotate
_open_ssh_tunnel signature to return Tuple[Any, URL] (or the concrete tunnel
type if available) and accept sql_alchemy_dict: Mapping[str, Any], and annotate
_close_ssh_tunnel(server: Optional[Any]) -> None; update docstrings to state
ownership/closing contract and failure modes; keep references to the existing
symbols (_open_ssh_tunnel, _close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env,
make_url, URL) so reviewers can find and validate the changes.
extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
if in_session:
    # See sql_session() docstring for why pool_size=1 is required to make
    # session state (USE WAREHOUSE, ...) persist across calls.
    extra_engine_params["pool_size"] = 1
    extra_engine_params["max_overflow"] = 0

try:
    with suppress_third_party_deprecation_warnings():
        engine = create_engine(
            url, **sql_alchemy_dict["params"], **extra_engine_params
        )
🧩 Analysis chain
🌐 Web query:
In Python, what happens when the same keyword argument is supplied by two **kwargs expansions in a single function call?
💡 Result:
In Python (3.5+), supplying the same keyword argument from two `**kwargs` expansions in a single function call, such as `f(**d1, **d2)` where `d1` and `d2` both contain the same key, raises `TypeError: f() got multiple values for keyword argument 'keyname'`. This behavior is specified in PEP 448, which introduced multiple unpacking: "Currently, if an argument is given multiple times — such as a positional argument given both positionally and by keyword — a TypeError is raised. This remains true for duplicate arguments provided through multiple unpackings, e.g. f(**{'x': 2}, **{'x': 3}), except that the error will be detected at runtime." Python merges the kwargs from left to right, but detects duplicates during argument processing in getargs.c and raises the error. This distinguishes it from dictionary merging `{**d1, **d2}`, which keeps the last value without error. Example:

def f(**kwargs): pass
d1 = {'a': 1}
d2 = {'a': 2}
f(**d1, **d2)  # TypeError: f() got multiple values for keyword argument 'a'
Citations:
- 1: https://peps.python.org/pep-0448/
- 2: https://stackoverflow.com/questions/38860676/multiple-kwargs-in-a-function-call
- 3: kwargs regression when there are multiple entries with the same key python/cpython#79815
- 4: python/cpython@e036ef8
- 5: https://discuss.python.org/t/why-dict-expansion-allows-dup-keys-but-kwargs-does-not/71291
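The behavior the query confirms, and the merge-first fix the review suggests, are easy to reproduce with a stub (the `create_engine_stub` name is illustrative; it just echoes its kwargs):

```python
def create_engine_stub(**kwargs):
    """Stand-in for sqlalchemy.create_engine; just echoes its kwargs."""
    return kwargs


params = {"pool_pre_ping": True}                  # user-supplied engine params
extra = {"pool_pre_ping": True, "pool_size": 1}   # toolkit-added defaults

try:
    create_engine_stub(**params, **extra)  # duplicate key across two ** expansions
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Merging first (the suggested fix) keeps one value per key and never raises;
# setdefault lets the toolkit defaults apply only when the user didn't set them.
merged = dict(params)
for key, value in extra.items():
    merged.setdefault(key, value)
engine_kwargs = create_engine_stub(**merged)
```

The double-expansion call fails at runtime, while the merged call succeeds with the user's `pool_pre_ping` preserved.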
🏁 Script executed:
cd deepnote_toolkit && find . -type f -name "sql_execution.py" | head -1
Repository: deepnote/deepnote-toolkit

🏁 Script executed:
# Check the exact file and read lines around 561-572
wc -l deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Read the relevant section
sed -n '555,580p' deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Search for sql_alchemy_dict definition and usage
rg "sql_alchemy_dict" deepnote_toolkit/sql/sql_execution.py -B 3 -A 3

🏁 Script executed:
# Check where pool_pre_ping is used in the codebase
rg "pool_pre_ping" deepnote_toolkit/

🏁 Script executed:
# Check for tests that might show if pool_pre_ping is passed in params
find . -name "*test*sql*" -type f | head -10

🏁 Script executed:
# Search for any configuration or default params that might include pool_pre_ping
rg "pool_pre_ping\|params.*=" deepnote_toolkit/sql/ -B 2 -A 2

🏁 Script executed:
# Check the _session_resource_key function more carefully to understand the pattern
sed -n '730,750p' deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Search for the _session_resource_key function definition
rg "_session_resource_key" deepnote_toolkit/sql/sql_execution.py -A 15

🏁 Script executed:
# Look for test files that might test the engine creation with params
grep -l "pool_pre_ping\|create_engine\|_acquire_engine" tests/unit/test_sql_execution*.py

🏁 Script executed:
# Check test_sql_execution.py for relevant tests
head -100 tests/unit/test_sql_execution.py

🏁 Script executed:
# Check if there are any tests showing pool_pre_ping in params
rg "pool_pre_ping" tests/

🏁 Script executed:
# Search for any examples of params being passed in tests
rg "params.*:" tests/unit/test_sql_execution.py | head -20

🏁 Script executed:
# Look at the full context of _acquire_engine to understand the flow
sed -n '760,800p' deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Look at the test_sql_execution_internal.py file to see the pool_pre_ping tests
grep -B 10 -A 10 "pool_pre_ping" tests/unit/test_sql_execution_internal.py

🏁 Script executed:
# Check if sql_alchemy_dict["params"] could ever contain pool_pre_ping by searching all uses
rg "\"params\".*pool_pre_ping|pool_pre_ping.*\"params\"" deepnote_toolkit/

🏁 Script executed:
# Look at how sql_alchemy_dict["params"] is built/populated
sed -n '420,480p' deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Check the _make_sql_alchemy_dict function used in tests to see what params typically look like
rg "_make_sql_alchemy_dict" tests/unit/test_sql_execution_internal.py -A 20

🏁 Script executed:
# Look at the execute_sql_with_connection_json to see how sql_alchemy_dict is created in real usage
sed -n '200,250p' deepnote_toolkit/sql/sql_execution.py

🏁 Script executed:
# Check if there's any documentation or comments about where pool_pre_ping could come from
rg -B 3 -A 3 "pool_pre_ping|extra_engine_params" deepnote_toolkit/sql/sql_execution.py
Merge engine kwargs before calling create_engine().
If sql_alchemy_dict["params"] contains pool_pre_ping, passing create_engine(url, **params, **extra_engine_params) raises TypeError due to duplicate kwargs. Merge to a single dict first and use setdefault() to avoid conflicts.
Suggested fix
- extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
+ engine_params: dict[str, Any] = dict(sql_alchemy_dict["params"])
+ engine_params.setdefault("pool_pre_ping", True)
if in_session:
# See sql_session() docstring for why pool_size=1 is required to make
# session state (USE WAREHOUSE, ...) persist across calls.
- extra_engine_params["pool_size"] = 1
- extra_engine_params["max_overflow"] = 0
+ engine_params["pool_size"] = 1
+ engine_params["max_overflow"] = 0
try:
with suppress_third_party_deprecation_warnings():
- engine = create_engine(
- url, **sql_alchemy_dict["params"], **extra_engine_params
- )
+ engine = create_engine(url, **engine_params)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 561 - 572, The call to
create_engine currently passes two kwarg dicts separately which can cause a
TypeError if sql_alchemy_dict["params"] contains overlapping keys like
pool_pre_ping; before calling create_engine (in the block using
suppress_third_party_deprecation_warnings and the in_session logic), merge
sql_alchemy_dict["params"] and extra_engine_params into a single dict (e.g.,
copy sql_alchemy_dict["params"] then use setdefault or update carefully) so
extra_engine_params values only apply when keys are absent, then call
create_engine(url, **merged_params) instead of passing two dicts.
Drops the sql_session() context manager and the engine lifecycle changes it required (per-cell registry, pool_size=1, SSH tunnel lifetime extension). Restores the original per-call create_engine + dispose flow.

In its place, execute_sql / execute_sql_with_connection_json grow an optional setup_statements: list[str] parameter. When provided, those statements are run via connection.exec_driver_sql on the same DBAPI connection inside the same engine.begin() block as the main query, so session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. is in effect for the main query without changing the engine's lifetime. Single call, single engine, single tunnel, single connection: no shared mutable state across calls and no special-casing for SSH or custom pool config.

DuckDB's process-wide singleton means setup statements naturally persist there too; the duckdb branch executes them on the singleton before the main query for API parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
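The same-connection execution described above can be sketched against SQLite's DBAPI, with `conn.execute` standing in for `connection.exec_driver_sql` on a checked-out SQLAlchemy connection (function and table names are illustrative, not the PR's code):

```python
import sqlite3
from typing import Optional, Sequence


def execute_with_setup(
    conn: sqlite3.Connection,
    query: str,
    setup_statements: Optional[Sequence[str]] = None,
) -> list:
    # Setup statements and the main query share ONE connection, so
    # connection-scoped state (temp tables here; USE WAREHOUSE / SET in
    # Snowflake) is in effect when the query runs.
    for statement in setup_statements or []:
        conn.execute(statement)  # stands in for connection.exec_driver_sql
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
rows = execute_with_setup(
    conn,
    "SELECT v FROM session_cfg",
    setup_statements=[
        "CREATE TEMP TABLE session_cfg (v TEXT)",
        "INSERT INTO session_cfg VALUES ('warehouse=abc')",
    ],
)
```

Because both the setup statements and the query run inside one connection's lifetime, nothing has to be cached or shared across calls, which is exactly the simplification the commit message claims.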
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/sql_execution.py (1)
419-447: ⚠️ Potential issue | 🟠 Major
`setup_statements` are not part of cache semantics.
Line 435 enables the cache even when `setup_statements` changes session context (e.g., warehouse/role/schema). That can return stale/wrong data from a different setup context.
Suggested fix (safe default: disable cache when setup statements are present)

 def _execute_sql_with_caching(
 @@
-    sql_caching_enabled = (
-        sql_cache_mode != "cache_disabled" and return_variable_type == "dataframe"
-    )
+    has_setup_statements = bool(setup_statements)
+    sql_caching_enabled = (
+        sql_cache_mode != "cache_disabled"
+        and return_variable_type == "dataframe"
+        and not has_setup_statements
+    )
Verify each finding against the current code and only fix it if needed. In `@deepnote_toolkit/sql/sql_execution.py` around lines 419 - 447, The current cache path (controlled by sql_caching_enabled / can_get_sql_cache and get_sql_cache) ignores setup_statements, which can change session context and produce stale/wrong results; update the logic that computes sql_caching_enabled / can_get_sql_cache (or add an explicit guard before calling get_sql_cache) to disable caching when setup_statements is non-empty (i.e., treat any truthy setup_statements as a reason to skip get_sql_cache), ensuring the existing duckdb short-circuit (requires_duckdb) and return_variable_type checks remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Line 631: In _execute_sql_on_engine, validate setup_statements before
iterating: ensure it's either None or an iterable of statements (not a plain
string). Add a guard that detects isinstance(setup_statements, str) and either
wrap it as a single-item list or raise a TypeError, and also validate items are
strings (or convert them) before the loop that executes each statement to avoid
iterating over characters and producing misleading SQL errors.
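The guard that comment asks for boils down to rejecting a bare string before iterating. A sketch of the idea (the helper name is illustrative, not the PR's code):

```python
from typing import Optional, Sequence, Union


def normalize_setup_statements(
    setup_statements: Optional[Union[str, Sequence[str]]],
) -> list:
    """Validate setup_statements before iterating over them."""
    if setup_statements is None:
        return []
    if isinstance(setup_statements, str):
        # Iterating a string would execute it character by character,
        # yielding baffling one-letter SQL errors.
        raise TypeError(
            "setup_statements must be a sequence of strings, not a string"
        )
    return [str(statement) for statement in setup_statements]


ok = normalize_setup_statements(["USE WAREHOUSE abc", "SET x = 1"])
try:
    normalize_setup_statements("USE WAREHOUSE abc")
    rejected = False
except TypeError:
    rejected = True
```

Wrapping the lone string into a one-item list instead of raising would also satisfy the comment; raising is the stricter of the two options it offers.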
- Around line 105-106: Several functions in this module are missing parameter
and return type annotations; replace any parameter defaults like
setup_statements=None with an explicit Optional type (e.g., setup_statements:
Optional[Sequence[str]] = None) and add explicit return type annotations for
each function that currently lacks them. Import typing primitives you need (from
typing import Optional, Sequence, Iterable, Any) and update the function
signatures (including the function that declares setup_statements) to use
Optional[T] rather than T = None, and add concrete return types (e.g., -> None,
-> int, -> List[Any], or -> Iterable[str] as appropriate) to the five
un-annotated functions in this module so all parameters and returns are fully
typed. Ensure the parameter name setup_statements is changed to
setup_statements: Optional[<appropriate type>] = None and that each function
signature and its return type reflect the actual values they produce.
---
Outside diff comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 419-447: The current cache path (controlled by sql_caching_enabled
/ can_get_sql_cache and get_sql_cache) ignores setup_statements, which can
change session context and produce stale/wrong results; update the logic that
computes sql_caching_enabled / can_get_sql_cache (or add an explicit guard
before calling get_sql_cache) to disable caching when setup_statements is
non-empty (i.e., treat any truthy setup_statements as a reason to skip
get_sql_cache), ensuring the existing duckdb short-circuit (requires_duckdb) and
return_variable_type checks remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1e59874a-01db-41f1-9708-1368007f2686
📒 Files selected for processing (3)
deepnote_toolkit/sql/sql_execution.py
tests/unit/test_sql_execution.py
tests/unit/test_sql_execution_internal.py
    setup_statements=None,
):
🧩 Analysis chain
🏁 Script executed:
wc -l deepnote_toolkit/sql/sql_execution.pyRepository: deepnote/deepnote-toolkit
Length of output: 110
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '100,115p'Repository: deepnote/deepnote-toolkit
Length of output: 976
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '235,250p'Repository: deepnote/deepnote-toolkit
Length of output: 868
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '414,430p'Repository: deepnote/deepnote-toolkit
Length of output: 973
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '492,505p'Repository: deepnote/deepnote-toolkit
Length of output: 581
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '625,640p'Repository: deepnote/deepnote-toolkit
Length of output: 957
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '95,110p'Repository: deepnote/deepnote-toolkit
Length of output: 682
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '230,250p'Repository: deepnote/deepnote-toolkit
Length of output: 979
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '408,435p'Repository: deepnote/deepnote-toolkit
Length of output: 1272
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '485,510p'Repository: deepnote/deepnote-toolkit
Length of output: 928
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '625,660p'Repository: deepnote/deepnote-toolkit
Length of output: 1882
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '99,107p'Repository: deepnote/deepnote-toolkit
Length of output: 344
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '234,242p'Repository: deepnote/deepnote-toolkit
Length of output: 331
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '410,421p'Repository: deepnote/deepnote-toolkit
Length of output: 453
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '490,499p'Repository: deepnote/deepnote-toolkit
Length of output: 395
🏁 Script executed:
cat -n deepnote_toolkit/sql/sql_execution.py | sed -n '631,647p'Repository: deepnote/deepnote-toolkit
Length of output: 1076
Add type hints to all function parameters and return annotations.
The five functions at lines 99, 234, 410, 490, and 631 lack any parameter or return type annotations. Replace setup_statements=None with setup_statements: Optional[...] and add explicit return type annotations to each function. Per coding guidelines, use Optional[T] instead of T = None, and include return types for all functions.
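As a hedged illustration of the annotation style the guideline asks for (the function name and body here are hypothetical, not the module's actual code):

```python
from typing import Optional, Sequence

# Hypothetical function showing the Optional[T] style the review asks for;
# the toolkit's real signatures differ.
def count_statements(
    query: str,
    setup_statements: Optional[Sequence[str]] = None,
) -> int:
    """Count the setup statements plus the main query."""
    return len(list(setup_statements or [])) + 1

print(count_statements("SELECT 1"))                         # 1
print(count_statements("SELECT 1", ["USE WAREHOUSE abc"]))  # 2
```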
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 105 - 106, Several
functions in this module are missing parameter and return type annotations;
replace any parameter defaults like setup_statements=None with an explicit
Optional type (e.g., setup_statements: Optional[Sequence[str]] = None) and add
explicit return type annotations for each function that currently lacks them.
Import typing primitives you need (from typing import Optional, Sequence,
Iterable, Any) and update the function signatures (including the function that
declares setup_statements) to use Optional[T] rather than T = None, and add
concrete return types (e.g., -> None, -> int, -> List[Any], or -> Iterable[str]
as appropriate) to the five un-annotated functions in this module so all
parameters and returns are fully typed. Ensure the parameter name
setup_statements is changed to setup_statements: Optional[<appropriate type>] =
None and that each function signature and its return type reflect the actual
values they produce.
- def _execute_sql_on_engine(engine, query, bind_params):
+ def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
Validate setup_statements type before iterating.
Line 669 accepts any iterable. If a caller passes a string, it runs one-character “statements” and fails with misleading SQL errors.
Suggested fix
def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
@@
+ if setup_statements is None:
+ normalized_setup_statements: list[str] = []
+ elif isinstance(setup_statements, list) and all(
+ isinstance(stmt, str) for stmt in setup_statements
+ ):
+ normalized_setup_statements = setup_statements
+ else:
+ raise TypeError("setup_statements must be a list[str] or None")
+
with engine.begin() as connection:
@@
- for stmt in setup_statements or []:
+ for stmt in normalized_setup_statements:
            connection.exec_driver_sql(stmt)

Also applies to: 667-670
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 631-631: Missing return type annotation for private function _execute_sql_on_engine
(ANN202)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` at line 631, In
_execute_sql_on_engine, validate setup_statements before iterating: ensure it's
either None or an iterable of statements (not a plain string). Add a guard that
detects isinstance(setup_statements, str) and either wrap it as a single-item
list or raise a TypeError, and also validate items are strings (or convert them)
before the loop that executes each statement to avoid iterating over characters
and producing misleading SQL errors.
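The hazard described above is easy to reproduce, since a bare string is itself an iterable of one-character strings. A minimal sketch of the guard (the helper name is hypothetical; the toolkit's actual fix may differ):

```python
# Without a type guard, `for stmt in setup_statements` over a bare string
# would issue bogus one-character "statements". Hypothetical normalizer.
def normalize_setup_statements(setup_statements):
    if setup_statements is None:
        return []
    if isinstance(setup_statements, str) or not all(
        isinstance(s, str) for s in setup_statements
    ):
        raise TypeError("setup_statements must be a list[str] or None")
    return list(setup_statements)

print(list("USE ROLE x")[:3])  # ['U', 'S', 'E'] — what the unguarded loop sees
print(normalize_setup_statements(["USE ROLE x"]))  # ['USE ROLE x']
```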
Adds extract_setup_statements parser at
deepnote_toolkit/sql/setup_statement_parser.py and wires it into
execute_sql_with_connection_json. After the cell SQL is Jinja-rendered,
the parser strips a strict leading run of USE WAREHOUSE / USE DATABASE /
USE SCHEMA / USE ROLE / USE SECONDARY ROLES / SET / ALTER SESSION
statements off the front and feeds them down as setup_statements,
which run on the same connection as the main query inside one
engine.begin() block.
The parser:
- Skips line and block comments before/between statements.
- Respects single-quoted strings, double-quoted identifiers, and
Snowflake $$-quoted strings when looking for unquoted ;.
- Stops on the first non-setup statement.
- Passes through unchanged if the cell is setup-only or has no
closing ; on the prefix.
- Raises SetupStatementError with a clear message pointing to the
setup_statements= kwarg if any extracted statement contains a
bind placeholder (would otherwise blow up at exec_driver_sql,
since it doesn't bind).
Test surface upgrade:
- New tests/unit/test_setup_statement_parser.py covers all parser
cases, including placeholder detection across pyformat / format /
named / numeric / qmark styles.
- New TestSetupStatementsAgainstRealSQLite uses a real
sqlite:///:memory: engine and proves setup statements share the
connection with the main query (TEMP TABLEs created in setup are
visible to the main SELECT — only possible if they share a
physical DBAPI connection).
- Drops the two over-mocked _execute_sql_on_engine ordering tests
that mocked pandas.read_sql_query away; their substance is now
covered by the SQLite test. Keeps the abort-on-setup-failure test.
- New TestSetupStatementParserIntegration verifies the parser's
output flows through to _execute_sql_on_engine and that the
templated-value error reaches the caller.
After this PR, _dntk.execute_sql("USE WAREHOUSE abc; SELECT 1", env)
just works end-to-end against Snowflake — no codegen change in any
caller is required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
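The connection-scoping property the SQLite test leans on can be demonstrated with the stdlib `sqlite3` module alone — a sketch of the principle, not the test itself (database name and statements are illustrative):

```python
import sqlite3

# Two connections to the same shared in-memory database. TEMP tables are
# visible only on the connection that created them, so a TEMP table created
# in "setup" being visible to the main SELECT proves both ran on one
# physical connection.
uri = "file:demo?mode=memory&cache=shared"
conn_a = sqlite3.connect(uri, uri=True)
conn_b = sqlite3.connect(uri, uri=True)

conn_a.execute("CREATE TEMP TABLE t (x INTEGER)")  # the "setup statement"
conn_a.execute("INSERT INTO t VALUES (42)")
print(conn_a.execute("SELECT x FROM t").fetchone())  # (42,) — same connection

try:
    conn_b.execute("SELECT x FROM t")  # other connection: TEMP table invisible
except sqlite3.OperationalError as exc:
    print(exc)  # no such table: t
```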
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepnote_toolkit/sql/setup_statement_parser.py`:
- Around line 143-145: _match_setup_prefix only advances over whitespace between
allowlisted keywords (prefix) so inline comments like /* ... */ prevent
matching; update the loop that currently calls _skip_inline_whitespace(s, cur)
to instead advance past both whitespace and inline comments (either by calling
an existing helper that skips comments, e.g.,
_skip_inline_whitespace_and_comments(s, cur), or by extending
_skip_inline_whitespace to handle /* ... */ and -- style comments) so cur moves
past comments before matching the next keyword; ensure you update the same loop
that iterates "for i, word in enumerate(prefix)" and maintain the correct cur
value for subsequent _match_keyword checks.
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 681-685: The setup statements are executed via
connection.exec_driver_sql before the cursor-tracking wrapper is installed, so
long-running setup queries won't be registered and can't be canceled by
cancel_all_cursors(); move the installation of the tracking wrapper to run
immediately after opening the physical connection (i.e., before the for stmt in
setup_statements loop) so that connection.exec_driver_sql calls register their
cursors with the same tracking mechanism (ensure the code that wraps or replaces
the engine.begin() connection with the tracked connection is invoked prior to
executing setup_statements).
- Around line 221-225: The current code overwrites query_preview_source with
compiled_query when parsed_setup is present, causing preview-only rewrites
(e.g., LIMIT 100) to be stored in DeepnoteQueryPreview.deepnote_query; instead
preserve the original main query string for previews. Change the logic in the
block that checks parsed_setup so query_preview_source is set to the original
main query (the un-rewritten main SQL variable used earlier, e.g.,
original_main_query or the parsed "main" portion) rather than compiled_query;
keep compiled_query for execution but ensure DeepnoteQueryPreview.deepnote_query
is populated from the preserved original main query when parsed_setup is
non-empty.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e04b08ff-cf77-404c-9550-93852e283fdd
📒 Files selected for processing (5)
- deepnote_toolkit/sql/setup_statement_parser.py
- deepnote_toolkit/sql/sql_execution.py
- tests/unit/test_setup_statement_parser.py
- tests/unit/test_sql_execution.py
- tests/unit/test_sql_execution_internal.py
for i, word in enumerate(prefix):
    if i > 0:
        cur = _skip_inline_whitespace(s, cur)
Skip comments between allowlisted keywords.
_match_setup_prefix() only advances over whitespace here. Valid SQL like USE /* pick one */ ROLE analyst; SELECT 1 or ALTER /*x*/ SESSION ... will not be extracted, so the cell falls back to the old multi-statement path instead of the new same-connection behavior.
Suggested fix
  for i, word in enumerate(prefix):
      if i > 0:
-         cur = _skip_inline_whitespace(s, cur)
+         cur = _skip_whitespace_and_comments(s, cur)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepnote_toolkit/sql/setup_statement_parser.py` around lines 143 - 145,
_match_setup_prefix only advances over whitespace between allowlisted keywords
(prefix) so inline comments like /* ... */ prevent matching; update the loop
that currently calls _skip_inline_whitespace(s, cur) to instead advance past
both whitespace and inline comments (either by calling an existing helper that
skips comments, e.g., _skip_inline_whitespace_and_comments(s, cur), or by
extending _skip_inline_whitespace to handle /* ... */ and -- style comments) so
cur moves past comments before matching the next keyword; ensure you update the
same loop that iterates "for i, word in enumerate(prefix)" and maintain the
correct cur value for subsequent _match_keyword checks.
# query_preview_source mirrors the user-visible main query for cache
# key/preview purposes — keep it aligned with the post-extraction
# remainder.
if parsed_setup:
    query_preview_source = compiled_query
Preserve the original main query in query_preview_source.
When parsed_setup is non-empty, this replaces query_preview_source with the rendered remainder, which already carries preview-only rewrites like LIMIT 100. DeepnoteQueryPreview.deepnote_query will then store the limited/compiled SQL instead of the original main query, so chaining a preview from USE ...; SELECT ... changes semantics.
Suggested fix
- if parsed_setup:
- query_preview_source = compiled_query
+ if parsed_setup:
+ _, query_preview_source = extract_setup_statements(query_preview_source)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 221 - 225, The current
code overwrites query_preview_source with compiled_query when parsed_setup is
present, causing preview-only rewrites (e.g., LIMIT 100) to be stored in
DeepnoteQueryPreview.deepnote_query; instead preserve the original main query
string for previews. Change the logic in the block that checks parsed_setup so
query_preview_source is set to the original main query (the un-rewritten main
SQL variable used earlier, e.g., original_main_query or the parsed "main"
portion) rather than compiled_query; keep compiled_query for execution but
ensure DeepnoteQueryPreview.deepnote_query is populated from the preserved
original main query when parsed_setup is non-empty.
with engine.begin() as connection:
    # Run setup statements first on the same physical connection so any
    # session state they set is visible to the main query below.
    for stmt in setup_statements or []:
        connection.exec_driver_sql(stmt)
Install cursor tracking before running setup statements.
The new exec_driver_sql loop runs before the tracking wrapper is attached. If a setup statement blocks and the cell is interrupted, its cursor is never registered, so cancel_all_cursors() cannot stop it. That regresses this function's cancellation guarantee for the new path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepnote_toolkit/sql/sql_execution.py` around lines 681 - 685, The setup
statements are executed via connection.exec_driver_sql before the
cursor-tracking wrapper is installed, so long-running setup queries won't be
registered and can't be canceled by cancel_all_cursors(); move the installation
of the tracking wrapper to run immediately after opening the physical connection
(i.e., before the for stmt in setup_statements loop) so that
connection.exec_driver_sql calls register their cursors with the same tracking
mechanism (ensure the code that wraps or replaces the engine.begin() connection
with the tracked connection is invoked prior to executing setup_statements).
connection.exec_driver_sql is the documented contract here — setup statements are session-control SQL (USE WAREHOUSE, USE ROLE, SET, ALTER SESSION) that cannot be parameter-bound, so the semgrep sqlalchemy-execute-raw-query rule's recommendation doesn't apply. Add a focused nosemgrep annotation with the rationale inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)
646-646: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add return type annotation.
Static analysis correctly flags this. Function returns DataFrame | None.

- def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
+ def _execute_sql_on_engine(
+     engine, query, bind_params, setup_statements: Optional[list[str]] = None
+ ) -> Optional["pd.DataFrame"]:

🤖 Prompt for AI Agents
Verify each finding against current code and only fix it if needed.
In `@deepnote_toolkit/sql/sql_execution.py` at line 646, the function _execute_sql_on_engine lacks a return type annotation; update its signature to include the correct type (DataFrame | None, or Optional[DataFrame] with typing imports) and ensure DataFrame is imported (from pandas import DataFrame) so static analysis stops flagging the missing return type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Line 646: The function _execute_sql_on_engine lacks a return type annotation;
update its signature to include the correct type (DataFrame | None) — e.g.
change def _execute_sql_on_engine(...) to def _execute_sql_on_engine(...) ->
DataFrame | None — and ensure DataFrame is imported (from pandas import
DataFrame) or use Optional[DataFrame] with typing imports so static analysis
stops flagging the missing return type.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e064366b-70e4-46b9-9940-d941ab90462c
📒 Files selected for processing (1)
deepnote_toolkit/sql/sql_execution.py
Summary
A SQL block like `USE WAREHOUSE abc; SELECT 1` now runs the SELECT on warehouse `abc`. The toolkit splits the leading session-setup statements off the cell SQL and runs them on the same physical DBAPI connection as the main query, inside the same `engine.begin()` block.

No callers need to change — `_dntk.execute_sql("USE WAREHOUSE abc; SELECT 1", env)` just works.

What's in here
- A parser (`deepnote_toolkit/sql/setup_statement_parser.py`) that strips a strict leading run of `USE WAREHOUSE`, `USE DATABASE`, `USE SCHEMA`, `USE ROLE`, `USE SECONDARY ROLES`, `SET`, or `ALTER SESSION` statements off the cell SQL. Respects line/block comments, single-quoted strings, double-quoted identifiers, and Snowflake `$$`-quoted strings. Stops at the first non-setup statement. Passes through unchanged if the cell is setup-only or has no closing `;` on the prefix.
- A `setup_statements: list[str]` kwarg on `execute_sql` and `execute_sql_with_connection_json`. When provided, those statements run via `connection.exec_driver_sql` on the same DBAPI connection as the main query. Acts as the explicit escape hatch for the templated-value case below.
- `execute_sql_with_connection_json` calls the parser after `compile_sql_query`. Parsed setup statements + the explicit `setup_statements` kwarg are concatenated and run before the main query inside `engine.begin()`.
Without parsing, every caller (the SQL block UI, the Python SDK, agents that emit SQL) would have to know to split USE statements off and pass them as
setup_statements. Putting the parser in the toolkit makes the fix work end-to-end at the API surface, in one place, with one test suite.Why parse after Jinja, not before
The scanner sees rendered SQL, with no `{{ }}`/`{% %}` to reason about. That catches both the dynamic case (`{{ render_setup() }}` rendering to `USE WAREHOUSE x`) and Jinja conditionals that emit semicolons. The one trade-off is templated values inside a setup statement (`USE WAREHOUSE {{ env }}`): they render to a bind placeholder that `exec_driver_sql` doesn't bind and that Snowflake's `USE WAREHOUSE` doesn't accept anyway. The parser detects this and raises `SetupStatementError`, pointing the caller at the explicit `setup_statements=` kwarg as the escape hatch. Detection respects quoted regions, so a literal `%(x)s` inside a string doesn't false-positive.
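The quoted-region-aware placeholder detection can be sketched with a regex — a simplification of whatever the actual parser does, with illustrative names:

```python
import re

# Flag DBAPI paramstyle placeholders (pyformat / format / named / numeric /
# qmark) in a setup statement, after blanking out single-quoted literals so
# a literal %(x)s inside a string doesn't false-positive.
PLACEHOLDER = re.compile(r"%\([^)]*\)s|%s|:\w+|\?")

def has_placeholder(stmt: str) -> bool:
    without_strings = re.sub(r"'(?:[^']|'')*'", "''", stmt)
    return bool(PLACEHOLDER.search(without_strings))

print(has_placeholder("USE WAREHOUSE %(env)s"))    # True  (pyformat)
print(has_placeholder("SET v = :1"))               # True  (numeric)
print(has_placeholder("SET v = 'literal %(x)s'"))  # False (quoted literal)
```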
Test plan

- `tests/unit/test_setup_statement_parser.py` — 26 tests covering all parser cases plus placeholder detection across `pyformat`/`format`/`named`/`numeric`/`qmark` styles.
- `TestSetupStatementsAgainstRealSQLite` — a real `sqlite:///:memory:` engine. The setup statement creates a TEMP TABLE and the main query selects from it. SQLite TEMP tables are connection-scoped, so the SELECT only succeeds if setup and main share a physical DBAPI connection. This is the substantive end-to-end check; the previous over-mocked tests that stubbed `pandas.read_sql_query` are gone.
- `TestSetupStatementParserIntegration` — checks that the parser's output (parsed setup + remaining query) flows through to `_execute_sql_on_engine`, that an explicit `setup_statements=` is appended after parsed ones, and that `USE WAREHOUSE {{ env }}` raises with a message that names `setup_statements=`.
- `poetry run pytest tests/unit/` — 784 passed (excludes pre-existing pyspark collection errors in `test_chart.py`/`test_ocelots.py`).
- `poetry run black` and `poetry run flake8` clean.

Usage

`_dntk.execute_sql("USE WAREHOUSE abc; SELECT 1", env)` — the leading `USE WAREHOUSE` runs on the same connection as the `SELECT`.
🤖 Generated with Claude Code
Summary by CodeRabbit