fix(python-driver): quote-escape column aliases in buildCypher()#2373
fix(python-driver): quote-escape column aliases in buildCypher()#2373uesleilima wants to merge 2 commits intoapache:masterfrom
Conversation
Always double-quote column names in the AS (...) clause generated by
buildCypher() and _validate_column(). This prevents PostgreSQL parse
errors when column aliases happen to be reserved words (e.g. 'count',
'order', 'type', 'group', 'select').
Before: SELECT * from cypher(NULL,NULL) as (count agtype);
After: SELECT * from cypher(NULL,NULL) as ("count" agtype);
PostgreSQL always accepts double-quoted identifiers, so quoting
unconditionally is safe for all names — no reserved word list needed.
Closes apache#2370
|
@uesleilima many thanks for this. First check with Claude code Code Review: apache/age#2373 —
|
| # | File | Location | Suggestion | Category |
|---|---|---|---|---|
| 1 | age.py |
_validate_column line 261 |
Type name is not quoted. return f'"{name}" {type_name}' only quotes the name, but type_name is also a user-provided identifier. While in practice AGE only uses agtype, it is inconsistent — if a caller passes v mytype, the type name is unquoted. Consider quoting it too, or documenting why it is intentionally left unquoted. |
Correctness |
| 2 | age.py |
buildCypher line 268 |
if graphName == None: should be if graphName is None: — pre-existing issue, but touched in this diff's context. |
Style |
| 3 | test_age_py.py |
test_reserved_word_count |
The assertNotIn("(count agtype", result) check is narrowly crafted. It would miss a regression like ( count agtype (space after paren). A stronger assertion would be a regex like r'\bcount\s+agtype\b' outside of quotes. |
Testing |
| 4 | test_age_py.py |
TestBuildCypher imports |
from age.age import buildCypher, _validate_column imports a private function (_validate_column is prefixed _). Testing private functions directly is acceptable here since the behavior is important, but worth a comment noting it's intentional. |
Maintainability |
| 5 | age.py |
buildCypher |
No test covers the "name type" path where type_name contains a reserved-word-like string (e.g., "order agtype"). Adding one would verify the quoting is applied in the two-token branch as well. |
Testing |
✅ What Looks Good
- Minimal diff — only the three affected lines changed, no scope creep.
- Correct fix — unconditional double-quoting is the right approach; PostgreSQL always accepts double-quoted identifiers, and the identifier regex
^[A-Za-z_][A-Za-z0-9_]*$already prevents injection of embedded quotes. - Updated comment — the design note in
buildCypherwas proactively updated to document the quoting rationale. - Solid test coverage — 11 unit tests with no DB dependency is exactly right for this kind of fix; the reserved-word cases are all explicitly called out.
test_validate_column_quotingdirectly asserts the output format, making future regressions immediately obvious.
Verdict
Approve with minor suggestions. The fix is correct and safe. Suggestion #1 (unquoted type_name) is the only one worth a follow-up conversation — the rest are low-priority polish.
- _validate_column: add comment explaining why type_name is intentionally left unquoted (PG type names are case-insensitive) - buildCypher: fix graphName == None to idiomatic 'is None' - test_reserved_word_count: replace fragile assertNotIn with regex - Add test for reserved word in "name type" pair (e.g. "order agtype") - Add comment explaining intentional _validate_column private import Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
Fixes the Python driver’s buildCypher() SQL generation so column aliases that are PostgreSQL reserved words don’t cause parse errors, by always double-quoting the column identifier portion of each AS (...) column spec.
Changes:
- Update
_validate_column()to always wrap validated column names in double quotes (while leaving type names unquoted). - Quote the default column alias (
v) inbuildCypher()whencolsis not provided. - Add unit tests covering reserved-word aliases and direct
_validate_column()quoting behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
drivers/python/age/age.py |
Always double-quotes column aliases produced by _validate_column() / buildCypher() to avoid reserved-word parse errors. |
drivers/python/test_age_py.py |
Adds unit tests validating quoted output for common/reserved column aliases and invalid identifiers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_simple_column(self): | ||
| result = buildCypher("g", "MATCH (n) RETURN n", ["n"]) | ||
| self.assertIn('"n" agtype', result) | ||
|
|
||
| def test_column_with_type(self): | ||
| result = buildCypher("g", "MATCH (n) RETURN n", ["n agtype"]) | ||
| self.assertIn('"n" agtype', result) |
There was a problem hiding this comment.
This change will break existing Python tests that assert unquoted column specs (e.g., drivers/python/test_security.py currently expects v agtype / n agtype in buildCypher() output and _validate_column() return values). Update those assertions to match the new always-quoted behavior so the test suite stays consistent and CI doesn’t fail.
| columnExp.append(validated) | ||
| else: | ||
| columnExp.append('v agtype') | ||
| columnExp.append('"v" agtype') |
There was a problem hiding this comment.
The new always-quoted column output from buildCypher() / _validate_column() requires updating any in-repo callers/tests that compare generated SQL strings (e.g., security tests that look for v agtype or n agtype). Without updating those expectations, the Python test suite will fail.
| columnExp.append('"v" agtype') | |
| columnExp.append(_validate_column('v')) |
| # Only the column name is double-quoted. The type name is left | ||
| # unquoted because PostgreSQL type names in column definitions | ||
| # are case-insensitive identifiers; double-quoting them would | ||
| # force exact-case matching and break user-defined type lookup. |
There was a problem hiding this comment.
The rationale comment about not quoting type_name is a bit misleading: double-quoting doesn’t inherently “break user-defined type lookup”; it changes name-folding semantics (quoted identifiers become case-sensitive, unquoted fold to lower-case). Consider rewording to clarify that leaving the type unquoted preserves PostgreSQL’s default case-folding behavior and avoids surprising resolution changes.
| # Only the column name is double-quoted. The type name is left | |
| # unquoted because PostgreSQL type names in column definitions | |
| # are case-insensitive identifiers; double-quoting them would | |
| # force exact-case matching and break user-defined type lookup. | |
| # Only the column name is double-quoted. The type name is left | |
| # unquoted so PostgreSQL applies its default identifier folding | |
| # for type names in column definitions. Double-quoting would make | |
| # the type name case-sensitive and could change type resolution in | |
| # surprising ways for user-defined types. |
| return f'"{name}" {type_name}' | ||
| else: | ||
| validate_identifier(col, "Column name") | ||
| return f"{col} agtype" | ||
| return f'"{col}" agtype' |
There was a problem hiding this comment.
Always double-quoting column names changes PostgreSQL’s identifier folding semantics: previously an unquoted alias like V would be exposed as v (folded to lower-case), but now "V" will stay uppercase. If callers relied on the old lowercasing behavior for cols values with uppercase letters, this is a user-visible behavior change; consider normalizing name to lower-case before quoting or documenting this change.
Summary
Fix
execCypher()crashing when column aliases are PostgreSQL reserved words (#2370).Problem
The
buildCypher()function generates SQL like:When the column alias is a reserved word (
count,order,type,group,select, etc.), PostgreSQL raises a parse error because the identifier is unquoted.This commonly happens when Cypher
RETURNclauses use aggregation functions:Fix
Always double-quote column names in
_validate_column():PostgreSQL always accepts double-quoted identifiers, so quoting unconditionally is safe for all names — no reserved word list needed. This is simpler and more maintainable than checking against a list of 60+ reserved words.
Generated SQL after fix:
Tests
Added 11 new unit tests in
TestBuildCypherclass (no DB required):count,order,type,select,group_validate_column()direct quoting verificationCloses