Skip to content

fix(python-driver): quote-escape column aliases in buildCypher()#2373

Open
uesleilima wants to merge 2 commits intoapache:masterfrom
uesleilima:fix/python-driver-column-alias-quoting
Open

fix(python-driver): quote-escape column aliases in buildCypher()#2373
uesleilima wants to merge 2 commits intoapache:masterfrom
uesleilima:fix/python-driver-column-alias-quoting

Conversation

@uesleilima
Copy link
Copy Markdown

Summary

Fix execCypher() crashing when column aliases are PostgreSQL reserved words (#2370).

Problem

The buildCypher() function generates SQL like:

SELECT * FROM cypher(NULL,NULL) AS (count agtype);

When the column alias is a reserved word (count, order, type, group, select, etc.), PostgreSQL raises a parse error because the identifier is unquoted.

This commonly happens when Cypher RETURN clauses use aggregation functions:

cursor = age.execCypher(conn, "test_graph",
                        "MATCH (n:TestNode) RETURN count(n)",
                        cols=["count"])  # crashes

Fix

Always double-quote column names in _validate_column():

# Before
return f"{name} {type_name}"     # count agtype  → ERROR
# After
return f'"{name}" {type_name}'   # "count" agtype → OK

PostgreSQL always accepts double-quoted identifiers, so quoting unconditionally is safe for all names — no reserved word list needed. This is simpler and more maintainable than checking against a list of 60+ reserved words.

Generated SQL after fix:

SELECT * FROM cypher(NULL,NULL) AS ("count" agtype);

Tests

Added 11 new unit tests in TestBuildCypher class (no DB required):

  • Simple column, column with type, multiple columns
  • Reserved words: count, order, type, select, group
  • Default column quoting
  • Invalid column rejection
  • _validate_column() direct quoting verification

Closes

Always double-quote column names in the AS (...) clause generated by
buildCypher() and _validate_column(). This prevents PostgreSQL parse
errors when column aliases happen to be reserved words (e.g. 'count',
'order', 'type', 'group', 'select').

Before: SELECT * from cypher(NULL,NULL) as (count agtype);
After:  SELECT * from cypher(NULL,NULL) as ("count" agtype);

PostgreSQL always accepts double-quoted identifiers, so quoting
unconditionally is safe for all names — no reserved word list needed.

Closes apache#2370
@tcolo
Copy link
Copy Markdown

tcolo commented Apr 5, 2026

@uesleilima many thanks for this. First check with Claude code

Code Review: apache/age#2373fix(python-driver): quote-escape column aliases in buildCypher()

Summary

A minimal, well-targeted fix for a real crash (#2370): column aliases matching PostgreSQL reserved words (e.g. count, order) caused parse errors. The solution double-quotes all column names unconditionally, which is safe and idiomatic PostgreSQL. Includes 11 new unit tests.


🔴 Critical Issues

None.


💡 Suggestions

# File Location Suggestion Category
1 age.py _validate_column line 261 Type name is not quoted. return f'"{name}" {type_name}' only quotes the name, but type_name is also a user-provided identifier. While in practice AGE only uses agtype, it is inconsistent — if a caller passes v mytype, the type name is unquoted. Consider quoting it too, or documenting why it is intentionally left unquoted. Correctness
2 age.py buildCypher line 268 if graphName == None: should be if graphName is None: — pre-existing issue, but touched in this diff's context. Style
3 test_age_py.py test_reserved_word_count The assertNotIn("(count agtype", result) check is narrowly crafted. It would miss a regression like ( count agtype (space after paren). A stronger assertion would be a regex like r'\bcount\s+agtype\b' outside of quotes. Testing
4 test_age_py.py TestBuildCypher imports from age.age import buildCypher, _validate_column imports a private function (_validate_column is prefixed _). Testing private functions directly is acceptable here since the behavior is important, but worth a comment noting it's intentional. Maintainability
5 age.py buildCypher No test covers the "name type" path where type_name contains a reserved-word-like string (e.g., "order agtype"). Adding one would verify the quoting is applied in the two-token branch as well. Testing

✅ What Looks Good

  • Minimal diff — only the three affected lines changed, no scope creep.
  • Correct fix — unconditional double-quoting is the right approach; PostgreSQL always accepts double-quoted identifiers, and the identifier regex ^[A-Za-z_][A-Za-z0-9_]*$ already prevents injection of embedded quotes.
  • Updated comment — the design note in buildCypher was proactively updated to document the quoting rationale.
  • Solid test coverage — 11 unit tests with no DB dependency is exactly right for this kind of fix; the reserved-word cases are all explicitly called out.
  • test_validate_column_quoting directly asserts the output format, making future regressions immediately obvious.

Verdict

Approve with minor suggestions. The fix is correct and safe. Suggestion #1 (unquoted type_name) is the only one worth a follow-up conversation — the rest are low-priority polish.

Copy link
Copy Markdown

@tcolo tcolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the comment I made

- _validate_column: add comment explaining why type_name is
  intentionally left unquoted (PG type names are case-insensitive)
- buildCypher: fix graphName == None to idiomatic 'is None'
- test_reserved_word_count: replace fragile assertNotIn with regex
- Add test for reserved word in "name type" pair (e.g. "order agtype")
- Add comment explaining intentional _validate_column private import

Made-with: Cursor
@uesleilima uesleilima requested a review from tcolo April 5, 2026 23:02
@MuhammadTahaNaveed MuhammadTahaNaveed requested review from Copilot and removed request for tcolo April 6, 2026 10:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the Python driver’s buildCypher() SQL generation so column aliases that are PostgreSQL reserved words don’t cause parse errors, by always double-quoting the column identifier portion of each AS (...) column spec.

Changes:

  • Update _validate_column() to always wrap validated column names in double quotes (while leaving type names unquoted).
  • Quote the default column alias (v) in buildCypher() when cols is not provided.
  • Add unit tests covering reserved-word aliases and direct _validate_column() quoting behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
drivers/python/age/age.py Always double-quotes column aliases produced by _validate_column() / buildCypher() to avoid reserved-word parse errors.
drivers/python/test_age_py.py Adds unit tests validating quoted output for common/reserved column aliases and invalid identifiers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +40 to +46
def test_simple_column(self):
result = buildCypher("g", "MATCH (n) RETURN n", ["n"])
self.assertIn('"n" agtype', result)

def test_column_with_type(self):
result = buildCypher("g", "MATCH (n) RETURN n", ["n agtype"])
self.assertIn('"n" agtype', result)
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will break existing Python tests that assert unquoted column specs (e.g., drivers/python/test_security.py currently expects v agtype / n agtype in buildCypher() output and _validate_column() return values). Update those assertions to match the new always-quoted behavior so the test suite stays consistent and CI doesn’t fail.

Copilot uses AI. Check for mistakes.
columnExp.append(validated)
else:
columnExp.append('v agtype')
columnExp.append('"v" agtype')
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new always-quoted column output from buildCypher() / _validate_column() requires updating any in-repo callers/tests that compare generated SQL strings (e.g., security tests that look for v agtype or n agtype). Without updating those expectations, the Python test suite will fail.

Suggested change
columnExp.append('"v" agtype')
columnExp.append(_validate_column('v'))

Copilot uses AI. Check for mistakes.
Comment on lines +211 to +214
# Only the column name is double-quoted. The type name is left
# unquoted because PostgreSQL type names in column definitions
# are case-insensitive identifiers; double-quoting them would
# force exact-case matching and break user-defined type lookup.
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale comment about not quoting type_name is a bit misleading: double-quoting doesn’t inherently “break user-defined type lookup”; it changes name-folding semantics (quoted identifiers become case-sensitive, unquoted fold to lower-case). Consider rewording to clarify that leaving the type unquoted preserves PostgreSQL’s default case-folding behavior and avoids surprising resolution changes.

Suggested change
# Only the column name is double-quoted. The type name is left
# unquoted because PostgreSQL type names in column definitions
# are case-insensitive identifiers; double-quoting them would
# force exact-case matching and break user-defined type lookup.
# Only the column name is double-quoted. The type name is left
# unquoted so PostgreSQL applies its default identifier folding
# for type names in column definitions. Double-quoting would make
# the type name case-sensitive and could change type resolution in
# surprising ways for user-defined types.

Copilot uses AI. Check for mistakes.
Comment on lines +215 to +218
return f'"{name}" {type_name}'
else:
validate_identifier(col, "Column name")
return f"{col} agtype"
return f'"{col}" agtype'
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always double-quoting column names changes PostgreSQL’s identifier folding semantics: previously an unquoted alias like V would be exposed as v (folded to lower-case), but now "V" will stay uppercase. If callers relied on the old lowercasing behavior for cols values with uppercase letters, this is a user-visible behavior change; consider normalizing name to lower-case before quoting or documenting this change.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python driver: execCypher() should quote-escape column aliases that are PostgreSQL reserved words

3 participants