Fix: serialise BigQuery NUMERIC / BYTES in streaming responses #8

Merged
sagargg merged 1 commit into main from fix/decimal-stream-serialization on Jun 3, 2026

@sagargg sagargg commented Jun 3, 2026

The bug

A datastore_search (or any streaming response) over a table containing a NUMERIC / BIGNUMERIC column crashed mid-stream:

File ".../services/streaming.py", line 222, in _records_object_array
    yield orjson.dumps(dict(zip(columns, row)))
TypeError: Type is not JSON serializable: decimal.Decimal

BigQuery returns NUMERIC / BIGNUMERIC as Python decimal.Decimal and BYTES as raw bytes. orjson refuses both by default, and the two row writers in services/streaming.py were calling orjson.dumps(...) without a default= callable. As a result, the stream would emit some rows and then fail with a TypeError mid-flight, after part of the response had already been sent.
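
The mid-stream failure mode can be reproduced in a few lines. This is a minimal sketch, not the code in services/streaming.py: it uses the standard library's json module as a stand-in for orjson (both reject decimal.Decimal unless a default= callable is supplied), and write_rows is a hypothetical name chosen for illustration.

```python
import json
from decimal import Decimal


def write_rows(columns, rows):
    """Yield one JSON-encoded record per row, with no default= fallback."""
    for row in rows:
        # Rows are serialised lazily, one at a time, so an unsupported value
        # is only discovered after earlier rows have already been yielded.
        yield json.dumps(dict(zip(columns, row)))


columns = ["id", "amount"]
rows = [(1, 10), (2, Decimal("12.50"))]  # second row mimics a NUMERIC value

emitted = []
error = None
try:
    for chunk in write_rows(columns, rows):
        emitted.append(chunk)
except TypeError as exc:
    error = exc

print(len(emitted), "row(s) emitted before:", type(error).__name__)
```

Because the first row has already left the generator when the second one fails, a client sees a truncated body rather than a clean error response.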

Fix

  • Add _json_default to datastore/services/streaming.py and pass it to both row writers (_records_object_array, _records_array_array).
    • Decimal → str(...) to preserve full precision (NUMERIC = 38 digits, BIGNUMERIC = 76+, both beyond what a JSON number parsed as an IEEE-754 double can represent without loss) and match CKAN's datastore convention for high-precision numerics.
    • bytes → base64 string so the response stays UTF-8 and round-trippable.
    • Unknown types still raise TypeError loudly, so a future BigQuery scalar that orjson doesn't know about surfaces in tests instead of in production.
  • Extend api/responses.py:_orjson_default the same way so non-streaming envelopes (datastore_info etc.) don't hit the identical bug if a Decimal ever surfaces there.
  • CSV / TSV path is unaffected: csv.writer.writerow(row) already stringifies values internally.
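
The handler described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the merged implementation: only the name _json_default and the three behaviours (Decimal to string, bytes to base64, TypeError otherwise) come from this PR; encode_row is a hypothetical helper, and the fallback to the standard library's json is there only so the example runs where orjson is not installed.

```python
import base64
import json
from decimal import Decimal

try:
    import orjson  # the serialiser used by the PR
except ImportError:  # assumption: fall back to stdlib json for illustration
    orjson = None


def _json_default(value):
    """Fallback for values the JSON encoder cannot serialise natively."""
    if isinstance(value, Decimal):
        # A string keeps every digit; a float would round long values.
        return str(value)
    if isinstance(value, bytes):
        # Base64 keeps the payload ASCII-safe and reversible.
        return base64.b64encode(value).decode("ascii")
    # Anything else fails loudly instead of being silently coerced.
    raise TypeError(f"Type is not JSON serializable: {type(value).__name__}")


def encode_row(record):
    """Hypothetical helper: encode one record to UTF-8 JSON bytes."""
    if orjson is not None:
        return orjson.dumps(record, default=_json_default)
    return json.dumps(record, default=_json_default).encode("utf-8")


row = {"amount": Decimal("12345678901234567890.123456789"), "blob": b"\x00\xff"}
decoded = json.loads(encode_row(row))
print(decoded)
```

A consumer recovers the original values with Decimal(decoded["amount"]) and base64.b64decode(decoded["blob"]); converting the amount to float instead would drop the trailing digits.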

Tests

New tests/test_streaming.py adds regression coverage:

  • test_records_object_array_serialises_decimal_and_bytes — full objects format round-trip.
  • test_records_array_array_serialises_decimal_and_bytes — same for lists format.
  • test_unsupported_type_still_raises — guards against the default silently swallowing future unknown types.

All three pass; full suite green where it was green before this PR.
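
For readers without access to tests/test_streaming.py, the shape of such regression tests might resemble the sketch below. Everything here is illustrative: the two writers are simplified hypothetical stand-ins for _records_object_array and _records_array_array, the standard library's json replaces orjson, and the handler is re-declared so the example is self-contained.

```python
import base64
import json
from decimal import Decimal


def _json_default(value):
    # Stand-in handler: Decimal -> str, bytes -> base64, otherwise fail.
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"Type is not JSON serializable: {type(value).__name__}")


def records_object_array(columns, rows):
    # Hypothetical "objects" writer: one JSON object per row.
    for row in rows:
        yield json.dumps(dict(zip(columns, row)), default=_json_default)


def records_array_array(rows):
    # Hypothetical "lists" writer: one JSON array per row.
    for row in rows:
        yield json.dumps(list(row), default=_json_default)


COLUMNS = ["amount", "payload"]
ROWS = [(Decimal("1234.500000000"), b"\x00\xff")]


def test_objects_format_round_trips():
    (chunk,) = records_object_array(COLUMNS, ROWS)
    record = json.loads(chunk)
    assert Decimal(record["amount"]) == ROWS[0][0]
    assert base64.b64decode(record["payload"]) == ROWS[0][1]


def test_lists_format_round_trips():
    (chunk,) = records_array_array(ROWS)
    amount, payload = json.loads(chunk)
    assert Decimal(amount) == ROWS[0][0]
    assert base64.b64decode(payload) == ROWS[0][1]


def test_unsupported_type_still_raises():
    try:
        list(records_array_array([(object(),)]))
    except TypeError:
        return
    raise AssertionError("expected TypeError for an unsupported type")
```

The functions follow pytest's naming convention but use only plain assert statements, so they can also be called directly.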

Summary by CodeRabbit

  • Bug Fixes

    • Improved JSON serialization to handle high-precision decimal numbers and binary data in API responses and streaming outputs without data loss.
    • Decimal values now preserve full numeric precision as strings, and binary data is properly base64-encoded.
  • Tests

    • Added comprehensive regression tests for streaming functionality with decimal and binary data types.

BigQuery `NUMERIC` / `BIGNUMERIC` columns come back as `decimal.Decimal` and `BYTES` columns as raw `bytes` — both refused by orjson out of the box. `_records_object_array` and `_records_array_array` were calling `orjson.dumps(...)` without a `default=` callable, so a search response over a table with any NUMERIC column crashed mid-stream:

  TypeError: Type is not JSON serializable: decimal.Decimal

Fix:
- Add `_json_default` to `services/streaming.py` and pass it to both row-emitting writers. Decimal -> `str()` (preserves the full precision NUMERIC / BIGNUMERIC allow, beyond IEEE-754 doubles; matches CKAN's datastore convention for high-precision numerics). bytes -> base64 string so the response stays UTF-8 and round-trippable.
- Extend `api/responses.py:_orjson_default` the same way so non-streaming envelopes (datastore_info etc.) don't hit the same bug if Decimal / bytes ever surface there.
- CSV / TSV path is unaffected: `csv.writer.writerow(row)` already stringifies values internally.

Adds `tests/test_streaming.py` with regression coverage for both row formats and an explicit assertion that unsupported types still raise loudly (so a new BigQuery scalar type can't silently fall through).

coderabbitai Bot commented Jun 3, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf5c923d-13c5-4869-82f8-044f120692f5

📥 Commits

Reviewing files that changed from the base of the PR and between d4ef7d9 and 27717cb.

📒 Files selected for processing (3)
  • datastore/api/responses.py
  • datastore/services/streaming.py
  • tests/test_streaming.py

📝 Walkthrough

This PR extends JSON serialization across the datastore to handle BigQuery scalar types (Decimal for NUMERIC/BIGNUMERIC and bytes for BYTES). Custom orjson handlers convert Decimal to strings for precision and bytes to base64-encoded ASCII for UTF-8 safety. Changes appear in API responses, streaming row writers, and are validated by new regression tests.

Changes

JSON Serialization for Non-Native Types

| Layer / File(s) | Summary |
|---|---|
| API Response Serialization Handler (`datastore/api/responses.py`) | Extends `_orjson_default` to stringify Decimal and base64-encode bytes for API response payloads, preserving numeric precision and ensuring UTF-8 compatibility. |
| Streaming Service Serialization with Tests (`datastore/services/streaming.py`, `tests/test_streaming.py`) | Introduces `_json_default` helper for streaming row encoders (both object and list formats) to handle Decimal and bytes. Tests verify correct JSON encoding for both record types, base64 representation of bytes, and error handling for unsupported types. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hops through data streams so bright,
With decimals and bytes encoded just right—
Base64 sprinkled, precision preserved,
JSON flows smoothly, exactly deserved! 🐰✨

@sagargg sagargg merged commit dd0995a into main Jun 3, 2026
1 of 2 checks passed