Fix: serialise BigQuery NUMERIC / BYTES in streaming responses #8
BigQuery `NUMERIC` / `BIGNUMERIC` columns come back as `decimal.Decimal` and `BYTES` columns as raw `bytes`, both refused by orjson out of the box. `_records_object_array` and `_records_array_array` were calling `orjson.dumps(...)` without a `default=` callable, so a search response over a table with any NUMERIC column crashed mid-stream:

    TypeError: Type is not JSON serializable: decimal.Decimal

Fix:

- Add `_json_default` to `services/streaming.py` and pass it to both row-emitting writers. Decimal -> `str()` (preserves the full precision NUMERIC / BIGNUMERIC allow, beyond IEEE-754 doubles; matches CKAN's datastore convention for high-precision numerics). bytes -> base64 string so the response stays UTF-8 and round-trippable.
- Extend `api/responses.py:_orjson_default` the same way so non-streaming envelopes (datastore_info etc.) don't hit the same bug if Decimal / bytes ever surface there.
- CSV / TSV path is unaffected: `csv.writer.writerow(row)` already stringifies values internally.

Adds `tests/test_streaming.py` with regression coverage for both row formats and an explicit assertion that unsupported types still raise loudly (so a new BigQuery scalar type can't silently fall through).
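The precision argument for emitting `Decimal` as a string rather than a JSON number can be checked with the standard library alone. The snippet below is an illustrative sketch, not code from this PR; the sample value is made up and simply uses 38 significant digits.

```python
from decimal import Decimal

# Illustrative value: 29 integer digits + 9 fractional digits = 38 digits.
numeric = Decimal("12345678901234567890123456789.123456789")

# Path taken by the fix: Decimal -> str -> Decimal keeps every digit.
via_string = Decimal(str(numeric))

# Alternative path: Decimal -> float rounds to the nearest IEEE-754 double.
via_float = Decimal(float(numeric))

print(via_string == numeric)  # True
print(via_float == numeric)   # False
```

A client that needs arithmetic on the value can parse the string back into an arbitrary-precision type on its side; one that parses it as a double loses the low-order digits at that point instead of in the API.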
The bug

A `datastore_search` (or any streaming response) over a table containing a `NUMERIC` / `BIGNUMERIC` column crashed mid-stream with `TypeError: Type is not JSON serializable: decimal.Decimal`.

BigQuery returns `NUMERIC` / `BIGNUMERIC` as Python `decimal.Decimal` and `BYTES` as raw `bytes`. `orjson` refuses both by default, and the two row writers in `services/streaming.py` were calling `orjson.dumps(...)` without a `default=` callable, so the stream would emit some rows, then raise `TypeError` mid-flight.

Fix

- Add `_json_default` to `datastore/services/streaming.py` and pass it to both row writers (`_records_object_array`, `_records_array_array`).
  - `Decimal` → `str(...)` to preserve full precision (NUMERIC = 38 digits, BIGNUMERIC = 76+, beyond what a JSON number / IEEE-754 double can represent without loss) and match CKAN's datastore convention for high-precision numerics.
  - `bytes` → base64 string so the response stays UTF-8 and round-trippable.
  - Any other type still raises `TypeError` loudly, so a future BigQuery scalar that orjson doesn't know about surfaces in tests instead of in production.
- Extend `api/responses.py:_orjson_default` the same way so non-streaming envelopes (`datastore_info` etc.) don't hit the identical bug if a `Decimal` ever surfaces there.
- The CSV / TSV path is unaffected: `csv.writer.writerow(row)` already stringifies values internally.

Tests

New `tests/test_streaming.py` adds regression coverage:

- `test_records_object_array_serialises_decimal_and_bytes`: full `objects` format round-trip.
- `test_records_array_array_serialises_decimal_and_bytes`: same for `lists` format.
- `test_unsupported_type_still_raises`: guards against the default silently swallowing future unknown types.

All three pass; full suite green where it was green before this PR.