Fix: serialise BigQuery NUMERIC / BYTES in streaming responses by sagargg · Pull Request #8 · datopian/datastore

sagargg · 2026-06-03T07:43:34Z

The bug

A datastore_search (or any streaming response) over a table containing a NUMERIC / BIGNUMERIC column crashed mid-stream:

File ".../services/streaming.py", line 222, in _records_object_array
    yield orjson.dumps(dict(zip(columns, row)))
TypeError: Type is not JSON serializable: decimal.Decimal

BigQuery returns NUMERIC / BIGNUMERIC as Python decimal.Decimal and BYTES as raw bytes. orjson refuses both by default, and the two row writers in services/streaming.py were calling orjson.dumps(...) without a default= callable, so the stream would emit some rows and then raise TypeError mid-flight.
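A minimal reproduction of the failure mode, as a sketch only. It uses the standard library's `json.dumps` as a stand-in for `orjson.dumps` (an assumption, made so the snippet runs without extra dependencies; both reject `Decimal` unless a `default=` callable handles it). The generator name echoes `_records_object_array`, but its signature and body are hypothetical, not the repository's code.

```python
import json
from decimal import Decimal


def records_object_array(columns, rows):
    """Hypothetical stand-in for the streaming row writer (no default= callable)."""
    for row in rows:
        # json.dumps stands in for orjson.dumps; both raise TypeError on Decimal.
        yield json.dumps(dict(zip(columns, row))).encode("utf-8")


columns = ["id", "amount"]
rows = [
    (1, 10),                # plain int: serialises fine
    (2, Decimal("12.50")),  # NUMERIC value: fails without default=
    (3, 30),                # never reached
]

emitted = []
error = None
try:
    for chunk in records_object_array(columns, rows):
        emitted.append(chunk)
except TypeError as exc:
    error = exc

# The first row had already been handed to the client when the error surfaced.
print(len(emitted), type(error).__name__)
```

Because the rows are produced lazily, the exception only appears once the offending row is reached, which is why the response breaks part-way through rather than failing up front.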

Fix

- Add _json_default to datastore/services/streaming.py and pass it to both row writers (_records_object_array, _records_array_array).
  - Decimal → str(...) to preserve full precision (NUMERIC carries 38 digits, BIGNUMERIC 76+, well beyond the roughly 15 to 17 significant digits that an IEEE-754 double, and therefore most JSON number parsers, can hold without loss) and to match CKAN's datastore convention for high-precision numerics.
  - bytes → base64 string so the response stays UTF-8 and round-trippable.
  - Unknown types still raise TypeError loudly, so a future BigQuery scalar that orjson doesn't know about surfaces in tests instead of in production.
- Extend api/responses.py:_orjson_default the same way so non-streaming envelopes (datastore_info etc.) don't hit the identical bug if a Decimal ever surfaces there.
- The CSV / TSV path is unaffected: csv.writer.writerow(row) already stringifies values internally.

Tests

New tests/test_streaming.py adds regression coverage:

- test_records_object_array_serialises_decimal_and_bytes: full objects format round-trip.
- test_records_array_array_serialises_decimal_and_bytes: same for lists format.
- test_unsupported_type_still_raises: guards against the default silently swallowing future unknown types.

All three pass; full suite green where it was green before this PR.
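For orientation, tests of this shape might look roughly like the sketch below. Only the three test names come from the PR; the writer signatures, the fixture values, and the use of `json` in place of `orjson` are assumptions made to keep the example self-contained.

```python
import base64
import json
from decimal import Decimal


def _json_default(obj):
    # Conversion rules as described in the Fix section (sketch).
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, bytes):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


def _records_object_array(columns, rows):
    # Hypothetical stand-in writer: one JSON object per row.
    for row in rows:
        yield json.dumps(dict(zip(columns, row)), default=_json_default)


def _records_array_array(rows):
    # Hypothetical stand-in writer: one JSON array per row.
    for row in rows:
        yield json.dumps(list(row), default=_json_default)


COLUMNS = ["amount", "payload"]
ROW = (Decimal("12345678901234567890.123456789"), b"\x01\x02\x03")


def test_records_object_array_serialises_decimal_and_bytes():
    (line,) = _records_object_array(COLUMNS, [ROW])
    record = json.loads(line)
    assert Decimal(record["amount"]) == ROW[0]
    assert base64.b64decode(record["payload"]) == ROW[1]


def test_records_array_array_serialises_decimal_and_bytes():
    (line,) = _records_array_array([ROW])
    amount, payload = json.loads(line)
    assert Decimal(amount) == ROW[0]
    assert base64.b64decode(payload) == ROW[1]


def test_unsupported_type_still_raises():
    try:
        list(_records_object_array(["value"], [(object(),)]))
    except TypeError:
        return
    raise AssertionError("expected TypeError for an unsupported type")
```

The functions are written so that pytest would collect them by name, and they can also be called directly.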

Summary by CodeRabbit

Bug Fixes
- Improved JSON serialization to handle high-precision decimal numbers and binary data in API responses and streaming outputs without data loss.
- Decimal values now preserve full numeric precision as strings, and binary data is properly base64-encoded.
Tests
- Added comprehensive regression tests for streaming functionality with decimal and binary data types.

BigQuery `NUMERIC` / `BIGNUMERIC` columns come back as `decimal.Decimal` and `BYTES` columns as raw `bytes`, both refused by orjson out of the box. `_records_object_array` and `_records_array_array` were calling `orjson.dumps(...)` without a `default=` callable, so a search response over a table with any NUMERIC column crashed mid-stream:

    TypeError: Type is not JSON serializable: decimal.Decimal

Fix:

- Add `_json_default` to `services/streaming.py` and pass it to both row-emitting writers. Decimal -> `str()` (preserves the full precision NUMERIC / BIGNUMERIC allow, beyond IEEE-754 doubles; matches CKAN's datastore convention for high-precision numerics). bytes -> base64 string so the response stays UTF-8 and round-trippable.
- Extend `api/responses.py:_orjson_default` the same way so non-streaming envelopes (datastore_info etc.) don't hit the same bug if Decimal / bytes ever surface there.
- CSV / TSV path is unaffected: `csv.writer.writerow(row)` already stringifies values internally.

Adds `tests/test_streaming.py` with regression coverage for both row formats and an explicit assertion that unsupported types still raise loudly (so a new BigQuery scalar type can't silently fall through).

coderabbitai · 2026-06-03T07:43:46Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between d4ef7d9 and 27717cb.

📒 Files selected for processing (3)

datastore/api/responses.py
datastore/services/streaming.py
tests/test_streaming.py

📝 Walkthrough

This PR extends JSON serialization across the datastore to handle BigQuery scalar types (Decimal for NUMERIC/BIGNUMERIC and bytes for BYTES). Custom orjson handlers convert Decimal to strings for precision and bytes to base64-encoded ASCII for UTF-8 safety. The changes touch the API response handler and the streaming row writers, and are validated by new regression tests.

Changes

JSON Serialization for Non-Native Types

| Layer / File(s) | Summary |
| --- | --- |
| API Response Serialization Handler: `datastore/api/responses.py` | Extends `_orjson_default` to stringify `Decimal` and base64-encode `bytes` for API response payloads, preserving numeric precision and ensuring UTF-8 compatibility. |
| Streaming Service Serialization with Tests: `datastore/services/streaming.py`, `tests/test_streaming.py` | Introduces `_json_default` helper for streaming row encoders (both object and list formats) to handle `Decimal` and `bytes`. Tests verify correct JSON encoding for both record types, base64 representation of bytes, and error handling for unsupported types. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hops through data streams so bright,
With decimals and bytes encoded just right—
Base64 sprinkled, precision preserved,
JSON flows smoothly, exactly deserved! 🐰✨


sagargg merged commit dd0995a into main Jun 3, 2026
1 of 2 checks passed

sagargg merged 1 commit into main from fix/decimal-stream-serialization
