fix: use utf-8 for local tracking text files#790
Conversation
andreahlert
left a comment
There was a problem hiding this comment.
Thanks for the fix. One spot in the same file looks missed: backend.py:524 reads graph.json text-mode without encoding="utf-8". Same class of bug, and client writes it as UTF-8 (client.py:420).
async with aiofiles.open(graph_file, encoding="utf-8") as f:Could you include that line in this PR as well?
|
Thank you for catching that missed spot. I added the same explicit |
|
This looks like the right fix, but since this is correcting platform-dependent file encoding behavior, I think it would be worth adding a regression test that writes and reads non-ASCII content through LocalBackend (annotations / graph / children) to prove UTF-8 handling end-to-end and prevent missed call sites. Could you add that please? |
|
Thank you for the suggestion. I added a focused regression test that writes non-ASCII content through LocalBackend annotations, graph.json, and children.jsonl, then reads it back through the backend APIs.\n\nVerified locally with:\n\n |
Harden the local-backend UTF-8 regression test so it catches missed call sites independently of the host's default encoding. A round-trip assertion alone passes on a UTF-8 host even if a text open drops encoding="utf-8". Wrap the backend's aiofiles.open and assert every text-mode (non-binary) open is explicitly UTF-8; binary opens (mode "rb") stay exempt. Also exercise non-ASCII (incl. CJK) content through annotations, graph.json and children.jsonl end to end. Signed-off-by: André Ahlert <andre@aex.partners>
andreahlert
left a comment
There was a problem hiding this comment.
LGTM. Thanks for adding the test and catching the graph.json spot.
I pushed one small follow-up commit (b142647) to harden the regression test. The round-trip assertion alone passes on a UTF-8 default host even if a text open drops encoding="utf-8", so it would not actually catch a future missed call site on CI. The follow-up wraps the backend's aiofiles.open and asserts every text-mode open is explicitly UTF-8 (binary rb opens stay exempt), which fails deterministically regardless of the host locale. Verified it goes red when the encoding arg is removed from any read.
Merging once CI is green.
The new regression test imports burr.tracking.server.backend, which
pulls aiofiles (tracking-server extra). The base test job only installed
tracking-client, so the top-level server import broke collection and
failed all test (3.9-3.12) jobs.
- Make the server backend/schema imports lazy and gate the test with
pytest.importorskip("aiofiles"), matching the tracking-server-s3 test
idiom, so the module still collects with only tracking-client present.
- Add the tracking-server extra to the base test job so aiofiles is
available and the regression test actually runs across the matrix
instead of being skipped everywhere.
Signed-off-by: André Ahlert <andre@aex.partners>
Summary
Thanks for maintaining Burr. This PR makes the local tracking backend text-file reads and writes use explicit UTF-8 encoding.
What changed
encoding="utf-8"to annotation JSONL reads and writes inLocalBackendencoding="utf-8"when readingchildren.jsonlBefore / after
Before, these JSONL text files used the platform default encoding, which can vary on Windows and other non-UTF-8 locale environments.
After, the local backend reads and writes these text files consistently as UTF-8.
Verification
python -m py_compile burr/tracking/server/backend.pygit diff --check