fix: use utf-8 for local tracking text files by Ghraven · Pull Request #790 · apache/burr

Ghraven · 2026-05-26T01:28:18Z

Summary

Thanks for maintaining Burr. This PR makes the local tracking backend text-file reads and writes use explicit UTF-8 encoding.

What changed

Adds encoding="utf-8" to annotation JSONL reads and writes in LocalBackend
Adds encoding="utf-8" when reading children.jsonl
Leaves binary log/metadata reads unchanged

Before / after

Before, these JSONL text files used the platform default encoding, which can vary on Windows and other non-UTF-8 locale environments.

After, the local backend reads and writes these text files consistently as UTF-8.

Verification

python -m py_compile burr/tracking/server/backend.py
git diff --check

andreahlert

Thanks for the fix. One spot in the same file looks missed: backend.py:524 reads graph.json text-mode without encoding="utf-8". Same class of bug, and client writes it as UTF-8 (client.py:420).

async with aiofiles.open(graph_file, encoding="utf-8") as f:

Could you include that line in this PR as well?

Ghraven · 2026-05-27T08:59:21Z

Thank you for catching that missed spot. I added the same explicit encoding="utf-8" to the graph.json read in LocalBackend and pushed the update. I verified it with python -m py_compile burr/tracking/server/backend.py and git diff --check.

skrawcz · 2026-05-28T04:52:45Z

This looks like the right fix, but since this is correcting platform-dependent file encoding behavior, I think it would be worth adding a regression test that writes and reads non-ASCII content through LocalBackend (annotations / graph / children) to prove UTF-8 handling end-to-end and prevent missed call sites. Could you add that please?

Ghraven · 2026-05-29T08:25:14Z

Thank you for the suggestion. I added a focused regression test that writes non-ASCII content through LocalBackend annotations, graph.json, and children.jsonl, then reads it back through the backend APIs.\n\nVerified locally with:\n\n\npython -m pytest tests\\tracking\\test_local_tracking_client.py::test_local_backend_reads_utf8_annotations_graph_and_children -q\npython -m py_compile burr\\tracking\\server\\backend.py tests\\tracking\\test_local_tracking_client.py\ngit diff --check\n\n\nThe focused pytest passes; the remaining warnings are existing Pydantic deprecation warnings in the backend path.

Harden the local-backend UTF-8 regression test so it catches missed call sites independently of the host's default encoding. A round-trip assertion alone passes on a UTF-8 host even if a text open drops encoding="utf-8". Wrap the backend's aiofiles.open and assert every text-mode (non-binary) open is explicitly UTF-8; binary opens (mode "rb") stay exempt. Also exercise non-ASCII (incl. CJK) content through annotations, graph.json and children.jsonl end to end. Signed-off-by: André Ahlert <andre@aex.partners>

andreahlert

LGTM. Thanks for adding the test and catching the graph.json spot.

I pushed one small follow-up commit (b142647) to harden the regression test. The round-trip assertion alone passes on a UTF-8 default host even if a text open drops encoding="utf-8", so it would not actually catch a future missed call site on CI. The follow-up wraps the backend's aiofiles.open and asserts every text-mode open is explicitly UTF-8 (binary rb opens stay exempt), which fails deterministically regardless of the host locale. Verified it goes red when the encoding arg is removed from any read.

Merging once CI is green.

The new regression test imports burr.tracking.server.backend, which pulls aiofiles (tracking-server extra). The base test job only installed tracking-client, so the top-level server import broke collection and failed all test (3.9-3.12) jobs. - Make the server backend/schema imports lazy and gate the test with pytest.importorskip("aiofiles"), matching the tracking-server-s3 test idiom, so the module still collects with only tracking-client present. - Add the tracking-server extra to the base test job so aiofiles is available and the regression test actually runs across the matrix instead of being skipped everywhere. Signed-off-by: André Ahlert <andre@aex.partners>

fix: use utf-8 for local tracking text files

3a9e785

github-actions Bot added the area/tracking Telemetry, tracing, OpenTelemetry label May 26, 2026

andreahlert assigned Ghraven May 26, 2026

andreahlert added the kind/bug Something is broken label May 26, 2026

andreahlert requested changes May 26, 2026

View reviewed changes

andreahlert mentioned this pull request May 26, 2026

fix: use utf-8 for test case json files #789

Open

fix: read local tracking graph as utf-8

6583711

test: cover local backend utf-8 reads

53c4365

andreahlert approved these changes May 29, 2026

View reviewed changes

github-actions Bot added the area/ci Workflows, build, release scripts label May 29, 2026

andreahlert merged commit 855bea6 into apache:main May 29, 2026
27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use utf-8 for local tracking text files#790

fix: use utf-8 for local tracking text files#790
andreahlert merged 5 commits into
apache:mainfrom
Ghraven:fix/local-backend-annotation-utf8

Ghraven commented May 26, 2026

Uh oh!

andreahlert left a comment

Uh oh!

Ghraven commented May 27, 2026

Uh oh!

skrawcz commented May 28, 2026

Uh oh!

Ghraven commented May 29, 2026

Uh oh!

andreahlert left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Ghraven commented May 26, 2026

Summary

What changed

Before / after

Verification

Uh oh!

andreahlert left a comment

Choose a reason for hiding this comment

Uh oh!

Ghraven commented May 27, 2026

Uh oh!

skrawcz commented May 28, 2026

Uh oh!

Ghraven commented May 29, 2026

Uh oh!

andreahlert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants