
Fix/import issue #1240

Open
bplatz wants to merge 4 commits into main from fix/import-issue


bplatz commented May 13, 2026

Fixes incorrect datatype labels in per-class stats (stats.classes[*].properties[*].types).

The class stats collector stores datatype buckets as ValueTypeTag values, but the stats builder was treating those values as DatatypeDictId indices. This caused initial imports to report literal property types as UNKNOWN, and rebuilds to report deterministic but wrong labels like @id, xsd:boolean, or rdf:langString.

This PR keeps the class stats path consistently in the ValueTypeTag domain and removes the now-misleading dt_tags lookup. It also adds missing unambiguous OType mappings for xsd:gYearMonth and xsd:gMonthDay, while intentionally leaving ambiguous carriers like NUM_BIG_OVERFLOW as UNKNOWN until they can be disambiguated with arena access or preserved semantic datatype tags.
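To make the domain mismatch concrete, here is a minimal sketch, not the actual code from this PR. The `Tag` enum, its numeric values, and the functions `label_via_dict` and `label_direct` are hypothetical stand-ins for `ValueTypeTag`, the `dt_tags` lookup, and the direct cast. It only illustrates why indexing a dictionary table with a tag value yields UNKNOWN for an empty table and a deterministic but wrong label for a populated one.

```rust
/// Hypothetical stand-in for ValueTypeTag. The variants and their numeric
/// values are invented for this sketch and do not match the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Id,
    String,
    Boolean,
    Integer,
    Unknown,
}

impl Tag {
    /// Decode a raw tag byte (made-up encoding).
    pub fn from_u8(raw: u8) -> Tag {
        match raw {
            0 => Tag::Id,
            1 => Tag::String,
            2 => Tag::Boolean,
            3 => Tag::Integer,
            _ => Tag::Unknown,
        }
    }
}

/// Buggy shape: treats the stored key as an index into a per-ledger
/// dictionary table, although the collector stored a tag value.
pub fn label_via_dict(key: u16, dt_tags: &[Tag]) -> Tag {
    dt_tags.get(key as usize).copied().unwrap_or(Tag::Unknown)
}

/// Fixed shape: the stored key already is a tag value, so decode it
/// directly after checking that it fits in a byte.
pub fn label_direct(key: u16) -> Tag {
    if key <= u16::from(u8::MAX) {
        Tag::from_u8(key as u8)
    } else {
        Tag::Unknown
    }
}
```

With these assumed encodings, a stored key of 3 (integer) looked up in an empty table gives `Unknown`, looked up in a table ordered `[String, Integer, Id, Boolean]` gives `Boolean`, and decoded directly gives `Integer`.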

Test Plan

  • Added regression coverage for per-class datatype stats across import/reindex paths.
  • Verified literal properties report the expected ValueTypeTag values, including string, integer, boolean, dateTime, date, gYearMonth, and @id references.

bplatz added 2 commits May 13, 2026 11:26
…ld_class_stat_entries domain mismatch).

The SPOT-merge collector stores ValueTypeTag values into prop_dts (it only
has the per-flake o_type byte, not the dict). The consumers were treating
those keys as DatatypeDictId indices into dt_tags, which produced wrong
tags post-reindex (xsd:integer→xsd:boolean, xsd:date→rdf:langString, …)
and UNKNOWN on the import path (which passed &[] for dt_tags).

Drop the dt_tags parameter from build_class_stat_entries and
build_class_stats_json; the stored u16 is already a ValueTypeTag value,
so cast directly to u8. Update the SpotClassStats doc to reflect the
correct domain. Adds a regression test asserting xsd:string,
xsd:integer, xsd:boolean, xsd:dateTime, xsd:date, and IRI-ref tags after
reindex.

xsd:gYearMonth and xsd:gMonthDay were the only unambiguous
OType→ValueTypeTag arms missing from the stats datatype mapping;
previously class stats reported them as UNKNOWN.

NUM_BIG_OVERFLOW is intentionally left unmapped: it carries both
arbitrary-precision xsd:decimal and i64-overflow xsd:integer (both share
ObjKind::NUM_BIG), and the SPOT-merge RunRecordV2 stream doesn't carry
the dt sid needed to disambiguate. Comment in id_hook.rs explains what
plumbing would be required to fix this faithfully.
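The mapping policy described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in id_hook.rs: `Carrier`, `Datatype`, `stats_datatype`, and `stats_datatype_with_hint` are hypothetical names, and the `hint` parameter stands in for whatever extra datatype information the record stream would have to carry.

```rust
/// Hypothetical object-type carriers; the real OType constants differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Carrier {
    GYearMonth,
    GMonthDay,
    NumBigOverflow,
}

/// Hypothetical datatype labels used by the stats output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Datatype {
    GYearMonth,
    GMonthDay,
    Decimal,
    Integer,
    Unknown,
}

/// Map a carrier to a datatype only when the mapping is one-to-one.
pub fn stats_datatype(carrier: Carrier) -> Datatype {
    match carrier {
        Carrier::GYearMonth => Datatype::GYearMonth,
        Carrier::GMonthDay => Datatype::GMonthDay,
        // Could be an arbitrary-precision decimal or an overflowed
        // integer, so report Unknown rather than guess.
        Carrier::NumBigOverflow => Datatype::Unknown,
    }
}

/// What a faithful mapping might look like if each record also carried
/// a datatype hint (an assumption; the current stream does not).
pub fn stats_datatype_with_hint(carrier: Carrier, hint: Option<Datatype>) -> Datatype {
    match (carrier, hint) {
        (Carrier::NumBigOverflow, Some(Datatype::Decimal)) => Datatype::Decimal,
        (Carrier::NumBigOverflow, Some(Datatype::Integer)) => Datatype::Integer,
        // Any other combination falls back to the unambiguous mapping.
        (other, _) => stats_datatype(other),
    }
}
```

Returning `Unknown` for the ambiguous carrier keeps the stats honest: a wrong label would be indistinguishable from a correct one downstream, whereas `Unknown` is visibly incomplete.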

Extends reindex_class_stats_report_correct_datatypes to assert both new
arms.
bplatz requested review from aaj3f and zonotope May 13, 2026 15:38
bplatz added 2 commits May 13, 2026 12:13
The import path built the FIR6 root inline with `IndexStats.size = 0`,
even though `total_commit_size` was correctly tracked and stored on the
root. The normal indexing path runs a size-propagation block in
`root_assembly` that copies `root.total_commit_size` into `stats.size`
and proportionally allocates per-graph sizes by flake count, but the
import path skipped it. Result: `info` reported `size: 0` for the
ledger, every named graph, and stats.size after a streaming import,
while the same data landing via the normal commit path showed correct
sizes.

Factors the 18-line size-distribution block into
`IndexStats::distribute_total_size_by_flakes` and uses it from
root_assembly, both incremental sites, and the new import site.

Regression test in `import_collects_stats` asserts `stats.size > 0`
and per-graph `graphs[0].size > 0` after import.
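A proportional split by flake count could look something like the sketch below. The method name is borrowed from the commit message, but the `Stats` and `GraphStats` structs, their fields, and the rounding behavior are assumptions made for illustration; the real `IndexStats` helper may handle remainders or empty graphs differently.

```rust
/// Hypothetical per-graph entry; the real stats fields may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GraphStats {
    pub flakes: u64,
    pub size: u64,
}

/// Hypothetical stand-in for IndexStats.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Stats {
    pub size: u64,
    pub graphs: Vec<GraphStats>,
}

impl Stats {
    /// Copy `total` into `size`, then split it across graphs in proportion
    /// to their flake counts. Integer division rounds down, so the
    /// per-graph sizes may sum to slightly less than `total`.
    pub fn distribute_total_size_by_flakes(&mut self, total: u64) {
        self.size = total;
        let total_flakes: u64 = self.graphs.iter().map(|g| g.flakes).sum();
        if total_flakes == 0 {
            // Nothing to weight by; leave per-graph sizes untouched.
            return;
        }
        for g in &mut self.graphs {
            // Widen to u128 so `total * flakes` cannot overflow.
            let share = (total as u128 * g.flakes as u128) / total_flakes as u128;
            g.size = share as u64;
        }
    }
}
```

Under these assumptions, a total of 1000 bytes over two graphs with 30 and 10 flakes yields per-graph sizes of 750 and 250.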