feat: deep source provenance, inline query rules, and public env API #4

Open
theaspirational wants to merge 3 commits into RayforceDB:master from theaspirational:feature/datalog-provenance

@theaspirational theaspirational commented Apr 7, 2026

Summary

Three related additions to the Datalog engine, each independently useful and backward compatible.


1. Deep source provenance (dl_get_provenance_src_offsets / dl_get_provenance_src_data)

Replaces the two stub implementations with a full CSR-format source tracking pass that runs after the existing rule-attribution provenance (prov_col).

When DL_FLAG_PROVENANCE is set, for each derived row the engine records which rows in which body relations contributed to the derivation:

| Field on dl_rel_t | Type | Meaning |
|---|---|---|
| prov_src_offsets | I64[nrows+1] | CSR offsets: offsets[i] = start index in prov_src_data for derived row i |
| prov_src_data | I64[total] | packed source refs: `(relation_index << 32) \| row_index` |

CSR format keeps the result in two flat allocations per relation, lets callers slice out sources for any row in O(1), and mirrors the encoding already used by the engine's CSR edge indices.
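A caller-side decode of this layout could look roughly like the sketch below. It assumes the usual CSR convention that the sources of row i end at offsets[i+1] (the description only states the start index and the nrows+1 length), and the names src_ref_t, src_ref_pack, src_ref_unpack and src_slice are hypothetical helpers, not part of this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded source reference (hypothetical helper type). */
typedef struct {
    uint32_t rel_index;  /* relation index, high 32 bits of the packed ref */
    uint32_t row_index;  /* row index, low 32 bits of the packed ref */
} src_ref_t;

/* Pack using the encoding stated above: (relation_index << 32) | row_index. */
static int64_t src_ref_pack(uint32_t rel_index, uint32_t row_index) {
    return (int64_t)(((uint64_t)rel_index << 32) | (uint64_t)row_index);
}

/* Split a packed reference back into its two halves. */
static src_ref_t src_ref_unpack(int64_t packed) {
    src_ref_t r;
    r.rel_index = (uint32_t)((uint64_t)packed >> 32);
    r.row_index = (uint32_t)((uint64_t)packed & 0xFFFFFFFFu);
    return r;
}

/* Return a pointer to the sources of derived row `row` and store how many
 * there are in *count. Assumes the slice is [offsets[row], offsets[row+1]). */
static const int64_t *src_slice(const int64_t *offsets, const int64_t *data,
                                int64_t nrows, int64_t row, int64_t *count) {
    assert(row >= 0 && row < nrows);
    *count = offsets[row + 1] - offsets[row];
    return data + offsets[row];
}
```

The slice costs two array reads regardless of how many sources a row has, which is the O(1) property mentioned above; iterating the slice and calling src_ref_unpack on each entry yields the (relation, row) pairs.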

Caveat: a variable that appears only in body atoms (not in the head) is unconstrained during source lookup. For rules with such variables, the recorded entries may therefore be a superset of the true derivation: all body rows whose head-visible columns match. The header documents this. Over-reporting is the accepted trade-off, since false negatives would be worse.
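To make the caveat concrete, here is a toy illustration for a hypothetical rule p(x) :- e(x, y), f(y), where y is a body-only variable. It is not the engine's lookup code; e_row_t, match_head_visible and match_exact are made-up names that contrast matching on the head-visible column alone with an exact join check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy rule:  p(x) :- e(x, y), f(y).   y never reaches the head. */
typedef struct { int64_t x, y; } e_row_t;

/* Collect e rows that agree with derived row p(x) on the head-visible
 * column only; y is ignored. Writes up to `cap` row indices to out[]. */
static size_t match_head_visible(const e_row_t *e, size_t ne, int64_t x,
                                 size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < ne; i++)
        if (e[i].x == x && n < cap) out[n++] = i;
    return n;
}

/* For comparison: collect only e rows whose y also joins with some f(y). */
static size_t match_exact(const e_row_t *e, size_t ne,
                          const int64_t *f, size_t nf, int64_t x,
                          size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < ne; i++) {
        if (e[i].x != x) continue;
        for (size_t j = 0; j < nf; j++) {
            if (f[j] == e[i].y) {
                if (n < cap) out[n++] = i;
                break;
            }
        }
    }
    return n;
}
```

With e = {(1,10), (1,20), (2,10)} and f = {10}, p(1) is truly derived only through e row 0, but matching on x alone also reports e row 1. The extra entry is a false positive; no real source is dropped.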

Both vectors are released in dl_program_free.


2. Inline query rules via (rules ...) clause

Adds an optional fourth argument to (query ...) that supplies rules inline, Datomic-style, instead of using the global rule set:

```
;; Existing behaviour unchanged: uses global g_dl_rules[]
(query db (find ?x ?y) (where (path ?x ?y)))

;; New: inline rules, globals ignored for this query
(query db
  (find ?x ?y)
  (where (path ?x ?y))
  (rules
    ((path ?x ?y) (edge ?x ?y))
    ((path ?x ?z) (edge ?x ?y) (path ?y ?z))))
```

Each entry in (rules ...) follows the same head+body syntax as (rule ...). When the clause is present, only those rules are loaded into the temporary dl_program_t; when it is absent, the existing global-copy path runs unchanged.
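The override semantics can be sketched in a few lines of C. This is only a model of the selection rule described above, not the code in ray_query_fn; toy_rule_t, toy_rule_set_t and select_rules are hypothetical names, and treating a present-but-empty clause as "no rules at all" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for parsed rules and a rule collection. */
typedef struct { const char *text; } toy_rule_t;

typedef struct {
    const toy_rule_t *rules;
    size_t n;
} toy_rule_set_t;

/* inline_rules == NULL models a query without a (rules ...) clause.
 * A non-NULL set replaces the globals entirely; nothing is merged. */
static toy_rule_set_t select_rules(const toy_rule_set_t *inline_rules,
                                   const toy_rule_set_t *globals) {
    assert(globals != NULL);
    return inline_rules ? *inline_rules : *globals;
}
```

Selecting one set or the other, rather than merging, is what keeps a query's result independent of whatever rules happen to be registered globally.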

This enables multi-database sessions where different queries carry different rule sets without global state pollution, and makes rule sets composable as plain data (lists of lists).

Internally, dl_parse_rule_from_head_and_body is extracted as a shared helper used by both ray_rule_fn and the new inline parse path, removing the duplication.


3. Promote ray_env_get / ray_env_set to public API

Adds two declarations to include/rayforce.h:

```c
/* ===== Environment API ===== */
ray_t*    ray_env_get(int64_t sym_id);
ray_err_t ray_env_set(int64_t sym_id, ray_t* val);
```

Both functions already exist and are widely used internally (src/lang/env.c). Exposing them allows embedders to bind named values into the evaluation environment and retrieve them after eval — the natural way to pass tables into Rayfall queries by name and read results back. Without this, embedders have to link against the internal env.h header directly.
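A rough sketch of the embedder pattern follows. So that it compiles on its own, it declares stand-in ray_t and ray_err_t types and backs the two functions with a toy array; none of that reflects the real definitions in include/rayforce.h or src/lang/env.c. The 0-means-success return value, the NULL result for an unbound symbol, and the bind_and_read helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types so the sketch is self-contained; the real ones differ. */
typedef struct ray_t { int64_t payload; } ray_t;  /* hypothetical layout */
typedef int ray_err_t;                            /* hypothetical: 0 = ok */

/* Toy environment storage, NOT the implementation in src/lang/env.c. */
enum { TOY_ENV_CAP = 16 };
static int64_t toy_keys[TOY_ENV_CAP];
static ray_t  *toy_vals[TOY_ENV_CAP];
static size_t  toy_len;

ray_err_t ray_env_set(int64_t sym_id, ray_t *val) {
    for (size_t i = 0; i < toy_len; i++)
        if (toy_keys[i] == sym_id) { toy_vals[i] = val; return 0; }
    if (toy_len == TOY_ENV_CAP) return 1;  /* toy table is full */
    toy_keys[toy_len] = sym_id;
    toy_vals[toy_len] = val;
    toy_len++;
    return 0;
}

ray_t *ray_env_get(int64_t sym_id) {
    for (size_t i = 0; i < toy_len; i++)
        if (toy_keys[i] == sym_id) return toy_vals[i];
    return NULL;  /* assumption: unbound symbols yield NULL */
}

/* Embedder pattern: bind an input by symbol id, evaluate, read a value back. */
static int64_t bind_and_read(int64_t sym_id, ray_t *table) {
    if (ray_env_set(sym_id, table) != 0) return -1;
    /* ... evaluate a query that refers to the bound name here ... */
    ray_t *got = ray_env_get(sym_id);
    return got ? got->payload : -1;
}
```

In a real embedding the symbol id would come from the library's symbol table and the stub bodies would be replaced by linking against Rayforce; only the two declared signatures are taken from this PR.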


Changes

| File | What changed |
|---|---|
| src/ops/datalog.h | Two new fields on dl_rel_t; stub comments replaced with full docs for both getters |
| src/ops/datalog.c | dl_build_source_prov() (new); dl_parse_rule_from_head_and_body() (extracted helper); inline rules parsing in ray_query_fn; real getter implementations; dl_program_free cleanup |
| include/rayforce.h | ray_env_get / ray_env_set declarations |
| test/test_datalog.c | New file: two tests for CSR structure and flag-guard behaviour |
| test/test_main.c | Registers the new /datalog suite |

All 575 tests pass.

theaspirational and others added 3 commits April 7, 2026 19:11
Replaces the two stub implementations of dl_get_provenance_src_offsets
and dl_get_provenance_src_data with a full CSR-format source tracking
pass that runs after the existing rule-attribution provenance.

For each derived (IDB) row, the engine now records which rows in
which body relations contributed to the derivation, stored as two
parallel vectors on dl_rel_t:

  prov_src_offsets — I64[nrows+1] in CSR format: offsets[i] is the
                     start index in prov_src_data for derived row i.
  prov_src_data    — flat I64 vector of packed source references,
                     each entry = (relation_index << 32) | row_index.

This gives callers a complete derivation trace without materialising
intermediate proof trees. The encoding is self-contained: relation
indices refer to prog->rels[], making the result portable alongside
the program.

Caveats:
- Body-only variables (variables that appear in body atoms but not in
  the head) are unconstrained during source lookup. Entries may be a
  superset of the true proof for such rules.
- Only populated when DL_FLAG_PROVENANCE is set.

Both vectors are released in dl_program_free.

Tests: two new cases in test/test_datalog.c verify the CSR structure
for a base-case path rule and confirm the flag-guard behaviour.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Add optional (rules ...) clause to (query ...) that passes rules
inline at query time, following the Datomic/DataScript pattern.

- Extract dl_parse_inline_rule() and dl_parse_rule_from_head_and_body()
- Parse (rules ((head ...) body1 body2 ...) ...) in ray_query_fn
- When (rules ...) present, use only inline rules (ignore globals)
- When absent, use globals (backward compatible)
- Add tests for inline rules, globals fallback, override semantics

Made-with: Cursor

These functions exist in src/lang/env.c and are stable across
upstream refactors. Promoting them to the public header enables
embedding use cases where host code needs to bind named values
into the evaluator's environment.

Made-with: Cursor

@theaspirational changed the title from "feat: implement deep source provenance for Datalog derivations" to "feat: deep source provenance, inline query rules, and public env API" on Apr 7, 2026