Skip to content

feat: introduce Provenance type and migrate metadata from FileRange#1140

Merged
MikaelMayer merged 18 commits into
mainfrom
issue-1139-sourcerange-none-and-filerange-unknown-h
May 15, 2026
Merged

feat: introduce Provenance type and migrate metadata from FileRange#1140
MikaelMayer merged 18 commits into
mainfrom
issue-1139-sourcerange-none-and-filerange-unknown-h

Conversation

@MikaelMayer
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer commented May 7, 2026

Fixes #1139

Introduces a Provenance type that fully replaces FileRange and SourceRange as the canonical way to store source locations in metadata. This structurally eliminates the "SourceRange without file name" problem.

The Provenance type has two constructors:

  • Provenance.loc uri range — a real source location (always requires both URI and range)
  • Provenance.synthesized origin — a node created programmatically, with a SynthesizedOrigin inductive type

A SynthesizedOrigin inductive enforces that only canonical origins are used (smtEncode, nondetIte, laurelParse, laurel, laurelToCore, structuredToUnstructured), preventing typos and duplicates at the type level.

Key changes:

  • The .fileRange variant is removed from MetaDataElem.Value — all metadata values now use .provenance exclusively
  • MetaData.ofSourceRange emits only a provenance element
  • getProvenance is the single source of truth for reading source locations from metadata
  • setCallSiteFileRange works directly with Provenance instead of roundtripping through FileRange
  • getFileRange delegates to getProvenance for extraction
  • FileRange.unknown and SourceRange.none eliminated from metadata construction
  • SMT DDM translator uses smtAnn combinator to reduce annotation boilerplate
  • SARIF output uses getFileRange (which reads provenance)

FileRange remains as a utility struct for extraction and formatting (e.g., in DiagnosticModel), but is no longer a metadata value type.

Existing tests pass unchanged.

Follow-up

  • Migrate the B3, Boole, and Python grammar ASTs from using SourceRange as their annotation type parameter to Provenance. This would allow combining multiple files at the AST level and enable proper provenance tracking through translation passes.

Introduce a Provenance inductive type that distinguishes between real source
locations (Provenance.loc) and synthesized nodes (Provenance.synthesized).

Changes:
- Add Strata.Util.Provenance module with the Provenance type and helpers
- Add provenance variant to MetaDataElem.Value in the Imperative dialect
- Add Provenance-annotated DDM AST abbreviations (ArgProv, OperationProv, etc.)
- Add conversion helpers (ArgF.toProv, OperationF.toProv)
- Add MetaData.synthesized constructor for programmatic AST creation
- Add getProvenance helper that checks both provenance and legacy fileRange
- Update DDM→dialect translators (Core, C_Simp, Laurel, Boole, Python) to
  emit provenance metadata alongside the existing fileRange metadata
- Update MetaData.toDiagnostic to use provenance when available
- Update GOTO backend metadata conversion to preserve provenance
…Range helper

Extract the common pattern of creating metadata with both fileRange and
provenance elements into a shared MetaData.ofSourceRange helper. All
translators now use this single helper instead of duplicating the logic.

Also adds provenance to PythonToLaurel which was previously missing it.
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

All the follow-ups you filed up, do them now in this PR, and remove them from the PR description. Also, update the PR title. A PR is not an issue.

…e-none-and-filerange-unknown-h

# Conflicts:
#	Strata/DDM/AST.lean
#	Strata/DL/Imperative/MetaData.lean
#	Strata/Languages/Laurel/Grammar/ConcreteToAbstractTreeTranslator.lean
#	Strata/Languages/Python/PythonToLaurel.lean
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 Done. Completed the follow-ups in this PR:

  1. SMT DDM translator — migrated from SourceRange.none to Provenance.synthesized "smt-encode" as the annotation type
  2. MetaData.emptyMetaData.synthesized — replaced in StructuredToUnstructured, StatementEval, LaurelToCoreTranslator, ConcreteToAbstractTreeTranslator, and Laurel.lean
  3. FileRangeProvenance in metadataMetaData.ofSourceRange now emits only provenance; getFileRange checks provenance first then falls back to legacy field; setCallSiteFileRange updated to use provenance; SarifOutput.extractLocation uses getFileRange (which now reads provenance)
  4. FileRange.unknown removal — eliminated from metadata construction sites (LaurelToCoreTranslator, ConcreteToAbstractTreeTranslator, Laurel.lean, TypeHierarchy)

The DDM abbreviation rename (ArgArgSR) was not done because the primary DDM use case is parsed source (which has SourceRange). The ArgProv abbreviations already exist for downstream use. SourceRange.none and FileRange.unknown remain only in tests, examples, and the DiagnosticModel default infrastructure.

PR title and description updated.

@MikaelMayer MikaelMayer changed the title SourceRange.none and FileRange.unknown hide where statements are created feat: introduce Provenance type and migrate metadata from FileRange May 7, 2026
@github-actions github-actions Bot added the SMT label May 7, 2026
When setCallSiteFileRange encounters a check with no file range but an
existing synthesized provenance, it must erase the old provenance before
pushing the call site's provenance. Otherwise findElem returns the stale
synthesized entry, causing 'unknown location' in verification output for
synthesized __init__ preconditions.
Comment thread Strata/DL/Imperative/MetaData.lean
Comment thread Strata/Util/Provenance.lean
…e exclusively

The .fileRange variant is removed from MetaDataElem.Value. All source
locations in metadata are now stored via .provenance (Provenance.ofFileRange ...).
This makes Provenance.loc the canonical replacement for standalone FileRange
in metadata values.
Copy link
Copy Markdown
Contributor Author

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Clean migration from FileRange to Provenance as the canonical metadata value. The Provenance type is well-designed with clear semantics. The SMT translator cleanup (replacing repeated let srnone := SourceRange.none with a single abbrev ann) is a nice improvement. A few items below.

Comment thread Strata/DDM/AST.lean Outdated
Comment thread Strata/DL/Imperative/MetaData.lean Outdated
Comment thread Strata/DL/Imperative/MetaData.lean Outdated
Comment thread Strata/Languages/Laurel/Grammar/ConcreteToAbstractTreeTranslator.lean Outdated
Comment thread Strata/Languages/Python/PythonToLaurel.lean Outdated
Comment thread Strata/Languages/Laurel/LaurelToCoreTranslator.lean Outdated
Comment thread Strata/Languages/Python/PythonToLaurel.lean
Comment thread Strata/Transform/StructuredToUnstructured.lean Outdated
@MikaelMayer MikaelMayer changed the base branch from main to issue-1065-extract-core-expression-metadata-from-pr May 8, 2026 15:26
@MikaelMayer MikaelMayer changed the base branch from issue-1065-extract-core-expression-metadata-from-pr to main May 8, 2026 15:37
@github-actions github-actions Bot added the github_actions Pull requests that update GitHub Actions code label May 8, 2026
@MikaelMayer MikaelMayer force-pushed the issue-1139-sourcerange-none-and-filerange-unknown-h branch from 0571c6a to c2f9fe1 Compare May 8, 2026 15:44
@github-actions github-actions Bot removed the github_actions Pull requests that update GitHub Actions code label May 8, 2026
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 The original issue is addressed by replacing all SourceRange and FileRange usage in metadata with the Provenance type. Previously, places that had no source location used SourceRange.none or FileRange.unknown — now they use Provenance.synthesized origin with a typed SynthesizedOrigin inductive, making it impossible to have a SourceRange without a file name. The Provenance.loc constructor always requires both a Uri and a SourceRange, so the "SourceRange without file" case is structurally eliminated.

@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 The original issue is addressed by replacing all SourceRange and FileRange usage in metadata with the Provenance type. Previously, places that had no source location used SourceRange.none or FileRange.unknown — now they use Provenance.synthesized origin with a typed SynthesizedOrigin inductive, making it impossible to have a SourceRange without a file name. The Provenance.loc constructor always requires both a Uri and a SourceRange, so the "SourceRange without file" case is structurally eliminated.

I think my question deserves a deeper answer. Either something in this PR or a claim that the issue was not well informed. We see clearly in the PR diff that we eliminated FileRange and replaced it with Provenance that just adds a synthesized case. There is no mention of making sure that the dialect that use SourceRange as metadata (if any?) now use Provenance.

joscoh
joscoh previously approved these changes May 8, 2026
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 To clarify the scope: the DDM grammar ASTs (B3, Boole, Python, SMT response parsing) still use SourceRange as their annotation type parameter. This is correct — those are raw byte-position annotations produced by the parser, and they always have valid positions (the parser knows where each node starts/ends).

The issue was specifically about the imperative metadata layer (MetaDataElem.Value) where SourceRange.none and FileRange.unknown were used as sentinel values meaning "no real location." This PR eliminates those sentinels by making Provenance the metadata type:

  • Provenance.loc uri range — always has both file and range (no more orphan SourceRanges)
  • Provenance.synthesized origin — explicitly marks nodes with no source location

The DDM grammar annotations don't need this treatment because they don't suffer from the "SourceRange without file" problem — the file is always known at parse time and gets combined with the SourceRange when constructing metadata (via MetaData.ofSourceRange uri sr).

@MikaelMayer
Copy link
Copy Markdown
Contributor Author

To clarify the scope: the DDM grammar ASTs (B3, Boole, Python, SMT response parsing) still use SourceRange as their annotation type parameter. This is correct — those are raw byte-position annotations produced by the parser, and they always have valid positions (the parser knows where each node starts/ends).

The issue was specifically about the imperative metadata layer (MetaDataElem.Value) where SourceRange.none and FileRange.unknown were used as sentinel values meaning "no real location." This PR eliminates those sentinels by making Provenance the metadata type:

  • Provenance.loc uri range — always has both file and range (no more orphan SourceRanges)
  • Provenance.synthesized origin — explicitly marks nodes with no source location

The DDM grammar annotations don't need this treatment because they don't suffer from the "SourceRange without file" problem — the file is always known at parse time and gets combined with the SourceRange when constructing metadata (via MetaData.ofSourceRange uri sr).

Better answer thanks. The B3 AST SHOULD use Provenance, otherwise we wouldn't be able to translate anything to it.
the Boole AST SHOULD use provenance, otherwise we wouldn't be able to combine multiple files at the Bool level.
Same for Python.
If you think this is a follow-up, please add a follow-up section in the PR and a bullet point about it.

@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 Added a follow-up section to the PR description about migrating B3, Boole, and Python grammar ASTs from SourceRange to Provenance as their annotation type parameter.

joehendrix
joehendrix previously approved these changes May 8, 2026
Copy link
Copy Markdown
Contributor

@joehendrix joehendrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, and I'd recommend auditing all uses of getFileRange, but I'm not sure if this should be blocking.

Comment thread Strata/DL/Imperative/MetaData.lean Outdated
Comment thread Strata/DL/SMT/DDMTransform/Translate.lean Outdated
@MikaelMayer MikaelMayer dismissed stale reviews from joehendrix and joscoh via c084fbf May 8, 2026 20:56
Comment thread Strata/Util/Provenance.lean
@MikaelMayer MikaelMayer added this pull request to the merge queue May 15, 2026
Merged via the queue into main with commit 50b0e12 May 15, 2026
22 checks passed
@MikaelMayer MikaelMayer deleted the issue-1139-sourcerange-none-and-filerange-unknown-h branch May 15, 2026 21:37
PROgram52bc pushed a commit that referenced this pull request May 15, 2026
Conflicts:
- Strata/Transform/StructuredToUnstructured.lean: keep branch's
  loop-contract metadata (specLoopInvariant/specDecreases pushed to the
  transfer) and exit metadata propagation, while adopting main's
  String-only exit label and synthesizedMd-tagged synthesized commands.
- Strata/Transform/ProcBodyVerifyCorrect.lean: adopt main's new
  Config.block (Option String × σ_parent × inner) shape, but keep the
  branch's destructured `ss : List Statement` (from procToVerifyStmt_is_structured)
  instead of `proc.body : Procedure.Body`.

Follow-on fixes:
- ProcBodyVerifyCorrect.lean lines 856/859/862: replace `proc.body` with
  `ss` so the new wf-preservation lemmas (core_wfVar_preserved,
  core_wfCong_preserved, core_wfExprCongr_preserved) typecheck.
- Loops.lean / Exit.lean snapshots: replace `[fileRange]` tag with
  `[provenance]` and update shifted byte offsets to match the new
  Provenance metadata format introduced by #1140.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SourceRange.none and FileRange.unknown hide where statements are created

4 participants