Skip to content

Core.formatProgram to produce round-trip-parseable output for all constructs#1165

Draft
MikaelMayer wants to merge 10 commits into
mainfrom
issue-1158-core-formatprogram-does-not-produce-roun
Draft

Core.formatProgram to produce round-trip-parseable output for all constructs#1165
MikaelMayer wants to merge 10 commits into
mainfrom
issue-1158-core-formatprogram-does-not-produce-roun

Conversation

@MikaelMayer
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer commented May 14, 2026

Fixes #1158

Summary

Fixes several issues with Core.formatProgram that prevented round-trip parsing:

  1. inline function formatted without space — The inline grammar op now includes a trailing space, preventing inlinefunction concatenation. The formatter also now emits the inline attribute when present on functions.

  2. Quantifier variable names — The formatter now uses the prettyName field from LExpr.quant instead of generating __qN names, preserving the original variable names through formatting.

  3. Overflow predicates — Added grammar entries, formatter handling, parser entries, and Factory operations for all bitvector overflow predicates (SNegOverflow, UNegOverflow, SAddOverflow, SSubOverflow, SMulOverflow, SDivOverflow, UAddOverflow, USubOverflow, UMulOverflow). SNegOverflow and UNegOverflow use distinct grammar entries (Bv.SNegOverflow / Bv.UNegOverflow) to preserve their different semantics through roundtrip. The translate direction correctly dispatches overflow ops by type parameter (bv1/bv8/bv16/bv32/bv64).

  4. Array assignment (lhsArray) — Implemented m[k] := v translation in both directions: the parser decomposes nested LHS into base identifier + indices and builds map_update expressions; the formatter detects the map_update(var, idx, val) pattern and produces lhsArray syntax.

  5. Sequence.empty — Added grammar entry with explicit type annotation syntax (Sequence.empty<int>()), plus translate and format handling, resolving the 0-ary polymorphic function limitation for this operation.

  6. Datatype roundtrip — Verified that datatype formatting roundtrips correctly (the extra-parentheses issue noted earlier is resolved).

  7. Roundtrip test infrastructure — Added RoundtripTest.lean that verifies parse → format → re-parse → re-format produces stable output for types, functions, procedures, inline functions, parameterized type arguments, datatypes, array assignments, and Sequence.empty.

Testing

  • All existing tests pass (with expected output updates for the formatting improvements)
  • New roundtrip tests verify types, functions, procedures, inline functions, parameterized type arguments, datatypes, lhsArray, and Sequence.empty

Remaining limitations

  • Bitvector operations with widths beyond 1/8/16/32/64 still fall back to generic call (requires grammar-level type entries per width)

- Fix inline function formatting: add trailing space to 'inline' op in
  Grammar.lean to prevent 'inlinefunction' concatenation
- Emit inline attribute when formatting functions that have it
- Use prettyName for quantifier variables instead of generated __qN names
- Add overflow predicate support: grammar entries, formatter handling,
  parser entries, and Factory operations for Bv32 overflow predicates
- Add roundtrip test infrastructure for Core programs
- Update test expected outputs to reflect prettyName fix
Resolve conflicts between the formatProgram roundtrip fix and main's
refactoring (FormatCore extraction, recFnDeclToCST, procedure syntax).

- Apply prettyName and inline? fixes to FormatCore.lean
- Use proper overflow CST constructors instead of approximations
- Remove duplicate overflow declarations in Factory.lean
- Update test expected outputs for combined changes
- Fix RoundtripTest to handle new 'program Core;' header
Copy link
Copy Markdown
Contributor Author

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Good PR — the prettyName preservation, inline function formatting, and overflow predicate grammar entries are well-designed. The roundtrip test infrastructure is a valuable addition.

One substantive issue: the overflow ops in Translate.lean ignore the type parameter and hardcode bv32, which breaks roundtrip for non-32-bit overflow predicates.

Minor: the docstring on mkQuantVarName (line 86 of FormatCore.lean) says "In the future, we will store existing variable names in an extra field" — but this PR implements exactly that. Worth updating to reflect the current state.

Comment thread Strata/Languages/Core/DDMTransform/Translate.lean Outdated
Comment thread Strata/Languages/Core/DDMTransform/FormatCore.lean Outdated
… grammar entry

- Overflow ops in translateFn now match on type parameter (.some .bv1,
  .some .bv8, etc.) instead of using wildcard _, consistent with all
  other type-parameterized BV ops.
- Added bv_sdiv_overflow grammar entry so SDivOverflow formats as
  Bv.SDivOverflow(x, y) instead of the incorrect x <= y.
- Added overflow ops to the polymorphic binary dispatch in translateExpr.
- Updated editor syntax files with Bv.SDivOverflow.
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

Do all the follow-ups and update the PR description when done

@MikaelMayer MikaelMayer changed the title Core.formatProgram does not produce round-trip-parseable output for all constructs Core.formatProgram to produce round-trip-parseable output for all constructs May 14, 2026
… roundtrip

- Implement lhsArray (m[k] := v) in Translate.lean by decomposing nested
  LHS into base identifier + indices and building map_update expressions
- Add decomposeMapUpdate in FormatCore.lean to detect map_update patterns
  and format them back as lhsArray syntax for proper roundtrip
- Add seq_empty grammar entry with explicit type annotation syntax
  (Sequence.empty<T>()) and corresponding translate/format handling
- Add full roundtrip test for datatypes (parentheses issue is resolved)
- Add roundtrip tests for lhsArray and Sequence.empty
- Regenerate editor syntax files with Sequence.empty
- Update AdvancedMaps expected output to match new lhsArray formatting
@MikaelMayer
Copy link
Copy Markdown
Contributor Author

🤖 Done in 4683739. Addressed the follow-ups:

  1. lhsArray (m[k] := v) — Implemented in Translate.lean (decomposes nested LHS into base + indices, builds map_update) and FormatCore.lean (detects map_update pattern to produce lhsArray syntax for roundtrip).

  2. Sequence.empty — Added grammar entry with explicit type annotation syntax (Sequence.empty<int>()), plus translate and format handling.

  3. Datatype roundtrip — The extra parentheses issue is already resolved; added a full roundtrip test confirming it works.

  4. Bitvector widths beyond 1/8/16/32/64 — This remains a known limitation requiring grammar-level changes (new type entries per width). Not addressed in this PR.

All 598 test jobs pass.

Copy link
Copy Markdown
Contributor Author

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Well-structured PR that addresses multiple formatting roundtrip issues. The roundtrip test infrastructure is a good addition. The quantifier pretty-name preservation, inline attribute emission, overflow predicate grammar entries, and lhsArray detection are all solid improvements.

A few items to address:

  • Dead code added in Factory.lean (bv32NegOverflowFunc/bv32NegOverflowOp) — the macro already generates these.
  • UNegOverflow is conflated with SNegOverflow in the formatter/parser roundtrip (they use different semantics: BitVec.negOverflow vs BitVec.unegOverflow).
  • decomposeMapUpdate docstring says "nested" but only handles single-level.

Comment thread Strata/Languages/Core/Factory.lean Outdated
def bv64Extract_15_0_Op := bv64Extract_15_0_Func.opExpr
def bv64Extract_7_0_Op := bv64Extract_7_0_Func.opExpr

def bv32NegOverflowFunc : WFLFunc CoreLParams :=
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 bv32NegOverflowFunc and bv32NegOverflowOp are dead code — they are never referenced anywhere. The macro DefBVOpFuncExprs [1, 8, 16, 32, 64] already generates bv32SNegOverflowOp (from the "SNegOverflow" spec in BVOpSpecs). These manual definitions should be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Removed bv32NegOverflowFunc and bv32NegOverflowOp in c3b37cd.

-- Overflow predicates: approximated as Bool.Not for CST printing
| .bv ⟨_, .SNegOverflow⟩ | .bv ⟨_, .UNegOverflow⟩ => pure (.not default arg)
-- Overflow predicates
| .bv ⟨w, .SNegOverflow⟩ | .bv ⟨w, .UNegOverflow⟩ =>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 SNegOverflow and UNegOverflow are mapped to the same grammar entry (bv_neg_overflow), but the parser always maps bv_neg_overflow back to Core.bv{N}SNegOverflowOp. Since these use different evaluation functions (BitVec.negOverflow vs BitVec.unegOverflow), a UNegOverflow expression loses its semantics after a roundtrip. Consider adding a separate bv_uneg_overflow grammar entry (e.g., "Bv.UNegOverflow") to preserve the distinction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Added separate bv_uneg_overflow grammar entry (Bv.UNegOverflow) and updated FormatCore to use it for UNegOverflow, while bv_neg_overflow now formats as Bv.SNegOverflow. Parser mappings added in Translate.lean. c3b37cd

modify (·.pushBoundVar name.val)
pure (.funcDecl_statement default name typeArgs b r preconds bodyExpr inline?)

/-- Decompose nested `map_update(base, idx, val)` where `base` is (or starts
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 The docstring says "Decompose nested map_update" but the implementation only handles single-level (base must be an fvar). For nested m[k1][k2] := v, this returns some ([k1], update(select(m, k1), k2, v)) which formats as m[k1] := (m[k1])[k2:=v] rather than m[k1][k2] := v. Consider updating the docstring to say "single-level" to match the actual behavior, or implementing the recursive case by checking if val is itself a map_update whose base is a select on the same variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Updated docstring to say "single-level" in c3b37cd.

…de, fix docstring

- Add separate bv_uneg_overflow grammar entry so UNegOverflow roundtrips
  correctly (previously both mapped to bv_neg_overflow losing semantics)
- Remove dead bv32NegOverflowFunc/bv32NegOverflowOp definitions
- Fix decomposeMapUpdate docstring to say single-level instead of nested
- Regenerate editor syntax files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Core.formatProgram does not produce round-trip-parseable output for all constructs

1 participant