Skip to content

Separate type-checking, partial evaluation and ANF into Core-to-Core phases#753

Merged
MikaelMayer merged 195 commits into
mainfrom
issue-749-partial-evaluation-deduplication-could-b
May 1, 2026
Merged

Separate type-checking, partial evaluation and ANF into Core-to-Core phases#753
MikaelMayer merged 195 commits into
mainfrom
issue-749-partial-evaluation-deduplication-could-b

Conversation

@MikaelMayer
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer commented Apr 3, 2026

Summary

Resolves #749. Separates the verification pipeline into distinct Core-to-Core phases: type checking, symbolic evaluation, and ANF encoding. Proof obligation extraction is a separate step on the final Core tree.

Problem

On main, the partial evaluator, type checker, and ANF deduplication are interleaved in a single evaluation pass. The ANF (common subexpression elimination) lives inside the SMT encoder as define-fun emission. This makes each concern harder to reason about, test, and extend independently.

Solution

The pipeline is now:

transforms → typeCheck → toCoreProofObligationProgram → ANFEncoder → ObligationExtraction → SMT Encoder
  • Type checking and symbolic evaluation are proper PipelinePhase entries in the pipeline loop, with phase-specific error messages and source location propagation.
  • Symbolic evaluation (toCoreProofObligationProgram) produces an obligations program: a Core program where each procedure body is a tree of assume/assert/if * blocks representing proof obligations, with function declarations and axioms inlined. Path conditions are emitted in logical order: spec-level preconditions first, then branch conditions (achieved by reversing the path condition stack before flattening). The obligation procedure uses the first procedure name as a representative label; downstream phases (ObligationExtraction) walk the body and ignore the header name.
  • ANF encoding is a Core-to-Core transformation that extracts common subexpressions into var declarations using hash-based duplicate detection. Expression hashing uses shared per-constructor hash combiners (hashConst, hashBVar, hashFVar, hashOp, hashApp, hashIte, hashEqExpr, hashAbs, hashQuantExpr, hashOptTy) defined once in LExpr.lean and reused by hashExpr, replaceExprs, and collectSubexprHashes. Both replaceExprs and collectSubexprHashes compute hashes bottom-up, reusing child hashes from recursive calls to avoid redundant traversals. LConst derives Hashable so constants are hashed structurally rather than via toString. The phase returns false when no deduplication targets are found, enabling fixed-point convergence.
  • Obligation extraction is a simple linear pass that extracts proof obligations with reconstructed path conditions. It accepts only the minimal Core subset produced by the PE and ANF phases: assume, assert, cover, var declarations, and nondet if *. Any other statement type produces an error. Path conditions can contain variable definitions (including ANF-generated ones).
  • SMT encoder emits define-fun for ANF variable definitions (path condition entries starting with $__anf.), treating them as macro expansions rather than equality constraints. ANF variables are filtered from UF declarations, get-value ids, and model variable lists.

The pipeline error type is DiagnosticModel (not String), so source locations propagate through all phases.

API changes

  • buildEvalEnv and buildSMTEnv combined into buildEnv with registerCustomFunctions : Bool parameter
  • symbolicEval renamed to toCoreProofObligationProgram
  • typeCheckAndEval restored to its original signature returning (List Env) × Statistics; the new typeCheckAndBuildObligationProgram returns Program × Statistics
  • toSMTTermString renamed to toSMTCommandsWithAssert
  • Statement.collectExprs simplified to return expressions directly (no collect parameter)
  • ProofObligation.toSMTTerms now returns ANF definitions separately as ANFDefinition structs
  • encodeCore accepts an anfDefinitions parameter for define-fun emission
  • Statement.mapExprs refactored to delegate structural recursion to Stmt.mapExpr/Block.mapExpr (generic over command types) and command-level mapping to Command.mapExpr, reducing duplication
  • SMT encoder builds lists in reverse and reverses at the end for O(n) list construction

Utilities

  • Stmt.mapExpr, Block.mapExpr — generic expression mapping at the imperative layer, reusable across languages.
  • Command.mapExpr — Core-specific command expression mapping.
  • mapExprs, collectExprs on Statement — exhaustive (no catch-all patterns), non-partial, with a proved identity theorem (mapExprs_id).
  • isLeaf, hasBVar, hashExpr moved to LExpr.lean.
  • Per-constructor hash combiners extracted as standalone functions in LExpr.lean, shared across hashExpr and ANF encoder traversals.
  • List.findDuplicates — generic duplicate detection utility in Strata/Util/List.lean.

Introduce a new deduplication pass that extracts common subexpressions
from procedure bodies into var declarations after partial evaluation.

The pass operates at two levels:
- Program level: walks procedure bodies, finds duplicated subexpressions,
  and hoists them into var declarations prepended to the body.
- Proof obligation level: extracts common subexpressions from a single
  proof obligation's assumptions and obligation expression.

The program-level deduplication is integrated into the verification
pipeline behind the --deduplicate flag (off by default). When enabled,
it transforms the program representation after partial evaluation,
preparing for the future separation of proof obligation emission.

New files:
- Strata/Transform/Deduplication.lean: Core deduplication logic
- StrataTest/Transform/DeduplicationTests.lean: Unit tests

Also adds deduplicateExprs option to VerifyOptions and --deduplicate
CLI flag to the verify command.
…atic list building

- Add loop statement handling in collectExprsFromStatement and mapExprsInStatement
- Extract shared findDeduplicationTargets pipeline to eliminate duplication
  between obligation-level and program-level deduplication
- Unify collectFromStatements/replaceInStatements into mapExprsInStatements
  and collectExprsFromStatements using consistent case coverage
- Fix quadratic list building: use reverse-and-reverse pattern in
  deduplicateBody and deduplicateProgram
- Fix uncurry to accumulate arguments in correct order without appending
- Make getExprType? non-partial (structurally recursive)
MikaelMayer

This comment was marked as resolved.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

…o-Core pass

Remove deduplicateObligation and its tests. The deduplication pass operates
only at the program level (deduplicateProgram), which is the correct approach:
after deduplication, proof obligation extraction becomes a simple tree traversal
collecting individual goals from if/else trees with no SMT-to-SMT optimization.
@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

…ffolding

- Remove eraseTypes from StmtsStack.push/appendToTop so the PE output
  program retains type annotations. This improves deduplication (dedup
  variables now get proper types like 'int' instead of 'α') and is a
  prerequisite for future obligation extraction from the program structure.

- Add ObligationExtraction module: a Core-to-obligations pass that walks
  a post-PE program and reconstructs proof obligations with path conditions
  from the program structure (assume statements + ITE branch conditions).
  This is scaffolding for the PE separation described in the review.

- Update test expected outputs to reflect preserved type annotations.
@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

aqjune-aws
aqjune-aws previously approved these changes Apr 30, 2026
Copy link
Copy Markdown
Contributor

@joscoh joscoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question (about procName), the other comments are not blocking but would be nice.

Comment thread Strata/Languages/Core/Core.lean
Comment thread Strata/Transform/ANFEncoder.lean Outdated
Comment thread Strata/Util/List.lean
Comment thread Strata/Languages/Core/StatementSemanticsProps.lean Outdated
@MikaelMayer

This comment has been minimized.

@MikaelMayer

This comment has been minimized.

…luation-deduplication-could-b

# Conflicts:
#	StrataTest/Languages/Core/Tests/ProgramTypeTests.lean
@MikaelMayer MikaelMayer enabled auto-merge May 1, 2026 17:20
@MikaelMayer MikaelMayer added this pull request to the merge queue May 1, 2026
Merged via the queue into main with commit f714237 May 1, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Partial evaluation + deduplication could be two phases, not built into the partial evaluator

4 participants