Separate type-checking, partial evaluation and ANF into Core-to-Core phases#753
Merged
MikaelMayer merged 195 commits intoMay 1, 2026
Merged
Conversation
Introduce a new deduplication pass that extracts common subexpressions from procedure bodies into var declarations after partial evaluation. The pass operates at two levels: - Program level: walks procedure bodies, finds duplicated subexpressions, and hoists them into var declarations prepended to the body. - Proof obligation level: extracts common subexpressions from a single proof obligation's assumptions and obligation expression. The program-level deduplication is integrated into the verification pipeline behind the --deduplicate flag (off by default). When enabled, it transforms the program representation after partial evaluation, preparing for the future separation of proof obligation emission. New files: - Strata/Transform/Deduplication.lean: Core deduplication logic - StrataTest/Transform/DeduplicationTests.lean: Unit tests Also adds deduplicateExprs option to VerifyOptions and --deduplicate CLI flag to the verify command.
…atic list building - Add loop statement handling in collectExprsFromStatement and mapExprsInStatement - Extract shared findDeduplicationTargets pipeline to eliminate duplication between obligation-level and program-level deduplication - Unify collectFromStatements/replaceInStatements into mapExprsInStatements and collectExprsFromStatements using consistent case coverage - Fix quadratic list building: use reverse-and-reverse pattern in deduplicateBody and deduplicateProgram - Fix uncurry to accumulate arguments in correct order without appending - Make getExprType? non-partial (structurally recursive)
…ould-b # Conflicts: # Strata.lean
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…o-Core pass Remove deduplicateObligation and its tests. The deduplication pass operates only at the program level (deduplicateProgram), which is the correct approach: after deduplication, proof obligation extraction becomes a simple tree traversal collecting individual goals from if/else trees with no SMT-to-SMT optimization.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…ffolding - Remove eraseTypes from StmtsStack.push/appendToTop so the PE output program retains type annotations. This improves deduplication (dedup variables now get proper types like 'int' instead of 'α') and is a prerequisite for future obligation extraction from the program structure. - Add ObligationExtraction module: a Core-to-obligations pass that walks a post-PE program and reconstructs proof obligations with path conditions from the program structure (assume statements + ITE branch conditions). This is scaffolding for the PE separation described in the review. - Update test expected outputs to reflect preserved type annotations.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
aqjune-aws
previously approved these changes
Apr 30, 2026
…luation-deduplication-could-b
joscoh
reviewed
May 1, 2026
Contributor
joscoh
left a comment
There was a problem hiding this comment.
One question (about procName), the other comments are not blocking but would be nice.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…luation-deduplication-could-b # Conflicts: # StrataTest/Languages/Core/Tests/ProgramTypeTests.lean
joscoh
approved these changes
May 1, 2026
aqjune-aws
approved these changes
May 1, 2026
…luation-deduplication-could-b
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Resolves #749. Separates the verification pipeline into distinct Core-to-Core phases: type checking, symbolic evaluation, and ANF encoding. Proof obligation extraction is a separate step on the final Core tree.
Problem
On main, the partial evaluator, type checker, and ANF deduplication are interleaved in a single evaluation pass. The ANF (common subexpression elimination) lives inside the SMT encoder as
define-funemission. This makes each concern harder to reason about, test, and extend independently.Solution
The pipeline is now:
PipelinePhaseentries in the pipeline loop, with phase-specific error messages and source location propagation.toCoreProofObligationProgram) produces an obligations program: a Core program where each procedure body is a tree ofassume/assert/if *blocks representing proof obligations, with function declarations and axioms inlined. Path conditions are emitted in logical order: spec-level preconditions first, then branch conditions (achieved by reversing the path condition stack before flattening). The obligation procedure uses the first procedure name as a representative label; downstream phases (ObligationExtraction) walk the body and ignore the header name.vardeclarations using hash-based duplicate detection. Expression hashing uses shared per-constructor hash combiners (hashConst,hashBVar,hashFVar,hashOp,hashApp,hashIte,hashEqExpr,hashAbs,hashQuantExpr,hashOptTy) defined once inLExpr.leanand reused byhashExpr,replaceExprs, andcollectSubexprHashes. BothreplaceExprsandcollectSubexprHashescompute hashes bottom-up, reusing child hashes from recursive calls to avoid redundant traversals.LConstderivesHashableso constants are hashed structurally rather than viatoString. The phase returnsfalsewhen no deduplication targets are found, enabling fixed-point convergence.assume,assert,cover,vardeclarations, and nondetif *. Any other statement type produces an error. Path conditions can contain variable definitions (including ANF-generated ones).define-funfor ANF variable definitions (path condition entries starting with$__anf.), treating them as macro expansions rather than equality constraints. ANF variables are filtered from UF declarations,get-valueids, and model variable lists.The pipeline error type is
DiagnosticModel(notString), so source locations propagate through all phases.API changes
buildEvalEnvandbuildSMTEnvcombined intobuildEnvwithregisterCustomFunctions : BoolparametersymbolicEvalrenamed totoCoreProofObligationProgramtypeCheckAndEvalrestored to its original signature returning(List Env) × Statistics; the newtypeCheckAndBuildObligationProgramreturnsProgram × StatisticstoSMTTermStringrenamed totoSMTCommandsWithAssertStatement.collectExprssimplified to return expressions directly (nocollectparameter)ProofObligation.toSMTTermsnow returns ANF definitions separately asANFDefinitionstructsencodeCoreaccepts ananfDefinitionsparameter fordefine-funemissionStatement.mapExprsrefactored to delegate structural recursion toStmt.mapExpr/Block.mapExpr(generic over command types) and command-level mapping toCommand.mapExpr, reducing duplicationUtilities
Stmt.mapExpr,Block.mapExpr— generic expression mapping at the imperative layer, reusable across languages.Command.mapExpr— Core-specific command expression mapping.mapExprs,collectExprsonStatement— exhaustive (no catch-all patterns), non-partial, with a proved identity theorem (mapExprs_id).isLeaf,hasBVar,hashExprmoved toLExpr.lean.LExpr.lean, shared acrosshashExprand ANF encoder traversals.List.findDuplicates— generic duplicate detection utility inStrata/Util/List.lean.