Skip to content

Document the Python front-end#1136

Open
tautschnig wants to merge 1 commit into
main2from
tautschnig/python-front-end-doc
Open

Document the Python front-end#1136
tautschnig wants to merge 1 commit into
main2from
tautschnig/python-front-end-doc

Conversation

@tautschnig
Copy link
Copy Markdown
Contributor

Adds two complementary documents:

  • docs/PythonFrontend.md — a narrative, human-oriented reference describing how Python source is turned into a Core program. Covers the .py → Laurel → Core → SMT/GOTO pipeline, the Any-boxed value encoding (with the datatype definition), the PySpec mechanism for external libraries such as boto3, the supported Python subset and the constructs that are rejected or collapsed, and the soundness posture (tag checks are the contract, PySpec is trusted, aliasing is a known unsoundness). It includes a 'where to look in the code' table linking to the main Lean files involved at each stage.

  • .kiro/steering/python-frontend-debugging.md — a compact playbook for agents debugging pyAnalyzeLaurel / pyAnalyzeLaurelToGoto failures. Scoped via fileMatch so it activates only when editing Strata/Languages/Python/, Strata/Languages/Laurel/, or the corresponding tests. Lists: a pipeline map correlating error prefixes with the stage that emits them; the seven invariants of the Any-boxed encoding; catalogued failure classes ('Resolution failed', 'could not infer type', 'calls to procedures in functions or contracts', 'holes should have been eliminated', 'N passed, M inconclusive', false alarms on clean benchmarks, GOTO-side issues) with where-to-look pointers; a pre-edit checklist; and the architectural limits coming from Strata's type system that should not be worked around locally.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds documentation for Strata’s Python front-end: an end-to-end narrative of the .py → Laurel → Core → SMT/GOTO pipeline plus an agent-facing debugging playbook scoped to the relevant Lean code areas.

Changes:

  • Added docs/PythonFrontend.md describing the pipeline, the Any boxing encoding, PySpec integration, supported/rejected Python constructs, and soundness assumptions.
  • Added .kiro/steering/python-frontend-debugging.md as a troubleshooting map for common pyAnalyzeLaurel / pyAnalyzeLaurelToGoto failures, with stage pointers and invariants.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
docs/PythonFrontend.md New narrative reference for how Python is translated and verified/compiled within Strata.
.kiro/steering/python-frontend-debugging.md New scoped debugging playbook for investigating Python front-end pipeline failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/PythonFrontend.md Outdated
Comment thread docs/PythonFrontend.md Outdated
Comment thread docs/PythonFrontend.md Outdated
Comment thread .kiro/steering/python-frontend-debugging.md Outdated
Adds two complementary documents:

* docs/PythonFrontend.md — a narrative, human-oriented reference
  describing how Python source is turned into a Core program.
  Covers the .py → Laurel → Core → SMT/GOTO pipeline, the Any-boxed
  value encoding (with the datatype definition), the PySpec mechanism
  for external libraries such as boto3, the supported Python subset
  and the constructs that are rejected or collapsed, and the soundness
  posture (tag checks are the contract, PySpec is trusted, aliasing
  is a known unsoundness). It includes a 'where to look in the code'
  table linking to the main Lean files involved at each stage.

* .kiro/steering/python-frontend-debugging.md — a compact playbook
  for agents debugging pyAnalyzeLaurel / pyAnalyzeLaurelToGoto
  failures. Scoped via fileMatch so it activates only when editing
  Strata/Languages/Python/**, Strata/Languages/Laurel/**, or the
  corresponding tests. Lists: a pipeline map correlating error
  prefixes with the stage that emits them; the seven invariants of
  the Any-boxed encoding; catalogued failure classes ('Resolution
  failed', 'could not infer type', 'calls to procedures in functions
  or contracts', 'holes should have been eliminated', 'N passed, M
  inconclusive', false alarms on clean benchmarks, GOTO-side issues)
  with where-to-look pointers; a pre-edit checklist; and the
  architectural limits coming from Strata's type system that should
  not be worked around locally.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the tautschnig/python-front-end-doc branch from 3c74dae to d6f474a Compare May 7, 2026 12:04
Comment thread docs/PythonFrontend.md

## Overview

The Python front-end translates annotated Python programs into Strata's Core
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

programs annotated with what kind of information ?

Comment thread docs/PythonFrontend.md
verifier (`pyAnalyzeLaurel`) or compiled through to CBMC GOTO programs
(`pyAnalyzeLaurelToGoto`).

The front-end is intentionally scoped. It does not attempt to model arbitrary
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: the front-end is scoped down to a fragment of python

Comment thread docs/PythonFrontend.md
procedural code that invokes library APIs described by PySpec, raises a
handful of exception types, and avoids the more dynamic corners of the
language. Within that subset the aim is *sound* analysis — if the tool
reports "no bugs" then, modulo known specification gaps, there really are
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo known specification gaps
this introduces doubt, are these gaps in the semantics of the tool accepts as input ? If we know the gaps then why not scope down the fragment even further to reject these constructs and give unconditionally sound results ?

Comment thread docs/PythonFrontend.md
1. The pipeline from a Python `.py` file to a Core program.
2. The `Any`-boxed value encoding that underpins everything the front-end
produces.
3. The PySpec mechanism that describes external library functions.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the pyspec .ion files are contracts for all external dependencies of the .py file on this diagram ?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this asymetry between .py and .ion files ? why not use regular python signatures with contract annotations to specify external dependencies ?

Comment thread docs/PythonFrontend.md

* A Python `int` literal `42` becomes `from_int(42)`.
* `x + y` on Python `int`s becomes
`from_int(Any..as_int!(x) + Any..as_int!(y))`, bracketed by tag checks.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is any of that boxing/unboxing statically resolved before or during the translation to SMT or GOTO or do we generate quantifier heavy SMT encodings and let the SMT solver resolve all boxing and unboxing ?

Comment thread docs/PythonFrontend.md

Mutable containers (`list`, `dict`) are encoded as *value-semantics*
inductive datatypes (`ListAny`, `DictStrAny`), not as heap references. This
means the model does not capture aliasing: after `c = a; a.append(4)`, the
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a hard limitation or do we have a path to soundness on this point ?

Comment thread docs/PythonFrontend.md
* **Tag checks are the contract.** Every place a Core expression consumes a
specific constructor (e.g. `as_int`) must be preceded by a Strata assertion
that the corresponding `isfrom_*` holds. The translator is responsible for
inserting these; a missing one is a soundness bug.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's our path to soundness here ? lean proofs about how the translator inserts tag checking assertions ?

Comment thread docs/PythonFrontend.md
postcondition, the verifier trusts it. Specs generated from upstream API
descriptions inherit the same trust.
* **Holes are sound by construction.** Each hole becomes a fresh
uninterpreted procedure; the verifier treats its outputs as arbitrary
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uninterpreted as in x = y => f(x) = f(y) ? is this really sound ? what if the external dep is stateful ?

Comment thread docs/PythonFrontend.md
uninterpreted procedure; the verifier treats its outputs as arbitrary
values consistent with their type. This is sound but often imprecise — a
typical "inconclusive" result traces back to a hole.
* **Aliasing is unsound** as noted above.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the path to soundness there ?

Comment thread docs/PythonFrontend.md
values consistent with their type. This is sound but often imprecise — a
typical "inconclusive" result traces back to a hole.
* **Aliasing is unsound** as noted above.
* **Everything else follows from Strata's Core type system**, whose
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we elaborate on what the "everything else" means here from a python semantics perspective ? does it mean that once we've reached strata core nothing else can go wrong ? i.e. any unexpected results is necessarily a bug in the upper layers ?

ssomayyajula added a commit to ssomayyajula/Strata that referenced this pull request May 8, 2026
…, strata-org#1144)

Add section explaining why existing documentation efforts (PRs strata-org#1136 and
strata-org#1144) are complementary but insufficient: they document WHAT the pipeline
does but don't specify WHEN coercions fire, WHAT constitutes valid
intermediate output, or HOW to arbitrate design disagreements.

Positions both efforts as valuable (document as-is) while distinguishing
the architecture spec (document should-be, precise enough for mechanical
implementation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ssomayyajula added a commit to ssomayyajula/Strata that referenced this pull request May 8, 2026
Reframe as "complementary, different purpose" rather than "insufficient."
Position PRs strata-org#1136/strata-org#1144 as valuable (onboarding, debugging) while
distinguishing the architecture spec's prescriptive role.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tautschnig tautschnig changed the base branch from main to main2 May 15, 2026 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants