OpenAgentsInc/psionic
Psionic

Psionic is a Rust-native ML and inference stack.

It owns the machine-facing execution substrate behind local inference, serving, training, distributed execution, artifact truth, and clustered compute. The project is broader than one app or one benchmark lane. It is the crate family that OpenAgents uses for inference, training, cluster bring-up, and execution evidence.

Psionic should be read hardware-first. It owns the admitted hardware strategy for each lane: backend family, residency mode, topology, serving or training role, and the capability, refusal, and evidence surfaces that higher layers consume. Upstream systems such as llama.cpp, vLLM, SGLang, MLX, and other reference repos are inputs for specific layers or hardware classes, not the identity of the shipped Psionic stack.

The training side now also carries one bounded gemma4:e4b CUDA adapter-SFT trainer above the shared adapter substrate: LM-head-only final-hidden-state supervision, frozen-base semantics, typed export, exact checkpoint resume, served-base plus tokenizer compatibility checks, and explicit refusal truth for wider Gemma regions that remain out of scope.

The same bounded lane also closes the first trainer-to-serving refresh seam: typed Gemma checkpoints plus exported adapter artifacts can be revalidated into the live CUDA mesh lane without a process restart, the active served revision is surfaced in response provenance, stale or mismatched revisions fail closed, and operators can roll back to the last known-good promoted revision.

The same lane is also eval-first: it binds one canonical held-out eval pack, one four-split dataset contract, one short baseline sweep against the untuned base, one overlap and decontamination gate, one canned promoted-checkpoint vibe-review packet, and one promotion decision that refuses held-out regressions or failed operator review.

Start Here

Main Tracks

Psion Training Shortcut

If you want the current top Psion training lane instead of guessing among benchmark-adjacent lanes, run:

./TRAIN

That command now targets the actual Psion pretraining lane and materializes the retained launch, status, preflight, checkpoint, dashboard, alert, and closeout surfaces under ~/scratch/psion_actual_pretraining_runs/<run_id>.
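As orientation only, the retained surfaces under that run root can be sketched roughly as follows, assembled from the paths named elsewhere in this README; treat it as a hypothetical overview, not a guaranteed or complete layout:

```
~/scratch/psion_actual_pretraining_runs/<run_id>/
├── status/
│   ├── psionic_train_run_status_packet.json
│   ├── psionic_train_window_status_packet.json
│   ├── membership_revision_receipt.json
│   ├── membership_revisions/
│   └── checkpoint_surface.json
├── checkpoints/
│   ├── latest_accepted_checkpoint_pointer.json
│   └── manifests/checkpoint_manifest_step-<optimizer_step>.json
└── windows/<window_id>/
    ├── window_execution.json
    ├── sealed_window_bundle.json
    └── validators/<challenge_id>/
        ├── validator_score_artifact.json
        └── validator_score_receipt.json
```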

Use:

./TRAIN --dry-run
./TRAIN resume --run-root <path>
./TRAIN status --run-root <path>
./TRAIN rehearse-base-lane --planned-interruption-step <n>

for plan inspection and operator follow-up on the actual lane.

For machine supervision, use the typed runtime surface instead of the shell wrapper:

cargo run -q -p psionic-train --bin psionic-train -- manifest --manifest <path-to-psionic.train.invocation_manifest.v1.json>
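Purely as a rough illustration of the kind of fields involved: the field names below are all taken from this README, but the actual schema and nesting of psionic.train.invocation_manifest.v1 are not reproduced here, so treat this JSON as a hypothetical sketch rather than a valid manifest:

```json
{
  "schema": "psionic.train.invocation_manifest.v1",
  "coordination_envelope": {
    "node_pubkey": "<admitted-node-pubkey>",
    "window_id": "<window-id>",
    "assignment_id": "<assignment-id>"
  },
  "work_class": "adapter_training",
  "grouped_stage_assignment": {
    "replica_id": "<replica-id>",
    "stage_id": "<stage-id>",
    "stage_index": 0,
    "stage_count": 4,
    "stage_role": "ingress"
  }
}
```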

The actual CUDA lane now also has one strong-node packaging contract above that same runtime in crates/psionic-train/src/psion_actual_pretraining_launcher.rs: psion.actual_pretraining_automatic_execution_request.v1 compiles assigned actual-lane work into the generic invocation manifest, and psion.actual_pretraining_automatic_execution_outputs.v1 names the deterministic retained status, checkpoint, and window-artifact paths that execution will materialize under the run root.

That manifest now carries one frozen coordination envelope, including the admitted node_pubkey, plus one admitted release/build/environment identity that the runtime checks before launch. Recovery-source manifests can also carry one peer_node_pubkey for the machine-only serve-checkpoint operation and one peer_checkpoint_handoff_receipt artifact binding that seeds a joiner's local checkpoint tree before resume runs. Validator manifests are admitted for one machine-only validate-contribution operation as well, using one validator_target_contribution_receipt artifact binding plus one validator_target_contribution_artifact_manifest artifact binding to replay one retained worker contribution artifact set into one local validator score receipt.

The emitted psionic.train.status_packet.v1 packet also carries the resolved runtime attestation and the retained absolute paths for status/psionic_train_run_status_packet.json and status/psionic_train_window_status_packet.json. When a run root exists, the runtime persists one psionic.train.membership_revision_receipt.v1 receipt at status/membership_revision_receipt.json and appends revision history under status/membership_revisions/, so the local worker heartbeat, drain, rejoin, replace, and failed-session posture remain machine-visible.

That same machine contract now admits one second bounded lane, psion_apple_windowed_training_v1, for homogeneous Apple Silicon / Metal windowed training. The Apple lane uses the same invocation manifest, status packets, membership receipt, contribution artifacts, validator replay entrypoint, and peer handoff flow, but it retains generic checkpoint artifacts under checkpoints/latest_accepted_checkpoint_pointer.json plus checkpoints/manifests/checkpoint_manifest_step-<optimizer_step>.json using psionic.train.checkpoint_pointer.v1 and psionic.train.checkpoint_manifest.v1. That is intentionally narrower than the actual pretraining lane: it is one admitted machine lane for backend-homogeneous Apple windows, not a claim that the broader CUDA actual-pretraining operator contract is now portable across backend families. Apple validator replay uses that same retained checkpoint surface, and Apple resume fails closed unless one admitted checkpoint was already retained locally or materialized from one peer handoff receipt first.

The same machine runtime also persists one psionic.train.checkpoint_surface.v1 snapshot at status/checkpoint_surface.json so supervisors can read the latest checkpoint pointer state, checkpoint-manifest digest, backup receipt posture, upload outcome, and auto-resume recovery result without reopening the full retained actual-lane tree. The run/window status packets repeat the absolute paths for that surface plus the latest checkpoint manifest, backup receipt, pointer, peer handoff receipt, auto-resume receipt, and validator score receipt when those artifacts exist.

Validator replay retains the score surfaces under windows/<window_id>/validators/<challenge_id>/validator_score_artifact.json and validator_score_receipt.json. The machine validator contract is covered by focused unit tests over disposition classification and by subprocess CLI tests over stale assignment, missing replay inputs, and artifact-digest drift refusals.

When the admitted coordination envelope also carries window_id and assignment_id, the same machine runtime materializes one deterministic window artifact family under windows/<window_id>/: one retained window_execution.json, one per-contribution artifact_manifest.json, one per-contribution contribution_receipt.json, and one rollup sealed_window_bundle.json. The run/window status packets repeat the absolute paths for those retained window surfaces too, so supervisors can follow one declared assignment through the local retained bundle set without re-scanning the whole run root.

The same machine contract also admits one optional grouped_stage_assignment payload for grouped replicas. When that payload is present, the manifest, run/window status packets, and retained window artifacts all carry one explicit replica_id, stage_id, stage_index, stage_count, stage_role, and canonical assignment digest, so a weak-device stage is not flattened into the same contributor identity as a single-node worker window. Non-ingress grouped stages require one admitted grouped_stage_input_transport_path that points at a validated upstream handoff envelope, and every stage with a downstream neighbor emits one deterministic grouped_stage_output_transport.json plus grouped_stage_output_payload.json under its retained contribution root for the next stage to consume. The retained contribution root also carries one deterministic grouped_stage_execution_summary.json that binds the grouped assignment, accepted input/output transport digests, and local outcome into one machine-legible artifact, and validator replay emits one paired grouped_stage_replay_evidence.json when it verifies that grouped stage surface against the challenged receipt and artifact manifest.

Grouped stage checkpointing preserves that same scope through the generic checkpoint surface: the retained checkpoint pointer and manifest carry the window_id, assignment_id, and full grouped_stage_assignment; peer handoff receipts repeat that grouped metadata; and grouped resume refuses to seed from any retained or handed-off checkpoint whose stage identity drifts from the requested worker window. Every grouped resume also writes one deterministic checkpoints/grouped_stage_recovery_receipt.json so supervisors can see whether the resumed stage came from one retained checkpoint or one peer checkpoint handoff without re-inferring that from process-local state.

The machine contract also carries one explicit work_class across the invocation manifest, final process packet, retained run/window status packets, window execution, contribution receipt, contribution artifact manifest, and sealed-window rollup. Validator manifests must declare work_class=validation_replay plus one admitted validator_target_work_class for the challenged contribution, and validator replay currently admits adapter_training, small_model_local_training, grouped_replica_stage_execution, and full_island_local_update_training targets without claiming full gradient recomputation. Retained validator score artifacts and receipts record both the validator work class and the challenged work class plus the exact replay hooks that were verified: assignment correctness, checkpoint lineage, work-execution plausibility, update integrity, and grouped-stage integrity when the challenged work class is one grouped replica stage. Grouped-stage validator replay also fails closed as ArtifactIncomplete whenever the retained grouped_stage_execution_summary.json surface is missing, so a replacement validator cannot silently score one grouped stage without the retained transport-and-execution evidence bundle.

Validator replay also retains one deterministic validator_quality_drift_signal.json plus one paired validator_rollback_signal.json under each validator root. Those signal artifacts carry one monotonic validation_index, the previous retained score/disposition, score delta, degraded-window count, non-accepted-window count, and the latest accepted baseline window when one exists. The rollback signal does not execute rollback itself; it emits one machine-readable hold or candidate posture that later Nexus or scheduler policy can consume without re-scanning validator history, and the run/window status packets repeat both signal paths beside the validator score receipt.

That same Apple grouped-stage path also has one narrow weak-device accepted outcome proof surface. The record_psionic_train_weak_device_accepted_outcome_proof() helper consolidates one accepted Metal grouped-stage contribution, replay evidence, quality-drift/rollback signals, and grouped checkpoint lineage into psionic.train.weak_device_accepted_outcome_proof.v1 with an explicit claim boundary that stops short of payout or network-wide finality. The subprocess CLI test apple_grouped_stage_records_weak_device_accepted_outcome_proof exercises that bundle against real retained artifacts instead of synthetic fixtures.
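The artifact-digest drift refusal described above can be illustrated with a minimal shell sketch. This is not the actual validator implementation; the file names and digest tooling here are illustrative only:

```shell
# Hypothetical sketch of a fail-closed digest check: a validator refuses
# to score a contribution whose retained artifact no longer matches the
# digest sealed into its receipt.
workdir="$(mktemp -d)"
printf 'retained contribution payload' > "$workdir/artifact.json"

# Digest recorded at receipt-sealing time.
recorded="$(sha256sum "$workdir/artifact.json" | cut -d' ' -f1)"

# Simulate drift: the retained artifact changes after the receipt was sealed.
printf 'tampered payload' > "$workdir/artifact.json"
actual="$(sha256sum "$workdir/artifact.json" | cut -d' ' -f1)"

if [ "$actual" != "$recorded" ]; then
  echo "refused: artifact digest drift"
else
  echo "accepted"
fi
```

The real lane covers this refusal with subprocess CLI tests; the sketch only shows the fail-closed shape of the check.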

The older bounded reference pilot still exists as the smoke/reference lane:

./TRAIN --lane reference_pilot --dry-run
./TRAIN --lane reference_pilot --mode local_reference

The actual pretraining lane can now also retain one bounded multi-host distributed rehearsal segment on the same operator path:

./TRAIN rehearse-base-lane --remote-host archlinux --secondary-remote-host macbook-pro-m2

That proof is still bounded, but it now uses the actual workload rather than the smaller reference-only workload. The retained segment records the larger model id psion-compact-decoder-internal-v1, the canonical dataset identity psion_corpus_tokenized@v1, and one shared tri-host optimizer path across the local Apple-silicon machine, the Tailnet RTX 4080 host, and the admitted M2 Mac. It does not by itself claim the full broader actual-pretraining cluster lane.

The current production-candidate canary on that same path widens the bounded segment to 12 optimizer steps with 3 steps per window and 2 windows per cadence via:

PSION_REFERENCE_PILOT_MAX_STEPS=12 \
PSION_REFERENCE_PILOT_STEPS_PER_WINDOW=3 \
PSION_REFERENCE_PILOT_WINDOWS_PER_CADENCE=2 \
./TRAIN rehearse-base-lane --remote-host archlinux --secondary-remote-host macbook-pro-m2 --cleanup-remote

The current clean source-of-truth run for that canary is psion-actual-pretraining-tri-host-actual-prodcanary-zstd-clean-20260413t134400Z. It completed all 12 optimizer steps on the real actual workload with:

  • topology multi_host_joint_gradient_average
  • contributor count 3
  • contribution receipt count 36
  • retained progress checkpoint count 4
  • retained progress window count 4
  • retained progress cadence count 2
  • final cumulative train tokens processed 775
  • final cumulative mean tokens per second 16
  • accepted checkpoint label bounded-actual-pretraining-bringup-step-12

That retained evidence family now includes compact contribution summary receipts, contributor continuity proof, per-window progress checkpoint receipts, progress checkpoint artifacts, and cumulative throughput/token counters. The distributed exchange path now retains .json.zst step payloads and stages the repo to remotes as .tar.gz, which is what made the clean 12-step canary complete end to end. The claim boundary stays the same: this is still a bounded production-candidate canary, not the full continuous production cluster lane.
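The overrides above line up with the retained counters, assuming windows are simply steps divided by steps-per-window and cadences are windows divided by windows-per-cadence (an inference from the reported counts, not a documented formula):

```shell
# Sanity-check the canary shape implied by the environment overrides.
max_steps=12
steps_per_window=3
windows_per_cadence=2

windows=$((max_steps / steps_per_window))
cadences=$((windows / windows_per_cadence))

echo "windows=$windows cadences=$cadences"
# windows=4 cadences=2, matching the retained progress window count (4)
# and cadence count (2) reported for the clean canary run.
```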

Tassadar Training Shortcut

If you want the current default Tassadar training lane instead of guessing among older bounded benchmark lanes, run:

./TRAIN_TASSADAR

That command now launches the bounded trace-bound article-transformer weight-production lane, which produces the retained tassadar-article-transformer-trace-bound-trained-v0 family under fixtures/tassadar/runs/tassadar_article_transformer_weight_production_v1.

The lane contract lives in docs/TASSADAR_DEFAULT_TRAIN_LANE.md.

The operator launcher lives in docs/TASSADAR_TRAIN_LAUNCHER.md.

The bounded default-lane rehearsal lives in docs/TASSADAR_DEFAULT_TRAIN_REHEARSAL.md.

Tassadar Executor Lane

Executor-class research and runtime work for exact computation starts with docs/ROADMAP_TASSADAR.md.

Local GPT-OSS Inference

Psionic ships a dedicated local GPT-OSS server in crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs. It exposes:

  • GET /health
  • GET /v1/models
  • POST /v1/chat/completions

Build it:

cargo build -p psionic-serve --bin psionic-gpt-oss-server --release

Run it on a Linux NVIDIA host:

./target/release/psionic-gpt-oss-server \
  -m /path/to/gpt-oss-20b-mxfp4.gguf \
  --backend cuda \
  --host 127.0.0.1 \
  --port 8080 \
  -c 4096 \
  -ngl 999

Run it on Apple Silicon:

./target/release/psionic-gpt-oss-server \
  -m /path/to/gpt-oss-20b-mxfp4.gguf \
  --backend metal \
  --metal-mode native \
  --host 127.0.0.1 \
  --port 8080 \
  -c 1024 \
  -ngl 4

Call it:

curl -s http://127.0.0.1:8080/v1/models | jq

curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "gpt-oss-20b-mxfp4.gguf",
    "messages": [
      {"role": "system", "content": "You are ChatGPT."},
      {"role": "user", "content": "Why does HTTPS matter?"}
    ]
  }' | jq

Benchmark it against local llama.cpp:

scripts/benchmark-gpt-oss-vs-llama.sh \
  --psionic-backend cuda \
  --model /path/to/gpt-oss-20b-mxfp4.gguf \
  --llama-bin /path/to/llama-server \
  --json-out /tmp/psionic-gpt-oss-bench

More detail lives in docs/GPT_OSS_LOCAL_SERVING.md.

Installable Mesh Lanes

Psionic also ships crates/psionic-serve/src/bin/psionic-mesh-lane.rs as the supported service-mode entrypoint for durable inference-mesh nodes.

It materializes one lane root with config, file-backed node identity, durable network state, logs, model paths, and generated launchd / systemd service artifacts. openagents and probe integrate against that Psionic-owned service binary and its management surfaces directly; the supported pooled inference path does not depend on any separate mesh sidecar runtime. The full operator runbook lives in docs/MESH_LANE_SERVICE_MODE.md.
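The binary generates its own launchd / systemd artifacts, so the unit below is a hypothetical illustration only, with an invented install path and flag, meant to show the general shape of a durable service for such a node rather than the generated file:

```ini
# Hypothetical illustration -- psionic-mesh-lane generates the real unit.
[Unit]
Description=Psionic mesh lane node
After=network-online.target

[Service]
# --lane-root is an assumed flag name for illustration.
ExecStart=/usr/local/bin/psionic-mesh-lane --lane-root /var/lib/psionic-mesh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```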

GPT-OSS Benchmark Proof

The current benchmark harness is scripts/benchmark-gpt-oss-vs-llama.sh. It uses the explicit GPT-OSS system/developer/user request contract, checks visible output equality, and records prompt-cache-hit throughput.

The closed benchmark proof referenced publicly here is:

  • OpenAgents issue comment: openagents#3248 comment 4028968842
  • exact reported result on that host:
    • Psionic prompt_cache_hit: 172.84 tok/s
    • llama.cpp prompt_cache_hit: 160.98 tok/s
    • prompt_cache_hit_visible_output_match=true
    • visible output: HTTPS protects users by encrypting traffic, preventing tampering, and confirming they are connected to the right website.

That proof is grounded in the shipped server binary, the shipped benchmark script, and the explicit hardware-validation posture in docs/HARDWARE_VALIDATION_MATRIX.md.
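For scale, the two reported prompt-cache-hit rates put Psionic ahead by roughly 7% on that host; the delta can be reproduced from the numbers above:

```shell
# Relative speedup of the reported Psionic rate (172.84 tok/s) over the
# reported llama.cpp rate (160.98 tok/s), both copied from the proof above.
awk 'BEGIN { printf "+%.1f%%\n", (172.84 - 160.98) / 160.98 * 100 }'
# prints "+7.4%"
```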

Project Shape

The main crate families are:

  • framework core: psionic-core, psionic-ir, psionic-compiler, psionic-runtime
  • backends: psionic-backend-cpu, psionic-backend-cuda, psionic-backend-metal
  • serving and provider surfaces: psionic-serve, psionic-provider, psionic-router
  • cluster and distributed execution: psionic-cluster, psionic-collectives, psionic-distributed, psionic-net
  • training, eval, and optimizer substrate: psionic-train, psionic-data, psionic-eval, psionic-adapters, psionic-optimize

Use docs/WORKSPACE_MAP.md for the full doc index, crate map, and subsystem entrypoints.
