A custom AI architecture invented from first principles by unifying four otherwise-disconnected threads of theory: AIXI / Solomonoff (minimum-description-length priors over class complexity, `‖Δk‖_F ≤ C`), information geometry (natural gradients on the Gaussian manifold, Cramér–Rao efficient), the Free Energy Principle / active inference (a shared world prior `θ₀` plus differential class offsets `Δk`), and the Information Bottleneck (contrastive encoder feedback via Fisher–Rao residuals). The core invariant is that every class `k` is represented as `θ₀ ⊕ Δk`, a shared world prior plus a small class-specific offset in natural-parameter space, and classification is `y* = argmax_k [log p(h | θ₀ ⊕ Δk) + log p(k | context)]`. The architecture is domain-agnostic (any input is reduced to a latent vector by a pluggable `Encoder`), handles heavy-tailed inputs through a Generalised-Hyperbolic / NIG gate, detects out-of-distribution inputs natively, supports online corrections, and reports calibrated regression uncertainty. The engineering layer wrapping the AI (Python reference, CMake-built C++ native core, REST server, Qt desktop Studio, optional CUDA acceleration, SQLite-backed persistent state) is held to byte-identical Python ↔ C++ parity by 188 pytest + 33 CTest cases across thirteen named fixtures (`cypha_parity`, `memory_train_parity`, `quantile_dif_train_parity`, `mke_train_step_parity`, `regression_m4_parity`, …). The unusual move is the dual-stack trust model: every research-grade Python feature has a byte-for-byte native equivalent, validated by the parity matrix, so the research code and the production runtime are, in effect, the same artefact.
Cypha is a fully custom AI architecture — not a wrapper around a transformer, not a fork of an existing framework, not a deployment of a published paper. The object at the centre is CyphaDIF, the Differential Information Field Classifier, which derives a single learning rule from the intersection of four formal programmes:
- AIXI / Solomonoff contributes a minimum-description-length prior on class complexity. The class-specific offset `Δk` is regularised by `‖Δk‖_F ≤ C` so that simpler classes are preferred when the evidence is weak, with cold-start protection for the first `_MDL_COLD_START = 8` observations.
- Information geometry contributes the choice of update rule. Updates are natural gradients on the diagonal-Gaussian manifold, which are Cramér–Rao efficient: informally, the cheapest update that does not waste information.
- Active inference / Free Energy Principle contributes the structural decomposition. There is one shared `WorldPrior` `θ₀` (a diagonal Gaussian fitted online by Welford / EMA, the Tier-3 "infinite" context that never forgets) and one `ClassDifferential` `Δk` per class, attracted toward observations of that class with MDL decay.
- Information Bottleneck contributes the encoder objective. The trainable projection `W_enc : raw_features → latent h` is updated by contrastive Fisher–Rao residuals, pulling the latent toward the correct class's natural-parameter manifold and pushing it away from competitors, with its Frobenius norm capped at `8.0`.
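The MDL constraint in the first bullet can be sketched as a decay followed by a radial projection onto the Frobenius ball. This is a minimal sketch, not the `Cypha.py` implementation: the radius `C` is a free parameter here, the multiplicative decay by `mdl_lambda` (default `0.001` in the defaults table) is an assumption about how decay is applied, and cold start is modelled as skipping both steps while fewer than `_MDL_COLD_START` observations have been seen.

```python
import numpy as np

_MDL_COLD_START = 8  # documented cold-start window


def mdl_step(delta: np.ndarray, n_obs: int, C: float, mdl_lambda: float = 0.001) -> np.ndarray:
    """Hypothetical MDL update for a class offset delta (any shape).

    Assumptions: decay is multiplicative shrinkage by mdl_lambda, and the
    ||delta||_F <= C constraint is enforced by radial projection.
    """
    if n_obs < _MDL_COLD_START:
        return delta  # cold start: let the new class offset move freely
    delta = (1.0 - mdl_lambda) * delta  # shrink toward the simpler (zero) offset
    norm = np.linalg.norm(delta)  # Frobenius norm (2-norm of the flattened array)
    if norm > C:
        delta = delta * (C / norm)  # project back onto the ball of radius C
    return delta
```

Projection only rescales, so the offset keeps its direction; the ball simply bounds how far any class can drift from the shared prior.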
These four threads collapse into one operational rule: every class has the same world prior plus a small offset, classification is an argmax over those offsets, and learning is a natural-gradient step that respects an MDL constraint and a Fisher–Rao encoder loss simultaneously.
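The operational rule can be made concrete for diagonal Gaussians. This sketch assumes `θ₀ ⊕ Δk` means ordinary addition of natural parameters `(η₁, η₂) = (μ/v, −1/(2v))`, and that every composed `η₂` stays negative so the variance is valid. The helper names are hypothetical, and the GH gate is omitted.

```python
import numpy as np


def to_natural(mu, var):
    """Diagonal Gaussian (mu, var) -> natural parameters (eta1, eta2)."""
    return mu / var, -0.5 / var


def from_natural(eta1, eta2):
    """Inverse map; requires eta2 < 0 elementwise."""
    var = -0.5 / eta2
    return eta1 * var, var


def log_lik(h, eta1, eta2):
    """log N(h | mu, diag var) for the Gaussian with natural params (eta1, eta2)."""
    mu, var = from_natural(eta1, eta2)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (h - mu) ** 2 / var)


def classify(h, theta0, deltas, log_context_prior):
    """y* = argmax_k [ log p(h | theta0 (+) delta_k) + log p(k | context) ].

    theta0 = (eta1, eta2) of the shared world prior; deltas[k] = (d_eta1, d_eta2).
    "(+)" is read as addition in natural-parameter space (an assumption).
    """
    scores = np.array([
        log_lik(h, theta0[0] + d1, theta0[1] + d2) + lp
        for (d1, d2), lp in zip(deltas, log_context_prior)
    ])
    return int(np.argmax(scores)), scores
```

For example, with a standard-normal world prior and offsets that move the class means to −2 and +2, the latent `h = [1.5, 1.5]` is assigned to the second class under a uniform context prior.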
The model is domain-agnostic: anything that can be turned into a latent vector by a pluggable `Encoder` works (numbers, text, spectrograms, behavioural telemetry, …). Out of the box it ships with `VectorEncoder` (passthrough), `RFFEncoder` (Random Fourier Features over an RBF kernel, `D = 256` features by default), and `ConcatEncoder` (concatenates multiple encoders' outputs). `MultiModalCyphaDIF` extends this with per-encoder LLR fusion: each modality contributes its own log-likelihood ratio, and the ratios are summed at the decision step.
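The `RFFEncoder` idea can be sketched with the standard random-Fourier-feature construction for an RBF kernel `exp(−γ‖x − y‖²)`: frequencies are drawn from `N(0, 2γI)`, phases from `U(0, 2π)`, and `z(x) = √(2/D) cos(Wᵀx + b)`, so that `z(x)·z(y)` approximates the kernel. Only `D = 256` comes from the text; the class name, `gamma`, and seeding are assumptions.

```python
import numpy as np


class RFFSketch:
    """Random Fourier Features approximating exp(-gamma * ||x - y||^2).

    Hypothetical stand-in for RFFEncoder; only the default D = 256 is documented.
    """

    def __init__(self, in_dim: int, D: int = 256, gamma: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Frequencies from the RBF kernel's spectral density N(0, 2*gamma*I).
        self.W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(in_dim, D))
        # Random phases make E[z(x).z(y)] equal the kernel exactly.
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=D)
        self.D = D

    def encode(self, X: np.ndarray) -> np.ndarray:
        X = np.atleast_2d(X)  # accept a single vector or a batch
        return np.sqrt(2.0 / self.D) * np.cos(X @ self.W + self.b)
```

Larger `D` lowers the variance of the kernel estimate at the cost of a wider feature vector, which is the "kernel approximation budget" trade-off the defaults table refers to.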
Five enhancement phases sit on top of the v1 foundation:
- Phase 1: Tiered context. A 3-tier `TieredContextBuffer` (short / mid / long), an `NIGField` group at confidence threshold `τ = 0.99`, and field-confidence-weighted blending into the context prior.
- Phase 2: Generation overhaul. Eight named generation modes: temperature-scaled, field-conditioned, latent-boundary interpolation (`α`-blended), adversarial (entropy-maximising), OOD sampling, MDL-ball constrained (Fisher–Rao radius), ancestral (`k ~ context`, `h ~ p(h | k)`), and Gaussian KDE sampled from the priority replay buffer.
- Phase 3: Active learning & anomaly detection. `anomaly_score(x)` (gate value; high = anomalous), `active_query_score(x)` (entropy × `(1 − max p)`, a boundary-proximity score), `drift_score()` (concept-drift signal from world-prior drift), and `infer_full(x)`, which returns a complete probabilistic breakdown.
- Phase 4: Priority replay. A recency-and-surprise weighted buffer of capacity `10 000` with replay rate `0.30`, plus KDE generation from stored latents.
- Phase 5: Sequence & multi-modal. `predict_next(label)` for next-label distributions, `ConcatEncoder` for feature concatenation, and `MultiModalCyphaDIF` for per-modality LLR fusion.
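A Phase-4-style priority replay buffer might look like the following. Capacity `10 000` and replay rate `0.30` are the documented defaults; the weighting rule `surprise × recency_decay^age`, the `recency_decay` value, and the class itself are hypothetical.

```python
import numpy as np
from collections import deque


class PriorityReplaySketch:
    """Recency- and surprise-weighted replay buffer (hypothetical sketch)."""

    def __init__(self, capacity=10_000, replay_ratio=0.30, recency_decay=0.999, seed=0):
        self.items = deque(maxlen=capacity)  # oldest entries fall off the left
        self.replay_ratio = replay_ratio
        self.recency_decay = recency_decay
        self.rng = np.random.default_rng(seed)

    def add(self, x, h, label, surprise):
        # Floor the surprise so every stored item keeps a non-zero weight.
        self.items.append((x, h, label, max(float(surprise), 1e-8)))

    def should_replay(self):
        """Decide whether the next training step draws from the buffer."""
        return len(self.items) > 0 and self.rng.random() < self.replay_ratio

    def sample(self, n=1):
        surprise = np.array([it[3] for it in self.items])
        age = np.arange(len(self.items))[::-1]  # 0 for the newest item
        w = surprise * self.recency_decay ** age
        idx = self.rng.choice(len(self.items), size=n, p=w / w.sum())
        return [self.items[int(i)][:3] for i in idx]  # (x, h, label) tuples
```

High-surprise items dominate the draws, while the recency factor slowly retires stale ones even before the capacity limit evicts them.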
This folder is the engineering implementation of the architecture. The harmonic-spectrum theoretical backbone (the `σ_k ∝ 1/k` and `α ≈ 0.85` SGD-narrative claims) lives separately at `../Compression Algorithms/NMP_neural_compression_research_paper.md`. Cypha is the implementation leg: write the same AI twice (once in Python for research velocity, once in C++ for deployment speed) and prove with a parity matrix that the two are the same model.
| File | Role |
|---|---|
| `README.md` | This file. |
| `CHANGELOG.md` | Release history: milestones, bug fixes, benchmark deltas. |
| `Cypha.py` | The AI itself: `CyphaDIF`, `WorldPrior`, `ClassDifferential`, `DIFMemory`, `EncoderProjection`, `TieredContextBuffer`, `NIGField`, `PriorityReplayBuffer`, `RFFEncoder`, `MKERegressor`, `DIFRegressor`, `RFFRegressor`, `TwoStageDIFRegressor`, `MultiLabelDIF`, `SimilarityIndex`, `PerformanceMonitor`, `MultiModalCyphaDIF`, `ClassifierDistillation`. ≈ 7 100 lines. |
| `test_cypha.py` | Top-level Python test suite for the architecture. |
| `benchmark.py` | Multi-domain benchmarking harness (17 domains). |
| `benchmark_baseline.py` | Baseline runner for SOM/self-organising upgrade evaluation. |
| `bessel_ratios.npz` | Pre-computed K_n Bessel ratios (16 384 uniform points, x ∈ [10⁻⁶, 120], max rel-err < 5 × 10⁻³); replaces per-call `scipy.special.kv` in the GH-posterior hot path. |
| `native/README.md` | Native C++ core build & test guide: CTest harness, parity test inventory, SQLite amalgamation, CUDA smoke test. |
| `docs/README.md` | Documentation hub: all sub-documents indexed by purpose. |
| `docs/port/PORT_CONTRACT.md` | The parity contract: what Python and native must agree on, fixture by fixture. |
| `docs/verify/VERIFICATION_STATUS.md` | Current parity test results across all fixtures. |
| `docs/reports/DIAGNOSTIC_REPORT.md` | 2026-05-30 full diagnostic: three root-cause bugs found, +23.5 pp on linearly-separable 2-class. |
| `docs/reports/SOM_UPGRADE_REPORT.md` | SOM/GNG/GRIA/Hebbian upgrade evaluation: all six upgrades benchmarked; default flags remain OFF. |
| `cypha_studio/core/inference.py` | Python `InferenceEngine` wrapper: batch + single predict, GH gate, OOD detection, online corrections, regression uncertainty. |
| `cypha_studio/core/trainer.py` | Python `Trainer` wrapper: `TrainerConfig` defaults, online + batch training. |
| `cypha_studio/server/api.py` | REST API server (FastAPI reference implementation). |
| `cypha_studio/` | Python Studio: PySide6 desktop GUI + FastAPI REST + training/registry/experiments core. |
| `native/` | C++ native core. CMake build. Milestones M1–M6 complete. |
| `parity_fixtures/` | Committed parity assets: input vectors and expected outputs the two implementations must agree on. |
| `install/` | Platform installers: `install_windows.ps1` (Qt shell + `cypha_rest`) and `install_linux.sh` (Python stack + native build). |
Note on file paths. The repository's HRNA / NMP research paper is not inside `Cypha/`; it lives at `../Compression Algorithms/NMP_neural_compression_research_paper.md`. Cypha is the engineering implementation; that paper is the theoretical home.
```
                 x (raw input: vector, text, spectrum, …)
                 │
                 ▼
┌──────────────────────────────────┐
│ Encoder (pluggable)              │  VectorEncoder │ RFFEncoder │ ConcatEncoder
│ raw → feature vector f           │
└──────────────────────────────────┘
                 │ f
                 ▼
┌──────────────────────────────────┐
│ EncoderProjection W_enc          │  Fisher–Rao contrastive update,
│ f → latent h                     │  ‖W‖_F capped at 8.0
└──────────────────────────────────┘
                 │ h
     ┌───────────┴───────────┐
     │                       │
     ▼                       ▼
WorldPrior θ₀         ClassDifferential Δk (per-class)
diagonal Gaussian     natural-parameter offset
Welford/EMA, Tier-3   attracted toward h, MDL-decayed
"infinite context"    ‖Δk‖_F ≤ C (Solomonoff prior)
     │                       │
     └───────────┬───────────┘
                 ▼
┌──────────────────────────────────┐
│ DIFMemory                        │  LLR matrix per class:
│ score k = log p(h|θ₀ ⊕ Δk)       │  log p(h | N(μ_k, diag v_k))
│ + GH gate (heavy-tailed)         │  Generalised-Hyperbolic /
└──────────────────────────────────┘  NIG posterior, Bessel-ratio LUT
                 │ LLR_k
                 ▼
┌──────────────────────────────────┐
│ TieredContextBuffer              │  short (window 32) / mid (EMA 0.98) / long
│ + NIGField (τ = 0.99)            │  field-confidence-weighted prior
│ → log p(k | context)             │
└──────────────────────────────────┘
                 │
                 ▼
     y* = argmax_k [ LLR_k + log p(k|context) ]
       + confidence, anomaly_score, r_eff, OOD flag
```
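The context branch of the diagram (`log p(k | context)`) can be approximated with three label tiers. This sketch takes `window = 32` and `EMA 0.98` from the diagram; the fixed blend weights and add-one smoothing are assumptions standing in for the `NIGField` confidence weighting, which is not reproduced.

```python
import numpy as np
from collections import deque


class TieredContextSketch:
    """Three-tier label-context prior producing log p(k | context) (hypothetical)."""

    def __init__(self, n_classes, window=32, ema=0.98, weights=(0.5, 0.3, 0.2)):
        self.n = n_classes
        self.short = deque(maxlen=window)                # short tier: last `window` labels
        self.mid = np.full(n_classes, 1.0 / n_classes)  # mid tier: EMA of one-hot labels
        self.long = np.zeros(n_classes)                  # long tier: all-time counts
        self.ema = ema
        self.weights = weights  # assumed fixed blend; the real buffer uses field confidence

    def observe(self, k):
        self.short.append(k)
        onehot = np.eye(self.n)[k]
        self.mid = self.ema * self.mid + (1.0 - self.ema) * onehot
        self.long += onehot

    def log_prior(self):
        # Add-one smoothing keeps every class at non-zero probability.
        short = np.bincount(np.array(self.short, dtype=int), minlength=self.n) + 1.0
        long = self.long + 1.0
        tiers = [short / short.sum(), self.mid, long / long.sum()]
        p = sum(w * t for w, t in zip(self.weights, tiers))
        return np.log(p / p.sum())
```

The returned vector is what gets added to each `LLR_k` before the final argmax.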
Training is the same machinery in reverse:
- The `WorldPrior` updates by EMA toward the new observation's latent `h`.
- The matched class's `ClassDifferential` is attracted toward `h` (with MDL decay subtracted).
- The `EncoderProjection` is updated by the Fisher–Rao contrastive gradient: pull `h` toward `(μ_k, v_k)` and push it away from the runner-up `(μ_j, v_j)`.
- The replay buffer stores `(x, h, label)` weighted by surprise (high LLR residual = high priority); a fraction `replay_ratio = 0.30` of subsequent steps come from the buffer.
- Every `_ALIGN_EVERY = 500` steps, the encoder is realigned to the dominant `Δk` directions, ensuring the latent space remains expressive.
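The first three training bullets can be sketched as a single pure function. Learning rates `0.008 / 0.05 / 0.002`, `mdl_lambda = 0.001`, and the `8.0` cap come from this README. Keeping the world prior as (mean, variance), using mean-only class offsets, and taking the contrastive gradient `grad_W` as an input are simplifying assumptions. Replay and realignment are left out.

```python
import numpy as np


def train_step(h, k, world_mu, world_var, deltas, W_enc, grad_W,
               lr_world=0.008, lr_delta=0.05, lr_enc=0.002,
               mdl_lambda=0.001, w_cap=8.0):
    """One simplified online update; returns new state without mutating inputs.

    Assumption: grad_W is a precomputed contrastive gradient (its Fisher-Rao
    form is not reproduced here).
    """
    # 1. World prior: exponentially weighted mean/variance step toward h.
    diff = h - world_mu
    world_mu = world_mu + lr_world * diff
    world_var = (1.0 - lr_world) * (world_var + lr_world * diff ** 2)

    # 2. Matched class offset: attract (prior mean + offset) toward h,
    #    then apply MDL decay toward zero.
    deltas = deltas.copy()
    target = h - world_mu
    deltas[k] = (1.0 - mdl_lambda) * (deltas[k] + lr_delta * (target - deltas[k]))

    # 3. Encoder: gradient step, then cap the Frobenius norm at w_cap.
    W_enc = W_enc - lr_enc * grad_W
    norm = np.linalg.norm(W_enc)
    if norm > w_cap:
        W_enc = W_enc * (w_cap / norm)
    return world_mu, world_var, deltas, W_enc
```

Because the world prior is a running statistic rather than a set of trained weights, old classes are not overwritten when new ones arrive, which is the property behind the forgetting result reported below.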
A separate CausalField runs alongside as a recurrent SGEMV update for sequential context, and the regressor variants (DIFRegressor, RFFRegressor, TwoStageDIFRegressor, MKERegressor) replace the LLR-argmax with a ridge-regression / RLS posterior so the same architecture handles regression with calibrated uncertainty (predict_with_uncertainty(X) returns posterior std).
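One standard way to get a regression posterior with a calibrated std is recursive least squares viewed as online Bayesian ridge regression. This is a generic sketch of that technique, not the `DIFRegressor` code: `alpha` (prior precision) and `noise_var` are assumed values, and only the method name `predict_with_uncertainty` is borrowed from the text.

```python
import numpy as np


class RLSPosteriorSketch:
    """Online Bayesian ridge regression via recursive least squares (RLS).

    Prior: w ~ N(0, I / alpha). Likelihood: y = x.w + noise, noise ~ N(0, noise_var).
    """

    def __init__(self, dim, alpha=1.0, noise_var=0.1):
        self.w = np.zeros(dim)        # posterior mean of the weights
        self.P = np.eye(dim) / alpha  # posterior covariance of the weights
        self.noise_var = noise_var

    def update(self, x, y):
        Px = self.P @ x
        s = self.noise_var + x @ Px   # predictive variance of y before the update
        k = Px / s                    # RLS / Kalman gain
        self.w = self.w + k * (y - x @ self.w)
        self.P = self.P - np.outer(k, Px)  # rank-1 covariance downdate

    def predict_with_uncertainty(self, X):
        X = np.atleast_2d(X)
        mean = X @ self.w
        # Predictive variance = observation noise + parameter uncertainty x^T P x.
        var = self.noise_var + np.einsum("ij,jk,ik->i", X, self.P, X)
        return mean, np.sqrt(var)
```

The predictive std never drops below `sqrt(noise_var)`, and it shrinks toward that floor only in directions the training data has actually covered.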
These defaults come from a profiled medium-grid tuning programme (scripts/tune_quality_performance.py), not from guesses.
| Parameter | Default | Origin |
|---|---|---|
| `feat_dim` | 128 | profiled on OpenML 1464 + tuning grid |
| `field_dim` | 128 | matched to `feat_dim` for the no-injection fast path |
| `rff_D` (`RFFEncoder`) | 256 | RFF kernel approximation budget |
| `n_experts` (MKE) | 8 | mixture-of-experts head |
| `temperature` | 1.15 | classification optimum (regressor overrides to 1.05) |
| `context_win` | 32 | profiled medium grid |
| `replay_ratio` | 0.30 | priority replay rate |
| `replay_capacity` | 10 000 | Phase-4 buffer (5× v1) |
| LR (world) | 0.008 | classification-optimal |
| LR (delta) | 0.05 | profiled medium grid |
| LR (encoder) | 0.002 | Fisher–Rao stability |
| `mdl_lambda` | 0.001 | Solomonoff prior strength |
| `mdl_cold_start` | 8 | observations before MDL kicks in |
| `OOD_THRESHOLD` | 3.0 | `anomaly_score` gate |
| `OOD_SIGMA` | 15.0 | OOD distribution width |
| `align_every` | 500 | encoder-realignment cadence |
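Collected as one typed object, the documented defaults might read as follows. The values are taken from the table; the class name and the lower-case field names (`lr_world`, `lr_delta`, `lr_encoder`, `ood_threshold`, `ood_sigma`) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CyphaDefaults:
    """Hypothetical container for the documented defaults."""
    feat_dim: int = 128
    field_dim: int = 128           # matched to feat_dim (no-injection fast path)
    rff_D: int = 256
    n_experts: int = 8
    temperature: float = 1.15      # classification optimum
    context_win: int = 32
    replay_ratio: float = 0.30
    replay_capacity: int = 10_000
    lr_world: float = 0.008
    lr_delta: float = 0.05
    lr_encoder: float = 0.002
    mdl_lambda: float = 0.001
    mdl_cold_start: int = 8
    ood_threshold: float = 3.0
    ood_sigma: float = 15.0
    align_every: int = 500

    def for_regression(self) -> "CyphaDefaults":
        # The table notes that the regressor overrides temperature to 1.05.
        return replace(self, temperature=1.05)
```

Freezing the dataclass means an override produces a new object, so a classifier and a regressor can share one baseline without one silently changing the other.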
| Parameter | Default |
|---|---|
| `OOD_THRESHOLD` | 3.0 |
| Batch / single prediction | both supported |
| GH gate | yes (heavy-tailed input handling) |
| Online corrections | yes |
| Regression uncertainty | yes (posterior std) |
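The two Phase-3 style scores are simple enough to write out. `active_query_score` follows the documented `entropy × (1 − max p)` formula; treating `anomaly_score > OOD_THRESHOLD` as the OOD flag is an assumption about how the `3.0` default is used.

```python
import numpy as np

OOD_THRESHOLD = 3.0  # documented default


def active_query_score(probs):
    """Entropy x (1 - max p): high near decision boundaries."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p))
    return float(entropy * (1.0 - p.max()))


def ood_flag(anomaly_score, threshold=OOD_THRESHOLD):
    """Assumption: an input is flagged OOD when its gate value exceeds the threshold."""
    return anomaly_score > threshold
```

A confident one-hot posterior scores essentially zero, while an even two-way split scores `0.5 · ln 2`, so the highest scores point at the samples most worth labelling.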
| Test | What it verifies |
|---|---|
| `cypha_parity` | Top-level Python ↔ native parity |
| `memory_train_parity` | `DIFMemory` training step |
| `quantile_dif_train_parity` | Quantile-DIF training step |
| `mke_train_step_parity` | Mixture-of-experts training step |
| `regression_m4_parity` | M4 regression |
| `cuda_smoke` | CUDA path smoke test |
(Full inventory in native/README.md. Total: 188 pytest + 33 CTest cases.)
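A byte-identical parity check is stricter than `np.allclose`: any one-ulp difference fails. Below is a hypothetical helper for comparing a Python output against a native output, assuming both are exported as little-endian float64 arrays. The real fixture format is defined in `docs/port/PORT_CONTRACT.md` and is not reproduced here.

```python
import hashlib
import numpy as np


def parity_digest(arr) -> str:
    """SHA-256 of the little-endian float64 bytes of an output array."""
    a = np.ascontiguousarray(np.asarray(arr, dtype="<f8"))
    return hashlib.sha256(a.tobytes()).hexdigest()


def assert_byte_identical(python_out, native_out):
    """Fail on any bit-level difference, including 1-ulp drift or -0.0 vs 0.0."""
    a = np.asarray(python_out, dtype="<f8")
    b = np.asarray(native_out, dtype="<f8")
    if a.shape != b.shape:
        raise AssertionError(f"shape mismatch {a.shape} vs {b.shape}")
    if parity_digest(a) != parity_digest(b):
        # Report the worst element to make debugging easier.
        diff = np.abs(a - b)
        i = np.unravel_index(np.argmax(diff), diff.shape)
        raise AssertionError(f"not byte-identical; max |diff| {diff[i]!r} at {i}")
```

Storing the digest alongside a fixture lets either implementation check itself without loading the other's full output.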
Full diagnostic run documented in docs/reports/DIAGNOSTIC_REPORT.md. Three root-cause bugs were found and fixed; results below are post-fix:
| Task | CyphaDIF | SGD (online) | SVM ceiling | Notes |
|---|---|---|---|---|
| S1 — linearly-separable 2-class | 0.783 | 0.644 | 0.898 | RFF + 4 passes + deliberation disabled |
| S3 — XOR (nonlinear) | 0.482 | 0.498 | 0.825 | Hard LLR-linearity limit — kernel LLR required |
| R1 — Iris | 0.900 | 0.821 | 0.968 | Auto-RFF for dim≤30 |
| R2 — Wine | 0.969 | 0.964 | 0.987 | Near-saturated |
| R3 — Digits (10-class) | 0.922 | 0.900 | 0.982 | delta_lr=0.03 fix |
| R4 — Breast cancer | 0.957 | 0.950 | 0.983 | |
Key findings:
- Catastrophic forgetting ratio: 0.000 (perfect retention; sufficient-statistics design).
- Label-noise robustness at 30% noise: 79.1% accuracy (well above chance for 5-class).
- Convergence to 100% on well-separated 5-class Gaussian clusters: step 50 (matches SGD online).
- XOR / nonlinear boundaries: hard ceiling at 48.2%, a 34.3 pp gap to the kernel-SVM ceiling (0.825); closing it requires Kernel LLR (Nyström). This is confirmed as the highest-priority architectural upgrade.
- D04 runs CyphaLM (Izaac → CellAI SSM → CyphaDIF → GRIA): held-out BPC, context-length curve, expert routing, save/restore parity, and sampling benchmarks. Install with `pip install -e cypha_lm/`. D17 adds WikiText integration and online adaptation.
- CyphaLM REST: FastAPI `POST /generate` and `/generate/stream` (SSE) with per-token CyphaDIF routing; see `cypha_studio/README.md` and `examples/`.
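The XOR ceiling can be seen in a toy case. Assuming per-class diagonal Gaussians fitted directly on the four XOR corners (no encoder or RFF stage), both classes end up with mean `(0.5, 0.5)` and variance `(0.25, 0.25)`, so the log-likelihood ratio is zero at every input and the decision carries no information:

```python
import numpy as np

# XOR: class 0 = {(0,0), (1,1)}, class 1 = {(0,1), (1,0)}.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])


def diag_gauss_fit(Z):
    """Per-dimension mean and (population) variance."""
    return Z.mean(axis=0), Z.var(axis=0)


def diag_gauss_loglik(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)


params = [diag_gauss_fit(X[y == k]) for k in (0, 1)]
# LLR of class 1 vs class 0 at each training point.
llr = [diag_gauss_loglik(x, *params[1]) - diag_gauss_loglik(x, *params[0]) for x in X]
```

Both class-conditional fits are identical, which matches the chance-level S3 result in the benchmark table; a kernel (Nyström) feature map is one way to make the classes separable before the LLR is taken.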
- The AI is bespoke. CyphaDIF is not a fork, not a wrapper, not a tuning of an existing model. It is a from-first-principles architecture whose learning rule is derived from the intersection of four formal programmes (AIXI / Solomonoff, information geometry, FEP, IB).
- The proof surface is parity correctness, not leaderboard ML accuracy. No "we beat X on benchmark Y" claim. Instead: "the Python and native paths produce byte-identical results across this fixture matrix." Benchmark numbers (§ above) are honest measurements on standard sklearn datasets, not cherry-picked.
- Known architectural limit: nonlinear decision boundaries. The LLR discriminant is linear in latent space, so XOR-style tasks have a hard ceiling of ~48% without a Kernel LLR extension (Nyström approximation). The diagnostic evidence confirms this (FDR = 0.001 on XOR, while the kernel SVM reaches 0.825). See `docs/FUTURE.md` for the upgrade plan.
- Theoretical backbone lives elsewhere. The harmonic-spectrum / `σ_k ∝ 1/k` / `α ≈ 0.85` claims belong to `../Compression Algorithms/NMP_neural_compression_research_paper.md`, not to Cypha itself. Cypha is the implementation leg.
- Optional CUDA. The native core works without a GPU; CUDA is a build flag. The CUDA path is exercised only by `cuda_smoke`, not by the full parity matrix.
- Future waves. Kernel LLR (Nyström), Qt UX polish, packaged binaries, multi-model REST, ONNX export; see `docs/FUTURE.md` and `docs/RESEARCH_STATUS.md` for the full hypothesis ledger and priority ranking.
| Standard | Limitation | What Cypha offers |
|---|---|---|
| Python-only research repo | Slow at deploy time | Native C++ core with parity-validated Python ↔ C++ outputs |
| C++-only production runtime | Hard to iterate on | Python reference is canonical; native must match it |
| Off-the-shelf classifier (sklearn, XGBoost, …) | Black-box training rule | First-principles architecture; every constant is derivable |
| Transformer + softmax classifier | Calibration is an afterthought | GH gate gives natively-calibrated heavy-tail handling and OOD flag |
| Notebook + Flask script | No persistence | SQLite-backed state (amalgamated 3.47.2) |
| Custom REST + Python | No GUI | Qt Studio integration |
| Standalone classifier | No regression path | DIFRegressor / TwoStageDIFRegressor reuse the same machinery |