NeuralDBG

A causal inference engine for deep learning training that provides structured explanations of neural network training failures. Understand why your model failed during training through semantic analysis and abductive reasoning, not raw tensor inspection.

Overview

NeuralDBG treats training as a semantic trace of learning dynamics rather than a black box. It extracts meaningful events and provides causal hypotheses about training failures, enabling researchers to:

Identify gradient health transitions (stable -> vanishing/saturated)
Detect activation regime shifts (normal -> saturated/dead)
Detect optimizer instability (loss plateaus, spikes, divergence)
Catch data anomalies (NaN, Inf, distribution shifts)
Track propagation of instabilities through network layers
Generate ranked causal explanations for training failures

Unlike traditional monitoring tools (TensorBoard, Weights & Biases), NeuralDBG focuses on causal inference rather than metric tracking.

Key Features

Semantic Event Extraction: Detects meaningful transitions in training dynamics
Causal Compression: Identifies first occurrences and propagation patterns
Post-Mortem Reasoning: Provides ranked hypotheses about failure causes
Optimizer Instability Detection: Tracks loss plateaus, spikes, and divergence
Data Anomaly Detection: Catches NaN, Inf, and distribution shifts in inputs
Event Collapsing: Merges sequential events into summary traces
Compiler-Aware: Operates at module boundaries to survive torch.compile
Non-Invasive: Wraps an existing PyTorch training loop in a context manager; hook-based events need no changes to the loop body
Minimal API: Focused on explanations, not raw data dumps
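As an illustration of what "semantic event extraction" can mean in practice, the sketch below turns a per-step series of gradient norms into a short list of state transitions. It is a minimal sketch under stated assumptions, not NeuralDBG's implementation: the thresholds and the names `classify_gradient`, `Transition`, and `extract_transitions` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds, chosen only for illustration.
VANISHING_BELOW = 1e-6
EXPLODING_ABOVE = 1e3


def classify_gradient(norm: float) -> str:
    """Map a gradient norm to a coarse health state."""
    if norm < VANISHING_BELOW:
        return "vanishing"
    if norm > EXPLODING_ABOVE:
        return "exploding"
    return "stable"


@dataclass
class Transition:
    layer: str
    step: int
    from_state: str
    to_state: str


def extract_transitions(layer: str, norms: List[float]) -> List[Transition]:
    """Emit one event per state change instead of one record per step."""
    events: List[Transition] = []
    previous: Optional[str] = None
    for step, norm in enumerate(norms):
        state = classify_gradient(norm)
        if previous is not None and state != previous:
            events.append(Transition(layer, step, previous, state))
        previous = state
    return events


# Six steps of gradient norms collapse into a single transition at step 3.
norms = [0.5, 0.1, 1e-3, 1e-7, 1e-8, 1e-8]
events = extract_transitions("layer_0", norms)
for event in events:
    print(event)
```

Recording only the changes in state keeps the trace small, so later reasoning works over a handful of events rather than every step.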

Quick Start

Installation

pip install neuraldbg

Docker Development (Hermetic Workspace)

Use Docker to keep a reproducible local environment across machines and contributors.

# Build image
docker-compose build

# Start the dev container (one-command startup)
docker-compose up -d

# Open a shell in the running workspace
docker-compose exec neuraldbg-dev bash

Equivalent shortcuts via Makefile:

make build
make up
make shell

Run tests inside Docker:

docker-compose run --rm neuraldbg-dev bash -lc "pytest"

Or:

make test-docker

Persistent volumes are mounted to:

/data (host: ./data)
/models (host: ./models)
/outputs (host: ./outputs)

Stop containers:

docker-compose down

Basic Usage

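import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from neuraldbg import NeuralDbg

# Your existing model and training setup
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Placeholder random data so the snippet runs end to end; substitute your own DataLoader
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
dataloader = DataLoader(dataset, batch_size=32)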

# Wrap your training loop
with NeuralDbg(model) as dbg:
    for step, (inputs, targets) in enumerate(dataloader):
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        # Events are extracted automatically

# After training failure, query for explanations
explanations = dbg.explain_failure()
print(explanations[0])  # "Gradient vanishing originated in layer 'linear1' at step 234, likely due to LR × activation mismatch (confidence: 0.87)"

Inference API

# Get ranked causal hypotheses for the failure
hypotheses = dbg.get_causal_hypotheses()

# Query specific causal chains
chain = dbg.trace_causal_chain('vanishing_gradients')

# Check for coupled failures
couplings = dbg.detect_coupled_failures()
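To make "ranked" concrete, here is a minimal sketch of ordering hypotheses by confidence. The `Hypothesis` class and `rank` function are hypothetical stand-ins written for this example; the objects NeuralDBG actually returns may carry different fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    # Hypothetical container; only `description` and a confidence score
    # are assumed here.
    description: str
    confidence: float


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Sort hypotheses from most to least confident."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


candidates = [
    Hypothesis("Learning rate too high", 0.41),
    Hypothesis("Gradient vanishing originated in an early layer", 0.87),
    Hypothesis("Input distribution shifted", 0.15),
]
ranked = rank(candidates)
for h in ranked:
    print(f"{h.confidence:.2f}  {h.description}")
```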

Optimizer Instability Detection

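# Assumes model, optimizer, criterion, inputs, targets, and num_steps are already defined
with NeuralDbg(model) as dbg:
    for step in range(num_steps):
        dbg.step = step
        optimizer.zero_grad()  # clear gradients left over from the previous step
        output = model(inputs)
        loss = criterion(output, targets)
        loss.backward()

        # Feed loss values for optimizer instability detection
        dbg.record_loss(loss.item())

        optimizer.step()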

# Detect loss plateaus, spikes, or divergence
hypotheses = dbg.explain_failure("optimizer_instability")
for h in hypotheses:
    print(h.description)  # "Loss spike detected at step 50..."
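The three patterns named here (plateau, spike, divergence) can be told apart with simple window statistics over the recorded losses. The following is a sketch under assumptions, not the detector NeuralDBG ships: the function name `classify_loss_window`, the window size, and both thresholds are hypothetical, and losses are assumed to be non-negative.

```python
from statistics import mean
from typing import List, Optional


def classify_loss_window(
    losses: List[float],
    window: int = 5,
    spike_factor: float = 3.0,
    plateau_tol: float = 1e-4,
) -> Optional[str]:
    """Label the most recent loss against the preceding window.

    Returns "divergence", "spike", "plateau", or None. All thresholds are
    illustrative defaults; losses are assumed to be non-negative.
    """
    if len(losses) <= window:
        return None  # not enough history yet
    history = losses[-window - 1:-1]
    current = losses[-1]
    recent = history + [current]

    # Divergence: the loss rose at every step across the window.
    if all(b > a for a, b in zip(recent, recent[1:])):
        return "divergence"
    # Spike: a sudden jump far above the recent average.
    if current > spike_factor * mean(history):
        return "spike"
    # Plateau: the loss barely moved over the whole window.
    if max(recent) - min(recent) < plateau_tol:
        return "plateau"
    return None


print(classify_loss_window([1.0, 0.9, 0.8, 0.7, 0.6, 5.0]))  # spike
print(classify_loss_window([1.0, 1.5, 2.0, 2.6, 3.3, 4.1]))  # divergence
print(classify_loss_window([0.5] * 6))                       # plateau
```

Divergence is checked first because a steadily rising loss can also exceed the spike threshold; the order of the checks decides which label wins.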

Data Anomaly Detection

Data anomalies (NaN, Inf, distribution shifts) are detected automatically from layer inputs during the forward pass -- no extra API call needed:

with NeuralDbg(model) as dbg:
    # ... training loop ...
    pass

# Check for data issues
hypotheses = dbg.explain_failure("data_anomaly")
for h in hypotheses:
    print(h.description)  # "NaN values detected in input to layer 'linear1'..."
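For intuition about what such a check involves, the sketch below scans a flat batch of values for NaN and Inf and applies a crude mean-shift test against a reference sample. It uses plain Python lists rather than tensors so it runs without PyTorch; `scan_batch`, `mean_shift`, and the z-score threshold are hypothetical and are not NeuralDBG's internals.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def scan_batch(values: List[float]) -> Dict[str, int]:
    """Count non-finite entries in a flat batch of input values."""
    return {
        "nan": sum(1 for v in values if math.isnan(v)),
        "inf": sum(1 for v in values if math.isinf(v)),
    }


def mean_shift(
    reference: List[float], batch: List[float], z_threshold: float = 3.0
) -> bool:
    """Flag a batch whose mean sits far from the reference mean.

    Distance is measured in reference standard deviations. Both inputs
    are assumed to contain only finite values; the threshold is an
    illustrative default.
    """
    spread = pstdev(reference)
    if spread == 0.0:
        return mean(batch) != mean(reference)
    return abs(mean(batch) - mean(reference)) / spread > z_threshold


print(scan_batch([1.0, float("nan"), float("inf"), float("-inf")]))
print(mean_shift([0.0, 1.0, -1.0, 0.5, -0.5], [10.0, 11.0, 9.0]))
```

With tensors, `torch.isnan` and `torch.isinf` provide the same elementwise checks.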

Event Collapsing

Compress sequential events in the same layer into summary traces:

# Get compressed event timeline
collapsed = dbg._collapse_events()
print(f"{len(dbg.events)} raw events -> {len(collapsed)} collapsed")
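One plausible way to collapse events is to merge runs of consecutive events that share a layer and kind into a single record with a step range and a count. The sketch below shows that idea only; `RawEvent`, `CollapsedEvent`, and `collapse` are hypothetical names, and the merging rule inside `_collapse_events()` may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawEvent:
    layer: str
    kind: str
    step: int


@dataclass
class CollapsedEvent:
    layer: str
    kind: str
    first_step: int
    last_step: int
    count: int


def collapse(events: List[RawEvent]) -> List[CollapsedEvent]:
    """Merge runs of consecutive events sharing the same layer and kind."""
    collapsed: List[CollapsedEvent] = []
    for event in events:
        last = collapsed[-1] if collapsed else None
        if last is not None and last.layer == event.layer and last.kind == event.kind:
            # Extend the current run instead of starting a new record.
            last.last_step = event.step
            last.count += 1
        else:
            collapsed.append(
                CollapsedEvent(event.layer, event.kind, event.step, event.step, 1)
            )
    return collapsed


raw = [
    RawEvent("layer_0", "vanishing", 10),
    RawEvent("layer_0", "vanishing", 11),
    RawEvent("layer_0", "vanishing", 12),
    RawEvent("layer_2", "dead_neurons", 13),
    RawEvent("layer_0", "vanishing", 14),
]
summary = collapse(raw)
print(f"{len(raw)} raw events -> {len(summary)} collapsed")  # 5 raw events -> 3 collapsed
```

Only adjacent events are merged, so the event at step 14 starts a new run even though it matches the first one; this preserves the order in which problems alternated between layers.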

Architecture

Core Components

Semantic Event Extractor: Detects meaningful transitions in learning dynamics
Causal Compressor: Identifies patterns and propagation in training failures
Post-Mortem Reasoner: Generates ranked hypotheses about failure causes
Compiler-Aware Monitor: Operates at safe boundaries for optimization compatibility
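The idea behind the Causal Compressor, finding where a failure first appeared and how it spread, can be sketched as a first-occurrence lookup followed by a sort. This is an illustration under assumptions, not the component's actual code: `first_occurrences` and `propagation_order` are hypothetical helpers, and temporal order is treated only as a hint, since it does not establish causation by itself.

```python
from typing import Dict, List, Tuple


def first_occurrences(events: List[Tuple[int, str]]) -> Dict[str, int]:
    """Return the earliest step at which each layer reported the failure.

    `events` holds (step, layer) pairs in any order.
    """
    first: Dict[str, int] = {}
    for step, layer in events:
        if layer not in first or step < first[layer]:
            first[layer] = step
    return first


def propagation_order(events: List[Tuple[int, str]]) -> List[str]:
    """Order layers by when the failure first reached them."""
    first = first_occurrences(events)
    # Ties on step are broken by layer name to keep the result deterministic.
    return sorted(first, key=lambda layer: (first[layer], layer))


observed = [(240, "layer_2"), (234, "layer_0"), (236, "layer_1"), (250, "layer_0")]
print(propagation_order(observed))  # ['layer_0', 'layer_1', 'layer_2']
```

The first layer in the returned order is a candidate origin, which a reasoner could then weigh against other evidence before ranking it as a cause.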

Event Types

Event Type	Source	Detects
`gradient_health_transition`	Backward hooks	Vanishing, exploding, saturated gradients
`activation_regime_shift`	Forward hooks	Dead neurons, saturated activations
`optimizer_instability`	`record_loss()`	Loss plateaus, spikes, divergence
`data_anomaly`	Forward hooks (inputs)	NaN, Inf, distribution shifts

Event Structure

Each semantic event represents:

Transition type (gradient_health, activation_regime, optimizer_instability, data_anomaly)
Layer/parameter identifier
Step range of occurrence
Confidence score
Causal metadata (propagation patterns, coupled failures)
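The fields listed above map naturally onto a small record type. The dataclass below is a hypothetical sketch of such a record, with basic validation; the field names and types are assumptions and may not match the event class in `neuraldbg.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SemanticEvent:
    """Hypothetical container mirroring the fields of a semantic event."""

    transition_type: str          # e.g. "gradient_health"
    layer: str                    # layer or parameter identifier
    step_range: Tuple[int, int]   # inclusive (first_step, last_step)
    confidence: float             # expected to lie in [0.0, 1.0]
    causal_metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        first, last = self.step_range
        if first > last:
            raise ValueError("step_range must be (first_step, last_step)")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0.0, 1.0]")


event = SemanticEvent(
    transition_type="gradient_health",
    layer="layer_0",
    step_range=(234, 240),
    confidence=0.87,
    causal_metadata={"propagated_to": ["layer_1"]},
)
print(event)
```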

Target Users

ML Researchers seeking causal explanations for training failures
PhD Students analyzing learning dynamics in novel architectures
Research Engineers understanding optimization instabilities

Not intended for production monitoring, metric tracking, or no-code users.

Supported Failure Types

vanishing_gradients -- Root cause + saturation coupling
exploding_gradients -- First layer to explode
dead_neurons -- Neuron death in activation layers
saturated_activations -- Activation saturation patterns
optimizer_instability -- Loss plateaus, spikes, divergence (with gradient cross-reference)
data_anomaly -- NaN/Inf/distribution shift in inputs

Limitations (MVP Scope)

PyTorch only
Focus on semantic events, not tensor inspection
Command-line interface only
Compiler-aware (torch.compile compatible)

Contributing

This is an MVP focused on proving the concept of causal inference for training dynamics. Contributions should align with the core mission of providing structured explanations for training failures.

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

MIT License - see LICENSE.md for details.

Documentation

PROJECTS.md - Roadmap for Projects A & B (Project A lives in Quant-Search, Project B here)
CHANGELOG.md - Version history and notable changes
logic_graph.md - System architecture and data flow
GOOGLE_DOCS_SYNC.md - Daily SESSION_SUMMARY sync to Google Docs

Google Docs Daily Sync

You can automate daily publication of SESSION_SUMMARY.md to a Google Doc:

Install optional automation deps:

pip install -e ".[automation]"

Configure:

GOOGLE_DOC_ID
GOOGLE_SERVICE_ACCOUNT_FILE or GOOGLE_SERVICE_ACCOUNT_JSON

Run:

python scripts/publish_session_summary_to_gdocs.py --source SESSION_SUMMARY.md --mode append

For GitHub Actions-based daily sync, see .github/workflows/publish-summary-to-google-docs.yml.

Citation

If you use NeuralDBG in your research, please cite:

@misc{neuraldbg2025,
  title={NeuralDBG: A Causal Inference Engine for Deep Learning Training Dynamics},
  author={SENOUVO Jacques-Charles Gad},
  year={2025},
  url={https://github.com/Lemniscate-world/Neural}
}
