Object Detection Training

An end-to-end object detection framework for training detectors, exporting them, and annotating datasets, built on modern architectures and vision-language models.

Train detectors (RFDETR, YOLOX) with PyTorch Lightning and Hydra config management. Export to ONNX for production inference. Annotate datasets at scale using VLM-powered labeling with Google Gemini.

Features

Multi-architecture training — RFDETR (transformer-based) and YOLOX (anchor-free) with a unified Lightning training loop
Hydra configuration — hierarchical YAML configs with full CLI override support for models, datasets, callbacks, and trainers
ONNX export & inference — one-command checkpoint-to-ONNX conversion with built-in inference pipeline and pluggable post-processors
VLM annotation — generate detection annotations from images using Gemini, with structured JSON output, retry logic, and configurable class taxonomies
Experiment tracking — Weights & Biases and TensorBoard integration
Production-ready — Docker builds for GCP Vertex AI, pre-commit hooks, strict MyPy typing, 280+ unit tests
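
The VLM annotation feature mentions structured JSON output, retry logic, and a configurable class taxonomy. The sketch below shows one way those three pieces can fit together; it is not the project's implementation. The names ALLOWED_CLASSES, parse_detections, and annotate_with_retry, the {"label", "box"} response shape, and the call_model callable (a stand-in for a real Gemini request) are all assumptions made for illustration.

```python
import json
import time
from typing import Callable

# Hypothetical class taxonomy; the project's real taxonomy lives in its configs.
ALLOWED_CLASSES = {"player", "ball", "hoop"}


def parse_detections(raw: str) -> list[dict]:
    """Parse a JSON list of {"label", "box"} objects and validate each entry."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of detections")
    for det in data:
        if not isinstance(det, dict):
            raise ValueError("each detection must be a JSON object")
        label = det.get("label")
        if not isinstance(label, str) or label not in ALLOWED_CLASSES:
            raise ValueError(f"unknown label: {label!r}")
        box = det.get("box")
        if not (isinstance(box, list) and len(box) == 4):
            raise ValueError("box must be a list of four numbers")
    return data


def annotate_with_retry(
    call_model: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> list[dict]:
    """Call the model until it returns valid detections, backing off between tries."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return parse_detections(call_model())
        except ValueError as err:
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"no valid response after {max_attempts} attempts") from last_error
```

Treating a malformed or off-taxonomy response the same way as a parse failure keeps the retry loop simple: anything that fails validation is retried, and the last error is attached to the final exception for debugging.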

Quick Start

Prerequisites

Pixi for environment and dependency management
CUDA 12.1 (Linux/Windows) or MPS (macOS) for GPU acceleration

Install & Run

git clone https://github.com/ortizeg/object-detection-training.git
cd object-detection-training
./scripts/dev-install.sh

# Train a detector
pixi run train

# Export to ONNX
pixi run task_manager task=onnx_export task.checkpoint_path=model.ckpt

# Run VLM annotation with Gemini
export GOOGLE_API_KEY="<your-key>"   # quoted so the shell does not treat < and > as redirection
pixi run task_manager task=basketball_gemini task.image_dir=/path/to/images

# Override any config from CLI
pixi run train -- training.epochs=50 model.learning_rate=0.001
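
The overrides in the last command use dotted keys (training.epochs=50) to reach into a nested config. As a rough mental model of what such an override does, here is a standard-library sketch that applies key=value strings to a nested dictionary. This is not Hydra's implementation (Hydra and OmegaConf handle types, interpolation, and config groups far more carefully), and the function names and the sample default values are hypothetical.

```python
import ast
from typing import Any


def parse_value(text: str) -> Any:
    """Turn '50' into 50 and '0.001' into 0.001; leave other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' overrides to a nested dict, modifying it in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the hierarchy, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return config


# Hypothetical defaults, standing in for values loaded from the YAML configs.
config = {"training": {"epochs": 10}, "model": {"learning_rate": 0.01}}
apply_overrides(config, ["training.epochs=50", "model.learning_rate=0.001"])
print(config)
```

The point of the sketch is that each override touches only the key it names, so everything else in the YAML hierarchy keeps its configured value.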

Code Quality

pixi run format       # Ruff formatter
pixi run lint         # Ruff linter
pixi run typecheck    # MyPy strict mode
pixi run test         # Pytest suite

Architecture

src/object_detection_training/
  models/          RFDETR and YOLOX with Lightning module wrappers
  inference/       ONNX runtime and Gemini VLM inference engines
  callbacks/       EMA, ONNX export, plotting, visualization
  data/            COCO data modules with SQLite-cached datasets
  tasks/           Training, ONNX export, ONNX inference, VLM annotation
  conf/            Hydra YAML configs (models, data, callbacks, tasks)
  schemas/         Pydantic models for detections and annotations
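
The data/ entry mentions COCO data modules with SQLite-cached datasets. The README does not describe the cache layout, so the following is only a guess at the general idea: load COCO-style annotations into an indexed SQLite table once, then fetch the boxes for a single image without scanning the whole JSON file. The table name, column names, and both function names are assumptions, not the project's schema.

```python
import json
import sqlite3


def build_cache(coco: dict, db_path: str = ":memory:") -> sqlite3.Connection:
    """Store COCO-style annotations in SQLite, indexed by image_id."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations "
        "(image_id INTEGER, category_id INTEGER, bbox TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_image ON annotations (image_id)")
    conn.executemany(
        "INSERT INTO annotations VALUES (?, ?, ?)",
        [
            # COCO boxes are [x, y, width, height]; stored here as JSON text.
            (ann["image_id"], ann["category_id"], json.dumps(ann["bbox"]))
            for ann in coco["annotations"]
        ],
    )
    conn.commit()
    return conn


def boxes_for_image(conn: sqlite3.Connection, image_id: int) -> list[tuple]:
    """Return (category_id, bbox) pairs for one image, in insertion order."""
    rows = conn.execute(
        "SELECT category_id, bbox FROM annotations WHERE image_id = ? ORDER BY rowid",
        (image_id,),
    )
    return [(category_id, json.loads(bbox)) for category_id, bbox in rows]
```

Passing a file path instead of ":memory:" would persist the table between runs, which is where a cache like this would save time on large datasets.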

Docker

pixi run build         # Cloud Build on GCP (fast, cached)
pixi run build-local   # Local Docker build

The image targets GCP Vertex AI. Secrets (WANDB_API_KEY, etc.) are injected at runtime via environment variables.
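
Because secrets arrive only through the environment at runtime, a job can check for them at startup and fail early with a clear message. The helper below is a minimal sketch of that pattern; load_secrets is a hypothetical name and is not taken from the repository.

```python
import os
from typing import Mapping


def load_secrets(
    required: list[str],
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Return the required variables, or raise if any is unset or empty."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Calling load_secrets(["WANDB_API_KEY"]) at the top of an entry point would surface a missing key before any training time is spent. The env parameter exists only so the check can be exercised with a plain dictionary in tests.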

License

Apache License 2.0 — see LICENSE.
