
Async AI Job Processing Platform

A production-style platform for submitting long-running AI jobs via a REST API, processing them asynchronously with Celery workers, and watching their progress live in a React dashboard.

Architecture

Browser
  └─► React UI  (Vite · :5173)
        └─► /api/* proxy
              └─► FastAPI  (:8000)
                    ├─► PostgreSQL  (job rows)
                    └─► Redis  (Celery broker)
                          └─► Celery Worker  (AI processing)
                                └─► PostgreSQL  (status updates)

Key design decisions:

  • The API never runs AI work — it enqueues a Celery task and returns 202 Accepted immediately, keeping latency under 100 ms regardless of job complexity.
  • Workers are stateless and horizontally scalable (make scale-workers N=5).
  • Jobs are idempotent — re-delivering a completed/failed task is a no-op.
  • Exponential backoff with jitter retries transient failures up to 3 times.
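The retry policy above can be sketched as "exponential backoff with full jitter": each retry waits a random amount of time between zero and an exponentially growing cap. The function below is an illustration of that idea, not the code in `tasks.py`; the base delay and cap values are assumptions chosen for the example.

```python
import random

def backoff_delay(retry: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter.

    Wait a random amount between 0 and min(cap, base * 2**retry) seconds
    before the next attempt, so colliding retries spread out instead of
    hammering a recovering dependency in lockstep.
    """
    return random.uniform(0.0, min(cap, base * (2 ** retry)))
```

With full jitter the *average* wait still grows exponentially per retry, but two workers that failed at the same instant almost never retry at the same instant.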

Tech Stack

Layer          Technology
API            FastAPI + Uvicorn
Database       PostgreSQL 16 + SQLAlchemy 2 (async)
Task queue     Celery 5 + Redis 7
Frontend       React 18 + Vite 7 + Axios
Monitoring     Flower (Celery UI)
Load testing   Locust
Orchestration  Docker Compose

Project Structure

.
├── app/                        # FastAPI application
│   ├── core/
│   │   ├── config.py           # Pydantic-settings (reads .env)
│   │   ├── database.py         # Async SQLAlchemy engine + get_db()
│   │   └── logging.py          # Structured logging config
│   ├── models/
│   │   └── job.py              # Job ORM model (UUID PK, status enum)
│   ├── schemas/
│   │   └── job_schema.py       # Pydantic request/response schemas
│   ├── routes/
│   │   ├── job_routes.py       # POST /jobs · GET /jobs · GET /jobs/{id}
│   │   └── admin_routes.py     # GET /admin/stats
│   ├── services/
│   │   └── job_service.py      # Business logic + Celery dispatch
│   ├── worker/
│   │   ├── celery_app.py       # Celery instance + broker config
│   │   └── tasks.py            # process_ai_job task (retry + idempotency)
│   └── main.py                 # FastAPI app + lifespan
├── frontend/                   # React SPA
│   ├── src/
│   │   ├── api.js              # Axios instance (baseURL + all API calls)
│   │   ├── App.jsx             # Layout + stats bar + polling loop
│   │   ├── App.css             # All styles (no UI library)
│   │   └── components/
│   │       ├── JobForm.jsx     # Controlled form → POST /jobs
│   │       └── JobList.jsx     # Per-job polling cards with status badges
│   ├── Dockerfile              # Node 20 Alpine dev image
│   └── vite.config.js          # host 0.0.0.0 + /api proxy
├── load_test/
│   └── locustfile.py           # Submit + poll simulation
├── Dockerfile                  # Multi-stage Python image (API + worker)
├── docker-compose.yml          # All services wired together
├── Makefile                    # One-command workflow (see below)
├── requirements.txt            # Pinned Python dependencies
├── .env.example                # Environment variable template (safe to commit)
└── .gitignore

Quick Start

Prerequisites

  • Docker Desktop (includes Compose)
  • make (pre-installed on most Linux distros; on macOS it ships with the Xcode Command Line Tools; Windows: use Git Bash or WSL)

1. Clone and configure

git clone https://github.com/your-username/async-ai-job-platform.git
cd async-ai-job-platform

cp .env.example .env
# Open .env and set OPENAI_API_KEY if you want real AI inference.
# Everything else works with the defaults for local Docker Compose.

2. Start the platform

make up

That's it. All images are built and every service starts.

Service                  URL
React UI                 http://localhost:5173
FastAPI docs             http://localhost:8000/docs
Flower (Celery monitor)  http://localhost:5555

3. Submit a job

Open http://localhost:5173, type any file path (e.g. reports/q1.pdf), and press Submit Job.
Watch the status badge transition: pending → processing → completed — live, every 3 seconds, no page refresh.
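The dashboard drives that live update with a simple poll loop against GET /jobs/{id}. The same loop in Python, with the HTTP call abstracted behind a `fetch_status` callable so the sketch stays self-contained (the function name and timeout are illustrative, not part of the project's API):

```python
import time

def wait_for_job(fetch_status, interval: float = 3.0, timeout: float = 120.0) -> str:
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status is any zero-argument callable returning one of:
    "pending", "processing", "completed", "failed".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")
```

In practice `fetch_status` would wrap an HTTP GET to `/jobs/{id}` and read the `status` field of the JSON response.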

All Make Commands

make up                   # build + start all services  ← start here
make down                 # stop containers (data preserved)
make restart              # restart without rebuild
make rebuild              # full no-cache rebuild

make logs                 # tail all services
make logs-api             # tail FastAPI only
make logs-worker          # tail Celery worker only
make logs-frontend        # tail Vite dev server only

make status               # container names, status, ports
make health               # curl every endpoint, print HTTP codes
make stats                # live queue counts from /admin/stats

make scale-workers N=3    # run N worker replicas in parallel
make load-test            # start Locust UI at http://localhost:8089
make load-test-down       # stop Locust

make clean                # stop + delete volumes  (wipes the DB)
make clean-all            # clean + remove built images

API Reference

Jobs

Method  Path        Description
POST    /jobs       Enqueue a new AI job
GET     /jobs       List all jobs
GET     /jobs/{id}  Get a single job's status + result

POST /jobs — request body

{ "input_file_path": "reports/q1-financials.pdf" }

Job response

{
  "id": "eebcf00a-e8fc-4950-ab5b-64ec7cc7954c",
  "status": "completed",
  "result": { "output": "Processed successfully" },
  "error_message": null,
  "created_at": "2026-03-03T10:28:35Z",
  "updated_at": "2026-03-03T10:28:40Z"
}

Status values: pending | processing | completed | failed
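These four values map naturally onto a status enum like the one on the Job model in `job.py`. A minimal sketch, where the enum values come from the list above but the transition map is an assumption inferred from the job flow (not copied from the source):

```python
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

# Assumed lifecycle: pending -> processing -> completed/failed.
# Terminal states accept no transitions, which is what makes
# re-delivery of a finished task a no-op.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}

def can_transition(current: JobStatus, new: JobStatus) -> bool:
    return new in TRANSITIONS[current]
```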

Admin

Method  Path          Description
GET     /admin/stats  Aggregated job counts by status
GET     /health       Liveness probe

Environment Variables

Copy .env.example to .env and fill in values. Never commit .env.

Variable               Default                                            Description
DATABASE_URL           postgresql+asyncpg://user:password@db:5432/jobsdb  PostgreSQL connection string
REDIS_URL              redis://redis:6379/0                               General-purpose Redis URL
CELERY_BROKER_URL      redis://redis:6379/0                               Celery task broker
CELERY_RESULT_BACKEND  redis://redis:6379/1                               Celery result storage
OPENAI_API_KEY         (empty)                                            OpenAI secret key — leave blank for simulated mode
LOG_LEVEL              INFO                                               DEBUG / INFO / WARNING / ERROR
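`config.py` loads these with pydantic-settings. A dependency-free sketch of the same idea, using only the standard library and the defaults from the table above (the class and field names here are illustrative, not the project's actual `Settings` class):

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    """Read a variable from the environment, falling back to the default."""
    return os.environ.get(name, default)

@dataclass
class Settings:
    database_url: str = field(default_factory=lambda: _env(
        "DATABASE_URL", "postgresql+asyncpg://user:password@db:5432/jobsdb"))
    redis_url: str = field(default_factory=lambda: _env(
        "REDIS_URL", "redis://redis:6379/0"))
    celery_broker_url: str = field(default_factory=lambda: _env(
        "CELERY_BROKER_URL", "redis://redis:6379/0"))
    celery_result_backend: str = field(default_factory=lambda: _env(
        "CELERY_RESULT_BACKEND", "redis://redis:6379/1"))
    openai_api_key: str = field(default_factory=lambda: _env("OPENAI_API_KEY", ""))
    log_level: str = field(default_factory=lambda: _env("LOG_LEVEL", "INFO"))
```

pydantic-settings adds what this sketch lacks: automatic `.env` file loading and type validation at startup.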

Deploying

GitHub Actions / CI

Add each variable from .env.example as a repository secret (Settings → Secrets → Actions). Reference them in your workflow:

env:
  DATABASE_URL: ${{ secrets.DATABASE_URL }}
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Vercel (frontend only)

Settings → Environment Variables → add each VITE_* variable

The React app talks to the API via the /api proxy in vite.config.js. Set VITE_API_HOST to your production API hostname.

GCP Cloud Run

gcloud run deploy api \
  --set-env-vars DATABASE_URL=...,REDIS_URL=... \
  --set-secrets OPENAI_API_KEY=openai-key:latest

Use Secret Manager for OPENAI_API_KEY — never pass it as a plain env var in production.

Horizontal Scaling

# Run 5 workers in parallel — drains the queue ~5x faster
make scale-workers N=5

# Scale back down
make scale-workers N=1

Workers share nothing except the Redis broker and the PostgreSQL database, so throughput scales roughly linearly with replica count — 3 workers process about 3× as many jobs per minute, until the broker or database becomes the bottleneck.

Load Testing

make load-test
# Open http://localhost:8089
# Set users: 100, spawn rate: 10, then Start

Locust simulates users submitting jobs and polling for results. The stats dashboard shows response time percentiles, request rate, and failure count.

Security Notes

  • .env is in .gitignore — it will never be staged or committed.
  • docker-compose.override.yml is also gitignored so local overrides stay local.
  • The PostgreSQL port (5432) and Redis port (6379) are not published to the host — only the API, frontend, Flower, and Locust ports are.
  • Before pushing to a shared environment, place Flower and /admin/stats behind authentication.

License

MIT
