feat: Production-readiness audit + Docker consolidation (72→12 containers) with resilience patterns, observability, and integration tests #40
- Python DeepFace liveness engine (passive + active challenges, anti-spoofing)
- Python document OCR engine (PaddleOCR, VLM classification, Docling parsing)
- Go KYC orchestrator (NIN/BVN/CAC verification, AML screening, risk scoring)
- Rust identity matching engine (embedding comparison, fraud detection)
- TypeScript tRPC routers + comprehensive KYC/KYB frontend pages
- KYC gate integration into Claims flow
- API clients for all 4 backend services

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…e ThemeProvider) Co-Authored-By: Patrick Munis <pmunis@gmail.com>
- Revert vite.ts to use inline config spread (configFile: false) instead of configFile path
- Revert vite.config.ts to remove define/dedupe/optimizeDeps additions that didn't fix the React hooks issue
- These reverts restore the original working configuration from previous PRs

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…t plugin double-init) Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…oral, PostgreSQL, Keycloak, Permify, Redis, Mojaloop, OpenSearch, OpenAppSec, APISix, TigerBeetle, Lakehouse

Go orchestrator (8085):
- PostgreSQL persistence replacing in-memory maps
- Redis caching for KYC session lookups
- Kafka producer for KYC completion events
- Temporal client for workflow orchestration
- OpenSearch auditor for compliance trail
- APISix gateway with OpenAppSec WAF plugin
- Mojaloop bridge for mobile money KYC-gated transfers
- Keycloak/Permify authorization middleware
- All 9 middleware clients wired into main.go

Rust ledger service (8113):
- TigerBeetle double-entry ledger with KYC-level transfer limits
- Dapr sidecar for state management and pub/sub
- OpenAppSec WAF validation on all requests
- 10 ledger types with KYC level requirements

Python services:
- Lakehouse analytics (8114) with Delta Lake compliance reporting
- Fluvio stream processor (8115) with WebSocket real-time events

TypeScript platform integration:
- KYC gate checks on claims.create, payments.process, wallet.topUp/withdraw
- KYC gate on application.create/submit with level requirements
- Onboarding wired to trigger KYC verification on identity step
- KYB wired to Go orchestrator for CAC/TIN/director/UBO verification
- Middleware integration endpoints (ledger stats, analytics metrics, stream topics, transfer limits, NDPR report)
- New service clients: kycLedgerService, kycAnalyticsService, kycStreamService, checkKYCGate helper

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
- 6 PyTorch models: fraud detection (residual+attention), churn prediction (GLU), claims adjudication (multi-task), credit scoring (Wide&Deep), anomaly detection (VAE), GNN fraud ring detection (GraphSAGE)
- Synthetic Nigerian insurance data generation (275k+ samples across 6 domains)
- Real training loops with FocalLoss, OneCycleLR, early stopping, metric tracking
- Trained .pt weight files for all 6 models
- ONNX export for CPU-optimized inference (4 models)
- Delta Lake feature store with versioning (6 tables)
- MCMC Bayesian risk modeling with NumPyro/JAX (16 product lines, VaR/CVaR)
- Ray distributed training infrastructure with local fallback
- Neo4j graph schema for fraud ring detection with offline mode
- FastAPI inference server for all models
- All models run on CPU (no GPU required)

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…sioning, scheduled retraining, platform data ingestion

- drift_detector.py: PSI, KS test, JS divergence for data drift + performance monitoring
- model_registry.py: Champion-challenger versioning with auto-promotion
- data_ingestion.py: Platform data connectors with watermarking and fallback chain
- pipeline.py: 5-step orchestration (ingest → drift → retrain → validate → promote → ONNX export)
- scheduler.py: Cron-based + event-driven triggers with background thread
- api.py: FastAPI endpoints for CT management (/ct/retrain, /ct/drift, /ct/models, /ct/scheduler)
- Fixed api_server.py imports for standalone execution
- All 4 models retrained, promoted, and exported to ONNX with zero errors

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
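For reviewers unfamiliar with the PSI check named in the drift_detector.py entry, here is a minimal sketch of a population stability index over equal-width bins. It is not the code in drift_detector.py: the function name, the bin count, and the epsilon floor for empty bins are all assumptions made for illustration, and it uses only the standard library.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,      # assumed bin count, not taken from the PR
    eps: float = 1e-6,   # assumed floor so empty bins do not produce log(0)
) -> float:
    """PSI between a reference sample and a current sample.

    Bins are equal-width over the range of the reference sample; values in
    the current sample that fall outside that range go into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant samples

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp to the edge bins
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    # Each term is non-negative, so PSI is 0 only when the binned shares match.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of 0, and the value grows as the binned distribution of the current sample moves away from the reference.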
…g in CT API drift check Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…eaming ingestion, online serving, lineage, RBAC, Feature Store API, Go SDK

Components implemented:
- Storage: Object store abstraction (Local/S3/MinIO) with unified interface
- Schema: Registry with versioning, compatibility checks (backward/forward/full), evolution tracking
- Streaming: Kafka/Fluvio ingestion engine with micro-batching, DLQ, checkpointing
- Computation: Real-time feature engine with sliding windows, EMA, time-decay scoring
- Serving: Online feature server with L1 (LRU) + L2 (Redis) + L3 (Delta Lake) caching
- API: FastAPI REST API with DuckDB SQL queries, CRUD, materialization endpoints
- Lineage: Full DAG tracking (source→table→model), quality metrics, mutation audit
- RBAC: Role-based access control with table/column-level policies, audit logging
- Connectors: Python EventBridge + Go SDK for microservice event publishing
- All components tested with functional verification (9 features computed, 3 events delivered)

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…o, Python, TypeScript, Rust)

Shared SDK libraries for all 12 infrastructure components:
- PostgreSQL: connection pooling, migrations, JSONB, audit trail
- TigerBeetle: KYC-level transfer limits, 6 ledger codes, batch transfers
- Redis: session management, rate limiting, KYC gates, pub/sub, distributed locks
- Mojaloop: mobile money interop, KYC-gated transfers, idempotency keys
- Kafka: 16 platform topics, idempotent producer, DLQ support, audit events
- APISix: rate limiting, OIDC, IP restriction, WAF, health checks
- Keycloak: token validation, KYC level attributes, 5-min TTL caching
- OpenAppSec: SQL injection, XSS, path traversal blocking
- Permify: fine-grained RBAC, schema-based permissions, default-deny
- OpenSearch: audit log indexing, ILM policies, structured search
- Fluvio: real SDK integration, 11 platform topics, event streaming
- Dapr: state management, pub/sub, service invocation

Middleware layer (Go/Python/TypeScript):
1. Rate limiting (Redis)
2. Token validation (Keycloak)
3. KYC gate enforcement (Redis + Keycloak)
4. RBAC permission checks (Permify)
5. Async audit logging (OpenSearch + Kafka + Fluvio)

All SDKs compile clean:
- Go: go vet ./... passes
- Python: py_compile all files pass
- TypeScript: tsc --noEmit passes
- Rust: cargo check passes

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
…ervability, gRPC, Docker consolidation (72→12 containers)

Production Readiness Gaps Implemented (7 categories):
1. Circuit breakers with exponential backoff+jitter (Go/Python/TS/Rust)
2. Graceful shutdown with signal handling SIGTERM/SIGINT (Go/Python/TS/Rust)
3. Observability: Prometheus metrics export, request latency tracking (Go/Python/TS/Rust)
4. gRPC service registry with circuit breaker per-service (Go SDK)
5. Health/Ready/Live probe handlers for Kubernetes compatibility (Go/Python/TS/Rust)
6. Resilient HTTP clients with circuit breaker + retry (Go/Python/TS/Rust)
7. Request metrics middleware for all stacks

Docker Container Consolidation (83% reduction):
- 12 containers total (3 infra + 9 app) vs 72 theoretical
- docker-compose.yml with health checks, resource limits, shared env
- 8 Dockerfiles for consolidated service groups
- 5 Go gateway binaries + 2 Python FastAPI gateways
- PostgreSQL schema init script with all tables and indexes
- All credentials via environment variables (no hardcoded secrets)

Integration Tests:
- 38 test cases covering health, critical flows, inter-service communication
- Parameterized across all 9 service containers

Co-Authored-By: Patrick Munis <pmunis@gmail.com>
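To make items 1 and 6 above concrete, here is a minimal Python sketch of a circuit breaker combined with a retry loop that uses exponential backoff with full jitter. It is not the SDK's circuit_breaker.py: every class name, function name, and default value below is hypothetical, and the sketch is deliberately single-threaded (a real implementation would need a lock around the state transitions).

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # one trial call is allowed through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed half-open trial, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(breaker: CircuitBreaker, fn: Callable[[], T], max_attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn through the breaker; stop immediately once the breaker is open."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Injecting the clock, the random source, and the sleep function keeps the state machine testable without real waiting, which also makes it easier to probe timing-related edge cases.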
Co-Authored-By: Patrick Munis <pmunis@gmail.com>
E2E Test Results: Production Readiness (56/56 PASSED)

Tested all 7 consolidated services locally (5 Go binaries + 2 Python FastAPI). Each service was started, probed, and exercised against its full API surface.

Category 1: Health/Ready/Live Probes (21/21 PASSED)
All 7 services return correct JSON on their health, ready, and live probes.

Category 2: Metrics Endpoints (7/7 PASSED)
All 7 services expose Prometheus-compatible metrics.

Category 3: Business API Flows (23/23 PASSED)

Category 4: Method Enforcement (3/3 PASSED)

Category 5: Graceful Shutdown (2/2 PASSED)

Observations

Session: https://app.devin.ai/sessions/0475192a778b45cea30202f85ad52b63
Summary
Adds production-readiness infrastructure across all four SDK languages (Go, Python, TypeScript, Rust) and consolidates the Docker topology from 72 theoretical service containers down to 12 (3 infrastructure + 9 application), an 83% reduction.
New SDK modules (per language):
- circuit_breaker.{go,py,ts,rs}
- graceful.{go,py,ts,rs}
- observability.{go,py,ts,rs}
- grpc_server.go

Docker consolidation:
- docker-compose.yml: 12 services (PostgreSQL, Redis, Kafka + 9 application containers grouped by business domain)
- 8 Dockerfiles in infrastructure/docker/ for consolidated service groups
- 5 Go gateway binaries (core-services, insurance-ops, financial, compliance, communication)
- 2 Python FastAPI gateways (ml-services, ai-platform)
- PostgreSQL schema init script (infrastructure/init-db/01-schema.sql) with tables for customers, policies, claims, KYC, payments, compliance, audit, notifications, ML models
- All credentials supplied via ${ENV_VAR}; no hardcoded secrets

Integration tests:
- tests/integration/test_service_health.py: 38 parameterized test cases covering health probes, critical business flows (policy lifecycle, claims, payments, KYC, compliance), and inter-service communication. Tests gracefully skip when services are not running.

Also included (from prior work on this branch): AI/ML continuous training pipeline, KYC stream processor, and infrastructure SDK clients for all 12 platform components.
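The "skip when services are not running" behaviour of the integration tests can be sketched as follows. This is not the content of test_service_health.py: it swaps pytest for the standard-library unittest module to stay dependency-free (with pytest, the equivalent call would be pytest.skip), the service labels are hypothetical, and only the ports and the /health path come from this PR.

```python
import socket
import unittest
import urllib.request

HOST = "127.0.0.1"
# Hypothetical labels; the two ports are a subset of those listed in this PR.
SERVICES = {"svc-8080": 8080, "svc-8600": 8600}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def make_health_test(name: str, port: int):
    """Build one test method per service, mimicking a parameterized test."""
    def test(self: unittest.TestCase) -> None:
        if not port_open(HOST, port):
            self.skipTest(f"{name} is not running on :{port}")
        url = f"http://{HOST}:{port}/health"
        with urllib.request.urlopen(url, timeout=2) as resp:
            self.assertEqual(resp.status, 200)
    return test


class ServiceHealthTest(unittest.TestCase):
    """Test methods are attached below, one per entry in SERVICES."""


for _name, _port in SERVICES.items():
    setattr(ServiceHealthTest, f"test_health_{_port}", make_health_test(_name, _port))
```

Saved as a module, this runs with python -m unittest. A service that is down is reported as skipped, not failed, so a partial local stack still produces a readable result.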
Review & Testing Checklist for Human
- The Go services (infrastructure/docker/cmd/*/main.go) use in-memory maps and return generated data. They do not connect to PostgreSQL despite the schema being provided. Verify whether this matches your expectations or whether real DB wiring is needed before merge.
- .env file: POSTGRES_PASSWORD has no default value. Running docker-compose up without a .env file will fail. Consider adding a .env.example.
- .pyc file: tests/integration/__pycache__/test_integration.cpython-311-pytest-9.0.2.pyc is in the diff. It should be removed and added to .gitignore.
- The Dockerfiles reference go.work* and the Go SDK, but each consolidated service has its own go.mod. Verify the build context resolves dependencies correctly by running docker-compose build locally.
- Review the Go (infrastructure/go-sdk/circuit_breaker.go) and Python (infrastructure/python-sdk/infra_sdk/circuit_breaker.py) circuit breaker implementations for edge cases (e.g., concurrent access, timer races).

Recommended test plan:
- Run docker-compose build to verify all Dockerfiles compile
- Create .env with POSTGRES_PASSWORD=<value> and run docker-compose up
- Hit /health on each service port (8080, 8085, 8110, 8200, 8400, 8500, 8600, 8700) to verify they start
- Run pytest tests/integration/ -v with services up to validate the integration test suite
- Exercise a few business endpoints (e.g., /api/v1/policies on :8080, GET /api/v1/naicom/solvency on :8600) to confirm responses look reasonable

Notes
- The audit report is at /home/ubuntu/production-readiness-audit.md (not committed to the repo).
- The Rust SDK emits warnings (cargo check) for unused variables; these are cosmetic and don't affect compilation.
- All SDKs pass go vet, cargo check, py_compile, and tsc --noEmit.

Link to Devin session: https://app.devin.ai/sessions/0475192a778b45cea30202f85ad52b63
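The .env item in the checklist can be caught before docker-compose up with a small preflight check. The sketch below is an illustration, not part of this PR: the function names are hypothetical, the parser handles only plain KEY=VALUE lines, and it assumes that shell variables take precedence over the .env file.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Only POSTGRES_PASSWORD is named in this PR; extend the tuple as needed.
REQUIRED_VARS = ("POSTGRES_PASSWORD",)


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(required: Iterable[str], env_file_text: str,
                 environ: Mapping[str, str]) -> List[str]:
    """Return required names that are unset or empty in both sources."""
    # Assumption: variables from the shell override those in the .env file.
    merged = {**parse_env_file(env_file_text), **environ}
    return [name for name in required if not merged.get(name)]


def preflight(env_path: str = ".env") -> List[str]:
    """Read env_path if it exists and report which required variables are missing."""
    text = ""
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as fh:
            text = fh.read()
    return missing_vars(REQUIRED_VARS, text, os.environ)
```

A wrapper script could call preflight() and exit non-zero when the returned list is non-empty, so the failure names the missing variable up front.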