Kubernetes operator that manages BotModule, IpcConfig, S3Store, BackupSchedule, and BackupRestore resources for the eevee chatbot ecosystem.
This operator manages five custom resource types:
- BotModule - Deploys and manages all eevee modules (connectors, plugins, router, toolbox)
- IpcConfig - Manages NATS inter-process communication infrastructure
- S3Store - Validates and tracks S3-compatible storage connections for backups
- BackupSchedule - Creates and manages CronJobs for scheduled PVC backups to S3
- BackupRestore - Creates one-shot restore Jobs to recover PVC data from S3 backups
For each BotModule resource, the operator creates and maintains a Kubernetes Deployment, ConfigMap (for module configuration), and optional PersistentVolumeClaim. For each IpcConfig resource, the operator deploys and manages a NATS server with authentication.
The eevee Operator is installed using Helm. The chart is hosted in our custom repository.
- Helm 3.0+
- Kubernetes cluster with access to add Helm repositories
-
Add the eevee Helm repository:
helm repo add eevee https://helm.eevee.bot helm repo update
-
Install the CRDs and operator:
helm install eevee-crds eevee/crds --namespace eevee-bot helm install eevee-operator eevee/operator --namespace eevee-bot
The operator can be configured using Helm values. See the chart documentation at helm.eevee.bot for available options.
npm run buildThis command will:
- Run ESLint to check for code issues
- Compile TypeScript files
npm run devThis command will:
- Build the project
- Run the operator locally
npm run update-librariesUpdates the core eevee libraries to their latest versions.
| Variable | Default | Description |
|---|---|---|
NAMESPACE |
eevee-bot |
The namespace the operator should watch |
WATCH_OTHER_NAMESPACES |
false |
Watch resources in namespaces other than the operator's namespace |
KUBE_IN_CLUSTER_CONFIG |
false |
Use in-cluster Kubernetes configuration |
HTTP_API_PORT |
9000 |
Port for the HTTP API server |
EEVEE_OPERATOR_API_TOKEN |
(none) | Bearer token for authenticated API endpoints |
The operator automatically sets default health probes on module pods:
- Liveness probe: HTTP GET
/healthon the module'smetricsPort, withinitialDelaySeconds: 10andperiodSeconds: 30 - Readiness probe: HTTP GET
/healthon the module'smetricsPort, withinitialDelaySeconds: 5andperiodSeconds: 10 - Startup probe: None (modules typically start quickly)
Modules use setupHttpServer({ natsClients }) from @eeveebot/libeevee to wire the /health endpoint to NATS connectivity checks. When NATS is disconnected, /health returns 503, causing the readiness probe to fail and eventually triggering a liveness restart.
Custom probes can be specified via livenessProbe, readinessProbe, and startupProbe fields in the BotModule spec. These accept standard Kubernetes V1Probe objects and override the defaults entirely.
If a module has metrics: false, no default probes are set.
The operator supports scheduled backups of module PVCs to S3-compatible storage, and on-demand restores.
Resources:
- S3Store — defines an S3 connection (endpoint, bucket, credentials via secrets). The operator validates connectivity and surfaces the result in
status.lastConnectionTest. - BackupSchedule — ties an S3Store to a cron schedule and a botmodule. The operator creates and manages a CronJob that runs the backup image on schedule.
- BackupRestore — triggers a one-shot restore Job to recover a module's PVC from a specific backup (or the latest available).
- BotModule extensions —
spec.backupSchedulereferences a backupschedule CR;spec.bootstrapFromBackuprestores the latest backup before first deployment.
S3 path format: <prefix>/<namespace>/<moduleName>/<uuid>.tar.gz
Backup image: A separate container image (ghcr.io/eeveebot/backup) provides backup.sh and restore.sh scripts using s3cmd.
The operator tracks the module lifecycle via the status subresource. Every reconciliation overwrites status with the current truth — no stale snapshots.
| Reason | When |
|---|---|
Pending |
Reconciliation starting |
WaitingForBackup |
bootstrapFromBackup set, no backups in S3 |
Bootstrapping |
Restore Job running |
BootstrapRestoreFailed |
Restore Job failed or timed out |
Disabled |
spec.enabled: false |
Creating |
Deployment just created |
Updating |
Deployment rolling out |
Ready |
Deployment healthy |
Unavailable |
Deployment degraded |
Bootstrap annotation: The operator tracks whether a PVC has been bootstrapped via the eevee.bot/bootstrapped annotation on the PVC itself (not the CR). This prevents stale restores over live data. The backuprestore CRD overrides the annotation — it always runs and sets the annotation on success.
npm run buildnpm run devContributions are welcome! Please see the eevee contributing guide for details.
CC BY-NC-SA 4.0 — see LICENSE for the full text.