CrashLoopBackOff on Kubernetes: DuckDB WAL replay fails on fresh database initialization #36
Description
Environment
- Liwan version: 1.4 (image ghcr.io/explodingcamera/liwan:1.4, SHA sha256:80c696af40b84abb1a008ee307e297a7722c178e654ab88dee5e953e2f93f661)
- DuckDB version: 1.5.0 (bundled in liwan 1.4)
- Kubernetes: k3s v1.33.6+k3s1
- Container runtime: containerd 2.1.5-k3s1.33
- Host OS: Debian GNU/Linux 13 (trixie)
- Kernel: 6.12.38+deb13-cloud-amd64
- CPU: AMD EPYC with AVX2 support
- Filesystem: ext4
- Storage: local-path provisioner (k3s default), bind-mount into pod
Problem
Liwan crashes immediately (<1 second) on every startup inside a Kubernetes pod,
even on a completely clean /data directory. The error is:
WARN liwan::app::db: Failed to create DuckDB connection. If you've just upgraded
to Liwan 1.2, please downgrade to version 1.1.1 first, start and stop the server,
and then upgrade to 1.2 again.
Error: Failed to create DuckDB connection: INTERNAL Error: Failure while replaying
WAL file "/data/liwan-events.duckdb.wal": Calling DatabaseManager::GetDefaultDatabase
with no default database set
This error signals an assertion failure within DuckDB.
DuckDB creates the WAL file on fresh init, then immediately fails to replay it as
part of the initial checkpoint. The result is a CrashLoopBackOff with a deterministic
4494-byte WAL file created every time.
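To confirm the deterministic WAL size from outside the pod (for example, on the node's local-path directory), a small check like the following can help. This is a minimal sketch: the helper name wal_size is hypothetical, and it assumes the WAL sits next to the database file with a .wal suffix, as the path in the error message suggests.

```python
from pathlib import Path


def wal_size(db_path):
    """Return the size in bytes of the WAL next to db_path, or None if absent.

    Assumes the WAL is named "<database file>.wal", matching the path
    "/data/liwan-events.duckdb.wal" shown in the error message.
    """
    wal = Path(str(db_path) + ".wal")
    if not wal.exists():
        return None
    return wal.stat().st_size


if __name__ == "__main__":
    # The database path is taken from the error output in this report.
    print(wal_size("/data/liwan-events.duckdb"))
```

If the report's observation holds, this should print 4494 after each failed start on a clean directory.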
What We Tested
| Scenario | Result |
|---|---|
| ctr run with overlay filesystem (no volume) | ✅ Works |
| ctr run with bind-mount to the same PVC path (clean dir) | ✅ Works |
| Kubernetes pod, clean PVC, default security context | ❌ Crashes |
| Kubernetes pod + seccompProfile: Unconfined | ❌ Crashes |
| Kubernetes pod + capabilities: add: ["ALL"] | ❌ Crashes |
| Kubernetes pod + runAsUser: 0 | ❌ Crashes |
| Kubernetes pod + memory limit 512Mi | ❌ Crashes |
| Kubernetes pod + LIWAN_DUCKDB_THREADS=1 | ❌ Crashes |
The crash is deterministic and always produces the same 4494-byte WAL file before failing.
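For anyone who wants to rerun the Kubernetes rows of the table, the sketch below generates one Pod manifest per scenario as JSON, which kubectl accepts in place of YAML. It is not the exact set of manifests used for this report: the pod names, the claim name liwan-data, and the choice to put every security setting on the container-level securityContext are assumptions made for illustration.

```python
import json

IMAGE = "ghcr.io/explodingcamera/liwan:1.4"  # image named in this report


def build_pod(name, claim_name, security_context=None, memory_limit=None, env=None):
    """Build a Pod manifest (as a dict) that mounts a PVC at /data."""
    container = {
        "name": "liwan",
        "image": IMAGE,
        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
    }
    if security_context:
        # Container-level securityContext (an assumption, see lead-in).
        container["securityContext"] = security_context
    if memory_limit:
        container["resources"] = {"limits": {"memory": memory_limit}}
    if env:
        container["env"] = [{"name": k, "value": v} for k, v in env.items()]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [container],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# One entry per Kubernetes row in the table above; keys are hypothetical labels.
SCENARIOS = {
    "default": {},
    "seccomp-unconfined": {"security_context": {"seccompProfile": {"type": "Unconfined"}}},
    "all-caps": {"security_context": {"capabilities": {"add": ["ALL"]}}},
    "root": {"security_context": {"runAsUser": 0}},
    "mem-512mi": {"memory_limit": "512Mi"},
    "one-thread": {"env": {"LIWAN_DUCKDB_THREADS": "1"}},
}


def build_all(claim_name="liwan-data"):
    """Return {scenario label: Pod manifest} for every scenario."""
    return {
        label: build_pod(f"liwan-repro-{label}", claim_name, **kwargs)
        for label, kwargs in SCENARIOS.items()
    }


if __name__ == "__main__":
    for label, manifest in build_all().items():
        print(json.dumps(manifest))
```

Each printed line is a standalone manifest. Use a clean PVC per scenario so that a WAL file left over from an earlier run does not affect the result.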
Additional Findings
- LIWAN_LISTEN cannot be set via environment variable: doing so causes Error: duplicate field 'listen', suggesting the distroless image includes a TOML config with listen already set.
- LIWAN_BASE_URL must point to a resolvable domain before the pod starts, otherwise liwan panics with failed to lookup address information: Name does not resolve (src/web/mod.rs:143).
- DuckDB 1.5.1 release notes mention a fix for "WAL corruption related to MarkBlockAsCheckpointed on fresh database initialization", which matches this exact failure. Liwan 1.4 bundles DuckDB 1.5.0.
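To rule out the LIWAN_BASE_URL panic before chasing the WAL error, the hostname can be checked for resolvability from the same network context as the pod. This is a minimal sketch using only the Python standard library. The function name base_url_resolves is hypothetical, and the check only approximates whatever lookup liwan performs internally.

```python
import socket
from urllib.parse import urlsplit


def base_url_resolves(base_url):
    """Return True if the host part of base_url resolves to an address."""
    host = urlsplit(base_url).hostname
    if not host:
        # No scheme or host could be parsed from the value.
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Corresponds to a "Name does not resolve" style failure.
        return False
    return True
```

Running this with the intended LIWAN_BASE_URL value from a debug pod in the same namespace shows whether the name is resolvable where liwan will start.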
Suspected Root Cause
The most likely cause is a DuckDB 1.5.0 bug in fresh database initialization that was fixed in DuckDB 1.5.1. The bug surfaces
only in the Kubernetes pod execution context (possibly related to cgroup constraints or
subtle runtime differences vs. bare ctr run); the same image and bind-mount path work under ctr run. Upgrading the bundled DuckDB to ≥1.5.1 would
likely fix the issue.
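A deployment script could gate on the bundled DuckDB version until a fixed image is available. The sketch below compares plain dotted version strings. The 1.5.1 threshold comes from this report's reading of the release notes and is not a confirmed fix version for this crash, and both function names are hypothetical.

```python
def parse_version(text):
    """Parse a dotted version such as "1.5.0" or "v1.5.0" into an int tuple."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_suspected_fix(duckdb_version, fixed_in="1.5.1"):
    """Return True if duckdb_version is at or above the suspected fix version."""
    return parse_version(duckdb_version) >= parse_version(fixed_in)
```

With the versions from this report, has_suspected_fix("1.5.0") is False, which matches the affected liwan 1.4 image.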
Workaround
None found. The application is currently not usable on Kubernetes with liwan 1.4.