Summary
Opus 4.7 impl sessions running HyperDX Node tooling (jest,
tsc, next build) intermittently die with
Agent error: Command failed with exit code 143
(that is, 128 + 15, SIGTERM).
Four sessions (two confirmed, two suspected) hit this on
2026-05-28 between roughly 21:30 and 21:55 UTC. The pattern is:
a long-running Node child gets SIGTERMed, the agent surfaces
exit 143, and the session stalls mid-task.
Affected sessions on 2026-05-28
- impl-af6f01dd (confirmed; at 21:37:45Z the last assistant turn
  was the literal exit-143 error)
- impl-2d3f4375 (confirmed; 21:39:12Z, same string; recovered
  manually by switching to a sibling worktree with surviving
  node_modules)
- impl-247fdb3b (likely, by symptom)
- impl-5f51ff89 (suspected)
Diagnostics
From inside the container at the time of writing:
  /sys/fs/cgroup/memory.max      max
  /sys/fs/cgroup/memory.current  ~2.04 GB
  /sys/fs/cgroup/memory.peak     8.31 GB
  /sys/fs/cgroup/memory.events   oom_kill 0
So the in-container cgroup is not the killer: memory.max is
unlimited and memory.events records zero oom_kill events.
memory.peak of 8.31 GB shows the container has used well past
2 GB during this session without being killed from inside.
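The four readings above can be collected in one pass, which makes it easier to compare sessions. A minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and the kernel exposes memory.peak; the helper names (parse_bytes, parse_events, snapshot) are illustrative and not part of any agent tooling:

```python
from pathlib import Path
from typing import Dict, Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def parse_bytes(text: str) -> Optional[int]:
    """Parse a cgroup v2 byte value; the literal 'max' means no limit (None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_events(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of memory.events into a dict."""
    events = {}
    for line in text.splitlines():
        key, _, count = line.partition(" ")
        if key and count:
            events[key] = int(count)
    return events


def snapshot(root: Path = CGROUP_ROOT) -> dict:
    """Collect the readings that exist under root; missing files are skipped."""
    result = {}
    for name in ("memory.max", "memory.current", "memory.peak"):
        path = root / name
        if path.is_file():
            result[name] = parse_bytes(path.read_text())
    events_path = root / "memory.events"
    if events_path.is_file():
        result["memory.events"] = parse_events(events_path.read_text())
    return result


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(key, value)
```

A None value for memory.max corresponds to the "max" reading, i.e. no limit at this cgroup level.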
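The SIGTERM source is therefore one of:
- An outer cgroup or Docker container limit applied by the
  host, killing the Node child but not the agent itself.
  Caveat: a kernel OOM kill delivers SIGKILL (exit 137), so
  exit 143 would require a host-side supervisor that sends
  SIGTERM as the limit is approached.
- The Node process dying after hitting its V8
  --max-old-space-size ceiling (Node's default on 64-bit is
  roughly 4 GB but varies; next build and big jest runs can
  blow past it). Caveat: V8 heap exhaustion normally aborts
  with a "JavaScript heap out of memory" fatal error (typically
  exit 134), so the child's stderr should be checked for that
  message.
- An external watchdog or timeout sending SIGTERM (listed as
  less likely, though it is the only candidate that produces
  SIGTERM directly).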
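When comparing failures across sessions, the shell convention of reporting a signal death as 128 + N can be decoded mechanically. A minimal sketch; describe_exit is a hypothetical helper, and it assumes the reported code follows that shell convention:

```python
import signal


def describe_exit(code: int) -> str:
    """Map a shell-style exit code to a signal name when it is 128 + N."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # 128 + N where N is not a signal number on this platform
            return "unknown signal"
    return "no signal"


if __name__ == "__main__":
    for code in (143, 137, 1):
        print(code, describe_exit(code))
```

Logging the decoded name next to the raw code makes it easier to spot whether different sessions are dying from the same signal.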
Impact
When this fires, the impl session can't make further progress on
its task without manual help. It hits long-running HyperDX work
in particular, because make ci-unit, make dev-int, and
next build all spin up large Node processes.
Proposed routes
Pick one (or both). The first is a one-line workaround, the
second is the real fix.
Route A: cap Node heap per repo. Export
NODE_OPTIONS=--max-old-space-size=1500 for HyperDX runs in the
validate-after-change skill, so jest/tsc voluntarily stay
under whatever the outer ceiling is. Cheap, repo-local, no infra
change. Downsides: 1500 MB is tight for next build, so some test
shards may still trip; and the flag caps V8's old-generation
heap per process, not total RSS, so parallel jest workers can
still add up.
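How the validate-after-change skill launches commands is not shown here, so the following is only a sketch of Route A under the assumption that it can run tooling through a small wrapper. node_env and run_capped are hypothetical names; the only real interface used is Node's NODE_OPTIONS environment variable:

```python
import os
import subprocess
from typing import Dict, List, Optional

HEAP_FLAG = "--max-old-space-size"


def node_env(heap_mb: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the V8 heap cap set in NODE_OPTIONS."""
    env = dict(os.environ if base is None else base)
    existing = env.get("NODE_OPTIONS", "").split()
    # Keep unrelated options, but drop any earlier heap cap so ours wins.
    kept = [opt for opt in existing if not opt.startswith(HEAP_FLAG)]
    env["NODE_OPTIONS"] = " ".join(kept + [f"{HEAP_FLAG}={heap_mb}"])
    return env


def run_capped(cmd: List[str], heap_mb: int = 1500) -> int:
    """Run cmd with the capped environment and return its exit status."""
    return subprocess.run(cmd, env=node_env(heap_mb)).returncode
```

For example, run_capped(["make", "ci-unit"]) would pass the cap to every Node process the target spawns, since child processes inherit the environment.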
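Route B: bump the outer container memory ceiling. Identify
where the suspected ~2 GB outer ceiling comes from (Docker
daemon or host cgroup) and raise it to 6 to 8 GB. The ~2 GB
figure is not yet confirmed: the in-container memory.peak of
8.31 GB suggests any enforced ceiling may sit higher. Real fix.
Needs whoever owns the agent host to adjust the container spec.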
Suggested next step
Confirm where the outer limit lives (on the host,
docker inspect <agent-container> should show HostConfig.Memory,
in bytes, with 0 meaning no limit). If the limit is in our
control, take Route B. If it's host policy, use Route A as a
stopgap while we negotiate the host policy.
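The check can be scripted so the result is easy to paste into this report. A minimal sketch, assuming it runs on the agent host with access to the Docker CLI; the function names are illustrative, and the container name must be supplied by whoever owns the host:

```python
import json
import subprocess


def memory_limit_bytes(inspect_json: str) -> int:
    """Extract HostConfig.Memory from `docker inspect` output (a JSON array)."""
    containers = json.loads(inspect_json)
    return int(containers[0]["HostConfig"]["Memory"])


def format_limit(limit: int) -> str:
    """Render the limit; Docker reports 0 when no memory limit is set."""
    if limit == 0:
        return "unlimited"
    return f"{limit / 1024 ** 3:.2f} GiB"


def inspect_container(name: str) -> str:
    """Run `docker inspect <name>` and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling format_limit(memory_limit_bytes(inspect_container(name))) for the agent container gives a single readable value. An "unlimited" result would point away from a Docker-level ceiling and toward a host cgroup or watchdog.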