agent: Opus 4.7 impl sessions hit exit 143 on Node tooling

# Summary

Opus 4.7 impl sessions running HyperDX Node tooling (`jest`,
`tsc`, `next build`) intermittently die with
`Agent error: Command failed with exit code 143` (SIGTERM).
Two sessions confirmed this on 2026-05-28 and two more are
suspected, all between roughly 21:30 and 21:55 UTC. The pattern:
a long-running Node child receives SIGTERM, the agent surfaces
exit 143, and the session stalls mid-task.
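
For reference, exit status 143 decodes as 128 + 15, and signal 15 is SIGTERM. A minimal shell sketch of that decoding follows; `signal_from_exit` is a hypothetical helper name for illustration, not something that exists in the agent harness.

```shell
# Decode a shell exit status into the signal that terminated the process.
# By shell convention, a status above 128 means "killed by signal (status - 128)".
signal_from_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    # kill -l <number> prints the signal name without the SIG prefix.
    kill -l "$((status - 128))"
  else
    echo "none"
  fi
}
```

Calling `signal_from_exit 143` prints `TERM`, whereas a status of 137 would print `KILL` and 134 would print `ABRT`. That distinction matters when weighing candidate causes against the observed exit code.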

## Affected sessions on 2026-05-28

- `impl-af6f01dd` (21:37:45Z; last assistant turn was the literal
  exit-143 error)
- `impl-2d3f4375` (21:39:12Z, same string; recovered manually by
  switching to a sibling worktree with surviving `node_modules`)
- `impl-247fdb3b` (likely, by symptom)
- `impl-5f51ff89` (suspected)

## Diagnostics

From inside the container right now:

```text
/sys/fs/cgroup/memory.max     max
/sys/fs/cgroup/memory.current ~2.04 GB
/sys/fs/cgroup/memory.peak    8.31 GB
/sys/fs/cgroup/memory.events  oom_kill 0
```

With `memory.max` unset (`max`) and zero `oom_kill` events, the
in-container cgroup is not the killer. The `memory.peak` of
8.31 GB shows the agent has used well past its current ~2 GB at
some point in this session without being OOM-killed inside.
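
The readings above can be collected with a few lines of shell. This is a sketch assuming cgroup v2 is mounted at `/sys/fs/cgroup` (which the paths above imply); `oom_kill_count` and `dump_cgroup_memory` are hypothetical helper names, and `memory.peak` may be missing on older kernels.

```shell
# Print the oom_kill counter from a cgroup v2 memory.events file.
# Defaults to the container's own cgroup; pass a path to inspect another one.
oom_kill_count() {
  events_file=${1:-/sys/fs/cgroup/memory.events}
  # Match the key exactly so "oom_group_kill" is not picked up by mistake.
  awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) print 0 }' "$events_file"
}

# Dump the four files quoted above, skipping any this kernel does not expose.
dump_cgroup_memory() {
  dir=${1:-/sys/fs/cgroup}
  for f in memory.max memory.current memory.peak memory.events; do
    [ -r "$dir/$f" ] && printf '%s: %s\n' "$f" "$(tr '\n' ' ' < "$dir/$f")"
  done
  return 0
}
```

Running `dump_cgroup_memory` right after a failed command, and comparing `oom_kill_count` before and after, would show whether the counter moves when exit 143 fires.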

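The SIGTERM source is therefore one of the following. Exit 143
is 128 + 15 (SIGTERM), which constrains the first two:

1. An outer cgroup or Docker container limit applied by the
   host, killing the Node child but not the agent itself. A
   kernel OOM kill delivers SIGKILL (exit 137), so this fits
   only if something on the host sends SIGTERM before a hard
   limit is reached.
2. The Node process dying at its V8 `--max-old-space-size`
   ceiling (Node's default on 64-bit is roughly 4 GB but varies;
   `next build` and big `jest` runs can blow past it). V8 heap
   exhaustion normally aborts with a "JavaScript heap out of
   memory" fatal error (SIGABRT, exit 134), so this fits only if
   a parent process reacts by sending SIGTERM to its children.
3. An external watchdog sending SIGTERM (judged less likely so
   far, though it is the only candidate that yields exit 143
   directly).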
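
To test hypothesis 2, it helps to know the heap ceiling Node is actually running with rather than the nominal default. A sketch, assuming `node` is on `PATH` inside the container; `bytes_to_mib` and `node_heap_limit_mib` are hypothetical helper names, while `v8.getHeapStatistics()` is Node's built-in API.

```shell
# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
  echo "$(( $1 / 1024 / 1024 ))"
}

# Report the V8 heap size limit of the node binary on PATH, in MiB.
# NODE_OPTIONS is honored, so an exported --max-old-space-size is reflected.
node_heap_limit_mib() {
  bytes_to_mib "$(node -p 'require("v8").getHeapStatistics().heap_size_limit')"
}
```

Comparing `node_heap_limit_mib` with the 8.31 GB `memory.peak` reading gives a rough sense of whether a single Node process could have reached its own ceiling before the container's usage peaked.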

## Impact

When this fires, the impl session can't make further progress on
its task without manual help. Affects long-running HyperDX work
in particular because `make ci-unit`, `make dev-int`, and
`next build` all spin up large Node processes.

## Proposed routes

Pick one (or both). The first is a one-line workaround; the
second is the real fix.

**Route A: cap Node heap per repo.** Export
`NODE_OPTIONS=--max-old-space-size=1500` for HyperDX runs in the
`validate-after-change` skill, so jest/tsc voluntarily stay
under whatever the outer ceiling is. Cheap, repo-local, no infra
change. Downside: 1500 MB is tight for `next build`; some test
shards may still trip.
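
A minimal sketch of how Route A could be applied per command rather than globally. `with_heap_cap` is a hypothetical wrapper, and how the `validate-after-change` skill actually invokes its commands is an assumption here.

```shell
# Run a command with a V8 old-space cap appended to any existing NODE_OPTIONS.
# Usage: with_heap_cap <megabytes> <command> [args...]
with_heap_cap() {
  cap_mb=$1
  shift
  # ${NODE_OPTIONS:+...} keeps flags that are already set and adds a separator.
  env NODE_OPTIONS="${NODE_OPTIONS:+$NODE_OPTIONS }--max-old-space-size=$cap_mb" "$@"
}
```

For example, `with_heap_cap 1500 make ci-unit` would run the unit suite under the cap. The variable is inherited by child processes, and each Node process gets its own heap, so a run that forks several workers can still exceed 1500 MB in total. The flag also caps only the V8 old-space heap, not the process's full resident memory.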

**Route B: bump the outer container memory ceiling.** Identify
where the suspected ~2 GB outer ceiling comes from (Docker daemon
or host cgroup) and raise it to 6 to 8 GB. This is the real fix,
but it needs whoever owns the agent host to adjust the container
spec.

## Suggested next step

Confirm where the outer limit lives. On the host,
`docker inspect <agent-container>` should show it under
`HostConfig.Memory` (in bytes; 0 means no limit). If the limit is
in our control, take Route B. If it's host policy, use Route A as
a stopgap while we negotiate the host policy.
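
The check above could be scripted roughly as follows, to be run on the agent host. `describe_memory_limit` and `container_memory_limit` are hypothetical helper names; `docker inspect --format` with a Go template is the standard Docker CLI.

```shell
# Interpret Docker's HostConfig.Memory value: bytes, where 0 means no limit.
describe_memory_limit() {
  if [ "$1" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$(( $1 / 1024 / 1024 )) MiB"
  fi
}

# Look up the limit for a container by name or ID.
container_memory_limit() {
  describe_memory_limit "$(docker inspect --format '{{.HostConfig.Memory}}' "$1")"
}
```

If `container_memory_limit <agent-container>` reports `unlimited`, the container spec is probably not the source, which would point toward a host-level cgroup or a watchdog instead.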

