Skip to content

zmlgit/openclaw-task-watchdog

Repository files navigation

openclaw-task-watchdog

npm version License: MIT OpenClaw Plugin

OpenClaw Task Watchdog Plugin — Auto-notify on subagent failures, exec errors, and stale tasks.

中文说明


Why This Plugin?

OpenClaw excels at dispatching subagents and running long tasks via exec. But there's a gap:

Pain Point What Happens
Silent failures A subagent crashes or times out, but the parent session never finds out
Forgotten tasks An exec command exits with error code 137 (OOM) — nobody notices
Stale jobs A background task has been "running" for 45 minutes with no progress
Manual checking Users repeatedly ask "is it done yet?" instead of getting proactive alerts

Task Watchdog bridges this gap by monitoring task lifecycle events and injecting timely notifications into the parent session — so you always know when something needs attention.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    OpenClaw Gateway                      │
│                                                         │
│  ┌──────────────┐    ┌──────────────────────────────┐   │
│  │  Subagent A  │    │       Task Watchdog          │   │
│  │  (running)   │    │                              │   │
│  └──────┬───────┘    │  Hooks:                      │   │
│         │            │  ├─ subagent_ended ──────────►│───┼──► notify parent
│  ┌──────▼───────┐    │  ├─ after_tool_call (exec) ─►│───┼──► notify session
│  │  Subagent B  │    │  ├─ heartbeat_prompt ───────►│───┼──► stale check
│  │  (failed!)   │    │  └─ gateway_start ──────────►│───┼──► timer patrol
│  └──────────────┘    │                              │   │
│                      │  Features:                   │   │
│  ┌──────────────┐    │  • Idempotency guard         │   │
│  │  exec cmd    │    │  • Safe truncation           │   │
│  │  (OOM kill!) │    │  • Circular-ref safe JSON    │   │
│  └──────────────┘    │  • Timer cleanup on stop     │   │
│                      └──────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Features

Hook What it does
subagent_ended Detects abnormal subagent outcomes (error, timeout, killed, reset, deleted) and notifies the parent session. Sends continuation reminders on normal completions.
after_tool_call (exec) Watches for abnormal exec exits — non-zero exit codes, OOM kills, signals, permission denied, command not found.
heartbeat_prompt_contribution When timer patrol is off, injects patrol instructions into heartbeat cycles to check for stale running tasks.
gateway_start Starts a timer-based patrol that periodically requests heartbeats to trigger stale-task checks.
message_received Records user message timestamps for silence detection. Resets consecutive tool call counter.
before_agent_reply Resets consecutive tool call counter and clears silence timer when agent replies.

Design Principles

  • Deadlock-safe: Uses in-process API calls instead of spawning CLI commands
  • Idempotent: Each notification uses an idempotencyKey to prevent duplicates
  • Zero-config: Works out of the box with sensible defaults
  • Memory-safe: Idempotency map capped at 10,000 entries with TTL-based eviction

Silence Detection

The plugin detects two types of agent silence:

  1. Consecutive tool calls without reply: If the agent calls more than consecutiveToolCallThreshold tools in a row without replying to the user, a nudge is injected (once per minute per session).
  2. User message timeout: If a user sends a message but doesn't receive a reply within silenceThresholdMs, a silence nudge is triggered during the next timer patrol cycle.

Installation

# Via OpenClaw plugin install
openclaw plugin install openclaw-task-watchdog

# Via npm
npm install openclaw-task-watchdog

Configuration

All settings are optional. Configure via openclaw.plugin.jsonconfig:

Field Type Default Description
subagentNotifyOn string[] ["error", "timeout", "killed"] Subagent outcomes that trigger notifications. Options: error, timeout, killed, reset, deleted
execNotifyOnAbnormal boolean true Enable notifications on abnormal exec exits
injectionTtlMs integer 300000 (5 min) TTL for next-turn injection messages (5000–600000 ms)
timerPatrol boolean true Enable timer-based patrol on gateway start
heartbeatPatrol boolean false Enable heartbeat-based patrol (only when timerPatrol is disabled)
timerPatrolIntervalMs integer 120000 (2 min) Timer patrol interval (30000–600000 ms)
staleThresholdMs integer 1800000 (30 min) How long before a task is considered stale (60000–7200000 ms)
consecutiveToolCallThreshold integer 5 Number of consecutive tool calls without a reply before triggering a nudge (2–20)
subagentConsecutiveThreshold integer 15 Consecutive tool call threshold for subagent sessions. Defaults to consecutiveToolCallThreshold * 3 if not set
silenceThresholdMs integer 180000 (3 min) How long after a user message without reply before triggering a silence nudge (60000–1800000 ms)

Example:

{
  "task-watchdog": {
    "subagentNotifyOn": ["error", "timeout", "killed", "reset"],
    "timerPatrolIntervalMs": 180000,
    "staleThresholdMs": 900000
  }
}

Development

npm install
npx tsc          # build
npx tsc --watch  # dev mode

Changelog

See CHANGELOG.md for version history.

License

MIT © zml


中文说明

解决什么问题?

OpenClaw 通过子 agent 或 exec 执行长任务时,失败可能被忽略:

痛点 表现
子 agent 崩溃或超时 父 session 不知道,继续等待
exec 命令被 OOM kill 无人发现,任务停滞
后台任务停滞 运行了 45 分钟没有进展
手动检查 用户反复问"做完了吗?"

Task Watchdog 监控任务生命周期,自动注入通知。

安装

openclaw plugin install openclaw-task-watchdog

开发

npm install && npx tsc

About

OpenClaw plugin: auto-notify on subagent failures, exec errors, and stale tasks

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors