The skill that gives any AI agent eyes, hands, and a keyboard on a real desktop.
Install it. Your agent uses it. Windows, macOS, Linux — any tool-calling model.
Clawd Cursor is a skill, not an application. It gives an AI agent the ability to use the user's computer — mouse, keyboard, screen, windows, browser — the same way a human would.
You install it once. Any tool-calling agent on the machine — Claude Code, Cursor, Windsurf, OpenClaw, the Claude Agent SDK, or a bring-your-own-model setup — picks it up through MCP or the skill registry once configured. The agent then knows how to click, type, read the screen, open apps, and drive GUIs whenever the task requires it.
User: "Open Outlook and reply to the latest email from Sarah."
Agent → window({"action":"open_app","name":"Outlook"})
→ accessibility({"action":"read_tree"})
→ accessibility({"action":"invoke","name":"Sarah's email"})
→ computer({"action":"key","combo":"mod+r"})
→ computer({"action":"type","text":"..."})
→ accessibility({"action":"invoke","name":"Send"})
→ done (verified by ground-truth verifier)
No app-specific integrations. No per-service API keys. No cloud round-trip — everything runs locally on 127.0.0.1. If it renders on screen, the agent can read it and act on it.
Design principles. Model-agnostic (Claude, GPT, Gemini, local models via Ollama). OS-agnostic (a single PlatformAdapter handles Windows, macOS, and Linux behind one interface). Skill-first (the AI is the primary consumer; the CLI exists for testing).
v0.8.8 — reliability + correctness release. The mod modifier in compact computer({"action":"key","combo":"mod+s"}) now resolves correctly across platforms (Cmd on macOS, Ctrl on Win/Linux) instead of silently dropping or throwing. Compact accessibility({"action":"set_value", ...}) now actually works — previously the delegate target wasn't registered. smart_click OCR now prefers matches inside the focused window so it can't silently click into a background app. invoke-element.ps1 adds a 2s timeout so React/Electron buttons that advertise InvokePattern but block on invoke can no longer hang the script. Plus a routine round of dependency hygiene (express v5, commander v14, dotenv v17, sharp 0.34) and lint cleanup.
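The cross-platform `mod` behavior described above can be illustrated with a small sketch. This is not the skill's implementation: `Platform`, `resolveCombo`, and the lowercase key names `cmd` and `ctrl` are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of portable "mod" resolution; not the skill's actual code.
type Platform = "windows" | "macos" | "linux";

function resolveCombo(combo: string, platform: Platform): string[] {
  // "mod" means Cmd on macOS and Ctrl on Windows/Linux.
  const primary = platform === "macos" ? "cmd" : "ctrl";
  const keys = combo
    .split("+")
    .map((key) => key.trim().toLowerCase())
    .filter((key) => key.length > 0);
  if (keys.length === 0) {
    // An empty combo is a caller error; report it instead of sending nothing.
    throw new Error('empty key combo: "' + combo + '"');
  }
  return keys.map((key) => (key === "mod" ? primary : key));
}

const saveOnMac = resolveCombo("mod+s", "macos");   // ["cmd", "s"]
const saveOnLinux = resolveCombo("mod+s", "linux"); // ["ctrl", "s"]
```

The sketch splits on `+`, so it does not handle a combo that contains a literal plus key.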
The substantive work landed earlier in the v0.8.x line:
- v0.8.7: security hardening. Direct tool calls now route through a shared safety gate; accessibility/window/clipboard reads consolidate onto `PlatformAdapter`; the version string is single-sourced from `package.json` with a CI guard against drift. Tooling: TypeScript 6.0, ESLint 10, Playwright 1.59.
- v0.8.6: polish release. Fixes a stale `McpServer` version string that had been advertising `v0.7.2` in MCP client metadata since the v0.7.x line; adds `SECURITY.md` and a private vulnerability reporting channel; trims the homepage; prunes stale repo artifacts.
- v0.8.5: `computer({"action":"key","combo":"..."})` now actually works (compact-tool keyboard remap was missing); 16 documentation accuracy fixes; cost-tier ladder added to SKILL.md.
- v0.8.4: security maintenance. Patches every fixable CVE in the dependency tree (vite, path-to-regexp, picomatch, hono, follow-redirects); README rewritten to frame clawdcursor as a skill rather than a standalone server.
- v0.8.3: idempotent `open_app` (no more N copies of Outlook stacking up under retry), agent runaway guard, `clawdcursor stop` sweeps every mode.
- v0.8.2: silent-401 auth bug fixed, force-focus on Windows through the foreground lock, Electron/WebView2 detection + CDP relaunch hint.
- v0.8.1 (rolled into 0.8.2): unified blind/hybrid/vision pipeline (one loop, three strategy modes), compact MCP surface (6 tools, ~12× smaller catalog), Linux AT-SPI bridge, Wayland input routing.
Full per-release detail in CHANGELOG.md.
- Windows: `powershell -c "irm https://clawdcursor.com/install.ps1 | iex"`
- macOS: `curl -fsSL https://clawdcursor.com/install.sh | bash`, then `clawdcursor grant` (Accessibility + Screen Recording)
- Linux: `curl -fsSL https://clawdcursor.com/install.sh | bash`

The installer clones the skill into `~/clawdcursor`, runs `npm install`, builds, and registers a global `clawdcursor` shim via `npm link`. Runtime state (auth token, pidfiles, logs) lives at `~/.clawdcursor/`. To wire the skill into an agent host, follow Connect Your Agent below; the installer does not edit any host config files automatically.
Linux notes: install `tesseract-ocr` for OCR, `python3-gi` + `gir1.2-atspi-2.0` for accessibility (the AT-SPI typelib `python3-gi` consumes), and `ydotool` (or `wtype`) for Wayland input.
The skill is transport-agnostic. Every transport below exposes the same tool catalog to the agent.
Add the MCP entry to ~/.claude/settings.json (the installer leaves agent host config untouched, so this step is required):
`openclaw skill install clawdcursor`

The skill metadata in SKILL.md tells OpenClaw how to install, bootstrap, and discover the tool catalog. No further configuration needed.
Any MCP-aware editor. Add a stdio MCP entry pointing to clawdcursor mcp --compact. Refer to the host's MCP configuration docs.
The skill also exposes a local REST surface for agents that do not speak MCP. Start the skill server once, then discover tools at GET http://127.0.0.1:3847/tools?mode=compact and call them at POST /execute/:name. Bearer-token auth; token written to ~/.clawdcursor/token. See API below.
The skill exposes two catalogs side by side. Agents pick the one that fits.
Anthropic computer_20250124-style: one tool per capability, with an action enum for the verb. Small prompt footprint (~1,500 tokens), easy for a model to learn zero-shot, the default for most agents.
The most-used actions for each compound tool are listed below. The full enum is at GET /tools?mode=compact or via MCP list_tools.
| Tool | Most-used actions |
|---|---|
| `computer` | screenshot, click, double_click, right_click, triple_click, hover, scroll, scroll_horizontal, drag, drag_path, type, key, wait |
| `accessibility` | read_tree, find, get_element, focused, invoke, focus, set_value, get_value, expand, collapse, toggle, select, state, list_children, wait_for |
| `window` | list, active, focus, maximize, minimize, restore, close, resize, list_displays, screen_size, open_app, open_file, open_url, switch_tab, navigate |
| `system` | clipboard_read, clipboard_write, system_time, ocr, undo, shortcuts_list, shortcuts_run, delegate, detect_webview, relaunch_with_cdp |
| `browser` | connect, page_context, read_text, click, type, select_option, evaluate, wait_for, list_tabs, switch_tab, scroll |
| `task` | (no action enum; takes `{instruction: string}` and routes through the full pipeline) |
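To make the compact shape concrete (one tool per capability, with an `action` field naming the verb), the sketch below types only the calls that appear in this README's Outlook example. The type names and `formatCall` are hypothetical; the real schema has more actions and may have different or additional parameters.

```typescript
// Illustrative types covering only the calls shown in this README.
// The real action enums and argument schemas are larger; see GET /tools?mode=compact.
type ComputerArgs =
  | { action: "key"; combo: string }
  | { action: "type"; text: string };

type AccessibilityArgs =
  | { action: "read_tree" }
  | { action: "invoke"; name: string };

type WindowArgs = { action: "open_app"; name: string };

type CompactCall =
  | { tool: "computer"; args: ComputerArgs }
  | { tool: "accessibility"; args: AccessibilityArgs }
  | { tool: "window"; args: WindowArgs };

// Render a call in the tool({...}) notation used in the example trace.
function formatCall(call: CompactCall): string {
  return call.tool + "(" + JSON.stringify(call.args) + ")";
}

const plan: CompactCall[] = [
  { tool: "window", args: { action: "open_app", name: "Outlook" } },
  { tool: "accessibility", args: { action: "read_tree" } },
  { tool: "accessibility", args: { action: "invoke", name: "Sarah's email" } },
  { tool: "computer", args: { action: "key", combo: "mod+r" } },
  { tool: "computer", args: { action: "type", text: "..." } },
  { tool: "accessibility", args: { action: "invoke", name: "Send" } },
];

const rendered = plan.map(formatCall);
```

Modeling each tool's arguments as a discriminated union on `action` lets the compiler reject a `key` call that is missing its `combo`, which is one reason a small action enum is easy for a model and a type checker alike.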
Full catalog for agents that prefer one tool per verb. Sample of categories below; the full list is at GET /tools or list_tools over MCP.
| Category | Examples |
|---|---|
| Perception | read_screen, desktop_screenshot, desktop_screenshot_region, ocr_read_screen, smart_read |
| Mouse | mouse_click, mouse_double_click, mouse_drag, mouse_drag_stepped, mouse_scroll |
| Keyboard | key_press, type_text, smart_type, shortcuts_list, shortcuts_execute |
| Window / App | focus_window, open_app, get_windows, get_active_window, detect_webview_apps |
| Browser (CDP) | cdp_connect, cdp_click, cdp_type, cdp_read_text, cdp_evaluate |
| Accessibility | find_element, invoke_element, wait_for_element, get_focused_element, a11y_expand, a11y_toggle |
| System | read_clipboard, write_clipboard, get_system_time, undo_last, delegate_to_agent |
| Orchestration | smart_click, navigate_browser, wait |
Full catalog visible to the agent through MCP list_tools or at GET /tools.
Every tool call — whether it arrives over MCP, REST, or the built-in agent — passes through the same decision layer.
┌────────────────────────────────────────────┐
agent ─▶ │ Router (regex shortcuts · zero LLM) │ ──▶ tool
└───────────────────┬────────────────────────┘
│ (no shortcut match)
▼
┌────────────────────────────────────────────┐
│ Blind (accessibility tree only) │ ──▶ tool
└───────────────────┬────────────────────────┘
│ (a11y sparse, stagnation)
▼
┌────────────────────────────────────────────┐
│ Hybrid (a11y + screenshot-on-demand) │ ──▶ tool
└───────────────────┬────────────────────────┘
│ (still stuck)
▼
┌────────────────────────────────────────────┐
│ Vision (screenshot every turn) │ ──▶ tool
└────────────────────────────────────────────┘
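The escalation order in the diagram can be sketched as a loop that tries each stage in turn and moves on only when a stage reports it cannot proceed. Everything here is illustrative: `runPipeline`, `StageHandler`, and the toy handlers are hypothetical, and the real stages decide escalation with far richer signals than a single function call.

```typescript
// Hypothetical sketch of Router -> Blind -> Hybrid -> Vision escalation.
type Stage = "router" | "blind" | "hybrid" | "vision";
const STAGE_ORDER: Stage[] = ["router", "blind", "hybrid", "vision"];

type StageResult =
  | { kind: "tool"; tool: string }        // stage produced a tool call
  | { kind: "escalate"; reason: string }; // stage gives up; try the next one

type StageHandler = (instruction: string) => StageResult;

interface PipelineOutcome {
  stage: Stage | null;   // stage that produced the call, or null if all escalated
  tool: string | null;
  escalations: string[]; // why each earlier stage was skipped
}

function runPipeline(
  instruction: string,
  handlers: Record<Stage, StageHandler>
): PipelineOutcome {
  const escalations: string[] = [];
  for (const stage of STAGE_ORDER) {
    const result = handlers[stage](instruction);
    if (result.kind === "tool") {
      return { stage: stage, tool: result.tool, escalations: escalations };
    }
    escalations.push(stage + ": " + result.reason);
  }
  return { stage: null, tool: null, escalations: escalations };
}

// Toy handlers: the router matches one regex shortcut, blind always escalates.
const toyHandlers: Record<Stage, StageHandler> = {
  router: (text) =>
    /^open\s+\w+/i.test(text)
      ? { kind: "tool", tool: "window" }
      : { kind: "escalate", reason: "no shortcut match" },
  blind: () => ({ kind: "escalate", reason: "a11y sparse" }),
  hybrid: () => ({ kind: "tool", tool: "computer" }),
  vision: () => ({ kind: "tool", tool: "computer" }),
};

const shortcutHit = runPipeline("open Outlook", toyHandlers);
const escalated = runPipeline("reply to Sarah", toyHandlers);
```

Ordering the stages from cheapest to most expensive means a screenshot is only taken once the accessibility tree has proven insufficient.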
Every tool call routes through a single safety.evaluate() chokepoint. The agent cannot bypass this path — it is the only way tools execute.
Ground-truth verification. When a task is claimed complete, six independent signals are checked against the post-task screen: pixel diff, window-state change, focus change, OCR delta, task-type assertions (send_email, navigate_url, open_app, type_text, …), and anti-pattern detection (error dialogs, auth failures, "cannot send", "draft saved"). Weighted voting with hard-fail rules. The agent cannot self-report its way past the verifier.
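A minimal sketch of weighted voting with a hard-fail rule follows. The weights, the 0.6 threshold, and the choice to treat anti-pattern detection as the only hard fail are assumptions made for illustration; the README does not state the verifier's actual values or rules.

```typescript
// Hypothetical sketch: weights, threshold, and hard-fail choice are assumptions.
interface Signals {
  pixelDiff: boolean;
  windowStateChange: boolean;
  focusChange: boolean;
  ocrDelta: boolean;
  taskAssertion: boolean;
  antiPatternDetected: boolean; // e.g. an error dialog or "draft saved"
}

type PositiveSignal =
  | "pixelDiff"
  | "windowStateChange"
  | "focusChange"
  | "ocrDelta"
  | "taskAssertion";

const WEIGHTS: Record<PositiveSignal, number> = {
  pixelDiff: 1,
  windowStateChange: 1,
  focusChange: 1,
  ocrDelta: 2,
  taskAssertion: 3,
};
const PASS_THRESHOLD = 0.6;

interface Verdict {
  passed: boolean;
  score: number;
  reason: string;
}

function verify(signals: Signals): Verdict {
  // Hard-fail rule: an anti-pattern overrides every positive signal.
  if (signals.antiPatternDetected) {
    return { passed: false, score: 0, reason: "hard fail: anti-pattern detected" };
  }
  const names = Object.keys(WEIGHTS) as PositiveSignal[];
  let total = 0;
  let earned = 0;
  for (const name of names) {
    total += WEIGHTS[name];
    if (signals[name]) earned += WEIGHTS[name];
  }
  const score = earned / total;
  return { passed: score >= PASS_THRESHOLD, score: score, reason: "weighted vote" };
}
```

Because the verdict is computed from observations of the screen rather than from the agent's own report, a claim of completion carries no weight in the vote.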
Runaway guard. If the agent calls the same tool with identical arguments three or more times in a six-turn window, the loop exits with a targeted diagnostic — typically pointing at detect_webview when the target app is Electron/WebView2 with a sparse accessibility tree.
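The rule above (same tool, identical arguments, three or more times within six turns) can be sketched with a sliding window. `RunawayGuard` and its method names are hypothetical, and comparing arguments through `JSON.stringify` is an assumption: it treats objects with the same keys in a different order as different calls.

```typescript
// Hypothetical sketch of a sliding-window repeat detector.
class RunawayGuard {
  private readonly windowSize: number;
  private readonly maxRepeats: number;
  private recent: string[] = [];

  constructor(windowSize: number = 6, maxRepeats: number = 3) {
    this.windowSize = windowSize;
    this.maxRepeats = maxRepeats;
  }

  // Record one tool call; returns true when the loop should exit.
  record(tool: string, args: unknown): boolean {
    const key = tool + ":" + JSON.stringify(args);
    this.recent.push(key);
    if (this.recent.length > this.windowSize) {
      this.recent.shift(); // keep only the last `windowSize` turns
    }
    const repeats = this.recent.filter((k) => k === key).length;
    return repeats >= this.maxRepeats;
  }
}

const guard = new RunawayGuard();
const first = guard.record("accessibility", { action: "invoke", name: "Send" });
const second = guard.record("accessibility", { action: "invoke", name: "Send" });
const third = guard.record("accessibility", { action: "invoke", name: "Send" });
```

Here `third` is `true` while `first` and `second` are `false`, since the third identical call is the one that reaches the repeat limit.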
Tools are classified into three tiers, enforced at the single safety.evaluate() chokepoint:
| Tier | Actions | Behavior |
|---|---|---|
| Auto | Reading, navigation, opening apps | Executes immediately |
| Preview | Typing, form fill, arbitrary input | Logged before executing |
| Confirm | Sending messages, deleting, purchases | Pauses for user approval |
Hardening: server binds to 127.0.0.1 only, bearer-token auth on every request, dangerous key combinations (Cmd+Q, Alt+F4, Ctrl+Alt+Del) blocked by default, first-run consent prompt required. Sensitive categories (email, banking, password managers) require explicit user approval per action.
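The tier table and the blocked key combinations can be combined into one gate, sketched below. This is an illustration under assumptions, not the real `safety.evaluate()`: the `ActionKind` categories are taken loosely from the table, and `gate`, `Decision`, and the lowercase combo spelling are hypothetical.

```typescript
// Hypothetical sketch of a three-tier safety gate; not the real safety.evaluate().
type ActionKind =
  | "read" | "navigate" | "open_app"        // Auto
  | "type" | "form_fill"                    // Preview
  | "send_message" | "delete" | "purchase"; // Confirm

type Tier = "auto" | "preview" | "confirm";

const TIER_BY_KIND: Record<ActionKind, Tier> = {
  read: "auto",
  navigate: "auto",
  open_app: "auto",
  type: "preview",
  form_fill: "preview",
  send_message: "confirm",
  delete: "confirm",
  purchase: "confirm",
};

// Dangerous combos named in the hardening notes, normalized to lowercase.
const BLOCKED_COMBOS: string[] = ["cmd+q", "alt+f4", "ctrl+alt+del"];

type Decision =
  | { kind: "execute"; logFirst: boolean }
  | { kind: "await_approval" }
  | { kind: "block"; reason: string };

function gate(action: ActionKind, combo?: string): Decision {
  if (combo !== undefined && BLOCKED_COMBOS.indexOf(combo.toLowerCase()) !== -1) {
    return { kind: "block", reason: "dangerous key combination: " + combo };
  }
  const tier = TIER_BY_KIND[action];
  if (tier === "confirm") {
    return { kind: "await_approval" }; // pause until the user approves
  }
  // Auto runs immediately; Preview is logged before it runs.
  return { kind: "execute", logFirst: tier === "preview" };
}
```

Keeping the decision in a single function is what makes a chokepoint enforceable: every transport calls the same gate, so there is no second path to audit.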
For agents that do not speak MCP. Base URL: http://127.0.0.1:3847 (localhost-only, bearer-token auth, token at ~/.clawdcursor/token).
| Endpoint | Method | Purpose |
|---|---|---|
| `/tools` | GET | Full catalog in OpenAI function-calling format. `?mode=compact` for the 6-tool surface. |
| `/execute/:name` | POST | Execute a tool by name. Returns structured JSON. |
| `/status` | GET | Current skill state. |
| `/screenshot` | GET | Current screen as PNG. |
| `/confirm` | POST | Approve or reject a safety-gated action. |
| `/abort` | POST | Stop the in-flight task. |
| `/health` | GET | Version, uptime, and health check. |
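For a bring-your-own agent, the two core requests (tool discovery and tool execution) might be built as shown below. The endpoints, port, and bearer-token auth come from this README; the assumption that `POST /execute/:name` takes the tool arguments as a JSON body, and the helper names themselves, are not confirmed by it.

```typescript
// Hypothetical request builders for the local REST surface.
// Assumption: /execute/:name accepts the tool arguments as a JSON body.
const BASE_URL = "http://127.0.0.1:3847";

interface RequestSpec {
  url: string;
  init: {
    method: "GET" | "POST";
    headers: { [name: string]: string };
    body?: string;
  };
}

function authHeaders(token: string): { [name: string]: string } {
  // The token is read from ~/.clawdcursor/token; trim a trailing newline.
  return { Authorization: "Bearer " + token.trim() };
}

function listToolsRequest(token: string, compact: boolean): RequestSpec {
  const query = compact ? "?mode=compact" : "";
  return {
    url: BASE_URL + "/tools" + query,
    init: { method: "GET", headers: authHeaders(token) },
  };
}

function executeRequest(token: string, name: string, args: object): RequestSpec {
  const headers = authHeaders(token);
  headers["Content-Type"] = "application/json";
  return {
    url: BASE_URL + "/execute/" + encodeURIComponent(name),
    init: { method: "POST", headers: headers, body: JSON.stringify(args) },
  };
}

const saveRequest = executeRequest("example-token\n", "computer", {
  action: "key",
  combo: "mod+s",
});
// To send it from Node.js 20+: await fetch(saveRequest.url, saveRequest.init)
```

Building the request separately from sending it keeps the sketch testable without a running server.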
Platform-specific code lives in src/v2/platform/{windows,macos,linux}.ts behind a single PlatformAdapter interface. Business logic never reads process.platform.
| Platform | UI Automation | OCR | Browser |
|---|---|---|---|
| Windows x64 / ARM64 | UI Automation via PowerShell bridge | `Windows.Media.Ocr` | Chrome / Edge (CDP) |
| macOS Intel / Apple Silicon | JXA + System Events (TCC-safe) | Apple Vision | Chrome / Edge (CDP) |
| Linux X11 | AT-SPI + nut-js | Tesseract | Chrome / Edge (CDP) |
| Linux Wayland | AT-SPI + `ydotool` / `wtype` | Tesseract | Chrome / Edge (CDP) |
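The adapter pattern described here can be sketched as one interface, one implementation per OS, and a single selection point. The interface below is deliberately tiny and hypothetical (the real `PlatformAdapter` surface is not shown in this README); only the OCR engine names are taken from the table.

```typescript
// Hypothetical, minimal PlatformAdapter; the real interface is much larger.
type PlatformName = "windows" | "macos" | "linux";

interface PlatformAdapter {
  readonly name: PlatformName;
  ocrEngine(): string;
}

const ADAPTERS: Record<PlatformName, PlatformAdapter> = {
  windows: { name: "windows", ocrEngine: () => "Windows.Media.Ocr" },
  macos: { name: "macos", ocrEngine: () => "Apple Vision" },
  linux: { name: "linux", ocrEngine: () => "Tesseract" },
};

// The only place that looks at Node's platform string
// ("win32", "darwin", "linux" are the values Node reports).
function selectAdapter(nodePlatform: string): PlatformAdapter {
  switch (nodePlatform) {
    case "win32":
      return ADAPTERS.windows;
    case "darwin":
      return ADAPTERS.macos;
    case "linux":
      return ADAPTERS.linux;
    default:
      throw new Error("unsupported platform: " + nodePlatform);
  }
}

// Business logic depends on the interface only, never on the platform string.
function describeOcr(adapter: PlatformAdapter): string {
  return "OCR on " + adapter.name + " uses " + adapter.ocrEngine();
}

const macDescription = describeOcr(selectAdapter("darwin"));
```

At startup a host would call `selectAdapter(process.platform)` once and pass the result down, so code like `describeOcr` stays identical on every OS.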
- Node.js 20 or newer
- macOS: Xcode CLI tools (`xcode-select --install`), then `clawdcursor grant` for Accessibility + Screen Recording
- Linux: `tesseract-ocr`, `python3-gi` + `gir1.2-atspi-2.0` (AT-SPI typelib), `ydotool` or `wtype` (Wayland)
- AI provider key: configured on the agent side; the skill itself is model-agnostic
The CLI below is intended for humans diagnosing an install. Agents should not invoke it; they should use MCP or the REST surface.
clawdcursor doctor Diagnose install, permissions, and platform bridges
clawdcursor grant Grant macOS permissions (interactive)
clawdcursor consent Manage desktop-control consent (--accept / --revoke / --status)
clawdcursor status Check readiness (consent, permissions, AI config)
clawdcursor mcp MCP stdio server (the primary skill transport)
clawdcursor serve REST-only tool server (bring-your-own-agent)
clawdcursor stop Stop every running mode (mcp, serve, start)
# The web dashboard is reachable at http://127.0.0.1:3847 while
# `clawdcursor serve` (or `start`) is running — no separate command.
# The two commands below exist for manual end-to-end testing only.
# Real agents should not use these — they should call the skill through MCP.
clawdcursor start Run the built-in autonomous agent (testing)
clawdcursor task <t> Send a task to that agent (testing)
Options:
--port <port> Default: 3847 (start, serve, stop, task)
--compact MCP only: expose 6 compound tools instead of 75 granular.
For REST/serve, use the `?mode=compact` query parameter
on `GET /tools` instead.
--provider <name> `start` only: anthropic | openai | gemini | ollama | ...
--accept `start` and `consent` only: skip the consent prompt.
For `serve`, use `--skip-consent` (dev environments).
TypeScript · Node.js 20+ · nut-js · Playwright · sharp · Express · Model Context Protocol SDK · Zod · commander
MIT — see LICENSE.