Clawd Cursor

The skill that gives any AI agent eyes, hands, and a keyboard on a real desktop.
Install it. Your agent uses it. Windows, macOS, Linux — any tool-calling model.

What This Is

Clawd Cursor is a skill, not an application. It gives an AI agent the ability to use the user's computer — mouse, keyboard, screen, windows, browser — the same way a human would.

You install it once. Any tool-calling agent on the machine — Claude Code, Cursor, Windsurf, OpenClaw, the Claude Agent SDK, or a bring-your-own-model setup — picks it up through MCP or the skill registry once configured. The agent then knows how to click, type, read the screen, open apps, and drive GUIs whenever the task requires it.

User: "Open Outlook and reply to the latest email from Sarah."

Agent  →  window({"action":"open_app","name":"Outlook"})
       →  accessibility({"action":"read_tree"})
       →  accessibility({"action":"invoke","name":"Sarah's email"})
       →  computer({"action":"key","combo":"mod+r"})
       →  computer({"action":"type","text":"..."})
       →  accessibility({"action":"invoke","name":"Send"})
       →  done (verified by the ground-truth verifier)

No app-specific integrations. No per-service API keys. No cloud round-trip — everything runs locally on 127.0.0.1. If it renders on screen, the agent can read it and act on it.

Design principles. Model-agnostic (Claude, GPT, Gemini, local models via Ollama). OS-agnostic (a single PlatformAdapter handles Windows, macOS, and Linux behind one interface). Skill-first (the AI is the primary consumer; the CLI exists for testing).

Latest Release

v0.8.8 — reliability + correctness release. The mod modifier in compact computer({"action":"key","combo":"mod+s"}) now resolves correctly across platforms (Cmd on macOS, Ctrl on Win/Linux) instead of silently dropping or throwing. Compact accessibility({"action":"set_value", ...}) now actually works — previously the delegate target wasn't registered. smart_click OCR now prefers matches inside the focused window so it can't silently click into a background app. invoke-element.ps1 adds a 2s timeout so React/Electron buttons that advertise InvokePattern but block on invoke can no longer hang the script. Plus a routine round of dependency hygiene (express v5, commander v14, dotenv v17, sharp 0.34) and lint cleanup.

The substantive work landed earlier in the v0.8.x line:

v0.8.7 — security hardening. Direct tool calls now route through a shared safety gate; accessibility/window/clipboard reads consolidate onto PlatformAdapter; the version string is single-sourced from package.json with a CI guard against drift. Tooling: TypeScript 6.0, ESLint 10, Playwright 1.59.
v0.8.6 — polish release. Fixes a stale McpServer version string that had been advertising v0.7.2 in MCP client metadata since the v0.7.x line; adds SECURITY.md and a private vulnerability reporting channel; trims the homepage; prunes stale repo artifacts.
v0.8.5 — computer({"action":"key","combo":"..."}) now actually works (compact-tool keyboard remap was missing); 16 documentation accuracy fixes; cost-tier ladder added to SKILL.md.
v0.8.4 — security maintenance: patches every fixable CVE in the dependency tree (vite, path-to-regexp, picomatch, hono, follow-redirects); README rewritten to frame clawdcursor as a skill rather than a standalone server.
v0.8.3 — idempotent open_app (no more N copies of Outlook stacking up under retry), agent runaway guard, clawdcursor stop sweeps every mode.
v0.8.2 — silent-401 auth bug fixed, force-focus on Windows through the foreground lock, Electron/WebView2 detection + CDP relaunch hint.
v0.8.1 (rolled into 0.8.2) — unified blind/hybrid/vision pipeline (one loop, three strategy modes), compact MCP surface (6 tools, ~12× smaller catalog), Linux AT-SPI bridge, Wayland input routing.

Full per-release detail in CHANGELOG.md.

Install the Skill

Windows

powershell -c "irm https://clawdcursor.com/install.ps1 | iex"

macOS

curl -fsSL https://clawdcursor.com/install.sh | bash
clawdcursor grant     # Accessibility + Screen Recording

Linux

curl -fsSL https://clawdcursor.com/install.sh | bash

The installer clones the skill into ~/clawdcursor, runs npm install, builds, and registers a global clawdcursor shim via npm link. Runtime state (auth token, pidfiles, logs) lives at ~/.clawdcursor/. To wire the skill into an agent host, follow Connect Your Agent below — the installer does not edit any host config files automatically.

Linux notes: install tesseract-ocr for OCR, python3-gi + gir1.2-atspi-2.0 for accessibility (the AT-SPI typelib python3-gi consumes), and ydotool (or wtype) for Wayland input.

Connect Your Agent

The skill is transport-agnostic. Every connection method below exposes the same tool catalog to the agent.

Claude Code

Add the MCP entry to ~/.claude/settings.json (the installer leaves agent host config untouched, so this step is required):

{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp", "--compact"]
    }
  }
}

OpenClaw

openclaw skill install clawdcursor

The skill metadata in SKILL.md tells OpenClaw how to install, bootstrap, and discover the tool catalog. No further configuration needed.

Cursor, Windsurf, Zed

Any MCP-aware editor. Add a stdio MCP entry pointing to clawdcursor mcp --compact. Refer to the host's MCP configuration docs.

Claude Agent SDK / bring-your-own-model

The skill also exposes a local REST surface for agents that do not speak MCP. Start the skill server once with clawdcursor serve, then discover tools at GET http://127.0.0.1:3847/tools?mode=compact and call them at POST /execute/:name. Requests use bearer-token auth; the token is written to ~/.clawdcursor/token. See API below.

Tool Surface

The skill exposes two catalogs side by side. Agents pick the one that fits.

Compact — 6 compound tools (recommended)

Modeled on Anthropic's computer_20250124 tool: one tool per capability, with an action enum for the verb. The catalog has a small prompt footprint (~1,500 tokens) and is easy for a model to learn zero-shot, which makes it the recommended choice for most agents.

The most-used actions for each compound tool are listed below. The full enum is available at GET /tools?mode=compact or via MCP list_tools.

Tool	Most-used actions
`computer`	`screenshot`, `click`, `double_click`, `right_click`, `triple_click`, `hover`, `scroll`, `scroll_horizontal`, `drag`, `drag_path`, `type`, `key`, `wait`
`accessibility`	`read_tree`, `find`, `get_element`, `focused`, `invoke`, `focus`, `set_value`, `get_value`, `expand`, `collapse`, `toggle`, `select`, `state`, `list_children`, `wait_for`
`window`	`list`, `active`, `focus`, `maximize`, `minimize`, `restore`, `close`, `resize`, `list_displays`, `screen_size`, `open_app`, `open_file`, `open_url`, `switch_tab`, `navigate`
`system`	`clipboard_read`, `clipboard_write`, `system_time`, `ocr`, `undo`, `shortcuts_list`, `shortcuts_run`, `delegate`, `detect_webview`, `relaunch_with_cdp`
`browser`	`connect`, `page_context`, `read_text`, `click`, `type`, `select_option`, `evaluate`, `wait_for`, `list_tabs`, `switch_tab`, `scroll`
`task`	(no `action` enum — takes `{instruction: string}` and routes through the full pipeline)
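As a rough illustration of the "one tool, action enum" shape, the sketch below types a `computer` call and expands the portable `mod` modifier the way the release notes describe (Cmd on macOS, Ctrl on Windows/Linux). The type names, the helper functions, and the lowercase `cmd`/`ctrl` spellings are assumptions made for this example and are not taken from the skill's source. The action list is only the most-used subset from the table, so the guard would reject valid actions that the table omits.

```typescript
// Hypothetical types: only the action names and the `combo` field come from
// the README; everything else here is illustrative.
type Platform = "darwin" | "win32" | "linux";

const COMPUTER_ACTIONS = [
  "screenshot", "click", "double_click", "right_click", "triple_click",
  "hover", "scroll", "scroll_horizontal", "drag", "drag_path",
  "type", "key", "wait",
] as const;
type ComputerAction = (typeof COMPUTER_ACTIONS)[number];

// Narrow an arbitrary string to one of the listed `computer` actions.
function isComputerAction(value: string): value is ComputerAction {
  return (COMPUTER_ACTIONS as readonly string[]).indexOf(value) !== -1;
}

// Expand the portable `mod` modifier: Cmd on macOS, Ctrl elsewhere.
function resolveCombo(combo: string, platform: Platform): string {
  const mod = platform === "darwin" ? "cmd" : "ctrl";
  return combo
    .split("+")
    .map((part) => (part.trim().toLowerCase() === "mod" ? mod : part.trim()))
    .join("+");
}
```

Under these assumptions, `resolveCombo("mod+s", "darwin")` yields `cmd+s`, while the same combo on `linux` or `win32` yields `ctrl+s`.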

Granular — 75 individual tools

Full catalog for agents that prefer one tool per verb. Sample of categories below; the full list is at GET /tools or list_tools over MCP.

Category	Examples
Perception	`read_screen`, `desktop_screenshot`, `desktop_screenshot_region`, `ocr_read_screen`, `smart_read`
Mouse	`mouse_click`, `mouse_double_click`, `mouse_drag`, `mouse_drag_stepped`, `mouse_scroll`
Keyboard	`key_press`, `type_text`, `smart_type`, `shortcuts_list`, `shortcuts_execute`
Window / App	`focus_window`, `open_app`, `get_windows`, `get_active_window`, `detect_webview_apps`
Browser (CDP)	`cdp_connect`, `cdp_click`, `cdp_type`, `cdp_read_text`, `cdp_evaluate`
Accessibility	`find_element`, `invoke_element`, `wait_for_element`, `get_focused_element`, `a11y_expand`, `a11y_toggle`
System	`read_clipboard`, `write_clipboard`, `get_system_time`, `undo_last`, `delegate_to_agent`
Orchestration	`smart_click`, `navigate_browser`, `wait`

How the Skill Thinks

Every tool call — whether it arrives over MCP, REST, or the built-in agent — passes through the same decision layer.

         ┌────────────────────────────────────────────┐
agent ─▶ │  Router    (regex shortcuts · zero LLM)    │ ──▶ tool
         └───────────────────┬────────────────────────┘
                             │  (no shortcut match)
                             ▼
         ┌────────────────────────────────────────────┐
         │  Blind     (accessibility tree only)       │ ──▶ tool
         └───────────────────┬────────────────────────┘
                             │  (a11y sparse, stagnation)
                             ▼
         ┌────────────────────────────────────────────┐
         │  Hybrid    (a11y + screenshot-on-demand)   │ ──▶ tool
         └───────────────────┬────────────────────────┘
                             │  (still stuck)
                             ▼
         ┌────────────────────────────────────────────┐
         │  Vision    (screenshot every turn)         │ ──▶ tool
         └────────────────────────────────────────────┘
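
The escalation order in the diagram can be sketched as a simple ladder. This is a minimal illustration, not the skill's implementation: the strategy names mirror the diagram, but the real escalation triggers (no shortcut match, sparse accessibility tree, stagnation) are collapsed here into a single hypothetical `attempt` callback that reports success or failure.

```typescript
// Strategy names follow the diagram; the trigger logic is an assumption.
const LADDER = ["router", "blind", "hybrid", "vision"] as const;
type Strategy = (typeof LADDER)[number];

// Return the next, more expensive strategy, or null after vision.
function escalate(current: Strategy): Strategy | null {
  const next = LADDER.indexOf(current) + 1;
  return next < LADDER.length ? LADDER[next] : null;
}

// Walk the ladder from cheapest to most expensive until one attempt succeeds.
function runLadder(attempt: (s: Strategy) => boolean): Strategy | null {
  let s: Strategy | null = LADDER[0];
  while (s !== null) {
    if (attempt(s)) return s;
    s = escalate(s);
  }
  return null;
}
```

Starting at the cheapest rung means screenshots are only taken when the cheaper strategies have already failed.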

Every tool call routes through a single safety.evaluate() chokepoint. The agent cannot bypass this path — it is the only way tools execute.

Ground-truth verification. When a task is claimed complete, six independent signals are checked against the post-task screen: pixel diff, window-state change, focus change, OCR delta, task-type assertions (send_email, navigate_url, open_app, type_text, …), and anti-pattern detection (error dialogs, auth failures, "cannot send", "draft saved"). The signals are combined by weighted voting with hard-fail rules, so the agent cannot self-report its way past the verifier.
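
To make "weighted voting with hard-fail rules" concrete, here is a minimal sketch. The six signal names follow the list above, but the weights, the 0.6 threshold, and the `hardFail` flag are all hypothetical values chosen for illustration; the README does not state the real ones.

```typescript
type SignalName =
  | "pixel_diff" | "window_state" | "focus_change"
  | "ocr_delta" | "task_assertion" | "anti_pattern";

interface Signal {
  name: SignalName;
  passed: boolean;
  hardFail?: boolean; // e.g. an error dialog found by anti-pattern detection
}

// Hypothetical weights and threshold, for illustration only.
const WEIGHTS: Record<SignalName, number> = {
  pixel_diff: 1, window_state: 1, focus_change: 1,
  ocr_delta: 2, task_assertion: 3, anti_pattern: 2,
};
const THRESHOLD = 0.6;

function verifyTask(signals: Signal[]): boolean {
  // A hard fail vetoes the result regardless of the other votes.
  if (signals.some((s) => s.hardFail === true)) return false;
  const total = signals.reduce((sum, s) => sum + WEIGHTS[s.name], 0);
  const passed = signals.reduce(
    (sum, s) => sum + (s.passed ? WEIGHTS[s.name] : 0), 0);
  // With no signals there is no evidence, so the task is not verified.
  return total > 0 && passed / total >= THRESHOLD;
}
```

The hard-fail check runs first so that a strong negative signal cannot be outvoted by several weak positive ones.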

Runaway guard. If the agent calls the same tool with identical arguments three or more times in a six-turn window, the loop exits with a targeted diagnostic — typically pointing at detect_webview when the target app is Electron/WebView2 with a sparse accessibility tree.
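
The rule above (same tool, identical arguments, three or more times within six turns) can be sketched as a sliding-window check. The `ToolCall` shape and function names are assumptions for this example. It also assumes that arguments serialize with a stable key order, since `JSON.stringify` is order-sensitive, and it only counts repeats of the most recent call.

```typescript
interface ToolCall {
  tool: string;
  args: unknown;
}

const WINDOW_TURNS = 6; // turns inspected, per the README
const REPEAT_LIMIT = 3; // identical calls that trip the guard, per the README

// Canonical key for a call: tool name plus serialized arguments.
function callKey(call: ToolCall): string {
  return call.tool + ":" + JSON.stringify(call.args);
}

// True when the latest call appears REPEAT_LIMIT or more times
// within the last WINDOW_TURNS calls.
function isRunaway(history: ToolCall[]): boolean {
  if (history.length === 0) return false;
  const recent = history.slice(-WINDOW_TURNS);
  const latest = callKey(recent[recent.length - 1]);
  return recent.filter((c) => callKey(c) === latest).length >= REPEAT_LIMIT;
}
```

A caller would append each tool call to `history` and exit the loop with a diagnostic as soon as `isRunaway(history)` returns true.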

Safety

Tools are classified into three tiers, enforced at the single safety.evaluate() chokepoint:

Tier	Actions	Behavior
Auto	Reading, navigation, opening apps	Executes immediately
Preview	Typing, form fill, arbitrary input	Logged before executing
Confirm	Sending messages, deleting, purchases	Pauses for user approval
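
The tier table can be read as a lookup followed by a decision. The sketch below is illustrative only: the category labels are hypothetical names derived from the "Actions" column, and the function is a stand-in for the idea of a single gate, not the real safety.evaluate() signature.

```typescript
type Tier = "auto" | "preview" | "confirm";
type Decision = "execute" | "await_approval";

// Hypothetical category labels derived from the table's "Actions" column.
type Category =
  | "read" | "navigate" | "open_app"
  | "type" | "form_fill"
  | "send_message" | "delete" | "purchase";

const TIER_OF: Record<Category, Tier> = {
  read: "auto", navigate: "auto", open_app: "auto",
  type: "preview", form_fill: "preview",
  send_message: "confirm", delete: "confirm", purchase: "confirm",
};

// Single gate: every action is classified here before it can run.
function gate(category: Category, log: (msg: string) => void): Decision {
  const tier = TIER_OF[category];
  if (tier === "auto") return "execute";
  if (tier === "preview") {
    log("preview: " + category); // logged before executing
    return "execute";
  }
  return "await_approval"; // pauses until the user approves
}
```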

Hardening: server binds to 127.0.0.1 only, bearer-token auth on every request, dangerous key combinations (Cmd+Q, Alt+F4, Ctrl+Alt+Del) blocked by default, first-run consent prompt required. Sensitive categories (email, banking, password managers) require explicit user approval per action.

API

For agents that do not speak MCP. Base URL: http://127.0.0.1:3847 (localhost-only, bearer-token auth, token at ~/.clawdcursor/token).

Endpoint	Method	Purpose
`/tools`	GET	Full catalog in OpenAI function-calling format. `?mode=compact` for the 6-tool surface.
`/execute/:name`	POST	Execute a tool by name. Returns structured JSON.
`/status`	GET	Current skill state.
`/screenshot`	GET	Current screen as PNG.
`/confirm`	POST	Approve or reject a safety-gated action.
`/abort`	POST	Stop the in-flight task.
`/health`	GET	Version, uptime, and health check.
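
For a bring-your-own-model agent, the two core endpoints can be wrapped in small request builders like the ones below. This is a sketch under stated assumptions: the endpoint paths, port, and bearer-token scheme come from the table and text above, but the `HttpRequest` type, the function names, and in particular the choice to send tool arguments as the JSON request body are guesses that should be checked against the actual server.

```typescript
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

const BASE_URL = "http://127.0.0.1:3847";

// GET /tools, optionally restricted to the 6-tool compact surface.
function listToolsRequest(token: string, compact: boolean): HttpRequest {
  return {
    url: BASE_URL + "/tools" + (compact ? "?mode=compact" : ""),
    method: "GET",
    headers: { Authorization: "Bearer " + token },
  };
}

// POST /execute/:name.
// Assumption: the tool arguments are sent as the JSON request body.
function executeRequest(
  token: string,
  name: string,
  args: Record<string, unknown>,
): HttpRequest {
  return {
    url: BASE_URL + "/execute/" + encodeURIComponent(name),
    method: "POST",
    headers: {
      Authorization: "Bearer " + token,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(args),
  };
}
```

Each returned object can be handed to any HTTP client, for example `fetch(req.url, req)` in Node.js 20, with the token read from ~/.clawdcursor/token.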

Platform Support

Platform-specific code lives in src/v2/platform/{windows,macos,linux}.ts behind a single PlatformAdapter interface. Business logic never reads process.platform.

Platform	UI Automation	OCR	Browser
Windows x64 / ARM64	UI Automation via PowerShell bridge	`Windows.Media.Ocr`	Chrome / Edge (CDP)
macOS Intel / Apple Silicon	JXA + System Events (TCC-safe)	Apple Vision	Chrome / Edge (CDP)
Linux X11	AT-SPI + nut-js	Tesseract	Chrome / Edge (CDP)
Linux Wayland	AT-SPI + `ydotool` / `wtype`	Tesseract	Chrome / Edge (CDP)
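
The "one interface, three implementations" arrangement might look roughly like the sketch below. The interface here is deliberately reduced and hypothetical: the real PlatformAdapter surface is not shown in this README, and `ocrEngine`, `selectAdapter`, and `describeOcr` are invented for illustration. Only the OCR backend names come from the table above.

```typescript
// Hypothetical, reduced interface; the real PlatformAdapter is larger.
interface PlatformAdapter {
  readonly name: "windows" | "macos" | "linux";
  ocrEngine(): string;
}

const ADAPTERS: Record<string, PlatformAdapter> = {
  win32: { name: "windows", ocrEngine: () => "Windows.Media.Ocr" },
  darwin: { name: "macos", ocrEngine: () => "Apple Vision" },
  linux: { name: "linux", ocrEngine: () => "Tesseract" },
};

// The only function that inspects the platform string.
function selectAdapter(platform: string): PlatformAdapter {
  if (!Object.prototype.hasOwnProperty.call(ADAPTERS, platform)) {
    throw new Error("unsupported platform: " + platform);
  }
  return ADAPTERS[platform];
}

// Business logic depends on the interface, never on the platform string.
function describeOcr(adapter: PlatformAdapter): string {
  return "OCR via " + adapter.ocrEngine();
}
```

In a real program the adapter would be selected once at startup, for example with `selectAdapter(process.platform)`, and then passed to everything else.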

Prerequisites

Node.js 20 or newer
macOS — Xcode CLI tools (xcode-select --install), then clawdcursor grant for Accessibility + Screen Recording
Linux — tesseract-ocr, python3-gi + gir1.2-atspi-2.0 (AT-SPI typelib), ydotool or wtype (Wayland)
AI provider key — configured on the agent side; the skill itself is model-agnostic

Testing and Troubleshooting

The CLI below is intended for humans diagnosing an install. Agents should not invoke it; they should use MCP or the REST surface.

clawdcursor doctor       Diagnose install, permissions, and platform bridges
clawdcursor grant        Grant macOS permissions (interactive)
clawdcursor consent      Manage desktop-control consent (--accept / --revoke / --status)
clawdcursor status       Check readiness (consent, permissions, AI config)
clawdcursor mcp          MCP stdio server (the primary skill transport)
clawdcursor serve        REST-only tool server (bring-your-own-agent)
clawdcursor stop         Stop every running mode (mcp, serve, start)

# The web dashboard is reachable at http://127.0.0.1:3847 while
# `clawdcursor serve` (or `start`) is running — no separate command.

# The two commands below exist for manual end-to-end testing only.
# Real agents should not use these — they should call the skill through MCP.
clawdcursor start        Run the built-in autonomous agent (testing)
clawdcursor task <t>     Send a task to that agent (testing)

Options:
  --port <port>          Default: 3847 (start, serve, stop, task)
  --compact              MCP only: expose 6 compound tools instead of 75 granular.
                         For REST/serve, use the `?mode=compact` query parameter
                         on `GET /tools` instead.
  --provider <name>      `start` only: anthropic | openai | gemini | ollama | ...
  --accept               `start` and `consent` only: skip the consent prompt.
                         For `serve`, use `--skip-consent` (dev environments).

Tech Stack

TypeScript · Node.js 20+ · nut-js · Playwright · sharp · Express · Model Context Protocol SDK · Zod · commander

License

MIT — see LICENSE.

clawdcursor.com
