
Turn objectiveai-viewer into a GUI swarm orchestrator (interactive co-pilot for human + coding agent) #171

@WiggidyW


Vision

Today the viewer is a passive monitor. The vision is a two-headed orchestrator:

  • A coding agent (Claude) drives work through objectiveai-cli.
  • A human drives the same work through objectiveai-viewer.
  • Both surfaces see the same swarms / executions / inventions / laboratories in real time, with a synchronized model of state.
  • The human can spawn, message, intervene, and approve from the GUI; the agent sees those actions land in its CLI session and responds.
  • The viewer becomes "pretty for the human user" — visually rich, dense with detail, designed for sustained human attention rather than one-off debugging.

This issue is the umbrella RFC. Sub-issues will be filed per phase.

Where we are today

objectiveai-viewer (objectiveai-viewer/src-tauri/src/run.rs, objectiveai-viewer/src/App.tsx)

  • An axum server exposes four POST endpoints: /agent/completions, /functions/executions, /functions/inventions/recursive, /laboratories/executions.
  • Requests are gated by an HMAC-SHA256 signature carried in the X-VIEWER-SIGNATURE header.
  • Events are buffered into a tokio mpsc channel, then re-emitted as Tauri IPC events to React (this pipeline is sketched below). A single Tauri command exists: viewer_ready — a one-shot signal, not a control surface.
  • React renders read-only views from streamed chunks. State is in-memory only.
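
A minimal sketch of that one-way pipeline, assuming hypothetical handler and state names (the real code lives in objectiveai-viewer/src-tauri/src/run.rs and may differ in detail):

```rust
// Sketch of the existing one-way event path: HMAC gate -> mpsc buffer -> Tauri IPC.
// Handler/state names are illustrative.
use axum::{body::Bytes, extract::State, http::{HeaderMap, StatusCode}, routing::post, Router};
use hmac::{Hmac, Mac};
use sha2::Sha256;
use tokio::sync::mpsc;

#[derive(Clone)]
struct ViewerState {
    secret: Vec<u8>,
    // Drained by the Tauri side, which re-emits each payload as an IPC event to React.
    events: mpsc::Sender<Bytes>,
}

async fn agent_completions(
    State(state): State<ViewerState>,
    headers: HeaderMap,
    body: Bytes,
) -> StatusCode {
    // HMAC-SHA256 gate on X-VIEWER-SIGNATURE (hex encoding assumed here).
    let Some(sig) = headers.get("X-VIEWER-SIGNATURE").and_then(|v| v.to_str().ok()) else {
        return StatusCode::UNAUTHORIZED;
    };
    let Ok(sig) = hex::decode(sig) else {
        return StatusCode::UNAUTHORIZED;
    };
    let mut mac = Hmac::<Sha256>::new_from_slice(&state.secret).expect("HMAC takes any key length");
    mac.update(&body);
    if mac.verify_slice(&sig).is_err() {
        return StatusCode::UNAUTHORIZED;
    }
    // Buffer the event; today nothing ever flows back the other way.
    match state.events.send(body).await {
        Ok(()) => StatusCode::OK,
        Err(_) => StatusCode::SERVICE_UNAVAILABLE,
    }
}

fn router(state: ViewerState) -> Router {
    Router::new()
        .route("/agent/completions", post(agent_completions))
        // ... /functions/executions, /functions/inventions/recursive, /laboratories/executions
        .with_state(state)
}
```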

objectiveai-cli (objectiveai-cli/src/api/run.rs, objectiveai-cli/src/viewer/mod.rs)

  • When api.mode=Local + viewer.mode=Local, the CLI extracts the embedded viewer binary, spawns it as a subprocess, parses "listening on <addr>" from its stderr (see the sketch after this list), then injects X-VIEWER-ADDRESS + X-VIEWER-SIGNATURE headers so the API forwards events to the viewer over HTTP.
  • All CLI output is JSON to stdout; logs are persisted under $CONFIG_BASE_DIR/.objectiveai/<domain>/<op>/<id>.json.
  • objectiveai logs subscribe <id> already exposes a tail mechanism.
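
A hedged sketch of the spawn-and-discover handshake described in the first bullet; the function shape and error handling are illustrative, not the actual code in objectiveai-cli/src/api/run.rs:

```rust
// Spawn the extracted viewer binary and read its address from stderr.
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;

async fn spawn_viewer(viewer_bin: &std::path::Path) -> anyhow::Result<String> {
    let mut child = Command::new(viewer_bin)
        .stderr(std::process::Stdio::piped())
        .spawn()?;

    // The viewer prints "listening on <addr>" to stderr once its axum server is up.
    let stderr = child.stderr.take().expect("stderr was piped above");
    let mut lines = BufReader::new(stderr).lines();
    while let Some(line) = lines.next_line().await? {
        if let Some(addr) = line.strip_prefix("listening on ") {
            // The CLI then sets X-VIEWER-ADDRESS / X-VIEWER-SIGNATURE so the API
            // forwards events to this address over HTTP.
            return Ok(addr.trim().to_string());
        }
    }
    anyhow::bail!("viewer exited before reporting its listen address")
}
```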

objectiveai-api

  • Streaming is SSE only: no WebSocket, and no upstream channel from the viewer back into a running execution.
  • objectiveai-mcp-proxy/src/session_manager.rs already has the patterns we want to reuse elsewhere: UUIDv4 session ids, broadcast::Sender for fan-out, in-flight cancellation tokens, notification buffering (sketched below).
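
A small sketch of that session pattern with illustrative names; the concrete types in objectiveai-mcp-proxy/src/session_manager.rs will differ:

```rust
// Session = fan-out channel + cancellation token + notification buffer, keyed by UUID.
use std::collections::HashMap;
use tokio::sync::{broadcast, Mutex};
use tokio_util::sync::CancellationToken;
use uuid::Uuid;

struct Session {
    /// Fan-out of events to any number of listeners (viewer panes, log tails, ...).
    events: broadcast::Sender<String>,
    /// Cancels in-flight work when the human (or agent) aborts the session.
    cancel: CancellationToken,
    /// Notifications buffered while no listener is attached.
    buffered: Vec<String>,
}

#[derive(Default)]
struct SessionManager {
    sessions: Mutex<HashMap<Uuid, Session>>,
}

impl SessionManager {
    async fn create(&self) -> Uuid {
        let id = Uuid::new_v4();
        let (events, _) = broadcast::channel(256);
        self.sessions.lock().await.insert(
            id,
            Session { events, cancel: CancellationToken::new(), buffered: Vec::new() },
        );
        id
    }

    async fn cancel(&self, id: &Uuid) {
        if let Some(s) = self.sessions.lock().await.get(id) {
            s.cancel.cancel();
        }
    }
}
```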

Gap summary

  1. Wire is one-way. Viewer cannot send anything back to CLI/API.
  2. No "session" concept binding agent + human + executions into one orchestration.
  3. No human-in-the-loop primitives (ask/approve/interrupt/inject).
  4. No persistence — closing the viewer loses everything.
  5. No multi-agent / swarm coordination view; no vector vote/score visualization.
  6. No "spawn" UI — the human cannot start work, only watch the agent start it.

Proposed phasing

Each phase ships independently and is useful on its own. Sub-issues will track each.

Phase 1 — Bidirectional channel + session model

Goal: viewer can talk back; both sides agree on what "the run" is.

  • Define a SessionId (UUIDv4) minted by whichever side starts the orchestration, threaded through the CLI HttpClient headers (a new X-OBJECTIVEAI-SESSION-ID) and into every request body persisted by the API/filesystem.
  • Add a viewer→CLI control channel. Two candidates (decision below; the WebSocket variant is sketched after this list):
    • WebSocket on the viewer's existing axum server (/control upgrade), CLI dials in once it knows the viewer address.
    • HTTP long-poll on a new /control/poll endpoint the CLI hits on a dedicated tokio task.
  • Promote viewer Tauri commands beyond viewer_ready: send_control(action), cancel(session_id), list_sessions().
  • Persist every event the viewer receives to a local store under $CONFIG_BASE_DIR/.objectiveai/viewer/sessions/<id>/ so reload restores state.
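
A sketch of what the WebSocket candidate could look like on the viewer's existing axum server. The route, action set, and JSON shape are assumptions, not a settled protocol:

```rust
// Viewer-hosted /control WebSocket that pushes human-initiated actions to the CLI.
use axum::{
    extract::{ws::{Message, WebSocket, WebSocketUpgrade}, State},
    response::IntoResponse,
    routing::get,
    Router,
};
use serde::{Deserialize, Serialize};
use tokio::sync::mpsc;
use uuid::Uuid;

#[derive(Serialize, Deserialize)]
#[serde(tag = "action", rename_all = "snake_case")]
enum ControlAction {
    Cancel { session_id: Uuid },
    InjectMessage { session_id: Uuid, content: String },
    Approve { session_id: Uuid, request_id: Uuid, approved: bool },
}

#[derive(Clone)]
struct ControlState {
    // Fed by the new Tauri commands (send_control, cancel, ...) on the UI side.
    actions: std::sync::Arc<tokio::sync::Mutex<mpsc::Receiver<ControlAction>>>,
}

async fn control_upgrade(State(state): State<ControlState>, ws: WebSocketUpgrade) -> impl IntoResponse {
    // The CLI dials this endpoint once it knows the viewer's address.
    ws.on_upgrade(move |socket| pump_actions(socket, state))
}

async fn pump_actions(mut socket: WebSocket, state: ControlState) {
    // Forward each action down the socket as JSON; the CLI translates it into
    // API calls or cancellation of its in-flight work.
    let mut actions = state.actions.lock().await;
    while let Some(action) = actions.recv().await {
        let json = serde_json::to_string(&action).expect("ControlAction serializes");
        if socket.send(Message::Text(json.into())).await.is_err() {
            break; // CLI went away; stop pumping
        }
    }
}

fn control_router(state: ControlState) -> Router {
    Router::new().route("/control", get(control_upgrade)).with_state(state)
}
```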

Phase 2 — Human can drive: spawn + message

Goal: the human is no longer a passenger.

  • New viewer UI: "Start" panel — pick an agent, swarm, function execution profile, or laboratory; fill params; click run. Mirrors the CLI subcommand tree (objectiveai agents, swarms, functions executions, functions inventions, laboratories).
  • The viewer issues these requests directly to the local API (when api.mode=Local), or routes them through the CLI (when only the CLI has credentials/config).
  • "Send message" affordance during a streaming agent completion: viewer pushes a user message into the in-flight conversation. Requires API support for mid-stream injection (new endpoint /agent/completions/{id}/inject or a control message over the same SSE).
  • "Cancel" / "pause" buttons: piggyback on the existing in-flight cancellation token pattern from objectiveai-mcp-proxy/src/session_manager.rs.

Phase 3 — Rich visualizations

Goal: actually pretty.

  • Multi-pane layout: left = session/run tree, center = active stream (chat, tool calls, reasoning, tokens/$ counter), right = artifacts (function tree, vector votes, lab results).
  • Function execution tree as an interactive graph (extend objectiveai-viewer/src/FunctionExecutionView.tsx) — collapsible nodes, hoverable retry tokens, jump-to-source.
  • Vector completions: render the per-voter scores and tally as a heatmap/bar; expose the merge in real time.
  • Recursive inventions: stage-by-stage timeline (already partially in objectiveai-viewer/src/FunctionInventionRecursiveView.tsx — extend with diffs between iterations).
  • Theme + design pass. Pick a component library (Radix/shadcn or keep handcrafted) — decision below.

Phase 4 — Multi-agent, replay, share

Goal: full orchestrator.

  • Multiple concurrent CLI agents register against the same viewer instance via SessionId. The viewer shows them side-by-side and routes messages across agents.
  • Human-as-tool: register a synthetic MCP server in the viewer that exposes tools like ask_human(question), approve(plan), pick_option(choices). The agent calls these via objectiveai-mcp-proxy; the viewer surfaces them as modals, and the human's reply becomes the tool result (see the sketch after this list). Plumb through objectiveai-mcp-proxy/src/session.rs.
  • Replay: re-open a persisted session, scrub a timeline, fork a new run from any point.
  • Export/share: serialize a session to a single JSON for issue attachments, bug repros, post-mortems.
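
One possible shape for the human-as-tool flow, assuming a oneshot reply channel between the tool handler and the viewer modal; every name here is hypothetical:

```rust
// The agent's tool call becomes a pending request the viewer renders as a modal;
// the human's answer resolves the call and is returned as the tool result.
use serde::{Deserialize, Serialize};
use tokio::sync::oneshot;

#[derive(Serialize, Deserialize)]
#[serde(tag = "tool", rename_all = "snake_case")]
enum HumanToolCall {
    AskHuman { question: String },
    Approve { plan: String },
    PickOption { choices: Vec<String> },
}

struct PendingHumanCall {
    call: HumanToolCall,
    // The modal's submit handler fires this; the awaited value becomes the
    // MCP tool result returned to the agent.
    reply: oneshot::Sender<String>,
}

async fn call_human(call: HumanToolCall) -> String {
    let (tx, rx) = oneshot::channel();
    enqueue_for_viewer(PendingHumanCall { call, reply: tx });
    rx.await.unwrap_or_else(|_| "human dismissed the request".to_string())
}

// Placeholder: in practice this would hand the pending call to the Tauri side
// so React can render it as a modal.
fn enqueue_for_viewer(_pending: PendingHumanCall) {}
```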

Key technical decisions (need consensus before Phase 1)

  1. Transport for viewer↔CLI control. WebSocket on viewer's axum (clean, one server) vs. CLI exposing its own control socket vs. long-poll. Recommendation: WebSocket on viewer.
  2. Session ID ownership. Minted by CLI on first command and propagated, vs. minted by viewer when the human "starts" something, vs. either-side mints with reconciliation. Recommendation: either-side mints, header-propagated, server-side reconciles.
  3. Spawn routing. When the human clicks "run agent" in viewer: (a) viewer calls API directly; (b) viewer asks CLI to run the command on its behalf so logs/favorites/config stay CLI-owned; (c) both, depending on api.mode. Recommendation: (c).
  4. Persistence layer. SQLite (richer queries, harder Tauri build) vs. sled (pure Rust, simpler) vs. flat JSON files mirroring the CLI's existing log layout. Recommendation: mirror CLI logs, plus a thin index (sketched after this list).
  5. Where human-as-tool lives. Viewer-embedded MCP server vs. extension of objectiveai-mcp-proxy with a "human" upstream type. Affects how agents discover the tools.
  6. Frontend stack. Stay vanilla React + manual CSS, or adopt shadcn/Radix + Tailwind. Affects velocity for Phase 3.
  7. Auth model when viewer drives writes. Today HMAC is one-way (CLI signs, viewer verifies). Viewer→API and viewer→CLI need their own auth. Reuse the same secret, or mint per-direction keys?
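
For decision 4, a sketch of the recommended "mirror CLI logs, plus a thin index" option; paths and fields are assumptions for illustration only:

```rust
// A thin index over flat per-event JSON files mirroring the CLI's log layout.
use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};

#[derive(Serialize, Deserialize)]
struct SessionIndexEntry {
    session_id: uuid::Uuid,
    started_at: String,        // RFC 3339 timestamp
    kind: String,              // e.g. "agent" | "swarm" | "function" | "laboratory"
    event_files: Vec<PathBuf>, // per-event JSON files under the session directory
}

fn index_path(config_base_dir: &Path) -> PathBuf {
    // e.g. $CONFIG_BASE_DIR/.objectiveai/viewer/sessions/index.json (assumed location)
    config_base_dir.join(".objectiveai/viewer/sessions/index.json")
}

fn load_index(config_base_dir: &Path) -> std::io::Result<Vec<SessionIndexEntry>> {
    let raw = std::fs::read_to_string(index_path(config_base_dir))?;
    serde_json::from_str(&raw)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}
```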

Open questions

  • Does "the human spawns a new execution mid-run" attach to the agent's existing session, or fork? Both have valid use cases.
  • Should the viewer expose the CLI's full subcommand surface, or a curated subset? (Full = maximum power, but a permanent cost of keeping the mirror from diverging; curated = effectively forking a parallel UX.)
  • How do we handle the case where the agent is on a remote machine and the human is on a laptop? Today's local-spawn assumption breaks. Phase-gate, or design for it from Phase 1?
  • Cost / token live counters per session — push from API, or compute viewer-side from the chunks we already see?
  • When a session is replayed, do we re-fire side-effecting tool calls or only show their recorded results? (Pretty sure: only show.)

Out of scope (for this umbrella issue)

  • Cloud-hosted multi-user collaboration (real-time mouse cursors, presence, comments). Viewer is single-user local.
  • Auth/identity beyond shared-secret HMAC.
  • Mobile / web-only build of the viewer. Tauri desktop only.
  • Plugin system / third-party panels.
  • Schema-level changes to agent/swarm/function/laboratory request shapes — we render and orchestrate what already exists.
  • Replacing the CLI. The CLI remains the source of truth for headless / agent use.

Files most likely to be touched

  • objectiveai-viewer/src-tauri/src/run.rs — extend axum, add WebSocket / control routes, add persistence hooks, broaden Tauri commands beyond viewer_ready.
  • objectiveai-viewer/src/App.tsx (and siblings AgentCompletionView.tsx, FunctionExecutionView.tsx, FunctionInventionRecursiveView.tsx) — restructure for multi-pane, add control affordances, wire invoke() for new Tauri commands.
  • objectiveai-cli/src/api/run.rs — open the viewer→CLI control connection after viewer spawn, thread SessionId through HTTP client / API config builders.
  • objectiveai-cli/src/viewer/mod.rs — new viewer subcommands (e.g. attach, sessions list).
  • objectiveai-mcp-proxy/src/session_manager.rs / objectiveai-mcp-proxy/src/session.rs — model to copy; possibly extend if we host a "human" MCP upstream.
  • objectiveai-api/src/... — mid-stream inject endpoint, session-id-aware request logging.

Acceptance for the umbrella

This issue is "done" when each of the four phases has its own tracking issue, the seven decisions above are resolved with a written rationale, and Phase 1 has merged behind a feature flag.
