
Turn objectiveai-viewer into a GUI swarm orchestrator (interactive co-pilot for human + coding agent) #171

@WiggidyW


Vision

Today the viewer is a passive monitor. The vision is a two-headed orchestrator:

  • A coding agent (Claude) drives work through objectiveai-cli.
  • A human drives the same work through objectiveai-viewer.
  • Both surfaces see the same swarms / executions / inventions / laboratories in real time, with a synchronized model of state.
  • The human can spawn, message, intervene, and approve from the GUI; the agent sees those actions land in its CLI session and responds.
  • The viewer becomes "pretty for the human user" — visually rich, dense with detail, designed for sustained human attention rather than one-off debugging.

This issue is the umbrella RFC. Sub-issues will be filed per phase.

Where we are today

objectiveai-viewer (objectiveai-viewer/src-tauri/src/run.rs, objectiveai-viewer/src/App.tsx)

  • An axum server exposes four POST endpoints: /agent/completions, /functions/executions, /functions/inventions/recursive, /laboratories/executions.
  • Requests are gated by an HMAC-SHA256 signature carried in the X-VIEWER-SIGNATURE header.
  • Events are buffered into a tokio mpsc channel, then re-emitted as Tauri IPC events to React (this pipeline is sketched below). A single Tauri command exists: viewer_ready — a one-shot signal, not a control surface.
  • React renders read-only views from streamed chunks. State is in-memory only.
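
A minimal sketch of that one-way pipeline, assuming hypothetical handler and state names (the real code lives in objectiveai-viewer/src-tauri/src/run.rs and may differ in detail):

```rust
// Sketch of the existing one-way event path: HMAC gate -> mpsc buffer -> Tauri IPC.
// Handler/state names are illustrative.
use axum::{body::Bytes, extract::State, http::{HeaderMap, StatusCode}, routing::post, Router};
use hmac::{Hmac, Mac};
use sha2::Sha256;
use tokio::sync::mpsc;

#[derive(Clone)]
struct ViewerState {
    secret: Vec<u8>,
    // Drained by the Tauri side, which re-emits each payload as an IPC event to React.
    events: mpsc::Sender<Bytes>,
}

async fn agent_completions(
    State(state): State<ViewerState>,
    headers: HeaderMap,
    body: Bytes,
) -> StatusCode {
    // HMAC-SHA256 gate on X-VIEWER-SIGNATURE (hex encoding assumed here).
    let Some(sig) = headers.get("X-VIEWER-SIGNATURE").and_then(|v| v.to_str().ok()) else {
        return StatusCode::UNAUTHORIZED;
    };
    let Ok(sig) = hex::decode(sig) else {
        return StatusCode::UNAUTHORIZED;
    };
    let mut mac = Hmac::<Sha256>::new_from_slice(&state.secret).expect("HMAC takes any key length");
    mac.update(&body);
    if mac.verify_slice(&sig).is_err() {
        return StatusCode::UNAUTHORIZED;
    }
    // Buffer the event; today nothing ever flows back the other way.
    match state.events.send(body).await {
        Ok(()) => StatusCode::OK,
        Err(_) => StatusCode::SERVICE_UNAVAILABLE,
    }
}

fn router(state: ViewerState) -> Router {
    Router::new()
        .route("/agent/completions", post(agent_completions))
        // ... /functions/executions, /functions/inventions/recursive, /laboratories/executions
        .with_state(state)
}
```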

objectiveai-cli (objectiveai-cli/src/api/run.rs, objectiveai-cli/src/viewer/mod.rs)

  • When api.mode=Local + viewer.mode=Local, the CLI extracts the embedded viewer binary, spawns it as a subprocess, parses "listening on <addr>" from its stderr (see the sketch after this list), then injects X-VIEWER-ADDRESS + X-VIEWER-SIGNATURE headers so the API forwards events to the viewer over HTTP.
  • All CLI output is JSON to stdout; logs are persisted under $CONFIG_BASE_DIR/.objectiveai/<domain>/<op>/<id>.json.
  • objectiveai logs subscribe <id> already exposes a tail mechanism.
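
A hedged sketch of the spawn-and-discover handshake described in the first bullet; the function shape and error handling are illustrative, not the actual code in objectiveai-cli/src/api/run.rs:

```rust
// Spawn the extracted viewer binary and read its address from stderr.
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;

async fn spawn_viewer(viewer_bin: &std::path::Path) -> anyhow::Result<String> {
    let mut child = Command::new(viewer_bin)
        .stderr(std::process::Stdio::piped())
        .spawn()?;

    // The viewer prints "listening on <addr>" to stderr once its axum server is up.
    let stderr = child.stderr.take().expect("stderr was piped above");
    let mut lines = BufReader::new(stderr).lines();
    while let Some(line) = lines.next_line().await? {
        if let Some(addr) = line.strip_prefix("listening on ") {
            // The CLI then sets X-VIEWER-ADDRESS / X-VIEWER-SIGNATURE so the API
            // forwards events to this address over HTTP.
            return Ok(addr.trim().to_string());
        }
    }
    anyhow::bail!("viewer exited before reporting its listen address")
}
```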

objectiveai-api

  • Streaming is SSE only: no WebSocket, and no upstream channel from the viewer back into a running execution.
  • objectiveai-mcp-proxy/src/session_manager.rs already has the patterns we want to reuse elsewhere: UUIDv4 session ids, broadcast::Sender for fan-out, in-flight cancellation tokens, notification buffering (sketched below).
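
A small sketch of that session pattern with illustrative names; the concrete types in objectiveai-mcp-proxy/src/session_manager.rs will differ:

```rust
// Session = fan-out channel + cancellation token + notification buffer, keyed by UUID.
use std::collections::HashMap;
use tokio::sync::{broadcast, Mutex};
use tokio_util::sync::CancellationToken;
use uuid::Uuid;

struct Session {
    /// Fan-out of events to any number of listeners (viewer panes, log tails, ...).
    events: broadcast::Sender<String>,
    /// Cancels in-flight work when the human (or agent) aborts the session.
    cancel: CancellationToken,
    /// Notifications buffered while no listener is attached.
    buffered: Vec<String>,
}

#[derive(Default)]
struct SessionManager {
    sessions: Mutex<HashMap<Uuid, Session>>,
}

impl SessionManager {
    async fn create(&self) -> Uuid {
        let id = Uuid::new_v4();
        let (events, _) = broadcast::channel(256);
        self.sessions.lock().await.insert(
            id,
            Session { events, cancel: CancellationToken::new(), buffered: Vec::new() },
        );
        id
    }

    async fn cancel(&self, id: &Uuid) {
        if let Some(s) = self.sessions.lock().await.get(id) {
            s.cancel.cancel();
        }
    }
}
```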

Gap summary

  1. Wire is one-way. Viewer cannot send anything back to CLI/API.
  2. No "session" concept binding agent + human + executions into one orchestration.
  3. No human-in-the-loop primitives (ask/approve/interrupt/inject).
  4. No persistence — closing the viewer loses everything.
  5. No multi-agent / swarm coordination view; no vector vote/score visualization.
  6. No "spawn" UI — the human cannot start work, only watch the agent start it.

Proposed phasing

Each phase ships independently and is useful on its own. Sub-issues will track each.

Phase 1 — Bidirectional channel + session model

Goal: viewer can talk back; both sides agree on what "the run" is.

  • Define a SessionId (UUIDv4) minted by whichever side starts the orchestration, threaded through the CLI HttpClient headers (a new X-OBJECTIVEAI-SESSION-ID) and into every request body persisted by the API/filesystem.
  • Add a viewer→CLI control channel. Two candidates (decision below; the WebSocket variant is sketched after this list):
    • WebSocket on the viewer's existing axum server (/control upgrade), CLI dials in once it knows the viewer address.
    • HTTP long-poll on a new /control/poll endpoint the CLI hits on a dedicated tokio task.
  • Promote viewer Tauri commands beyond viewer_ready: send_control(action), cancel(session_id), list_sessions().
  • Persist every event the viewer receives to a local store under $CONFIG_BASE_DIR/.objectiveai/viewer/sessions/<id>/ so reload restores state.
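
A sketch of what the WebSocket candidate could look like on the viewer's existing axum server. The route, action set, and JSON shape are assumptions, not a settled protocol:

```rust
// Viewer-hosted /control WebSocket that pushes human-initiated actions to the CLI.
use axum::{
    extract::{ws::{Message, WebSocket, WebSocketUpgrade}, State},
    response::IntoResponse,
    routing::get,
    Router,
};
use serde::{Deserialize, Serialize};
use tokio::sync::mpsc;
use uuid::Uuid;

#[derive(Serialize, Deserialize)]
#[serde(tag = "action", rename_all = "snake_case")]
enum ControlAction {
    Cancel { session_id: Uuid },
    InjectMessage { session_id: Uuid, content: String },
    Approve { session_id: Uuid, request_id: Uuid, approved: bool },
}

#[derive(Clone)]
struct ControlState {
    // Fed by the new Tauri commands (send_control, cancel, ...) on the UI side.
    actions: std::sync::Arc<tokio::sync::Mutex<mpsc::Receiver<ControlAction>>>,
}

async fn control_upgrade(State(state): State<ControlState>, ws: WebSocketUpgrade) -> impl IntoResponse {
    // The CLI dials this endpoint once it knows the viewer's address.
    ws.on_upgrade(move |socket| pump_actions(socket, state))
}

async fn pump_actions(mut socket: WebSocket, state: ControlState) {
    // Forward each action down the socket as JSON; the CLI translates it into
    // API calls or cancellation of its in-flight work.
    let mut actions = state.actions.lock().await;
    while let Some(action) = actions.recv().await {
        let json = serde_json::to_string(&action).expect("ControlAction serializes");
        if socket.send(Message::Text(json.into())).await.is_err() {
            break; // CLI went away; stop pumping
        }
    }
}

fn control_router(state: ControlState) -> Router {
    Router::new().route("/control", get(control_upgrade)).with_state(state)
}
```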

Phase 2 — Human can drive: spawn + message

Goal: the human is no longer a passenger.

  • New viewer UI: "Start" panel — pick an agent, swarm, function execution profile, or laboratory; fill params; click run. Mirrors the CLI subcommand tree (objectiveai agents, swarms, functions executions, functions inventions, laboratories).
  • The viewer issues these requests directly to the local API (when api.mode=Local), or routes them through the CLI (when only the CLI has credentials/config).
  • "Send message" affordance during a streaming agent completion: viewer pushes a user message into the in-flight conversation. Requires API support for mid-stream injection (new endpoint /agent/completions/{id}/inject or a control message over the same SSE).
  • "Cancel" / "pause" buttons: piggyback on the existing in-flight cancellation token pattern from objectiveai-mcp-proxy/src/session_manager.rs.

Phase 3 — Rich visualizations

Goal: actually pretty.

  • Multi-pane layout: left = session/run tree, center = active stream (chat, tool calls, reasoning, tokens/$ counter), right = artifacts (function tree, vector votes, lab results).
  • Function execution tree as an interactive graph (extend objectiveai-viewer/src/FunctionExecutionView.tsx) — collapsible nodes, hoverable retry tokens, jump-to-source.
  • Vector completions: render the per-voter scores and tally as a heatmap/bar; expose the merge in real time.
  • Recursive inventions: stage-by-stage timeline (already partially in objectiveai-viewer/src/FunctionInventionRecursiveView.tsx — extend with diffs between iterations).
  • Theme + design pass. Pick a component library (Radix/shadcn or keep handcrafted) — decision below.

Phase 4 — Multi-agent, replay, share

Goal: full orchestrator.

  • Multiple concurrent CLI agents register against the same viewer instance via SessionId. The viewer shows them side-by-side and routes messages across agents.
  • Human-as-tool: register a synthetic MCP server in the viewer that exposes tools like ask_human(question), approve(plan), pick_option(choices). The agent calls these via objectiveai-mcp-proxy; the viewer surfaces them as modals, and the human's reply becomes the tool result (see the sketch after this list). Plumb through objectiveai-mcp-proxy/src/session.rs.
  • Replay: re-open a persisted session, scrub a timeline, fork a new run from any point.
  • Export/share: serialize a session to a single JSON for issue attachments, bug repros, post-mortems.
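
One possible shape for the human-as-tool flow, assuming a oneshot reply channel between the tool handler and the viewer modal; every name here is hypothetical:

```rust
// The agent's tool call becomes a pending request the viewer renders as a modal;
// the human's answer resolves the call and is returned as the tool result.
use serde::{Deserialize, Serialize};
use tokio::sync::oneshot;

#[derive(Serialize, Deserialize)]
#[serde(tag = "tool", rename_all = "snake_case")]
enum HumanToolCall {
    AskHuman { question: String },
    Approve { plan: String },
    PickOption { choices: Vec<String> },
}

struct PendingHumanCall {
    call: HumanToolCall,
    // The modal's submit handler fires this; the awaited value becomes the
    // MCP tool result returned to the agent.
    reply: oneshot::Sender<String>,
}

async fn call_human(call: HumanToolCall) -> String {
    let (tx, rx) = oneshot::channel();
    enqueue_for_viewer(PendingHumanCall { call, reply: tx });
    rx.await.unwrap_or_else(|_| "human dismissed the request".to_string())
}

// Placeholder: in practice this would hand the pending call to the Tauri side
// so React can render it as a modal.
fn enqueue_for_viewer(_pending: PendingHumanCall) {}
```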

Key technical decisions (need consensus before Phase 1)

  1. Transport for viewer↔CLI control. WebSocket on viewer's axum (clean, one server) vs. CLI exposing its own control socket vs. long-poll. Recommendation: WebSocket on viewer.
  2. Session ID ownership. Minted by CLI on first command and propagated, vs. minted by viewer when the human "starts" something, vs. either-side mints with reconciliation. Recommendation: either-side mints, header-propagated, server-side reconciles.
  3. Spawn routing. When the human clicks "run agent" in viewer: (a) viewer calls API directly; (b) viewer asks CLI to run the command on its behalf so logs/favorites/config stay CLI-owned; (c) both, depending on api.mode. Recommendation: (c).
  4. Persistence layer. SQLite (richer queries, harder Tauri build) vs. sled (pure Rust, simpler) vs. flat JSON files mirroring the CLI's existing log layout. Recommendation: mirror CLI logs, plus a thin index (sketched after this list).
  5. Where human-as-tool lives. Viewer-embedded MCP server vs. extension of objectiveai-mcp-proxy with a "human" upstream type. Affects how agents discover the tools.
  6. Frontend stack. Stay vanilla React + manual CSS, or adopt shadcn/Radix + Tailwind. Affects velocity for Phase 3.
  7. Auth model when viewer drives writes. Today HMAC is one-way (CLI signs, viewer verifies). Viewer→API and viewer→CLI need their own auth. Reuse the same secret, or mint per-direction keys?
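
For decision 4, a sketch of the recommended "mirror CLI logs, plus a thin index" option; paths and fields are assumptions for illustration only:

```rust
// A thin index over flat per-event JSON files mirroring the CLI's log layout.
use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};

#[derive(Serialize, Deserialize)]
struct SessionIndexEntry {
    session_id: uuid::Uuid,
    started_at: String,        // RFC 3339 timestamp
    kind: String,              // e.g. "agent" | "swarm" | "function" | "laboratory"
    event_files: Vec<PathBuf>, // per-event JSON files under the session directory
}

fn index_path(config_base_dir: &Path) -> PathBuf {
    // e.g. $CONFIG_BASE_DIR/.objectiveai/viewer/sessions/index.json (assumed location)
    config_base_dir.join(".objectiveai/viewer/sessions/index.json")
}

fn load_index(config_base_dir: &Path) -> std::io::Result<Vec<SessionIndexEntry>> {
    let raw = std::fs::read_to_string(index_path(config_base_dir))?;
    serde_json::from_str(&raw)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}
```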

Open questions

  • Does "the human spawns a new execution mid-run" attach to the agent's existing session, or fork? Both have valid use cases.
  • Should the viewer expose the CLI's full subcommand surface, or a curated subset? (Full = maximum power, but a permanent cost of keeping the mirror from diverging; curated = effectively forking a parallel UX.)
  • How do we handle the case where the agent is on a remote machine and the human is on a laptop? Today's local-spawn assumption breaks. Phase-gate, or design for it from Phase 1?
  • Cost / token live counters per session — push from API, or compute viewer-side from the chunks we already see?
  • When a session is replayed, do we re-fire side-effecting tool calls or only show their recorded results? (Pretty sure: only show.)

Out of scope (for this umbrella issue)

  • Cloud-hosted multi-user collaboration (real-time mouse cursors, presence, comments). Viewer is single-user local.
  • Auth/identity beyond shared-secret HMAC.
  • Mobile / web-only build of the viewer. Tauri desktop only.
  • Plugin system / third-party panels.
  • Schema-level changes to agent/swarm/function/laboratory request shapes — we render and orchestrate what already exists.
  • Replacing the CLI. The CLI remains the source of truth for headless / agent use.

Files most likely to be touched

  • objectiveai-viewer/src-tauri/src/run.rs — extend axum, add WebSocket / control routes, add persistence hooks, broaden Tauri commands beyond viewer_ready.
  • objectiveai-viewer/src/App.tsx (and siblings AgentCompletionView.tsx, FunctionExecutionView.tsx, FunctionInventionRecursiveView.tsx) — restructure for multi-pane, add control affordances, wire invoke() for new Tauri commands.
  • objectiveai-cli/src/api/run.rs — open the viewer→CLI control connection after viewer spawn, thread SessionId through HTTP client / API config builders.
  • objectiveai-cli/src/viewer/mod.rs — new viewer subcommands (e.g. attach, sessions list).
  • objectiveai-mcp-proxy/src/session_manager.rs / objectiveai-mcp-proxy/src/session.rs — model to copy; possibly extend if we host a "human" MCP upstream.
  • objectiveai-api/src/... — mid-stream inject endpoint, session-id-aware request logging.

Acceptance for the umbrella

This issue is "done" when each of the four phases has its own tracking issue, the seven decisions above are resolved with a written rationale, and Phase 1 has merged behind a feature flag.
