Vision
Today the viewer is a passive monitor. The vision is a two-headed orchestrator:
- A coding agent (Claude) drives work through objectiveai-cli.
- A human drives the same work through objectiveai-viewer.
- Both surfaces see the same swarms / executions / inventions / laboratories in real time, with a synchronized model of state.
- The human can spawn, message, intervene, and approve from the GUI; the agent sees those actions land in its CLI session and responds.
- The viewer becomes "pretty for the human user" — visually rich, dense with detail, designed for sustained human attention rather than one-off debugging.
This issue is the umbrella RFC. Sub-issues will be filed per phase.
Where we are today
objectiveai-viewer (objectiveai-viewer/src-tauri/src/run.rs, objectiveai-viewer/src/App.tsx)
- Axum server listens on 4 POST endpoints: /agent/completions, /functions/executions, /functions/inventions/recursive, /laboratories/executions.
- HMAC-SHA256 (X-VIEWER-SIGNATURE) auth gate.
- Events are buffered into a tokio mpsc channel, then re-emitted as Tauri IPC events to React. A single Tauri command exists: viewer_ready — a one-shot signal, not a control surface.
- React renders read-only views from streamed chunks. State is in-memory only.
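The buffer-then-emit pipeline can be sketched with std channels. A minimal, dependency-free stand-in (the real code uses tokio's mpsc and Tauri's event emitter; relay and the chunk strings are illustrative):

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-in for the viewer's event pipeline: an HTTP-handler
// thread pushes streamed chunks into a channel, and the Tauri side drains
// it and re-emits each one to the frontend.
fn relay(chunks: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    // HTTP handler side: buffer events as they arrive off the wire.
    let producer = thread::spawn(move || {
        for chunk in chunks {
            tx.send(chunk).unwrap();
        }
        // Dropping tx closes the channel, ending the consumer loop below.
    });

    // Tauri side: drain the buffer and "emit" each event to React.
    let emitted: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    emitted
}

fn main() {
    let out = relay(vec!["chunk-0".into(), "chunk-1".into()]);
    assert_eq!(out, vec!["chunk-0", "chunk-1"]);
    println!("relayed {} events", out.len());
}
```

The point of the intermediate channel is backpressure and decoupling: the HTTP handler never blocks on the UI thread.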
objectiveai-cli (objectiveai-cli/src/api/run.rs, objectiveai-cli/src/viewer/mod.rs)
- When api.mode=Local + viewer.mode=Local, CLI extracts the embedded viewer binary, spawns it as a subprocess, parses listening on <addr> from its stderr, then injects X-VIEWER-ADDRESS + X-VIEWER-SIGNATURE headers so the API forwards events to the viewer over HTTP.
- All CLI output is JSON to stdout; logs are persisted under $CONFIG_BASE_DIR/.objectiveai/<domain>/<op>/<id>.json.
- objectiveai logs subscribe <id> already exposes a tail mechanism.
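The stderr handshake reduces to a one-line parse. A minimal sketch, assuming the log line literally reads listening on <addr> (the exact format in objectiveai-cli/src/api/run.rs may differ):

```rust
// Extract the socket address from a viewer stderr line of the form
// "listening on 127.0.0.1:43125". Returns None for unrelated log lines.
fn parse_listen_addr(line: &str) -> Option<&str> {
    line.trim().strip_prefix("listening on ").map(str::trim)
}

fn main() {
    assert_eq!(
        parse_listen_addr("listening on 127.0.0.1:43125"),
        Some("127.0.0.1:43125")
    );
    assert_eq!(parse_listen_addr("some unrelated log line"), None);
    println!("ok");
}
```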
objectiveai-api
- SSE streaming, no WebSocket, no upstream channel from the viewer back into a running execution.
objectiveai-mcp-proxy
- objectiveai-mcp-proxy/src/session_manager.rs already has the patterns we want elsewhere: UUIDv4 session ids, broadcast::Sender for fan-out, in-flight cancellation tokens, notification buffering.
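Those patterns can be illustrated with std types. A dependency-free sketch (the real session_manager.rs uses UUIDv4 ids and tokio's broadcast::Sender; Session and fan_out here are illustrative stand-ins):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};

// Per-session state: subscriber fan-out plus an in-flight cancellation
// flag that cooperative tasks can check (stand-in for a real token).
struct Session {
    subscribers: Vec<mpsc::Sender<String>>,
    cancelled: Arc<AtomicBool>,
}

impl Session {
    fn new() -> Self {
        Session {
            subscribers: Vec::new(),
            cancelled: Arc::new(AtomicBool::new(false)),
        }
    }

    fn subscribe(&mut self) -> mpsc::Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    // Fan a notification out to every live subscriber.
    fn broadcast(&self, msg: &str) {
        for tx in &self.subscribers {
            let _ = tx.send(msg.to_string());
        }
    }

    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
}

// Round trip: n subscribers each receive the broadcast message.
fn fan_out(msg: &str, n: usize) -> Vec<String> {
    let mut session = Session::new();
    let receivers: Vec<_> = (0..n).map(|_| session.subscribe()).collect();
    session.broadcast(msg);
    receivers.into_iter().map(|rx| rx.recv().unwrap()).collect()
}

fn main() {
    assert_eq!(fan_out("chunk", 2), vec!["chunk", "chunk"]);
    let session = Session::new();
    session.cancel();
    assert!(session.cancelled.load(Ordering::SeqCst));
    println!("ok");
}
```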
Gap summary
- Wire is one-way. Viewer cannot send anything back to CLI/API.
- No "session" concept binding agent + human + executions into one orchestration.
- No human-in-the-loop primitives (ask/approve/interrupt/inject).
- No persistence — closing the viewer loses everything.
- No multi-agent / swarm coordination view; no vector vote/score visualization.
- No "spawn" UI — the human cannot start work, only watch the agent start it.
Proposed phasing
Each phase ships independently and is useful on its own. Sub-issues will track each.
Phase 1 — Bidirectional channel + session model
Goal: viewer can talk back; both sides agree on what "the run" is.
- Define a SessionId (UUIDv4) minted by whichever side starts the orchestration. Threaded through the CLI HttpClient headers (new X-OBJECTIVEAI-SESSION-ID) and into every request body persisted by the API/filesystem.
- Add a viewer→CLI control channel. Two candidates (decision below):
  - WebSocket on the viewer's existing axum server (/control upgrade), CLI dials in once it knows the viewer address.
  - HTTP long-poll on a new /control/poll endpoint the CLI hits on a dedicated tokio task.
- Promote viewer Tauri commands beyond viewer_ready: send_control(action), cancel(session_id), list_sessions().
- Persist every event the viewer receives to a local store under $CONFIG_BASE_DIR/.objectiveai/viewer/sessions/<id>/ so reload restores state.
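Both transport candidates reduce to the same shape: a per-session command queue the viewer fills and the CLI drains. A long-poll-flavored sketch with std primitives (ControlQueue and the command strings are illustrative, not an existing API; a WebSocket would push the same commands instead of being polled):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Shared control queue: the viewer pushes commands, the CLI's poll task
// blocks until one is available or the long-poll times out.
#[derive(Clone)]
struct ControlQueue {
    inner: Arc<(Mutex<VecDeque<String>>, Condvar)>,
}

impl ControlQueue {
    fn new() -> Self {
        ControlQueue {
            inner: Arc::new((Mutex::new(VecDeque::new()), Condvar::new())),
        }
    }

    // Viewer side: enqueue a control action, e.g. "cancel <session-id>".
    fn push(&self, cmd: &str) {
        let (lock, cvar) = &*self.inner;
        lock.lock().unwrap().push_back(cmd.to_string());
        cvar.notify_one();
    }

    // CLI side: block until a command arrives, or None on timeout.
    fn poll(&self, timeout: Duration) -> Option<String> {
        let (lock, cvar) = &*self.inner;
        let mut queue = lock.lock().unwrap();
        while queue.is_empty() {
            let (guard, result) = cvar.wait_timeout(queue, timeout).unwrap();
            queue = guard;
            if result.timed_out() && queue.is_empty() {
                return None;
            }
        }
        queue.pop_front()
    }
}

fn main() {
    let queue = ControlQueue::new();
    let cli_side = queue.clone();
    let handle = thread::spawn(move || cli_side.poll(Duration::from_secs(1)));
    queue.push("cancel session-42");
    assert_eq!(handle.join().unwrap(), Some("cancel session-42".to_string()));
    println!("ok");
}
```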
Phase 2 — Human can drive: spawn + message
Goal: the human is no longer a passenger.
- New viewer UI: "Start" panel — pick an agent, swarm, function execution profile, or laboratory; fill params; click run. Mirrors the CLI subcommand tree (objectiveai agents, swarms, functions executions, functions inventions, laboratories).
- The viewer issues these requests itself directly to the local API (when api.mode=Local), or routes them through the CLI (when only the CLI has credentials/config).
- "Send message" affordance during a streaming agent completion: viewer pushes a user message into the in-flight conversation. Requires API support for mid-stream injection (new endpoint /agent/completions/{id}/inject or a control message over the same SSE).
- "Cancel" / "pause" buttons: piggyback on the existing in-flight cancellation token pattern from objectiveai-mcp-proxy/src/session_manager.rs.
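Mid-stream injection can be sketched as a side channel the streaming loop drains between chunks. stream_with_injection is an illustrative name, and the /inject endpoint above is a proposal, not an existing API:

```rust
use std::sync::mpsc;

// The streaming loop checks a side channel before emitting each chunk
// and splices injected user messages into the transcript at that point.
fn stream_with_injection(
    chunks: Vec<&str>,
    inject_rx: mpsc::Receiver<String>,
) -> Vec<String> {
    let mut transcript = Vec::new();
    for chunk in chunks {
        // Drain any pending injected messages before the next chunk.
        while let Ok(msg) = inject_rx.try_recv() {
            transcript.push(format!("user: {msg}"));
        }
        transcript.push(format!("assistant: {chunk}"));
    }
    transcript
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send("actually, use Rust".to_string()).unwrap();
    let out = stream_with_injection(vec!["ok", "done"], rx);
    assert_eq!(out[0], "user: actually, use Rust");
    assert_eq!(out[1], "assistant: ok");
    println!("ok");
}
```

Draining between chunks (rather than interrupting mid-token) keeps the injection point well defined in the persisted transcript.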
Phase 3 — Rich visualizations
Goal: actually pretty.
- Multi-pane layout: left = session/run tree, center = active stream (chat, tool calls, reasoning, tokens/$ counter), right = artifacts (function tree, vector votes, lab results).
- Function execution tree as an interactive graph (extend objectiveai-viewer/src/FunctionExecutionView.tsx) — collapsible nodes, hoverable retry tokens, jump-to-source.
- Vector completions: render the per-voter scores and tally as a heatmap/bar; expose the merge in real time.
- Recursive inventions: stage-by-stage timeline (already partially in objectiveai-viewer/src/FunctionInventionRecursiveView.tsx — extend with diffs between iterations).
- Theme + design pass. Pick a component library (Radix/shadcn or keep handcrafted) — decision below.
Phase 4 — Multi-agent, replay, share
Goal: full orchestrator.
- Multiple concurrent CLI agents register against the same viewer instance via SessionId. Viewer shows them side-by-side; cross-agent message routing.
- Human-as-tool: register a synthetic MCP server in the viewer that exposes tools like ask_human(question), approve(plan), pick_option(choices). The agent calls these via objectiveai-mcp-proxy; the viewer surfaces them as modals with a reply that becomes the tool result. Plumb through objectiveai-mcp-proxy/src/session.rs.
- Replay: re-open a persisted session, scrub a timeline, fork a new run from any point.
- Export/share: serialize a session to a single JSON for issue attachments, bug repros, post-mortems.
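The human-as-tool round trip can be sketched as a blocking request/reply over channels (HumanRequest and ask_human are illustrative names from the proposal, not an existing API):

```rust
use std::sync::mpsc;
use std::thread;

// A tool call carries its question plus a reply channel; the viewer's
// modal answers on that channel, and the answer becomes the tool result.
struct HumanRequest {
    question: String,
    reply_tx: mpsc::Sender<String>,
}

// Agent side: send the question to the viewer queue, block for the reply.
fn ask_human(ui_tx: &mpsc::Sender<HumanRequest>, question: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    ui_tx
        .send(HumanRequest { question: question.to_string(), reply_tx })
        .unwrap();
    reply_rx.recv().unwrap()
}

// End-to-end round trip with a scripted "human" standing in for the modal.
fn ask_human_roundtrip(question: &str) -> String {
    let (ui_tx, ui_rx) = mpsc::channel::<HumanRequest>();
    let viewer = thread::spawn(move || {
        for req in ui_rx {
            // A real viewer would render req.question and await user input.
            let answer = format!("approved: {}", req.question);
            req.reply_tx.send(answer).unwrap();
        }
    });
    let result = ask_human(&ui_tx, question);
    drop(ui_tx); // close the queue so the viewer thread exits
    viewer.join().unwrap();
    result
}

fn main() {
    assert_eq!(ask_human_roundtrip("ship the plan?"), "approved: ship the plan?");
    println!("ok");
}
```

Because the agent blocks on the reply channel, tool-call timeouts and cancellation need explicit handling in a real implementation.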
Key technical decisions (need consensus before Phase 1)
- Transport for viewer↔CLI control. WebSocket on viewer's axum (clean, one server) vs. CLI exposing its own control socket vs. long-poll. Recommendation: WebSocket on viewer.
- Session ID ownership. Minted by CLI on first command and propagated, vs. minted by viewer when the human "starts" something, vs. either-side mints with reconciliation. Recommendation: either-side mints, header-propagated, server-side reconciles.
- Spawn routing. When the human clicks "run agent" in viewer: (a) viewer calls API directly; (b) viewer asks CLI to run the command on its behalf so logs/favorites/config stay CLI-owned; (c) both, depending on api.mode. Recommendation: (c).
- Persistence layer. SQLite (richer queries, harder Tauri build) vs. sled (pure Rust, simpler) vs. flat JSON files mirroring the CLI's existing log layout. Recommendation: mirror CLI logs, plus a thin index.
- Where human-as-tool lives. Viewer-embedded MCP server vs. extension of objectiveai-mcp-proxy with a "human" upstream type. Affects how agents discover the tools.
- Frontend stack. Stay vanilla React + manual CSS, or adopt shadcn/Radix + Tailwind. Affects velocity for Phase 3.
- Auth model when viewer drives writes. Today HMAC is one-way (CLI signs, viewer verifies). Viewer→API and viewer→CLI need their own auth. Reuse the same secret, or mint per-direction keys?
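The recommended either-side-mints rule is small enough to sketch: reuse an inbound X-OBJECTIVEAI-SESSION-ID when present, otherwise mint. This stand-in mints from a counter instead of a real UUIDv4 to stay dependency-free:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static COUNTER: AtomicU64 = AtomicU64::new(0);

// Reconciliation: a propagated id wins so both sides converge on one
// session; only mint when no header arrived. The counter-based id is a
// placeholder for a UUIDv4.
fn reconcile_session_id(inbound: Option<&str>) -> String {
    match inbound {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => format!("session-{}", COUNTER.fetch_add(1, Ordering::SeqCst)),
    }
}

fn main() {
    assert_eq!(reconcile_session_id(Some("abc-123")), "abc-123");
    let minted = reconcile_session_id(None);
    assert!(minted.starts_with("session-"));
    println!("ok");
}
```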
Open questions
- Does "the human spawns a new execution mid-run" attach to the agent's existing session, or fork? Both have valid use cases.
- Should the viewer expose the CLI's full subcommand surface, or a curated subset? (Full = power, but a mirror-divergence maintenance cost. Curated = effectively forking a parallel UX.)
- How do we handle the case where the agent is on a remote machine and the human is on a laptop? Today's local-spawn assumption breaks. Phase-gate, or design for it from Phase 1?
- Cost / token live counters per session — push from API, or compute viewer-side from the chunks we already see?
- When a session is replayed, do we re-fire side-effecting tool calls or only show their recorded results? (Pretty sure: only show.)
Out of scope (for this umbrella issue)
- Cloud-hosted multi-user collaboration (real-time mouse cursors, presence, comments). Viewer is single-user local.
- Auth/identity beyond shared-secret HMAC.
- Mobile / web-only build of the viewer. Tauri desktop only.
- Plugin system / third-party panels.
- Schema-level changes to agent/swarm/function/laboratory request shapes — we render and orchestrate what already exists.
- Replacing the CLI. The CLI remains the source of truth for headless / agent use.
Files most likely to be touched
objectiveai-viewer/src-tauri/src/run.rs — extend axum, add WebSocket / control routes, add persistence hooks, broaden Tauri commands beyond viewer_ready.
objectiveai-viewer/src/App.tsx (and siblings AgentCompletionView.tsx, FunctionExecutionView.tsx, FunctionInventionRecursiveView.tsx) — restructure for multi-pane, add control affordances, wire invoke() for new Tauri commands.
objectiveai-cli/src/api/run.rs — open the viewer→CLI control connection after viewer spawn, thread SessionId through HTTP client / API config builders.
objectiveai-cli/src/viewer/mod.rs — new viewer subcommands (e.g. attach, sessions list).
objectiveai-mcp-proxy/src/session_manager.rs / objectiveai-mcp-proxy/src/session.rs — model to copy; possibly extend if we host a "human" MCP upstream.
objectiveai-api/src/... — mid-stream inject endpoint, session-id-aware request logging.
Acceptance for the umbrella
This issue is "done" when each of the four phases has its own tracking issue, the seven decisions above are resolved with a written rationale, and Phase 1 has merged behind a feature flag.