A VSCode extension that makes the whole system visible and drivable without leaving the editor. A sidebar TreeView shows status at a glance, and a WebviewPanel with eleven tabs is headlined by Studio: one hub that takes a workflow from discovery through configuration, cost estimate, launch, and live observation to the findings report. Zero runtime dependencies, fully theme-aware.
Studio consolidates workflows, runs, audits, models, and agents into a single tab organised around the life of a run, so an operator can drive a workflow end to end without touching a terminal. The same run used to appear in three places at three levels of detail; now there is one path through it. Everything is read-only or observe-only except two deliberate actions: the launch itself, behind an explicit spend confirmation, and the per-workflow default-model editor.
```mermaid
flowchart LR
D["Discover
workflow catalogue"] --> C["Configure
models, verify,
challenge, targets"]
C --> E["Estimate
cost priced from
run history"]
E --> L["Launch
explicit spend
confirmation"]
L --> O["Observe
live runs +
graph view"]
O --> U["Understand
node drill-through,
findings, outcomes"]
%% Blueprint tokens as literal hex (Mermaid cannot parse CSS vars or color-mix)
classDef step fill:#62d99a1f,stroke:#62d99a,color:#dfe4ee,stroke-width:1.5px;
classDef spend fill:#e0be621f,stroke:#e0be62,color:#dfe4ee,stroke-width:1.5px;
class D,C,E,O,U step;
class L spend;
```
The workflow detail computes a live cost estimate: the sum over stages of each stage's historical average token usage, priced at the currently selected model's rates, multiplied by the number of enrolled targets. Change a model, toggle verify or challenge, or narrow the target list, and the estimate recomputes. The actual average cost per run sits next to it as a sanity check.
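The estimate above can be sketched as a pure function. This is a minimal sketch, not the extension's code: the `StageHistory` and `ModelPrice` shapes, pricing per million tokens, and the model IDs used in the test are assumptions made for illustration.

```typescript
// Hypothetical shapes: the real run-history and pricing records are not documented here.
export interface StageHistory {
  stage: string;
  avgInputTokens: number; // historical average per run, per target
  avgOutputTokens: number;
}

export interface ModelPrice {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Price each stage's average token usage at its selected model, sum the stages,
// then scale by the number of enrolled targets.
export function estimateRunCost(
  stages: StageHistory[],
  modelForStage: Record<string, string>,
  prices: Record<string, ModelPrice>,
  enrolledTargets: number,
): number {
  let perTarget = 0;
  for (const s of stages) {
    const model: string | undefined = modelForStage[s.stage];
    const price: ModelPrice | undefined = model === undefined ? undefined : prices[model];
    if (price === undefined) {
      throw new Error(`no price for stage "${s.stage}" (model ${model ?? "unset"})`);
    }
    perTarget +=
      (s.avgInputTokens * price.inputPerMTok + s.avgOutputTokens * price.outputPerMTok) / 1_000_000;
  }
  return perTarget * enrolledTargets;
}
```

Under this sketch, toggling verify or challenge amounts to adding or dropping those stages from `stages`, and narrowing the target list changes `enrolledTargets`; every change is simply another call.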
While the Runs or Graph view is visible, a 4-second poll advances node statuses and moves finished runs from Active to Completed. The poll is change-guarded and renders are cache-guarded, so an idle tick never churns the DOM or closes an open drill-through.
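A visibility-gated poll with a cache-guarded render can be sketched as follows, assuming a hypothetical `RunView` shape and a `fetchRuns` callback standing in for however the extension actually loads runs. The guard compares a serialised snapshot with the last one rendered, so an unchanged tick returns before touching the DOM.

```typescript
// Hypothetical run shape; the real Runs/Graph payload is not documented here.
export interface RunView {
  id: string;
  status: "active" | "queued" | "completed";
  nodes: Record<string, string>; // node id -> status
}

// Cache guard: re-render only when the serialised data differs from the last render.
export class GuardedRenderer {
  private lastKey: string | undefined;

  constructor(private readonly render: (runs: RunView[]) => void) {}

  update(runs: RunView[]): boolean {
    const key = JSON.stringify(runs);
    if (key === this.lastKey) return false; // idle tick: DOM and any open drill-through untouched
    this.lastKey = key;
    this.render(runs);
    return true;
  }
}

// Poll only while the Runs or Graph view is visible; hidden ticks do no work.
export function startVisiblePoll(
  isVisible: () => boolean,
  fetchRuns: () => Promise<RunView[]>,
  renderer: GuardedRenderer,
  intervalMs = 4000,
): () => void {
  let inFlight = false;
  const timer = setInterval(async () => {
    if (!isVisible() || inFlight) return;
    inFlight = true;
    try {
      renderer.update(await fetchRuns());
    } catch {
      // keep showing the last good render if a fetch fails
    } finally {
      inFlight = false;
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

`JSON.stringify` is sensitive to key order, so this guard assumes the data source emits keys in a stable order; a real implementation might compare a normalised form instead.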
Graph nodes carry a badge with the review outcome (approved or rejected) of their newest stage. Completed runs are joined with cost, token, and model data from the run metrics, and by-model donut charts break spend, requests, and tokens down per model for a workflow or a single run.
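The by-model donut data amounts to a group-by over request metrics. A minimal sketch, assuming a hypothetical `RequestMetric` record with a model ID, cost, and token count per request; filtering the input to one workflow or one run gives the two scopes the charts offer.

```typescript
// Hypothetical per-request metric record; the real run-metrics schema is not shown here.
export interface RequestMetric {
  model: string;
  costUsd: number;
  tokens: number;
}

export interface ModelSlice {
  model: string;
  cost: number;
  requests: number;
  tokens: number;
}

// Group metrics by model: one slice per model, shared by the spend, requests, and tokens donuts.
export function breakdownByModel(metrics: RequestMetric[]): ModelSlice[] {
  const slices = new Map<string, ModelSlice>();
  for (const m of metrics) {
    let s = slices.get(m.model);
    if (!s) {
      s = { model: m.model, cost: 0, requests: 0, tokens: 0 };
      slices.set(m.model, s);
    }
    s.cost += m.costUsd;
    s.requests += 1;
    s.tokens += m.tokens;
  }
  // Largest spend first, so the biggest slice leads the legend.
  return Array.from(slices.values()).sort((a, b) => b.cost - a.cost);
}

// Fraction of the donut each slice occupies for one measure.
export function shares(slices: ModelSlice[], key: "cost" | "requests" | "tokens"): number[] {
  const total = slices.reduce((sum, s) => sum + s[key], 0);
  return slices.map((s) => (total === 0 ? 0 : s[key] / total));
}
```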
Clicking a fan-out node opens its node detail: the persisted findings report with a clear passed-or-failed verdict line, per-stage models, tokens and attempts, and a jump straight into the Audits view with that target preselected.
One StatusPoller instance runs at a configurable interval (default 4 seconds). Each tick calls `GET /status` on the server and scans the filesystem for skills, commands, memory files, and agents. Both surfaces, the sidebar TreeView and the dashboard panel, consume that same data without maintaining separate state.
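The single-poller, many-consumers arrangement can be sketched as an observer that caches its latest snapshot. This is an illustration under assumptions: the `StatusSnapshot` shape, the `fetchStatus` helper, and the base URL are hypothetical, and the filesystem scan for skills, commands, memory files, and agents is left out. Because `subscribe` replays the cached snapshot instead of fetching, a consumer that attaches late, such as a freshly opened panel, adds no requests.

```typescript
import * as http from "node:http";

// Hypothetical snapshot shape; the real /status payload is not documented here.
export interface StatusSnapshot {
  online: boolean;
  payload: unknown;
  fetchedAt: number;
}

type Listener = (snapshot: StatusSnapshot) => void;

// Hypothetical helper: GET <baseUrl>/status using only node:http, parsed as JSON.
export function fetchStatus(baseUrl: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const req = http.get(new URL("/status", baseUrl), (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the body so the socket is released
        reject(new Error(`GET /status returned ${res.statusCode}`));
        return;
      }
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk: string) => (body += chunk));
      res.on("end", () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.setTimeout(2000, () => req.destroy(new Error("GET /status timed out")));
  });
}

export class StatusPoller {
  private timer: ReturnType<typeof setInterval> | undefined;
  private latest: StatusSnapshot | undefined;
  private readonly listeners = new Set<Listener>();

  constructor(
    private readonly fetcher: () => Promise<unknown>,
    private readonly intervalMs = 4000,
  ) {}

  // Replays the cached snapshot instead of fetching, so late subscribers cost no requests.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    if (this.latest) listener(this.latest);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // One fetch per tick, fanned out to every subscriber; a failed fetch is reported as offline.
  async tick(): Promise<void> {
    let snapshot: StatusSnapshot;
    try {
      snapshot = { online: true, payload: await this.fetcher(), fetchedAt: Date.now() };
    } catch {
      snapshot = { online: false, payload: undefined, fetchedAt: Date.now() };
    }
    this.latest = snapshot;
    this.listeners.forEach((listener) => listener(snapshot));
  }

  start(): void {
    if (this.timer !== undefined) return;
    void this.tick();
    this.timer = setInterval(() => void this.tick(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

Wiring it up might look like `new StatusPoller(() => fetchStatus("http://127.0.0.1:3000"), 4000)`, where the port is a placeholder, followed by one `subscribe` call per surface.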
The sidebar TreeView is always visible in the VSCode activity bar. It shows server status (online/offline badge), uptime, last memory sync time, and the portfolio file list. Clicking a portfolio file opens it in the editor. No interaction is needed to keep it current: the poller drives updates automatically.
The dashboard panel opens on demand via the AgentOS activity bar or the command palette and hosts the full eleven-tab interface. It is a passive consumer of the StatusPoller state: opening the panel does not trigger additional requests. The layout fills the editor window and scrolls within the active tab only; the status card and tab nav stay fixed.
A fixed status card sits above the tabs with live server metrics (uptime, restarts, per-model token breakdown) and pm2 Start, Stop, and Restart controls. The tabs below it cover the rest of the system.
| Tab | What it covers |
|---|---|
| Setup | Guided setup wizard for new machines. Three-step flow: validate prerequisites (Node.js, pm2, VSCode), configure environment variables, run initial agent setup. Client pill selectors for Claude, Codex, and Cursor let you step through each client's setup independently. Auto-closes on success. |
| Analytics | Per-client token usage breakdown. Client pill selectors toggle between Claude, Codex, and Cursor analytics. Session and all-time totals per model, with cost estimates from a shared pricing module covering Anthropic, OpenAI, and OpenRouter-hosted models (DeepSeek, Qwen, Kimi, GLM). Model columns render dynamically from the data: any model ID a client reports gets a column, not just the Claude trio. Tool invocation counts, request logs with method, resource, duration, and token columns. Skill invocation log with project and model tagging. Claude figures come from hook payloads; Codex and Cursor figures are best-effort estimates parsed from session logs. |
| Context | All ten context portfolio files listed with name, description, and last-read timestamp. Searchable by name. Clicking a row opens the file in the editor. Read count per file shows which portfolio resources are accessed most frequently across all clients. |
| Commands | Shared command registry across Claude, Codex, and Cursor. Shared skills deduplicated by name: commands present in multiple clients collapse to a single row with stacked tool badges. An "All" badge replaces individual badges when all three clients carry the command. Invocation log shows recent calls with client, timestamp, and duration. Run buttons invoke a command directly, and a Create wizard scaffolds new commands in the correct directory. Searchable and filterable by scope and orchestration type. |
| Studio | The runtime hub, in five views. Workflows: the catalogue and detail with configure-and-launch and the live cost estimate. Runs: active, queued, and completed in one place, refreshed live. Audits: run single-target code audits and website content and facts audits, and open their reports. Models: per-workflow stage-by-model breakdowns with a persistent default-models editor. Agents: the Claude sub-agent registry with run and remove actions. |
| Memory | Project memory entries from `~/.claude/projects/*/memory/` and Codex memory from `~/.codex/memories/`. A tool badge distinguishes Claude from Codex entries, and a type badge shows user, feedback, project, or reference. A Create Memory wizard collects name, description, type, scope, and content, then writes the file with correct YAML frontmatter. Remove button with confirmation. Searchable and filterable by type, scope, and description. |
| Hooks | Claude Code event hooks (before-tool-call, after-tool-call, on-session-stop, on-notification) with their matcher patterns and the command each one runs. Scope badges separate global and project hooks. |
| MCP | Registered MCP servers with name, transport, endpoint, scope, and live status. Non-Claude servers are probed on an interval, so a dead endpoint is visible immediately. |
| Files | Files shared with Claude sessions, with their sync state. Per-client Push/Pull controls: Deploy runs `sync.ps1 push-runtime`, and Check Sync compares canonical and runtime copies, opening a file-by-file diff on mismatch. |
| Logs | Extension, server, pm2, MCP, and agent logs in a single stream, filterable by level and category. |
| Settings | Claude Code settings.json editor: permissions, environment variables, and hook configuration edited in the dashboard and written straight back to the file. |
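The Commands tab's deduplication rule (one row per name, stacked client badges, and a single "All" badge when Claude, Codex, and Cursor all carry the command) can be sketched as a pure function. The `CommandEntry` and `CommandRow` shapes are hypothetical, and matching names exactly (case-sensitive) is an assumption.

```typescript
export type Client = "claude" | "codex" | "cursor";
// Fixed badge order so stacked badges always appear in the same sequence.
const CLIENT_ORDER: readonly Client[] = ["claude", "codex", "cursor"];

// Hypothetical shapes for illustration.
export interface CommandEntry {
  name: string;
  client: Client;
}

export interface CommandRow {
  name: string;
  badges: Array<Client | "all">;
}

export function dedupeCommands(entries: CommandEntry[]): CommandRow[] {
  // Collect the set of clients that carry each command name.
  const clientsByName = new Map<string, Set<Client>>();
  for (const e of entries) {
    const set = clientsByName.get(e.name) ?? new Set<Client>();
    set.add(e.client);
    clientsByName.set(e.name, set);
  }
  const rows: CommandRow[] = [];
  clientsByName.forEach((clients, name) => {
    const present = CLIENT_ORDER.filter((c) => clients.has(c));
    // All three clients collapse to a single "all" badge.
    rows.push({ name, badges: present.length === CLIENT_ORDER.length ? ["all"] : present });
  });
  return rows.sort((a, b) => a.name.localeCompare(b.name));
}
```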
The extension uses only the VSCode extension API, Node.js built-ins (http, child_process, fs), and the TypeScript compiler (dev only). No npm packages at runtime. No bundler. Packaged as a VSIX and installed via code.cmd --install-extension.
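Theme colours are expressed as `var(--vscode-*)` CSS tokens, so the webview follows the current theme automatically instead of relying on hardcoded hex values; the fixed client identity colours are the one deliberate exception. A nonce-based CSP on the webview blocks injected scripts from running.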
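A nonce-based CSP for a webview might be assembled like this. It is a sketch, not the extension's template: `renderShell` and its markup are hypothetical, `cspSource` stands in for VS Code's `webview.cspSource`, and the nonce comes from `node:crypto`, a Node built-in not in the list above, so treat that choice as an assumption. With a fresh nonce per render, any `<script>` lacking the matching `nonce` attribute is refused by the webview.

```typescript
import { randomBytes } from "node:crypto";

// 128 bits of randomness, base64-encoded, regenerated for every render of the webview HTML.
export function makeNonce(): string {
  return randomBytes(16).toString("base64");
}

// cspSource would be webview.cspSource in the extension; it is a parameter here so the sketch runs standalone.
export function buildCsp(cspSource: string, nonce: string): string {
  return [
    "default-src 'none'",
    `img-src ${cspSource} data:`,
    `style-src ${cspSource} 'nonce-${nonce}'`,
    `script-src 'nonce-${nonce}'`,
  ].join("; ");
}

// Hypothetical page shell: theme colours come only from --vscode-* variables.
export function renderShell(cspSource: string, scriptUri: string): string {
  const nonce = makeNonce();
  return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-Security-Policy" content="${buildCsp(cspSource, nonce)}">
<style nonce="${nonce}">body { color: var(--vscode-foreground); background: var(--vscode-editor-background); }</style>
</head>
<body>
<div id="app"></div>
<script nonce="${nonce}" src="${scriptUri}"></script>
</body>
</html>`;
}
```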
The VSCode extension host does not inherit the shell PATH, so npm globals are not directly callable. `child_process.exec` calls pass `{ shell: 'cmd.exe' }`, which resolves npm globals (`vsce`, `code.cmd`) correctly on Windows without requiring absolute paths in the source. The pm2 controls are the exception: they call `spawn` directly, with no shell, for safety.
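The no-shell pm2 path could look like the sketch below. `pm2Args`, `runPm2`, and the app name are hypothetical. Recent Node.js releases refuse to spawn Windows `.cmd` shims without a shell, so the sketch assumes pm2 is launched through a directly executable program (for example `node` plus pm2's script path) rather than through `pm2.cmd`; how the extension actually resolves pm2 is not stated here.

```typescript
import { spawn } from "node:child_process";

const PM2_ACTIONS = ["start", "stop", "restart"] as const;
export type Pm2Action = (typeof PM2_ACTIONS)[number];

// Validate before anything reaches a process: only the three dashboard actions and a plain app name.
export function pm2Args(action: string, appName: string): string[] {
  if (!(PM2_ACTIONS as readonly string[]).includes(action)) {
    throw new Error(`unsupported pm2 action: ${action}`);
  }
  if (!/^[\w.-]+$/.test(appName)) {
    throw new Error(`invalid pm2 app name: ${appName}`);
  }
  return [action, appName];
}

// No shell: arguments are passed to the executable as an array, so nothing is re-parsed by cmd.exe.
// exe and exeArgs are assumptions, e.g. exe = process.execPath and exeArgs = [<pm2 bin script path>].
export function runPm2(
  action: Pm2Action,
  appName: string,
  exe: string,
  exeArgs: string[] = [],
): Promise<number> {
  const args = [...exeArgs, ...pm2Args(action, appName)];
  return new Promise((resolve, reject) => {
    const child = spawn(exe, args, { shell: false, windowsHide: true });
    child.on("error", reject);
    child.on("close", (code) => resolve(code ?? -1));
  });
}
```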
Client colours are distinct and consistent across all tabs and badges: Claude = blue (#4d9cf8), Codex = green (#4db87a), Cursor = orange (#f0a44c). Applied to tool badges, client pills, and any per-client breakdowns in analytics.
Extension options, including the StatusPoller interval, are configurable under `agentOsDashboard.*` in VSCode settings.