
fix(cursor): per-project breakdown by workspace (closes per-project half of #196) #296

Merged
iamtoruk merged 2 commits into main from feat/cursor-per-project-breakdown
May 10, 2026

Conversation

@iamtoruk
Member

Third and final PR addressing #196 / #159. Closes the per-project breakdown half of #196 (the activity-classifier half shipped in #289, the model-alias half shipped in #290).

The bug

PhilippMolitor's report in #196 showed every Cursor session under one row labeled cursor with 870 calls in "1 session". Cursor's global SQLite stores per-bubble token data but not per-bubble workspace, so the old provider had nothing to bucket on.

The fix

Three layers, all in src/providers/cursor.ts:

  1. Workspace map. Walk ~/Library/Application Support/Cursor/User/workspaceStorage/<hash>/. For each hash, read workspace.json (folder URI) and state.vscdb's ItemTable['composer.composerData'] (the composer ids opened in that workspace). Build Map<composerId, folderUri> once and memoize per CLI run.
  2. One source per workspace + an orphan source. discoverSessions emits one SessionSource per workspace with project name sanitizeWorkspaceUri(folderUri) (matches Claude's slug shape: file:///Users/me/proj -> -Users-me-proj). Plus a catch-all source labeled cursor that captures composers not registered in any workspace (multi-root, "no folder open", deleted workspaces).
  3. Source-aware parser filter. The encoded source.path carries the workspace tag via #cursor-ws=.... The parser decodes it, parses the global db once (cached across all workspace-scoped sources), then yields only the composers belonging to this source.
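
A minimal sketch of the three layers, with file and SQLite access left out. Every name here (WorkspaceRecord, buildComposerMap, slugifyFolderUri, encodeSourcePath, decodeSourcePath) is hypothetical and not taken from src/providers/cursor.ts. Only the file:///Users/me/proj -> -Users-me-proj mapping and the #cursor-ws= tag come from the description above; the handling of other characters and the URI-encoding of the tag are assumptions.

```typescript
// Hypothetical shape: what one workspaceStorage/<hash>/ directory yields once
// workspace.json and state.vscdb have been read (I/O is omitted in this sketch).
type WorkspaceRecord = { folderUri?: string; composerIds: string[] };

const WS_TAG = '#cursor-ws=';

// Layer 1: composerId -> folderUri. Records without a folder URI are skipped.
function buildComposerMap(records: WorkspaceRecord[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const rec of records) {
    if (!rec.folderUri) continue;
    for (const id of rec.composerIds) map.set(id, rec.folderUri);
  }
  return map;
}

// Layer 2: project slug. Assumption: strip the file:// scheme, then replace
// every non-alphanumeric character with '-'.
function slugifyFolderUri(folderUri: string): string {
  const withoutScheme = folderUri.replace(/^file:\/\//, '');
  return decodeURIComponent(withoutScheme).replace(/[^A-Za-z0-9]/g, '-');
}

// Layer 3: carry the workspace tag inside source.path. Assumption: the tag is
// URI-encoded, so it cannot itself contain the literal marker.
function encodeSourcePath(dbPath: string, workspaceTag: string): string {
  return `${dbPath}${WS_TAG}${encodeURIComponent(workspaceTag)}`;
}

function decodeSourcePath(path: string): { dbPath: string; workspaceTag: string | null } {
  const at = path.lastIndexOf(WS_TAG);
  if (at === -1) return { dbPath: path, workspaceTag: null }; // legacy bare DB path
  return {
    dbPath: path.slice(0, at),
    workspaceTag: decodeURIComponent(path.slice(at + WS_TAG.length)),
  };
}
```

Because the tag is encoded before it is appended, the last occurrence of the marker in the path is always the real one, even if the db path happened to contain a #.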

Two side-effect bug fixes that fell out:

  • sessionId was always 'unknown' for every Cursor call. The JSON conversationId field on bubbles is empty in current Cursor builds; the real composer id is in the row key (bubbleId:<composerId>:<bubbleUuid>). The old code masked the bug because every call collapsed under a single cursor row anyway.
  • Cursor stores tool-call sub-composer rows under keys like bubbleId:task-call_xxx\nfc_yyy:<bubbleUuid> with a literal newline in the composer segment. These are not standalone composers and would otherwise inflate the orphan project's session count. parseComposerIdFromKey now rejects any composer segment containing CR/LF.
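
A sketch of how the row key can be parsed under these two rules. The function below is illustrative and is not the actual parseComposerIdFromKey; it assumes the bubble UUID never contains a colon, so the last colon separates the composer segment from the bubble segment.

```typescript
// Illustrative only: the real parseComposerIdFromKey may be implemented differently.
// Expected key shape (from the description above): bubbleId:<composerId>:<bubbleUuid>
function parseComposerIdSketch(key: string): string | null {
  const prefix = 'bubbleId:';
  if (!key.startsWith(prefix)) return null;

  const rest = key.slice(prefix.length);
  const lastColon = rest.lastIndexOf(':');
  if (lastColon <= 0) return null; // missing or empty composer segment

  const composerId = rest.slice(0, lastColon);
  const bubbleUuid = rest.slice(lastColon + 1);
  if (bubbleUuid.length === 0) return null;

  // Tool-call sub-composer rows carry a literal CR/LF in the composer segment;
  // they are not standalone composers, so they are rejected.
  if (/[\r\n]/.test(composerId)) return null;

  return composerId;
}
```

Returning null instead of a placeholder such as 'unknown' lets the caller drop the row outright, so it never shows up as a session in the orphan source.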

Caught by multi-agent and devil's advocate review, fixed before push

  • Sub-composer keys with embedded newlines — initial implementation split them on : and produced mangled session ids that landed in orphan. Now filtered explicitly. New test fixture for the real key shape.
  • daily-cache.ts version bump missed — bumped to 5. Without this, the 30-day dashboard would mix old cursor rows with new per-workspace rows for the historical window.
  • Unused basename import — removed.
  • Misleading variable name allowedComposers on the orphan branch (it was actually the disallowed set) — split into composerFilter + explicit filterMode: 'include' | 'exclude'.
  • Comment claim "# is illegal in POSIX paths" — softened. # is legal in POSIX file names; what matters is that state.vscdb paths don't contain it because we construct them ourselves.
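
The composerFilter + filterMode split can be pictured roughly as follows. The two field names come from the bullet above; the SourceFilter interface, the sourceYields helper and the sample ids are hypothetical.

```typescript
type FilterMode = 'include' | 'exclude';

// Hypothetical container for the two fields named in the review note.
interface SourceFilter {
  composerFilter: Set<string>;
  filterMode: FilterMode;
}

// A workspace source keeps composers that are in its set ('include');
// the orphan source keeps composers that are in no workspace ('exclude').
function sourceYields(filter: SourceFilter, composerId: string): boolean {
  const listed = filter.composerFilter.has(composerId);
  return filter.filterMode === 'include' ? listed : !listed;
}

// Hypothetical data: two composers mapped to a workspace, anything else is orphaned.
const mapped = new Set(['composer-a', 'composer-b']);
const workspaceSource: SourceFilter = { composerFilter: mapped, filterMode: 'include' };
const orphanSource: SourceFilter = { composerFilter: mapped, filterMode: 'exclude' };
```

With the mode spelled out, the same set can serve both branches, and the name no longer implies that its members are the ones being kept.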

Live verification

Against my real 1.9 GB Cursor DB (5,556 bubble rows, 12 workspace directories):

overview.calls = 1904  =  sum(projects.calls) = 642 + 372 + 516 + 331 + 42 + 1
overview.cost  = $4.08 =  sum(projects.cost)  = 1.4963 + 0.9149 + 0.8998 + 0.5105 + 0.2600 + 0.000063

Projects: ContentFlow (642 / $1.50), llm (372 / $0.91), autogen-team (516 / $0.90), cursor / orphan (331 / $0.51), Public (42 / $0.26), TweeterGPTTraining (1 / $0.0001).
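
The reconciliation can be re-checked mechanically. This sketch hard-codes the per-project figures quoted above; the ProjectRow type and variable names are illustrative only.

```typescript
// Per-project figures copied from the live verification above.
type ProjectRow = { name: string; calls: number; cost: number };

const projectRows: ProjectRow[] = [
  { name: 'ContentFlow', calls: 642, cost: 1.4963 },
  { name: 'llm', calls: 372, cost: 0.9149 },
  { name: 'autogen-team', calls: 516, cost: 0.8998 },
  { name: 'cursor', calls: 331, cost: 0.5105 }, // orphan bucket
  { name: 'Public', calls: 42, cost: 0.26 },
  { name: 'TweeterGPTTraining', calls: 1, cost: 0.000063 },
];

const totalCalls = projectRows.reduce((sum, p) => sum + p.calls, 0);
const totalCost = projectRows.reduce((sum, p) => sum + p.cost, 0);

// Compare against the overview figures, rounding cost to cents for display.
const callsReconcile = totalCalls === 1904;
const costReconcile = totalCost.toFixed(2) === '4.08';
```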

Tests

9 fixture-based tests in tests/providers/cursor-workspace-breakdown.test.ts:

  • one source per workspace plus orphan
  • routes calls to the right workspace
  • orphan source captures unmapped composers
  • totals across all sources equal legacy total
  • single cursor source when no workspaceStorage exists
  • multi-root workspace (no folder in workspace.json) skipped
  • vscode-remote URI sanitization
  • sub-composer keys (with embedded newline) dropped, not surfaced as sessions
  • legacy bare DB path still works

Full suite: 46 files, 654 tests passing.

Out of scope

  • The cursor-agent provider (separate provider) — uses different schema, not affected.
  • Workspace map cache invalidation in long-lived menubar processes — the menubar's own 30s refresh recreates the CLI subprocess so the map rebuilds. Adding a TTL within one process is a follow-up if anyone notices stale data.

Test plan

  • rm -f ~/.cache/codeburn/cursor-results.json && node dist/cli.js report --provider cursor -p all should produce one row per Cursor workspace plus a cursor row for the catch-all.
  • Totals (calls, cost) should reconcile exactly between the overview and the sum of per-project rows.
  • On a fresh install with no workspaceStorage, dashboard should look identical to before (one cursor row).

iamtoruk added 2 commits May 10, 2026 04:06
…#196)

Cursor's chat history showed as a single row labeled 'cursor' in
the dashboard because the global state.vscdb has no workspace
field on individual bubbles. The fix joins through Cursor's
per-workspace storage:

1. Walk ~/Library/Application Support/Cursor/User/workspaceStorage/*
2. For each hash dir, read workspace.json -> folder URI
3. Open that dir's state.vscdb, read
   ItemTable['composer.composerData'] -> allComposers list
4. Build Map<composerId, folder URI>
5. Emit one SessionSource per workspace plus a catch-all 'cursor'
   source for composers that did not register against any
   workspace (multi-root workspaces, no-folder-open windows,
   deleted workspaces with surviving global rows)

The parser decodes source.path's #cursor-ws= tag, filters the
parsed bubbles to the composerIds that belong to this workspace,
and yields only those. The orphan-tag source negates the filter so
it captures every composer not in any workspace.

In passing, fix a real bug in the old code: parseBubbles set
`sessionId: row.conversation_id ?? 'unknown'`, but the JSON
`conversationId` field is empty in current Cursor builds, so every
call shipped with `sessionId: 'unknown'`. We now derive the
composer id from the row key (`bubbleId:<composerId>:<bubbleUuid>`)
which is what the workspace map joins on. The old behavior masked
the bug because every call went into a single 'cursor' project
anyway; with per-workspace bucketing the bug becomes load-bearing.
Cache version bumped 2 -> 3 to invalidate caches that still record
'unknown' as the session id.

Live-tested against my real 1.9 GB Cursor DB: the single 'cursor'
row with 1904 calls / $4.08 now breaks into 5 workspaces plus an
orphan bucket, totals reconcile exactly. 8 fixture-based tests
cover multi-workspace routing, orphan filtering, legacy bare DB
path backwards compat, multi-root workspace skip, vscode-remote
URI slugification, and total reconciliation across all sources.

Full suite: 46 files, 653 tests passing.
@iamtoruk iamtoruk merged commit 810b214 into main May 10, 2026
3 checks passed
@iamtoruk iamtoruk deleted the feat/cursor-per-project-breakdown branch May 10, 2026 22:35
iamtoruk added a commit that referenced this pull request May 10, 2026
…n) (#297)

PR #296 (Cursor per-project breakdown) bumped DAILY_CACHE_VERSION
from 4 to 5 but left MIN_SUPPORTED_VERSION at 2. The migration
path (isMigratableCache + migrateDays) only fills in missing
default fields; it does NOT recompute the providers / categories
/ models rollups from session data, because raw sessions are not
retained in the cache. So a v4 cache migrated to v5 carried
forward its old per-day provider totals (single 'cursor' bucket)
for the full retention window.

Effect on users post-#296: the macOS menubar's
`current.providers.cursor` would show the orphan-bucket subtotal
instead of the full Cursor cost for any historical day whose
daily entry was computed before #296 landed. Live-test on my
machine showed cursor=$3.78 against a migrated v4 cache vs
cursor=$4.08 (correct) after the daily cache was discarded — the
$0.30 gap was the workspace projects whose costs were no longer
aggregated under the 'cursor' label by the new code.

Fix: raise MIN_SUPPORTED_VERSION to 5 so any cache with
version < DAILY_CACHE_VERSION is renamed to `.bak` and the cache
is recomputed from scratch on next run. The recompute is the same
operation that backfills the cache for a new user, so the cost is
a one-time cold-path hit (~3s on the test machine).

Test for the migration case updated to assert the new
discard-and-bak behavior. Full suite: 46 files / 654 tests pass.
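
A rough sketch of the version gate described in this commit. DAILY_CACHE_VERSION and MIN_SUPPORTED_VERSION are the constants named above; decideCacheAction, the CacheAction type and the handling of versions newer than the current one are assumptions made for illustration.

```typescript
const DAILY_CACHE_VERSION = 5;
const MIN_SUPPORTED_VERSION = 5; // was 2 before this commit

type CacheAction = 'use' | 'migrate' | 'discard-to-bak';

// Hypothetical helper: decide what to do with an on-disk cache of a given version.
function decideCacheAction(
  version: number,
  minSupported: number = MIN_SUPPORTED_VERSION,
  current: number = DAILY_CACHE_VERSION,
): CacheAction {
  if (version === current) return 'use';
  // Migration only fills in missing default fields; it cannot recompute rollups.
  if (version >= minSupported && version < current) return 'migrate';
  // Too old (or, by assumption, unknown/newer): rename to .bak and rebuild.
  return 'discard-to-bak';
}
```

With the old floor of 2, a v4 cache falls into the migrate branch and keeps its stale per-day provider totals; with the floor raised to 5, the migrate branch is unreachable and every older cache is rebuilt.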