Add assistant turn stats footer #2601
Show compact assistant turn stats with model, elapsed time, token count, throughput, TTFT, and tool-call count when those fields are available. Derive Codex timing from response boundaries so completed throughput does not use tiny final delta gaps as the denominator. Closes pingdotgg#2518
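To illustrate why the choice of denominator matters, here is a minimal sketch of a throughput calculation. The helper name `tokensPerSecond` is hypothetical and is not taken from this PR. The 16-token, 240 ms figures are the ones reported in the live smoke run, and the 8 ms gap is an assumed value standing in for a "tiny final delta gap".

```typescript
// Hypothetical helper for illustration only; not the PR's actual implementation.
// Returns output tokens per second, or undefined when the inputs cannot
// produce a meaningful rate.
function tokensPerSecond(
  outputTokens: number,
  generationDurationMs: number,
): number | undefined {
  if (!Number.isFinite(outputTokens) || !Number.isFinite(generationDurationMs)) {
    return undefined;
  }
  if (outputTokens <= 0 || generationDurationMs <= 0) {
    return undefined;
  }
  return (outputTokens * 1000) / generationDurationMs;
}

// Denominator taken from response boundaries (240 ms): roughly 66.7 tok/sec.
const overResponseBoundaries = tokensPerSecond(16, 240);

// Denominator taken from an assumed 8 ms final delta gap: 2000 tok/sec,
// which badly overstates the real generation speed.
const overFinalDeltaGap = tokensPerSecond(16, 8);
```

The same token count yields very different rates depending on the denominator, which is why the description derives timing from response boundaries.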
Approvability
Verdict: Needs human review
This PR introduces a new user-facing feature (assistant turn stats footer) with new UI components, server-side timing state management, and non-trivial derivation logic. New features warrant human review, and there's also an unresolved comment about potential TTFT overwriting in the fallback path.
juliusmarminge left a comment
You aren't counting TPS here; you're taking the full turn duration / turn token count, which doesn't take tool calls into account, so the metric is stupid? Forgive me if I misunderstand.
Close Codex assistant response timing segments when tool work starts so fallback generation duration does not include command or file-change output time. Add regression coverage for a long command gap and clarify the TPS tooltip. Refs: pingdotgg#2518
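The segment-closing idea in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in ProviderRuntimeIngestion.ts: the class `AssistantTimingTracker`, its method names, and the choice to close a segment at the tool-start timestamp are all hypothetical.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the PR's code.
interface ResponseSegment {
  startedAtMs: number;
  endedAtMs?: number; // undefined while the segment is still open
}

class AssistantTimingTracker {
  private readonly segments: ResponseSegment[] = [];
  private readonly turnStartedAtMs: number;

  constructor(turnStartedAtMs: number) {
    this.turnStartedAtMs = turnStartedAtMs;
  }

  private openSegment(): ResponseSegment | undefined {
    const last = this.segments[this.segments.length - 1];
    return last !== undefined && last.endedAtMs === undefined ? last : undefined;
  }

  // Assistant text arrived: open a new segment unless one is already open.
  onAssistantDelta(nowMs: number): void {
    if (this.openSegment() === undefined) {
      this.segments.push({ startedAtMs: nowMs });
    }
  }

  // Tool work (command, file change) started: close the open segment so the
  // tool's wall time is not counted as generation time.
  onToolWorkStarted(nowMs: number): void {
    const open = this.openSegment();
    if (open !== undefined) {
      open.endedAtMs = nowMs;
    }
  }

  // Sum closed segments, plus any open segment measured up to nowMs.
  snapshot(nowMs: number): { durationMs?: number; timeToFirstTokenMs?: number } {
    const first = this.segments[0];
    if (first === undefined) {
      return {};
    }
    const durationMs = this.segments.reduce(
      (total, s) => total + Math.max(0, (s.endedAtMs ?? nowMs) - s.startedAtMs),
      0,
    );
    const timeToFirstTokenMs = Math.max(0, first.startedAtMs - this.turnStartedAtMs);
    return { durationMs, timeToFirstTokenMs };
  }
}
```

With this shape, a turn where assistant text starts at 1000 ms, a command starts at 1300 ms, and token usage is emitted a minute later reports a 300 ms generation duration instead of one that includes the command's run time.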
Good catch. The UI was not directly dividing by the whole turn elapsed time, but the Codex fallback timing could still include tool execution wall time in the assistant generation duration. I pushed I also added a regression test for the specific failure mode: assistant text starts, a long command runs, then token usage is emitted. Before the fix, that case reported Verified locally:
The live smoke test with three separate commands and a longer report response showed:
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 89169c2.
    ...(assistantTimeToFirstTokenMs !== undefined
      ? { timeToFirstTokenMs: assistantTimeToFirstTokenMs }
      : {}),
  };
Derived TTFT overwrites provider-supplied TTFT when duration missing
Low Severity
buildContextWindowActivityPayload computes providerTimeToFirstTokenMs from the usage payload but never guards against overwriting it in the fallback branch. When a provider supplies timeToFirstTokenMs without durationMs, the final spread { ...usage, ...(assistantTimeToFirstTokenMs !== undefined ? { timeToFirstTokenMs: assistantTimeToFirstTokenMs } : {}) } overwrites the provider's value with the derived one. The first branch correctly preserves provider values, but the else-branch does not check providerTimeToFirstTokenMs before merging.
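One possible shape for the guard described in this comment is sketched below. It is a standalone illustration, not a patch against buildContextWindowActivityPayload: the function `mergeDerivedTiming` and both interfaces are hypothetical, and the real payload likely carries more fields.

```typescript
// Hypothetical types and helper; assumptions for illustration only.
interface UsagePayload {
  durationMs?: number;
  timeToFirstTokenMs?: number;
  lastOutputTokens?: number;
}

interface DerivedTiming {
  durationMs?: number;
  timeToFirstTokenMs?: number;
}

// Merge derived timing into the usage payload field by field, only where the
// provider did not supply a value. Provider-supplied values always win.
function mergeDerivedTiming(usage: UsagePayload, derived: DerivedTiming): UsagePayload {
  return {
    ...usage,
    ...(usage.durationMs === undefined && derived.durationMs !== undefined
      ? { durationMs: derived.durationMs }
      : {}),
    ...(usage.timeToFirstTokenMs === undefined && derived.timeToFirstTokenMs !== undefined
      ? { timeToFirstTokenMs: derived.timeToFirstTokenMs }
      : {}),
  };
}
```

Checking each field independently covers the reported case: a provider that sends timeToFirstTokenMs without durationMs keeps its own TTFT and still receives the derived duration.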




Summary
Closes #2518.
Verification
- cd apps/server && bun run test -- src/orchestration/Layers/ProviderRuntimeIngestion.test.ts
- cd apps/server && bun run test -- src/orchestration/Layers/ProviderRuntimeIngestion.test.ts src/orchestration/Layers/ProjectionPipeline.test.ts
- cd apps/web && bun run test -- src/lib/turnStats.test.ts src/lib/contextWindow.test.ts
- bun fmt
- bun lint (passes with existing warnings only)
- bun typecheck
- bun run test
- git diff --check
Live smoke
- http://localhost:5733/
- [data-assistant-turn-stats="true"] footer rendered with 16 tokens, 66.7 tok/sec, and Time-to-first: 7.2 sec
- durationMs: 240, timeToFirstTokenMs: 7216, and lastOutputTokens: 16, so completed TPS is not using the old sub-10ms final-delta denominator
Note
Medium Risk
Touches provider runtime ingestion, event schema, and thread projection state, which could affect persisted turn pointers and token-usage activity data if the derived timing logic or new turnId plumbing is incorrect. UI changes are low risk but depend on the new backend-projected metrics.
Overview
Adds an assistant turn stats footer to assistant messages in the chat timeline, showing model/effort, elapsed time, output tokens, throughput, TTFT, and tool-call count only when those metrics are available.
On the backend, token-usage activities now support timeToFirstTokenMs, and Codex token-usage events carry a turnId/providerTurnId. When provider durations are missing, ingestion derives assistant-only generation duration and TTFT from Codex assistant stream boundaries while excluding tool/approval/user-input gaps, and persists these values so the footer survives reloads.
Also fixes thread projection updates to preserve latestTurnId when a thread.session-set clears activeTurnId, and updates proposed-plan exports to prefix generated filenames with plan- (without double-prefixing).
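The two smaller fixes in the overview can be sketched as pure functions. Both `applySessionSet` and `withPlanPrefix`, and the `ThreadTurnPointers` shape, are hypothetical names chosen for illustration; the actual projection and export code may be organized differently.

```typescript
// Hypothetical shapes and helpers; assumptions for illustration only.
interface ThreadTurnPointers {
  activeTurnId: string | null;
  latestTurnId: string | null;
}

// Apply a session-set event: activeTurnId follows the event, but a null
// activeTurnId does not erase the previously recorded latestTurnId.
function applySessionSet(
  current: ThreadTurnPointers,
  incomingActiveTurnId: string | null,
): ThreadTurnPointers {
  return {
    activeTurnId: incomingActiveTurnId,
    latestTurnId: incomingActiveTurnId ?? current.latestTurnId,
  };
}

// Prefix a generated plan filename with "plan-" unless it already has it.
function withPlanPrefix(filename: string): string {
  return filename.startsWith("plan-") ? filename : `plan-${filename}`;
}
```

Keeping latestTurnId stable matters for the footer because persisted stats are looked up by turn after the session goes idle.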
Note
Add assistant turn stats footer to chat messages with duration, throughput, and tool call metrics
- Adds a TurnStatsFooter component that renders a compact stats row (model, elapsed time, tokens, throughput, TTFT, tool calls) beneath completed assistant messages in the chat timeline.
- Adds turnStats.ts with utilities to derive, format, and assemble these metrics from thread activities and context-window snapshots, including fallback logic for unassigned snapshots and tool call counting.
- Extends ProviderRuntimeIngestion.ts to track assistant response segment timing per turn, deriving durationMs and timeToFirstTokenMs and merging them into context-window.updated activities when provider data is absent.
- Adds timeToFirstTokenMs to ThreadTokenUsageSnapshot and propagates turnId/providerTurnId through thread.token-usage.updated events from the Codex adapter.
- Fixes the thread.session-set projection to retain latestTurnId when the incoming event has a null activeTurnId.
Macroscope summarized 89169c2.
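As a rough picture of how a footer row can be assembled only from the metrics that are present, consider the sketch below. The `TurnStats` shape, `buildFooterParts`, and the "tool calls" label are assumptions for illustration and are not copied from turnStats.ts; the token, throughput, and time-to-first labels follow the strings quoted in the live smoke notes.

```typescript
// Hypothetical shape and helpers; assumptions for illustration only.
interface TurnStats {
  model?: string;
  elapsedMs?: number;
  outputTokens?: number;
  tokensPerSecond?: number;
  timeToFirstTokenMs?: number;
  toolCallCount?: number;
}

// Format a millisecond value as seconds with one decimal place.
function formatSeconds(ms: number): string {
  return `${(ms / 1000).toFixed(1)} sec`;
}

// Build the footer as a list of labels, skipping any metric that is missing.
function buildFooterParts(stats: TurnStats): string[] {
  const parts: string[] = [];
  if (stats.model !== undefined) parts.push(stats.model);
  if (stats.elapsedMs !== undefined) parts.push(formatSeconds(stats.elapsedMs));
  if (stats.outputTokens !== undefined) parts.push(`${stats.outputTokens} tokens`);
  if (stats.tokensPerSecond !== undefined) {
    parts.push(`${stats.tokensPerSecond.toFixed(1)} tok/sec`);
  }
  if (stats.timeToFirstTokenMs !== undefined) {
    parts.push(`Time-to-first: ${formatSeconds(stats.timeToFirstTokenMs)}`);
  }
  if (stats.toolCallCount !== undefined && stats.toolCallCount > 0) {
    parts.push(`${stats.toolCallCount} tool calls`);
  }
  return parts;
}
```

An empty result would let the caller skip rendering the footer entirely, which matches the "only when those metrics are available" behavior described in this PR.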