[bug] Extended context may not be used with Apple Intelligence + nano models; surface context usage in Performance window #486

@jkrauska

Description

Summary

When generating suggestions with the Apple Intelligence model and the nano model, the extended text context we build for the request does not appear to influence the completion. Suggestions behave as if only a small window of context is considered, which points to context exhaustion: either the prompt is being truncated, or the model's context window is smaller than what we feed it.

Observed behavior

  • With Apple Intelligence and the nano model selected, completions seem to ignore text that lives further back in the focused field.
  • The symptom is consistent enough to suspect the extended context is being silently dropped or truncated rather than consumed by the model.

Hypothesis

  • We may be assembling a prompt larger than the model's effective context window, so the head (or tail) of the extended context gets truncated before/at generation time.
  • This could be model-specific (smaller context window on nano / Apple Intelligence on-device path) versus the llama.cpp path.
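
One way to test this hypothesis before any UI work is to compare an estimated prompt size against the model's window. The sketch below is illustrative only: `ContextFit`, `estimateTokens`, and `checkFit` are hypothetical names that do not exist in Cotabby, and the 4-characters-per-token ratio is an assumed heuristic, not a real tokenizer. The context limit is passed in by the caller because this issue does not state the actual window size of either backend.

```swift
/// Hypothetical result of comparing a prompt against a context budget.
enum ContextFit: Equatable {
    case fits(estimatedTokens: Int)
    case exceeds(estimatedTokens: Int, overBy: Int)
}

/// Rough token estimate, rounded up.
/// ASSUMPTION: about 4 characters per token; a real tokenizer will differ.
func estimateTokens(_ text: String, charsPerToken: Int = 4) -> Int {
    precondition(charsPerToken > 0, "charsPerToken must be positive")
    return (text.count + charsPerToken - 1) / charsPerToken
}

/// Checks whether the assembled prompt fits within the model's window,
/// optionally reserving part of the window for the generated completion.
func checkFit(prompt: String, contextLimit: Int, reservedForOutput: Int = 0) -> ContextFit {
    let tokens = estimateTokens(prompt)
    let budget = max(0, contextLimit - reservedForOutput)
    if tokens <= budget {
        return .fits(estimatedTokens: tokens)
    } else {
        return .exceeds(estimatedTokens: tokens, overBy: tokens - budget)
    }
}
```

Logging the `.exceeds` case for Apple Intelligence + nano requests would show whether the "ignored" context coincides with over-budget prompts. It would not show which end of the prompt the backend drops.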

Proposed outcome

Make context usage visible and diagnosable so we can confirm whether (and when) we're hitting the limit:

  1. Surface context/token usage in the Performance window (Cotabby/UI/Settings/Panes/PerformancePaneView.swift). Today each PerformanceMetricEntry (Cotabby/Models/PerformanceMetricsStore.swift) records only modelName, latencyMs, and isInitialLoad. Add fields for prompt/context size (tokens or chars) and the model's context limit.
  2. Add a clear indicator when context is exhausted or truncated, e.g. a flag or column in the Performance table (similar to the existing * initial-load marker) that fires when the assembled prompt exceeds the model's window and gets clipped.
  3. Plumb the relevant numbers from the generation path (Apple Intelligence engine + llama runtime / SuggestionEngineRouter) into the metrics store so both backends report context usage consistently.
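
As a starting point for discussion, the extended metric record from items 1 and 2 might look roughly like the sketch below. It is not the real `PerformanceMetricEntry`: the struct name, the field types, and the three new fields (`promptTokens`, `contextLimitTokens`, `wasTruncated`) are all assumptions. The new numeric fields are optional so that a backend that cannot report a value can still record latency.

```swift
/// Hypothetical sketch of an extended per-request metric record.
struct PerformanceMetricEntrySketch {
    // Fields the issue says are recorded today (types are assumed).
    let modelName: String
    let latencyMs: Double
    let isInitialLoad: Bool

    // Proposed additions (names are illustrative, not existing API).
    let promptTokens: Int?        // size of the assembled prompt
    let contextLimitTokens: Int?  // the model's context window, if known
    let wasTruncated: Bool        // set by the backend when it clipped the prompt

    /// Fraction of the window used, or nil when either number is unknown.
    var contextUtilization: Double? {
        guard let used = promptTokens, let limit = contextLimitTokens, limit > 0 else {
            return nil
        }
        return Double(used) / Double(limit)
    }

    /// True when the prompt is known to be larger than the window.
    var exceedsLimit: Bool {
        guard let used = promptTokens, let limit = contextLimitTokens else {
            return false
        }
        return used > limit
    }
}
```

Keeping `wasTruncated` separate from `exceedsLimit` is deliberate. One is what the backend reports and the other is what the numbers imply, and a mismatch between them would itself be a useful diagnostic.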

Acceptance criteria

  • The Performance pane shows, per request, how much context was sent vs. the model's limit.
  • There is an unambiguous visual signal when a request's context was truncated/exceeded.
  • We can use this to confirm whether the Apple Intelligence + nano "extended context ignored" behavior is caused by context exhaustion.
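
To make the first two criteria concrete, a table cell could render "sent vs. limit" with a truncation marker, roughly as sketched below. `contextCell` is a hypothetical helper, and the `!` marker and the `n/a` placeholder are illustrative choices, picked only so the marker does not collide with the existing `*` initial-load marker.

```swift
/// Hypothetical formatter for a "context" column in the Performance table.
/// Returns e.g. "2048 / 4096 (50%)", with " !" appended when truncated.
func contextCell(used: Int?, limit: Int?, truncated: Bool) -> String {
    // Without both numbers there is nothing meaningful to show.
    guard let used = used, let limit = limit, limit > 0 else {
        return truncated ? "n/a !" : "n/a"
    }
    let percent = used * 100 / limit  // integer percent, rounded down
    let base = "\(used) / \(limit) (\(percent)%)"
    return truncated ? base + " !" : base
}
```

A request that sent 5000 tokens against a 4096-token window and was clipped would render as `5000 / 4096 (122%) !`, which makes the over-limit case visible at a glance.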

Notes

Filed from local investigation. Reproduced informally with Apple Intelligence + nano model selected.
