[bug] Extended context may not be used with Apple Intelligence + nano models; surface context usage in Performance window #486

## Summary

When generating suggestions with the **Apple Intelligence** and **nano** models, the *extended* text context we build for the request does not appear to influence the completion. Suggestions behave as if only a small window of context is considered, which suggests we may be **running out of context space**: either the prompt is being truncated, or the model's context window is smaller than what we feed it.

## Observed behavior

- With Apple Intelligence and the nano model selected, completions seem to ignore text that lives further back in the focused field.
- The symptom is consistent enough to suspect that the extended context is being silently dropped or truncated rather than consumed by the model.

## Hypothesis

- We may be assembling a prompt larger than the model's effective context window, so the head (or tail) of the extended context gets truncated before or at generation time.
- This could be model-specific: the nano / Apple Intelligence on-device path may have a smaller context window than the llama.cpp path.
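
To test this hypothesis before any UI work, a rough budget check could be logged next to each request. The sketch below is illustrative only: `ContextFit`, `estimateTokens`, and `checkFit` are hypothetical names that do not exist in the codebase, the 4-characters-per-token ratio is a crude assumption that varies by tokenizer and language, and the context limit is passed in by the caller rather than assumed for any particular model.

```swift
/// Hypothetical result type: how an assembled prompt compares to a model's window.
struct ContextFit {
    let estimatedPromptTokens: Int
    let contextLimitTokens: Int

    /// Number of estimated tokens that do not fit (0 when the prompt fits).
    var overflowTokens: Int { max(0, estimatedPromptTokens - contextLimitTokens) }
    var isOverBudget: Bool { overflowTokens > 0 }
}

/// Rough estimate only. Assumes a fixed characters-per-token ratio and rounds up;
/// a real tokenizer count should replace this wherever one is available.
func estimateTokens(_ text: String, charsPerToken: Int = 4) -> Int {
    precondition(charsPerToken > 0, "charsPerToken must be positive")
    return (text.count + charsPerToken - 1) / charsPerToken
}

/// Compares the estimated prompt size (plus tokens reserved for the completion)
/// against a caller-supplied context limit.
func checkFit(prompt: String, contextLimitTokens: Int, reservedForOutput: Int = 0) -> ContextFit {
    ContextFit(
        estimatedPromptTokens: estimateTokens(prompt) + reservedForOutput,
        contextLimitTokens: contextLimitTokens
    )
}
```

If `isOverBudget` is regularly true on the Apple Intelligence / nano path and not on the llama.cpp path, that would support the model-specific explanation.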

## Proposed outcome

Make context usage **visible and diagnosable** so we can confirm whether (and when) we're hitting the limit:

1. Surface context/token usage in the **Performance** window (`Cotabby/UI/Settings/Panes/PerformancePaneView.swift`). Today each `PerformanceMetricEntry` (`Cotabby/Models/PerformanceMetricsStore.swift`) records only `modelName`, `latencyMs`, and `isInitialLoad`. Add fields for prompt/context size (tokens or chars) and the model's context limit.
2. Add a clear **indicator when context is exhausted or truncated**, e.g. a flag or column in the Performance table (similar to the existing `*` initial-load marker) that fires when the assembled prompt exceeds the model's window and gets clipped.
3. Plumb the relevant numbers from the generation path (Apple Intelligence engine + llama runtime / `SuggestionEngineRouter`) into the metrics store so both backends report context usage consistently.
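
One possible shape for the extended metric entry is sketched below. It is not the actual `PerformanceMetricEntry`: the type name `PerformanceMetricEntrySketch`, the Swift types of the three existing fields, and the new field names (`promptTokens`, `contextLimitTokens`, `wasTruncated`) are all assumptions made for illustration. The size fields are optional so that a backend that cannot report them can leave them unset instead of reporting a misleading zero.

```swift
/// Illustrative sketch only; the real PerformanceMetricEntry may differ.
struct PerformanceMetricEntrySketch {
    // Fields the issue says are recorded today (types assumed).
    let modelName: String
    let latencyMs: Double
    let isInitialLoad: Bool

    // Proposed additions (hypothetical names).
    let promptTokens: Int?        // size of the assembled prompt, if the backend can report it
    let contextLimitTokens: Int?  // the model's context window, if known
    let wasTruncated: Bool        // true when the prompt was clipped to fit

    /// Fraction of the context window used, or nil when either number is unknown.
    var contextUtilization: Double? {
        guard let used = promptTokens, let limit = contextLimitTokens, limit > 0 else {
            return nil
        }
        return Double(used) / Double(limit)
    }
}
```

Both backends would fill the same fields, so the Performance pane would not need to know which engine produced an entry.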

### Acceptance criteria

- The Performance pane shows, per request, how much context was sent vs. the model's limit.
- There is an unambiguous visual signal when a request's context was truncated/exceeded.
- We can use this to confirm whether the Apple Intelligence + nano "extended context ignored" behavior is caused by context exhaustion.
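
As an illustration of the first two criteria, a table cell could be rendered along these lines. `contextCell` is a hypothetical helper, and the `!` marker and `n/a` fallback are placeholder choices, not an agreed design.

```swift
/// Hypothetical formatter for a "Context" column in the Performance table.
/// Returns "n/a" when the backend did not report usable numbers.
func contextCell(promptTokens: Int?, contextLimitTokens: Int?, wasTruncated: Bool) -> String {
    guard let used = promptTokens, let limit = contextLimitTokens, limit > 0 else {
        return "n/a"
    }
    // Percentage of the window used, rounded to the nearest whole number.
    let percent = Int((Double(used) / Double(limit) * 100).rounded())
    // Placeholder truncation marker, analogous to the existing `*` initial-load marker.
    let marker = wasTruncated ? " !" : ""
    return "\(used) / \(limit) (\(percent)%)\(marker)"
}
```

For example, `contextCell(promptTokens: 2048, contextLimitTokens: 4096, wasTruncated: false)` yields `2048 / 4096 (50%)`, and a clipped request gets the trailing `!`.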

### Notes

Filed from local investigation. Reproduced informally with Apple Intelligence + nano model selected.
