[Feature] Hardware-aware model recommendation + optional model performance tracking

## Outcome
A user picking a local model can do it with confidence: Cotabby inspects their actual Mac (chip, total **and free** RAM, GPU/Metal budget), recommends the best-fit model, clearly warns when a model won't fit, and — optionally — measures real generation speed so the user can see the concrete performance impact of changing models. All local, all offline.

## Problem / Motivation
- Model selection today is raw Hugging Face browsing (`HuggingFaceSearchService`) with **no guidance on what fits or runs well**. A user can pick a model too large for their RAM (swap thrash / OOM / crawl) or needlessly small (leaving quality on the table).
- After switching models there's **no feedback** on whether it got faster or slower.

## Proposed Solution

### Part A — Hardware-aware recommendation
**Detection** (extend `Support/DeviceInfo.swift`, still side-effect-free):
- Chip class (Apple Silicon vs Intel), P/E core counts (`hw.perflevel0.physicalcpu`, `hw.perflevel1.physicalcpu`), GPU core count where available.
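- **Free/available RAM**: `host_statistics64` with the `HOST_VM_INFO64` flavor, which fills a `vm_statistics64` struct; available bytes ≈ (free + inactive + purgeable pages) × page size. Also read the memory-pressure state, and account for unified memory on Apple Silicon, where the CPU and GPU draw from the same pool.
- **Metal budget**: `MTLDevice.recommendedMaxWorkingSetSize`, Metal's recommended upper bound on GPU-resident memory, used here as the practical ceiling for the llama Metal backend.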
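
The detection bullets above could be sketched roughly as follows. This is only an illustration, not the existing `DeviceInfo` code: `HostSnapshot`, `sysctlInt`, and `readHostSnapshot` are hypothetical names, and memory-pressure state and GPU core count are left out.

```swift
import Foundation
import Darwin
import Metal

/// Hypothetical value type; field names are illustrative, not from DeviceInfo.swift.
struct HostSnapshot {
    let performanceCores: Int?   // nil on Intel, where hw.perflevel* keys are absent
    let efficiencyCores: Int?
    let totalRAMBytes: UInt64
    let availableRAMBytes: UInt64
    let metalBudgetBytes: UInt64? // nil if no Metal device is available
}

/// Reads a 32-bit integer sysctl value; returns nil if the key does not exist.
func sysctlInt(_ name: String) -> Int? {
    var value: Int32 = 0
    var size = MemoryLayout<Int32>.size
    guard sysctlbyname(name, &value, &size, nil, 0) == 0 else { return nil }
    return Int(value)
}

/// Read-only probe: queries the kernel and Metal, mutates nothing.
func readHostSnapshot() -> HostSnapshot? {
    var stats = vm_statistics64_data_t()
    var count = mach_msg_type_number_t(
        MemoryLayout<vm_statistics64_data_t>.size / MemoryLayout<integer_t>.size)
    let host = mach_host_self()

    let kr = withUnsafeMutablePointer(to: &stats) { ptr in
        ptr.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            host_statistics64(host, HOST_VM_INFO64, $0, &count)
        }
    }
    guard kr == KERN_SUCCESS else { return nil }

    var pageSize: vm_size_t = 0
    guard host_page_size(host, &pageSize) == KERN_SUCCESS else { return nil }

    // (free + inactive + purgeable pages) × page size
    let pages = UInt64(stats.free_count)
        + UInt64(stats.inactive_count)
        + UInt64(stats.purgeable_count)

    return HostSnapshot(
        performanceCores: sysctlInt("hw.perflevel0.physicalcpu"),
        efficiencyCores: sysctlInt("hw.perflevel1.physicalcpu"),
        totalRAMBytes: ProcessInfo.processInfo.physicalMemory,
        availableRAMBytes: pages * UInt64(pageSize),
        metalBudgetBytes: MTLCreateSystemDefaultDevice()?.recommendedMaxWorkingSetSize
    )
}
```

Returning optionals for the perflevel keys and the Metal budget would let the fit logic fall back to RAM-only checks on Intel Macs or headless hosts.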

**Fit logic** (new pure `Support/ModelFitEvaluator.swift` + `Models/CuratedModelCatalog.swift`):
- Each catalog entry carries an estimated footprint (params × quant bytes/param + KV-cache for the configured context).
- Classify per model for *this* host: **Recommended / Will run (tight) / Too large**, against free RAM and Metal budget with a headroom margin.
- Default recommendation = largest model that comfortably fits and clears a quality floor.
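
A minimal sketch of the fit logic above, under stated assumptions. All type and function names (`CatalogEntry`, `HostBudget`, `classify`, `defaultRecommendation`), the fp16 KV-cache estimate, the 0.8 headroom factor, and the `qualityScore` field are hypothetical; the real footprint model is still an open question.

```swift
import Foundation

enum ModelFit: Equatable { case recommended, tight, tooLarge }

/// Hypothetical catalog entry; architecture fields are needed for the KV-cache term.
struct CatalogEntry {
    let id: String
    let parameterCount: Double
    let bytesPerParameter: Double  // e.g. about 0.5 for a 4-bit quant (ignores block overhead)
    let layers: Int
    let kvHeads: Int
    let headDim: Int
    let qualityScore: Double       // assumed ranking signal for the quality floor

    /// params × quant bytes/param + KV-cache for the configured context.
    func estimatedBytes(contextLength: Int) -> Double {
        let weights = parameterCount * bytesPerParameter
        // K and V tensors per layer, assuming 2-byte (fp16) cache entries.
        let kvCache = 2.0 * Double(layers) * Double(kvHeads) * Double(headDim)
            * Double(contextLength) * 2.0
        return weights + kvCache
    }
}

struct HostBudget {
    let freeRAMBytes: Double
    let metalBudgetBytes: Double
}

/// Classifies one model against the tighter of free RAM and the Metal budget.
func classify(_ entry: CatalogEntry, host: HostBudget,
              contextLength: Int, headroom: Double = 0.8) -> ModelFit {
    let need = entry.estimatedBytes(contextLength: contextLength)
    let limit = min(host.freeRAMBytes, host.metalBudgetBytes)
    if need <= limit * headroom { return .recommended }
    if need <= limit { return .tight }
    return .tooLarge
}

/// Largest (by parameter count) model that fits with headroom and clears the quality floor.
func defaultRecommendation(in catalog: [CatalogEntry], host: HostBudget,
                           contextLength: Int, qualityFloor: Double) -> CatalogEntry? {
    catalog
        .filter { $0.qualityScore >= qualityFloor }
        .filter { classify($0, host: host, contextLength: contextLength) == .recommended }
        .max { $0.parameterCount < $1.parameterCount }
}
```

Keeping the evaluator a pure function of (catalog entry, host budget, context length) means it can be unit-tested with synthetic budgets, without touching real hardware.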

**UI** (`EngineAndModelPaneView`):
- Detected-hardware summary (chip, total + **free** RAM, GPU).
- Per-model badge: ✅ Recommended / ⚠️ Tight / ⛔ Won't fit, with a live free-RAM readout.
- Confirm/warn before downloading or loading an over-budget model.

### Part B — Model performance tracking (opt-in)
Builds on existing infra: `LLMIOFileHandler` already writes `latency_ms` + `request_id` to `llm-io.jsonl`, and `LlamaRuntimeCore` owns generation.
- **Per-generation metrics:** time-to-first-token (TTFT), total latency, decode tokens/sec, prompt token count, eval count, engine + model id + quant + context length.
- **Aggregation** (new `Services/Metrics/ModelPerformanceTracker.swift`, opt-in, local-only): rolling per-model median/p90 TTFT and tok/s + sample count; persisted locally (UserDefaults rolling aggregate now, or a future encrypted store).
- **UI:** a "Performance" section showing the current model's median TTFT & tok/s and a comparison vs the previously used model (e.g. *"Gemma 3 4B: 180 ms TTFT, 28 tok/s — 2.1× slower than Llama-3.2 1B"*).
- **Privacy:** strictly local, opt-in, wipeable, never networked.
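
The per-generation metrics and rolling aggregation could look something like the sketch below. It is in-memory only and makes several assumptions: `GenerationSample`, `PerformanceSummary`, the nearest-rank percentile, the 50-sample window, and the choice to compute decode tok/s as eval count over (total latency − TTFT) are all illustrative, and persistence and the opt-in gate are omitted.

```swift
import Foundation

/// Hypothetical per-generation record; a subset of the metrics listed above.
struct GenerationSample {
    let modelID: String
    let ttftMs: Double
    let totalLatencyMs: Double
    let evalCount: Int

    /// Decode throughput, assuming decode time = total latency − TTFT.
    var decodeTokensPerSecond: Double {
        let decodeMs = totalLatencyMs - ttftMs
        return decodeMs > 0 ? Double(evalCount) / (decodeMs / 1000) : 0
    }
}

struct PerformanceSummary: Equatable {
    let medianTTFTMs: Double
    let p90TTFTMs: Double
    let medianTokensPerSecond: Double
    let sampleCount: Int
}

/// Nearest-rank percentile; p is in 0...1. Returns nil for an empty input.
func percentile(_ values: [Double], _ p: Double) -> Double? {
    guard !values.isEmpty else { return nil }
    let sorted = values.sorted()
    let rank = Int((p * Double(sorted.count)).rounded(.up))
    return sorted[min(max(rank, 1), sorted.count) - 1]
}

/// Keeps the most recent `windowSize` samples per model.
struct ModelPerformanceTracker {
    let windowSize: Int
    private var samples: [String: [GenerationSample]] = [:]

    init(windowSize: Int = 50) { self.windowSize = windowSize }

    mutating func record(_ sample: GenerationSample) {
        var window = samples[sample.modelID, default: []]
        window.append(sample)
        if window.count > windowSize {
            window.removeFirst(window.count - windowSize)
        }
        samples[sample.modelID] = window
    }

    func summary(for modelID: String) -> PerformanceSummary? {
        guard let window = samples[modelID], !window.isEmpty else { return nil }
        let ttfts = window.map(\.ttftMs)
        let speeds = window.map(\.decodeTokensPerSecond)
        guard let medianTTFT = percentile(ttfts, 0.5),
              let p90TTFT = percentile(ttfts, 0.9),
              let medianSpeed = percentile(speeds, 0.5) else { return nil }
        return PerformanceSummary(medianTTFTMs: medianTTFT, p90TTFTMs: p90TTFT,
                                  medianTokensPerSecond: medianSpeed,
                                  sampleCount: window.count)
    }

    /// Supports the "wipeable" requirement.
    mutating func clear() { samples.removeAll() }
}
```

A comparison against the previously used model would then be a ratio of two summaries' medians, shown only when both have enough samples to be meaningful.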

## Acceptance Criteria
- [ ] `DeviceInfo` reports free + total RAM, chip class, core/GPU counts, and Metal budget.
- [ ] Model pane shows fit badges + live free RAM and warns/blocks on over-budget load.
- [ ] With tracking ON, after a few generations the pane shows TTFT and tok/s, and switching models shows a side-by-side comparison.
- [ ] Works fully offline; no new network calls; tracking is opt-in and clearable.

## Out of Scope
- Auto-downloading the recommended model without consent.
- Any cloud benchmark sharing / telemetry.

## Open Questions
- Footprint estimation: static per-model table vs measured-after-load calibration?
- Aggregate storage: UserDefaults vs a future encrypted local store?
- Should the recommendation factor thermal state / battery (on battery → bias smaller)?
