Summary
Add an MLX-based inference backend as an alternative to the existing llama.cpp path, targeting Apple Silicon Macs.
Problem
The current local inference path uses llama.cpp, which runs on the CPU with only partial Metal acceleration through GGML. MLX is Apple's own machine learning framework, optimized for the unified memory architecture of Apple Silicon. On M-series chips, MLX can deliver significantly better throughput and lower latency than llama.cpp for the same model because it is designed from the ground up for this hardware.
Users with Apple Silicon Macs would get faster completions and lower energy draw from local inference without switching to the Apple Intelligence engine.
Proposed direction
- Add a new SuggestionEngineKind case (e.g. .llamaMLX or .mlx) alongside the existing .llamaOpenSource and .appleIntelligence cases.
- Implement an MLXSuggestionEngine conforming to the existing SuggestionEngineProtocol contract in SuggestionSubsystemContracts.swift.
- Route through SuggestionEngineRouter the same way the llama path does today.
- Support GGUF or MLX-native quantized weights (e.g. mlx-community models on Hugging Face). Reuse the existing ModelDownloadManager / BundledRuntimeLocator where possible.
- Gate the engine option on Apple Silicon availability so it never appears on Intel Macs.
- Surface the new engine in the Engine picker in both the menu bar popup and Settings > Engine & Model.
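The wiring described in the list above could look roughly like the following minimal Swift sketch. It assumes a simplified, synchronous SuggestionEngineProtocol with a single suggestion(for:) requirement; the real contract in SuggestionSubsystemContracts.swift is not shown in this issue and almost certainly differs (it is likely async). HostPlatform, SuggestionEngineError, pickerKinds, register(_:), and the modelID field are hypothetical names used only for illustration. The Apple Silicon gate is a compile-time #if os(macOS) && arch(arm64) check, so an x86_64 build (Intel, or Rosetta on an M-series Mac) never lists the MLX option.

```swift
// Hypothetical sketch only: the project's real protocol, router, and engine
// types will differ. Names not taken from the issue are assumptions.

/// Engines the user can pick. `.mlx` is the proposed new case.
enum SuggestionEngineKind: String, CaseIterable {
    case llamaOpenSource
    case appleIntelligence
    case mlx

    /// Only `.mlx` is gated here; the existing cases' real availability
    /// rules are out of scope for this sketch.
    func isSupported(onAppleSilicon appleSilicon: Bool) -> Bool {
        switch self {
        case .mlx: return appleSilicon
        case .llamaOpenSource, .appleIntelligence: return true
        }
    }
}

/// Assumed helper: compile-time host check. True only for a native arm64
/// macOS build, so an x86_64 slice (Intel, or under Rosetta) reports false.
enum HostPlatform {
    static var isAppleSilicon: Bool {
        #if os(macOS) && arch(arm64)
        return true
        #else
        return false
        #endif
    }
}

enum SuggestionEngineError: Error, Equatable {
    case unavailable(SuggestionEngineKind)    // engine not supported on this host
    case notRegistered(SuggestionEngineKind)  // no engine instance registered
    case notImplemented(SuggestionEngineKind) // placeholder engine body
}

/// Simplified stand-in for the real contract (assumed shape).
protocol SuggestionEngineProtocol {
    var kind: SuggestionEngineKind { get }
    func suggestion(for prefix: String) throws -> String
}

/// Skeleton MLX engine. A real version would load quantized weights and run
/// MLX generation; this placeholder only reports that it is not implemented.
struct MLXSuggestionEngine: SuggestionEngineProtocol {
    let kind = SuggestionEngineKind.mlx
    let modelID: String  // assumed field, e.g. an mlx-community model repo ID

    func suggestion(for prefix: String) throws -> String {
        throw SuggestionEngineError.notImplemented(kind)
    }
}

/// Routes requests by engine kind and decides which kinds the picker shows.
final class SuggestionEngineRouter {
    private var engines: [SuggestionEngineKind: any SuggestionEngineProtocol] = [:]
    private let appleSilicon: Bool

    init(appleSilicon: Bool = HostPlatform.isAppleSilicon) {
        self.appleSilicon = appleSilicon
    }

    /// Kinds to list in the menu bar popup and Settings > Engine & Model.
    var pickerKinds: [SuggestionEngineKind] {
        SuggestionEngineKind.allCases.filter { $0.isSupported(onAppleSilicon: appleSilicon) }
    }

    func register(_ engine: any SuggestionEngineProtocol) {
        engines[engine.kind] = engine
    }

    func suggestion(for prefix: String, using kind: SuggestionEngineKind) throws -> String {
        guard kind.isSupported(onAppleSilicon: appleSilicon) else {
            throw SuggestionEngineError.unavailable(kind)
        }
        guard let engine = engines[kind] else {
            throw SuggestionEngineError.notRegistered(kind)
        }
        return try engine.suggestion(for: prefix)
    }
}
```

Injecting the appleSilicon flag through the router's initializer keeps the gating testable on any host, while the app itself would rely on the default value. Filtering the picker and re-checking support inside suggestion(for:using:) means a stale saved preference for .mlx on an Intel Mac fails cleanly instead of reaching MLX code.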
Additional context
- The existing llama path (LlamaRuntimeCore, LlamaRuntimeManager, and LlamaSuggestionEngine) is a good structural reference.
- The CotabbyInference Swift package currently wraps llama.cpp; MLX could live in a new target in that package or in a dedicated CotabbyMLX package.