A Swift package that provides a drop-in replacement for Apple's Foundation Models framework with support for custom language model providers. All you need to do is change your import statement:
- import FoundationModels
+ import AnyLanguageModel
Supported model providers:
- Apple Foundation Models
- Core ML models
- MLX models
- llama.cpp (GGUF models)
- Ollama HTTP API
- Anthropic Messages API
- Google Gemini API
- OpenAI Chat Completions API
- OpenAI Responses API
- Open Responses (multi-provider Responses API–compatible endpoints)
Requirements:
- Swift 6.1+
- iOS 17.0+ / macOS 14.0+ / visionOS 1.0+ / Linux
Important
A bug in Xcode 26 may cause build errors
when targeting macOS 15 / iOS 18 or earlier
(e.g. Conformance of 'String' to 'Generable' is only available in macOS 26.0 or newer).
As a workaround, build your project with Xcode 16.
For more information, see issue #15.
Add this package to your Package.swift:
dependencies: [
.package(url: "https://github.com/mattt/AnyLanguageModel", from: "0.7.0")
]
AnyLanguageModel uses Swift 6.1 package traits to conditionally include heavy dependencies, so you can opt in to only the language model backends you need. This results in smaller binary sizes and faster build times.
Available traits:
- CoreML: Enables Core ML model support (depends on huggingface/swift-transformers)
- MLX: Enables MLX model support (depends on ml-explore/mlx-swift-lm)
- Llama: Enables llama.cpp support (requires mattt/llama.swift)
By default, no traits are enabled. To enable specific traits, specify them in your package's dependencies:
// In your Package.swift
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
from: "0.7.0",
traits: ["CoreML", "MLX"] // Enable CoreML and MLX support
)
]
Important
Due to a Swift Package Manager bug, dependency resolution may fail when you enable traits, producing the error "exhausted attempts to resolve the dependencies graph." To work around this issue, add the underlying dependencies for each trait directly to your package:
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
from: "0.7.0",
traits: ["CoreML", "MLX", "Llama"]
),
.package(url: "https://github.com/huggingface/swift-transformers", from: "1.0.0"), // CoreML
.package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.25.5"), // MLX
.package(url: "https://github.com/mattt/llama.swift", from: "2.0.0"), // Llama
]
Include only the dependencies that correspond to the traits you enable. For more information, see issue #135.
Xcode doesn't yet provide a built-in way to declare package dependencies with traits.
As a workaround,
you can create an internal Swift package that acts as a shim,
exporting the AnyLanguageModel module with the desired traits enabled.
Your Xcode project can then add this internal package as a local dependency.
For example, to use AnyLanguageModel with MLX support in an Xcode app project:
1. Create a local Swift package (in root directory containing Xcode project):
mkdir -p Packages/MyAppKit
cd Packages/MyAppKit
swift package init
2. Specify the AnyLanguageModel package dependency
(in Packages/MyAppKit/Package.swift):
// swift-tools-version: 6.1
import PackageDescription
let package = Package(
name: "MyAppKit",
platforms: [
.macOS(.v14),
.iOS(.v17),
.visionOS(.v1),
],
products: [
.library(
name: "MyAppKit",
targets: ["MyAppKit"]
)
],
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel",
from: "0.4.0",
traits: ["MLX"]
)
],
targets: [
.target(
name: "MyAppKit",
dependencies: [
.product(name: "AnyLanguageModel", package: "AnyLanguageModel")
]
)
]
)
3. Export the AnyLanguageModel module
(in Sources/MyAppKit/Export.swift):
@_exported import AnyLanguageModel
4. Add the local package to your Xcode project:
Open your project settings,
navigate to the "Package Dependencies" tab,
and click "+" → "Add Local..." to select the Packages/MyAppKit directory.
Your app can now import AnyLanguageModel with MLX support enabled.
Tip
For a working example of package traits in an Xcode app project, see chat-ui-swift.
When using third-party language model providers like OpenAI, Anthropic, or Google Gemini, you must handle API credentials securely.
Caution
Never hardcode API credentials in your app. Malicious actors can reverse‑engineer your application binary or observe outgoing network requests (for example, on a compromised device or via a debugging proxy) to extract embedded credentials. There have been documented cases of attackers successfully exfiltrating API keys from mobile apps and racking up thousands of dollars in charges.
Here are two approaches for managing API credentials in production apps:
User-provided keys: Users provide their own API keys, which are stored securely in the system Keychain and sent directly to the provider in API requests.
Security considerations:
- Keychain data is encrypted using hardware-backed keys (protected by the Secure Enclave on supported devices)
- An attacker would need access to a running process to intercept credentials
- TLS encryption protects credentials in transit on the network
- Users can only compromise their own keys, not other users' keys
Trade-offs:
- Apple App Review has often rejected apps using this model
- Reviewers may be unable to test functionality — even with provided credentials
- Apple may require in-app purchase integration for usage credits
- Some users may find it inconvenient to obtain and enter API keys
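The Keychain storage this approach relies on can be sketched with Apple's Security framework. This is a minimal sketch, not part of AnyLanguageModel; the service identifier is an illustrative assumption:

```swift
import Foundation
import Security

// Store a user-provided API key in the Keychain.
// The service name is a placeholder; use your app's own identifier.
func saveAPIKey(_ key: String, service: String = "com.example.myapp.api-key") -> Bool {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
    ]
    // Replace any existing item for this service.
    SecItemDelete(query as CFDictionary)

    var attributes = query
    attributes[kSecValueData as String] = Data(key.utf8)
    // Restrict the item to this device while it's unlocked.
    attributes[kSecAttrAccessible as String] = kSecAttrAccessibleWhenUnlockedThisDeviceOnly
    return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
}

// Retrieve the stored API key, or nil if none exists.
func loadAPIKey(service: String = "com.example.myapp.api-key") -> String? {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecReturnData as String: true,
        kSecMatchLimit as String: kSecMatchLimitOne,
    ]
    var result: CFTypeRef?
    guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
          let data = result as? Data else { return nil }
    return String(data: data, encoding: .utf8)
}
```

The `kSecAttrAccessibleWhenUnlockedThisDeviceOnly` accessibility class keeps the key out of iCloud backups and unavailable while the device is locked.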
Proxy server: Instead of connecting directly to the provider, route requests through your own authenticated service endpoint. API credentials are stored securely on your server, never in the client app.
Authenticate users with OAuth 2.1 or similar, issuing short-lived, scoped bearer tokens for client requests. If an attacker extracts tokens from your app, they're limited in scope and expire automatically.
Security considerations:
- API keys never leave your server infrastructure
- Client tokens can be scoped (e.g., rate-limited, feature-restricted)
- Client tokens can be revoked or expired independently
- Compromised tokens have limited blast radius
Trade-offs:
- Additional infrastructure complexity (server, authentication, monitoring)
- Operational costs (hosting, maintenance, support)
- Network latency from additional hop
Fortunately, there are platforms and services that simplify proxy implementation, handling authentication, rate limiting, and billing for you.
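A client-side request to such a proxy might look like the following sketch. The endpoint URL and JSON shape are illustrative assumptions, not part of AnyLanguageModel:

```swift
import Foundation

// Send a prompt to your own proxy, authenticated with a short-lived bearer token.
// The URL and request body are placeholders for your service's actual contract.
func sendPrompt(_ prompt: String, token: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.example.com/v1/chat")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: ["prompt": prompt])

    let (data, response) = try await URLSession.shared.data(for: request)
    guard (response as? HTTPURLResponse)?.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
    return data
}
```

The provider API key lives only on the server; the client never sees it, and a leaked bearer token can be revoked without rotating the key.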
Tip
For development and testing, it's fine to use API keys from environment variables. Just make sure production builds use one of the secure approaches above.
For more information about security best practices for your app, see OWASP's Mobile Application Security Cheat Sheet.
All on-device models — Apple Foundation Models, Core ML, MLX, and llama.cpp —
support guided generation,
letting you request strongly typed outputs using @Generable and @Guide
instead of parsing raw strings.
Cloud providers (OpenAI, Open Responses, Anthropic, and Gemini)
also support guided generation.
For more details, see
Generating Swift data structures with guided generation.
@Generable(description: "Basic profile information about a cat")
struct CatProfile {
// A guide isn't necessary for basic fields.
var name: String
@Guide(description: "The age of the cat", .range(0...20))
var age: Int
@Guide(description: "A one sentence profile about the cat's personality")
var profile: String
}
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "Generate a cute rescue cat",
generating: CatProfile.self
)
print(response.content)
Many providers support image inputs,
letting you include images alongside text prompts.
Pass images using the images: or image: parameter on respond:
let response = try await session.respond(
to: "Describe what you see",
images: [
.init(url: URL(string: "https://example.com/photo.jpg")!),
.init(url: URL(fileURLWithPath: "/path/to/local.png"))
]
)
Image support varies by provider:
| Provider | Image Inputs |
|---|---|
| Apple Foundation Models | — |
| Core ML | — |
| MLX | model-dependent |
| llama.cpp | — |
| Ollama | model-dependent |
| OpenAI | yes |
| Open Responses | yes |
| Anthropic | yes |
| Google Gemini | yes |
For MLX and Ollama,
use a vision-capable model
(for example, a VLM or -vl variant).
Tool calling is supported by all providers except llama.cpp.
Define tools using the Tool protocol and pass them when creating a session:
struct WeatherTool: Tool {
let name = "getWeather"
let description = "Retrieve the latest weather information for a city"
@Generable
struct Arguments {
@Guide(description: "The city to fetch the weather for")
var city: String
}
func call(arguments: Arguments) async throws -> String {
"The weather in \(arguments.city) is sunny and 72°F / 23°C"
}
}
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("How's the weather in Cupertino?")
}
print(response.content)
To observe or control tool execution, assign a delegate on the session:
actor ToolExecutionObserver: ToolExecutionDelegate {
func didGenerateToolCalls(_ toolCalls: [Transcript.ToolCall], in session: LanguageModelSession) async {
print("Generated tool calls: \(toolCalls)")
}
func toolCallDecision(
for toolCall: Transcript.ToolCall,
in session: LanguageModelSession
) async -> ToolExecutionDecision {
// Return .stop to halt after tool calls, or .provideOutput(...) to bypass execution.
// This is a good place to ask the user for confirmation (for example, in a modal dialog).
.execute
}
func didExecuteToolCall(
_ toolCall: Transcript.ToolCall,
output: Transcript.ToolOutput,
in session: LanguageModelSession
) async {
print("Executed tool call: \(toolCall)")
}
}
session.toolExecutionDelegate = ToolExecutionObserver()
Generate images from text prompts using the ImageGenerationModel protocol.
OpenAI and Google Gemini both offer image generation models:
let model = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1"
)
let result = try await model.generateImages(
for: "A watercolor painting of a mountain lake at sunset",
options: ImageGenerationOptions(size: .landscape)
)
// Access the generated images
for image in result.images {
switch image.source {
case .data(let data, let mimeType):
// Use the image data
print("Generated \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Control generation with ImageGenerationOptions:
var options = ImageGenerationOptions(
numberOfImages: 2,
size: .square // .square, .landscape, .portrait, or .custom(width:height:)
)
// Set provider-specific options
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high,
background: .transparent,
outputFormat: .png,
style: .vivid
)
let result = try await model.generateImages(for: "A company logo", options: options)
Some models return a revised prompt describing how they interpreted your input:
if let revised = result.revisedPrompt {
print("Model interpreted prompt as: \(revised)")
}
Image generation support varies by provider:
| Provider | Image Generation | Standalone Editing | Conversational Editing |
|---|---|---|---|
| OpenAI (gpt-image-1) | yes | yes | — |
| OpenAI (dall-e-3) | yes | — | — |
| Open Responses | yes | — | yes |
| Gemini Imagen | yes | — | — |
| Gemini Native | yes | yes | — |
| Gemini (conversation) | — | — | yes |
| xAI (grok-image-1) | yes | yes | — |
| Anthropic | — | — | — |
| Ollama | — | — | — |
| Apple Foundation Models | — | — | — |
Edit existing images by providing inputImages alongside a text prompt:
// OpenAI image editing
let model = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Remove the background and make it transparent",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
// Gemini Native image editing
let model = GeminiNativeImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Change the art style to watercolor",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
// xAI image editing (uses OpenAIImageGenerationModel with xAI base URL)
let model = OpenAIImageGenerationModel(
baseURL: URL(string: "https://api.x.ai/v1/")!,
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-image-1"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Remove the background from this photo",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
Image-capable language models can also edit images within a conversation,
enabling multi-turn editing workflows. Pass images using respond(to:images:options:)
and retrieve results from response.generatedImages:
// Gemini conversational image editing
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(outputMimeType: .png)
)
let response = try await session.respond(
to: "Remove the background from this photo",
images: [Transcript.ImageSegment(data: photoData, mimeType: "image/png")],
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Edited \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
// OpenAI conversational image editing (via Responses API)
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4.1"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(quality: .high)
)
let response = try await session.respond(
to: "Make this photo look like a watercolor painting",
images: [Transcript.ImageSegment(data: photoData, mimeType: "image/png")],
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Edited \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Uses Apple's system language model (requires macOS 26 / iOS 26 / visionOS 26 or later).
let model = SystemLanguageModel.default
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Explain quantum computing in one sentence")
}
Runs Core ML models
(requires CoreML trait):
let model = CoreMLLanguageModel(url: URL(fileURLWithPath: "path/to/model.mlmodelc"))
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Summarize this text")
}
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["CoreML"]
)
Runs MLX models on Apple Silicon
(requires MLX trait):
let model = MLXLanguageModel(modelId: "mlx-community/Qwen3-0.6B-4bit")
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("What is the capital of France?")
}
Vision support depends on the specific MLX model you load. Use a vision-capable model for multimodal prompts (for example, a VLM variant). The following shows extracting text from an image:
let ocr = try await session.respond(
to: "Extract the total amount from this receipt",
images: [
.init(url: URL(fileURLWithPath: "/path/to/receipt_page1.png")),
.init(url: URL(fileURLWithPath: "/path/to/receipt_page2.png"))
]
)
print(ocr.content)
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["MLX"]
)
Runs GGUF quantized models via llama.cpp
(requires Llama trait):
let model = LlamaLanguageModel(modelPath: "/path/to/model.gguf")
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Translate 'hello world' to Spanish")
}
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["Llama"]
)
Configure runtime parameters per request using custom generation options:
var options = GenerationOptions(temperature: 0.8)
options[custom: LlamaLanguageModel.self] = .init(
contextSize: 4096, // Context window size
batchSize: 512, // Batch size for evaluation
threads: 8, // Number of threads
seed: 42, // Random seed for deterministic output
temperature: 0.7, // Sampling temperature
topK: 40, // Top-K sampling
topP: 0.95, // Top-P (nucleus) sampling
repeatPenalty: 1.2, // Penalty for repeated tokens
repeatLastN: 128, // Number of tokens to consider for repeat penalty
frequencyPenalty: 0.1, // Frequency-based penalty
presencePenalty: 0.1, // Presence-based penalty
mirostat: .v2(tau: 5.0, eta: 0.1) // Adaptive perplexity control
)
let response = try await session.respond(
to: "Write a story",
options: options
)
Run models locally via Ollama's HTTP API:
// Default: connects to http://localhost:11434
let model = OllamaLanguageModel(model: "qwen3") // `ollama pull qwen3:8b`
// Custom endpoint
let model = OllamaLanguageModel(
endpoint: URL(string: "http://remote-server:11434")!,
model: "llama3.2"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Tell me a joke")
}
For local models, make sure you're using a vision-capable model
(for example, a -vl variant).
You can combine multiple images:
let model = OllamaLanguageModel(model: "qwen3-vl") // `ollama pull qwen3-vl:8b`
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "Compare these posters and summarize their differences",
images: [
.init(url: URL(string: "https://example.com/poster1.jpg")!),
.init(url: URL(fileURLWithPath: "/path/to/poster2.jpg"))
]
)
print(response.content)
Pass any model-specific parameters using custom generation options:
var options = GenerationOptions(temperature: 0.8)
options[custom: OllamaLanguageModel.self] = [
"seed": .int(42),
"repeat_penalty": .double(1.2),
"num_ctx": .int(4096),
"stop": .array([.string("###")])
]
Supports both the Chat Completions and Responses APIs:
let model = OpenAILanguageModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4o-mini"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "List the objects you see",
images: [
.init(url: URL(string: "https://example.com/desk.jpg")!),
.init(
data: try Data(contentsOf: URL(fileURLWithPath: "/path/to/closeup.png")),
mimeType: "image/png"
)
]
)
print(response.content)
For OpenAI-compatible endpoints that use the older Chat Completions API:
let model = OpenAILanguageModel(
baseURL: URL(string: "https://api.example.com")!,
apiKey: apiKey,
model: "gpt-4o-mini",
apiVariant: .chatCompletions
)
Use custom generation options for advanced parameters like sampling controls, reasoning effort (for o-series models), and vendor-specific extensions:
var options = GenerationOptions(temperature: 0.8)
options[custom: OpenAILanguageModel.self] = .init(
topP: 0.9,
frequencyPenalty: 0.5,
presencePenalty: 0.3,
stopSequences: ["END"],
reasoningEffort: .high, // For reasoning models (o3, o4-mini)
serviceTier: .priority,
extraBody: [ // Vendor-specific parameters
"custom_param": .string("value")
]
)
Generate images using OpenAI's Images API with DALL-E or GPT Image models:
let imageModel = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1" // or "dall-e-3"
)
let result = try await imageModel.generateImages(
for: "A cute robot waving hello",
options: ImageGenerationOptions(numberOfImages: 1, size: .square)
)
OpenAI-specific options include quality, background transparency, output format, and style:
var options = ImageGenerationOptions(size: .landscape)
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high, // .low, .medium, .high
background: .transparent, // .opaque, .transparent
outputFormat: .png, // .png, .jpeg, .webp
style: .vivid // .natural, .vivid (DALL-E 3 only)
)
let result = try await imageModel.generateImages(
for: "A minimalist logo for a coffee shop",
options: options
)
Since OpenAIImageGenerationModel accepts a custom base URL,
it also works with OpenAI-compatible image generation APIs like
xAI's Grok.
Use extraBody to pass vendor-specific parameters:
let grokModel = OpenAIImageGenerationModel(
baseURL: URL(string: "https://api.x.ai/v1/")!,
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-imagine-image"
)
var options = ImageGenerationOptions(numberOfImages: 2)
options[custom: OpenAIImageGenerationModel.self] = .init(
extraBody: [
"aspect_ratio": .string("16:9"),
"resolution": .string("2k")
]
)
let result = try await grokModel.generateImages(
for: "A futuristic cityscape at night",
options: options
)
Edit images using OpenAI's
Images edits API
by providing inputImages:
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
var options = ImageGenerationOptions(inputImages: [sourceImage])
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high
)
let result = try await imageModel.generateImages(
for: "Remove the background",
options: options
)
For inpainting, provide a mask where transparent areas indicate where to edit:
var options = ImageGenerationOptions(inputImages: [sourceImage])
options[custom: OpenAIImageGenerationModel.self] = .init(
mask: Transcript.ImageSegment(data: maskData, mimeType: "image/png"),
inputFidelity: .high // .high, .low
)
let result = try await imageModel.generateImages(
for: "Replace the sky with a sunset",
options: options
)
Connects to any API that conforms to the Open Responses specification (e.g. OpenAI, OpenRouter, or other compatible providers). A base URL is required; use your provider's endpoint:
// Example: OpenRouter (https://openrouter.ai/api/v1/)
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://openrouter.ai/api/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
model: "openai/gpt-4o-mini"
)
// Example: OpenAI
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
model: "gpt-4o-mini"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Say hello")
Custom options support Open Responses–specific fields,
such as tool_choice (including allowed_tools) and extraBody:
var options = GenerationOptions(temperature: 0.8)
options[custom: OpenResponsesLanguageModel.self] = .init(
toolChoice: .auto,
allowedTools: ["getWeather"],
reasoningEffort: .high,
extraBody: ["custom_param": .string("value")]
)
The Responses API supports image generation as a built-in server tool.
Unlike the standalone Images API (used by OpenAIImageGenerationModel),
images are generated as part of a conversation — enabling multi-turn editing
and images alongside text responses:
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4.1"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(quality: .high, size: .square)
)
let response = try await session.respond(
to: "Generate an image of a cat wearing a top hat",
options: options
)
// Access generated images from the response
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Got \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Configure image generation parameters through the tool options:
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(
quality: .high, // .low, .medium, .high, .auto
size: .landscape, // .square, .landscape, .portrait, .auto
background: .transparent, // .opaque, .transparent, .auto
outputFormat: .png, // .png, .jpeg, .webp
outputCompression: 80 // 0–100 (for lossy formats)
)
)
Since image generation is a server tool, it works alongside regular text responses and function tools in the same request. The model decides when to generate images based on the conversation.
Uses the Messages API with Claude models:
let model = AnthropicLanguageModel(
apiKey: ProcessInfo.processInfo.environment["ANTHROPIC_API_KEY"]!,
model: "claude-sonnet-4-5-20250929"
)
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("What's the weather like in San Francisco?")
}
You can include images with your prompt, either by pointing to remote URLs or by constructing them from image data:
let response = try await session.respond(
to: "Explain the key parts of this diagram",
image: .init(
data: try Data(contentsOf: URL(fileURLWithPath: "/path/to/diagram.png")),
mimeType: "image/png"
)
)
print(response.content)
Use custom generation options for Anthropic-specific parameters like extended thinking, tool choice control, and sampling parameters:
var options = GenerationOptions(temperature: 0.7)
options[custom: AnthropicLanguageModel.self] = .init(
topP: 0.9,
topK: 40,
stopSequences: ["END", "STOP"],
thinking: .init(budgetTokens: 4096), // Extended thinking
toolChoice: .auto, // Tool selection control
serviceTier: .priority
)
Uses the Gemini API with Gemini models:
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash"
)
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("What's the weather like in Tokyo?")
}
Send images with your prompt using remote or local sources:
let response = try await session.respond(
to: "Identify the plants in this photo",
image: .init(url: URL(string: "https://example.com/garden.jpg")!)
)
print(response.content)
Gemini models use an internal "thinking process" that improves reasoning and multi-step planning. Configure thinking mode through custom generation options:
var options = GenerationOptions()
// Enable thinking with dynamic budget allocation
options[custom: GeminiLanguageModel.self] = .init(thinking: .dynamic)
// Or set an explicit number of tokens for its thinking budget
options[custom: GeminiLanguageModel.self] = .init(thinking: .budget(1024))
// Disable thinking (default)
options[custom: GeminiLanguageModel.self] = .init(thinking: .disabled)
let response = try await session.respond(to: "Solve this problem", options: options)
Gemini supports server-side tools that execute transparently on Google's infrastructure:
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
serverTools: [
.googleSearch,
.googleMaps(latitude: 35.6580, longitude: 139.7016)
]
)
let response = try await session.respond(
to: "What coffee shops are nearby?",
options: options
)
Available server tools:
- .googleSearch: Grounds responses with real-time web information
- .googleMaps: Provides location-aware responses
- .codeExecution: Generates and runs Python code to solve problems
- .urlContext: Fetches and analyzes content from URLs mentioned in prompts
Tip
Gemini server tools are not available as client tools (Tool) for other models.
Gemini supports image generation through two approaches:
Imagen API — Uses Google's dedicated image generation models via the predict endpoint:
let imagenModel = GeminiImagenModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "imagen-4.0-generate-001"
)
let result = try await imagenModel.generateImages(
for: "A photorealistic landscape of rolling hills at golden hour",
options: ImageGenerationOptions(numberOfImages: 2, size: .landscape)
)
Imagen-specific options include output format, safety filters, person generation controls, and negative prompts:
var options = ImageGenerationOptions(size: .square)
options[custom: GeminiImagenModel.self] = .init(
outputMimeType: .jpeg, // .jpeg, .png
safetyFilterLevel: .blockMediumAndAbove, // Safety filter threshold
personGeneration: .allowAdult, // .dontAllow, .allowAdult, .allowAll
negativePrompt: "blurry, low quality" // What to exclude
)
let result = try await imagenModel.generateImages(
for: "A professional headshot portrait",
options: options
)
Native Gemini — Uses the generateContent API with image output modalities.
Models like gemini-2.5-flash-image and gemini-3-pro-image-preview can generate
images alongside text:
let nativeModel = GeminiNativeImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let result = try await nativeModel.generateImages(
for: "Draw a cartoon cat wearing a top hat",
options: ImageGenerationOptions()
)
// Native models may return text alongside the image
if let text = result.revisedPrompt {
print("Model said: \(text)")
}
Configure aspect ratio, resolution, and output format through custom options:
var options = ImageGenerationOptions()
options[custom: GeminiNativeImageGenerationModel.self] = .init(
aspectRatio: .widescreen, // .square, .standard, .standardPortrait, .widescreen, .widescreenPortrait
imageSize: .hd // .standard (1K), .hd (2K), .ultraHD (4K, gemini-3-pro only)
)
let result = try await nativeModel.generateImages(
for: "A panoramic mountain landscape at sunset",
options: options
)
You can also use the standard ImageGenerationOptions.size property, which maps to
aspect ratio automatically (.square → 1:1, .landscape → 16:9, .portrait → 9:16).
Inline Image Generation in Conversations — Image-capable Gemini models can also
generate images directly within a LanguageModelSession conversation, enabling
multi-turn image editing and mixed text-and-image responses:
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(aspectRatio: .widescreen)
)
let response = try await session.respond(
to: "Draw a cartoon cat wearing a top hat",
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Got \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
// Multi-turn editing works automatically —
// thought signatures are preserved across turns
let edited = try await session.respond(
to: "Now change the hat to a beret",
options: options
)

Note
Streaming (streamResponse) does not currently surface generated images.
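When an image arrives through the `.data` case, you may want to persist it. A minimal sketch, assuming a simple helper is acceptable; the function and file name here are illustrative and not part of the library:

```swift
import Foundation

// Hypothetical helper (not part of AnyLanguageModel): write image bytes
// from the .data(Data, String) case of an image source to disk.
func saveImage(_ data: Data, mimeType: String, in directory: URL) throws -> URL {
    let ext = (mimeType == "image/png") ? "png" : "jpg"
    let fileURL = directory.appendingPathComponent("generated.\(ext)")
    try data.write(to: fileURL)
    return fileURL
}

// Example with placeholder bytes standing in for real image data:
let tmp = FileManager.default.temporaryDirectory
let url = try saveImage(Data([0x89, 0x50]), mimeType: "image/png", in: tmp)
print(url.lastPathComponent) // prints "generated.png"
```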
Nano Banana (Gemini native image generation) —
Google's Nano Banana
models generate and edit images natively within conversations.
Use gemini-2.5-flash-image (Nano Banana), gemini-3-pro-image-preview
(Nano Banana Pro), or gemini-3.1-flash-image-preview (Nano Banana 2)
with either the GeminiLanguageModel for multi-turn conversations or
GeminiNativeImageGenerationModel for standalone generation:
// Multi-turn conversation with Nano Banana
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(
aspectRatio: .widescreen, // 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
imageSize: .hd // .small (512px), .standard (1K), .hd (2K), .ultraHD (4K)
)
)
let response = try await session.respond(
to: "Draw a cartoon cat wearing a top hat",
options: options
)
for image in response.generatedImages {
// Use image.source to access .data(Data, String) or .url(URL)
}
// Multi-turn editing — the model remembers the previous image
let edited = try await session.respond(
to: "Now change the hat to a beret",
options: options
)

Nano Banana Pro (gemini-3-pro-image-preview) supports 4K resolution.
Nano Banana 2 (gemini-3.1-flash-image-preview) is the latest Flash-based
image model. Thought signatures are preserved automatically across turns
so the model can reference and edit previous images.
Native Gemini also supports image editing by passing inputImages:
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await nativeModel.generateImages(
for: "Change the art style to watercolor",
options: ImageGenerationOptions(inputImages: [sourceImage])
)

Generate videos from text prompts using the VideoGenerationModel protocol.
OpenAI (Sora), xAI (Grok), and Google Gemini (Veo) all offer video generation models.
All three use an asynchronous pattern — a job is created, polled until complete,
and then the video is downloaded:
let model = OpenAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "sora-2"
)
let result = try await model.generateVideo(
for: "A drone shot of a sunset over the ocean",
options: VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8)
)
// Access the generated videos
for video in result.videos {
switch video.source {
case .data(let data, let mimeType):
print("Generated \(mimeType) video: \(data.count) bytes")
case .url(let url):
print("Video URL: \(url)")
}
}

Control generation with VideoGenerationOptions:
var options = VideoGenerationOptions(
aspectRatio: .landscape, // .square, .landscape, .portrait
durationSeconds: 8 // Snapped to nearest valid value per provider
)
// Set provider-specific options
options[custom: OpenAIVideoGenerationModel.self] = .init(
size: "1920x1080"
)
let result = try await model.generateVideo(for: "A timelapse of clouds", options: options)

Video generation support by provider:
| Provider | Model Names | Aspect Ratios | Duration (seconds) | Custom Options |
|---|---|---|---|---|
| OpenAI (Sora) | sora-2, sora-2-pro | 1:1, 16:9, 9:16 | 4, 8, 12 | size, extraBody |
| xAI (Grok) | grok-imagine-video | 1:1, 16:9, 9:16 | 1–15 | resolution (480p, 720p), extraBody |
| Gemini (Veo) | veo-3.1-generate-preview, etc. | 16:9, 9:16 | 4, 6, 8 | resolution, negativePrompt, personGeneration |
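Requested durations outside a provider's supported set are snapped to the nearest valid value. How that snapping might work can be sketched as nearest-value selection; this is an illustrative sketch, not the library's actual implementation:

```swift
// Illustrative sketch (not library code): snap a requested duration to the
// nearest value in a provider's supported set.
func snapDuration(_ requested: Int, toValid valid: [Int]) -> Int {
    valid.min(by: { abs($0 - requested) < abs($1 - requested) })!
}

// Sora accepts 4, 8, or 12 seconds, so a request for 7 would snap to 8:
print(snapDuration(7, toValid: [4, 8, 12])) // prints "8"
```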
let model = OpenAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "sora-2" // or "sora-2-pro"
)
var options = VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8) // Valid: 4, 8, 12
options[custom: OpenAIVideoGenerationModel.self] = .init(
size: "1280x720", // Explicit size (overrides aspect ratio)
pollInterval: 15 // Custom polling interval in seconds
)
let result = try await model.generateVideo(
for: "A cinematic sunrise over a mountain range",
options: options
)

let model = XAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-imagine-video"
)
var options = VideoGenerationOptions(aspectRatio: .portrait, durationSeconds: 5)
options[custom: XAIVideoGenerationModel.self] = .init(
resolution: ._720p // ._480p, ._720p
)
let result = try await model.generateVideo(
for: "A cat playing with a ball of yarn",
options: options
)

let model = GeminiVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "veo-3.1-generate-preview" // or "veo-2.0-generate-001"
)
var options = VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8)
options[custom: GeminiVideoGenerationModel.self] = .init(
resolution: ._1080p, // ._720p, ._1080p, ._4k
negativePrompt: "blurry, low quality", // What to exclude
personGeneration: .allowAdult // .dontAllow, .allowAdult, .allowAll
)
let result = try await model.generateVideo(
for: "A timelapse of clouds moving over a mountain",
options: options
)

Run the test suite to verify everything works correctly:
swift test

Tests for different language model backends have varying requirements:
| Backend | Traits | Environment Variables |
|---|---|---|
| CoreML | CoreML | HF_TOKEN |
| MLX | MLX | HF_TOKEN |
| Llama | Llama | LLAMA_MODEL_PATH |
| Anthropic | — | ANTHROPIC_API_KEY |
| OpenAI | — | OPENAI_API_KEY |
| Open Responses | — | OPEN_RESPONSES_API_KEY, OPEN_RESPONSES_BASE_URL |
| Ollama | — | — |
Example setup for running multiple tests at once:
export HF_TOKEN=your_huggingface_token
export LLAMA_MODEL_PATH=/path/to/model.gguf
export ANTHROPIC_API_KEY=your_anthropic_key
export OPENAI_API_KEY=your_openai_key
export OPEN_RESPONSES_API_KEY=your_open_responses_key
export OPEN_RESPONSES_BASE_URL=https://api.openai.com/v1/
swift test --traits CoreML,Llama

Tip
Tests that perform generation are skipped in CI environments (when CI is set).
Override this by setting ENABLE_COREML_TESTS=1 or ENABLE_MLX_TESTS=1.
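The skip-in-CI behavior described in the tip can be sketched as a small guard. This is a hypothetical illustration of the policy; the package's actual test helpers may be structured differently:

```swift
import Foundation

// Hypothetical sketch: generation tests run unless CI is set in the
// environment, and an ENABLE_<BACKEND>_TESTS=1 override forces them on.
func shouldRunGenerationTests(backend: String, env: [String: String]) -> Bool {
    if env["ENABLE_\(backend.uppercased())_TESTS"] == "1" { return true }
    return env["CI"] == nil
}

print(shouldRunGenerationTests(backend: "CoreML", env: ["CI": "true"]))
// prints "false"
print(shouldRunGenerationTests(backend: "CoreML",
                               env: ["CI": "true", "ENABLE_COREML_TESTS": "1"]))
// prints "true"
```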
Note
MLX tests must be run with xcodebuild rather than swift test
due to Metal library loading requirements.
Since xcodebuild doesn't support package traits directly,
you'll first need to update Package.swift to enable the MLX trait by default.
- .default(enabledTraits: []),
+ .default(enabledTraits: ["MLX"]),

Pass environment variables with the TEST_RUNNER_ prefix:
export TEST_RUNNER_HF_TOKEN=your_huggingface_token
xcodebuild test \
-scheme AnyLanguageModel \
-destination 'platform=macOS' \
-only-testing:AnyLanguageModelTests/MLXLanguageModelTests

This project is available under the MIT license. See the LICENSE file for more info.