A Swift package that provides a drop-in replacement for Apple's Foundation Models framework with support for custom language model providers. All you need to do is change your import statement:
- import FoundationModels
+ import AnyLanguageModel
Supported model providers:
- Apple Foundation Models
- Core ML models
- MLX models
- llama.cpp (GGUF models)
- Ollama HTTP API
- Anthropic Messages API
- Google Gemini API
- OpenAI Chat Completions API
- OpenAI Responses API
- Open Responses (multi-provider Responses API–compatible endpoints)
Requirements:
- Swift 6.1+
- iOS 17.0+ / macOS 14.0+ / visionOS 1.0+ / Linux
Important
A bug in Xcode 26 may cause build errors
when targeting macOS 15 / iOS 18 or earlier
(e.g. Conformance of 'String' to 'Generable' is only available in macOS 26.0 or newer).
As a workaround, build your project with Xcode 16.
For more information, see issue #15.
Add this package to your Package.swift:
dependencies: [
.package(url: "https://github.com/mattt/AnyLanguageModel", from: "0.7.0")
]
AnyLanguageModel uses Swift 6.1 package traits to conditionally include heavy dependencies, so you can opt in to only the language model backends you need. This results in smaller binary sizes and faster build times.
Available traits:
- CoreML: Enables Core ML model support (depends on huggingface/swift-transformers)
- MLX: Enables MLX model support (depends on ml-explore/mlx-swift-lm)
- Llama: Enables llama.cpp support (requires mattt/llama.swift)
By default, no traits are enabled. To enable specific traits, specify them in your package's dependencies:
// In your Package.swift
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
from: "0.7.0",
traits: ["CoreML", "MLX"] // Enable CoreML and MLX support
)
]
Important
Due to a Swift Package Manager bug, dependency resolution may fail when you enable traits, producing the error "exhausted attempts to resolve the dependencies graph." To work around this issue, add the underlying dependencies for each trait directly to your package:
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
from: "0.7.0",
traits: ["CoreML", "MLX", "Llama"]
),
.package(url: "https://github.com/huggingface/swift-transformers", from: "1.0.0"), // CoreML
.package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.25.5"), // MLX
.package(url: "https://github.com/mattt/llama.swift", from: "2.0.0"), // Llama
]
Include only the dependencies that correspond to the traits you enable. For more information, see issue #135.
Xcode doesn't yet provide a built-in way to declare package dependencies with traits.
As a workaround,
you can create an internal Swift package that acts as a shim,
exporting the AnyLanguageModel module with the desired traits enabled.
Your Xcode project can then add this internal package as a local dependency.
For example, to use AnyLanguageModel with MLX support in an Xcode app project:
1. Create a local Swift package (in root directory containing Xcode project):
mkdir -p Packages/MyAppKit
cd Packages/MyAppKit
swift package init
2. Specify the AnyLanguageModel package dependency
(in Packages/MyAppKit/Package.swift):
// swift-tools-version: 6.1
import PackageDescription
let package = Package(
name: "MyAppKit",
platforms: [
.macOS(.v14),
.iOS(.v17),
.visionOS(.v1),
],
products: [
.library(
name: "MyAppKit",
targets: ["MyAppKit"]
)
],
dependencies: [
.package(
url: "https://github.com/mattt/AnyLanguageModel",
from: "0.4.0",
traits: ["MLX"]
)
],
targets: [
.target(
name: "MyAppKit",
dependencies: [
.product(name: "AnyLanguageModel", package: "AnyLanguageModel")
]
)
]
)
3. Export the AnyLanguageModel module
(in Sources/MyAppKit/Export.swift):
@_exported import AnyLanguageModel
4. Add the local package to your Xcode project:
Open your project settings,
navigate to the "Package Dependencies" tab,
and click "+" → "Add Local..." to select the Packages/MyAppKit directory.
Your app can now import AnyLanguageModel with MLX support enabled.
Tip
For a working example of package traits in an Xcode app project, see chat-ui-swift.
When using third-party language model providers like OpenAI, Anthropic, or Google Gemini, you must handle API credentials securely.
Caution
Never hardcode API credentials in your app. Malicious actors can reverse‑engineer your application binary or observe outgoing network requests (for example, on a compromised device or via a debugging proxy) to extract embedded credentials. There have been documented cases of attackers successfully exfiltrating API keys from mobile apps and racking up thousands of dollars in charges.
Here are two approaches for managing API credentials in production apps:
User-provided keys: Users provide their own API keys, which are stored securely in the system Keychain and sent directly to the provider in API requests.
Security considerations:
- Keychain data is encrypted using hardware-backed keys (protected by the Secure Enclave on supported devices)
- An attacker would need access to a running process to intercept credentials
- TLS encryption protects credentials in transit on the network
- Users can only compromise their own keys, not other users' keys
Trade-offs:
- Apple App Review has often rejected apps using this model
- Reviewers may be unable to test functionality — even with provided credentials
- Apple may require in-app purchase integration for usage credits
- Some users may find it inconvenient to obtain and enter API keys
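The Keychain storage this approach relies on can be sketched with Apple's Security framework. This is a minimal sketch, not part of AnyLanguageModel; the service identifier is an illustrative assumption:

```swift
import Foundation
import Security

// Store a user-provided API key in the Keychain.
// The service name is a placeholder; use your app's own identifier.
func saveAPIKey(_ key: String, service: String = "com.example.myapp.api-key") -> Bool {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
    ]
    // Replace any existing item for this service.
    SecItemDelete(query as CFDictionary)

    var attributes = query
    attributes[kSecValueData as String] = Data(key.utf8)
    // Restrict the item to this device while it's unlocked.
    attributes[kSecAttrAccessible as String] = kSecAttrAccessibleWhenUnlockedThisDeviceOnly
    return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
}

// Retrieve the stored API key, or nil if none exists.
func loadAPIKey(service: String = "com.example.myapp.api-key") -> String? {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecReturnData as String: true,
        kSecMatchLimit as String: kSecMatchLimitOne,
    ]
    var result: CFTypeRef?
    guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
          let data = result as? Data else { return nil }
    return String(data: data, encoding: .utf8)
}
```

The `kSecAttrAccessibleWhenUnlockedThisDeviceOnly` accessibility class keeps the key out of iCloud backups and unavailable while the device is locked.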
Proxy server: Instead of connecting directly to the provider, route requests through your own authenticated service endpoint. API credentials are stored securely on your server, never in the client app.
Authenticate users with OAuth 2.1 or similar, issuing short-lived, scoped bearer tokens for client requests. If an attacker extracts tokens from your app, they're limited in scope and expire automatically.
Security considerations:
- API keys never leave your server infrastructure
- Client tokens can be scoped (e.g., rate-limited, feature-restricted)
- Client tokens can be revoked or expired independently
- Compromised tokens have limited blast radius
Trade-offs:
- Additional infrastructure complexity (server, authentication, monitoring)
- Operational costs (hosting, maintenance, support)
- Network latency from additional hop
Fortunately, there are platforms and services that simplify proxy implementation, handling authentication, rate limiting, and billing for you.
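A client-side request to such a proxy might look like the following sketch. The endpoint URL and JSON shape are illustrative assumptions, not part of AnyLanguageModel:

```swift
import Foundation

// Send a prompt to your own proxy, authenticated with a short-lived bearer token.
// The URL and request body are placeholders for your service's actual contract.
func sendPrompt(_ prompt: String, token: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.example.com/v1/chat")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: ["prompt": prompt])

    let (data, response) = try await URLSession.shared.data(for: request)
    guard (response as? HTTPURLResponse)?.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
    return data
}
```

The provider API key lives only on the server; the client never sees it, and a leaked bearer token can be revoked without rotating the key.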
Tip
For development and testing, it's fine to use API keys from environment variables. Just make sure production builds use one of the secure approaches above.
For more information about security best practices for your app, see OWASP's Mobile Application Security Cheat Sheet.
All on-device models — Apple Foundation Models, Core ML, MLX, and llama.cpp —
support guided generation,
letting you request strongly typed outputs using @Generable and @Guide
instead of parsing raw strings.
Cloud providers (OpenAI, Open Responses, Anthropic, and Gemini)
also support guided generation.
For more details, see
Generating Swift data structures with guided generation.
@Generable(description: "Basic profile information about a cat")
struct CatProfile {
// A guide isn't necessary for basic fields.
var name: String
@Guide(description: "The age of the cat", .range(0...20))
var age: Int
@Guide(description: "A one sentence profile about the cat's personality")
var profile: String
}
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "Generate a cute rescue cat",
generating: CatProfile.self
)
print(response.content)
Many providers support image inputs,
letting you include images alongside text prompts.
Pass images using the images: or image: parameter on respond:
let response = try await session.respond(
to: "Describe what you see",
images: [
.init(url: URL(string: "https://example.com/photo.jpg")!),
.init(url: URL(fileURLWithPath: "/path/to/local.png"))
]
)
Image support varies by provider:
| Provider | Image Inputs |
|---|---|
| Apple Foundation Models | — |
| Core ML | — |
| MLX | model-dependent |
| llama.cpp | — |
| Ollama | model-dependent |
| OpenAI | yes |
| Open Responses | yes |
| Anthropic | yes |
| Google Gemini | yes |
For MLX and Ollama,
use a vision-capable model
(for example, a VLM or -vl variant).
Tool calling is supported by all providers except llama.cpp.
Define tools using the Tool protocol and pass them when creating a session:
struct WeatherTool: Tool {
let name = "getWeather"
let description = "Retrieve the latest weather information for a city"
@Generable
struct Arguments {
@Guide(description: "The city to fetch the weather for")
var city: String
}
func call(arguments: Arguments) async throws -> String {
"The weather in \(arguments.city) is sunny and 72°F / 23°C"
}
}
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("How's the weather in Cupertino?")
}
print(response.content)
To observe or control tool execution, assign a delegate on the session:
actor ToolExecutionObserver: ToolExecutionDelegate {
func didGenerateToolCalls(_ toolCalls: [Transcript.ToolCall], in session: LanguageModelSession) async {
print("Generated tool calls: \(toolCalls)")
}
func toolCallDecision(
for toolCall: Transcript.ToolCall,
in session: LanguageModelSession
) async -> ToolExecutionDecision {
// Return .stop to halt after tool calls, or .provideOutput(...) to bypass execution.
// This is a good place to ask the user for confirmation (for example, in a modal dialog).
.execute
}
func didExecuteToolCall(
_ toolCall: Transcript.ToolCall,
output: Transcript.ToolOutput,
in session: LanguageModelSession
) async {
print("Executed tool call: \(toolCall)")
}
}
session.toolExecutionDelegate = ToolExecutionObserver()
Generate images from text prompts using the ImageGenerationModel protocol.
OpenAI and Google Gemini both offer image generation models:
let model = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1"
)
let result = try await model.generateImages(
for: "A watercolor painting of a mountain lake at sunset",
options: ImageGenerationOptions(size: .landscape)
)
// Access the generated images
for image in result.images {
switch image.source {
case .data(let data, let mimeType):
// Use the image data
print("Generated \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Control generation with ImageGenerationOptions:
var options = ImageGenerationOptions(
numberOfImages: 2,
size: .square // .square, .landscape, .portrait, or .custom(width:height:)
)
// Set provider-specific options
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high,
background: .transparent,
outputFormat: .png,
style: .vivid
)
let result = try await model.generateImages(for: "A company logo", options: options)
Some models return a revised prompt describing how they interpreted your input:
if let revised = result.revisedPrompt {
print("Model interpreted prompt as: \(revised)")
}
Image generation support varies by provider:
| Provider | Image Generation | Standalone Editing | Conversational Editing |
|---|---|---|---|
| OpenAI (gpt-image-1) | yes | yes | — |
| OpenAI (dall-e-3) | yes | — | — |
| Open Responses | yes | — | yes |
| Gemini Imagen | yes | — | — |
| Gemini Native | yes | yes | — |
| Gemini (conversation) | — | — | yes |
| xAI (grok-image-1) | yes | yes | — |
| Anthropic | — | — | — |
| Ollama | — | — | — |
| Apple Foundation Models | — | — | — |
Edit existing images by providing inputImages alongside a text prompt:
// OpenAI image editing
let model = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Remove the background and make it transparent",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
// Gemini Native image editing
let model = GeminiNativeImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Change the art style to watercolor",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
// xAI image editing (uses OpenAIImageGenerationModel with xAI base URL)
let model = OpenAIImageGenerationModel(
baseURL: URL(string: "https://api.x.ai/v1/")!,
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-image-1"
)
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await model.generateImages(
for: "Remove the background from this photo",
options: ImageGenerationOptions(inputImages: [sourceImage])
)
Image-capable language models can also edit images within a conversation,
enabling multi-turn editing workflows. Pass images using respond(to:images:options:)
and retrieve results from response.generatedImages:
// Gemini conversational image editing
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(outputMimeType: .png)
)
let response = try await session.respond(
to: "Remove the background from this photo",
images: [Transcript.ImageSegment(data: photoData, mimeType: "image/png")],
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Edited \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
// OpenAI conversational image editing (via Responses API)
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4.1"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(quality: .high)
)
let response = try await session.respond(
to: "Make this photo look like a watercolor painting",
images: [Transcript.ImageSegment(data: photoData, mimeType: "image/png")],
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Edited \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Uses Apple's system language model (requires macOS 26 / iOS 26 / visionOS 26 or later).
let model = SystemLanguageModel.default
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Explain quantum computing in one sentence")
}
Runs Core ML models
(requires CoreML trait):
let model = CoreMLLanguageModel(url: URL(fileURLWithPath: "path/to/model.mlmodelc"))
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Summarize this text")
}
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["CoreML"]
)
Runs MLX models on Apple Silicon
(requires MLX trait):
let model = MLXLanguageModel(modelId: "mlx-community/Qwen3-0.6B-4bit")
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("What is the capital of France?")
}
Vision support depends on the specific MLX model you load. Use a vision-capable model for multimodal prompts (for example, a VLM variant). The following shows extracting text from an image:
let ocr = try await session.respond(
to: "Extract the total amount from this receipt",
images: [
.init(url: URL(fileURLWithPath: "/path/to/receipt_page1.png")),
.init(url: URL(fileURLWithPath: "/path/to/receipt_page2.png"))
]
)
print(ocr.content)
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["MLX"]
)
Runs GGUF quantized models via llama.cpp
(requires Llama trait):
let model = LlamaLanguageModel(modelPath: "/path/to/model.gguf")
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Translate 'hello world' to Spanish")
}
Enable the trait in Package.swift:
.package(
url: "https://github.com/mattt/AnyLanguageModel.git",
branch: "main",
traits: ["Llama"]
)
Configure runtime parameters per request using custom generation options:
var options = GenerationOptions(temperature: 0.8)
options[custom: LlamaLanguageModel.self] = .init(
contextSize: 4096, // Context window size
batchSize: 512, // Batch size for evaluation
threads: 8, // Number of threads
seed: 42, // Random seed for deterministic output
temperature: 0.7, // Sampling temperature
topK: 40, // Top-K sampling
topP: 0.95, // Top-P (nucleus) sampling
repeatPenalty: 1.2, // Penalty for repeated tokens
repeatLastN: 128, // Number of tokens to consider for repeat penalty
frequencyPenalty: 0.1, // Frequency-based penalty
presencePenalty: 0.1, // Presence-based penalty
mirostat: .v2(tau: 5.0, eta: 0.1) // Adaptive perplexity control
)
let response = try await session.respond(
to: "Write a story",
options: options
)
Run models locally via Ollama's HTTP API:
// Default: connects to http://localhost:11434
let model = OllamaLanguageModel(model: "qwen3") // `ollama pull qwen3:8b`
// Custom endpoint
let model = OllamaLanguageModel(
endpoint: URL(string: "http://remote-server:11434")!,
model: "llama3.2"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond {
Prompt("Tell me a joke")
}
For local models, make sure you're using a vision-capable model
(for example, a -vl variant).
You can combine multiple images:
let model = OllamaLanguageModel(model: "qwen3-vl") // `ollama pull qwen3-vl:8b`
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "Compare these posters and summarize their differences",
images: [
.init(url: URL(string: "https://example.com/poster1.jpg")!),
.init(url: URL(fileURLWithPath: "/path/to/poster2.jpg"))
]
)
print(response.content)
Pass any model-specific parameters using custom generation options:
var options = GenerationOptions(temperature: 0.8)
options[custom: OllamaLanguageModel.self] = [
"seed": .int(42),
"repeat_penalty": .double(1.2),
"num_ctx": .int(4096),
"stop": .array([.string("###")])
]
Supports both the Chat Completions and Responses APIs:
let model = OpenAILanguageModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4o-mini"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond(
to: "List the objects you see",
images: [
.init(url: URL(string: "https://example.com/desk.jpg")!),
.init(
data: try Data(contentsOf: URL(fileURLWithPath: "/path/to/closeup.png")),
mimeType: "image/png"
)
]
)
print(response.content)
For OpenAI-compatible endpoints that use the older Chat Completions API:
let model = OpenAILanguageModel(
baseURL: URL(string: "https://api.example.com")!,
apiKey: apiKey,
model: "gpt-4o-mini",
apiVariant: .chatCompletions
)
Use custom generation options for advanced parameters like sampling controls, reasoning effort (for o-series models), and vendor-specific extensions:
var options = GenerationOptions(temperature: 0.8)
options[custom: OpenAILanguageModel.self] = .init(
topP: 0.9,
frequencyPenalty: 0.5,
presencePenalty: 0.3,
stopSequences: ["END"],
reasoningEffort: .high, // For reasoning models (o3, o4-mini)
serviceTier: .priority,
extraBody: [ // Vendor-specific parameters
"custom_param": .string("value")
]
)
Generate images using OpenAI's Images API with DALL-E or GPT Image models:
let imageModel = OpenAIImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-image-1" // or "dall-e-3"
)
let result = try await imageModel.generateImages(
for: "A cute robot waving hello",
options: ImageGenerationOptions(numberOfImages: 1, size: .square)
)
OpenAI-specific options include quality, background transparency, output format, and style:
var options = ImageGenerationOptions(size: .landscape)
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high, // .low, .medium, .high
background: .transparent, // .opaque, .transparent
outputFormat: .png, // .png, .jpeg, .webp
style: .vivid // .natural, .vivid (DALL-E 3 only)
)
let result = try await imageModel.generateImages(
for: "A minimalist logo for a coffee shop",
options: options
)
Since OpenAIImageGenerationModel accepts a custom base URL,
it also works with OpenAI-compatible image generation APIs like
xAI's Grok.
Use extraBody to pass vendor-specific parameters:
let grokModel = OpenAIImageGenerationModel(
baseURL: URL(string: "https://api.x.ai/v1/")!,
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-imagine-image"
)
var options = ImageGenerationOptions(numberOfImages: 2)
options[custom: OpenAIImageGenerationModel.self] = .init(
extraBody: [
"aspect_ratio": .string("16:9"),
"resolution": .string("2k")
]
)
let result = try await grokModel.generateImages(
for: "A futuristic cityscape at night",
options: options
)
Edit images using OpenAI's
Images edits API
by providing inputImages:
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
var options = ImageGenerationOptions(inputImages: [sourceImage])
options[custom: OpenAIImageGenerationModel.self] = .init(
quality: .high
)
let result = try await imageModel.generateImages(
for: "Remove the background",
options: options
)
For inpainting, provide a mask where transparent areas indicate where to edit:
var options = ImageGenerationOptions(inputImages: [sourceImage])
options[custom: OpenAIImageGenerationModel.self] = .init(
mask: Transcript.ImageSegment(data: maskData, mimeType: "image/png"),
inputFidelity: .high // .high, .low
)
let result = try await imageModel.generateImages(
for: "Replace the sky with a sunset",
options: options
)
Connects to any API that conforms to the Open Responses specification (e.g. OpenAI, OpenRouter, or other compatible providers). A base URL is required; use your provider's endpoint:
// Example: OpenRouter (https://openrouter.ai/api/v1/)
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://openrouter.ai/api/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
model: "openai/gpt-4o-mini"
)
// Example: OpenAI
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
model: "gpt-4o-mini"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Say hello")
Custom options support Open Responses–specific fields,
such as tool_choice (including allowed_tools) and extraBody:
var options = GenerationOptions(temperature: 0.8)
options[custom: OpenResponsesLanguageModel.self] = .init(
toolChoice: .auto,
allowedTools: ["getWeather"],
reasoningEffort: .high,
extraBody: ["custom_param": .string("value")]
)
The Responses API supports image generation as a built-in server tool.
Unlike the standalone Images API (used by OpenAIImageGenerationModel),
images are generated as part of a conversation — enabling multi-turn editing
and images alongside text responses:
let model = OpenResponsesLanguageModel(
baseURL: URL(string: "https://api.openai.com/v1/")!,
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "gpt-4.1"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(quality: .high, size: .square)
)
let response = try await session.respond(
to: "Generate an image of a cat wearing a top hat",
options: options
)
// Access generated images from the response
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Got \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
Configure image generation parameters through the tool options:
options[custom: OpenResponsesLanguageModel.self] = .init(
imageGeneration: .init(
quality: .high, // .low, .medium, .high, .auto
size: .landscape, // .square, .landscape, .portrait, .auto
background: .transparent, // .opaque, .transparent, .auto
outputFormat: .png, // .png, .jpeg, .webp
outputCompression: 80 // 0–100 (for lossy formats)
)
)
Since image generation is a server tool, it works alongside regular text responses and function tools in the same request. The model decides when to generate images based on the conversation.
Uses the Messages API with Claude models:
let model = AnthropicLanguageModel(
apiKey: ProcessInfo.processInfo.environment["ANTHROPIC_API_KEY"]!,
model: "claude-sonnet-4-5-20250929"
)
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("What's the weather like in San Francisco?")
}
You can include images with your prompt, either by pointing to remote URLs or by constructing them from image data:
let response = try await session.respond(
to: "Explain the key parts of this diagram",
image: .init(
data: try Data(contentsOf: URL(fileURLWithPath: "/path/to/diagram.png")),
mimeType: "image/png"
)
)
print(response.content)
Use custom generation options for Anthropic-specific parameters like extended thinking, tool choice control, and sampling parameters:
var options = GenerationOptions(temperature: 0.7)
options[custom: AnthropicLanguageModel.self] = .init(
topP: 0.9,
topK: 40,
stopSequences: ["END", "STOP"],
thinking: .init(budgetTokens: 4096), // Extended thinking
toolChoice: .auto, // Tool selection control
serviceTier: .priority
)
Uses the Gemini API with Gemini models:
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash"
)
let session = LanguageModelSession(model: model, tools: [WeatherTool()])
let response = try await session.respond {
Prompt("What's the weather like in Tokyo?")
}
Send images with your prompt using remote or local sources:
let response = try await session.respond(
to: "Identify the plants in this photo",
image: .init(url: URL(string: "https://example.com/garden.jpg")!)
)
print(response.content)
Gemini models use an internal "thinking process" that improves reasoning and multi-step planning. Configure thinking mode through custom generation options:
var options = GenerationOptions()
// Enable thinking with dynamic budget allocation
options[custom: GeminiLanguageModel.self] = .init(thinking: .dynamic)
// Or set an explicit number of tokens for its thinking budget
options[custom: GeminiLanguageModel.self] = .init(thinking: .budget(1024))
// Disable thinking (default)
options[custom: GeminiLanguageModel.self] = .init(thinking: .disabled)
let response = try await session.respond(to: "Solve this problem", options: options)
Gemini supports server-side tools that execute transparently on Google's infrastructure:
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
serverTools: [
.googleSearch,
.googleMaps(latitude: 35.6580, longitude: 139.7016)
]
)
let response = try await session.respond(
to: "What coffee shops are nearby?",
options: options
)
Available server tools:
- .googleSearch: Grounds responses with real-time web information
- .googleMaps: Provides location-aware responses
- .codeExecution: Generates and runs Python code to solve problems
- .urlContext: Fetches and analyzes content from URLs mentioned in prompts
Tip
Gemini server tools are not available as client tools (Tool) for other models.
Gemini supports image generation through two approaches:
Imagen API — Uses Google's dedicated image generation models via the predict endpoint:
let imagenModel = GeminiImagenModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "imagen-4.0-generate-001"
)
let result = try await imagenModel.generateImages(
for: "A photorealistic landscape of rolling hills at golden hour",
options: ImageGenerationOptions(numberOfImages: 2, size: .landscape)
)
Imagen-specific options include output format, safety filters, person generation controls, and negative prompts:
var options = ImageGenerationOptions(size: .square)
options[custom: GeminiImagenModel.self] = .init(
outputMimeType: .jpeg, // .jpeg, .png
safetyFilterLevel: .blockMediumAndAbove, // Safety filter threshold
personGeneration: .allowAdult, // .dontAllow, .allowAdult, .allowAll
negativePrompt: "blurry, low quality" // What to exclude
)
let result = try await imagenModel.generateImages(
for: "A professional headshot portrait",
options: options
)
Native Gemini — Uses the generateContent API with image output modalities.
Models like gemini-2.5-flash-image and gemini-3-pro-image-preview can generate
images alongside text:
let nativeModel = GeminiNativeImageGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let result = try await nativeModel.generateImages(
for: "Draw a cartoon cat wearing a top hat",
options: ImageGenerationOptions()
)
// Native models may return text alongside the image
if let text = result.revisedPrompt {
print("Model said: \(text)")
}
Configure aspect ratio, resolution, and output format through custom options:
var options = ImageGenerationOptions()
options[custom: GeminiNativeImageGenerationModel.self] = .init(
aspectRatio: .widescreen, // .square, .standard, .standardPortrait, .widescreen, .widescreenPortrait
imageSize: .hd // .standard (1K), .hd (2K), .ultraHD (4K, gemini-3-pro only)
)
let result = try await nativeModel.generateImages(
for: "A panoramic mountain landscape at sunset",
options: options
)
You can also use the standard ImageGenerationOptions.size property, which maps to
aspect ratio automatically (.square → 1:1, .landscape → 16:9, .portrait → 9:16).
Inline Image Generation in Conversations — Image-capable Gemini models can also
generate images directly within a LanguageModelSession conversation, enabling
multi-turn image editing and mixed text-and-image responses:
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(aspectRatio: .widescreen)
)
let response = try await session.respond(
to: "Draw a cartoon cat wearing a top hat",
options: options
)
for image in response.generatedImages {
switch image.source {
case .data(let data, let mimeType):
print("Got \(mimeType) image: \(data.count) bytes")
case .url(let url):
print("Image URL: \(url)")
}
}
// Multi-turn editing works automatically —
// thought signatures are preserved across turns
let edited = try await session.respond(
to: "Now change the hat to a beret",
options: options
)

Note
Streaming (streamResponse) does not currently surface generated images.
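When an image arrives through the `.data` case, you may want to persist it. A minimal sketch, assuming a simple helper is acceptable; the function and file name here are illustrative and not part of the library:

```swift
import Foundation

// Hypothetical helper (not part of AnyLanguageModel): write image bytes
// from the .data(Data, String) case of an image source to disk.
func saveImage(_ data: Data, mimeType: String, in directory: URL) throws -> URL {
    let ext = (mimeType == "image/png") ? "png" : "jpg"
    let fileURL = directory.appendingPathComponent("generated.\(ext)")
    try data.write(to: fileURL)
    return fileURL
}

// Example with placeholder bytes standing in for real image data:
let tmp = FileManager.default.temporaryDirectory
let url = try saveImage(Data([0x89, 0x50]), mimeType: "image/png", in: tmp)
print(url.lastPathComponent) // prints "generated.png"
```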
Nano Banana (Gemini native image generation) —
Google's Nano Banana
models generate and edit images natively within conversations.
Use gemini-2.5-flash-image (Nano Banana), gemini-3-pro-image-preview
(Nano Banana Pro), or gemini-3.1-flash-image-preview (Nano Banana 2)
with either the GeminiLanguageModel for multi-turn conversations or
GeminiNativeImageGenerationModel for standalone generation:
// Multi-turn conversation with Nano Banana
let model = GeminiLanguageModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "gemini-2.5-flash-image"
)
let session = LanguageModelSession(model: model)
var options = GenerationOptions()
options[custom: GeminiLanguageModel.self] = .init(
imageGeneration: .init(
aspectRatio: .widescreen, // 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
imageSize: .hd // .small (512px), .standard (1K), .hd (2K), .ultraHD (4K)
)
)
let response = try await session.respond(
to: "Draw a cartoon cat wearing a top hat",
options: options
)
for image in response.generatedImages {
// Use image.source to access .data(Data, String) or .url(URL)
}
// Multi-turn editing — the model remembers the previous image
let edited = try await session.respond(
to: "Now change the hat to a beret",
options: options
)

Nano Banana Pro (gemini-3-pro-image-preview) supports 4K resolution.
Nano Banana 2 (gemini-3.1-flash-image-preview) is the latest Flash-based
image model. Thought signatures are preserved automatically across turns
so the model can reference and edit previous images.
Native Gemini also supports image editing by passing inputImages:
let sourceImage = Transcript.ImageSegment(data: imageData, mimeType: "image/png")
let result = try await nativeModel.generateImages(
for: "Change the art style to watercolor",
options: ImageGenerationOptions(inputImages: [sourceImage])
)

Generate videos from text prompts using the VideoGenerationModel protocol.
OpenAI (Sora), xAI (Grok), and Google Gemini (Veo) all offer video generation models.
All three use an asynchronous pattern — a job is created, polled until complete,
and then the video is downloaded:
let model = OpenAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "sora-2"
)
let result = try await model.generateVideo(
for: "A drone shot of a sunset over the ocean",
options: VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8)
)
// Access the generated videos
for video in result.videos {
switch video.source {
case .data(let data, let mimeType):
print("Generated \(mimeType) video: \(data.count) bytes")
case .url(let url):
print("Video URL: \(url)")
}
}

Control generation with VideoGenerationOptions:
var options = VideoGenerationOptions(
aspectRatio: .landscape, // .square, .landscape, .portrait
durationSeconds: 8 // Snapped to nearest valid value per provider
)
// Set provider-specific options
options[custom: OpenAIVideoGenerationModel.self] = .init(
size: "1920x1080"
)
let result = try await model.generateVideo(for: "A timelapse of clouds", options: options)

Video generation support by provider:
| Provider | Model Names | Aspect Ratios | Duration (seconds) | Custom Options |
|---|---|---|---|---|
| OpenAI (Sora) | sora-2, sora-2-pro | 1:1, 16:9, 9:16 | 4, 8, 12 | size, extraBody |
| xAI (Grok) | grok-imagine-video | 1:1, 16:9, 9:16 | 1–15 | resolution (480p, 720p), extraBody |
| Gemini (Veo) | veo-3.1-generate-preview, etc. | 16:9, 9:16 | 4, 6, 8 | resolution, negativePrompt, personGeneration |
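Requested durations outside a provider's supported set are snapped to the nearest valid value. How that snapping might work can be sketched as nearest-value selection; this is an illustrative sketch, not the library's actual implementation:

```swift
// Illustrative sketch (not library code): snap a requested duration to the
// nearest value in a provider's supported set.
func snapDuration(_ requested: Int, toValid valid: [Int]) -> Int {
    valid.min(by: { abs($0 - requested) < abs($1 - requested) })!
}

// Sora accepts 4, 8, or 12 seconds, so a request for 7 would snap to 8:
print(snapDuration(7, toValid: [4, 8, 12])) // prints "8"
```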
let model = OpenAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
model: "sora-2" // or "sora-2-pro"
)
var options = VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8) // Valid: 4, 8, 12
options[custom: OpenAIVideoGenerationModel.self] = .init(
size: "1280x720", // Explicit size (overrides aspect ratio)
pollInterval: 15 // Custom polling interval in seconds
)
let result = try await model.generateVideo(
for: "A cinematic sunrise over a mountain range",
options: options
)

let model = XAIVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["XAI_API_KEY"]!,
model: "grok-imagine-video"
)
var options = VideoGenerationOptions(aspectRatio: .portrait, durationSeconds: 5)
options[custom: XAIVideoGenerationModel.self] = .init(
resolution: ._720p // ._480p, ._720p
)
let result = try await model.generateVideo(
for: "A cat playing with a ball of yarn",
options: options
)

let model = GeminiVideoGenerationModel(
apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
model: "veo-3.1-generate-preview" // or "veo-2.0-generate-001"
)
var options = VideoGenerationOptions(aspectRatio: .landscape, durationSeconds: 8)
options[custom: GeminiVideoGenerationModel.self] = .init(
resolution: ._1080p, // ._720p, ._1080p, ._4k
negativePrompt: "blurry, low quality", // What to exclude
personGeneration: .allowAdult // .dontAllow, .allowAdult, .allowAll
)
let result = try await model.generateVideo(
for: "A timelapse of clouds moving over a mountain",
options: options
)

Run the test suite to verify everything works correctly:
swift test

Tests for different language model backends have varying requirements:
| Backend | Traits | Environment Variables |
|---|---|---|
| CoreML | CoreML | HF_TOKEN |
| MLX | MLX | HF_TOKEN |
| Llama | Llama | LLAMA_MODEL_PATH |
| Anthropic | — | ANTHROPIC_API_KEY |
| OpenAI | — | OPENAI_API_KEY |
| Open Responses | — | OPEN_RESPONSES_API_KEY, OPEN_RESPONSES_BASE_URL |
| Ollama | — | — |
Example setup for running multiple tests at once:
export HF_TOKEN=your_huggingface_token
export LLAMA_MODEL_PATH=/path/to/model.gguf
export ANTHROPIC_API_KEY=your_anthropic_key
export OPENAI_API_KEY=your_openai_key
export OPEN_RESPONSES_API_KEY=your_open_responses_key
export OPEN_RESPONSES_BASE_URL=https://api.openai.com/v1/
swift test --traits CoreML,Llama

Tip
Tests that perform generation are skipped in CI environments (when CI is set).
Override this by setting ENABLE_COREML_TESTS=1 or ENABLE_MLX_TESTS=1.
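The skip-in-CI behavior described in the tip can be sketched as a small guard. This is a hypothetical illustration of the policy; the package's actual test helpers may be structured differently:

```swift
import Foundation

// Hypothetical sketch: generation tests run unless CI is set in the
// environment, and an ENABLE_<BACKEND>_TESTS=1 override forces them on.
func shouldRunGenerationTests(backend: String, env: [String: String]) -> Bool {
    if env["ENABLE_\(backend.uppercased())_TESTS"] == "1" { return true }
    return env["CI"] == nil
}

print(shouldRunGenerationTests(backend: "CoreML", env: ["CI": "true"]))
// prints "false"
print(shouldRunGenerationTests(backend: "CoreML",
                               env: ["CI": "true", "ENABLE_COREML_TESTS": "1"]))
// prints "true"
```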
Note
MLX tests must be run with xcodebuild rather than swift test
due to Metal library loading requirements.
Since xcodebuild doesn't support package traits directly,
you'll first need to update Package.swift to enable the MLX trait by default.
- .default(enabledTraits: []),
+ .default(enabledTraits: ["MLX"]),

Pass environment variables with the TEST_RUNNER_ prefix:
export TEST_RUNNER_HF_TOKEN=your_huggingface_token
xcodebuild test \
-scheme AnyLanguageModel \
-destination 'platform=macOS' \
-only-testing:AnyLanguageModelTests/MLXLanguageModelTests

This project is available under the MIT license. See the LICENSE file for more info.