
microsoft/foundry-local-on-windowsserver-samples

Foundry Local for Windows Server


Overview

The following samples demonstrate how Windows Server can run AI workloads on-premises with Foundry Local. Keeping sensitive data on local infrastructure helps meet the data privacy and compliance requirements of regulated industries.

Samples

Contoso Medical

The ContosoMedical application highlights two AI-driven scenarios:

  • Medical Record Summarization: Automatically condenses lengthy patient reports into concise medical summaries.
  • Medical Record Translation: Translates medical documents from foreign languages into English while preserving medical terminology and formatting.

Important: You must configure the Foundry Local endpoints yourself. The endpoints provided in this sample are internal Microsoft endpoints that are reachable only over VPN and CorpNet. See the Endpoint Configuration section for how to configure your own endpoints.

MCP Tool Calling

The MCP-ToolCalling sample demonstrates how to integrate MCP (Model Context Protocol) servers with Foundry Local and Semantic Kernel using tool calling. A custom Weather MCP server exposes real-time weather data via the National Weather Service API, and a .NET agent uses Semantic Kernel to connect a locally running Foundry Local model to those tools.

Key components:

  • Weather MCP Server (Node.js/TypeScript): Exposes two tools over MCP Streamable HTTP transport — get_alerts (active weather alerts for a US state) and get_forecast (multi-day forecast for any lat/lon coordinates).
  • Foundry Local MCP Agent (.NET 9): Starts an embedded Foundry Local instance with the qwen2.5-7b model, connects to the MCP server via a custom McpHttpClient, registers the tools as a Semantic Kernel plugin, and runs an interactive chat loop with automatic tool invocation.

foundry-local-mcp-agent (.NET)
  ├── Foundry Local (qwen2.5-7b, port 9001)
  └── Semantic Kernel + WeatherMcpPlugin
              │ MCP over HTTP
    Weather MCP Server (Node.js, port 3000)
              │
    National Weather Service API
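To make the two tools more concrete, here is a minimal TypeScript sketch of the National Weather Service requests they could wrap. The api.weather.gov routes (/alerts/active?area=<state>, and /points/{lat},{lon}, whose response carries a properties.forecast URL) belong to the public NWS API; the function names, the User-Agent value, and the assumption that get_forecast resolves coordinates in two steps are illustrative and not taken from the sample's source.

```typescript
// Hypothetical helpers sketching the NWS requests behind get_alerts and get_forecast.
const NWS_API_BASE = "https://api.weather.gov";
const USER_AGENT = "weather-mcp-sample/1.0"; // hypothetical; NWS asks clients to identify themselves

// get_alerts: active alerts for a two-letter US state code such as "WA".
function alertsUrl(state: string): string {
  const code = state.trim().toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error(`Expected a two-letter state code, got "${state}"`);
  }
  return `${NWS_API_BASE}/alerts/active?area=${code}`;
}

// get_forecast, step 1: the /points route maps coordinates to an NWS grid point.
// Coordinates are rounded to four decimal places.
function pointsUrl(latitude: number, longitude: number): string {
  return `${NWS_API_BASE}/points/${latitude.toFixed(4)},${longitude.toFixed(4)}`;
}

// get_forecast, step 2: read properties.forecast from the /points response;
// that URL returns the multi-day forecast for the grid point.
async function fetchForecastUrl(latitude: number, longitude: number): Promise<string> {
  const res = await fetch(pointsUrl(latitude, longitude), {
    headers: { "User-Agent": USER_AGENT, Accept: "application/geo+json" },
  });
  if (!res.ok) throw new Error(`NWS /points request failed: HTTP ${res.status}`);
  const body = (await res.json()) as { properties?: { forecast?: string } };
  const forecastUrl = body.properties?.forecast;
  if (!forecastUrl) throw new Error("No forecast URL returned for these coordinates");
  return forecastUrl;
}
```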

Prerequisites: Node.js v18+, .NET 9 SDK.

See MCP-ToolCalling/README.md for full setup and usage instructions.

Setup

Installing Foundry Local on Windows Server 2025

  1. Install Foundry Local

    winget install Microsoft.FoundryLocal
  2. Start the Foundry Local service

    foundry service start
  3. Download a Language Model

    For example, to download phi-4-mini:

    foundry model download phi-4-mini

For additional details on Foundry Local, see the Foundry Local documentation.

Accessing the Foundry Local service over the network

By default, Foundry Local listens on 127.0.0.1:<foundry-local-port>, which restricts inference requests to the local machine. Run foundry service status to see the URL, including the port, that the service is listening on.

To enable access from other devices on the network (or connected via VPN), use Windows PortProxy to forward external traffic on port 9000 to the Foundry Local service port.

  1. Create a port proxy

    netsh interface portproxy add v4tov4 listenport=9000 listenaddress=0.0.0.0 connectport=<foundry-local-port> connectaddress=127.0.0.1
  2. Allow inbound TCP traffic

    netsh advfirewall firewall add rule name="Allow Port 9000 Inbound" dir=in action=allow protocol=TCP localport=9000
  3. Verify connectivity

    From any host on the same network, confirm that the Foundry Local service is reachable:

    curl http://<server-ip>:9000/openai/status
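The same check can be scripted, for example to probe several servers before updating App.config. This is a minimal TypeScript sketch for Node.js 18+ (global fetch and AbortController); the function names and the 5-second default timeout are assumptions, and only the /openai/status path comes from the step above.

```typescript
// Build the status URL from a base endpoint, tolerating a trailing slash.
function statusUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/openai/status";
}

// Returns true if the endpoint answers with a 2xx status within timeoutMs.
async function isFoundryReachable(baseUrl: string, timeoutMs = 5000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(statusUrl(baseUrl), { signal: controller.signal });
    return res.ok;
  } catch {
    // Invalid URL, connection refused, DNS failure, or timeout (abort).
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Usage (replace with your server address):
// isFoundryReachable("http://<server-ip>:9000").then((ok) => console.log(ok ? "reachable" : "unreachable"));
```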

How to use ContosoMedical app

Prerequisites

  • .NET Framework 4.8 or later
  • Visual Studio 2019 or later

Run the application

Update the endpoints and model names in App.config for your environment (see Configuration), then open the solution in Visual Studio and build it. Run the application by pressing F5 or clicking Start on the toolbar.

Architecture

System Overview

The sample application uses a client–server architecture: the WPF desktop client processes medical records by sending requests to language models hosted by Foundry Local on Windows Server instances configured as described above.

┌──────────────────────────────────────────┐
│             ContosoMedical               │
│  ┌─────────────────────────────────────┐ │
│  │           WPF Frontend              │ │
│  │   ┌─────────────────────────────┐   │ │
│  │   │ Patient Records Interface   │   │ │
│  │   │                             │   │ │
│  │   └─────────────────────────────┘   │ │
│  └─────────────────────────────────────┘ │
│  ┌─────────────────────────────────────┐ │
│  │              Services               │ │
│  │   ┌─────────────┐ ┌─────────────┐   │ │
│  │   │ Summarizer  │ │ Translator  │   │ │
│  │   │ (Map-Reduce)│ │ (Chunking)  │   │ │
│  │   └─────────────┘ └─────────────┘   │ │
│  └─────────────────────────────────────┘ │
└──────────────────────────────────────────┘
                │ HTTP/REST
                │ (OpenAI-Compatible API)
┌──────────────────────────────────────────┐
│           Windows Server Layer           │ 
│  ┌─────────────────────────────────────┐ │
│  │           Foundry Local             │ │
│  │   ┌─────────────┐ ┌─────────────┐   │ │
│  │   │   Phi-3.5   │ │   Phi-4     │   │ │
│  │   │    Mini     │ │    Mini     │   │ │
│  │   │             │ │             │   │ │
│  │   └─────────────┘ └─────────────┘   │ │
│  └─────────────────────────────────────┘ │
└──────────────────────────────────────────┘

Main Components

ContosoMedical/
├── Services/
│   ├── Summarizer.cs          # Map-Reduce summarization
│   ├── Translator.cs          # Chunk-based translation
│   └── ...
├── Models/
│   ├── Patient.cs             # Patient data structures
│   └── ...
├── DefaultDataAssets/         # Synthetic patient data
└── App.config                 # Foundry Local endpoint configuration

Foundry Local Integration

Connection Configuration

The application connects to Foundry Local endpoints defined in App.config:

<appSettings>
  <add key="FoundryLocalEndPoint1" value="http://10.137.212.105:9000" />
  <add key="FoundryLocalEndPoint2" value="http://10.137.214.85:9000" />
  <add key="FoundryLocalLanguageModel" value="Phi-3.5-mini-instruct-generic-cpu:1" />
  <add key="FoundryLocalLanguageModel2" value="Phi-4-mini-instruct-generic-cpu:4" />
</appSettings>

HTTP Client Integration

The application’s Summarizer.cs and Translator.cs services use HttpClient to communicate with Foundry Local:

// Initialize HTTP clients for Foundry Local communication
var httpClient = new HttpClient { Timeout = TimeSpan.FromSeconds(300) };
var endpoint = configurationManager.GetAppSetting("FoundryLocalEndPoint1");

OpenAI-Compatible API Usage

Each request to Foundry Local follows the standard OpenAI chat completions schema:

var requestBody = new
{
    model = configurationManager.GetAppSetting("FoundryLocalLanguageModel"),
    messages = new[]
    {
        new { role = "system", content = "You are a medical summarization assistant..." },
        new { role = "user", content = "Summarize this medical record section..." }
    },
    max_tokens = 200,
    temperature = 0.0
};

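// Serialize the request body as JSON (assumes Newtonsoft.Json; any JSON serializer works)
var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json");

// Send to Foundry Local's OpenAI-compatible chat completions endpoint
var response = await httpClient.PostAsync(endpoint + "/v1/chat/completions", content);
response.EnsureSuccessStatusCode();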
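For clients written in another stack, the request above translates directly to TypeScript. This sketch assumes Node.js 18+ for fetch; the helper names are hypothetical, while the /v1/chat/completions path, the body fields, and reading choices[0].message.content follow the OpenAI-compatible schema described above. It omits the C# client's 300-second timeout for brevity.

```typescript
// Shape of an OpenAI-style chat message.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the request body used for a summarization call (mirrors the C# example).
function buildChatRequest(model: string, system: string, user: string, maxTokens = 200) {
  const messages: ChatMessage[] = [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
  return { model, messages, max_tokens: maxTokens, temperature: 0.0 };
}

// Pull the assistant text out of a chat completions response.
function extractContent(response: unknown): string {
  const choices = (response as { choices?: { message?: { content?: string } }[] }).choices;
  const text = choices?.[0]?.message?.content;
  if (typeof text !== "string") throw new Error("Unexpected response shape");
  return text;
}

// POST to <endpoint>/v1/chat/completions and return the model's reply.
async function chatCompletion(endpoint: string, body: ReturnType<typeof buildChatRequest>): Promise<string> {
  const res = await fetch(endpoint.replace(/\/+$/, "") + "/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Foundry Local returned HTTP ${res.status}`);
  return extractContent(await res.json());
}
```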

Configuration

The ContosoMedical application requires proper configuration to connect to your Foundry Local instances and manage local data storage. All configuration settings are stored in the App.config file located in the project root.

App.config Settings

The application uses the following key configuration parameters:

<appSettings>
  <add key="LocalDataDirectory" value="C:\temp\patient_summary_tool_local_data\" />
  <add key="FoundryLocalEndPoint1" value="http://10.137.212.105:9000" />
  <add key="FoundryLocalEndPoint2" value="http://10.137.214.85:9000" />
  <add key="FoundryLocalLanguageModel" value="Phi-3.5-mini-instruct-generic-cpu:1" />
  <add key="FoundryLocalLanguageModel2" value="Phi-4-mini-instruct-generic-cpu:4"/>
</appSettings>

Endpoint Configuration

FoundryLocalEndPoint1 and FoundryLocalEndPoint2

  • These specify the URLs of your Foundry Local server instances
  • Format: http://<server-ip>:<port>, or http://localhost:<foundry-local-port> when the application runs on the same machine as Foundry Local (useful for local testing)
  • Port: 9000 if you set up PortProxy as described in Accessing the Foundry Local service over the network
  • The application distributes requests across the configured endpoints for parallel processing and load balancing
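A simple way to spread requests over the configured endpoints is round-robin selection, sketched below in TypeScript. The class is hypothetical and not taken from the sample. Because each endpoint processes requests sequentially (see Limitations), a pull-based worker per endpoint usually balances better when request times vary; round-robin is shown only as the simplest option.

```typescript
// Hypothetical round-robin selector over the configured endpoints
// (for example, the values of FoundryLocalEndPoint1 and FoundryLocalEndPoint2).
class RoundRobin {
  private readonly endpoints: string[];
  private next = 0;

  constructor(endpoints: string[]) {
    if (endpoints.length === 0) throw new Error("At least one endpoint is required");
    this.endpoints = [...endpoints];
  }

  // Returns the endpoints in rotation: first, second, ..., then back to the first.
  pick(): string {
    const endpoint = this.endpoints[this.next];
    this.next = (this.next + 1) % this.endpoints.length;
    return endpoint;
  }
}
```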

How to update:

  1. Replace the IP addresses with your actual Windows Server IP addresses, or use http://localhost:<foundry-local-port> for local testing
  2. Ensure the port matches your PortProxy listen port (9000 in the example above) or, for localhost, the port Foundry Local is listening on
  3. Verify connectivity using: curl http://<server-ip>:<port>/openai/status or curl http://localhost:<foundry-local-port>/openai/status

Language Model Configuration

FoundryLocalLanguageModel and FoundryLocalLanguageModel2

  • Specify which models to use for different operations
  • Model names must match exactly what's available in your Foundry Local instance
  • The sample assigns each model to specific tasks, as listed under Current model usage

How to find available models:

foundry model list

Current model usage:

  • FoundryLocalLanguageModel (Phi-3.5-mini-instruct): Used for summarization and general translation
  • FoundryLocalLanguageModel2 (Phi-4-mini-instruct): Used for medication section translation

To change models:

  1. Ensure the desired model is downloaded: foundry model download <model-name>
  2. Update the config with the exact model name from foundry model list
  3. Rebuild and restart the application
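Because a name that differs only in case or version suffix will not match, it can help to validate the configured value before launching. The TypeScript sketch below assumes the service exposes the OpenAI-compatible GET /v1/models route returning { data: [{ id }] }; if your Foundry Local version does not, compare against the output of foundry model list instead. The function names and the near-miss heuristic are hypothetical.

```typescript
// Assumed OpenAI-style model list shape.
interface ModelList {
  data: { id: string }[];
}

type ModelCheck =
  | { ok: true }
  | { ok: false; suggestions: string[] };

// Exact match is required; on a miss, return case-insensitive or same-base near matches.
function checkModelName(configured: string, models: ModelList): ModelCheck {
  const ids = models.data.map((m) => m.id);
  if (ids.includes(configured)) return { ok: true };
  const needle = configured.toLowerCase();
  const base = needle.split(":")[0]; // drop a version suffix such as ":1"
  const suggestions = ids.filter((id) => {
    const lower = id.toLowerCase();
    return lower === needle || lower.startsWith(base);
  });
  return { ok: false, suggestions };
}

// Assumes an OpenAI-compatible GET /v1/models route on the endpoint.
async function fetchModels(endpoint: string): Promise<ModelList> {
  const res = await fetch(endpoint.replace(/\/+$/, "") + "/v1/models");
  if (!res.ok) throw new Error(`GET /v1/models failed: HTTP ${res.status}`);
  return (await res.json()) as ModelList;
}
```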

Local Data Directory

LocalDataDirectory

  • Specifies where the application stores patient data (default and newly uploaded) and temporary files
  • Default: C:\temp\patient_summary_tool_local_data\
  • The directory will be created automatically if it doesn't exist

How to change:

  1. Update the path in App.config
  2. Ensure the application has read/write permissions to the directory
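For deployment scripts or companion tools, the two requirements above (create the directory if missing, confirm it is writable) can be checked with a short Node.js/TypeScript sketch. It writes and deletes a probe file rather than calling fs.access(), because Node's access check does not evaluate Windows ACLs. The function name and probe file name are hypothetical.

```typescript
import { mkdirSync, unlinkSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Create LocalDataDirectory if missing and confirm the process can write to it.
function ensureDataDirectory(dir: string): string {
  mkdirSync(dir, { recursive: true }); // creates parents; no error if it already exists
  const probe = join(dir, `.write-probe-${process.pid}`);
  writeFileSync(probe, ""); // throws (e.g., EACCES or EPERM) if the directory is not writable
  unlinkSync(probe); // clean up the probe file
  return dir;
}

// Example (Windows): ensureDataDirectory("C:\\temp\\patient_summary_tool_local_data\\");
```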

Data Generation

This application uses the Synthea synthetic patient data generator to create realistic medical records for testing and demonstration purposes. Synthea is an open-source synthetic patient generator that models the medical history of synthetic patients.

Generating Synthetic Patient Data with Synthea

  1. Prerequisites

    • Java 11 or later (required to run Synthea)
  2. Download Synthea

    Download the latest pre-built JAR file from the Synthea releases page:

    curl -L -O https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar
  3. Generate Patient Data

    The ContosoMedical application expects patient data in plain text format.

    To generate a single patient record:

    java -jar synthea-with-dependencies.jar --exporter.text.export true --generate.append_numbers_to_person_names false

    To generate multiple patient records (e.g., 10 patients):

    java -jar synthea-with-dependencies.jar -p 10 --exporter.text.export true --generate.append_numbers_to_person_names false

    Parameters explained:

    • -p 10 - Sets the population size (number of patients to generate)
    • --exporter.text.export true - Enables plain text format export
    • --generate.append_numbers_to_person_names false - Prevents numeric suffixes from being added to patient names (configuration overrides take a double-dash prefix)
  4. Locate Generated Data

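    By default, Synthea writes patient records to the output directory, with each enabled format (such as FHIR, C-CDA, or plain text) in its own subdirectory. With the option above, the plain-text records used by ContosoMedical are in output/text.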
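To collect the plain-text records for ContosoMedical, a script can list the .txt files the text exporter produced. This TypeScript sketch assumes they are written under <output>/text (check the layout your Synthea version produces); the function names are hypothetical.

```typescript
import { existsSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// True for file names that look like plain-text exports.
function isTextRecord(fileName: string): boolean {
  return fileName.toLowerCase().endsWith(".txt");
}

// List plain-text patient records under <outputDir>/text, sorted by path.
// Returns an empty list if the folder does not exist (for example, if the
// text exporter was not enabled).
function findTextRecords(outputDir: string): string[] {
  const textDir = join(outputDir, "text");
  if (!existsSync(textDir)) return [];
  return readdirSync(textDir)
    .filter(isTextRecord)
    .map((name) => join(textDir, name))
    .filter((filePath) => statSync(filePath).isFile())
    .sort();
}

// Example: findTextRecords("output") lists the records to load into ContosoMedical.
```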

For more detailed instructions and advanced configuration options, see the Synthea Basic Setup and Running guide.

Data Pre-processing

Both summarization and translation workflows begin with a similar preprocessing stage. The application first identifies medical record sections based on a known delimiter, and then divides each section into manageable chunks while preserving natural text boundaries (e.g., line breaks).

This chunking strategy ensures that related medical information stays together and that each model request fits within language model input limits. Chunk size varies depending on the operation:

  • Summarization: ~3,000 characters per chunk for efficient content compression.
  • Translation: ~500 characters per chunk (450 for medications) for precise handling of specialized terminology.
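The pre-processing step can be sketched as a section split followed by line-preserving packing. In this TypeScript sketch the delimiter is a parameter because the sample's actual section delimiter is not documented here, and hard-splitting a single over-long line is an assumed fallback; the chunk sizes above (~3,000, ~500, or 450 characters) would be passed as maxChars.

```typescript
// Split a record into non-empty, trimmed sections on a caller-supplied delimiter.
function splitSections(record: string, delimiter: string): string[] {
  return record
    .split(delimiter)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Pack whole lines into chunks of at most maxChars characters, so related
// lines stay together and no chunk exceeds the model input budget.
function chunkSection(section: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of section.split(/\r?\n/)) {
    const candidate = current === "" ? line : current + "\n" + line;
    if (candidate.length <= maxChars) {
      current = candidate; // line fits: keep accumulating
      continue;
    }
    if (current !== "") chunks.push(current); // close the current chunk
    // Fallback: a single line longer than maxChars is split by characters.
    let rest = line;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```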

Model Selection

The application uses different language models optimized for specific tasks:

  • Phi-3.5-mini-instruct: Primary model for summarization and general translation
  • Phi-4-mini-instruct: Specialized for medication section translation (better pharmaceutical terminology handling)
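The task-to-model assignment, together with the chunk sizes from Data Pre-processing, fits in a small routing function. This TypeScript sketch is hypothetical: it returns the App.config key of the model to use rather than a model name, and it leaves detecting a medication section to the caller.

```typescript
type Task = "summarize" | "translate";
type ModelKey = "FoundryLocalLanguageModel" | "FoundryLocalLanguageModel2";

interface Route {
  modelKey: ModelKey; // App.config key whose value is the model name to send
  maxChunkChars: number; // chunk size used during pre-processing
}

// Summarization: Phi-3.5 (FoundryLocalLanguageModel), ~3,000-character chunks.
// Translation: Phi-3.5 with ~500-character chunks, except medication sections,
// which go to Phi-4 (FoundryLocalLanguageModel2) with 450-character chunks.
function routeFor(task: Task, isMedicationSection: boolean): Route {
  if (task === "summarize") {
    return { modelKey: "FoundryLocalLanguageModel", maxChunkChars: 3000 };
  }
  return isMedicationSection
    ? { modelKey: "FoundryLocalLanguageModel2", maxChunkChars: 450 }
    : { modelKey: "FoundryLocalLanguageModel", maxChunkChars: 500 };
}
```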

Summarization for Long Text Inputs

The ContosoMedical application addresses the challenge of summarizing lengthy medical reports that can span hundreds of lines and exceed the input limits of small language models. These records often include detailed patient histories, multiple clinical sections, and extensive observations that make single-pass summarization impractical.

To overcome this, the summarization process uses a Map-Reduce approach designed for medical content. In the Map phase, chunks are processed in parallel across the available Foundry Local endpoints; each chunk is sent with a summarization prompt that asks for a summary of at most 200 tokens preserving the critical medical information. Multiple workers process different chunks simultaneously, spreading the load across the available server endpoints.

The Reduce phase takes all the section summaries and combines them into a single, coherent patient overview. This final integration step uses specialized prompts that emphasize the most important medical data including diagnoses, treatments, medications, and clinical results, while eliminating redundancy and maintaining a consistent medical narrative.
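A minimal TypeScript sketch of this flow, assuming one worker per endpoint that pulls the next chunk from a shared index, so faster endpoints naturally take more chunks. callModel is a hypothetical stand-in for an HTTP call to an endpoint's /v1/chat/completions route, and the MAP:/REDUCE: prefixes are placeholders for the real prompts shown below.

```typescript
type CallModel = (endpoint: string, prompt: string) => Promise<string>;

async function mapReduceSummarize(
  chunks: string[],
  endpoints: string[],
  callModel: CallModel,
): Promise<string> {
  if (endpoints.length === 0) throw new Error("No endpoints configured");
  const summaries: string[] = new Array<string>(chunks.length);
  let nextIndex = 0;

  // Map: each worker sends one request at a time to its own endpoint and
  // stores the result at the chunk's index, preserving document order.
  const worker = async (endpoint: string): Promise<void> => {
    while (nextIndex < chunks.length) {
      const i = nextIndex++; // check and increment run with no await in between
      summaries[i] = await callModel(endpoint, `MAP:\n${chunks[i]}`);
    }
  };
  await Promise.all(endpoints.map((endpoint) => worker(endpoint)));

  // Reduce: merge the intermediate summaries with one final call.
  return callModel(endpoints[0], `REDUCE:\n${summaries.join("\n\n")}`);
}
```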

Map Phase Prompt (for intermediate summaries):

System: "You are a precise summarization assistant of a patient's record. You'll be presented with one or more sections of a patient's medical record."

User: "Generate a concise summary of the following medical record section(s), prioritizing the most recent information. DO NOT exceed 200 tokens. DO NOT include the token count in the summary

Text:
[SECTION CONTENT]"

Reduce Phase Prompt (for final summary):

System: "You are an expert summarizer that merges multiple summaries into one cohesive overview in at most 300 words."

User: "You will be given multiple summaries of a medical report. Generate one final concise summary with emphasis on medical data which include details in the following context: (Patient details, allergies, medication, conditions, procedures, treatments, doctor or provider visits and clinical results). DO NOT exceed 300 words in generating the final summary.

SUMMARIES:
[INTERMEDIATE SUMMARIES]"
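The two prompts can be packaged as chat messages for the OpenAI-compatible API. The strings in this TypeScript sketch are copied from the prompts above; the function names, joining intermediate summaries with blank lines, and the word-count helper (a possible post-check, since small models do not always honor length limits) are assumptions.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Map phase: one request per chunk.
function mapMessages(sectionContent: string): ChatMessage[] {
  return [
    { role: "system", content: "You are a precise summarization assistant of a patient's record. You'll be presented with one or more sections of a patient's medical record." },
    { role: "user", content: "Generate a concise summary of the following medical record section(s), prioritizing the most recent information. DO NOT exceed 200 tokens. DO NOT include the token count in the summary\n\nText:\n" + sectionContent },
  ];
}

// Reduce phase: one request over all intermediate summaries.
function reduceMessages(summaries: string[]): ChatMessage[] {
  return [
    { role: "system", content: "You are an expert summarizer that merges multiple summaries into one cohesive overview in at most 300 words." },
    { role: "user", content: "You will be given multiple summaries of a medical report. Generate one final concise summary with emphasis on medical data which include details in the following context: (Patient details, allergies, medication, conditions, procedures, treatments, doctor or provider visits and clinical results). DO NOT exceed 300 words in generating the final summary.\n\nSUMMARIES:\n" + summaries.join("\n\n") },
  ];
}

// Counts whitespace-separated words, e.g. to flag a final summary over 300 words.
function wordCount(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}
```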

Translation for Long Text Inputs

In addition to summarization, the application supports translating patient reports into English. Translation shares some challenges with summarization, such as model input limits, but adds its own: it must avoid information loss, maintain clinical accuracy, and handle specialized terminology.

Therefore, the translation process uses a simpler parallel algorithm with no reduce step, focused on preserving document structure and medical accuracy. Chunks are distributed across the available Foundry Local endpoints, and each chunk is sent with a translation prompt designed to preserve medical formatting and terminology.

After translation, all chunks are reassembled in their original order, producing a complete, structured, and clinically accurate English version of the medical record.

Translation Prompt (for all chunks):

System: "You are a professional medical translator. If a label (e.g., [ATTUALE], [INTERROTTO]) appears, translate it literally (e.g., [CURRENT], [STOPPED])."

User: "Translate the following medical data about a patient from {source_language} into English.
Preserve the structure and formatting of the original text as much as possible.
Do not add any translator explanations, or notes, or commentary. The only output should be the translated text.

Text:
[CHUNK CONTENT]"
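A TypeScript sketch of the translation prompt and the ordered reassembly. The prompt text is copied from above with {source_language} as a parameter. Promise.all returns results in input order regardless of which request finishes first, which keeps the reassembled record in its original order. The translate callback is hypothetical and is responsible for routing each chunk to an endpoint without overlapping requests; joining with newlines assumes chunks were split at line breaks.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Translation prompt with the source language filled in.
function translationMessages(sourceLanguage: string, chunk: string): ChatMessage[] {
  return [
    { role: "system", content: "You are a professional medical translator. If a label (e.g., [ATTUALE], [INTERROTTO]) appears, translate it literally (e.g., [CURRENT], [STOPPED])." },
    {
      role: "user",
      content:
        `Translate the following medical data about a patient from ${sourceLanguage} into English.\n` +
        "Preserve the structure and formatting of the original text as much as possible.\n" +
        "Do not add any translator explanations, or notes, or commentary. The only output should be the translated text.\n\n" +
        "Text:\n" + chunk,
    },
  ];
}

// Translate all chunks and reassemble them in their original order.
async function translateAll(
  chunks: string[],
  translate: (chunk: string, index: number) => Promise<string>,
): Promise<string> {
  const results = await Promise.all(chunks.map((chunk, i) => translate(chunk, i)));
  return results.join("\n");
}
```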

Limitations

The current implementation of Foundry Local and the ContosoMedical sample operates under the following limitations:

  • Private Preview: Foundry Local is currently in Private Preview and may be subject to feature changes, limited availability, or temporary instability.
  • Model Availability: Not all language models are available in the Private Preview. The specific model required for your scenario may not yet be supported.
  • No Embedding Model Support: Embedding models are not yet supported. Features such as semantic search, document retrieval, or similarity-based indexing are unavailable in this release.
  • No Concurrency Support: Concurrent inference requests to Foundry Local are not yet supported. Requests are processed sequentially, and parallel execution across multiple endpoints must be managed at the application level.
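Since each endpoint handles one request at a time, a client can serialize its own calls per endpoint instead of sending overlapping requests. This TypeScript sketch uses a promise chain as a queue; SerialQueue and queueFor are hypothetical names, not part of Foundry Local or the sample.

```typescript
// Runs enqueued tasks one after another, in submission order.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Start task after all previously enqueued tasks settle; a failed task
  // rejects its own promise but does not block the tasks behind it.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => task());
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// One queue per endpoint URL.
const queues = new Map<string, SerialQueue>();

function queueFor(endpoint: string): SerialQueue {
  let queue = queues.get(endpoint);
  if (!queue) {
    queue = new SerialQueue();
    queues.set(endpoint, queue);
  }
  return queue;
}
```

Calls for different endpoints still run in parallel; only calls that share an endpoint wait for each other.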

Future Work

As Foundry Local continues to evolve, future updates to these samples will explore additional capabilities on Windows Server, including agentic workflows, containerization, advanced model integrations, and performance improvements.

Community contributions and feedback are highly encouraged and greatly appreciated.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
