
Ollama with .NET – Run AI Models Locally Using C# and Docker
Artificial Intelligence is everywhere, but when we want to integrate it into an application, the first option that comes to mind is usually a cloud API. While that works, it comes with three common drawbacks:
- Ongoing costs
- Latency due to network round trips
- Privacy concerns because data leaves your machine
What if we could run modern AI models directly on our local machine, with no API keys and full control over our data?
That’s exactly what Ollama provides. And the best part? It integrates smoothly with .NET.
In this article, let’s look at what Ollama is, why it matters, and how you can set it up and connect it with .NET — including Docker integration.
What is Ollama?
Ollama is an open-source runtime for large language models (LLMs). It allows you to pull and run AI models locally, without needing to rely on external cloud providers.
Some of the models you can run with Ollama include:
- gpt-oss
- deepseek-r1
- gemma3
- llama3.2
- qwen3
Why Ollama is useful
- Privacy – Data never leaves your machine
- Low latency – Faster responses since there’s no network overhead
- No API fees – Experiment freely without cloud billing
- Flexibility – Choose and run different models as needed
For enterprise environments, this is critical: sensitive data can stay in-house while still benefiting from AI.
Why Ollama Matters for .NET Developers
Most Ollama examples online use Python. But many of us work daily with C#, ASP.NET Core, and enterprise applications.
With the Microsoft.Extensions.AI ecosystem and libraries like OllamaSharp, we can now:
- Build AI copilots for developers inside .NET tools.
- Integrate AI into ASP.NET Core without external services.
- Run local prototypes with no risk of leaking sensitive data.
- Deploy enterprise apps that meet compliance requirements.
In short: Ollama makes AI accessible and private for .NET developers.
Setting Up Ollama
Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Pull a model
Once you’ve installed Ollama, the next step is to download a model. For example, to pull Llama 3.2:
ollama pull llama3.2
You can find all available models in the Ollama model library.
Step 3: Run Ollama
To run and chat with Llama 3.2:
ollama run llama3.2
By default, Ollama runs a local API at http://localhost:11434.
Note: Keep in mind that larger models require more memory to run smoothly – plan for at least 8 GB of RAM when working with 7B models, around 16 GB for 13B models, and roughly 32 GB if you want to run 33B models locally.
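The API at this address can be queried directly, which is a handy sanity check once you start calling Ollama from code. For example, GET /api/tags returns the models you have pulled locally. A minimal C# sketch of that check (you could equally use curl or a browser):

using System.Net.Http;

// Quick connectivity check against the local Ollama API.
// GET /api/tags returns a JSON list of the locally available models.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434/") };
var json = await http.GetStringAsync("api/tags");
Console.WriteLine(json);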
Connect .NET to Ollama
Now that Ollama is up and running, let’s see how to build a simple .NET console app that can talk to it.
Step 1: Create a Console Project
Open a terminal and create a new console application:
dotnet new console -n OllamaConsoleApp
cd OllamaConsoleApp
Step 2: Add the Required NuGet Packages
We need two NuGet packages:
- Microsoft.Extensions.AI – provides the core abstractions for working with AI in .NET.
- OllamaSharp – acts as the connector between .NET and the Ollama service.
Install them using:
dotnet add package Microsoft.Extensions.AI
dotnet add package OllamaSharp
Note: You might come across Microsoft.Extensions.AI.Ollama, but that package is no longer maintained. The recommended approach is to use OllamaSharp instead.
Step 3: Write the Chat Program
Open Program.cs and replace the default code with this example:
using Microsoft.Extensions.AI;
using OllamaSharp;

// Create an Ollama client for the "llama3.2" model
IChatClient chatClient = new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "llama3.2");

// Keep track of chat history
List<ChatMessage> chatHistory = new();

Console.WriteLine("Type 'exit' to quit");
Console.WriteLine();

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();

    if (string.IsNullOrWhiteSpace(userInput))
    {
        continue;
    }

    if (string.Equals(userInput, "exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user message to the history
    chatHistory.Add(new ChatMessage(ChatRole.User, userInput));

    Console.Write("Assistant: ");
    var assistantResponse = "";

    // Stream the AI response in real time
    await foreach (var update in chatClient.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(update.Text);
        assistantResponse += update.Text;
    }

    // Add the assistant's reply so the next turn has full context
    chatHistory.Add(new ChatMessage(ChatRole.Assistant, assistantResponse));
    Console.WriteLine("\n");
}
This small program does a few things:
- Keeps a rolling chat history so the assistant remembers context (see the optional system-prompt sketch after this list).
- Streams each response token by token for a real-time feel.
- Lets you type continuously until you exit.
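Since the assistant's behavior is driven entirely by the messages in chatHistory, you can also seed the conversation with a system prompt before the loop starts. A minimal sketch using the same Microsoft.Extensions.AI types as above (the prompt text itself is just an example):

// Optional: add a system message before the while loop so every reply
// follows the same instructions. ChatRole.System comes from
// Microsoft.Extensions.AI; the wording of the prompt is illustrative.
chatHistory.Add(new ChatMessage(ChatRole.System,
    "You are a concise assistant that answers questions about .NET."));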
Step 4: Run the App
Finally, make sure your Ollama service is running (with a model pulled and ready). Then start the app:
dotnet run
You’ll now have a working AI assistant running locally on .NET.
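The same client also drops into ASP.NET Core via dependency injection, which is how you would typically integrate it into a web app (as mentioned earlier). A minimal sketch, assuming a web project (dotnet new web) with the same two packages installed; the /chat route and the PromptRequest type are illustrative, not part of any library:

using Microsoft.Extensions.AI;
using OllamaSharp;

var builder = WebApplication.CreateBuilder(args);

// Register the Ollama-backed chat client in DI, same endpoint and model as the console app.
builder.Services.AddChatClient(new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "llama3.2"));

var app = builder.Build();

// Minimal endpoint: POST a prompt, return the model's full (non-streaming) reply.
app.MapPost("/chat", async (IChatClient chatClient, PromptRequest request) =>
{
    var response = await chatClient.GetResponseAsync(request.Prompt);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

record PromptRequest(string Prompt);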
Running Ollama with Docker
If you prefer to use containers, Ollama also provides a Docker image. This way you can run the service in a consistent and isolated environment.
Step 1: Start the Container
Run the following command to start Ollama, mount a volume for models, and expose the API on port 11434:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
This command runs Ollama in CPU-only mode.
Step 2: Pull a Model into the Container
With the container running, pull a model such as Llama 3.2:
docker exec -it ollama ollama pull llama3.2
Step 3: Connect from .NET
The Ollama service is now available at http://localhost:11434.
Your .NET console app can connect to this endpoint exactly the same way as before.
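Because only the endpoint matters, you may not want to hard-code the URL once Ollama can live either on the host or in a container. One simple approach is to read it from an environment variable; a small sketch, where OLLAMA_ENDPOINT is just a name chosen for this example:

using Microsoft.Extensions.AI;
using OllamaSharp;

// OLLAMA_ENDPOINT is a hypothetical variable name for this example;
// fall back to the default local API when it isn't set.
var endpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT")
               ?? "http://localhost:11434/";

IChatClient chatClient = new OllamaApiClient(new Uri(endpoint), "llama3.2");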
Note: If you want to run Ollama with GPU acceleration, refer to the official guide here: Ollama GPU with Docker
Summary
Ollama makes it possible to run advanced language models completely on your local machine, without depending on cloud providers. For .NET developers, this opens up a secure, cost-free way to experiment with AI right inside familiar C# applications.
Takeaways
- Ollama runs AI models locally – no cloud calls, no API keys.
- Integration with .NET is simple using OllamaSharp and Microsoft.Extensions.AI.
- Models vary in size and requirements – make sure your machine has enough RAM.
- Docker support makes it easy to run Ollama in a containerized environment.