
Ollama with .NET – Run AI Models Locally Using C# and Docker
Artificial Intelligence is everywhere, but when we want to integrate it into an application, the first option that comes to mind is usually a cloud API. While that works, it comes with three common drawbacks:
- Ongoing costs
- Latency due to network round trips
- Privacy concerns because data leaves your machine
What if we could run modern AI models directly on our local machine, with no API keys and full control over our data?
That’s exactly what Ollama provides. And the best part? It integrates smoothly with .NET.
In this article, let’s look at what Ollama is, why it matters, and how you can set it up and connect it with .NET — including Docker integration.
What is Ollama?
Ollama is an open-source runtime for large language models (LLMs). It allows you to pull and run AI models locally, without needing to rely on external cloud providers.
Some of the models you can run with Ollama include:
- gpt-oss
- deepseek-r1
- gemma3
- llama3.2
- qwen3
Why Ollama is useful
- Privacy – Data never leaves your machine
- Low latency – Faster responses since there’s no network overhead
- No API fees – Experiment freely without cloud billing
- Flexibility – Choose and run different models as needed
For enterprise environments, this is critical: sensitive data can stay in-house while still benefiting from AI.
Why Ollama Matters for .NET Developers
Most Ollama examples online use Python. But many of us work daily with C#, ASP.NET Core, and enterprise applications.
With the Microsoft.Extensions.AI ecosystem and libraries like OllamaSharp, we can now:
- Build AI copilots for developers inside .NET tools.
- Integrate AI into ASP.NET Core without external services.
- Run local prototypes with no risk of leaking sensitive data.
- Deploy enterprise apps that meet compliance requirements.
In short: Ollama makes AI accessible and private for .NET developers.
Setting Up Ollama
Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Pull a model
Once you’ve installed Ollama, the next step is to download a model. For example, to pull Llama 3.2:
ollama pull llama3.2
You can find all available models in the Ollama model library.
Step 3: Run Ollama
To run and chat with Llama 3.2:
ollama run llama3.2
By default, Ollama runs a local API at http://localhost:11434.
Note: Keep in mind that larger models require more memory to run smoothly – plan for at least 8 GB of RAM when working with 7B models, around 16 GB for 13B models, and roughly 32 GB if you want to run 33B models locally.
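The API at this address can be queried directly, which is a handy sanity check once you start calling Ollama from code. For example, GET /api/tags returns the models you have pulled locally. A minimal C# sketch of that check (you could equally use curl or a browser):

using System.Net.Http;

// Quick connectivity check against the local Ollama API.
// GET /api/tags returns a JSON list of the locally available models.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434/") };
var json = await http.GetStringAsync("api/tags");
Console.WriteLine(json);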
Connect .NET to Ollama
Now that Ollama is up and running, let’s see how to build a simple .NET console app that can talk to it.
Step 1: Create a Console Project
Open a terminal and create a new console application:
dotnet new console -n OllamaConsoleApp
cd OllamaConsoleApp
Step 2: Add the Required NuGet Packages
We need two NuGet packages:
- Microsoft.Extensions.AI – provides the core abstractions for working with AI in .NET.
- OllamaSharp – acts as the connector between .NET and the Ollama service.
Install them using:
dotnet add package Microsoft.Extensions.AI
dotnet add package OllamaSharp
Note: You might come across Microsoft.Extensions.AI.Ollama, but that package is no longer maintained. The recommended approach is to use OllamaSharp instead.
Step 3: Write the Chat Program
Open Program.cs and replace the default code with this example:
using Microsoft.Extensions.AI;
using OllamaSharp;

// Create an Ollama client for the "llama3.2" model
IChatClient chatClient = new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "llama3.2");

// Keep track of chat history
List<ChatMessage> chatHistory = new();

Console.WriteLine("Type 'exit' to quit");
Console.WriteLine();

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();

    if (string.IsNullOrWhiteSpace(userInput))
    {
        continue;
    }

    if (string.Equals(userInput, "exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user message to the history
    chatHistory.Add(new ChatMessage(ChatRole.User, userInput));

    Console.Write("Assistant: ");
    var assistantResponse = "";

    // Stream the AI response in real time
    await foreach (var update in chatClient.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(update.Text);
        assistantResponse += update.Text;
    }

    // Add the assistant's reply so the next turn has full context
    chatHistory.Add(new ChatMessage(ChatRole.Assistant, assistantResponse));
    Console.WriteLine("\n");
}
This small program does a few things:
- Keeps a rolling chat history so the assistant remembers context (see the optional system-prompt sketch after this list).
- Streams each response token by token for a real-time feel.
- Lets you type continuously until you exit.
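Since the assistant's behavior is driven entirely by the messages in chatHistory, you can also seed the conversation with a system prompt before the loop starts. A minimal sketch using the same Microsoft.Extensions.AI types as above (the prompt text itself is just an example):

// Optional: add a system message before the while loop so every reply
// follows the same instructions. ChatRole.System comes from
// Microsoft.Extensions.AI; the wording of the prompt is illustrative.
chatHistory.Add(new ChatMessage(ChatRole.System,
    "You are a concise assistant that answers questions about .NET."));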
Step 4: Run the App
Finally, make sure your Ollama service is running (with a model pulled and ready). Then start the app:
dotnet run
You’ll now have a working AI assistant running locally on .NET.
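The same client also drops into ASP.NET Core via dependency injection, which is how you would typically integrate it into a web app (as mentioned earlier). A minimal sketch, assuming a web project (dotnet new web) with the same two packages installed; the /chat route and the PromptRequest type are illustrative, not part of any library:

using Microsoft.Extensions.AI;
using OllamaSharp;

var builder = WebApplication.CreateBuilder(args);

// Register the Ollama-backed chat client in DI, same endpoint and model as the console app.
builder.Services.AddChatClient(new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "llama3.2"));

var app = builder.Build();

// Minimal endpoint: POST a prompt, return the model's full (non-streaming) reply.
app.MapPost("/chat", async (IChatClient chatClient, PromptRequest request) =>
{
    var response = await chatClient.GetResponseAsync(request.Prompt);
    return Results.Ok(new { reply = response.Text });
});

app.Run();

record PromptRequest(string Prompt);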
Running Ollama with Docker
If you prefer to use containers, Ollama also provides a Docker image. This way you can run the service in a consistent and isolated environment.
Step 1: Start the Container
Run the following command to start Ollama, mount a volume for models, and expose the API on port 11434:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
This command runs Ollama in CPU-only mode.
Step 2: Pull a Model into the Container
With the container running, pull a model such as Llama 3.2:
docker exec -it ollama ollama pull llama3.2
Step 3: Connect from .NET
The Ollama service is now available at http://localhost:11434.
Your .NET console app can connect to this endpoint exactly the same way as before.
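Because only the endpoint matters, you may not want to hard-code the URL once Ollama can live either on the host or in a container. One simple approach is to read it from an environment variable; a small sketch, where OLLAMA_ENDPOINT is just a name chosen for this example:

using Microsoft.Extensions.AI;
using OllamaSharp;

// OLLAMA_ENDPOINT is a hypothetical variable name for this example;
// fall back to the default local API when it isn't set.
var endpoint = Environment.GetEnvironmentVariable("OLLAMA_ENDPOINT")
               ?? "http://localhost:11434/";

IChatClient chatClient = new OllamaApiClient(new Uri(endpoint), "llama3.2");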
Note: If you want to run Ollama with GPU acceleration, refer to the official guide here: Ollama GPU with Docker
Summary
Ollama makes it possible to run advanced language models completely on your local machine, without depending on cloud providers. For .NET developers, this opens up a secure, cost-free way to experiment with AI right inside familiar C# applications.
Takeaways
- Ollama runs AI models locally – no cloud calls, no API keys.
- Integration with .NET is simple using OllamaSharp and Microsoft.Extensions.AI.
- Models vary in size and requirements – make sure your machine has enough RAM.
- Docker support makes it easy to run Ollama in a containerized environment.