Chat and Generate
OllamaSharp exposes two main ways to interact with a language model: the high-level Chat class and the lower-level GenerateAsync / ChatAsync methods on OllamaApiClient.
The Chat class
The Chat class is the recommended starting point for most conversational use cases. It automatically maintains a message history across turns so the model has full context of the conversation.
Basic chat loop
var ollama = new OllamaApiClient("http://localhost:11434", "qwen3.5:35b-a3b");
var chat = new Chat(ollama);
while (true)
{
Console.Write("You: ");
var message = Console.ReadLine()!;
Console.Write("Assistant: ");
await foreach (var token in chat.SendAsync(message))
Console.Write(token);
Console.WriteLine();
}
Every call to SendAsync appends the user message and the model's reply to chat.Messages, so subsequent turns automatically include the conversation history.
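This accumulation can be seen over two turns (a sketch; the actual replies depend on the model):

```csharp
var chat = new Chat(ollama);

await foreach (var token in chat.SendAsync("Hi, my name is Alex."))
    Console.Write(token);

// chat.Messages now holds the user turn and the assistant reply

await foreach (var token in chat.SendAsync("What is my name?"))
    Console.Write(token);

// Both earlier messages were sent along with the new question,
// so the model can answer "Alex" from context
Console.WriteLine($"{chat.Messages.Count} messages in history");
```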
System prompts
Pass a system prompt to the constructor to give the model a persona or set behavioural constraints:
var chat = new Chat(ollama, "You are a helpful assistant that only answers questions about cooking.");
await foreach (var token in chat.SendAsync("How do I make pasta carbonara?"))
Console.Write(token);
Overriding the model per chat
By default the Chat uses the client's SelectedModel, but you can override it per instance:
var chat = new Chat(ollama)
{
Model = "deepseek-r1:14b"
};
Accessing the message history
The full conversation is stored in chat.Messages and can be inspected or serialised at any time:
foreach (var msg in chat.Messages)
Console.WriteLine($"[{msg.Role}] {msg.Content}");
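Because the history is a plain list of messages, it can also be persisted and restored between sessions. A sketch using System.Text.Json, assuming Messages is a mutable list as the examples above suggest (whether Message round-trips cleanly depends on your serializer settings):

```csharp
using System.Text.Json;

// Persist the conversation
var json = JsonSerializer.Serialize(chat.Messages);
await File.WriteAllTextAsync("conversation.json", json);

// Later: restore it into a fresh Chat instance
var restored = JsonSerializer.Deserialize<List<Message>>(
    await File.ReadAllTextAsync("conversation.json"));

var newChat = new Chat(ollama);
if (restored is not null)
    newChat.Messages.AddRange(restored);
```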
Sending images (multi-modal models)
Vision models such as qwen3.5:35b-a3b accept images alongside text. Pass image data as raw bytes:
var ollama = new OllamaApiClient("http://localhost:11434", "qwen3.5:35b-a3b");
var chat = new Chat(ollama);
var imageBytes = await File.ReadAllBytesAsync("photo.jpg");
await foreach (var token in chat.SendAsync("What do you see in this image?", [imageBytes]))
Console.Write(token);
Or as Base64-encoded strings if that is more convenient:
var base64 = Convert.ToBase64String(imageBytes);
await foreach (var token in chat.SendAsync("Describe the image", [base64]))
Console.Write(token);
Structured / JSON output
Ask the model to respond with valid JSON by passing "json" as the format argument, or pass a JSON Schema object:
await foreach (var token in chat.SendAsync(
"List the capitals of France, Germany and Italy as JSON",
tools: null,
imagesAsBase64: null,
format: "json"))
{
Console.Write(token);
}
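Since the tokens stream in as fragments, collect them before parsing. A sketch using System.Text.Json (the exact shape of the JSON depends on the model):

```csharp
using System.Text;
using System.Text.Json;

var sb = new StringBuilder();
await foreach (var token in chat.SendAsync(
    "List the capitals of France, Germany and Italy as JSON",
    tools: null,
    imagesAsBase64: null,
    format: "json"))
{
    sb.Append(token);
}

// The accumulated string is valid JSON and can be parsed as usual
using var doc = JsonDocument.Parse(sb.ToString());
Console.WriteLine(doc.RootElement);
```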
Sending a message in a specific role
SendAsAsync lets you inject messages under any role — useful for priming the conversation or simulating prior turns:
// Inject a previous assistant turn before the user speaks
await foreach (var token in chat.SendAsAsync(ChatRole.Assistant, "I already know your name is Alex."))
Console.Write(token);
await foreach (var token in chat.SendAsAsync(ChatRole.User, "What is my name?"))
Console.Write(token);
Thinking / reasoning models
For reasoning models (e.g. deepseek-r1, qwen3, phi4-reasoning) you can request "think tokens". The model's internal reasoning is surfaced through the OnThink event and kept separate from the visible answer.
Basic boolean mode
Set Think to true to enable thinking:
var chat = new Chat(ollama) { Think = true };
chat.OnThink += (_, thoughts) => Console.Write($"[thinking] {thoughts}");
await foreach (var token in chat.SendAsync("What is the square root of 144?"))
Console.Write(token);
Thinking budget levels
The Think property accepts a ThinkValue struct that also supports budget levels to control how much reasoning the model performs:
// Pick one of the predefined budget levels
var chat = new Chat(ollama) { Think = ThinkValue.High }; // maximum reasoning effort
// or ThinkValue.Medium for balanced reasoning
// or ThinkValue.Low for minimal reasoning

Note
Not all models support budget levels. See the Ollama release notes for supported models.
Events
The Chat class exposes events so you can monitor what happens during a conversation:
| Event | Argument | Fires when |
|---|---|---|
| OnThink | string | The model emits thinking/reasoning tokens |
| OnToolCall | Message.ToolCall | The model requests a tool invocation |
| OnToolResult | ToolResult | A tool invocation has completed and produced a result |
var chat = new Chat(ollama);
chat.OnThink += (_, thoughts) => Console.Write($"[thinking] {thoughts}");
chat.OnToolCall += (_, call) => Console.WriteLine($"[calling tool] {call.Function?.Name}");
chat.OnToolResult += (_, result) => Console.WriteLine($"[tool result] {result.Result}");
Model-level options
Fine-tune inference parameters via the Options property:
var chat = new Chat(ollama)
{
Options = new RequestOptions
{
Temperature = 0.7f,
TopP = 0.9f,
NumCtx = 4096,
}
};
Using tools (function calling)
Many models support tools. See the Tool Support page for a detailed walkthrough. The short version is:
// Define a tool with the [OllamaTool] attribute (requires source generator)
public class MyTools
{
/// <summary>Gets the current weather for a city.</summary>
/// <param name="city">Name of the city</param>
[OllamaTool]
public static string GetWeather(string city) => $"Sunny and 22°C in {city}.";
}
// Pass tool instances alongside the message.
// The source generator emits a GetWeatherTool class for the annotated GetWeather method.
var chat = new Chat(ollama);
await foreach (var token in chat.SendAsync("What's the weather in Berlin?", [new GetWeatherTool()]))
Console.Write(token);
Tool calls and their results are fed back into chat.Messages automatically.
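You can verify this by inspecting the history after the call: tool results appear as messages with the Tool role (a sketch, continuing the weather example above):

```csharp
foreach (var msg in chat.Messages)
{
    if (msg.Role == ChatRole.Tool)
        Console.WriteLine($"[tool message] {msg.Content}");
    else
        Console.WriteLine($"[{msg.Role}] {msg.Content}");
}
```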
GenerateAsync — single-turn completions
GenerateAsync maps directly to the /api/generate Ollama endpoint. Unlike Chat, it does not maintain history between calls — each call is self-contained.
Streaming a completion to the console
var ollama = new OllamaApiClient("http://localhost:11434", "qwen3.5:35b-a3b");
await foreach (var chunk in ollama.GenerateAsync("Why is the sky blue?"))
Console.Write(chunk.Response);
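If you don't need token-by-token output, the stream can be collected into a single final response, assuming the StreamToEndAsync extension that ships with OllamaSharp:

```csharp
var done = await ollama.GenerateAsync("Why is the sky blue?").StreamToEndAsync();
Console.WriteLine(done?.Response);
```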
Providing context manually
If you need multi-turn behaviour without the Chat class you can pass the context tokens returned by a previous response:
GenerateDoneResponseStream? lastResponse = null;
await foreach (var chunk in ollama.GenerateAsync("Tell me a joke"))
{
Console.Write(chunk?.Response);
if (chunk is GenerateDoneResponseStream done)
lastResponse = done;
}
// Use the context from the previous turn
var request = new GenerateRequest
{
Prompt = "Explain why that was funny",
Context = lastResponse?.Context,
};
await foreach (var chunk in ollama.GenerateAsync(request))
Console.Write(chunk?.Response);
Tip
The Context property is only available on GenerateDoneResponseStream (the final chunk), not on every streamed chunk. Use pattern matching to capture it as shown above.
Generating with an image
var imageBytes = await File.ReadAllBytesAsync("chart.png");
var request = new GenerateRequest
{
Prompt = "Summarise this chart",
Images = [Convert.ToBase64String(imageBytes)],
};
await foreach (var chunk in ollama.GenerateAsync(request))
Console.Write(chunk?.Response);
Note
GenerateRequest.Images expects Base64-encoded strings, not raw byte arrays. Use Convert.ToBase64String() to convert your image bytes.
ChatAsync — low-level chat
ChatAsync maps directly to the /api/chat Ollama endpoint and gives full control over the request; the Chat class uses it internally. Prefer the Chat class unless you need that level of control.
var request = new ChatRequest
{
Model = "qwen3.5:35b-a3b",
Stream = true,
Messages =
[
new Message(ChatRole.System, "You are a concise assistant."),
new Message(ChatRole.User, "What is the capital of France?"),
],
};
await foreach (var chunk in ollama.ChatAsync(request))
Console.Write(chunk?.Message.Content);
Generating embeddings
Use EmbedAsync to produce vector embeddings for semantic search, clustering and similar tasks:
var ollama = new OllamaApiClient("http://localhost:11434", "nomic-embed-text");
var response = await ollama.EmbedAsync("The quick brown fox");
float[] vector = response.Embeddings[0];
Multiple inputs can be embedded in a single round-trip:
var response = await ollama.EmbedAsync(new EmbedRequest
{
Model = "nomic-embed-text",
Input = ["First sentence", "Second sentence"],
});
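Embeddings are typically compared with cosine similarity. A minimal helper (not part of OllamaSharp), applied to the two sentences above:

```csharp
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Values closer to 1 mean the inputs are semantically more similar
var similarity = CosineSimilarity(response.Embeddings[0], response.Embeddings[1]);
Console.WriteLine($"Similarity: {similarity:F3}");
```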