# Ollama Provider
The Ollama provider implements the `genai.Provider` interface for Ollama, an open-source tool for running LLMs locally. It works by wrapping the OpenAI provider, since Ollama (v0.1.24+) exposes an OpenAI-compatible API at `/v1/chat/completions`.
## Quick Start
```go
package main

import (
    "context"
    "fmt"

    "oss.nandlabs.io/golly/genai"
    "oss.nandlabs.io/golly/genai/impl"
)

func main() {
    provider := impl.NewOllamaProvider(nil)
    defer provider.Close()

    msg := genai.NewTextMessage(genai.RoleUser, "Hello! What is Go?")
    resp, err := provider.Generate(context.Background(), "llama3", msg, nil)
    if err != nil {
        panic(err)
    }
    for _, c := range resp.Candidates {
        for _, p := range c.Message.Parts {
            if p.Text != nil {
                fmt.Println(p.Text.Text)
            }
        }
    }
}
```

## Prerequisites
- Install Ollama: https://ollama.com/download
- Pull a model:

  ```bash
  ollama pull llama3
  ```

- Start Ollama (if not running as a service):

  ```bash
  ollama serve
  ```

The default endpoint is `http://localhost:11434`.
## Configuration

### Simple Constructor

```go
// Connects to http://localhost:11434/v1 with no authentication
provider := impl.NewOllamaProvider(nil)
```

### Full Configuration
```go
import (
    "oss.nandlabs.io/golly/genai/impl"
    "oss.nandlabs.io/golly/rest"
)

provider := impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:        nil, // no auth for local
    BaseURL:     "http://localhost:11434/v1",
    Models:      []string{"llama3", "mistral", "codellama", "llava"},
    Description: "Local Ollama instance",
    Version:     "1.0.0",
    ExtraHeaders: map[string]string{
        "X-Request-Source": "dev-machine",
    },
}, &rest.ClientOpts{
    // Optional: configure timeouts for long-running local inference
})
```

### Config Reference
| Field | Type | Default | Description |
|---|---|---|---|
| `Auth` | `clients.AuthProvider` | `nil` | Authentication provider (`nil` for local; set when behind a proxy) |
| `BaseURL` | `string` | `http://localhost:11434/v1` | Ollama OpenAI-compatible API URL |
| `Models` | `[]string` | `nil` | List of available model IDs (informational) |
| `Description` | `string` | `"Ollama provider for local model inference..."` | Provider description |
| `Version` | `string` | `"1.0.0"` | Provider version |
| `ExtraHeaders` | `map[string]string` | `nil` | Additional HTTP headers sent with every request |
## Authentication

### Local (No Auth)

No credentials are needed by default:

```go
provider := impl.NewOllamaProvider(nil)
```

### Behind a Reverse Proxy
When Ollama is deployed behind nginx, Caddy, or another proxy with authentication:
```go
// HTTP Basic Auth proxy
provider := impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:    clients.NewBasicAuth("username", "password"),
    BaseURL: "https://ollama.internal.company.com/v1",
}, nil)

// Bearer token proxy
provider = impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:    clients.NewBearerAuth("my-proxy-token"),
    BaseURL: "https://ollama.internal.company.com/v1",
}, nil)
```

### Behind an API Gateway
When Ollama is fronted by Kong, AWS API Gateway, or similar:
```go
// API key gateway
provider := impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:    clients.NewAPIKeyAuth("X-API-Key", "my-gateway-key"),
    BaseURL: "https://api.company.com/ollama/v1",
}, nil)

// OAuth2 gateway
oauth := rest.NewOAuth2Provider(
    "https://auth.company.com/oauth/token",
    "client-id", "client-secret",
    "openid", // scopes
)
provider = impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:    oauth,
    BaseURL: "https://api.company.com/ollama/v1",
}, nil)
```

## Supported Options
Since Ollama uses the OpenAI-compatible API, it supports the same options as the OpenAI provider. However, actual support depends on the model:
| GenAI Option | Parameter | Type | Notes |
|---|---|---|---|
| `OptionMaxTokens` | `max_tokens` | `int` | Maximum tokens in the response |
| `OptionTemperature` | `temperature` | `float32` | Sampling temperature |
| `OptionTopP` | `top_p` | `float32` | Nucleus sampling |
| `OptionSeed` | `seed` | `int` | Deterministic output |
| `OptionStopWords` | `stop` | `[]string` | Stop sequences |
| `OptionFrequencyPenalty` | `frequency_penalty` | `float64` | Penalises frequent tokens |
| `OptionPresencePenalty` | `presence_penalty` | `float64` | Penalises repeated tokens |
| `OptionSystemInstructions` | `messages[0]` (system) | `string` | Prepended as a system message |
`llava` supports image inputs; text-only models such as `llama3` do not.

## Generating Responses
### Basic Generation

```go
msg := genai.NewTextMessage(genai.RoleUser, "Write a haiku about Go programming.")
opts := genai.NewOptionsBuilder().
    SetMaxTokens(256).
    SetTemperature(0.8).
    Build()
resp, err := provider.Generate(ctx, "llama3", msg, opts)
```

### System Instructions
```go
opts := genai.NewOptionsBuilder().
    SetMaxTokens(1024).
    Build()
opts.Set(genai.OptionSystemInstructions, "You are a Linux sysadmin. Respond with shell commands and brief explanations.")

msg := genai.NewTextMessage(genai.RoleUser, "How do I find large files on disk?")
resp, err := provider.Generate(ctx, "llama3", msg, opts)
```

### Streaming
```go
msg := genai.NewTextMessage(genai.RoleUser, "Explain the differences between goroutines and threads.")
opts := genai.NewOptionsBuilder().SetMaxTokens(2048).Build()

respCh, errCh := provider.GenerateStream(ctx, "llama3", msg, opts)
for resp := range respCh {
    for _, c := range resp.Candidates {
        for _, p := range c.Message.Parts {
            if p.Text != nil {
                fmt.Print(p.Text.Text) // prints token by token
            }
        }
    }
}
if err := <-errCh; err != nil {
    // handle streaming error
}
```

### Multi-Modal (Vision)
Use a vision-capable model such as `llava`:

```go
msg := genai.NewTextMessage(genai.RoleUser, "What's in this image?")
genai.AddBinPart(msg, "photo", jpegBytes, "image/jpeg")
resp, err := provider.Generate(ctx, "llava", msg, nil)
```

## Architecture
```
┌──────────────────────┐
│  OllamaProvider      │  wraps
│  ──────────────────► │  OpenAIProvider
│  Name() = "ollama"   │  (full implementation)
└──────────────────────┘
           │
           │ HTTP POST to /v1/chat/completions
           ▼
┌──────────────────────┐
│  Ollama Server       │
│  localhost:11434     │
│                      │
│  ┌────────────────┐  │
│  │ llama3         │  │
│  │ mistral        │  │
│  │ codellama      │  │
│  │ llava          │  │
│  └────────────────┘  │
└──────────────────────┘
```

The `OllamaProvider` embeds `OpenAIProvider` and overrides only `Name()`. All `Generate` and `GenerateStream` calls are delegated to the OpenAI implementation, which makes requests to Ollama's OpenAI-compatible endpoint.
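The delegation pattern above is plain Go struct embedding: the outer type inherits every promoted method and overrides just one. A toy sketch under assumed names (`ToyOpenAI` and `ToyOllama` are illustrative stand-ins, not golly types):

```go
package main

import "fmt"

// ToyOpenAI stands in for the full OpenAIProvider implementation.
type ToyOpenAI struct{}

func (p *ToyOpenAI) Name() string { return "openai" }

func (p *ToyOpenAI) Generate(model, prompt string) string {
    return fmt.Sprintf("POST /v1/chat/completions model=%s prompt=%q", model, prompt)
}

// ToyOllama embeds ToyOpenAI and overrides only Name().
// Generate is promoted from the embedded type, so calls delegate to it.
type ToyOllama struct {
    ToyOpenAI
}

func (p *ToyOllama) Name() string { return "ollama" }

func main() {
    p := &ToyOllama{}
    fmt.Println(p.Name())                      // "ollama"
    fmt.Println(p.Generate("llama3", "Hello")) // delegated to ToyOpenAI
}
```

This keeps the Ollama provider tiny: only identity differs, while request construction, streaming, and error handling live in one place.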
## Popular Models

| Model | Size | Use Case | Vision |
|---|---|---|---|
| `llama3` | 8B | General purpose | No |
| `llama3:70b` | 70B | High quality, slower | No |
| `mistral` | 7B | Fast, good quality | No |
| `codellama` | 7B–34B | Code generation | No |
| `llava` | 7B–13B | Vision + text | Yes |
| `phi3` | 3.8B | Small, fast | No |
| `gemma2` | 9B/27B | Google's open model | No |
| `deepseek-coder` | 6.7B–33B | Code generation | No |
| `qwen2` | 7B–72B | Multilingual | No |
Pull models with:

```bash
ollama pull llama3
ollama pull llava
ollama pull codellama
```

## Error Handling
```go
resp, err := provider.Generate(ctx, "llama3", msg, opts)
if err != nil {
    // Common errors:
    // - "openai API request failed: ..." -> Ollama not running or unreachable
    // - "openai API error [...]"        -> model not found or invalid request
    log.Fatal(err)
}
```

## Deployment Patterns
### Local Development

```go
provider := impl.NewOllamaProvider(nil)
```

### Docker
```yaml
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
```

```go
provider := impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    BaseURL: "http://ollama:11434/v1", // Docker service name
}, nil)
```

### Kubernetes with Auth Sidecar
```go
provider := impl.NewOllamaProviderWithConfig(&impl.OllamaProviderConfig{
    Auth:    clients.NewBearerAuth(os.Getenv("OLLAMA_TOKEN")),
    BaseURL: os.Getenv("OLLAMA_URL"),
}, nil)
```
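When settings come from the pod environment as above, it is worth failing fast on missing values instead of sending unauthenticated requests. A hedged sketch; the `ollamaEnv` helper is illustrative, though the variable names mirror the snippet above:

```go
package main

import (
    "fmt"
    "os"
)

// ollamaEnv collects the token and URL via the supplied lookup function,
// reporting an error if either is unset. Taking the lookup as a parameter
// keeps the function testable without touching the real environment.
func ollamaEnv(getenv func(string) string) (token, url string, err error) {
    token = getenv("OLLAMA_TOKEN")
    url = getenv("OLLAMA_URL")
    if token == "" || url == "" {
        return "", "", fmt.Errorf("OLLAMA_TOKEN and OLLAMA_URL must both be set")
    }
    return token, url, nil
}

func main() {
    token, url, err := ollamaEnv(os.Getenv)
    if err != nil {
        fmt.Println("config error:", err)
        return
    }
    fmt.Println("connecting to", url, "with token of length", len(token))
}
```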