haive.core.models.llm.providers.ollama

Ollama Provider Module.

This module implements the Ollama language model provider for the Haive framework, supporting local LLM deployment through Ollama’s model serving infrastructure.

Ollama enables running open-source LLMs locally without requiring API keys or external services, making it ideal for privacy-sensitive applications and offline deployments.

Examples

Basic usage:

from haive.core.models.llm.providers.ollama import OllamaProvider

provider = OllamaProvider(
    model="llama3",
    temperature=0.7
)
llm = provider.instantiate()
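
Assuming instantiate() returns a LangChain-compatible chat model, the llm from the snippet above can then be invoked directly (the prompt is illustrative):

response = llm.invoke("Summarize what Ollama does in one sentence.")
print(response.content)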

With custom server:

provider = OllamaProvider(
    model="mixtral",
    base_url="http://gpu-server:11434",
    num_gpu=2
)


Classes

OllamaProvider

Ollama local language model provider configuration.

Module Contents

class haive.core.models.llm.providers.ollama.OllamaProvider(/, **data)[source]

Bases: haive.core.models.llm.providers.base.BaseLLMProvider

Ollama local language model provider configuration.

This provider supports running open-source LLMs locally through Ollama, including Llama 3, Mistral, Mixtral, and many other models. It requires a running Ollama server but no API keys.

Parameters:
  • data (Any)

  • requests_per_second (float | None)

  • tokens_per_second (int | None)

  • tokens_per_minute (int | None)

  • max_retries (int)

  • retry_delay (float)

  • check_every_n_seconds (float | None)

  • burst_size (int | None)

  • provider (LLMProvider)

  • model (str | None)

  • name (str | None)

  • api_key (SecretStr)

  • cache_enabled (bool)

  • cache_ttl (int | None)

  • extra_params (dict[str, Any] | None)

  • debug (bool)

  • base_url (str)

  • temperature (float | None)

  • num_predict (int | None)

  • top_p (float | None)

  • top_k (int | None)

  • repeat_penalty (float | None)

  • seed (int | None)

  • num_gpu (int | None)

  • num_thread (int | None)

provider

Always LLMProvider.OLLAMA

model

Model name (default: “llama3”)

base_url

Ollama server URL (default: “http://localhost:11434”)

temperature

Sampling temperature (0-1)

num_predict

Maximum tokens to generate

top_p

Nucleus sampling parameter

top_k

Top-k sampling parameter

repeat_penalty

Repetition penalty

seed

Random seed for reproducibility

num_gpu

Number of GPUs to use

num_thread

Number of CPU threads
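
The sampling fields above can be combined for reproducible, low-variance output. A minimal sketch (parameter values are illustrative, not recommendations):

provider = OllamaProvider(
    model="llama3",
    temperature=0.2,
    top_p=0.9,
    top_k=40,
    seed=42,          # fixed seed for repeatable sampling
    num_predict=512,  # cap the number of generated tokens
)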

Environment Variables:

  • OLLAMA_BASE_URL: Server URL (default: http://localhost:11434)

  • OLLAMA_NUM_GPU: Default number of GPUs to use
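
A minimal sketch of configuring the provider through these variables, assuming they are read when no explicit value is passed:

import os

os.environ["OLLAMA_BASE_URL"] = "http://gpu-server:11434"
os.environ["OLLAMA_NUM_GPU"] = "2"

provider = OllamaProvider(model="llama3")  # assumed to pick up the variables above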

Popular Models:
  • llama3: Meta’s Llama 3 (8B, 70B)

  • mistral: Mistral 7B

  • mixtral: Mixtral 8x7B MoE

  • codellama: Code-specialized Llama

  • phi3: Microsoft’s Phi-3

  • gemma: Google’s Gemma

  • qwen: Alibaba’s Qwen

Examples

Running Llama 3 locally:

provider = OllamaProvider(
    model="llama3:70b",
    temperature=0.7,
    num_predict=2048
)
llm = provider.instantiate()

Using a remote Ollama server:

provider = OllamaProvider(
    model="mixtral:8x7b",
    base_url="http://192.168.1.100:11434",
    num_gpu=2,
    temperature=0.5
)

With specific hardware settings:

provider = OllamaProvider(
    model="codellama:34b",
    num_gpu=1,
    num_thread=8,
    repeat_penalty=1.1
)
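
The rate-limiting and retry fields inherited from BaseLLMProvider (listed under Parameters above) can be set the same way. A sketch, assuming instantiate() honors them:

provider = OllamaProvider(
    model="llama3",
    requests_per_second=2.0,  # throttle calls to the local server
    burst_size=4,
    max_retries=3,
    retry_delay=1.5,
)
llm = provider.instantiate()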

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
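
Because the provider is a Pydantic model, invalid keyword arguments surface as ValidationError. A short illustrative example (the bad value is deliberate):

from pydantic import ValidationError

try:
    OllamaProvider(model="llama3", temperature="hot")  # not coercible to float
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")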

classmethod get_models()[source]

Get available Ollama models.

This attempts to connect to the local Ollama server and list the installed models. If the server is not running, a fallback list of popular model names is returned instead.

Returns:

List of available model names

Return type:

list[str]
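
A typical use of get_models(), checking what is installed before instantiating (model names are illustrative):

available = OllamaProvider.get_models()

model = "llama3" if "llama3" in available else available[0]
provider = OllamaProvider(model=model)
llm = provider.instantiate()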