haive.core.models.llm.providers.ollama

Ollama Provider Module.

This module implements the Ollama language model provider for the Haive framework, supporting local LLM deployment through Ollama’s model serving infrastructure.

Ollama enables running open-source LLMs locally without requiring API keys or external services, making it ideal for privacy-sensitive applications and offline deployments.

Examples

Basic usage:

from haive.core.models.llm.providers.ollama import OllamaProvider

provider = OllamaProvider(
    model="llama3",
    temperature=0.7
)
llm = provider.instantiate()
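
Assuming instantiate() returns a LangChain-compatible chat model, the llm from the snippet above can then be invoked directly (the prompt is illustrative):

response = llm.invoke("Summarize what Ollama does in one sentence.")
print(response.content)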

With custom server:

provider = OllamaProvider(
    model="mixtral",
    base_url="http://gpu-server:11434",
    num_gpu=2
)


Classes

OllamaProvider

Ollama local language model provider configuration.

Module Contents

class haive.core.models.llm.providers.ollama.OllamaProvider(/, **data)[source]

Bases: haive.core.models.llm.providers.base.BaseLLMProvider

Ollama local language model provider configuration.

This provider supports running open-source LLMs locally through Ollama, including Llama 3, Mistral, Mixtral, and many other models. It requires a running Ollama server but no API keys.

Parameters:
  • data (Any)

  • requests_per_second (float | None)

  • tokens_per_second (int | None)

  • tokens_per_minute (int | None)

  • max_retries (int)

  • retry_delay (float)

  • check_every_n_seconds (float | None)

  • burst_size (int | None)

  • provider (LLMProvider)

  • model (str | None)

  • name (str | None)

  • api_key (SecretStr)

  • cache_enabled (bool)

  • cache_ttl (int | None)

  • extra_params (dict[str, Any] | None)

  • debug (bool)

  • base_url (str)

  • temperature (float | None)

  • num_predict (int | None)

  • top_p (float | None)

  • top_k (int | None)

  • repeat_penalty (float | None)

  • seed (int | None)

  • num_gpu (int | None)

  • num_thread (int | None)

provider

Always LLMProvider.OLLAMA

model

Model name (default: “llama3”)

base_url

Ollama server URL (default: “http://localhost:11434”)

temperature

Sampling temperature (0-1)

num_predict

Maximum tokens to generate

top_p

Nucleus sampling parameter

top_k

Top-k sampling parameter

repeat_penalty

Repetition penalty

seed

Random seed for reproducibility

num_gpu

Number of GPUs to use

num_thread

Number of CPU threads
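
The sampling fields above can be combined for reproducible, low-variance output. A minimal sketch (parameter values are illustrative, not recommendations):

provider = OllamaProvider(
    model="llama3",
    temperature=0.2,
    top_p=0.9,
    top_k=40,
    seed=42,          # fixed seed for repeatable sampling
    num_predict=512,  # cap the number of generated tokens
)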

Environment Variables:

  • OLLAMA_BASE_URL: Server URL (default: http://localhost:11434)

  • OLLAMA_NUM_GPU: Default number of GPUs to use
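
A minimal sketch of configuring the provider through these variables, assuming they are read when no explicit value is passed:

import os

os.environ["OLLAMA_BASE_URL"] = "http://gpu-server:11434"
os.environ["OLLAMA_NUM_GPU"] = "2"

provider = OllamaProvider(model="llama3")  # assumed to pick up the variables above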

Popular Models:
  • llama3: Meta’s Llama 3 (8B, 70B)

  • mistral: Mistral 7B

  • mixtral: Mixtral 8x7B MoE

  • codellama: Code-specialized Llama

  • phi3: Microsoft’s Phi-3

  • gemma: Google’s Gemma

  • qwen: Alibaba’s Qwen

Examples

Running Llama 3 locally:

provider = OllamaProvider(
    model="llama3:70b",
    temperature=0.7,
    num_predict=2048
)
llm = provider.instantiate()

Using a remote Ollama server:

provider = OllamaProvider(
    model="mixtral:8x7b",
    base_url="http://192.168.1.100:11434",
    num_gpu=2,
    temperature=0.5
)

With specific hardware settings:

provider = OllamaProvider(
    model="codellama:34b",
    num_gpu=1,
    num_thread=8,
    repeat_penalty=1.1
)
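
The rate-limiting and retry fields inherited from BaseLLMProvider (listed under Parameters above) can be set the same way. A sketch, assuming instantiate() honors them:

provider = OllamaProvider(
    model="llama3",
    requests_per_second=2.0,  # throttle calls to the local server
    burst_size=4,
    max_retries=3,
    retry_delay=1.5,
)
llm = provider.instantiate()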

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
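
Because the provider is a Pydantic model, invalid keyword arguments surface as ValidationError. A short illustrative example (the bad value is deliberate):

from pydantic import ValidationError

try:
    OllamaProvider(model="llama3", temperature="hot")  # not coercible to float
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")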

classmethod get_models()[source]

Get available Ollama models.

This attempts to connect to the local Ollama server and list the installed models. If the server is not running, a fallback list of popular model names is returned instead.

Returns:

List of available model names

Return type:

list[str]
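
A typical use of get_models(), checking what is installed before instantiating (model names are illustrative):

available = OllamaProvider.get_models()

model = "llama3" if "llama3" in available else available[0]
provider = OllamaProvider(model=model)
llm = provider.instantiate()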