haive.core.models.llm.providers.ollama¶
Ollama Provider Module.
This module implements the Ollama language model provider for the Haive framework, supporting local LLM deployment through Ollama’s model serving infrastructure.
Ollama enables running open-source LLMs locally without requiring API keys or external services, making it ideal for privacy-sensitive applications and offline deployments.
Examples
Basic usage:
from haive.core.models.llm.providers.ollama import OllamaProvider

provider = OllamaProvider(
    model="llama3",
    temperature=0.7
)
llm = provider.instantiate()
With custom server:
provider = OllamaProvider(
    model="mixtral",
    base_url="http://gpu-server:11434",
    num_gpu=2
)
Classes¶
OllamaProvider: Ollama local language model provider configuration.
Module Contents¶
- class haive.core.models.llm.providers.ollama.OllamaProvider(/, **data)[source]¶
Bases: haive.core.models.llm.providers.base.BaseLLMProvider
Ollama local language model provider configuration.
This provider supports running open-source LLMs locally through Ollama, including Llama 3, Mistral, Mixtral, and many other models. It requires a running Ollama server but no API keys.
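Because the provider talks to a locally running server, it can help to confirm that Ollama is reachable before instantiating. The sketch below is illustrative, assuming the standard Ollama HTTP API (GET /api/tags lists locally pulled models); the requests dependency and the URL are assumptions, not part of this module:

import requests

from haive.core.models.llm.providers.ollama import OllamaProvider

OLLAMA_URL = "http://localhost:11434"  # default Ollama server address

# Ping the server's model-listing endpoint before building the provider.
try:
    response = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
    response.raise_for_status()
    local_models = [m["name"] for m in response.json().get("models", [])]
    print(f"Ollama is up; local models: {local_models}")
except requests.RequestException as exc:
    raise RuntimeError(f"Ollama server not reachable at {OLLAMA_URL}") from exc

provider = OllamaProvider(model="llama3", base_url=OLLAMA_URL)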
- Parameters:
data (Any)
requests_per_second (float | None)
tokens_per_second (int | None)
tokens_per_minute (int | None)
max_retries (int)
retry_delay (float)
check_every_n_seconds (float | None)
burst_size (int | None)
provider (LLMProvider)
model (str | None)
name (str | None)
api_key (SecretStr)
cache_enabled (bool)
cache_ttl (int | None)
debug (bool)
base_url (str)
temperature (float | None)
num_predict (int | None)
top_p (float | None)
top_k (int | None)
repeat_penalty (float | None)
seed (int | None)
num_gpu (int | None)
num_thread (int | None)
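The inherited base-provider fields (rate limiting, retries, caching) can be combined with the Ollama-specific sampling options in a single configuration. A minimal sketch with illustrative values; the exact semantics of the rate-limit and cache fields are defined by BaseLLMProvider:

from haive.core.models.llm.providers.ollama import OllamaProvider

provider = OllamaProvider(
    # Ollama-specific settings
    model="llama3",
    base_url="http://localhost:11434",
    temperature=0.3,
    num_predict=1024,
    top_p=0.9,
    repeat_penalty=1.1,
    seed=42,
    # Inherited BaseLLMProvider settings (illustrative values)
    max_retries=3,
    retry_delay=1.0,
    cache_enabled=True,
    cache_ttl=300,
)
llm = provider.instantiate()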
- provider¶
Always LLMProvider.OLLAMA
- model¶
Model name (default: “llama3”)
- base_url¶
Ollama server URL (default: “http://localhost:11434”)
- temperature¶
Sampling temperature (0-1)
- num_predict¶
Maximum tokens to generate
- top_p¶
Nucleus sampling parameter
- top_k¶
Top-k sampling parameter
- repeat_penalty¶
Repetition penalty
- seed¶
Random seed for reproducibility
- num_gpu¶
Number of GPUs to use
- num_thread¶
Number of CPU threads
- Environment Variables:
OLLAMA_BASE_URL: Server URL (default: http://localhost:11434)
OLLAMA_NUM_GPU: Default number of GPUs to use
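When the server location is fixed per deployment, it can be set through the environment rather than in code. A minimal sketch, assuming the provider falls back to these variables when the corresponding fields are not passed explicitly:

import os

from haive.core.models.llm.providers.ollama import OllamaProvider

# Point every OllamaProvider at a shared GPU host by default.
os.environ["OLLAMA_BASE_URL"] = "http://gpu-server:11434"
os.environ["OLLAMA_NUM_GPU"] = "2"

# No base_url or num_gpu arguments needed if the environment is read.
provider = OllamaProvider(model="mixtral")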
- Popular Models:
llama3: Meta’s Llama 3 (8B, 70B)
mistral: Mistral 7B
mixtral: Mixtral 8x7B MoE
codellama: Code-specialized Llama
phi3: Microsoft’s Phi-3
gemma: Google’s Gemma
qwen: Alibaba’s Qwen
Examples
Running Llama 3 locally:
provider = OllamaProvider( model="llama3:70b", temperature=0.7, num_predict=2048 ) llm = provider.instantiate()
Using a remote Ollama server:
provider = OllamaProvider( model="mixtral:8x7b", base_url="http://192.168.1.100:11434", num_gpu=2, temperature=0.5 )
With specific hardware settings:
provider = OllamaProvider( model="codellama:34b", num_gpu=1, num_thread=8, repeat_penalty=1.1 )
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
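Because the provider is a Pydantic model, invalid keyword arguments are rejected at construction time. A minimal sketch of catching the resulting ValidationError; the deliberately invalid value is illustrative:

from pydantic import ValidationError

from haive.core.models.llm.providers.ollama import OllamaProvider

try:
    # temperature expects a float (or None), so a non-numeric string fails validation.
    OllamaProvider(model="llama3", temperature="very hot")
except ValidationError as exc:
    print(f"Invalid provider configuration: {exc}")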