haive.core.models.llm.providers.nvidia

NVIDIA AI Endpoints Provider Module.

This module implements the NVIDIA AI Endpoints language model provider for the Haive framework, supporting NVIDIA’s optimized models through their AI Foundation API.

The provider handles API key management, model configuration, and safe imports of the langchain-nvidia-ai-endpoints package.

Examples

Basic usage:

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    temperature=0.7,
    max_tokens=1000
)
llm = provider.instantiate()

With streaming:

provider = NVIDIAProvider(
    model="microsoft/phi-3-medium-4k-instruct",
    temperature=0.1,
    stream=True
)
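
With an explicit API key (a minimal sketch; this assumes the api_key field accepts a plain string that Pydantic coerces into a SecretStr, and that the key is otherwise resolved from the environment, conventionally NVIDIA_API_KEY for langchain-nvidia-ai-endpoints):

import os

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    api_key=os.environ["NVIDIA_API_KEY"],  # assumption: coerced into SecretStr by Pydantic
)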


Classes

NVIDIAProvider

NVIDIA AI Endpoints language model provider configuration.

Module Contents

class haive.core.models.llm.providers.nvidia.NVIDIAProvider(/, **data)[source]

Bases: haive.core.models.llm.providers.base.BaseLLMProvider

NVIDIA AI Endpoints language model provider configuration.

This provider supports NVIDIA’s optimized models including Llama, Mixtral, and other high-performance models through NVIDIA’s AI Foundation API.

Parameters:
  • data (Any)

  • requests_per_second (float | None)

  • tokens_per_second (int | None)

  • tokens_per_minute (int | None)

  • max_retries (int)

  • retry_delay (float)

  • check_every_n_seconds (float | None)

  • burst_size (int | None)

  • provider (LLMProvider)

  • model (str | None)

  • name (str | None)

  • api_key (SecretStr)

  • cache_enabled (bool)

  • cache_ttl (int | None)

  • extra_params (dict[str, Any] | None)

  • debug (bool)

  • temperature (float | None)

  • max_tokens (int | None)

  • top_p (float | None)

  • stream (bool)

  • stop (list[str] | None)

Attributes:
  • provider (LLMProvider): Always LLMProvider.NVIDIA

  • model (str): The NVIDIA model to use

  • temperature (float): Sampling temperature (0.0-1.0)

  • max_tokens (int): Maximum tokens in the response

  • top_p (float): Nucleus sampling parameter

  • stream (bool): Enable streaming responses

  • stop (list[str]): Stop sequences for generation

Examples

Llama 3 for reasoning:

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    temperature=0.3,
    max_tokens=2000
)

Mixtral for fast inference:

provider = NVIDIAProvider(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    temperature=0.7,
    stream=True
)
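
Rate-limited configuration (a sketch using the rate-limiting fields inherited from BaseLLMProvider; the exact throttling and retry semantics depend on the base class):

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    requests_per_second=2.0,  # throttle outbound API calls
    burst_size=4,             # allow short bursts above the steady rate
    max_retries=3,            # retry transient API failures
    retry_delay=1.0,          # seconds to wait between retries
)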

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod get_models()[source]

Get available NVIDIA models.

Return type:

list[str]
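
For example (a sketch assuming the returned list contains model identifiers usable as the model field):

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

models = NVIDIAProvider.get_models()
if "meta/llama3-70b-instruct" in models:
    provider = NVIDIAProvider(model="meta/llama3-70b-instruct")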

max_tokens: int | None = None

Maximum number of tokens to generate in the response.
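
Putting it together (a minimal sketch; this assumes instantiate() returns a LangChain-compatible chat model whose invoke() result exposes a content attribute):

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    max_tokens=512,
)
llm = provider.instantiate()
response = llm.invoke("Summarize the benefits of GPU-accelerated inference.")
print(response.content)  # assumption: AIMessage-style result from LangChain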