haive.core.models.llm.providers.nvidia

NVIDIA AI Endpoints Provider Module.

This module implements the NVIDIA AI Endpoints language model provider for the Haive framework, supporting NVIDIA’s optimized models through their AI Foundation API.

The provider handles API key management, model configuration, and safe imports of the langchain-nvidia-ai-endpoints package.

Examples

Basic usage:

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    temperature=0.7,
    max_tokens=1000
)
llm = provider.instantiate()

With streaming:

provider = NVIDIAProvider(
    model="microsoft/phi-3-medium-4k-instruct",
    temperature=0.1,
    stream=True
)
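
With an explicit API key (a minimal sketch; this assumes the api_key field accepts a plain string that Pydantic coerces into a SecretStr, and that the key is otherwise resolved from the environment, conventionally NVIDIA_API_KEY for langchain-nvidia-ai-endpoints):

import os

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    api_key=os.environ["NVIDIA_API_KEY"],  # assumption: coerced into SecretStr by Pydantic
)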


Classes

NVIDIAProvider

NVIDIA AI Endpoints language model provider configuration.

Module Contents

class haive.core.models.llm.providers.nvidia.NVIDIAProvider(/, **data)[source]

Bases: haive.core.models.llm.providers.base.BaseLLMProvider

NVIDIA AI Endpoints language model provider configuration.

This provider supports NVIDIA’s optimized models including Llama, Mixtral, and other high-performance models through NVIDIA’s AI Foundation API.

Parameters:
  • data (Any)

  • requests_per_second (float | None)

  • tokens_per_second (int | None)

  • tokens_per_minute (int | None)

  • max_retries (int)

  • retry_delay (float)

  • check_every_n_seconds (float | None)

  • burst_size (int | None)

  • provider (LLMProvider)

  • model (str | None)

  • name (str | None)

  • api_key (SecretStr)

  • cache_enabled (bool)

  • cache_ttl (int | None)

  • extra_params (dict[str, Any] | None)

  • debug (bool)

  • temperature (float | None)

  • max_tokens (int | None)

  • top_p (float | None)

  • stream (bool)

  • stop (list[str] | None)

Attributes:
  • provider (LLMProvider): Always LLMProvider.NVIDIA

  • model (str): The NVIDIA model to use

  • temperature (float): Sampling temperature (0.0-1.0)

  • max_tokens (int): Maximum tokens in the response

  • top_p (float): Nucleus sampling parameter

  • stream (bool): Enable streaming responses

  • stop (list[str]): Stop sequences for generation

Examples

Llama 3 for reasoning:

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    temperature=0.3,
    max_tokens=2000
)

Mixtral for fast inference:

provider = NVIDIAProvider(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    temperature=0.7,
    stream=True
)
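
Rate-limited configuration (a sketch using the rate-limiting fields inherited from BaseLLMProvider; the exact throttling and retry semantics depend on the base class):

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    requests_per_second=2.0,  # throttle outbound API calls
    burst_size=4,             # allow short bursts above the steady rate
    max_retries=3,            # retry transient API failures
    retry_delay=1.0,          # seconds to wait between retries
)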

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod get_models()[source]

Get available NVIDIA models.

Return type:

list[str]
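
For example (a sketch assuming the returned list contains model identifiers usable as the model field):

from haive.core.models.llm.providers.nvidia import NVIDIAProvider

models = NVIDIAProvider.get_models()
if "meta/llama3-70b-instruct" in models:
    provider = NVIDIAProvider(model="meta/llama3-70b-instruct")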

max_tokens: int | None = None

Maximum number of tokens to generate in the response.
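
Putting it together (a minimal sketch; this assumes instantiate() returns a LangChain-compatible chat model whose invoke() result exposes a content attribute):

provider = NVIDIAProvider(
    model="meta/llama3-70b-instruct",
    max_tokens=512,
)
llm = provider.instantiate()
response = llm.invoke("Summarize the benefits of GPU-accelerated inference.")
print(response.content)  # assumption: AIMessage-style result from LangChain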