haive.core.models.embeddings.base

Base Embedding Models Module.

This module provides the foundational abstractions for embedding models in the Haive framework. It includes a base configuration class and provider-specific configuration classes; each one instantiates an embedding model that transforms text into high-dimensional vector representations for use in semantic search, clustering, and other NLP tasks.

Typical usage example:

>>> from haive.core.models.embeddings.base import create_embeddings, HuggingFaceEmbeddingConfig
>>>
>>> # Create a HuggingFace embedding model configuration
>>> config = HuggingFaceEmbeddingConfig(
...     model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>>
>>> # Instantiate the embeddings model
>>> embeddings = create_embeddings(config)
>>>
>>> # Use the model to embed documents or queries
>>> doc_embeddings = embeddings.embed_documents(["Text to embed"])
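As a minimal sketch of the semantic-search use mentioned in the module description, the vectors returned by embed_documents can be compared with cosine similarity. The helper names cosine_similarity and rank_documents and the toy vectors below are illustrative assumptions, not part of the Haive API; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors standing in for real embedding output (assumption).
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vector = [1.0, 0.1, 0.0]
ranking = rank_documents(query_vector, doc_vectors)
```

Here ranking lists the document most aligned with the query first; the same comparison applies unchanged to vectors produced by any of the providers documented on this page.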

Classes

AnyscaleEmbeddingConfig

Configuration for Anyscale embedding models.

AzureEmbeddingConfig

Configuration for Azure OpenAI embedding models.

BaseEmbeddingConfig

Base configuration for embedding models.

BedrockEmbeddingConfig

Configuration for AWS Bedrock embedding models.

CloudflareEmbeddingConfig

Configuration for Cloudflare Workers AI embedding models.

CohereEmbeddingConfig

Configuration for Cohere embedding models.

FastEmbedEmbeddingConfig

Configuration for FastEmbed embedding models.

HuggingFaceEmbeddingConfig

Configuration for HuggingFace embedding models.

JinaEmbeddingConfig

Configuration for Jina AI embedding models.

LlamaCppEmbeddingConfig

Configuration for LlamaCpp local embedding models.

MockTorch

Mock torch module for documentation builds.

OllamaEmbeddingConfig

Configuration for Ollama embedding models.

OpenAIEmbeddingConfig

Configuration for OpenAI embedding models.

SecureConfigMixin

Mixin for securely handling API keys from environment variables.

SentenceTransformerEmbeddingConfig

Configuration for SentenceTransformer embedding models.

VertexAIEmbeddingConfig

Configuration for Google Vertex AI embedding models.

VertexAIEmbeddings

Mock VertexAI embeddings to avoid slow imports.

VoyageAIEmbeddingConfig

Configuration for Voyage AI embedding models.

Functions

create_embeddings(config)

Factory function to create embedding models from a configuration.

Module Contents

class haive.core.models.embeddings.base.AnyscaleEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Anyscale embedding models.

This class configures embedding models from Anyscale.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.ANYSCALE

model

The model name (defaults to thenlper/gte-large)

base_url

The base URL for the Anyscale API

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an Anyscale embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to AnyscaleEmbeddings

Returns:

The instantiated embedding model

Return type:

AnyscaleEmbeddings

class haive.core.models.embeddings.base.AzureEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Azure OpenAI embedding models.

This class configures embedding models from Azure OpenAI services, supporting environment variable resolution for credentials.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.AZURE

model

The Azure deployment name for the embedding model

api_version

The Azure OpenAI API version to use

api_base

The Azure endpoint URL

api_type

The API type (typically "azure")

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

get_api_key()

Get the API key as a string.

Returns:

The API key

Return type:

str

instantiate(**kwargs)

Instantiate an Azure OpenAI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to AzureOpenAIEmbeddings

Returns:

The instantiated embedding model

Return type:

AzureOpenAIEmbeddings

class haive.core.models.embeddings.base.BaseEmbeddingConfig(/, **data)

Bases: pydantic.BaseModel, SecureConfigMixin

Base configuration for embedding models.

This abstract base class defines the common interface for all embedding model configurations, ensuring consistent instantiation patterns across providers.

Parameters:

data (Any)

provider

The embedding provider (e.g., Azure, HuggingFace)

model

The specific model identifier or name

api_key

The API key for the provider (if required)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

abstractmethod instantiate(**kwargs)

Instantiate the embedding model with the configuration.

Parameters:

**kwargs – Additional keyword arguments to pass to the model constructor

Returns:

The instantiated embedding model

Return type:

Any

Raises:

NotImplementedError – Must be implemented by subclasses
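The abstract-config pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Haive implementation: SketchEmbeddingConfig, FakeEmbeddingConfig, and FakeEmbeddings are hypothetical names, and dataclasses stand in for pydantic so the block has no third-party dependencies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchEmbeddingConfig(ABC):
    """Hypothetical stand-in for a base config: a shared field plus instantiate()."""

    model: str

    @abstractmethod
    def instantiate(self, **kwargs: Any) -> Any:
        """Build and return the embedding model; subclasses must override this."""
        raise NotImplementedError


class FakeEmbeddings:
    """Toy model that maps each text to a one-element vector holding its length."""

    def __init__(self, model: str, **kwargs: Any) -> None:
        self.model = model
        self.options = kwargs

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


@dataclass
class FakeEmbeddingConfig(SketchEmbeddingConfig):
    """Hypothetical provider config that supplies a default model name."""

    model: str = "fake-model"

    def instantiate(self, **kwargs: Any) -> FakeEmbeddings:
        # Extra keyword arguments are forwarded to the model constructor.
        return FakeEmbeddings(model=self.model, **kwargs)


config = FakeEmbeddingConfig()
embeddings = config.instantiate(batch_size=8)
vectors = embeddings.embed_documents(["ab", "abcd"])
```

Because instantiate is abstract, constructing SketchEmbeddingConfig directly raises TypeError in this sketch, while each concrete subclass decides which model class to build and how to forward keyword arguments.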

class haive.core.models.embeddings.base.BedrockEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for AWS Bedrock embedding models.

This class configures embedding models from the AWS Bedrock service.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.BEDROCK

model

The model ID (defaults to amazon.titan-embed-text-v1)

region

AWS region

credentials_profile_name

AWS credentials profile name

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an AWS Bedrock embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to BedrockEmbeddings

Returns:

The instantiated embedding model

Return type:

BedrockEmbeddings

class haive.core.models.embeddings.base.CloudflareEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Cloudflare Workers AI embedding models.

This class configures embedding models from Cloudflare Workers AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.CLOUDFLARE

model

The model name (defaults to @cf/baai/bge-small-en-v1.5)

account_id

Cloudflare account ID

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Cloudflare Workers AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to CloudflareWorkersAIEmbeddings

Returns:

The instantiated embedding model

Return type:

CloudflareWorkersAIEmbeddings

class haive.core.models.embeddings.base.CohereEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Cohere embedding models.

This class configures embedding models from Cohere services.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.COHERE

model

The Cohere model name for embeddings (defaults to embed-english-v3.0)

input_type

Type of input to be embedded (defaults to search_document)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Cohere embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to CohereEmbeddings

Returns:

The instantiated embedding model

Return type:

CohereEmbeddings

class haive.core.models.embeddings.base.FastEmbedEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for FastEmbed embedding models.

This class configures FastEmbed models, which are lightweight, efficient embedding models that can run on CPU.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.FASTEMBED

model

The model name (defaults to BAAI/bge-small-en-v1.5)

max_length

Maximum sequence length

cache_folder

Where to cache the model files

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a FastEmbed embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to FastEmbedEmbeddings

Returns:

The instantiated embedding model

Return type:

FastEmbedEmbeddings

class haive.core.models.embeddings.base.HuggingFaceEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for HuggingFace embedding models.

This class configures embedding models from HuggingFace's model hub, with support for local caching and hardware acceleration.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.HUGGINGFACE

model

The HuggingFace model ID (defaults to all-MiniLM-L6-v2)

model_kwargs

Additional keyword arguments for model instantiation

encode_kwargs

Additional keyword arguments for encoding

query_encode_kwargs

Additional keyword arguments for query encoding

multi_process

Whether to use multi-processing for encoding

cache_folder

Where to cache the model files

show_progress

Whether to show progress bars

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a HuggingFace embedding model.

This method includes error handling and GPU memory cleanup in case of initialization failures.

Parameters:

**kwargs – Additional keyword arguments to pass to HuggingFaceEmbeddings

Returns:

The instantiated embedding model

Return type:

HuggingFaceEmbeddings

Raises:

Exception – If model instantiation fails after cleanup attempt
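The cleanup-on-failure behavior described for this method might look roughly like the sketch below. It assumes a single retry after cleanup, which the description above suggests but does not spell out; build_with_cleanup_retry and flaky_build are hypothetical names, and gc.collect stands in for whatever GPU cleanup the real method performs.

```python
import gc
from typing import Any, Callable


def build_with_cleanup_retry(
    build: Callable[[], Any],
    cleanup: Callable[[], object] = gc.collect,
) -> Any:
    """Call build(); if it fails, run cleanup() and try exactly once more.

    A second failure propagates to the caller unchanged.
    """
    try:
        return build()
    except Exception:
        # With PyTorch, a cleanup callable might also call torch.cuda.empty_cache().
        cleanup()
        return build()


# Simulated builder that fails on the first call only (assumption for the demo).
attempts = {"count": 0}


def flaky_build() -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated out-of-memory error")
    return "model"


model = build_with_cleanup_retry(flaky_build)
```

Retrying only once keeps a persistent failure, such as a wrong model ID, from looping indefinitely while still recovering from a transient memory shortage.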

class haive.core.models.embeddings.base.JinaEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Jina AI embedding models.

This class configures embedding models from Jina AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.JINA

model

The model name (defaults to jina-embeddings-v2-base-en)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Jina AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to JinaEmbeddings

Returns:

The instantiated embedding model

Return type:

JinaEmbeddings

class haive.core.models.embeddings.base.LlamaCppEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for LlamaCpp local embedding models.

This class configures embedding models using LlamaCpp for local execution.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.LLAMACPP

model

Required model name parameter (for compatibility with BaseEmbeddingConfig)

model_path

Path to the model file

n_ctx

Context size for the model

n_batch

Batch size for inference

n_gpu_layers

Number of layers to offload to GPU

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a LlamaCpp embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to LlamaCppEmbeddings

Returns:

The instantiated embedding model

Return type:

LlamaCppEmbeddings

class haive.core.models.embeddings.base.MockTorch

Mock torch module for documentation builds.

class haive.core.models.embeddings.base.OllamaEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Ollama embedding models.

This class configures embedding models from Ollama, which runs locally and doesn't require an API key.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.OLLAMA

model

The Ollama model name (defaults to nomic-embed-text)

base_url

The base URL for the Ollama server

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an Ollama embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to OllamaEmbeddings

Returns:

The instantiated embedding model

Return type:

OllamaEmbeddings

class haive.core.models.embeddings.base.OpenAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for OpenAI embedding models.

This class configures embedding models from OpenAI services, supporting multiple model types and configurations.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.OPENAI

model

The OpenAI model name for embeddings (defaults to text-embedding-3-small)

dimensions

Output dimensions for the embedding vectors

show_progress_bar

Whether to show progress bars during embedding

chunk_size

Batch size for embedding operations

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an OpenAI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to OpenAIEmbeddings

Returns:

The instantiated embedding model

Return type:

OpenAIEmbeddings
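To illustrate what a batch size such as chunk_size controls, the sketch below splits a list of texts into consecutive batches. It is a generic illustration, not the code used by the underlying client; iter_batches is a hypothetical helper name.

```python
from typing import Iterator


def iter_batches(texts: list[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of texts, each holding at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


# Five texts with a batch size of two produce three batches; the last is partial.
batches = list(iter_batches(["a", "b", "c", "d", "e"], chunk_size=2))
```

A larger batch size means fewer requests per call, while a smaller one keeps each request's payload small.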

class haive.core.models.embeddings.base.SecureConfigMixin

Mixin for securely handling API keys from environment variables.

This mixin provides methods for securely resolving API keys from environment variables or explicitly provided values, with appropriate fallbacks.

classmethod resolve_api_key(v, info)

Resolve API key from provided value or environment variables.

Parameters:
  • v – The provided API key value

  • info (pydantic.ValidationInfo) – ValidationInfo containing field data

Returns:

The resolved API key as a SecretStr

Return type:

SecretStr
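The resolution order described above (explicit value first, then an environment variable) can be sketched with the standard library. This is an assumption-laden illustration: resolve_api_key_sketch and the variable name SKETCH_EMBEDDING_API_KEY are hypothetical, the empty-string fallback is a guess at what "appropriate fallbacks" means, and the function returns a plain str rather than a SecretStr to avoid third-party imports.

```python
import os
from typing import Optional


def resolve_api_key_sketch(value: Optional[str], env_var: str) -> str:
    """Prefer an explicit value; otherwise read env_var; otherwise return ''."""
    if value:
        return value
    return os.environ.get(env_var, "")


# Hypothetical environment variable name used only for this demonstration.
os.environ["SKETCH_EMBEDDING_API_KEY"] = "from-env"

explicit = resolve_api_key_sketch("from-arg", "SKETCH_EMBEDDING_API_KEY")
from_env = resolve_api_key_sketch(None, "SKETCH_EMBEDDING_API_KEY")
missing = resolve_api_key_sketch(None, "SKETCH_UNSET_VARIABLE_FOR_DEMO")
```

In the documented method the result is wrapped in a SecretStr, which keeps the key out of reprs and logs; Azure's get_api_key above then exposes it as a str when the client needs it.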

class haive.core.models.embeddings.base.SentenceTransformerEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for SentenceTransformer embedding models.

This class configures embedding models from the SentenceTransformers library, which provides efficient and accurate sentence and text embeddings.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.SENTENCE_TRANSFORMERS

model

The model name or path (defaults to all-MiniLM-L6-v2)

cache_folder

Where to cache the model files

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a SentenceTransformer embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to SentenceTransformerEmbeddings

Returns:

The instantiated embedding model

Return type:

SentenceTransformerEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Google Vertex AI embedding models.

This class configures embedding models from Google Vertex AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.VERTEXAI

model

The model name (defaults to textembedding-gecko@latest)

project

Google Cloud project ID

location

Google Cloud region

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Google Vertex AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to VertexAIEmbeddings

Returns:

The instantiated embedding model

Return type:

VertexAIEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddings(*args, **kwargs)

Mock VertexAI embeddings to avoid slow imports.

class haive.core.models.embeddings.base.VoyageAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Voyage AI embedding models.

This class configures embedding models from Voyage AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.VOYAGEAI

model

The model name (defaults to voyage-2)

voyage_api_url

The API URL for Voyage AI

voyage_api_version

The API version for Voyage AI

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Voyage AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to VoyageEmbeddings

Returns:

The instantiated embedding model

Return type:

VoyageEmbeddings

haive.core.models.embeddings.base.create_embeddings(config)

Factory function to create embedding models from a configuration.

This function simplifies the instantiation of embedding models by delegating to the appropriate configuration class.

Parameters:

config (BaseEmbeddingConfig) – The embedding model configuration

Returns:

The instantiated embedding model

Return type:

Any

Example:

>>> config = HuggingFaceEmbeddingConfig(model="sentence-transformers/all-mpnet-base-v2")
>>> embeddings = create_embeddings(config)
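
The delegation described for this factory can be sketched as a function that accepts any object exposing instantiate and calls it. This is a minimal illustration, assuming the factory does nothing beyond delegating; create_embeddings_sketch, SupportsInstantiate, and EchoConfig are hypothetical names, not part of the module.

```python
from typing import Any, Protocol


class SupportsInstantiate(Protocol):
    """Structural type: anything with an instantiate(**kwargs) method."""

    def instantiate(self, **kwargs: Any) -> Any: ...


def create_embeddings_sketch(config: SupportsInstantiate) -> Any:
    """Delegate model construction to the configuration object itself."""
    return config.instantiate()


class EchoConfig:
    """Hypothetical config whose 'model' is a dict describing what was requested."""

    def __init__(self, model: str) -> None:
        self.model = model

    def instantiate(self, **kwargs: Any) -> dict[str, Any]:
        return {"model": self.model, **kwargs}


result = create_embeddings_sketch(EchoConfig(model="demo-model"))
```

Keeping the provider-specific logic inside each configuration class means the factory needs no branching on provider, so adding a provider requires only a new configuration class.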