haive.core.models.embeddings.base
Base Embedding Models Module.
This module provides the foundational abstractions for embedding models in the Haive framework. It includes a base configuration class and provider-specific configurations for embedding models that transform text into high-dimensional vector representations for use in semantic search, clustering, and other NLP tasks.
Typical usage example:
>>> from haive.core.models.embeddings.base import create_embeddings, HuggingFaceEmbeddingConfig
>>> # Create a HuggingFace embedding model configuration
>>> config = HuggingFaceEmbeddingConfig(
...     model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>> # Instantiate the embeddings model
>>> embeddings = create_embeddings(config)
>>> # Use the model to embed documents or queries
>>> doc_embeddings = embeddings.embed_documents(["Text to embed"])
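The vectors returned by an embedding model are usually compared with cosine similarity for semantic search. The following is a minimal, dependency-free sketch of that step. `toy_embed` is a hypothetical stand-in that counts a few letters, so the example runs without downloading a model. In practice the vectors would come from a configured model such as the one created above.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model: counts of a few letters."""
    return [float(text.lower().count(ch)) for ch in "aeiost"]


def rank_documents(query: str, docs: List[str]) -> List[str]:
    """Return docs sorted from most to least similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

With a real model, only `toy_embed` changes. The ranking logic stays the same.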
Classes
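| Class | Description |
|---|---|
| AnyscaleEmbeddingConfig | Configuration for Anyscale embedding models. |
| AzureEmbeddingConfig | Configuration for Azure OpenAI embedding models. |
| BaseEmbeddingConfig | Base configuration for embedding models. |
| BedrockEmbeddingConfig | Configuration for AWS Bedrock embedding models. |
| CloudflareEmbeddingConfig | Configuration for Cloudflare Workers AI embedding models. |
| CohereEmbeddingConfig | Configuration for Cohere embedding models. |
| FastEmbedEmbeddingConfig | Configuration for FastEmbed embedding models. |
| HuggingFaceEmbeddingConfig | Configuration for HuggingFace embedding models. |
| JinaEmbeddingConfig | Configuration for Jina AI embedding models. |
| LlamaCppEmbeddingConfig | Configuration for LlamaCpp local embedding models. |
| MockTorch | Mock torch module for documentation builds. |
| OllamaEmbeddingConfig | Configuration for Ollama embedding models. |
| OpenAIEmbeddingConfig | Configuration for OpenAI embedding models. |
| SecureConfigMixin | Mixin for securely handling API keys from environment variables. |
| SentenceTransformerEmbeddingConfig | Configuration for SentenceTransformer embedding models. |
| VertexAIEmbeddingConfig | Configuration for Google Vertex AI embedding models. |
| VertexAIEmbeddings | Mock VertexAI embeddings to avoid slow imports. |
| VoyageAIEmbeddingConfig | Configuration for Voyage AI embedding models. |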
Functions
| Function | Description |
|---|---|
| create_embeddings | Factory function to create embedding models from a configuration. |
Module Contents
- class haive.core.models.embeddings.base.AnyscaleEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Anyscale embedding models.
This class configures embedding models from Anyscale.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.ANYSCALE
- model
The model name (defaults to thenlper/gte-large)
- base_url
The base URL for the Anyscale API
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.AzureEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Azure OpenAI embedding models.
This class configures embedding models from Azure OpenAI and can resolve credentials from environment variables.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.AZURE
- model
The Azure deployment name for the embedding model
- api_version
The Azure OpenAI API version to use
- api_base
The Azure endpoint URL
- api_type
The API type (typically "azure")
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.BaseEmbeddingConfig(/, **data)
Bases:
pydantic.BaseModel, SecureConfigMixin
Base configuration for embedding models.
This abstract base class defines the common interface for all embedding model configurations, so every provider is instantiated in the same way.
- Parameters:
data (Any)
- provider
The embedding provider (e.g., Azure, HuggingFace)
- model
The specific model identifier or name
- api_key
The API key for the provider (if required)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- abstractmethod instantiate(**kwargs)
Instantiate the embedding model described by this configuration.
- Parameters:
**kwargs – Additional keyword arguments to pass to the model constructor
- Returns:
The instantiated embedding model
- Return type:
Any
- Raises:
NotImplementedError – Must be implemented by subclasses
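The following minimal sketch shows the contract this abstract method implies. It assumes a plain `abc.ABC` base instead of Haive's actual Pydantic model. Each provider config subclasses the base and overrides `instantiate`, and keyword arguments passed at call time can override configured values. `BaseConfigSketch`, `ToyEmbeddingConfig`, and `ToyEmbeddings` are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


class BaseConfigSketch(ABC):
    """Hypothetical stand-in for BaseEmbeddingConfig (not Haive's real class)."""

    @abstractmethod
    def instantiate(self, **kwargs: Any) -> Any:
        # Mirrors the documented behaviour: subclasses must override this.
        raise NotImplementedError("Must be implemented by subclasses")


class ToyEmbeddings:
    """Hypothetical embedding model returning fixed-size dummy vectors."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Each vector repeats the text length `dim` times (dummy values).
        return [[float(len(t))] * self.dim for t in texts]


@dataclass
class ToyEmbeddingConfig(BaseConfigSketch):
    model: str = "toy-model"
    dim: int = 3

    def instantiate(self, **kwargs: Any) -> ToyEmbeddings:
        # A keyword argument passed here overrides the configured value.
        return ToyEmbeddings(dim=kwargs.get("dim", self.dim))
```

Because the base uses `abc.ABC`, trying to construct a subclass that does not override `instantiate` fails with `TypeError` before the method body is ever reached.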
- class haive.core.models.embeddings.base.BedrockEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for AWS Bedrock embedding models.
This class configures embedding models from the AWS Bedrock service.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.BEDROCK
- model
The model ID (defaults to amazon.titan-embed-text-v1)
- region
AWS region
- credentials_profile_name
AWS credentials profile name
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.CloudflareEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Cloudflare Workers AI embedding models.
This class configures embedding models from Cloudflare Workers AI.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.CLOUDFLARE
- model
The model name (defaults to @cf/baai/bge-small-en-v1.5)
- account_id
Cloudflare account ID
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.CohereEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Cohere embedding models.
This class configures embedding models from Cohere.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.COHERE
- model
The Cohere model name for embeddings (defaults to embed-english-v3.0)
- input_type
Type of input to be embedded (defaults to search_document)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
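Cohere's v3 embedding models distinguish text that is being indexed from text that is being searched for. This is why `input_type` defaults to `search_document`. The following sketch shows one way a caller might choose the value per call. The helper name is hypothetical. The allowed set lists the four text input types Cohere documents for v3 models and should be checked against the current Cohere API reference. None of this is part of Haive's API.

```python
from typing import FrozenSet

# Text input types documented by Cohere for its v3 embedding models
# (assumption: verify against the current Cohere API reference).
COHERE_TEXT_INPUT_TYPES: FrozenSet[str] = frozenset(
    {"search_document", "search_query", "classification", "clustering"}
)


def pick_input_type(is_query: bool, override: str = "") -> str:
    """Hypothetical helper: choose an input_type for one embedding call.

    Documents being indexed use "search_document" (the config default);
    search queries use "search_query". A valid explicit override wins.
    """
    if override:
        if override not in COHERE_TEXT_INPUT_TYPES:
            raise ValueError(f"Unsupported input_type: {override!r}")
        return override
    return "search_query" if is_query else "search_document"
```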
- class haive.core.models.embeddings.base.FastEmbedEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for FastEmbed embedding models.
This class configures FastEmbed models, which are lightweight, efficient embedding models that can run on CPU.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.FASTEMBED
- model
The model name (defaults to BAAI/bge-small-en-v1.5)
- max_length
Maximum sequence length
- cache_folder
Where to cache the model files
- use_cache
Whether to use embedding caching
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
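The `use_cache` flag is described only as enabling embedding caching. The following sketch illustrates the general idea and does not show how Haive implements it. It wraps any `embed_documents`-style callable in an in-memory cache keyed by the input text, so each distinct text reaches the model only once. `CachedEmbedder` is a hypothetical name.

```python
from typing import Callable, Dict, List

EmbedFn = Callable[[List[str]], List[List[float]]]


class CachedEmbedder:
    """Hypothetical in-memory cache in front of a batch embedding function."""

    def __init__(self, embed_fn: EmbedFn) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.texts_embedded = 0  # number of texts actually sent to the model

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Deduplicate while preserving order, then keep only unseen texts.
        missing = [t for t in dict.fromkeys(texts) if t not in self._cache]
        if missing:
            vectors = self._embed_fn(missing)
            self.texts_embedded += len(missing)
            self._cache.update(zip(missing, vectors))
        # Answer every input, including repeats, from the cache.
        return [self._cache[t] for t in texts]
```

A persistent cache, for example one stored on disk, would follow the same lookup-then-fill pattern.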
- class haive.core.models.embeddings.base.HuggingFaceEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for HuggingFace embedding models.
This class configures embedding models from HuggingFace's model hub, with support for local caching and hardware acceleration.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.HUGGINGFACE
- model
The HuggingFace model ID (defaults to all-MiniLM-L6-v2)
- model_kwargs
Additional keyword arguments for model instantiation
- encode_kwargs
Additional keyword arguments for encoding
- query_encode_kwargs
Additional keyword arguments for query encoding
- multi_process
Whether to use multi-processing for encoding
- cache_folder
Where to cache the model files
- show_progress
Whether to show progress bars
- use_cache
Whether to use embedding caching
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- instantiate(**kwargs)
Instantiate a HuggingFace embedding model.
This method includes error handling and GPU memory cleanup in case of initialization failures.
- Parameters:
**kwargs – Additional keyword arguments to pass to HuggingFaceEmbeddings
- Returns:
The instantiated embedding model
- Return type:
HuggingFaceEmbeddings
- Raises:
Exception – If model instantiation still fails after the cleanup attempt
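The description of `instantiate` suggests a common pattern: try to build the model, free accelerator memory if that fails, and try once more, letting the second failure propagate. The following sketch implements that reading of the documented behaviour, with the build and cleanup steps injected as callables. In a real setup, `cleanup` might call `torch.cuda.empty_cache()`. The names `instantiate_with_cleanup`, `build`, and `cleanup` are hypothetical, and this is not Haive's source.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def instantiate_with_cleanup(build: Callable[[], T], cleanup: Callable[[], None]) -> T:
    """Hypothetical helper: build a model, retrying once after cleanup.

    If the first attempt raises, `cleanup` runs (for example, to free GPU
    memory) and construction is retried. If the retry also fails, its
    exception propagates to the caller.
    """
    try:
        return build()
    except Exception:
        cleanup()
        return build()
```

Catching the broad `Exception` keeps the sketch short. A production version would more likely catch only out-of-memory or similar errors.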
- class haive.core.models.embeddings.base.JinaEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Jina AI embedding models.
This class configures embedding models from Jina AI.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.JINA
- model
The model name (defaults to jina-embeddings-v2-base-en)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.LlamaCppEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for LlamaCpp local embedding models.
This class configures embedding models using LlamaCpp for local execution.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.LLAMACPP
- model
Required model name parameter (for compatibility with BaseEmbeddingConfig)
- model_path
Path to the model file
- n_ctx
Context size for the model
- n_batch
Batch size for inference
- n_gpu_layers
Number of layers to offload to GPU
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.MockTorch
Mock torch module for documentation builds.
- class haive.core.models.embeddings.base.OllamaEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Ollama embedding models.
This class configures embedding models from Ollama, which runs locally and does not require an API key.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.OLLAMA
- model
The Ollama model name (defaults to nomic-embed-text)
- base_url
The base URL for the Ollama server
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.OpenAIEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for OpenAI embedding models.
This class configures embedding models from OpenAI, supporting multiple models and configuration options.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.OPENAI
- model
The OpenAI model name for embeddings (defaults to text-embedding-3-small)
- dimensions
Output dimensions for the embedding vectors
- show_progress_bar
Whether to show progress bars during embedding
- chunk_size
Batch size for embedding operations (maximum number of texts embedded per batch)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
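`chunk_size` controls how many texts are grouped into each embedding batch. The following sketch shows the batching idea on its own: split the inputs into consecutive chunks, embed each chunk, and concatenate the results in the original order. `chunked`, `embed_in_chunks`, and the embedding callable are hypothetical. The real batching presumably happens inside the underlying embeddings client.

```python
from typing import Callable, Iterator, List

EmbedFn = Callable[[List[str]], List[List[float]]]


def chunked(texts: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


def embed_in_chunks(texts: List[str], embed_fn: EmbedFn, chunk_size: int) -> List[List[float]]:
    """Embed texts batch by batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, chunk_size):
        vectors.extend(embed_fn(batch))  # one request per batch
    return vectors
```

Smaller batches keep each request under provider size limits, at the cost of more round trips.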
- class haive.core.models.embeddings.base.SecureConfigMixin
Mixin for securely handling API keys from environment variables.
This mixin provides methods for securely resolving API keys from environment variables or explicitly provided values, with appropriate fallbacks.
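The following sketch shows the resolution order described for `SecureConfigMixin`: an explicitly provided value first, then an environment variable, then a fallback. The function name and signature, and the choice to return `None` instead of raising when no key is found, are assumptions and do not describe the mixin's actual interface. The mapping parameter defaults to `os.environ` but can be replaced for testing.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical resolver: explicit value, then environment, else None."""
    # 1. A non-empty value passed directly to the config wins.
    if explicit:
        return explicit
    # 2. Otherwise fall back to the named environment variable.
    value = environ.get(env_var, "").strip()
    # 3. No key found: return None so providers that need no key
    #    (such as a local Ollama server) can still be configured.
    return value or None
```

Keeping keys out of source code and reading them from the environment at runtime is the main security benefit of this ordering.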
- class haive.core.models.embeddings.base.SentenceTransformerEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for SentenceTransformer embedding models.
This class configures embedding models from the SentenceTransformers library, which provides efficient and accurate sentence and text embeddings.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.SENTENCE_TRANSFORMERS
- model
The model name or path (defaults to all-MiniLM-L6-v2)
- cache_folder
Where to cache the model files
- use_cache
Whether to use embedding caching
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.VertexAIEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Google Vertex AI embedding models.
This class configures embedding models from Google Vertex AI.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.VERTEXAI
- model
The model name (defaults to textembedding-gecko@latest)
- project
Google Cloud project ID
- location
Google Cloud region
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.VertexAIEmbeddings(*args, **kwargs)
Mock VertexAI embeddings to avoid slow imports.
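`VertexAIEmbeddings` is documented only as a mock that avoids slow imports. A related technique is a lazy proxy that defers an expensive import until the class is first called. It is sketched here as an assumption, not as a description of Haive's code. `LazyClass` is hypothetical. A plausible real target would be `LazyClass("langchain_google_vertexai", "VertexAIEmbeddings")`, but the sketch is exercised with a standard-library class so it runs anywhere.

```python
import importlib
from typing import Any, Optional


class LazyClass:
    """Hypothetical proxy that imports module.attr only when first called."""

    def __init__(self, module_name: str, attr_name: str) -> None:
        self._module_name = module_name
        self._attr_name = attr_name
        self._target: Optional[Any] = None

    def _resolve(self) -> Any:
        if self._target is None:
            # The (possibly slow) import happens here, not at module import time.
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Instantiate the real class with whatever arguments were given.
        return self._resolve()(*args, **kwargs)
```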
- class haive.core.models.embeddings.base.VoyageAIEmbeddingConfig(/, **data)
Bases:
BaseEmbeddingConfig
Configuration for Voyage AI embedding models.
This class configures embedding models from Voyage AI.
- Parameters:
data (Any)
- provider
Set to EmbeddingProvider.VOYAGEAI
- model
The model name (defaults to voyage-2)
- voyage_api_url
The API URL for Voyage AI
- voyage_api_version
The API version for Voyage AI
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- haive.core.models.embeddings.base.create_embeddings(config)
Factory function to create embedding models from a configuration.
This function simplifies the instantiation of embedding models by delegating to the appropriate configuration class.
- Parameters:
config (BaseEmbeddingConfig) – The embedding model configuration
- Returns:
The instantiated embedding model
- Return type:
Any
Example:
>>> config = HuggingFaceEmbeddingConfig(model="sentence-transformers/all-mpnet-base-v2")
>>> embeddings = create_embeddings(config)
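The factory's delegation can be sketched in a few lines. This sketch assumes only that `create_embeddings` hands the work to the configuration object, shown here as a call to its `instantiate` method. The `SupportsInstantiate` protocol, `FakeConfig`, and `FakeEmbeddings` are hypothetical stand-ins for real provider classes, so the sketch runs without network access or model downloads.

```python
from typing import Any, List, Protocol


class SupportsInstantiate(Protocol):
    """Anything with an instantiate() method, like BaseEmbeddingConfig."""

    def instantiate(self, **kwargs: Any) -> Any: ...


def create_embeddings(config: SupportsInstantiate) -> Any:
    """Sketch of the factory: the config decides which model to build."""
    return config.instantiate()


class FakeEmbeddings:
    """Hypothetical model returning one dummy 2-d vector per text."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]


class FakeConfig:
    """Hypothetical provider config; a real one would subclass BaseEmbeddingConfig."""

    model = "fake-model"

    def instantiate(self, **kwargs: Any) -> FakeEmbeddings:
        return FakeEmbeddings()
```

Keeping provider-specific construction inside each configuration class means the factory itself never needs a per-provider branch. Adding a provider only requires a new configuration class.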