`haive.core.models.vectorstore`

Quick Links

VectorStoreConfig - Main configuration class
VectorStoreProvider - Supported providers enum
Examples - Usage examples
Related: Embeddings - Embedding configurations

Overview

This module provides comprehensive abstractions and implementations for working with vector stores in the Haive framework. Vector stores are specialized databases optimized for storing and retrieving high-dimensional vectors, typically used for similarity search in RAG (Retrieval-Augmented Generation) applications.

Note

Vector stores enable efficient semantic search by storing document embeddings and providing fast similarity-based retrieval. They are essential components for building RAG systems, recommendation engines, and other applications that require similarity search over large document collections.
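As a rough illustration of what "similarity-based retrieval" means, the sketch below ranks stored vectors by cosine similarity to a query vector. The helper names (`cosine_similarity`, `top_k`) and the three-dimensional vectors are hypothetical and are not part of Haive; real providers use approximate indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real models produce hundreds of dimensions.
toy_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.9, 0.1, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
```

Here `nearest` holds the two closest document ids with their scores, which is the same shape of answer a `similarity_search` call produces at much larger scale.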

Supported Providers

Open Source

Chroma - Local and server modes
FAISS - Facebook AI Similarity Search
Weaviate - Vector search engine
Qdrant - Similarity search engine
Milvus - Distributed vector database

Cloud Services

Pinecone - Managed vector database
Supabase - PostgreSQL + pgvector
MongoDB Atlas - Vector search
OpenSearch - Elasticsearch-based
Redis - Vector search capabilities

Specialized

LanceDB - Serverless vector DB
Marqo - Tensor search engine
Zilliz - Cloud-native Milvus

Quick Start

Local Development

from haive.core.models.vectorstore import VectorStoreConfig, VectorStoreProvider

# Configure a local vector store.
# Assumption: provider-specific options such as collection_name are forwarded
# through vector_store_kwargs; vector_store_path is the documented location field.
config = VectorStoreConfig(
    vector_store_provider=VectorStoreProvider.Chroma,
    vector_store_path="./chroma_db",
    vector_store_kwargs={"collection_name": "documents"},
)

# Create and use the vector store
vectorstore = config.create_vectorstore()
vectorstore.add_texts(["Document content"], metadatas=[{"source": "doc1"}])
results = vectorstore.similarity_search("query text", k=5)

Cloud Production

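from haive.core.models.vectorstore import VectorStoreConfig, VectorStoreProvider

# Configure for production with Pinecone.
# Assumption: provider-specific options are forwarded through vector_store_kwargs;
# they are not top-level fields in the VectorStoreConfig signature.
config = VectorStoreConfig(
    vector_store_provider=VectorStoreProvider.Pinecone,
    vector_store_kwargs={
        "api_key_env_var": "PINECONE_API_KEY",
        "environment": "us-west1-gcp",
        "index_name": "production-index",
    },
)

# Create with custom embeddings
from haive.core.models.embeddings import OpenAIEmbeddingConfig

embedding_config = OpenAIEmbeddingConfig(model="text-embedding-3-small")
config.embedding_model = embedding_config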
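
The `api_key_env_var` setting above names an environment variable instead of holding the secret itself. A minimal sketch of that lookup pattern follows; `resolve_api_key` and the variable name `DEMO_VECTORSTORE_API_KEY` are hypothetical and only illustrate the idea, they are not Haive APIs.

```python
import os


def resolve_api_key(env_var: str) -> str:
    """Return the secret stored in env_var, failing loudly if it is unset or empty."""
    value = os.environ.get(env_var, "")
    if not value:
        raise RuntimeError(f"Environment variable {env_var!r} is not set")
    return value


# Demo only: real deployments set the variable outside the program.
os.environ["DEMO_VECTORSTORE_API_KEY"] = "demo-secret"
api_key = resolve_api_key("DEMO_VECTORSTORE_API_KEY")
```

Keeping only the variable name in configuration means the config object can be logged or serialized without leaking the credential.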

From Documents

from haive.core.models.vectorstore import VectorStoreConfig, VectorStoreProvider
from langchain_core.documents import Document

# Create documents
docs = [
    Document(page_content="Content 1", metadata={"source": "file1.txt"}),
    Document(page_content="Content 2", metadata={"source": "file2.txt"}),
]

# Create a vector store configuration from documents
vs_config = VectorStoreConfig.create_vs_config_from_documents(
    documents=docs,
    vector_store_provider=VectorStoreProvider.Chroma,
)

API Reference

Configuration Classes

`VectorStoreConfig`(*[, name, ...])	Configuration model for a vector store.
`VectorStoreProvider`(*values)	Enumeration of supported vector store providers.

class haive.core.models.vectorstore.VectorStoreConfig(*, name=None, embedding_model=HuggingFaceEmbeddingConfig(provider=<EmbeddingProvider.HUGGINGFACE: 'huggingface'>, model='sentence-transformers/all-mpnet-base-v2', api_key=SecretStr(''), model_kwargs={'device': 'cpu'}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, cache_folder='/home/will/Projects/haive/resources/embeddings_cache', show_progress=False, use_cache=True), vector_store_provider=VectorStoreProvider.FAISS, vector_store_path='vector_store', vector_store_kwargs=<factory>, documents=<factory>, docstore_path='docstore')[source]¶

Bases: BaseModel

Configuration model for a vector store.

Configuration Examples

Local Vector Store:

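# Assumption: collection_name is forwarded through vector_store_kwargs
config = VectorStoreConfig(
    vector_store_provider=VectorStoreProvider.Chroma,
    vector_store_path="./local_db",
    vector_store_kwargs={"collection_name": "my_documents"},
)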

Cloud Vector Store with Authentication:

# Assumption: provider-specific options are forwarded through vector_store_kwargs
config = VectorStoreConfig(
    vector_store_provider=VectorStoreProvider.Pinecone,
    vector_store_kwargs={
        "api_key_env_var": "PINECONE_API_KEY",
        "environment": "us-west1-gcp",
        "index_name": "production",
        "metric": "cosine",
        "dimension": 1536,
    },
)

Parameters:

data (Any)
name (str | None)
embedding_model (BaseEmbeddingConfig)
vector_store_provider (VectorStoreProvider)
vector_store_path (str)
vector_store_kwargs (dict[str, Any])
documents (list[Document])
docstore_path (str)

classmethod __get_pydantic_json_schema__(core_schema, handler, /)¶

Hook into generating the model’s JSON schema.

Parameters:

core_schema (CoreSchema) – A pydantic-core CoreSchema. You can ignore this argument and call the handler with a new CoreSchema, wrap this CoreSchema ({'type': 'nullable', 'schema': current_schema}), or just call the handler with the original schema.
handler (GetJsonSchemaHandler) – Call into Pydantic’s internal JSON schema generation. This will raise a pydantic.errors.PydanticInvalidForJsonSchema if JSON schema generation fails. Since this gets called by BaseModel.model_json_schema you can override the schema_generator argument to that function to change JSON schema generation globally for a type.

Returns:

A JSON schema, as a Python object.

Return type:

JsonSchemaValue

classmethod __pydantic_init_subclass__(**kwargs)¶

This is intended to behave just like __init_subclass__, but is called by ModelMetaclass only after the class is actually fully initialized. In particular, attributes like model_fields will be present when this is called.

This is necessary because __init_subclass__ will always be called by type.__new__, and it would require a prohibitively large refactor to the ModelMetaclass to ensure that type.__new__ was called in such a manner that the class would already be sufficiently initialized.

This will receive the same kwargs that would be passed to the standard __init_subclass__, namely, any kwargs passed to the class definition that aren’t used internally by pydantic.

Parameters:: **kwargs (Any) – Any keyword arguments passed to the class definition that aren’t used internally by pydantic.
Return type:: None
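
For readers unfamiliar with the standard hook this method mirrors, here is a minimal sketch of plain `__init_subclass__` receiving keyword arguments from a class definition. It uses only the standard library; the `Plugin` classes and the `backend` keyword are hypothetical names for illustration and do not involve pydantic or Haive.

```python
class Plugin:
    """Base class that records keyword arguments given in subclass definitions."""

    registry: dict[str, str] = {}

    def __init_subclass__(cls, *, backend: str = "memory", **kwargs) -> None:
        # Forward anything we do not consume, as the standard protocol expects.
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = backend


class FaissPlugin(Plugin, backend="faiss"):
    pass


class DefaultPlugin(Plugin):
    pass
```

The keyword `backend="faiss"` in the class statement reaches the hook as an ordinary keyword argument; per the description above, `__pydantic_init_subclass__` receives the same kind of kwargs, only later in class construction.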

classmethod construct(_fields_set=None, **values)¶

Parameters:

_fields_set (set[str] | None)
values (Any)

Return type:

Self

classmethod create_vs_config_from_documents(documents, embedding_model=HuggingFaceEmbeddingConfig(provider=<EmbeddingProvider.HUGGINGFACE: 'huggingface'>, model='sentence-transformers/all-mpnet-base-v2', api_key=SecretStr(''), model_kwargs={'device': 'cpu'}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, cache_folder='/home/will/Projects/haive/resources/embeddings_cache', show_progress=False, use_cache=True), **kwargs)[source]¶

Create a VectorStoreConfig from a list of documents.

Parameters:

documents (list[Document])
embedding_model (BaseEmbeddingConfig)

Return type:

VectorStoreConfig

classmethod create_vs_from_documents(documents, embedding_model=HuggingFaceEmbeddingConfig(provider=<EmbeddingProvider.HUGGINGFACE: 'huggingface'>, model='sentence-transformers/all-mpnet-base-v2', api_key=SecretStr(''), model_kwargs={'device': 'cpu'}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, cache_folder='/home/will/Projects/haive/resources/embeddings_cache', show_progress=False, use_cache=True), **kwargs)[source]¶

Create a VectorStore from a list of documents.

Parameters:

documents (list[Document])
embedding_model (BaseEmbeddingConfig)

Return type:

VectorStoreConfig

classmethod from_orm(obj)¶

Parameters:: obj (Any)
Return type:: Self

classmethod model_construct(_fields_set=None, **values)¶

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set (set[str] | None) – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values (Any) – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

Return type:

Self

classmethod model_json_schema(by_alias=True, ref_template='#/$defs/{model}', schema_generator=<class 'pydantic.json_schema.GenerateJsonSchema'>, mode='validation')¶

Generates a JSON schema for a model class.

Parameters:

by_alias (bool) – Whether to use attribute aliases or not.
ref_template (str) – The reference template.
schema_generator (type[GenerateJsonSchema]) – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode (Literal['validation', 'serialization']) – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

Return type:

dict[str, Any]

classmethod model_parametrized_name(params)¶

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params (tuple[type[Any], ...]) – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.
Return type:: str

classmethod model_rebuild(*, force=False, raise_errors=True, _parent_namespace_depth=2, _types_namespace=None)¶

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force (bool) – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors (bool) – Whether to raise errors, defaults to True.
_parent_namespace_depth (int) – The depth level of the parent namespace, defaults to 2.
_types_namespace (MappingNamespace | None) – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding _was_ required, returns True if rebuilding was successful, otherwise False.

Return type:

bool | None

classmethod model_validate(obj, *, strict=None, from_attributes=None, context=None, by_alias=None, by_name=None)¶

Validate a pydantic model instance.

Parameters:

obj (Any) – The object to validate.
strict (bool | None) – Whether to enforce types strictly.
from_attributes (bool | None) – Whether to extract data from object attributes.
context (Any | None) – Additional context to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Return type:

Self

classmethod model_validate_json(json_data, *, strict=None, context=None, by_alias=None, by_name=None)¶

Validate the given JSON data against the Pydantic model.

Parameters:

json_data (str | bytes | bytearray) – The JSON data to validate.
strict (bool | None) – Whether to enforce types strictly.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

Return type:

Self

classmethod model_validate_strings(obj, *, strict=None, context=None, by_alias=None, by_name=None)¶

Validate the given object with string data against the Pydantic model.

Parameters:

obj (Any) – The object containing string data to validate.
strict (bool | None) – Whether to enforce types strictly.
context (Any | None) – Extra variables to pass to the validator.
by_alias (bool | None) – Whether to use the field’s alias when validating against the provided input data.
by_name (bool | None) – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Return type:

Self

classmethod parse_file(path, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)¶

Parameters:

path (str | Path)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)

Return type:

Self

classmethod parse_obj(obj)¶

Parameters:: obj (Any)
Return type:: Self

classmethod parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)¶

Parameters:

b (str | bytes)
content_type (str | None)
encoding (str)
proto (DeprecatedParseProtocol | None)
allow_pickle (bool)

Return type:

Self

classmethod schema(by_alias=True, ref_template='#/$defs/{model}')¶

Parameters:

by_alias (bool)
ref_template (str)

Return type:

Dict[str, Any]

classmethod schema_json(*, by_alias=True, ref_template='#/$defs/{model}', **dumps_kwargs)¶

Parameters:

by_alias (bool)
ref_template (str)
dumps_kwargs (Any)

Return type:

str

classmethod update_forward_refs(**localns)¶

Parameters:: localns (Any)
Return type:: None

classmethod validate(value)¶

Parameters:: value (Any)
Return type:: Self

__copy__()¶

Returns a shallow copy of the model.

Return type:: Self

__deepcopy__(memo=None)¶

Returns a deep copy of the model.

Parameters:: memo (dict[int, Any] | None)
Return type:: Self
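
The difference between a shallow and a deep copy matters for fields that hold mutable containers, such as a kwargs dictionary. The sketch below shows the general Python semantics with a plain class; `Settings` is a hypothetical stand-in, not a pydantic model or a Haive class.

```python
import copy


class Settings:
    def __init__(self, name: str, kwargs: dict[str, int]) -> None:
        self.name = name
        self.kwargs = kwargs


original = Settings("store", {"dimension": 1536})
shallow = copy.copy(original)      # new object, but the kwargs dict is shared
deep = copy.deepcopy(original)     # new object with its own copy of kwargs

# Mutating the original's dict is visible through the shallow copy only.
original.kwargs["dimension"] = 768
```

After the mutation, `shallow.kwargs` reflects the change because it is the same dictionary object, while `deep.kwargs` keeps the earlier value.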

__init__(**data)¶

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)
Return type:: None

__iter__()¶

So dict(model) works.

Return type:: Generator[tuple[str, Any], None, None]
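
The reason an `__iter__` that yields `(name, value)` tuples makes `dict(model)` work is that `dict()` accepts any iterable of key-value pairs. A minimal standard-library sketch, using a hypothetical `Record` class in place of a real model:

```python
from typing import Any, Iterator


class Record:
    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def __iter__(self) -> Iterator[tuple[str, Any]]:
        # Yield (name, value) pairs so that dict(instance) can consume them.
        yield from self._fields.items()


record = Record(name="docs", vector_store_path="vector_store")
as_dict = dict(record)
```

Because `Record` defines no `keys()` method, `dict()` treats it as a sequence of pairs and builds the mapping from them.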

__pretty__(fmt, **kwargs)¶

Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.

Parameters:

fmt (Callable[[Any], Any])
kwargs (Any)

Return type:

Generator[Any, None, None]

__repr_name__()¶

Name of the instance’s class, used in __repr__.

Return type:: str

__repr_recursion__(object)¶

Returns the string representation of a recursive object.

Parameters:: object (Any)
Return type:: str

__rich_repr__()¶

Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.

Return type:: RichReprResult

add_document(document)[source]¶

Add a single document to the vector store config.

Parameters:: document (Document)

copy(*, include=None, exclude=None, update=None, deep=False)¶

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)

Parameters:

include (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to include in the copied model.
exclude (AbstractSetIntStr | MappingIntStrAny | None) – Optional set or mapping specifying which fields to exclude in the copied model.
update (Dict[str, Any] | None) – Optional dictionary of field-value pairs to override field values in the copied model.
deep (bool) – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

Return type:

Self

create_retriever(async_mode=False)[source]¶

Create a retriever from the vector store.

Parameters:: async_mode (bool)

create_vectorstore(async_mode=False)[source]¶

Create a vector store instance from this configuration.

Parameters:: async_mode (bool)

dict(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False)¶

Parameters:

include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)

Return type:

Dict[str, Any]

json(*, include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=PydanticUndefined, models_as_dict=PydanticUndefined, **dumps_kwargs)¶

Parameters:

include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None)
by_alias (bool)
exclude_unset (bool)
exclude_defaults (bool)
exclude_none (bool)
encoder (Callable[[Any], Any] | None)
models_as_dict (bool)
dumps_kwargs (Any)

Return type:

str

model_copy(*, update=None, deep=False)¶

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update (Mapping[str, Any] | None) – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep (bool) – Set to True to make a deep copy of the model.

Returns:

New model instance.

Return type:

Self

model_dump(*, mode='python', include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)¶

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode (Literal['json', 'python'] | str) – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to include in the output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – A set of fields to exclude from the output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Return type:

dict[str, Any]

model_dump_json(*, indent=None, include=None, exclude=None, context=None, by_alias=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, fallback=None, serialize_as_any=False)¶

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent (int | None) – Indentation to use in the JSON output. If None is passed, the output will be compact.
include (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to include in the JSON output.
exclude (set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None) – Field(s) to exclude from the JSON output.
context (Any | None) – Additional context to pass to the serializer.
by_alias (bool | None) – Whether to serialize using field aliases.
exclude_unset (bool) – Whether to exclude fields that have not been explicitly set.
exclude_defaults (bool) – Whether to exclude fields that are set to their default value.
exclude_none (bool) – Whether to exclude fields that have a value of None.
round_trip (bool) – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings (bool | Literal['none', 'warn', 'error']) – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback (Callable[[Any], Any] | None) – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any (bool) – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Return type:

str

model_post_init(context, /)¶

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Parameters:: context (Any)
Return type:: None

docstore_path: str¶

documents: list[Document]¶

embedding_model: BaseEmbeddingConfig¶

model_computed_fields = {}¶

Return type:: dict[str, pydantic.fields.ComputedFieldInfo]

model_config: ClassVar[ConfigDict] = {}¶

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra: dict[str, Any] | None¶

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to “allow”.
Return type:: dict[str, Any] | None

model_fields = {'docstore_path': FieldInfo(annotation=str, required=False, default='docstore', description='Where to store raw and processed documents'), 'documents': FieldInfo(annotation=list[Document], required=False, default_factory=list, description='The raw documents to store'), 'embedding_model': FieldInfo(annotation=BaseEmbeddingConfig, required=False, default=HuggingFaceEmbeddingConfig(provider=<EmbeddingProvider.HUGGINGFACE: 'huggingface'>, model='sentence-transformers/all-mpnet-base-v2', api_key=SecretStr(''), model_kwargs={'device': 'cpu'}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, cache_folder='/home/will/Projects/haive/resources/embeddings_cache', show_progress=False, use_cache=True), description='The embedding model to use for the vector store'), 'name': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'vector_store_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default_factory=dict, description='Optional kwargs for the vector store'), 'vector_store_path': FieldInfo(annotation=str, required=False, default='vector_store', description='The path to the vector store'), 'vector_store_provider': FieldInfo(annotation=VectorStoreProvider, required=False, default=<VectorStoreProvider.FAISS: 'FAISS'>, description='The type of vector store to use')}¶

Return type:: dict[str, pydantic.fields.FieldInfo]

property model_fields_set: set[str]¶

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

Return type:

set[str]

name: str | None¶

vector_store_kwargs: dict[str, Any]¶

vector_store_path: str¶

vector_store_provider: VectorStoreProvider¶

class haive.core.models.vectorstore.VectorStoreProvider(*values)[source]¶

Bases: str, Enum

Enumeration of supported vector store providers.

Available Providers

Provider	Type	Best For
`Chroma`	Open Source	Local development, prototyping
`Pinecone`	Cloud Service	Production, managed infrastructure
`FAISS`	Local/Memory	High-performance similarity search
`Weaviate`	Open Source	GraphQL queries, hybrid search

Chroma = 'Chroma'¶

FAISS = 'FAISS'¶

InMemory = 'InMemory'¶

Milvus = 'Milvus'¶

Pinecone = 'Pinecone'¶

Qdrant = 'Qdrant'¶

Weaviate = 'Weaviate'¶

Zilliz = 'Zilliz'¶
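
Because the enumeration derives from both `str` and `Enum`, its members can be looked up from their string values and compared with plain strings. The sketch below shows that behaviour with a stand-in class, `Provider`, which mirrors three of the documented members; it is an illustration only and is not the Haive class itself.

```python
from enum import Enum


class Provider(str, Enum):
    """Stand-in mirroring a few documented members; not the Haive class."""

    Chroma = "Chroma"
    FAISS = "FAISS"
    InMemory = "InMemory"


# Look up a member from its string value, e.g. a value read from a config file.
by_value = Provider("FAISS")

# The str mixin lets members compare equal to their plain string value.
is_plain_string = Provider.Chroma == "Chroma"
```

An unknown value such as `Provider("Unknown")` raises `ValueError`, which is useful for rejecting misspelled provider names early.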

Functions

haive.core.models.vectorstore.add_document(*args, **kwargs)[source]¶

Placeholder function.

Architecture

graph LR
    A[Documents] --> B[Embeddings]
    B --> C[Vector Store]
    C --> D[Similarity Search]
    D --> E[Retrieved Documents]

    subgraph "Vector Store Types"
        F[Local/File-based]
        G[Cloud Services]
        H[In-Memory]
    end
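
The flow in the diagram can be sketched end to end with a toy in-memory store. Everything here is hypothetical and deliberately simplistic: `embed` counts words instead of calling an embedding model, and `ToyVectorStore` only borrows the method names `add_texts` and `similarity_search` from the Quick Start examples.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system calls an embedding model."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Documents -> embeddings -> store -> similarity search -> retrieved documents."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._rows.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(self._rows, key=lambda row: similarity(query_vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts(["vector stores index embeddings", "bananas are yellow fruit"])
retrieved = store.similarity_search("embeddings in vector stores", k=1)
```

Swapping the in-memory list for a file-based or cloud backend changes where the vectors live, not the shape of this flow.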

Performance Considerations

Optimization Tips

Index Type: Different providers support different index types (HNSW, IVF, etc.)
Batch Operations: Use batch operations for better performance when adding many documents
Connection Pooling: Configured automatically for cloud providers
Caching: In-memory caching for frequently accessed embeddings
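
The batch-operations tip can be sketched as a small chunking helper. The name `batched` and the batch size of 3 are illustrative assumptions; suitable sizes depend on the provider and are not specified in this documentation.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"document {i}" for i in range(7)]
batches = list(batched(texts, batch_size=3))
```

Each batch could then be passed to a single `add_texts` call, so that embedding and network round trips are amortized over several documents instead of being paid once per document.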

Warning

Large-scale deployments should consider:

Index size limitations
Query latency requirements
Cost per query/storage
Data persistence needs

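Migration Between Providers

# Migrate from one provider to another
def migrate_vectorstore(source_config, target_config, max_docs=1000):
    """Copy up to max_docs documents from one vector store to another."""
    # Extract from source. similarity_search returns at most k results, so this
    # is not a full export; stores holding more than max_docs documents need a
    # provider-specific listing or export call instead.
    source_vs = source_config.create_vectorstore()
    docs = source_vs.similarity_search("", k=max_docs)

    # Load into target
    target_vs = target_config.create_vectorstore()
    target_vs.add_documents(docs)

    return target_vs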
