haive.core.engine.retriever.providers.DocArrayRetrieverConfig

DocArray Retriever implementation for the Haive framework.

This module provides a configuration class for the DocArray retriever, which uses DocArray’s vector search capabilities for document retrieval. DocArray is a library for representing, sending, and searching multimodal data, with efficient vector operations and search.

The DocArrayRetriever works by:

1. Using DocArray’s DocumentArray for document storage
2. Performing vector similarity search with various metrics
3. Supporting efficient in-memory and persisted search
4. Enabling multimodal document processing
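The similarity-search step can be illustrated with a minimal, dependency-free sketch. It is not DocArray's implementation: the helper names (`cosine_similarity`, `euclidean_distance`, `top_k`) and the toy 2-D vectors are assumptions made for illustration, and a real setup would get its vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (lower means more similar)."""
    return math.dist(a, b)


def top_k(query, vectors, k, metric="cosine"):
    """Return the indices of the k stored vectors that best match the query."""
    if metric == "cosine":
        scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best = largest
    elif metric == "euclidean":
        scored = [(euclidean_distance(query, v), i) for i, v in enumerate(vectors)]
        scored.sort(key=lambda pair: pair[0])  # best = smallest
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    return [index for _, index in scored[:k]]


# Toy "embeddings" standing in for vectorized documents (hypothetical data).
vectors = [[1.0, 0.1], [0.0, 1.0], [0.8, 0.3], [5.0, 0.0]]
query = [1.0, 0.0]

cosine_hits = top_k(query, vectors, k=2, metric="cosine")
euclidean_hits = top_k(query, vectors, k=2, metric="euclidean")
```

With this data the two metrics disagree: cosine similarity ignores vector length, so the long vector at index 3 ranks first, while Euclidean distance prefers the vectors that sit closest to the query.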

This retriever is particularly useful when you are:

  • Working with multimodal documents (text, images, etc.)
  • Running efficient in-memory vector search
  • Keeping vector operations lightweight
  • Building prototypes or working with smaller datasets
  • Already using DocArray for document processing

The implementation integrates with LangChain’s DocArrayRetriever while providing a consistent Haive configuration interface.

Classes

DocArrayRetrieverConfig

Configuration for DocArray retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.DocArrayRetrieverConfig.DocArrayRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for DocArray retriever in the Haive framework.

This retriever uses DocArray’s vector search capabilities to provide efficient document similarity search with support for multimodal data.

retriever_type

The type of retriever (always DOC_ARRAY).

Type:

RetrieverType

documents

Documents to index for retrieval.

Type:

List[Document]

k

Number of documents to retrieve.

Type:

int

similarity_metric

Distance metric for similarity calculation.

Type:

str

embedding_model

Embedding model for vectorization.

Type:

Optional[str]

persist_path

Path to persist the DocumentArray.

Type:

Optional[str]

Examples

>>> from haive.core.engine.retriever import DocArrayRetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning is a subset of AI"),
...     Document(page_content="Deep learning uses neural networks"),
...     Document(page_content="Natural language processing handles text")
... ]
>>>
>>> # Create the DocArray retriever config
>>> config = DocArrayRetrieverConfig(
...     name="docarray_retriever",
...     documents=docs,
...     k=5,
...     similarity_metric="cosine",
...     embedding_model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> results = retriever.get_relevant_documents("neural networks in AI")
>>>
>>> # Example with persistence
>>> persistent_config = DocArrayRetrieverConfig(
...     name="persistent_docarray_retriever",
...     documents=docs,
...     persist_path="./docarray_index",
...     similarity_metric="euclidean"
... )

get_input_fields()

Return input field definitions for DocArray retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for DocArray retriever.

Return type:

dict[str, tuple[type, Any]]
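Both field-definition methods return a mapping from field name to a `(type, default)` tuple. The sketch below shows one way such a mapping could be built and read. The field names (`query`, `k`), the default of 4, and both helper functions are hypothetical and are not taken from the Haive source. Marking a required field with `...` (Ellipsis) follows the convention used by Pydantic's `create_model` and is likewise an assumption here.

```python
from typing import Any


def example_input_fields() -> dict[str, tuple[type, Any]]:
    """Hypothetical field definitions in the documented return shape."""
    return {
        "query": (str, ...),  # Ellipsis marks a field with no default
        "k": (int, 4),        # optional field with an assumed default value
    }


def describe(fields: dict[str, tuple[type, Any]]) -> list[str]:
    """Render each field definition as a short human-readable line."""
    lines = []
    for name, (field_type, default) in fields.items():
        requirement = "required" if default is ... else f"default={default!r}"
        lines.append(f"{name}: {field_type.__name__} ({requirement})")
    return lines


summary = describe(example_input_fields())
```

Here `summary` holds one line per field, which is handy for checking what a configuration expects before wiring it into a larger pipeline.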

instantiate()

Create a DocArray retriever from this configuration.

Returns:

Instantiated retriever ready for multimodal search.

Return type:

DocArrayRetriever

Raises:
  • ImportError – If required packages are not available.

  • ValueError – If documents list is empty or configuration is invalid.
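Callers may want to handle both documented exceptions in one place. The sketch below uses a stand-in function, `build_retriever`, in place of `instantiate()` so that it runs without Haive installed. The stand-in, its validation rules, and its error messages are assumptions for illustration, not the library's actual checks.

```python
def build_retriever(documents, k=4):
    """Hypothetical stand-in for instantiate(): validate, then return a result."""
    if not documents:
        raise ValueError("documents must contain at least one item")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {"documents": list(documents), "k": k}


def safe_build(documents, k=4):
    """Return (retriever, None) on success or (None, error message) on failure."""
    try:
        return build_retriever(documents, k), None
    except (ImportError, ValueError) as exc:
        # Both documented failure modes are reported the same way here.
        return None, f"could not build retriever: {exc}"


retriever, error = safe_build(["Deep learning uses neural networks"], k=1)
missing, empty_error = safe_build([])
```

Returning the error as a value keeps the calling code linear. Re-raising with extra context would be an equally reasonable choice.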