haive.core.engine.retriever.providers.DocArrayRetrieverConfig¶
DocArray Retriever implementation for the Haive framework.
This module provides a configuration class for the DocArray retriever, which uses DocArray’s vector search capabilities for document retrieval. DocArray is a library for representing, sending, and searching multimodal data, providing efficient vector operations and search.
The DocArrayRetriever works by:
1. Using DocArray’s DocumentArray for document storage
2. Performing vector similarity search with various metrics
3. Supporting efficient in-memory and persisted search
4. Enabling multimodal document processing
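Step 2 above (vector similarity search with a selectable metric) can be illustrated with a minimal, dependency-free sketch. It is not DocArray’s or Haive’s implementation: the names `cosine_similarity`, `negative_euclidean`, `METRICS`, and `top_k` are hypothetical, and the 2-D vectors are toy stand-ins for real embeddings.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def negative_euclidean(a: Vector, b: Vector) -> float:
    """Negated Euclidean distance, so that higher still means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical metric registry; the keys mirror the "cosine" and "euclidean"
# values that appear in this page's examples.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_similarity,
    "euclidean": negative_euclidean,
}


def top_k(
    query: Vector,
    index: List[Tuple[str, Vector]],
    k: int = 5,
    metric: str = "cosine",
) -> List[str]:
    """Return the texts of the k entries most similar to the query vector."""
    if metric not in METRICS:
        raise ValueError(f"Unknown metric: {metric!r}")
    score = METRICS[metric]
    ranked = sorted(index, key=lambda item: score(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy in-memory index: (text, embedding) pairs with made-up 2-D embeddings.
index = [
    ("Machine learning is a subset of AI", [1.0, 0.0]),
    ("Deep learning uses neural networks", [0.8, 0.6]),
    ("Natural language processing handles text", [0.0, 1.0]),
]

print(top_k([0.9, 0.5], index, k=1, metric="cosine"))
# -> ['Deep learning uses neural networks']
```

Negating the Euclidean distance lets both metrics share one "higher is better" ranking path, which keeps the search function independent of the metric chosen.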
This retriever is particularly useful when:
- Working with multimodal documents (text, images, etc.)
- Needing efficient in-memory vector search
- Wanting lightweight vector operations
- Building prototypes or working with smaller datasets
- Already using DocArray for document processing
The implementation integrates with LangChain’s DocArrayRetriever while providing a consistent Haive configuration interface.
Classes¶
DocArrayRetrieverConfig | Configuration for DocArray retriever in the Haive framework.
Module Contents¶
- class haive.core.engine.retriever.providers.DocArrayRetrieverConfig.DocArrayRetrieverConfig¶
Bases:
haive.core.engine.retriever.retriever.BaseRetrieverConfig
Configuration for DocArray retriever in the Haive framework.
This retriever uses DocArray’s vector search capabilities to provide efficient document similarity search with support for multimodal data.
- retriever_type¶
The type of retriever (always DOC_ARRAY).
- Type:
- documents¶
Documents to index for retrieval.
- Type:
List[Document]
Examples
>>> from haive.core.engine.retriever import DocArrayRetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning is a subset of AI"),
...     Document(page_content="Deep learning uses neural networks"),
...     Document(page_content="Natural language processing handles text"),
... ]
>>>
>>> # Create the DocArray retriever config
>>> config = DocArrayRetrieverConfig(
...     name="docarray_retriever",
...     documents=docs,
...     k=5,
...     similarity_metric="cosine",
...     embedding_model="sentence-transformers/all-MiniLM-L6-v2",
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> # Store the hits under a separate name so the source `docs` list stays intact
>>> results = retriever.get_relevant_documents("neural networks in AI")
>>>
>>> # Example with persistence
>>> persistent_config = DocArrayRetrieverConfig(
...     name="persistent_docarray_retriever",
...     documents=docs,
...     persist_path="./docarray_index",
...     similarity_metric="euclidean",
... )
- instantiate()¶
Create a DocArray retriever from this configuration.
- Returns:
Instantiated retriever ready for multimodal search.
- Return type:
DocArrayRetriever
- Raises:
ImportError – If required packages are not available.
ValueError – If documents list is empty or configuration is invalid.
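The contract documented for instantiate() (check dependencies, validate the configuration, then return a retriever) can be sketched in plain Python. This is an illustration of the pattern only, not the Haive source: `SketchDocument`, `SketchRetriever`, `SketchRetrieverConfig`, and the `required_packages` field are all hypothetical, and the keyword-overlap ranking merely stands in for real vector search.

```python
import importlib.util
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document with a page_content field."""
    page_content: str


@dataclass
class SketchRetriever:
    """Hypothetical retriever that ranks documents by shared words."""
    documents: List[SketchDocument]
    k: int

    def retrieve(self, query: str) -> List[SketchDocument]:
        query_words = set(query.lower().split())

        def overlap(doc: SketchDocument) -> int:
            return len(query_words & set(doc.page_content.lower().split()))

        # sorted() is stable, so ties keep their original order.
        ranked = sorted(self.documents, key=overlap, reverse=True)
        return ranked[: self.k]


@dataclass
class SketchRetrieverConfig:
    """Hypothetical config mirroring the documented instantiate() contract."""
    name: str
    documents: List[SketchDocument] = field(default_factory=list)
    k: int = 5
    # Assumption: top-level package names checked before building the retriever.
    required_packages: Tuple[str, ...] = ()

    def instantiate(self) -> SketchRetriever:
        # ImportError: a required package is not installed.
        for package in self.required_packages:
            if importlib.util.find_spec(package) is None:
                raise ImportError(f"Required package not available: {package}")
        # ValueError: empty documents list or otherwise invalid configuration.
        if not self.documents:
            raise ValueError("documents must contain at least one document")
        if self.k < 1:
            raise ValueError("k must be a positive integer")
        return SketchRetriever(documents=list(self.documents), k=self.k)


config = SketchRetrieverConfig(
    name="sketch_retriever",
    documents=[
        SketchDocument("Machine learning is a subset of AI"),
        SketchDocument("Deep learning uses neural networks"),
        SketchDocument("Natural language processing handles text"),
    ],
    k=1,
)
retriever = config.instantiate()
print(retriever.retrieve("neural networks")[0].page_content)
```

Validating inside instantiate() rather than at construction time means a config object can be created, serialized, or edited freely, with failures surfacing only at the point where a retriever is actually built.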