haive.core.engine.retriever.providers.ParentDocumentRetrieverConfig
Parent Document Retriever implementation for the Haive framework.
This module provides a configuration class for the Parent Document retriever, which retrieves small chunks for embedding similarity but returns larger parent documents containing those chunks, providing better context while maintaining search precision.
The ParentDocumentRetriever works by:
1. Splitting documents into small chunks for embedding and similarity search
2. Storing these chunks in a vector store with references to parent documents
3. Storing full parent documents in a separate document store
4. When querying, finding similar chunks but returning their parent documents
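The four steps above can be illustrated with a small, dependency-free sketch. This is not the Haive or LangChain implementation: `ToyParentDocumentIndex`, `split_into_chunks`, and `score` are hypothetical names, chunks are measured in words rather than characters or tokens, and a simple word-overlap count stands in for embedding similarity.

```python
from dataclasses import dataclass, field


def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks of chunk_size words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def score(query: str, chunk: str) -> int:
    """Toy similarity (assumption): number of distinct query words found in the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


@dataclass
class ToyParentDocumentIndex:
    chunk_size: int = 5
    overlap: int = 1
    # Stand-in for the document store: parent_id -> full parent text.
    docstore: dict[str, str] = field(default_factory=dict)
    # Stand-in for the vector store: (child chunk, parent_id) pairs.
    chunk_index: list[tuple[str, str]] = field(default_factory=list)

    def add_document(self, parent_id: str, text: str) -> None:
        # Step 3: keep the full parent document.
        self.docstore[parent_id] = text
        # Steps 1-2: index each child chunk with a reference to its parent.
        for chunk in split_into_chunks(text, self.chunk_size, self.overlap):
            self.chunk_index.append((chunk, parent_id))

    def retrieve(self, query: str, k: int = 4) -> list[str]:
        # Step 4: rank child chunks, then return the distinct parents of the top k.
        ranked = sorted(self.chunk_index, key=lambda item: score(query, item[0]), reverse=True)
        parent_ids: list[str] = []
        for chunk, parent_id in ranked[:k]:
            if score(query, chunk) > 0 and parent_id not in parent_ids:
                parent_ids.append(parent_id)
        return [self.docstore[pid] for pid in parent_ids]


index = ToyParentDocumentIndex(chunk_size=5, overlap=1)
index.add_document(
    "ml",
    "Gradient boosting and random forests are popular machine learning algorithms for tabular data",
)
index.add_document(
    "cooking",
    "Slow roasting vegetables brings out their natural sweetness and deepens the flavour",
)
results = index.retrieve("machine learning algorithms", k=2)
print(results)
```

Both top-ranked chunks belong to the same parent, so the parent document is returned once and in full, which is the behaviour that distinguishes this retriever from plain chunk retrieval.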
This retriever is particularly useful when you:
- Need precise similarity search on small chunks
- Want to return full context from larger parent documents
- Are building systems that balance search precision with context completeness
- Are dealing with long documents that need chunk-level search
The implementation integrates with LangChain's ParentDocumentRetriever while providing a consistent Haive configuration interface with flexible chunking options.
Classes
ParentDocumentRetrieverConfig | Configuration for Parent Document retriever in the Haive framework.
Module Contents
- class haive.core.engine.retriever.providers.ParentDocumentRetrieverConfig.ParentDocumentRetrieverConfig
Bases:
haive.core.engine.retriever.retriever.BaseRetrieverConfig
Configuration for Parent Document retriever in the Haive framework.
This retriever retrieves small chunks for similarity search but returns larger parent documents, providing better context while maintaining search precision.
- retriever_type
The type of retriever (always PARENT_DOCUMENT).
- Type:
- vectorstore_config
Vector store for storing child chunks.
- Type:
Examples
>>> from haive.core.engine.retriever import ParentDocumentRetrieverConfig
>>> from haive.core.engine.vectorstore.providers.ChromaVectorStoreConfig import ChromaVectorStoreConfig
>>>
>>> # Create vector store config
>>> vs_config = ChromaVectorStoreConfig(
...     name="parent_doc_store",
...     collection_name="child_chunks"
... )
>>>
>>> # Create parent document retriever
>>> config = ParentDocumentRetrieverConfig(
...     name="parent_doc_retriever",
...     vectorstore_config=vs_config,
...     child_chunk_size=200,
...     child_chunk_overlap=20,
...     k=4
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.get_relevant_documents("machine learning algorithms")
- instantiate()
Create a Parent Document retriever from this configuration.
- Returns:
Instantiated retriever ready for parent document retrieval.
- Return type:
ParentDocumentRetriever
- Raises:
ImportError: If required packages are not available.
ValueError: If configuration is invalid.
- classmethod validate_child_chunk_overlap(v, info)
Validate that child chunk overlap is less than chunk size.
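A minimal sketch of what such a check might look like, written as a plain function so it runs without any dependencies. The `(v, info)` signature resembles a Pydantic v2 field validator, where `info.data` holds the fields validated so far; that resemblance, the field name `child_chunk_size` as the lookup key, and the error message are assumptions here, and `SimpleNamespace` is only a stand-in for the real validation context.

```python
from types import SimpleNamespace


def validate_child_chunk_overlap(v: int, info) -> int:
    """Reject an overlap that is not strictly smaller than the child chunk size."""
    # Assumption: previously validated fields are exposed as a dict on info.data.
    chunk_size = info.data.get("child_chunk_size")
    if chunk_size is not None and v >= chunk_size:
        raise ValueError(
            f"child_chunk_overlap ({v}) must be less than child_chunk_size ({chunk_size})"
        )
    return v


# Hypothetical validation context with a chunk size of 200.
info = SimpleNamespace(data={"child_chunk_size": 200})

# An overlap of 20 is smaller than the chunk size, so it is returned unchanged.
accepted = validate_child_chunk_overlap(20, info)

# An overlap equal to the chunk size would leave no new text per chunk, so it is rejected.
try:
    validate_child_chunk_overlap(200, info)
    rejected = False
except ValueError:
    rejected = True
```

Requiring a strict inequality matters because a chunker advances by the chunk size minus the overlap on each step; if the two were equal, that step would be zero and splitting could not progress.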