haive.core.engine.retriever.providers.ParentDocumentRetrieverConfig

Parent Document Retriever implementation for the Haive framework.

This module provides a configuration class for the Parent Document retriever. The retriever runs embedding similarity search over small chunks but returns the larger parent documents that contain them, which gives fuller context while keeping the search precise.

The ParentDocumentRetriever works by:

1. Splitting documents into small chunks for embedding and similarity search.
2. Storing these chunks in a vector store with references to their parent documents.
3. Storing the full parent documents in a separate document store.
4. At query time, finding similar chunks but returning their parent documents.
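The four steps above can be sketched in plain Python. This is only an illustration of the idea, not the Haive or LangChain implementation: the word-overlap `score` function stands in for embedding similarity, and `split_into_chunks`, `build_index`, and `retrieve` are hypothetical helper names.

```python
def split_into_chunks(text: str, size: int) -> list[str]:
    """Split text into groups of `size` words (a stand-in for a real text splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct query words found in the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def build_index(parents: dict[str, str], chunk_words: int):
    """Steps 1-3: split parents, index each child with its parent id, keep parents in a docstore."""
    docstore = dict(parents)  # parent_id -> full parent text
    child_index = [
        (chunk, parent_id)
        for parent_id, text in parents.items()
        for chunk in split_into_chunks(text, chunk_words)
    ]
    return child_index, docstore


def retrieve(query: str, child_index, docstore, k: int) -> list[str]:
    """Step 4: rank child chunks, then return the distinct parents of the top k chunks."""
    ranked = sorted(child_index, key=lambda item: score(query, item[0]), reverse=True)
    # dict.fromkeys removes duplicate parent ids while preserving rank order.
    parent_ids = dict.fromkeys(parent_id for _, parent_id in ranked[:k])
    return [docstore[parent_id] for parent_id in parent_ids]


parents = {
    "doc-a": "Gradient boosting builds trees in sequence. Each tree corrects earlier errors.",
    "doc-b": "Sourdough bread needs a starter. The starter ferments flour and water.",
}
child_index, docstore = build_index(parents, chunk_words=6)
results = retrieve("gradient boosting trees", child_index, docstore, k=2)
```

In this sketch both of the top two chunks belong to `doc-a`, so `results` holds a single parent document. A query for `k` chunks can therefore return fewer than `k` parents.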

This retriever is particularly useful when you:

- Need precise similarity search on small chunks
- Want to return full context from larger parent documents
- Are building systems that balance search precision with context completeness
- Are dealing with long documents that need chunk-level search

The implementation integrates with LangChain’s ParentDocumentRetriever while providing a consistent Haive configuration interface with flexible chunking options.

Classes

ParentDocumentRetrieverConfig

Configuration for Parent Document retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.ParentDocumentRetrieverConfig.ParentDocumentRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for the Parent Document retriever in the Haive framework.

This retriever runs similarity search over small chunks but returns their larger parent documents, providing better context while maintaining search precision.

retriever_type

The type of retriever (always PARENT_DOCUMENT).

Type: RetrieverType

vectorstore_config

Vector store for storing child chunks.

Type: VectorStoreConfig

docstore_type

Type of document store for parent documents.

Type: str

child_chunk_size

Size of child chunks for embedding.

Type: int

child_chunk_overlap

Overlap between child chunks.

Type: int
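To illustrate how a chunk size and an overlap interact, here is a minimal sliding-window sketch. It assumes sizes are measured in characters and uses a hypothetical `chunk_text` helper; the splitter that Haive actually uses may measure size differently and may prefer to break on separators.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]: each chunk repeats the last character of the previous one
```

A larger overlap reduces the chance that a sentence is cut in half at a chunk boundary, at the cost of more chunks to embed and store.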

k

Number of child chunks to retrieve (returns their parents).

Type: int

Examples

>>> from haive.core.engine.retriever import ParentDocumentRetrieverConfig
>>> from haive.core.engine.vectorstore.providers.ChromaVectorStoreConfig import ChromaVectorStoreConfig
>>>
>>> # Create vector store config
>>> vs_config = ChromaVectorStoreConfig(
...     name="parent_doc_store",
...     collection_name="child_chunks"
... )
>>>
>>> # Create parent document retriever
>>> config = ParentDocumentRetrieverConfig(
...     name="parent_doc_retriever",
...     vectorstore_config=vs_config,
...     child_chunk_size=200,
...     child_chunk_overlap=20,
...     k=4
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> # invoke() is the current LangChain entry point; get_relevant_documents() is deprecated
>>> docs = retriever.invoke("machine learning algorithms")

get_input_fields()

Return input field definitions for Parent Document retriever.

Return type: dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for Parent Document retriever.

Return type: dict[str, tuple[type, Any]]

instantiate()

Create a Parent Document retriever from this configuration.

Returns:

Instantiated retriever ready for parent document retrieval.

Return type:

ParentDocumentRetriever

Raises:

ImportError – If required packages are not available.
ValueError – If configuration is invalid.

classmethod validate_child_chunk_overlap(v, info)

Validate that child chunk overlap is less than chunk size.
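The check described here can be sketched as a plain function. The sketch assumes the `(v, info)` signature follows the Pydantic v2 field-validator convention, where `info.data` holds the fields validated so far; a `SimpleNamespace` stands in for that object so the example runs without Pydantic. The error message is illustrative, not the one Haive raises.

```python
from types import SimpleNamespace


def validate_child_chunk_overlap(v: int, info) -> int:
    """Reject an overlap that is not strictly smaller than the child chunk size."""
    chunk_size = info.data.get("child_chunk_size")
    # If chunk size itself failed validation it is absent, so there is nothing to compare.
    if chunk_size is not None and v >= chunk_size:
        raise ValueError(
            f"child_chunk_overlap ({v}) must be less than child_chunk_size ({chunk_size})"
        )
    return v


# Stand-in for the validation context: child_chunk_size was validated before the overlap.
info = SimpleNamespace(data={"child_chunk_size": 200})

accepted = validate_child_chunk_overlap(20, info)  # valid: 20 < 200

try:
    validate_child_chunk_overlap(200, info)  # invalid: overlap equals chunk size
    rejected = False
except ValueError:
    rejected = True
```

An overlap equal to or larger than the chunk size would leave a splitter unable to advance through the text, which is why the comparison is strict.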

classmethod validate_docstore_path(v, info)

Validate docstore path is provided when needed.

classmethod validate_docstore_type(v)

Validate document store type.
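The two docstore validators could work together roughly as follows. This is a sketch under stated assumptions: the store names `"in_memory"` and `"file_system"` are hypothetical, as is the rule that only a disk-backed store needs a path. This reference does not list the values Haive actually accepts.

```python
from types import SimpleNamespace

# Hypothetical store names; the real supported values are not given in this reference.
ALLOWED_DOCSTORE_TYPES = {"in_memory", "file_system"}
# Hypothetical rule: only stores that persist to disk need a path.
PATH_REQUIRED_TYPES = {"file_system"}


def validate_docstore_type(v: str) -> str:
    """Accept only known document store types."""
    if v not in ALLOWED_DOCSTORE_TYPES:
        raise ValueError(f"docstore_type must be one of {sorted(ALLOWED_DOCSTORE_TYPES)}")
    return v


def validate_docstore_path(v, info):
    """Require a path when the already-validated docstore type needs one."""
    docstore_type = info.data.get("docstore_type")
    if docstore_type in PATH_REQUIRED_TYPES and not v:
        raise ValueError(f"docstore_path is required when docstore_type is {docstore_type!r}")
    return v


# An in-memory store needs no path, so None passes through unchanged.
memory_info = SimpleNamespace(data={"docstore_type": validate_docstore_type("in_memory")})
memory_path = validate_docstore_path(None, memory_info)

# A disk-backed store without a path is rejected.
file_info = SimpleNamespace(data={"docstore_type": validate_docstore_type("file_system")})
try:
    validate_docstore_path(None, file_info)
    missing_path_rejected = False
except ValueError:
    missing_path_rejected = True
```

Validating the path against the store type, rather than on its own, keeps the path optional for configurations that never touch the disk.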