haive.core.engine.retriever.providers.MergerRetrieverConfig

Merger Retriever implementation for the Haive framework.

This module provides a configuration class for the Merger retriever, which combines results from multiple retrievers into a single comprehensive, deduplicated result set.

The MergerRetriever works by:

1. Running multiple retrievers in parallel on the same query
2. Collecting all results from the different retrieval strategies
3. Merging and deduplicating results based on content or metadata
4. Applying optional ranking and filtering to the merged results
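The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Haive or LangChain implementation: `Retriever` and `merge_results` are hypothetical names, a "retriever" is reduced to a callable that returns ranked strings, results are interleaved round-robin, and duplicates are detected by exact text match only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import Callable, List, Optional

# Hypothetical stand-in: a "retriever" is any callable mapping a query to ranked texts.
Retriever = Callable[[str], List[str]]


def merge_results(
    query: str,
    retrievers: List[Retriever],
    max_results: Optional[int] = None,
) -> List[str]:
    """Run all retrievers on one query, then merge, deduplicate, and cap the results."""
    # Step 1: run every retriever on the same query in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        # Step 2: collect one ranked list per retriever (input order is preserved).
        result_lists = list(pool.map(lambda retrieve: retrieve(query), retrievers))

    # Step 3: interleave round-robin so every retriever's top hits come first,
    # skipping any text that has already been emitted (exact-match deduplication).
    merged: List[str] = []
    seen = set()
    for same_rank in zip_longest(*result_lists):
        for text in same_rank:
            if text is None or text in seen:
                continue
            seen.add(text)
            merged.append(text)

    # Step 4: apply the optional cap on the merged list (assumed role of max_results).
    return merged if max_results is None else merged[:max_results]


def keyword_hits(query: str) -> List[str]:
    # Toy retriever with fixed results, for illustration only.
    return ["doc A", "doc B", "doc C"]


def vector_hits(query: str) -> List[str]:
    # Second toy retriever whose results overlap with the first.
    return ["doc B", "doc D"]


top = merge_results("machine learning", [keyword_hits, vector_hits], max_results=3)
print(top)  # ['doc A', 'doc B', 'doc D']
```

Interleaving by rank keeps any single retriever from dominating the head of the list. A production merger would more likely deduplicate on a document ID or a content hash than on raw text.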

This retriever is particularly useful when you:

- Need to combine results from different retrieval approaches
- Want comprehensive coverage across multiple data sources
- Are building systems that must deduplicate overlapping results
- Are implementing federated search across different backends

The implementation integrates with LangChain’s MergerRetriever while providing a consistent Haive configuration interface with flexible merging options.

Classes

MergerRetrieverConfig

Configuration for Merger retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.MergerRetrieverConfig.MergerRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for Merger retriever in the Haive framework.

This retriever combines and merges results from multiple retrievers to provide comprehensive and deduplicated search results.

retriever_type

The type of retriever (always MERGER).

Type:

RetrieverType

retrievers

List of retriever configurations to merge.

Type:

List[BaseRetrieverConfig]

max_results

Maximum number of results to return after merging.

Type:

int

Examples

>>> from haive.core.engine.retriever import MergerRetrieverConfig
>>> from haive.core.engine.retriever.providers.BM25RetrieverConfig import BM25RetrieverConfig
>>> from haive.core.engine.retriever.providers.VectorStoreRetrieverConfig import VectorStoreRetrieverConfig
>>>
>>> # Create individual retrievers
>>> # (`docs` and `vs_config` are assumed to be defined elsewhere)
>>> bm25_config = BM25RetrieverConfig(name="bm25", documents=docs, k=10)
>>> vector_config = VectorStoreRetrieverConfig(name="vector", vectorstore_config=vs_config, k=10)
>>>
>>> # Create merger retriever
>>> config = MergerRetrieverConfig(
...     name="merger_retriever",
...     retrievers=[bm25_config, vector_config],
...     max_results=15
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> # Store the hits under a new name so the input `docs` is not overwritten
>>> results = retriever.get_relevant_documents("machine learning algorithms")

get_input_fields()

Return input field definitions for Merger retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for Merger retriever.

Return type:

dict[str, tuple[type, Any]]

instantiate()

Create a Merger retriever from this configuration.

Returns:

Instantiated retriever ready for merging multiple retrieval results.

Return type:

MergerRetriever

Raises: