haive.core.engine.retriever.providers.BM25RetrieverConfig
BM25 Retriever implementation for the Haive framework.
This module provides a configuration class for the BM25 (Best Matching 25) retriever, which uses the BM25 ranking function for text retrieval. BM25 is a probabilistic ranking function used by search engines to estimate how relevant each document is to a given search query.
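For reference, the textbook Okapi BM25 score of a document D for a query Q is given below. Here f(q_i, D) is the number of times term q_i occurs in D, |D| is the length of D in tokens, avgdl is the average document length in the collection, N is the number of documents, n(q_i) is the number of documents that contain q_i, and k_1 and b are tuning parameters. This module does not say which IDF variant its underlying library uses. Some implementations, such as Lucene, add 1 inside the logarithm so that the IDF never becomes negative.

```latex
\operatorname{score}(D, Q) = \sum_{q_i \in Q} \operatorname{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\operatorname{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```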
The BM25Retriever works by:
1. Tokenizing and preprocessing documents and queries
2. Computing a BM25 score for each document-query pair
3. Ranking documents by their BM25 scores
4. Returning the top-k most relevant documents
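The four steps can be sketched in plain Python. This is an illustrative, self-contained implementation and not the code that LangChain or Haive actually runs. `tokenize` is a deliberately simple lowercase-and-split preprocessor, and `bm25_top_k` is a hypothetical helper. The IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)) so that very small corpora still rank sensibly.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Step 1: lowercase and split on non-alphanumeric characters (simple stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(corpus: List[str], query: str, k: int = 2,
               k1: float = 1.5, b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k highest-scoring documents."""
    docs = [tokenize(d) for d in corpus]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scores: List[Tuple[int, float]] = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Step 2: non-negative IDF times the saturated, length-normalized TF.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append((idx, score))

    # Step 3: rank by score, highest first. Step 4: keep only the top k.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Take the three sentences from the class example below and call `bm25_top_k(corpus, "machine learning algorithms", k=2)`. The machine-learning sentence ranks first because it matches two query terms. The deep-learning sentence ranks second because it matches only "learning".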
This retriever is particularly useful for:
- Text-based document collections
- Precise keyword matching and term-frequency analysis
- Interpretable ranking scores
- Traditional information retrieval systems
- Hybrid approaches that combine keyword search with vector search
The implementation integrates with LangChain's BM25Retriever while providing a consistent Haive configuration interface.
Classes
BM25RetrieverConfig: Configuration for the BM25 retriever in the Haive framework.
Module Contents
- class haive.core.engine.retriever.providers.BM25RetrieverConfig.BM25RetrieverConfig
Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig
Configuration for the BM25 retriever in the Haive framework.
This retriever uses the BM25 ranking function to score documents based on term frequency, inverse document frequency, and document length normalization.
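The tuning parameters k1 and b act on the term-frequency factor of each matched term. The sketch below isolates that factor so you can see both effects. `tf_component` is a hypothetical helper written for illustration and is not part of Haive. The defaults k1=1.5 and b=0.75 are common choices (they match rank_bm25's `BM25Okapi` defaults), not values taken from this module.

```python
def tf_component(tf: int, doc_len: int, avg_doc_len: float,
                 k1: float = 1.5, b: float = 0.75) -> float:
    """Saturated, length-normalized term-frequency factor of one BM25 term.

    Hypothetical helper for illustration; not part of Haive.
    """
    # b blends between no length normalization (b=0) and full normalization (b=1).
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # As tf grows, this ratio approaches k1 + 1 instead of growing without bound.
    return tf * (k1 + 1) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Saturation: in a document of average length, each extra occurrence adds less.
    for tf in (1, 2, 5, 50):
        print(tf, round(tf_component(tf, doc_len=10, avg_doc_len=10.0), 3))
    # Length normalization: one match counts for less in a document twice as long...
    print(round(tf_component(1, doc_len=20, avg_doc_len=10.0), 3))
    # ...unless b=0, which switches length normalization off.
    print(round(tf_component(1, doc_len=20, avg_doc_len=10.0, b=0.0), 3))
```

A larger k1 makes repeated terms saturate more slowly. A larger b penalizes long documents more strongly.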
- retriever_type
The type of retriever (always BM25).
- Type:
- documents
Documents to index for retrieval.
- Type:
List[Document]
Examples
>>> from haive.core.engine.retriever import BM25RetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning is a subset of AI"),
...     Document(page_content="Deep learning uses neural networks"),
...     Document(page_content="Natural language processing handles text"),
... ]
>>>
>>> # Create the BM25 retriever config
>>> config = BM25RetrieverConfig(
...     name="bm25_retriever",
...     documents=docs,
...     k=2,
...     k1=1.5,  # Term-frequency saturation (higher = slower saturation)
...     b=0.8,  # Length normalization (0 = none, 1 = full)
... )
>>>
>>> # Instantiate and query the retriever (results kept separate from docs)
>>> retriever = config.instantiate()
>>> results = retriever.invoke("machine learning algorithms")
- instantiate()
Create a BM25 retriever from this configuration.
- Returns:
Instantiated retriever ready for text ranking.
- Return type:
BM25Retriever
- Raises:
ImportError: If required packages are not available.
ValueError: If the documents list is empty.
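The Raises entries above describe two failure modes. Below is a minimal sketch of how such an instantiate step could be structured. It assumes the step delegates to `langchain_community.retrievers.BM25Retriever.from_documents`, which exists in LangChain, accepts `bm25_params` and `k`, and imports `rank_bm25` internally. The function name `build_bm25_retriever` and its parameters are hypothetical and are not Haive's actual code.

```python
from typing import Any, Dict, List, Optional


def build_bm25_retriever(
    documents: List[Any],
    k: int = 4,
    k1: Optional[float] = None,
    b: Optional[float] = None,
) -> Any:
    """Hypothetical stand-in for BM25RetrieverConfig.instantiate()."""
    # Fail fast: BM25 statistics (IDF, average length) need at least one document.
    if not documents:
        raise ValueError("BM25 retriever requires a non-empty documents list")

    try:
        # Assumed backend: LangChain's community BM25 retriever.
        from langchain_community.retrievers import BM25Retriever
    except ImportError as exc:
        raise ImportError(
            "BM25 retrieval requires 'langchain-community' and 'rank_bm25'"
        ) from exc

    # Forward only the tuning parameters that were explicitly set.
    bm25_params: Dict[str, float] = {}
    if k1 is not None:
        bm25_params["k1"] = k1
    if b is not None:
        bm25_params["b"] = b

    # from_documents builds the BM25 index. It imports rank_bm25 internally,
    # so a missing rank_bm25 package also surfaces here as an ImportError.
    return BM25Retriever.from_documents(documents, bm25_params=bm25_params, k=k)
```

The design checks for an empty list before any import. That way, a configuration error shows up as a ValueError even when the optional dependencies are not installed.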