haive.core.engine.retriever.providers.TFIDFRetrieverConfig
TF-IDF Retriever implementation for the Haive framework.
This module provides a configuration class for the TF-IDF (Term Frequency-Inverse Document Frequency) retriever, which uses classical TF-IDF scoring for document retrieval. TF-IDF is a numerical statistic that reflects how important a word is to a document within a collection of documents.
The TFIDFRetriever works by:
1. Computing term frequency (TF) for each term in each document
2. Computing inverse document frequency (IDF) for each term across the corpus
3. Calculating TF-IDF scores as the product of TF and IDF
4. Ranking documents by their TF-IDF similarity to the query
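The four steps above can be sketched in plain Python. This is an illustration only, not the Haive or LangChain implementation: the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity for step 4, and the helper names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`) are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def build_idf(corpus_tokens):
    # Step 2: IDF per term, using the smoothed form log((1 + N) / (1 + df)) + 1.
    n_docs = len(corpus_tokens)
    doc_freq = Counter(term for tokens in corpus_tokens for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Steps 1 and 3: raw term counts multiplied by IDF.
    # Terms that never occurred in the corpus are ignored.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents, k=2):
    # Step 4: score every document against the query and keep the top k.
    corpus_tokens = [tokenize(d) for d in documents]
    idf = build_idf(corpus_tokens)
    doc_vectors = [tfidf_vector(t, idf) for t in corpus_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vector, v), i) for i, v in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(documents[i], score) for score, i in scored[:k]]


documents = [
    "Machine learning algorithms analyze data",
    "Deep learning networks process information",
    "Natural language models understand text",
]
top = rank("machine learning data analysis", documents, k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

Because every score is built from per-term weights, a ranking can be explained by inspecting which query terms overlap with each document, which is the interpretability that term-based retrieval offers.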
This retriever is particularly useful for:
- Working with text-based document collections
- Applying classical information retrieval approaches
- Producing interpretable term-based rankings
- Building baseline retrieval systems
- Comparing against modern neural approaches
The implementation integrates with LangChain's TFIDFRetriever while providing a consistent Haive configuration interface.
Classes
TFIDFRetrieverConfig | Configuration for TF-IDF retriever in the Haive framework.
Module Contents
- class haive.core.engine.retriever.providers.TFIDFRetrieverConfig.TFIDFRetrieverConfig
Bases:
haive.core.engine.retriever.retriever.BaseRetrieverConfig
Configuration for TF-IDF retriever in the Haive framework.
This retriever uses Term Frequency-Inverse Document Frequency scoring to rank documents based on the importance of query terms in the document collection.
- retriever_type
The type of retriever (always TFIDF).
- Type:
- documents
Documents to index for retrieval.
- Type:
List[Document]
- tfidf_params
Additional parameters for TF-IDF computation.
- Type:
Optional[Dict]
Examples
>>> from haive.core.engine.retriever import TFIDFRetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning algorithms analyze data"),
...     Document(page_content="Deep learning networks process information"),
...     Document(page_content="Natural language models understand text")
... ]
>>>
>>> # Create the TF-IDF retriever config
>>> config = TFIDFRetrieverConfig(
...     name="tfidf_retriever",
...     documents=docs,
...     k=2,
...     tfidf_params={"max_features": 1000, "stop_words": "english"}
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.get_relevant_documents("machine learning data analysis")
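The `max_features` and `stop_words` keys in the example above match the names of scikit-learn `TfidfVectorizer` options, and the sketch below assumes they are forwarded to that vectorizer. It is a rough pure-Python illustration of what the two options do to the vocabulary, not the library code: `build_vocabulary` and the tiny `STOP_WORDS` set are hypothetical, and the alphabetical tie-breaking is a choice made here for determinism.

```python
from collections import Counter

# Tiny illustrative stop list (an assumption; real English stop lists are much longer).
STOP_WORDS = {"the", "a", "of", "and", "is"}


def build_vocabulary(texts, max_features=None, stop_words=None):
    # Count every token across the corpus, skipping stop words.
    stop_words = stop_words or set()
    counts = Counter(
        token
        for text in texts
        for token in text.lower().split()
        if token not in stop_words
    )
    # Keep only the max_features most frequent terms (ties broken alphabetically here).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if max_features is not None:
        ordered = ordered[:max_features]
    return sorted(term for term, _ in ordered)


texts = [
    "the analysis of data",
    "learning is the analysis of data and text",
]
full_vocab = build_vocabulary(texts)
trimmed_vocab = build_vocabulary(texts, max_features=2, stop_words=STOP_WORDS)
print(full_vocab)
print(trimmed_vocab)
```

Removing stop words keeps very common function words from influencing the ranking, while capping the vocabulary bounds the size of each TF-IDF vector at the cost of ignoring rarer terms.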
- instantiate()
Create a TF-IDF retriever from this configuration.
- Returns:
Instantiated retriever ready for text ranking.
- Return type:
TFIDFRetriever
- Raises:
ImportError – If required packages are not available.
ValueError – If documents list is empty.