haive.core.engine.retriever.providers.TFIDFRetrieverConfig

TF-IDF Retriever implementation for the Haive framework.

This module provides a configuration class for the TF-IDF (Term Frequency-Inverse Document Frequency) retriever, which ranks documents using classical TF-IDF scoring. TF-IDF is a numerical statistic that reflects how important a word is to one document relative to the rest of the collection.

The TFIDFRetriever works by:

1. Computing term frequency (TF) for each term in each document
2. Computing inverse document frequency (IDF) for each term across the corpus
3. Calculating TF-IDF scores as the product of TF and IDF
4. Ranking documents by their TF-IDF similarity to the query
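The four steps above can be sketched in plain Python. This is a minimal illustration only, not the code Haive or LangChain runs: the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are all assumptions chosen for clarity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(corpus):
    """Return one {term: weight} dict per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    # Step 2: document frequency = number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): common terms still get a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # Step 1: raw term counts per document
        # Step 3: TF-IDF weight = normalized TF * IDF.
        vectors.append({t: (f / len(tokens)) * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, corpus, k=4):
    """Step 4: return the k documents most similar to the query."""
    vectors, idf = tfidf_vectors(corpus)
    q_tokens = tokenize(query)
    # Terms never seen in the corpus have no IDF and are ignored.
    q_tf = Counter(t for t in q_tokens if t in idf)
    q_vec = {t: (f / len(q_tokens)) * idf[t] for t, f in q_tf.items()}
    scored = sorted(
        enumerate(cosine(q_vec, v) for v in vectors),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(corpus[i], score) for i, score in scored[:k]]
```

Calling `rank("machine learning data analysis", corpus, k=2)` on a small list of strings returns `(document, score)` pairs in descending score order. A document that shares no terms with the query scores 0.0.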

This retriever is particularly useful when you:

- Work with text-based document collections
- Need a classical information retrieval approach
- Want interpretable, term-based ranking
- Are building a baseline retrieval system
- Want a reference point for comparison against modern neural approaches

The implementation integrates with LangChain’s TFIDFRetriever while providing a consistent Haive configuration interface.

Classes

TFIDFRetrieverConfig

Configuration for TF-IDF retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.TFIDFRetrieverConfig.TFIDFRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for TF-IDF retriever in the Haive framework.

This retriever uses Term Frequency-Inverse Document Frequency scoring to rank documents based on the importance of query terms in the document collection.

retriever_type

The type of retriever (always TFIDF).

Type: RetrieverType

documents

Documents to index for retrieval.

Type: List[Document]

k

Number of documents to retrieve (default: 4).

Type: int

tfidf_params

Additional parameters for TF-IDF computation.

Type: Optional[Dict]
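To give a feel for what options such as `max_features` and `stop_words` (both used in the example on this page) typically control, here is a rough pure-Python sketch. It assumes the dictionary is forwarded to a scikit-learn style vectorizer, as LangChain's TFIDFRetriever does; the helper `build_vocabulary`, the tiny `STOP_WORDS` set, and the alphabetical tie-breaking are hypothetical and only approximate that behaviour.

```python
from collections import Counter

# Tiny illustrative stop-word list; real "english" lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}


def build_vocabulary(corpus, max_features=None, stop_words=None):
    """Keep the most frequent corpus terms after dropping stop words."""
    stop_words = stop_words or set()
    counts = Counter(
        token
        for doc in corpus
        for token in doc.lower().split()
        if token not in stop_words
    )
    # Sort by descending corpus frequency, breaking ties alphabetically
    # (the tie-breaking rule is an assumption of this sketch).
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return ranked[:max_features] if max_features is not None else ranked
```

With `max_features=2` and the stop-word set above, only the two most frequent non-stop-word terms remain in the vocabulary, so every other term is ignored when scoring.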

Examples

>>> from haive.core.engine.retriever import TFIDFRetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning algorithms analyze data"),
...     Document(page_content="Deep learning networks process information"),
...     Document(page_content="Natural language models understand text")
... ]
>>>
>>> # Create the TF-IDF retriever config
>>> config = TFIDFRetrieverConfig(
...     name="tfidf_retriever",
...     documents=docs,
...     k=2,
...     tfidf_params={"max_features": 1000, "stop_words": "english"}
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> # Store the results under a new name so the indexed docs are not overwritten
>>> results = retriever.get_relevant_documents("machine learning data analysis")

get_input_fields()

Return input field definitions for TF-IDF retriever.

Return type: dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for TF-IDF retriever.

Return type: dict[str, tuple[type, Any]]

instantiate()

Create a TF-IDF retriever from this configuration.

Returns:

Instantiated retriever ready for text ranking.

Return type:

TFIDFRetriever

Raises:

ImportError – If required packages are not available.
ValueError – If documents list is empty.
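A caller may want to handle the two documented exceptions explicitly. The sketch below shows one possible pattern. The helper name `instantiate_or_none` is hypothetical and not part of Haive; it only assumes that the object passed in exposes an `instantiate()` method that raises `ImportError` or `ValueError` as described above.

```python
import logging

logger = logging.getLogger(__name__)


def instantiate_or_none(config):
    """Call config.instantiate() and return None on the documented failures."""
    try:
        return config.instantiate()
    except ImportError as exc:
        # Raised when the packages backing the retriever are not installed.
        logger.error("TF-IDF dependencies are missing: %s", exc)
    except ValueError as exc:
        # Raised when the configuration is unusable, e.g. no documents.
        logger.error("Invalid retriever configuration: %s", exc)
    return None
```

Returning `None` keeps the calling code simple but hides the cause of the failure. Re-raising after logging is an equally reasonable choice when the caller cannot continue without a retriever.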