haive.core.engine.retriever.providers.TFIDFRetrieverConfig

TF-IDF Retriever implementation for the Haive framework.

This module provides a configuration class for the TF-IDF (Term Frequency-Inverse Document Frequency) retriever, which uses classical TF-IDF scoring for document retrieval. TF-IDF is a numerical statistic that reflects how important a word is to a document in a collection of documents.

The TFIDFRetriever works by:

1. Computing term frequency (TF) for each term in each document
2. Computing inverse document frequency (IDF) for each term across the corpus
3. Calculating TF-IDF scores as the product of TF and IDF
4. Ranking documents by their TF-IDF similarity to the query
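The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code that Haive or LangChain runs: the tokenizer, the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1`, the use of cosine similarity, and the names `tokenize` and `tfidf_rank` are all choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(corpus: list[str], query: str, k: int = 4) -> list[tuple[int, float]]:
    """Return up to k (document index, cosine similarity) pairs, best first."""
    docs_tokens = [tokenize(doc) for doc in corpus]
    n_docs = len(docs_tokens)

    # Step 2: document frequency per term, turned into a smoothed IDF weight.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    def vectorize(tokens: list[str]) -> dict[str, float]:
        # Step 1: raw term counts (terms outside the corpus vocabulary are dropped).
        tf = Counter(token for token in tokens if token in idf)
        # Step 3: TF-IDF weight is the product of TF and IDF.
        return {term: count * idf[term] for term, count in tf.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
        norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Step 4: score every document against the query and keep the top k.
    query_vec = vectorize(tokenize(query))
    scores = [(i, cosine(query_vec, vectorize(tokens))) for i, tokens in enumerate(docs_tokens)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus = [
    "Machine learning algorithms analyze data",
    "Deep learning networks process information",
    "Natural language models understand text",
]
for index, score in tfidf_rank(corpus, "machine learning data analysis", k=2):
    print(index, round(score, 3))
```

With this corpus, the first document shares three terms with the query and ranks first, the second shares only "learning", and the third shares none and scores zero.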

This retriever is particularly useful when you:

- Work with text-based document collections
- Need classical information retrieval approaches
- Want interpretable term-based ranking
- Build baseline retrieval systems
- Compare against modern neural approaches

The implementation integrates with LangChain's TFIDFRetriever while providing a consistent Haive configuration interface.

Classes

TFIDFRetrieverConfig

Configuration for TF-IDF retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.TFIDFRetrieverConfig.TFIDFRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for TF-IDF retriever in the Haive framework.

This retriever uses Term Frequency-Inverse Document Frequency scoring to rank documents based on the importance of query terms in the document collection.

retriever_type

The type of retriever (always TFIDF).

Type:

RetrieverType

documents

Documents to index for retrieval.

Type:

List[Document]

k

Number of documents to retrieve (default: 4).

Type:

int

tfidf_params

Additional parameters for TF-IDF computation.

Type:

Optional[Dict]
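The accepted keys are not listed here. LangChain's TFIDFRetriever passes its `tfidf_params` to scikit-learn's `TfidfVectorizer`, so `max_features` and `stop_words` are presumably vectorizer options; that Haive forwards them unchanged is an assumption. The standard-library sketch below only approximates what those two options do to the vocabulary. The helper `build_vocabulary` and the tiny `STOP_WORDS` set are hypothetical, and scikit-learn's own tokenization and tie-breaking may differ.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Tiny illustrative stop-word list; real vectorizers ship far larger ones.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def build_vocabulary(
    texts: Iterable[str],
    stop_words: frozenset = frozenset(),
    max_features: Optional[int] = None,
) -> list[str]:
    """Return the sorted vocabulary after stop-word removal and a size cap."""
    counts = Counter(
        token
        for text in texts
        for token in re.findall(r"[a-z0-9]+", text.lower())
        if token not in stop_words
    )
    # Keep the most frequent terms; ties are broken alphabetically here
    # purely to make the sketch deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = ranked if max_features is None else ranked[:max_features]
    return sorted(term for term, _ in kept)


texts = ["the cat and the dog", "the dog chased the ball"]
print(build_vocabulary(texts))
print(build_vocabulary(texts, stop_words=STOP_WORDS))
print(build_vocabulary(texts, stop_words=STOP_WORDS, max_features=2))
```

A smaller vocabulary means fewer dimensions in every TF-IDF vector, which reduces memory use but can discard rare, highly discriminative terms.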

Examples

>>> from haive.core.engine.retriever import TFIDFRetrieverConfig
>>> from langchain_core.documents import Document
>>>
>>> # Create documents
>>> docs = [
...     Document(page_content="Machine learning algorithms analyze data"),
...     Document(page_content="Deep learning networks process information"),
...     Document(page_content="Natural language models understand text")
... ]
>>>
>>> # Create the TF-IDF retriever config
>>> config = TFIDFRetrieverConfig(
...     name="tfidf_retriever",
...     documents=docs,
...     k=2,
...     tfidf_params={"max_features": 1000, "stop_words": "english"}
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> results = retriever.get_relevant_documents("machine learning data analysis")
get_input_fields()

Return input field definitions for TF-IDF retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for TF-IDF retriever.

Return type:

dict[str, tuple[type, Any]]

instantiate()

Create a TF-IDF retriever from this configuration.

Returns:

Instantiated retriever ready for text ranking.

Return type:

TFIDFRetriever

Raises:
  • ImportError – If required packages are not available.

  • ValueError – If documents list is empty.
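A caller may want to handle the two documented exceptions separately. The sketch below shows one possible pattern. `safe_instantiate` and `EmptyConfigStandIn` are hypothetical names introduced for illustration; the stand-in class only mimics a config whose `instantiate()` raises `ValueError`, so the example runs without Haive installed.

```python
from typing import Any, Optional


def safe_instantiate(config: Any) -> tuple[Optional[Any], Optional[str]]:
    """Call config.instantiate() and turn the documented errors into messages.

    Returns (retriever, None) on success or (None, message) on failure.
    """
    try:
        return config.instantiate(), None
    except ImportError as exc:
        # Documented case: a required package is not installed.
        return None, f"missing dependency: {exc}"
    except ValueError as exc:
        # Documented case: the documents list is empty.
        return None, f"invalid configuration: {exc}"


class EmptyConfigStandIn:
    """Hypothetical stand-in that mimics a config built with documents=[]."""

    def instantiate(self) -> Any:
        raise ValueError("documents list is empty")


retriever, error = safe_instantiate(EmptyConfigStandIn())
print(error)  # invalid configuration: documents list is empty
```

Any object with an `instantiate()` method can be passed in, so the same wrapper should work with a real TFIDFRetrieverConfig, assuming it raises the exceptions listed above.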