haive.core.engine.retriever.providers.SelfQueryRetrieverConfig

Self-Query Retriever implementation for the Haive framework.

This module provides a configuration class for the Self-Query retriever, which enables natural language queries to be converted into structured queries that can filter on document metadata and perform semantic similarity search.

The SelfQueryRetriever works by:

1. Using an LLM to parse natural language queries into structured components
2. Extracting filter conditions for metadata (date, category, etc.)
3. Extracting the semantic search query component
4. Performing both metadata filtering and vector similarity search
5. Returning documents that match both criteria
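The steps above can be sketched in plain Python. This is a minimal illustration, not the Haive or LangChain implementation: `StructuredQuery`, `Doc`, `token_overlap`, and `self_query_search` are hypothetical names, the LLM parsing step is replaced by a hand-written structured query, and token overlap stands in for vector similarity.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredQuery:
    """Hypothetical container for what the LLM parsing step would produce."""
    query: str                                   # semantic search text
    filters: dict = field(default_factory=dict)  # exact-match metadata conditions


@dataclass
class Doc:
    text: str
    metadata: dict


def token_overlap(a: str, b: str) -> int:
    """Crude stand-in for vector similarity: count shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def self_query_search(structured: StructuredQuery, docs: list[Doc], k: int = 5) -> list[Doc]:
    # Step 4a: keep only documents whose metadata satisfies every filter.
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in structured.filters.items())
    ]
    # Step 4b: rank the survivors by similarity to the semantic query.
    ranked = sorted(
        candidates,
        key=lambda d: token_overlap(structured.query, d.text),
        reverse=True,
    )
    # Step 5: return the top k matches.
    return ranked[:k]


docs = [
    Doc("A heist crew plans an explosive bank robbery", {"genre": "action"}),
    Doc("Two friends fall in love during a bank holiday", {"genre": "romance"}),
    Doc("A retired agent returns for one last mission", {"genre": "action"}),
]

# Steps 1-3, written by hand here for "action movies about a bank robbery".
structured = StructuredQuery(query="bank robbery", filters={"genre": "action"})
results = self_query_search(structured, docs, k=2)
```

The romance document mentions a bank but is excluded by the metadata filter before any similarity ranking takes place, which is the behaviour that distinguishes self-querying from plain semantic search.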

This retriever is particularly useful when:

  • Documents have rich metadata that should be queryable
  • Semantic search must be combined with structured filtering
  • Users want to query both content and attributes naturally
  • The system needs precise control over search scope

The implementation integrates with LangChain's SelfQueryRetriever while providing a consistent Haive configuration interface with metadata schema support.

Classes

SelfQueryRetrieverConfig

Configuration for Self-Query retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.SelfQueryRetrieverConfig.SelfQueryRetrieverConfig

Bases: haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for Self-Query retriever in the Haive framework.

This retriever converts natural language queries into structured queries that can filter on document metadata and perform semantic similarity search.

retriever_type

The type of retriever (always SELF_QUERY).

Type:

RetrieverType

vectorstore_config

Vector store for semantic search.

Type:

VectorStoreConfig

llm_config

LLM for parsing natural language queries.

Type:

AugLLMConfig

document_content_description

Description of document content for LLM.

Type:

str

metadata_field_info

Metadata fields that can be filtered on.

Type:

List[Dict]

k

Number of documents to return.

Type:

int

Examples

>>> from haive.core.engine.retriever import SelfQueryRetrieverConfig
>>> from haive.core.engine.vectorstore.providers.ChromaVectorStoreConfig import ChromaVectorStoreConfig
>>> from haive.core.engine.aug_llm import AugLLMConfig
>>>
>>> # Create vector store and LLM configs
>>> vs_config = ChromaVectorStoreConfig(name="docs", collection_name="documents")
>>> llm_config = AugLLMConfig(model_name="gpt-3.5-turbo", provider="openai")
>>>
>>> # Define metadata schema
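>>> metadata_fields = [
...     {
...         "name": "genre",
...         "description": "The genre of the movie",
...         "type": "string"
...     },
...     {
...         "name": "year",
...         "description": "The year the movie was released",
...         "type": "integer"
...     }
... ]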
>>>
>>> # Create self-query retriever
>>> config = SelfQueryRetrieverConfig(
...     name="self_query_retriever",
...     vectorstore_config=vs_config,
...     llm_config=llm_config,
...     document_content_description="Movie reviews and summaries",
...     metadata_field_info=metadata_fields,
...     k=5
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.get_relevant_documents("action movies from the 1990s")

get_input_fields()

Return input field definitions for Self-Query retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for Self-Query retriever.

Return type:

dict[str, tuple[type, Any]]
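The return type `dict[str, tuple[type, Any]]` maps each field name to a `(type, default)` pair. The sketch below shows what such mappings could look like. The field names `query` and `documents`, the defaults, and the `describe` helper are assumptions for illustration, as is the use of `...` (Ellipsis) to mark a required field, a convention borrowed from pydantic; the actual definitions returned by this class may differ.

```python
from typing import Any

# Hypothetical field definitions; the real names and defaults may differ.
input_fields: dict[str, tuple[type, Any]] = {
    "query": (str, ...),      # Ellipsis as the default marks the field as required
}
output_fields: dict[str, tuple[type, Any]] = {
    "documents": (list, []),  # optional, defaults to an empty list
}


def describe(fields: dict[str, tuple[type, Any]]) -> dict[str, str]:
    """Summarise each field as 'required' or 'optional' based on its default."""
    return {
        name: "required" if default is ... else "optional"
        for name, (_type, default) in fields.items()
    }
```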

instantiate()

Create a Self-Query retriever from this configuration.

Returns:

Instantiated retriever ready for self-query retrieval.

Return type:

SelfQueryRetriever

Raises:
  • ImportError – If required packages are not available.

  • ValueError – If configuration is invalid.
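Because `instantiate()` can raise either exception, callers may want to separate a missing dependency from a bad configuration. The following is a minimal sketch of that pattern: `try_instantiate` and `BrokenConfig` are hypothetical names, and the stub class only imitates a misconfigured `SelfQueryRetrieverConfig` so the example runs without Haive installed.

```python
def try_instantiate(config):
    """Return (retriever, error_message); exactly one of the two is None."""
    try:
        return config.instantiate(), None
    except ImportError as exc:
        # A required package is not installed.
        return None, f"missing dependency: {exc}"
    except ValueError as exc:
        # The configuration itself was rejected.
        return None, f"invalid configuration: {exc}"


class BrokenConfig:
    """Hypothetical stub standing in for a misconfigured retriever config."""

    def instantiate(self):
        raise ValueError("metadata_field_info is empty")


retriever, error = try_instantiate(BrokenConfig())
```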

classmethod validate_metadata_field_info(v)

Validate metadata field info structure.
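The reference does not state which checks this validator performs. As a rough sketch only, a structural check might confirm that each entry is a dict carrying the `name`, `description`, and `type` keys used in the class example; `check_metadata_field_info` and `REQUIRED_KEYS` are hypothetical names, and the real validator may be stricter or more lenient.

```python
# Keys assumed from the metadata schema shown in the class example.
REQUIRED_KEYS = {"name", "description", "type"}


def check_metadata_field_info(fields):
    """Hypothetical stand-in for validate_metadata_field_info."""
    if not isinstance(fields, list):
        raise ValueError("metadata_field_info must be a list of dicts")
    for index, entry in enumerate(fields):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a dict")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    # Return the value unchanged, as validators conventionally do.
    return fields
```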