haive.core.engine.retriever.providers.WebResearchRetrieverConfig

Web Research Retriever implementation for the Haive framework.

This module provides a configuration class for the Web Research retriever, which performs advanced web research by combining web search with document processing and retrieval. It searches the web, retrieves content from URLs, processes the content, and provides comprehensive research results.

The WebResearchRetriever works by:

1. Using a web search API to find relevant URLs
2. Retrieving and processing content from those URLs
3. Chunking and embedding the retrieved content
4. Providing retrieval over the processed web content
5. Combining search results with retrieved document chunks
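The first four steps above can be sketched in plain Python. This is a minimal illustration of the flow only, not Haive's or LangChain's implementation: `fake_search`, `fake_fetch`, the `PAGES` data, and the word-overlap scoring are all hypothetical stand-ins for a real search API, a page loader, and an embedding-backed vector store.

```python
from collections import Counter

# Hypothetical page contents standing in for the live web.
PAGES = {
    "https://example.com/a": "Vector stores index embeddings. Retrievers query vector stores.",
    "https://example.com/b": "Web search returns URLs. Pages are fetched and split into chunks.",
    "https://example.com/c": "Unrelated text about cooking pasta.",
}


def fake_search(query: str, num_results: int) -> list[str]:
    """Stand-in for a web search API: return up to num_results URLs."""
    return list(PAGES)[:num_results]


def fake_fetch(url: str) -> str:
    """Stand-in for downloading and cleaning a page."""
    return PAGES[url]


def split_sentences(text: str) -> list[str]:
    """Naive chunking: one chunk per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def score(query: str, chunk: str) -> int:
    """Count shared words; a real system would compare embeddings instead."""
    query_words = Counter(query.lower().split())
    chunk_words = Counter(chunk.lower().split())
    return sum((query_words & chunk_words).values())


def web_research(query: str, num_search_results: int = 3,
                 num_web_pages: int = 2, k: int = 2) -> list[str]:
    urls = fake_search(query, num_search_results)             # step 1: search
    pages = [fake_fetch(u) for u in urls[:num_web_pages]]     # step 2: fetch
    chunks = [c for p in pages for c in split_sentences(p)]   # step 3: chunk (no embedding here)
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)  # step 4: retrieve
    return ranked[:k]


results = web_research("vector stores")
```

Here `num_search_results` and `num_web_pages` play the same roles as the configuration attributes of the same names: the first caps how many URLs come back from search, the second caps how many of those are actually fetched.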

This retriever is particularly useful for:

- Obtaining up-to-date information from the web
- Building research applications that require current data
- Combining web search with document retrieval
- Creating systems that need comprehensive web coverage
- Building fact-checking or research assistant applications

The implementation integrates with LangChain's WebResearchRetriever while providing a consistent Haive configuration interface with secure API key management.

Classes

WebResearchRetrieverConfig

Configuration for Web Research retriever in the Haive framework.

Module Contents

class haive.core.engine.retriever.providers.WebResearchRetrieverConfig.WebResearchRetrieverConfig

Bases: haive.core.common.mixins.secure_config.SecureConfigMixin, haive.core.engine.retriever.retriever.BaseRetrieverConfig

Configuration for Web Research retriever in the Haive framework.

This retriever performs comprehensive web research by searching the web, retrieving content, and providing retrieval capabilities over the collected data.

retriever_type

The type of retriever (always WEB_RESEARCH).

Type:

RetrieverType

vectorstore_config

Vector store for indexing web content.

Type:

VectorStoreConfig

llm_config

LLM for processing and summarization.

Type:

AugLLMConfig

api_key

API key for web search (auto-resolved).

Type:

Optional[SecretStr]

num_search_results

Number of web search results to process.

Type:

int

num_web_pages

Number of web pages to retrieve content from.

Type:

int

chunk_size

Size of text chunks for processing.

Type:

int

chunk_overlap

Overlap between text chunks.

Type:

int
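To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which the source does not state, and `chunk_text` is a hypothetical helper; the splitter actually used by the retriever may split on separators rather than at fixed offsets.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

A larger overlap keeps more context across chunk boundaries at the cost of storing and embedding more duplicated text.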

Examples

>>> from haive.core.engine.retriever import WebResearchRetrieverConfig
>>> from haive.core.engine.aug_llm import AugLLMConfig
>>> from haive.core.engine.vectorstore.providers.ChromaVectorStoreConfig import ChromaVectorStoreConfig
>>>
>>> # Configure components
>>> llm_config = AugLLMConfig(model_name="gpt-4", provider="openai")
>>> vectorstore_config = ChromaVectorStoreConfig(
...     name="web_research_store",
...     collection_name="web_content"
... )
>>>
>>> # Create the web research retriever config
>>> config = WebResearchRetrieverConfig(
...     name="web_research_retriever",
...     vectorstore_config=vectorstore_config,
...     llm_config=llm_config,
...     num_search_results=10,
...     num_web_pages=5
... )
>>>
>>> # Instantiate and use the retriever
>>> retriever = config.instantiate()
>>> docs = retriever.get_relevant_documents("latest AI research developments 2024")

get_input_fields()

Return input field definitions for Web Research retriever.

Return type:

dict[str, tuple[type, Any]]

get_output_fields()

Return output field definitions for Web Research retriever.

Return type:

dict[str, tuple[type, Any]]
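The return type `dict[str, tuple[type, Any]]` maps each field name to a `(type, default)` pair. The sketch below shows one way such a mapping could be consumed. The field names `query` and `k`, the `apply_fields` helper, and the use of `...` (Ellipsis) to mark a required field are assumptions for illustration; the source does not list the actual fields or defaults.

```python
from typing import Any

# Hypothetical field definitions in the documented shape.
input_fields: dict[str, tuple[type, Any]] = {
    "query": (str, ...),  # Ellipsis used here to mean "required" (assumed convention)
    "k": (int, 4),        # optional field with a default value
}


def apply_fields(fields: dict[str, tuple[type, Any]],
                 values: dict[str, Any]) -> dict[str, Any]:
    """Fill in defaults and type-check values against field definitions."""
    result = {}
    for name, (field_type, default) in fields.items():
        if name in values:
            value = values[name]
        elif default is ...:
            raise ValueError(f"missing required field: {name}")
        else:
            value = default
        if not isinstance(value, field_type):
            raise TypeError(f"{name} must be of type {field_type.__name__}")
        result[name] = value
    return result


resolved = apply_fields(input_fields, {"query": "latest AI research"})
# resolved == {"query": "latest AI research", "k": 4}
```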

instantiate()

Create a Web Research retriever from this configuration.

Returns:

Instantiated retriever ready for web research.

Return type:

WebResearchRetriever

Raises:
  • ImportError – If required packages are not available.

  • ValueError – If API key or configuration is invalid.
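
Callers may want to handle the two documented exceptions separately, since one points to a missing dependency and the other to a configuration problem. The sketch below is one possible pattern: `try_instantiate` and the `_Fake*` classes are hypothetical and exist only so the example runs without Haive installed. With a real WebResearchRetrieverConfig, only the `config.instantiate()` call would apply.

```python
from typing import Any, Optional


def try_instantiate(config: Any) -> Optional[Any]:
    """Call config.instantiate() and report the documented failure modes."""
    try:
        return config.instantiate()
    except ImportError as exc:
        # A required package is not installed.
        print(f"Missing dependency: {exc}")
    except ValueError as exc:
        # The API key or another configuration value is invalid.
        print(f"Invalid configuration: {exc}")
    return None


# Hypothetical stand-ins for a real configuration object.
class _FakeGoodConfig:
    def instantiate(self) -> str:
        return "retriever"


class _FakeBadKeyConfig:
    def instantiate(self) -> str:
        raise ValueError("API key not found")


ok = try_instantiate(_FakeGoodConfig())
failed = try_instantiate(_FakeBadKeyConfig())
```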