haive.core.engine.document.factory

Auto Loader Factory for Document Engine.

This module provides a factory interface that analyzes a path or URL and automatically selects the appropriate document source and loader.

Classes

AutoLoaderFactory

Factory for automatically creating document loaders.

Functions

analyze_source(path)

Analyze a source path and return detailed information.

create_document_loader(path[, strategy, ...])

Convenience function to create a document loader.

Module Contents

class haive.core.engine.document.factory.AutoLoaderFactory(credential_manager=None)

Factory for automatically creating document loaders.

Initialize the factory.

Parameters:

credential_manager (haive.core.engine.document.loaders.sources.implementation.CredentialManager | None) – Optional credential manager for authenticated sources

analyze_path(path)

Analyze a path or URL and return a structured description of it, or None if the analysis fails.

Parameters:

path (str) – Path or URL to analyze

Return type:

haive.core.engine.document.path_analysis.PathAnalysisResult | None

create_loader(path, strategy=None, options=None, preferences=None)

Create the appropriate document loader for any path or URL.

This factory function analyzes the given path to determine its nature (file, URL, database URI, etc.) and returns the appropriate loader instance.
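As a rough illustration of the kind of classification described above, the sketch below distinguishes local files, web URLs, and database URIs using only `urllib.parse`. The function name `classify_source` and the scheme sets are hypothetical and not part of haive's API; in haive the analysis is performed by `analyze_path`, which returns a `PathAnalysisResult`.

```python
from urllib.parse import urlparse

# Hypothetical scheme groupings; the real factory may recognize more or different ones.
WEB_SCHEMES = {"http", "https"}
DATABASE_SCHEMES = {"postgresql", "mysql", "sqlite", "mongodb"}


def classify_source(path: str) -> str:
    """Return 'url', 'database', or 'file' for a path string (illustrative only)."""
    scheme = urlparse(path).scheme.lower()
    # Strip a driver suffix such as "postgresql+psycopg2" -> "postgresql".
    base_scheme = scheme.split("+", 1)[0]
    if base_scheme in WEB_SCHEMES:
        return "url"
    if base_scheme in DATABASE_SCHEMES:
        return "database"
    # No scheme, "file://", or a Windows drive letter all fall through to "file".
    return "file"
```

Checking the URL scheme first keeps the rule cheap: no filesystem access is needed to decide which family of loaders applies.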

Parameters:
  • path (str) – File path, URL, or URI to load

  • strategy (str | None) – Optional specific strategy to use (e.g., ‘pdf_pymupdf’, ‘playwright’)

  • options (dict[str, Any] | None) – Optional loader-specific options

  • preferences (dict[str, Any] | None) – Optional preferences for loader selection

Returns:

A document loader instance appropriate for the given path, or None if no suitable loader could be created

Return type:

langchain_core.document_loaders.base.BaseLoader | None

Examples

>>> factory = AutoLoaderFactory()
>>>
>>> # Load a PDF file with the PyMuPDF strategy
>>> loader = factory.create_loader("path/to/document.pdf", strategy="pdf_pymupdf")
>>>
>>> # Load a webpage with JavaScript support
>>> loader = factory.create_loader("https://example.com", strategy="playwright")
>>>
>>> # Auto-select best loader for any source
>>> loader = factory.create_loader("path/to/document.docx")
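The examples above show two modes: an explicit `strategy` and automatic selection. A minimal sketch of how such a rule could work follows, assuming a hypothetical extension-to-strategy table. Only `pdf_pymupdf` and `playwright` come from this page; `docx_default` is an invented placeholder. Returning `None` mirrors the `BaseLoader | None` return type.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical defaults; "docx_default" is an invented placeholder name.
DEFAULT_STRATEGIES = {
    ".pdf": "pdf_pymupdf",
    ".docx": "docx_default",
}
KNOWN_STRATEGIES = set(DEFAULT_STRATEGIES.values()) | {"playwright"}


def select_strategy(path: str, strategy: Optional[str] = None) -> Optional[str]:
    """Pick a strategy name: an explicit choice wins, else fall back by extension."""
    if strategy is not None:
        # Reject unknown explicit strategies instead of silently ignoring them.
        return strategy if strategy in KNOWN_STRATEGIES else None
    suffix = PurePosixPath(path).suffix.lower()
    # None signals that no default strategy is registered for this extension.
    return DEFAULT_STRATEGIES.get(suffix)
```

Letting an explicit strategy take precedence means callers can override automatic selection without changing the default table.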

get_available_strategies()

Get list of available loader strategies.

Return type:

list[str]

get_supported_sources()

Get list of supported source types.

Return type:

list[haive.core.engine.document.loaders.sources.implementation.SourceType]
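`get_supported_sources` returns `SourceType` members and `get_available_strategies` returns strategy names. One way such lookups can be backed is a registry keyed by source type. The sketch below assumes a simplified, hypothetical `SourceType` enum and `LoaderRegistry` class; neither reflects haive's actual definitions.

```python
from collections import defaultdict
from enum import Enum


class SourceType(Enum):
    # Simplified stand-in for haive's SourceType; members are hypothetical.
    LOCAL_FILE = "local_file"
    WEB = "web"
    DATABASE = "database"


class LoaderRegistry:
    """Hypothetical registry mapping source types to strategy names."""

    def __init__(self) -> None:
        self._strategies: dict[SourceType, list[str]] = defaultdict(list)

    def register(self, source_type: SourceType, strategy: str) -> None:
        # Ignore duplicate registrations of the same strategy.
        if strategy not in self._strategies[source_type]:
            self._strategies[source_type].append(strategy)

    def get_supported_sources(self) -> list[SourceType]:
        # Report only source types with at least one registered strategy.
        return [s for s in SourceType if self._strategies.get(s)]

    def get_available_strategies(self) -> list[str]:
        # Flatten and sort so the listing is stable across calls.
        return sorted({name for names in self._strategies.values() for name in names})
```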

haive.core.engine.document.factory.analyze_source(path)

Analyze a source path and return detailed information.

Parameters:

path (str) – Path to analyze

Returns:

Dictionary with analysis results or None if analysis failed

Return type:

dict[str, Any] | None
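`analyze_source` returns a plain dictionary, or None on failure, while `AutoLoaderFactory.analyze_path` returns a `PathAnalysisResult`. One plausible mental model is a thin wrapper that converts a structured result to a dict and turns analysis errors into None. The sketch assumes a hypothetical `AnalysisResult` dataclass with invented fields; the keys haive actually returns may differ.

```python
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath
from typing import Any, Optional


@dataclass
class AnalysisResult:
    # Hypothetical fields for illustration; not haive's PathAnalysisResult.
    path: str
    is_url: bool
    extension: str


def analyze(path: str) -> AnalysisResult:
    if not path:
        raise ValueError("empty path")
    is_url = path.startswith(("http://", "https://"))
    extension = "" if is_url else PurePosixPath(path).suffix.lower()
    return AnalysisResult(path=path, is_url=is_url, extension=extension)


def analyze_source_sketch(path: str) -> Optional[dict[str, Any]]:
    """Return the analysis as a dict, or None if analysis fails."""
    try:
        return asdict(analyze(path))
    except ValueError:
        return None
```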

haive.core.engine.document.factory.create_document_loader(path, strategy=None, credential_manager=None, options=None, preferences=None)

Convenience function to create a document loader.

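Parameters:
  • path (str) – File path, URL, or URI to load

  • strategy (str | None) – Optional specific strategy to use

  • credential_manager (haive.core.engine.document.loaders.sources.implementation.CredentialManager | None) – Optional credential manager for authenticated sources

  • options (dict[str, Any] | None) – Optional loader-specific options

  • preferences (dict[str, Any] | None) – Optional preferences for loader selection

Returns: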

A document loader instance, or None if creation failed

Return type:

langchain_core.document_loaders.base.BaseLoader | None
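A convenience function like this typically spares the caller from constructing an `AutoLoaderFactory` by hand. A minimal sketch of that pattern follows. `StubFactory` and `TextLoader` are hypothetical stand-ins so the example runs without haive or LangChain, and catching `ValueError` to return None is an assumption based on the documented "None if creation failed" behavior.

```python
from typing import Any, Optional


class TextLoader:
    """Hypothetical stand-in for a LangChain BaseLoader subclass."""

    def __init__(self, path: str, **options: Any) -> None:
        self.path = path
        self.options = options


class StubFactory:
    """Hypothetical stand-in for AutoLoaderFactory."""

    def __init__(self, credential_manager: Any = None) -> None:
        self.credential_manager = credential_manager

    def create_loader(self, path, strategy=None, options=None, preferences=None):
        # This stub only knows how to handle .txt files.
        if not path.endswith(".txt"):
            raise ValueError(f"no loader for {path!r}")
        return TextLoader(path, **(options or {}))


def create_document_loader_sketch(path, strategy=None, credential_manager=None,
                                  options=None, preferences=None) -> Optional[TextLoader]:
    # Build a one-off factory, delegate to it, and report failure as None.
    factory = StubFactory(credential_manager=credential_manager)
    try:
        return factory.create_loader(path, strategy=strategy, options=options,
                                     preferences=preferences)
    except ValueError:
        return None
```

Reuse an `AutoLoaderFactory` instance directly when creating many loaders, since the convenience form builds a new factory on every call.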