haive.core.engine.document.loaders.registry

Document loader registry system.

This module provides a registry for document loaders, allowing them to be registered, looked up, and managed throughout the application.

Classes

DocumentLoaderRegistry

Registry for document loaders.

LoaderMetadata

Metadata for a document loader.

Functions

create_loader(loader_name, **kwargs)

Create a loader instance by name.

get_default_registry()

Get the default document loader registry.

get_loader(loader_name)

Get a loader by name from the default registry.

register_loader(source_type[, name, description, ...])

Decorator to register a document loader.

Module Contents

class haive.core.engine.document.loaders.registry.DocumentLoaderRegistry

Bases: haive.core.registry.base.AbstractRegistry[type[langchain_core.document_loaders.base.BaseLoader]]

Registry for document loaders.

This registry keeps track of document loader classes and their metadata, allowing for discovery and instantiation of loaders based on source types.

Initialize the registry with empty storage.

clear()

Clear all registrations.

Return type:

None

find_by_id(id)

Find a loader by name (used for compatibility with AbstractRegistry).

Parameters:

id (str) – Loader name

Returns:

Loader class if found, None otherwise

Return type:

type[langchain_core.document_loaders.base.BaseLoader] | None

find_by_name(name)

Find a loader by name.

Parameters:

name (str) – Loader name

Returns:

Loader class if found, None otherwise

Return type:

type[langchain_core.document_loaders.base.BaseLoader] | None

find_loader_for_file(file_path)

Find loaders that can handle a file, matched by its file extension.

Parameters:

file_path (str) – Path to the file

Returns:

List of loader classes that can handle this file

Return type:

list[type[langchain_core.document_loaders.base.BaseLoader]]
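
The lookup described above can be sketched in plain Python. This is a minimal illustration, not the haive implementation: TextLoader, MarkdownLoader, and the _EXTENSIONS table are hypothetical stand-ins, and matching on the lower-cased suffix is an assumption.

```python
from pathlib import Path


# Hypothetical stand-in loader classes; they are not part of haive.
class TextLoader:
    pass


class MarkdownLoader:
    pass


# Assumed shape: loader name -> (loader class, file extensions it declared).
_EXTENSIONS = {
    "text": (TextLoader, [".txt", ".md"]),
    "markdown": (MarkdownLoader, [".md"]),
}


def find_loader_for_file(file_path: str) -> list:
    """Return every loader class whose declared extensions include the file's suffix."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(file_path).suffix.lower()
    return [cls for cls, exts in _EXTENSIONS.values() if suffix in exts]
```

Because the result is a list, more than one loader may claim the same extension, and the caller decides which one to use.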

find_loader_for_url(url)

Find loaders whose URL patterns match a given URL.

Parameters:

url (str) – URL to handle

Returns:

List of loader classes that can handle this URL

Return type:

list[type[langchain_core.document_loaders.base.BaseLoader]]
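
A rough sketch of URL matching follows. It assumes the patterns are regular expressions tested with re.search, which this page does not confirm. WebPageLoader, GitHubLoader, and _URL_PATTERNS are hypothetical stand-ins.

```python
import re


# Hypothetical stand-in loader classes; they are not part of haive.
class WebPageLoader:
    pass


class GitHubLoader:
    pass


# Assumed shape: loader name -> (loader class, regex patterns it declared).
_URL_PATTERNS = {
    "web": (WebPageLoader, [r"^https?://"]),
    "github": (GitHubLoader, [r"^https?://github\.com/"]),
}


def find_loader_for_url(url: str) -> list:
    """Return every loader class with at least one pattern matching the URL."""
    return [
        cls
        for cls, patterns in _URL_PATTERNS.values()
        if any(re.search(pattern, url) for pattern in patterns)
    ]
```

A general pattern and a more specific one can both match the same URL, so the caller may need its own rule for choosing among the returned classes.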

get(item_type, name)

Get a loader by source type and name.

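Parameters:
  • item_type (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – Source type

  • name (str) – Loader name

Returns: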

Loader class if found, None otherwise

Return type:

type[langchain_core.document_loaders.base.BaseLoader] | None

get_all(item_type)

Get all loaders for a specific source type.

Parameters:

item_type (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – Source type

Returns:

Dictionary mapping loader names to loader classes

Return type:

dict[str, type[langchain_core.document_loaders.base.BaseLoader]]

get_all_metadata()

Get metadata for all registered loaders.

Returns:

Dictionary mapping loader names to metadata

Return type:

dict[str, LoaderMetadata]

classmethod get_instance()

Get the singleton instance of the registry.

Return type:

DocumentLoaderRegistry
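
A singleton accessor of this kind is commonly written as a class method that creates the instance on first use. The sketch below shows that pattern with a hypothetical SketchRegistry class. How DocumentLoaderRegistry actually stores its instance is not shown on this page.

```python
from typing import Optional


class SketchRegistry:
    """Hypothetical stand-in illustrating a lazily created singleton."""

    _instance: Optional["SketchRegistry"] = None

    def __init__(self) -> None:
        # Start with empty storage, as the registry description states.
        self.loaders: dict = {}

    @classmethod
    def get_instance(cls) -> "SketchRegistry":
        # Create the shared instance on first access, then reuse it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


first = SketchRegistry.get_instance()
second = SketchRegistry.get_instance()
```

Both calls return the same object, so a registration made through one reference is visible through the other.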

get_metadata(name)

Get metadata for a specific loader.

Parameters:

name (str) – Loader name

Returns:

Loader metadata if found, None otherwise

Return type:

LoaderMetadata | None

list(item_type)

List all loader names for a specific source type.

Parameters:

item_type (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – Source type

Returns:

List of loader names

Return type:

list[str]

register(loader_class, metadata)

Register a document loader with metadata.

Parameters:
  • loader_class (type[langchain_core.document_loaders.base.BaseLoader]) – Loader class to register

  • metadata (LoaderMetadata) – Metadata for the loader

Returns:

The registered loader class

Return type:

type[langchain_core.document_loaders.base.BaseLoader]
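
To illustrate how registration and lookup by source type and name might fit together, here is a minimal self-contained sketch. SketchMetadata and SketchRegistry are hypothetical stand-ins. The fields of the real LoaderMetadata are not listed on this page, so the field names below simply mirror the register_loader arguments, and the nested-dictionary storage is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchMetadata:
    # Assumed fields, borrowed from the register_loader arguments.
    name: str
    source_type: str
    file_extensions: list = field(default_factory=list)


class SketchRegistry:
    """Hypothetical stand-in; not the haive DocumentLoaderRegistry."""

    def __init__(self) -> None:
        self._loaders: dict = {}   # source type -> {loader name -> loader class}
        self._metadata: dict = {}  # loader name -> metadata

    def register(self, loader_class: type, metadata: SketchMetadata) -> type:
        self._loaders.setdefault(metadata.source_type, {})[metadata.name] = loader_class
        self._metadata[metadata.name] = metadata
        return loader_class  # returning the class is what the documented signature states

    def get(self, item_type: str, name: str) -> Optional[type]:
        return self._loaders.get(item_type, {}).get(name)

    def get_metadata(self, name: str) -> Optional[SketchMetadata]:
        return self._metadata.get(name)


class PdfLoader:  # hypothetical loader class
    pass


registry = SketchRegistry()
pdf_meta = SketchMetadata(name="pdf", source_type="file", file_extensions=[".pdf"])
registered = registry.register(PdfLoader, pdf_meta)
```

Returning the class unchanged is what lets a decorator build on register without altering the decorated class.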

class haive.core.engine.document.loaders.registry.LoaderMetadata(/, **data)

Bases: pydantic.BaseModel

Metadata for a document loader.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

haive.core.engine.document.loaders.registry.create_loader(loader_name, **kwargs)

Create a loader instance by name.

Parameters:

loader_name (str)

Return type:

langchain_core.document_loaders.base.BaseLoader | None
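
The return type suggests that an unknown name yields None rather than raising an exception. Assuming that, and assuming the keyword arguments are forwarded to the loader's constructor, the behavior could look like the sketch below. CsvLoader and _LOADERS are hypothetical stand-ins.

```python
from typing import Any, Optional


class CsvLoader:
    """Hypothetical loader class used only for this sketch."""

    def __init__(self, file_path: str, delimiter: str = ",") -> None:
        self.file_path = file_path
        self.delimiter = delimiter


# Assumed lookup table standing in for the default registry.
_LOADERS = {"csv": CsvLoader}


def create_loader(loader_name: str, **kwargs: Any) -> Optional[object]:
    """Look up a loader class by name and instantiate it with kwargs."""
    loader_class = _LOADERS.get(loader_name)
    if loader_class is None:
        # Assumption: an unknown name returns None instead of raising.
        return None
    return loader_class(**kwargs)


loader = create_loader("csv", file_path="data.csv", delimiter=";")
missing = create_loader("does-not-exist")
```

Callers should check for None before using the result, since a misspelled loader name would otherwise surface later as an AttributeError.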

haive.core.engine.document.loaders.registry.get_default_registry()

Get the default document loader registry.

Return type:

DocumentLoaderRegistry

haive.core.engine.document.loaders.registry.get_loader(loader_name)

Get a loader by name from the default registry.

Parameters:

loader_name (str)

Return type:

type[langchain_core.document_loaders.base.BaseLoader] | None

haive.core.engine.document.loaders.registry.register_loader(source_type, name=None, description=None, requires_async=False, file_extensions=None, url_patterns=None, config_schema=None)

Decorator to register a document loader.

Parameters:
  • source_type (haive.core.engine.document.loaders.sources.source_types.SourceCategory) – Type of source this loader handles

  • name (str | None) – Optional custom name for the loader

  • description (str | None) – Optional description of the loader

  • requires_async (bool) – Whether this loader requires async operations

  • file_extensions (list[str] | None) – List of file extensions this loader can handle

  • url_patterns (list[str] | None) – List of URL patterns this loader can handle

  • config_schema (type[pydantic.BaseModel] | None) – Optional Pydantic model for configuration

Returns:

Decorator function

Return type:

collections.abc.Callable[[type[langchain_core.document_loaders.base.BaseLoader]], type[langchain_core.document_loaders.base.BaseLoader]]