haive.core.engine.document.loaders
Haive Document Loaders - Ultimate Auto-Loading System.
This module provides a document loading system built on the 230+ document loaders in langchain_community. It can detect the source type, choose and configure a matching loader, and load documents from local files, web pages, databases, cloud storage, and many other kinds of sources.
🚀 Features:
- Auto-Detection: automatically detects the source type from paths and URLs
- 230+ Loaders: complete langchain_community loader support
- Smart Registry: selects a loader based on configurable preferences
- Bulk Loading: concurrent processing with progress tracking
- Error Handling: built-in retry logic and graceful error handling
- Async Support: full async/await support for high-performance scenarios
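The Error Handling feature mentions built-in retry logic, but the retry policy itself is not documented here. The sketch below shows one common approach, retry with exponential backoff, assuming transient failures surface as `OSError`. `with_retries` is a hypothetical helper for illustration, not part of haive.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Errors outside `retry_on` (for example a `ValueError` from an unsupported format) propagate immediately, since retrying them would not help.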
📁 Supported Sources:
- Local Files: PDF, DOCX, CSV, JSON, code files, archives, etc.
- Web Sources: websites, APIs, documentation sites, social media
- Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, etc.
- Cloud Storage: S3, GCS, Azure Blob, Google Drive, Dropbox, etc.
- Business Platforms: Salesforce, HubSpot, Zendesk, Jira, etc.
- Communication: Slack, Discord, Teams, email, forums, etc.
- Specialized: government, healthcare, education, finance, etc.
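Auto-detection has to map a source string onto one of the categories above before a loader can be chosen. A minimal sketch, assuming classification by URL scheme alone; the scheme table and `classify_source` are illustrative and not haive's actual detector, whose rules are not shown on this page.

```python
from urllib.parse import urlparse

# Illustrative scheme -> category table; the real mapping may differ.
SCHEME_CATEGORIES = {
    "http": "web",
    "https": "web",
    "s3": "cloud_storage",
    "gs": "cloud_storage",
    "postgres": "database",
    "postgresql": "database",
    "mysql": "database",
    "mongodb": "database",
    "redis": "database",
}


def classify_source(source: str) -> str:
    """Return a coarse category for a path or URL (hypothetical helper)."""
    scheme = urlparse(source).scheme  # urlparse lowercases the scheme
    if scheme in SCHEME_CATEGORIES:
        return SCHEME_CATEGORIES[scheme]
    # No scheme, an explicit file:// URL, or a Windows drive letter ("C:\...")
    # all indicate a local path.
    if scheme in ("", "file") or len(scheme) == 1:
        return "local_file"
    return "unknown"
```

For example, `classify_source("s3://bucket/docs/")` returns `"cloud_storage"`, while a bare `"document.pdf"` has no scheme and is treated as a local file.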
- 💡 Quick Start:
from haive.core.engine.document.loaders import AutoLoader
# Auto-loader: detects the source type and picks a matching loader
loader = AutoLoader()

# Load from a single source
docs = loader.load("document.pdf")             # Local file
docs = loader.load("https://docs.site.com")    # Website
docs = loader.load("s3://bucket/docs/")        # Cloud storage
docs = loader.load("postgres://db/table")      # Database

# Load documents from multiple sources (standard langchain method)
docs = loader.load_documents([
    "file1.pdf", "file2.txt", "https://site.com"
])

# Bulk loading with detailed results
sources = ["file1.pdf", "https://site.com", "s3://bucket/"]
result = loader.load_bulk(sources)

# Load everything from a source
docs = loader.load_all("/documents/")          # Entire directory
docs = loader.load_all("https://wiki.com")     # Entire website
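`load_bulk` is described as returning detailed results rather than a flat list. A minimal sketch of that pattern, assuming a thread pool and a per-source loader function; `BulkResult`, `load_bulk_sketch`, and the use of plain strings in place of LangChain `Document` objects are all hypothetical simplifications, not haive's API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BulkResult:
    """Hypothetical result type: documents per source, plus per-source errors."""
    documents: dict[str, list[str]] = field(default_factory=dict)
    errors: dict[str, str] = field(default_factory=dict)


def load_bulk_sketch(
    sources: list[str],
    load_one: Callable[[str], list[str]],
    max_workers: int = 8,
) -> BulkResult:
    result = BulkResult()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every source up front so loads run concurrently.
        futures = {src: pool.submit(load_one, src) for src in sources}
        for src, fut in futures.items():
            try:
                result.documents[src] = fut.result()
            except Exception as exc:  # one bad source must not abort the batch
                result.errors[src] = f"{type(exc).__name__}: {exc}"
    return result
```

Recording failures per source, instead of raising on the first one, lets the caller decide whether a partial batch is acceptable.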
- 🔧 Advanced Usage:
from haive.core.engine.document.loaders import (
    AutoLoader, AutoLoaderConfig, LoaderPreference
)

# Prefer output quality over loading speed
config = AutoLoaderConfig(
    preference=LoaderPreference.QUALITY, max_concurrency=20, enable_caching=True
)
loader = AutoLoader(config)

# Async loading from a single source (inside an async function)
docs = await loader.aload("https://large-site.com")

# Async loading from multiple sources
docs = await loader.aload_documents([
    "file1.pdf", "https://site1.com", "https://site2.com"
])

# Get detailed loading information
result = loader.load_detailed("document.pdf")
print(f"Loaded {len(result.documents)} docs in {result.loading_time:.2f}s")
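The `max_concurrency` setting suggests that async multi-source loading is bounded. One standard way to enforce such a bound is an `asyncio.Semaphore`; the sketch below assumes that approach, and `aload_many` and its `aload_one` parameter are hypothetical, not haive's implementation.

```python
import asyncio
from typing import Awaitable, Callable


async def aload_many(
    sources: list[str],
    aload_one: Callable[[str], Awaitable[list[str]]],
    max_concurrency: int = 20,
) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(src: str) -> list[str]:
        async with semaphore:  # at most max_concurrency loads in flight
            return await aload_one(src)

    # gather returns results in input order; flatten them into one list.
    per_source = await asyncio.gather(*(guarded(s) for s in sources))
    return [doc for docs in per_source for doc in docs]
```

All tasks are created at once, but only `max_concurrency` of them hold the semaphore at any moment, which caps open connections to a large site.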
- 📊 Registry Management:
from haive.core.engine.document.loaders import (
    auto_register_all, get_registration_status, list_available_sources
)

# Auto-register all 230+ loaders
stats = auto_register_all()
print(f"Registered {stats.total_sources_registered} sources")

# Check what's available
sources = list_available_sources()
print(f"Available sources: {len(sources)}")

# Get detailed status
status = get_registration_status()
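The registry maps source types to candidate loaders and picks one according to a preference. A minimal sketch under stated assumptions: `Preference` only mirrors the role of `LoaderPreference` (the `SPEED` member is an assumption, since only `QUALITY` appears above), and the loader names and speed/quality scores are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Preference(Enum):  # hypothetical stand-in for LoaderPreference
    SPEED = "speed"
    QUALITY = "quality"


@dataclass
class LoaderEntry:
    name: str
    source_types: tuple[str, ...]
    speed: int    # illustrative score: higher means faster
    quality: int  # illustrative score: higher means better output


@dataclass
class RegistrationStats:
    total_sources_registered: int = 0
    loaders_by_source: dict[str, list[str]] = field(default_factory=dict)


class LoaderRegistry:
    def __init__(self) -> None:
        self._by_source: dict[str, list[LoaderEntry]] = {}

    def register(self, entry: LoaderEntry) -> None:
        # One loader may serve several source types.
        for src in entry.source_types:
            self._by_source.setdefault(src, []).append(entry)

    def list_available_sources(self) -> list[str]:
        return sorted(self._by_source)

    def stats(self) -> RegistrationStats:
        return RegistrationStats(
            total_sources_registered=len(self._by_source),
            loaders_by_source={s: [e.name for e in es] for s, es in self._by_source.items()},
        )

    def select(self, source_type: str, preference: Preference) -> LoaderEntry:
        candidates = self._by_source.get(source_type)
        if not candidates:
            raise KeyError(f"no loader registered for {source_type!r}")
        if preference is Preference.QUALITY:
            return max(candidates, key=lambda e: e.quality)
        return max(candidates, key=lambda e: e.speed)
```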
- ⚡ Convenience Functions:
from haive.core.engine.document.loaders import (
    load_document, load_documents_bulk, aload_document
)

# Simple one-liner loading
docs = load_document("any-source-here")

# Bulk loading from multiple sources
docs = load_documents_bulk(["file1.pdf", "file2.docx"])

# Async loading
docs = await aload_document("https://example.com")
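Module-level convenience functions like these are often backed by a single lazily created loader instance, so repeated calls share configuration and state. A sketch of that pattern, assuming it applies here; `StubLoader`, `default_loader`, `quick_load`, and `quick_load_bulk` are hypothetical names that only mirror the shape of `load_document` and `load_documents_bulk`.

```python
from functools import lru_cache


class StubLoader:
    """Hypothetical stand-in for AutoLoader; returns one fake document per source."""

    def __init__(self) -> None:
        self.calls = 0

    def load(self, source: str) -> list[str]:
        self.calls += 1
        return [f"content of {source}"]


@lru_cache(maxsize=1)
def default_loader() -> StubLoader:
    # Created on first use, then reused by every convenience call.
    return StubLoader()


def quick_load(source: str) -> list[str]:
    return default_loader().load(source)


def quick_load_bulk(sources: list[str]) -> list[str]:
    loader = default_loader()
    return [doc for src in sources for doc in loader.load(src)]
```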
Version 2.0 replaces the earlier legacy loading system with a registry-based design intended to be production-ready and scalable, and to handle a wide range of document sources.
Author: Claude (Haive AI Agent Framework)
Version: 2.0.0 (complete rewrite with 230+ loaders)
Submodules
- haive.core.engine.document.loaders.adapters
- haive.core.engine.document.loaders.auto_factory
- haive.core.engine.document.loaders.auto_loader
- haive.core.engine.document.loaders.auto_registry
- haive.core.engine.document.loaders.base
- haive.core.engine.document.loaders.base_new
- haive.core.engine.document.loaders.cache_manager
- haive.core.engine.document.loaders.engine
- haive.core.engine.document.loaders.examples
- haive.core.engine.document.loaders.path_analyzer
- haive.core.engine.document.loaders.registry
- haive.core.engine.document.loaders.source_base
- haive.core.engine.document.loaders.sources
- haive.core.engine.document.loaders.specific
- haive.core.engine.document.loaders.strategy