haive.core.engine.document.config

Enhanced Document Engine Configuration.

This module provides comprehensive configuration models for the document engine, integrating with the existing Haive engine framework while adding enhanced functionality for document loading, processing, and management.

Classes

ChunkingStrategy

Strategy for document chunking.

DocumentChunk

Model for a document chunk.

DocumentEngineConfig

Enhanced configuration for the document engine.

DocumentFormat

Document format classification.

DocumentInput

Input model for document operations.

DocumentOutput

Output model for document operations.

DocumentSourceType

Document source type classification.

LoaderPreference

Preference for loader selection when multiple are available.

ProcessedDocument

Model for a processed document with chunks.

ProcessingStrategy

Strategy for document processing.

Module Contents

class haive.core.engine.document.config.ChunkingStrategy

Bases: str, enum.Enum

Strategy for document chunking.
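
Because the enum classes in this module derive from both str and enum.Enum, their members behave like plain strings. The sketch below illustrates that behaviour with a hypothetical stand-in class; the member names FIXED_SIZE and SEMANTIC are assumptions, since this page does not list the real members.

```python
import json
from enum import Enum


class ExampleChunkingStrategy(str, Enum):
    """Hypothetical stand-in; the real member names are not listed on this page."""

    FIXED_SIZE = "fixed_size"  # assumed member
    SEMANTIC = "semantic"      # assumed member


# Looking a member up by value returns the enum member itself.
strategy = ExampleChunkingStrategy("semantic")

# Thanks to the str mixin, members compare equal to their plain string values.
is_same = strategy == "semantic"

# The str mixin also lets members serialize as ordinary JSON strings.
payload = json.dumps({"strategy": ExampleChunkingStrategy.FIXED_SIZE})
```

This is why such enums can usually be supplied as plain strings in configuration files and request payloads.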

class haive.core.engine.document.config.DocumentChunk(/, **data)

Bases: pydantic.BaseModel

Model for a document chunk.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

class haive.core.engine.document.config.DocumentEngineConfig(/, **data)

Bases: pydantic.BaseModel

Enhanced configuration for the document engine.

This configuration extends the basic document loader config with enhanced processing capabilities, chunking strategies, and integration options.

Parameters:

data (Any)

classmethod validate_chunk_overlap(v, info)

Validate that chunk overlap is less than chunk size.

Parameters:

v (int)

Return type:

int
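
The check described above can be sketched as a Pydantic v2 field validator. This is a minimal illustration, not the module's actual implementation: the class ExampleChunkConfig, the field names chunk_size and chunk_overlap, their defaults, and the helper overlap_is_valid are all assumptions made for the example.

```python
from pydantic import BaseModel, ValidationError, field_validator


class ExampleChunkConfig(BaseModel):
    """Hypothetical stand-in for the chunking fields of DocumentEngineConfig."""

    chunk_size: int = 1000    # assumed field name and default
    chunk_overlap: int = 200  # assumed field name and default

    @field_validator("chunk_overlap")
    @classmethod
    def validate_chunk_overlap(cls, v, info):
        # info.data holds the fields validated so far. chunk_size is declared
        # first, so it is present here whenever it passed its own validation.
        chunk_size = info.data.get("chunk_size")
        if chunk_size is not None and v >= chunk_size:
            raise ValueError("chunk_overlap must be less than chunk_size")
        return v


def overlap_is_valid(chunk_size: int, chunk_overlap: int) -> bool:
    """Hypothetical helper: report whether the pair passes validation."""
    try:
        ExampleChunkConfig(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    except ValidationError:
        return False
    return True
```

Under these assumptions, overlap_is_valid(1000, 200) returns True, while an overlap equal to or larger than the chunk size is rejected with a ValidationError. Declaring chunk_size before chunk_overlap matters, because a field validator only sees fields that were validated earlier.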

class haive.core.engine.document.config.DocumentFormat

Bases: str, enum.Enum

Document format classification.

class haive.core.engine.document.config.DocumentInput(/, **data)

Bases: pydantic.BaseModel

Input model for document operations.

Parameters:

data (Any)

class haive.core.engine.document.config.DocumentOutput(/, **data)

Bases: pydantic.BaseModel

Output model for document operations.

Parameters:

data (Any)

update_statistics()

Update statistics based on processed documents.

Return type:

Self

class haive.core.engine.document.config.DocumentSourceType

Bases: str, enum.Enum

Document source type classification.

class haive.core.engine.document.config.LoaderPreference

Bases: str, enum.Enum

Preference for loader selection when multiple are available.

class haive.core.engine.document.config.ProcessedDocument(/, **data)

Bases: pydantic.BaseModel

Model for a processed document with chunks.

Parameters:

data (Any)

update_statistics()

Update statistics based on content and chunks.

Return type:

Self
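
A method that recomputes statistics and returns Self might look like the sketch below. It is only an illustration of the pattern: the class ExampleProcessedDocument and the fields content, chunks, character_count, and chunk_count are assumptions, and this page does not say which statistics the real method maintains or how it is invoked.

```python
from typing import List

from pydantic import BaseModel, Field


class ExampleProcessedDocument(BaseModel):
    """Hypothetical stand-in; every field name here is an assumption."""

    content: str = ""
    chunks: List[str] = Field(default_factory=list)
    character_count: int = 0
    chunk_count: int = 0

    def update_statistics(self) -> "ExampleProcessedDocument":
        # Recompute the derived counters from the current content and chunks.
        self.character_count = len(self.content)
        self.chunk_count = len(self.chunks)
        # Returning the instance itself allows calls to be chained.
        return self


doc = ExampleProcessedDocument(content="hello world", chunks=["hello", "world"])
stats = doc.update_statistics()
```

Returning the same instance rather than a copy keeps the counters on the object the caller already holds.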

class haive.core.engine.document.config.ProcessingStrategy

Bases: str, enum.Enum

Strategy for document processing.
