agents.document_modifiers.base.state

Base state schema for document modification agents.

This module defines the DocumentModifierState class, which serves as the foundation for all document processing agents in the haive framework.

Classes

DocumentModifierState

Base state schema for document modification agents.

Module Contents

class agents.document_modifiers.base.state.DocumentModifierState(/, **data)

Bases: haive.core.schema.StateSchema

Base state schema for document modification agents.

This class provides the core state management for all document processing operations. It handles document collections, provides computed properties for common operations, and includes validation to ensure data integrity.

The state maintains a list of documents and provides utilities for:

- Accessing combined document text
- Counting documents
- Adding/removing documents
- Validating document collections

name

Optional identifier for this document modifier instance.

description

Optional description of the modifier’s purpose.

documents

List of Document objects to be processed.

Properties:

documents_text: Combined text content of all documents.

num_documents: Total count of documents in the collection.

Example

Creating and using document state:

>>> from langchain_core.documents import Document
>>> docs = [Document(page_content="Hello"), Document(page_content="World")]
>>> state = DocumentModifierState.from_documents(docs)
>>> print(state.documents_text)
Hello
World
>>> print(state.num_documents)
2

Adding documents dynamically:

>>> new_doc = Document(page_content="New content")
>>> state.documents.append(new_doc)
>>> print(state.num_documents)
3

Raises:

ValueError – If no documents are provided (empty list).

Parameters:

data (Any)

Note

The state automatically validates that at least one document is present to prevent processing empty collections.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod add_document(document)

Add a single document to the state.

Note: The class method implementation of this method has known issues. Prefer manipulating documents on an instance instead, for example by appending to state.documents.

Parameters:

document (langchain_core.documents.Document) – Document to add to the collection.

Returns:

New state instance with the document added.

Return type:

DocumentModifierState
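
The note on add_document recommends instance-level manipulation. A minimal sketch of that pattern follows, using stdlib stand-ins rather than the real haive or langchain classes. `Doc`, `SketchState`, and `with_document` are hypothetical names introduced only for illustration and are not part of this module.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


@dataclass
class SketchState:
    """Hypothetical stand-in for DocumentModifierState (not the real class)."""
    documents: list = field(default_factory=list)

    def with_document(self, document: Doc) -> "SketchState":
        # Instance method: reads self.documents and returns a new state,
        # leaving the original instance untouched.
        return SketchState(documents=[*self.documents, document])


state = SketchState(documents=[Doc("Hello")])
extended = state.with_document(Doc("World"))
print(len(state.documents), len(extended.documents))  # prints: 1 2
```

Because the method receives `self`, it can see the documents already held by the instance, which a class method cannot do without being handed a state explicitly.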

classmethod add_documents(documents)

Add multiple documents to the state.

Note: The class method implementation of this method has known issues. Prefer manipulating documents on an instance instead, for example by extending state.documents.

Parameters:

documents (list[langchain_core.documents.Document]) – List of documents to add.

Returns:

New state instance with documents added.

Return type:

DocumentModifierState

classmethod from_documents(documents)

Create a DocumentModifierState from a list of documents.

This is a convenience factory method for creating state instances when you already have a collection of documents.

Parameters:

documents (list[langchain_core.documents.Document]) – List of Document objects to initialize the state with.

Returns:

New DocumentModifierState instance containing the provided documents.

Raises:

ValueError – If the documents list is empty.

Return type:

DocumentModifierState

Example

>>> docs = [Document(page_content="Content 1"), Document(page_content="Content 2")]
>>> state = DocumentModifierState.from_documents(docs)
>>> print(state.num_documents)
2

classmethod remove_document(document)

Remove a specific document from the state.

Note: The class method implementation of this method has known issues. Prefer manipulating documents on an instance instead, for example by removing the item from state.documents.

Parameters:

document (langchain_core.documents.Document) – Document to remove from the collection.

Returns:

New state instance with the document removed.

Return type:

DocumentModifierState

classmethod remove_documents(documents)

Remove multiple documents from the state.

Note: The class method implementation of this method has known issues. Prefer manipulating documents on an instance instead, for example by removing the items from state.documents.

Parameters:

documents (list[langchain_core.documents.Document]) – List of documents to remove.

Returns:

New state instance with documents removed.

Return type:

DocumentModifierState
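
Removal interacts with the rule that a state must hold at least one document. The sketch below shows one way an instance-level removal could respect that rule. It uses stdlib stand-ins, and `Doc`, `SketchState`, and `without_documents` are hypothetical names, not the module's API. It also assumes documents are matched by equality, which the source does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


@dataclass
class SketchState:
    """Hypothetical stand-in for DocumentModifierState (not the real class)."""
    documents: list = field(default_factory=list)

    def without_documents(self, to_remove: list) -> "SketchState":
        # Keep every document that does not compare equal to one in to_remove
        # (equality matching is an assumption of this sketch).
        remaining = [d for d in self.documents if d not in to_remove]
        if not remaining:
            # Mirrors the documented rule that at least one document is required.
            raise ValueError("removal would leave the state with no documents")
        return SketchState(documents=remaining)


state = SketchState(documents=[Doc("a"), Doc("b"), Doc("c")])
trimmed = state.without_documents([Doc("a"), Doc("c")])
print([d.page_content for d in trimmed.documents])  # prints: ['b']
```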

validate_documents()

Validate that at least one document is present.

This validator runs after model initialization to ensure the state contains at least one document for processing.

Returns:

Self if validation passes.

Raises:

ValueError – If documents list is empty.

Return type:

DocumentModifierState

classmethod validate_documents_field(v)

Validate the documents field during assignment.

Parameters:

v – The documents list being validated.

Returns:

The validated documents list.

Return type:

Any

Note

This validator ensures type safety but allows empty lists during field assignment. The model validator handles the non-empty requirement.
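
The two validators split the work: the field-level step checks types and accepts an empty list, while the model-level step runs after initialization and rejects an empty collection. The sketch below imitates that split with stdlib dataclasses. It is an analog under stated assumptions, not the pydantic-based implementation, and `Doc`, `SketchState`, and `check_documents_field` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def check_documents_field(value):
    # Field-level step: type checks only, so an empty list passes here.
    if not isinstance(value, list):
        raise TypeError("documents must be a list")
    for item in value:
        if not isinstance(item, Doc):
            raise TypeError("every entry must be a Doc")
    return value


@dataclass
class SketchState:
    """Hypothetical stand-in for DocumentModifierState (not the real class)."""
    documents: list = field(default_factory=list)

    def __post_init__(self):
        self.documents = check_documents_field(self.documents)
        # Whole-object step: runs after initialization and rejects empty collections.
        if not self.documents:
            raise ValueError("at least one document is required")


print(check_documents_field([]))  # prints: []
try:
    SketchState(documents=[])
except ValueError as exc:
    print(exc)  # prints: at least one document is required
```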

property documents_text: str

Get the combined text content of all documents.

This property concatenates the page_content of all documents in the collection, separated by newlines. Useful for operations that need to process all document text at once.

Returns:

String containing all document texts joined by newlines.

Return type:

str
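
As a rough illustration of the join described here, the following sketch combines page_content values with newlines. It uses a stdlib stand-in for the document type, and `Doc` and `combined_text` are hypothetical names rather than part of this module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def combined_text(documents):
    # Join each document's text with a single newline between entries.
    return "\n".join(d.page_content for d in documents)


docs = [Doc("First"), Doc("Second")]
text = combined_text(docs)
print(repr(text))  # prints: 'First\nSecond'
print(len(docs))   # prints: 2
```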

Example

>>> state.documents = [Document(page_content="First"), Document(page_content="Second")]
>>> print(state.documents_text)
First
Second

property num_documents: int

Get the total number of documents in the collection.

Returns:

Integer count of documents currently in the state.

Return type:

int

Example

>>> print(f"Processing {state.num_documents} documents")
Processing 5 documents