agents.document_modifiers.tnt.state

State management for taxonomy generation workflow.

This module defines the state schema used throughout the taxonomy generation process. It provides a structured way to track documents, their groupings into minibatches, and the evolution of taxonomy clusters over multiple iterations.

Examples

Basic usage of the state class:

state = TaxonomyGenerationState(
    documents=[Doc(id="1", content="text")],
    minibatches=[[0]],
    clusters=[[{"id": 1, "name": "Category", "description": "Example category"}]]
)

Classes

TaxonomyGenerationState

Represents the state passed between graph nodes in the taxonomy generation process.

Module Contents

class agents.document_modifiers.tnt.state.TaxonomyGenerationState(/, **data)

Bases: pydantic.BaseModel

Represents the state passed between graph nodes in the taxonomy generation process.

This class maintains the complete state of the taxonomy generation workflow, tracking raw documents, their organization into processing batches, and the history of taxonomy revisions.

Parameters:

data (Any)

documents

List of document objects, each containing:

- id: Unique identifier
- content: Raw text
- summary: Generated summary (added in first step)
- explanation: Summary explanation (added in first step)
- category: Assigned taxonomy category (added later)

Type:

List[Doc]

minibatches

Groups of document indices for batch processing. Each inner list contains indices referencing documents in the documents list.

Type:

List[List[int]]
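
As a minimal sketch of the index-based grouping described for minibatches, the snippet below resolves each minibatch back to the documents it references. Plain dictionaries stand in for Doc objects here; that is an assumption made to keep the example self-contained, and the sample ids and texts are illustrative only.

```python
# Stand-in documents; the real workflow stores Doc objects instead of dicts.
documents = [
    {"id": "a", "content": "first text"},
    {"id": "b", "content": "second text"},
    {"id": "c", "content": "third text"},
]

# Each inner list holds positions in `documents`, not document ids.
minibatches = [[0, 1], [2]]

# Resolve every minibatch to the documents it references.
resolved = [[documents[i] for i in batch] for batch in minibatches]
batch_ids = [[doc["id"] for doc in batch] for batch in resolved]
print(batch_ids)
```

Storing indices rather than copies keeps each document in one place, so fields added later (such as summary or category) are visible to every batch that references it.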

clusters

History of taxonomy revisions. Each revision is a list of cluster dictionaries containing:

- id: Cluster identifier
- name: Category name
- description: Category description

Type:

List[List[dict]]
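
The revision history can be read as in the sketch below. It assumes that revisions are appended in order, so the last entry is the most recent taxonomy; the page does not state this ordering explicitly, and the cluster values are made up for illustration.

```python
# Two hypothetical revisions of the taxonomy, oldest first (assumed ordering).
clusters = [
    [{"id": 1, "name": "Tech", "description": "Technology"}],
    [
        {"id": 1, "name": "Software", "description": "Software engineering"},
        {"id": 2, "name": "Hardware", "description": "Physical devices"},
    ],
]

# Under the assumed ordering, the latest revision is the last element.
latest = clusters[-1] if clusters else []
names = [cluster["name"] for cluster in latest]
print(names)
```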

Examples

>>> docs = [Doc(id="1", content="text")]
>>> state = TaxonomyGenerationState(
...     documents=docs,
...     minibatches=[[0]],
...     clusters=[[{"id": 1, "name": "Tech", "description": "Technology"}]]
... )

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_documents(documents)

Initialize state from a list of LangChain Document objects.

Parameters:

documents (list[langchain_core.documents.Document])

Return type:

TaxonomyGenerationState
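
The body of from_documents is not shown on this page. The sketch below illustrates one plausible shape for such a constructor using standard-library stand-ins: SourceDocument, Doc, and SketchState are hypothetical simplifications of langchain_core.documents.Document, Doc, and TaxonomyGenerationState, and the position-based id scheme is an assumption, not the documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    """Simplified stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Doc:
    """Stand-in for Doc; only id and content are modeled here."""
    id: str
    content: str


@dataclass
class SketchState:
    """Stand-in for TaxonomyGenerationState (not the real pydantic model)."""
    documents: list = field(default_factory=list)
    minibatches: list = field(default_factory=list)
    clusters: list = field(default_factory=list)

    @classmethod
    def from_documents(cls, documents):
        # Assumption: ids come from list position; the real method may differ.
        docs = [
            Doc(id=str(i), content=d.page_content)
            for i, d in enumerate(documents)
        ]
        # Minibatches and clusters start empty in this sketch.
        return cls(documents=docs)


state = SketchState.from_documents(
    [SourceDocument("alpha"), SourceDocument("beta")]
)
print([d.id for d in state.documents])
```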