haive.core.engine.document.path_analysis

Path Analysis System for Document Loader Engine.

This module analyzes local filesystem paths, URLs, database URIs, cloud storage paths, network shares, and special paths such as Git SSH URLs, and classifies each by type and properties for use by the document loader engine.

Classes

CloudProvider

Cloud storage provider classification.

DatabaseType

Database type classification.

DomainInfo

Information about a domain.

FileCategory

High-level file category.

PathAnalysisResult

Result of path analysis.

PathType

Primary path type classification.

URLComponents

Components of a URL.

Functions

analyze_cloud_path(path)

Analyze a cloud storage path.

analyze_database_uri(uri)

Analyze a database URI.

analyze_local_path(path)

Analyze a local filesystem path.

analyze_network_path(path)

Analyze a network share path.

analyze_path_comprehensive(path)

Analyze a path comprehensively.

analyze_special_path(path)

Analyze a special path (e.g., git SSH URL).

analyze_url(url)

Analyze a URL.

detect_encoding(file_path)

Detect the encoding of a text file.

detect_mime_type(file_path)

Detect the MIME type of a file.

extract_domain_info(url_components)

Extract domain information from URL components.

extract_url_components(url)

Extract components from a URL.

is_binary_file(file_path)

Check if a file is binary.

Module Contents

class haive.core.engine.document.path_analysis.CloudProvider

Bases: str, enum.Enum

Cloud storage provider classification.
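
CloudProvider, DatabaseType, FileCategory, and PathType all derive from both str and enum.Enum. The sketch below uses a hypothetical CloudProviderSketch enum (the real member names and values are not listed in this reference) to show what the str mixin provides: members can be looked up by value and compare equal to plain strings, which is convenient when values arrive as strings, for example from configuration or JSON.

```python
import enum


# Hypothetical enum: the real CloudProvider members are not listed in this reference.
class CloudProviderSketch(str, enum.Enum):
    AWS_S3 = "aws_s3"
    GOOGLE_CLOUD = "google_cloud"
    AZURE = "azure"


# Members can be looked up by their string value...
provider = CloudProviderSketch("aws_s3")
# ...and, thanks to the str mixin, behave like plain strings in comparisons.
print(provider is CloudProviderSketch.AWS_S3)  # True
print(provider == "aws_s3")                    # True
print(provider.value)                          # aws_s3
```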

class haive.core.engine.document.path_analysis.DatabaseType

Bases: str, enum.Enum

Database type classification.

class haive.core.engine.document.path_analysis.DomainInfo(/, **data)

Bases: pydantic.BaseModel

Information about a domain.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

The self parameter is positional-only so that a model can define a field named self.

Parameters:

data (Any) – Field values, passed as keyword arguments

class haive.core.engine.document.path_analysis.FileCategory

Bases: str, enum.Enum

High-level file category.

class haive.core.engine.document.path_analysis.PathAnalysisResult(/, **data)

Bases: pydantic.BaseModel

Result of path analysis.

This model contains comprehensive information about a path, including its type, properties, and metadata.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

The self parameter is positional-only so that a model can define a field named self.

Parameters:

data (Any) – Field values, passed as keyword arguments

source_summary()

Generate a summary of the source.

Return type:

str

class haive.core.engine.document.path_analysis.PathType

Bases: str, enum.Enum

Primary path type classification.

class haive.core.engine.document.path_analysis.URLComponents(/, **data)

Bases: pydantic.BaseModel

Components of a URL.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

The self parameter is positional-only so that a model can define a field named self.

Parameters:

data (Any) – Field values, passed as keyword arguments

haive.core.engine.document.path_analysis.analyze_cloud_path(path)

Analyze a cloud storage path.

Parameters:

path (str) – Cloud storage path to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
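
A cloud storage path usually encodes the provider in its URI scheme and the bucket or container in the network location. The following is a minimal sketch of that split using urllib.parse.urlsplit; the scheme table, the provider labels, and classify_cloud_path() itself are hypothetical and are not the values or logic analyze_cloud_path() actually uses.

```python
from urllib.parse import urlsplit

# Hypothetical scheme-to-provider table; the labels are illustrative only.
CLOUD_SCHEMES = {
    "s3": "aws_s3",
    "gs": "google_cloud",
    "az": "azure",
}


def classify_cloud_path(path: str) -> dict:
    """Split a cloud URI into provider, bucket (or container), and object key."""
    parts = urlsplit(path)  # urlsplit lowercases the scheme
    provider = CLOUD_SCHEMES.get(parts.scheme)
    if provider is None:
        raise ValueError(f"not a recognized cloud storage path: {path!r}")
    return {
        "provider": provider,
        "bucket": parts.netloc,
        "key": parts.path.lstrip("/"),
    }


print(classify_cloud_path("s3://my-bucket/data/report.pdf"))
# {'provider': 'aws_s3', 'bucket': 'my-bucket', 'key': 'data/report.pdf'}
```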

haive.core.engine.document.path_analysis.analyze_database_uri(uri)

Analyze a database URI.

Parameters:

uri (str) – Database URI to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
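
Database URIs commonly follow the SQLAlchemy-style dialect+driver://user:password@host:port/database form. As a minimal sketch, assuming that form, the pieces can be separated with urllib.parse.urlsplit; split_database_uri() is a hypothetical helper, not part of this module, and it deliberately leaves the password out of its result.

```python
from urllib.parse import urlsplit


def split_database_uri(uri: str) -> dict:
    """Split a dialect[+driver]://user:password@host:port/database URI."""
    parts = urlsplit(uri)
    # An optional driver follows "+", e.g. "postgresql+psycopg2".
    dialect, _, driver = parts.scheme.partition("+")
    # Drop exactly one leading "/" so "sqlite:////abs/app.db" keeps its absolute path.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    # The password is not returned, to keep credentials out of analysis results.
    return {
        "dialect": dialect,
        "driver": driver or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": path or None,
    }


print(split_database_uri("postgresql+psycopg2://user:secret@localhost:5432/mydb"))
```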

haive.core.engine.document.path_analysis.analyze_local_path(path)

Analyze a local filesystem path.

Parameters:

path (str) – Local filesystem path to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
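
Local path analysis typically gathers properties such as existence, file-versus-directory status, extension, and size. A minimal pathlib-based sketch follows; describe_local_path() and its dictionary keys are hypothetical and do not necessarily mirror the fields of PathAnalysisResult.

```python
from pathlib import Path


def describe_local_path(path: str) -> dict:
    """Collect basic filesystem properties for a local path."""
    p = Path(path).expanduser()  # expand a leading "~"
    is_file = p.is_file()
    return {
        "absolute_path": str(p.resolve()),  # non-strict: works for missing paths too
        "exists": p.exists(),
        "is_file": is_file,
        "is_dir": p.is_dir(),
        "extension": p.suffix.lower() or None,
        "size_bytes": p.stat().st_size if is_file else None,
    }
```

Normalizing the extension to lower case makes later lookups (for example, choosing a loader by extension) case-insensitive.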

haive.core.engine.document.path_analysis.analyze_network_path(path)

Analyze a network share path.

Parameters:

path (str) – Network path to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
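
Windows network shares are usually written as UNC paths (\\server\share\path). A minimal sketch, assuming UNC input only (other forms such as smb:// URLs are not handled), can rely on pathlib.PureWindowsPath, which parses UNC drives on any operating system; split_unc_path() is a hypothetical helper.

```python
from pathlib import PureWindowsPath
from typing import Optional


def split_unc_path(path: str) -> Optional[dict]:
    """Return server, share, and relative path of a UNC path, or None otherwise."""
    p = PureWindowsPath(path)  # accepts both backslash and forward-slash separators
    if not p.drive.startswith("\\\\"):
        return None  # drive letter ("C:") or no drive at all: not a network share
    server, _, share = p.drive.lstrip("\\").partition("\\")
    # parts[0] is the UNC anchor (server plus share); the rest lies inside the share.
    return {"server": server, "share": share, "relative": "/".join(p.parts[1:])}


print(split_unc_path(r"\\fileserver\docs\reports\q1.pdf"))
# {'server': 'fileserver', 'share': 'docs', 'relative': 'reports/q1.pdf'}
```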

haive.core.engine.document.path_analysis.analyze_path_comprehensive(path)

Analyze a path comprehensively.

This function analyzes a path to determine its type, properties, and metadata. It handles various path types including local files, URLs, database URIs, and cloud storage paths.

Parameters:

path (str | pathlib.Path) – Path, URL, or URI to analyze, given as a string or a pathlib.Path object

Returns:

PathAnalysisResult object with comprehensive information about the path

Return type:

PathAnalysisResult
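
Before analyzing anything, analyze_path_comprehensive() has to decide which kind of path it was given. The routing step might look roughly like the sketch below, which dispatches on UNC prefixes, Git SSH syntax, and URI schemes. The scheme sets, the order of checks, the returned labels, and route_path() itself are all assumptions, not the module's actual rules.

```python
import re
from pathlib import Path
from typing import Union
from urllib.parse import urlsplit

# Hypothetical routing tables; the real rules are not documented here.
WEB_SCHEMES = {"http", "https", "ftp"}
CLOUD_SCHEMES = {"s3", "gs", "az"}
DB_SCHEMES = {"postgresql", "mysql", "sqlite", "mongodb"}
GIT_SSH = re.compile(r"^[\w.-]+@[\w.-]+:[\w./-]+$")  # e.g. git@github.com:owner/repo.git


def route_path(path: Union[str, Path]) -> str:
    """Return a coarse label naming which analyzer should handle the path."""
    text = str(path)
    if text.startswith("\\\\"):
        return "network"
    if GIT_SSH.match(text):
        return "special"
    # Strip an optional "+driver" suffix, e.g. "postgresql+psycopg2" -> "postgresql".
    scheme = urlsplit(text).scheme.split("+")[0]
    if scheme in WEB_SCHEMES:
        return "url"
    if scheme in CLOUD_SCHEMES:
        return "cloud"
    if scheme in DB_SCHEMES:
        return "database"
    # No recognized scheme (this includes Windows drive letters such as "C:").
    return "local"
```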

haive.core.engine.document.path_analysis.analyze_special_path(path)

Analyze a special path (e.g., git SSH URL).

Parameters:

path (str) – Special path to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
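
Git SSH URLs use scp-like syntax (user@host:owner/repo.git), which has no :// separator, so urllib.parse.urlsplit does not separate the host from the repository path. A minimal regex-based sketch of extracting those parts follows; parse_git_ssh_url() and its group names are hypothetical.

```python
import re
from typing import Optional

# scp-like Git SSH syntax: user@host:repo/path, with an optional ".git" suffix.
GIT_SSH_PATTERN = re.compile(
    r"^(?P<user>[\w.-]+)@(?P<host>[\w.-]+):(?P<repo_path>[\w./-]+?)(?:\.git)?/?$"
)


def parse_git_ssh_url(url: str) -> Optional[dict]:
    """Return user, host, and repository path, or None if the URL does not match."""
    match = GIT_SSH_PATTERN.match(url)
    return match.groupdict() if match else None


print(parse_git_ssh_url("git@github.com:owner/repo.git"))
# {'user': 'git', 'host': 'github.com', 'repo_path': 'owner/repo'}
```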

haive.core.engine.document.path_analysis.analyze_url(url)

Analyze a URL.

Parameters:

url (str) – URL to analyze

Returns:

PathAnalysisResult object

Return type:

PathAnalysisResult
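
Part of analyzing a URL can be judging whether it points at a downloadable file. One plausible heuristic, shown here as a sketch and not necessarily what analyze_url() does, is to read the file name and extension from the percent-decoded URL path; url_file_hint() is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def url_file_hint(url: str) -> dict:
    """Guess a file name and extension from the path component of a URL."""
    parts = urlsplit(url)
    # Decode percent-escapes such as %20 before taking the last path segment.
    name = PurePosixPath(unquote(parts.path)).name
    suffix = PurePosixPath(name).suffix.lower()
    return {
        "is_web": parts.scheme in ("http", "https"),
        "filename": name or None,
        "extension": suffix or None,
    }


print(url_file_hint("https://example.com/files/Annual%20Report.PDF"))
# {'is_web': True, 'filename': 'Annual Report.PDF', 'extension': '.pdf'}
```

A URL without a file-like path (such as a site root) yields None for both fields, which a caller could treat as "probably an HTML page".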

haive.core.engine.document.path_analysis.detect_encoding(file_path)

Detect the encoding of a text file.

Parameters:

file_path (str) – Path to the file

Returns:

Encoding name, or None if unable to determine

Return type:

str | None
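
A common lightweight approach to encoding detection is to check for a byte order mark and then test whether the bytes decode as UTF-8. The sketch below does that on a sample of the file and returns None otherwise, matching the documented None-on-failure contract. It is an assumption, not the module's implementation; sniff_encoding() is hypothetical, and statistical detectors such as charset-normalizer handle legacy encodings far better.

```python
import codecs
from typing import Optional

# BOM-to-codec table. UTF-32 comes first because its LE BOM begins with the UTF-16 LE BOM.
# The "utf-16"/"utf-32"/"utf-8-sig" codecs consume the BOM when decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(file_path: str, sample_size: int = 64 * 1024) -> Optional[str]:
    """Guess a text file's encoding from a BOM or a UTF-8 validity check."""
    with open(file_path, "rb") as fh:
        sample = fh.read(sample_size)
    for bom, codec_name in _BOMS:
        if sample.startswith(bom):
            return codec_name
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return None
    return "utf-8"
```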

haive.core.engine.document.path_analysis.detect_mime_type(file_path)

Detect the MIME type of a file.

Parameters:

file_path (str) – Path to the file

Returns:

MIME type string, or None if unable to determine

Return type:

str | None
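
MIME type detection often tries the file extension first and falls back to inspecting leading "magic" bytes. The sketch below combines mimetypes.guess_type with a tiny, deliberately incomplete signature table; guess_mime_type() and the table are hypothetical, and the real detect_mime_type() may use a different strategy.

```python
import mimetypes
import os
from typing import Optional

# A few well-known file signatures; real detectors use much larger tables.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]


def guess_mime_type(file_path: str) -> Optional[str]:
    """Guess a MIME type from the extension, then from the first bytes of the file."""
    mime_type, _encoding = mimetypes.guess_type(file_path)
    if mime_type is not None:
        return mime_type
    if not os.path.isfile(file_path):
        return None
    with open(file_path, "rb") as fh:
        head = fh.read(16)
    for signature, sniffed_type in _SIGNATURES:
        if head.startswith(signature):
            return sniffed_type
    return None
```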

haive.core.engine.document.path_analysis.extract_domain_info(url_components)

Extract domain information from URL components.

Parameters:

url_components (URLComponents) – Parsed URL components, for example as returned by extract_url_components()

Returns:

DomainInfo object

Return type:

DomainInfo
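
DomainInfo describes the host part of a URL. As a naive sketch, a hostname can be split into subdomain, registered domain, and top-level domain by its last two labels. split_hostname() is hypothetical, and this rule is wrong for multi-label public suffixes such as co.uk, where a Public Suffix List lookup is needed.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a hostname into subdomain, domain, and top-level domain."""
    # Normalize case and drop a trailing dot (fully qualified form).
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        # Single-label hosts such as "localhost" have no TLD.
        return {"subdomain": None, "domain": labels[0], "tld": None}
    return {
        "subdomain": ".".join(labels[:-2]) or None,
        "domain": labels[-2],
        "tld": labels[-1],
    }


print(split_hostname("docs.python.org"))
# {'subdomain': 'docs', 'domain': 'python', 'tld': 'org'}
```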

haive.core.engine.document.path_analysis.extract_url_components(url)

Extract components from a URL.

Parameters:

url (str) – URL string

Returns:

URLComponents object

Return type:

URLComponents
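
The standard library already separates most URL components. A minimal sketch with urllib.parse follows; split_url() and its keys are hypothetical and may not match the fields of URLComponents.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url: str) -> dict:
    """Break a URL into scheme, host, port, path, query parameters, and fragment."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,  # lowercased by urllib
        "port": parts.port,  # None when no explicit port is given
        "path": parts.path,
        "query": parse_qs(parts.query),  # repeated keys collect into lists
        "fragment": parts.fragment or None,
    }


print(split_url("https://Docs.Example.com:8443/guide/intro?lang=en&tag=a&tag=b#setup"))
```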

haive.core.engine.document.path_analysis.is_binary_file(file_path)

Check if a file is binary.

Parameters:

file_path (str) – Path to the file

Returns:

True if the file is binary, False otherwise

Return type:

bool
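
A widely used heuristic (Git applies a similar NUL-byte test) treats a file as binary if an initial sample contains a NUL byte or a high proportion of control characters. The sketch below is an assumption, not the module's implementation; looks_binary() and its threshold are hypothetical. Bytes of 0x80 and above count as text so that UTF-8 content is not penalized, but UTF-16 text, which contains NUL bytes, would be reported as binary.

```python
# Control bytes that commonly appear in text: tab, newline, carriage return,
# form feed, and backspace.
_TEXT_CONTROLS = set(b"\t\n\r\f\b")


def looks_binary(file_path: str, sample_size: int = 8192, threshold: float = 0.30) -> bool:
    """Heuristically decide whether a file is binary from its first bytes."""
    with open(file_path, "rb") as fh:
        chunk = fh.read(sample_size)
    if not chunk:
        return False  # an empty file is treated as text
    if b"\x00" in chunk:
        return True  # NUL bytes almost never occur in 8-bit or UTF-8 text
    # Count control characters other than common whitespace, plus DEL (127).
    control = sum(1 for b in chunk if (b < 32 and b not in _TEXT_CONTROLS) or b == 127)
    return control / len(chunk) > threshold
```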