haive.core.engine.document.path_analysis
Path Analysis System for Document Loader Engine.
This module analyzes paths and URLs for the document loader engine. It classifies each input (for example a local file, URL, database URI, or cloud storage path) and reports its properties and metadata.
Classes
CloudProvider | Cloud storage provider classification.
DatabaseType | Database type classification.
DomainInfo | Information about a domain.
FileCategory | High-level file category.
PathAnalysisResult | Result of path analysis.
PathType | Primary path type classification.
URLComponents | Components of a URL.
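Functions
analyze_cloud_path(path) | Analyze a cloud storage path.
analyze_database_uri(uri) | Analyze a database URI.
analyze_local_path(path) | Analyze a local filesystem path.
analyze_network_path(path) | Analyze a network share path.
analyze_path_comprehensive(path) | Analyze a path comprehensively.
analyze_special_path(path) | Analyze a special path (e.g., git SSH URL).
analyze_url(url) | Analyze a URL.
detect_encoding(file_path) | Detect the encoding of a text file.
detect_mime_type(file_path) | Detect the MIME type of a file.
extract_domain_info(url_components) | Extract domain information from URL components.
Extract components from a URL.
Check if a file is binary.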
Module Contents
- class haive.core.engine.document.path_analysis.CloudProvider
Cloud storage provider classification.
- class haive.core.engine.document.path_analysis.DatabaseType
Database type classification.
- class haive.core.engine.document.path_analysis.DomainInfo(/, **data)
Bases: pydantic.BaseModel
Information about a domain.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- class haive.core.engine.document.path_analysis.FileCategory
High-level file category.
- class haive.core.engine.document.path_analysis.PathAnalysisResult(/, **data)
Bases: pydantic.BaseModel
Result of path analysis.
This model contains comprehensive information about a path, including its type, properties, and metadata.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- class haive.core.engine.document.path_analysis.PathType
Primary path type classification.
- class haive.core.engine.document.path_analysis.URLComponents(/, **data)
Bases: pydantic.BaseModel
Components of a URL.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- haive.core.engine.document.path_analysis.analyze_cloud_path(path)
Analyze a cloud storage path.
- Parameters:
path (str) – Cloud storage path to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
- haive.core.engine.document.path_analysis.analyze_database_uri(uri)
Analyze a database URI.
- Parameters:
uri (str) – Database URI to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
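The kind of information a database URI carries can be sketched with the standard library alone. This is an illustration, not the implementation of analyze_database_uri: the helper name `split_database_uri` and the keys of the returned dict are assumptions, and the real function returns a PathAnalysisResult whose fields are not listed on this page.

```python
from urllib.parse import urlsplit


def split_database_uri(uri: str) -> dict:
    """Break a database URI into its main parts (hypothetical helper)."""
    parts = urlsplit(uri)
    # SQLAlchemy-style URIs may carry a driver suffix, e.g. "postgresql+psycopg2".
    backend, _, driver = parts.scheme.partition("+")
    # Drop only the single slash that separates the authority from the database name.
    database = parts.path[1:] if parts.path.startswith("/") else parts.path
    return {
        "backend": backend,
        "driver": driver or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": database or None,
        "has_credentials": parts.username is not None,
    }


info = split_database_uri("postgresql+psycopg2://user:secret@db.example.com:5432/sales")
print(info["backend"], info["host"], info["port"], info["database"])
```

Reporting only whether credentials are present, rather than echoing them, keeps passwords out of logs and analysis results.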
- haive.core.engine.document.path_analysis.analyze_local_path(path)
Analyze a local filesystem path.
- Parameters:
path (str) – Path to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
- haive.core.engine.document.path_analysis.analyze_network_path(path)
Analyze a network share path.
- Parameters:
path (str) – Network path to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
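For readers unfamiliar with network share paths, the following sketch splits a Windows UNC path into server, share, and remainder using pathlib. It is not how analyze_network_path is implemented; the helper name `split_unc_path` and the dict keys are assumptions made for illustration.

```python
from pathlib import PureWindowsPath


def split_unc_path(path: str):
    """Split a UNC path such as \\\\server\\share\\dir\\file (hypothetical helper).

    Returns None when the path is not a UNC path.
    """
    parsed = PureWindowsPath(path)
    drive = parsed.drive  # for UNC paths this is "\\\\server\\share"
    if not drive.startswith("\\\\"):
        return None
    server, _, share = drive.lstrip("\\").partition("\\")
    return {
        "server": server,
        "share": share,
        # parts[0] is the anchor (drive plus root); the rest is the path inside the share.
        "relative_path": "/".join(parsed.parts[1:]),
    }


print(split_unc_path(r"\\fileserver\share\docs\a.txt"))
```

PureWindowsPath performs no filesystem access, so the sketch behaves the same on any operating system.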
- haive.core.engine.document.path_analysis.analyze_path_comprehensive(path)
Analyze a path comprehensively.
This function analyzes a path to determine its type, properties, and metadata. It handles various path types including local files, URLs, database URIs, and cloud storage paths.
- Parameters:
path (str | pathlib.Path) – Path to analyze (string or Path object)
- Returns:
PathAnalysisResult object with comprehensive information about the path
- Return type:
PathAnalysisResult
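A comprehensive analyzer of this kind typically starts by deciding which family a path belongs to, then hands off to a specialized routine. The sketch below shows one plausible dispatch order using only the standard library. Everything in it is an assumption: the function name `classify_path`, the scheme sets, and the string labels are hypothetical and do not reflect the actual PathType members or the library's logic.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical scheme tables chosen for illustration only.
CLOUD_SCHEMES = {"s3", "gs", "az"}
DATABASE_SCHEMES = {"postgresql", "mysql", "sqlite", "mongodb"}
WEB_SCHEMES = {"http", "https", "ftp"}
# scp-style remote such as git@github.com:org/repo.git
SCP_STYLE = re.compile(r"^[\w.-]+@[\w.-]+:[\w./~-]+$")


def classify_path(path) -> str:
    """Return a coarse label for a path or URL (hypothetical helper)."""
    text = str(path)
    if text.startswith("\\\\"):
        return "network"  # UNC share, e.g. \\server\share\file
    if SCP_STYLE.match(text):
        return "special"
    # Strip a driver suffix such as "+psycopg2" before looking the scheme up.
    scheme = urlsplit(text).scheme.partition("+")[0]
    # A one-letter "scheme" is a Windows drive letter (C:\...), not a URL scheme.
    if len(scheme) > 1 and "://" in text:
        if scheme in CLOUD_SCHEMES:
            return "cloud"
        if scheme in DATABASE_SCHEMES:
            return "database"
        if scheme in WEB_SCHEMES:
            return "url"
        return "unknown"
    return "local"


for sample in ["/var/data/report.pdf", "https://example.com/a.pdf", Path("notes/todo.md")]:
    print(sample, "->", classify_path(sample))
```

Checking the most specific shapes first (UNC prefix, scp-style remote) avoids misreading them as URLs or local paths later in the chain.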
- haive.core.engine.document.path_analysis.analyze_special_path(path)
Analyze a special path (e.g., git SSH URL).
- Parameters:
path (str) – Special path to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
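A git SSH URL in scp-style form (user@host:path) is not a valid URL for urllib.parse, which is presumably why it is treated as a special case. The sketch below parses that form with a regular expression. The helper name `parse_scp_style` and the returned keys are assumptions; other special path forms the library may accept are not covered.

```python
import re
from pathlib import PurePosixPath

# user@host:path, e.g. git@github.com:org/repo.git
SCP_STYLE = re.compile(r"^(?P<user>[\w.-]+)@(?P<host>[\w.-]+):(?P<path>[\w./~-]+)$")


def parse_scp_style(path: str):
    """Parse an scp-style git remote (hypothetical helper); None if it does not match."""
    match = SCP_STYLE.match(path)
    if match is None:
        return None
    info = match.groupdict()
    # Derive the repository name from the last path segment, minus a ".git" suffix.
    name = PurePosixPath(info["path"]).name
    info["repo"] = name[:-4] if name.endswith(".git") else name
    return info


print(parse_scp_style("git@github.com:org/repo.git"))
```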
- haive.core.engine.document.path_analysis.analyze_url(url)
Analyze a URL.
- Parameters:
url (str) – URL to analyze
- Returns:
PathAnalysisResult object
- Return type:
PathAnalysisResult
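As a rough picture of what URL analysis has to extract, the sketch below decomposes a URL with urllib.parse. The helper name `split_url` and its dict keys are assumptions for illustration; they are not the fields of URLComponents or PathAnalysisResult, which this page does not list.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url: str) -> dict:
    """Decompose a URL into its components (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # lower-cased, without port or credentials
        "port": parts.port,  # None when the URL has no explicit port
        "path": parts.path,
        "query": parse_qs(parts.query),  # each key maps to a list of values
        "fragment": parts.fragment,
    }


print(split_url("https://docs.example.com:8443/guide/intro.html?lang=en&v=2#setup"))
```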
- haive.core.engine.document.path_analysis.detect_encoding(file_path)
Detect the encoding of a text file.
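The page does not say how detect_encoding works or what it returns. One common standard-library approach, shown here only as a sketch, is to look for a byte order mark and otherwise test whether a sample decodes as UTF-8. The names `sniff_encoding` and `guess_file_encoding`, the sample size, and the latin-1 fallback are all assumptions.

```python
import codecs

# UTF-32 LE must be tested before UTF-16 LE because their BOMs share a prefix.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(sample: bytes) -> str:
    """Guess an encoding from a byte sample (hypothetical helper)."""
    for bom, name in BOMS:
        if sample.startswith(bom):
            return name
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # assumed fallback: decodes any byte sequence


def guess_file_encoding(file_path, sample_size: int = 65536) -> str:
    """Read the start of a file and guess its encoding."""
    with open(file_path, "rb") as handle:
        return sniff_encoding(handle.read(sample_size))
```

A heuristic like this cannot tell single-byte encodings apart; a statistical detector would be needed for that.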
- haive.core.engine.document.path_analysis.detect_mime_type(file_path)
Detect the MIME type of a file.
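Whether detect_mime_type inspects file contents or only the file name is not stated here. The simplest extension-based approach can be sketched with the standard mimetypes module; the helper name `guess_mime` and the application/octet-stream fallback are assumptions.

```python
import mimetypes


def guess_mime(file_path) -> str:
    """Guess a MIME type from the file name alone (hypothetical helper)."""
    mime, _encoding = mimetypes.guess_type(str(file_path))
    # Fall back to the generic binary type when the extension is unknown.
    return mime or "application/octet-stream"


print(guess_mime("report.pdf"))
```

Extension-based guessing never opens the file, so it also works for paths that do not exist yet, but a renamed file will be misreported.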
- haive.core.engine.document.path_analysis.extract_domain_info(url_components)
Extract domain information from URL components.
- Parameters:
url_components (URLComponents) – URLComponents object
- Returns:
DomainInfo object
- Return type:
DomainInfo