mcp.utils.extract_mcp_github_repos

Enhanced MCP Repository Extractor with README Processing.

This script:

1. Extracts repository URLs from awesome-mcp-servers.
2. Downloads and processes README files.
3. Converts them to LangChain Documents with metadata.
4. Organizes the resources for agent access.
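Step 1 can be approached with a regular expression over the list's Markdown links. The sketch below is an assumption about how such extraction might work, not the module's actual parser. The function name `extract_github_repos`, the regex, and the sample README lines are all hypothetical.

```python
import re

# Hypothetical pattern: https://github.com/<owner>/<repo>. The repo part stops
# at characters that cannot appear in a repository name (")", "/", "#", whitespace).
GITHUB_REPO_RE = re.compile(r"https?://github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)")


def extract_github_repos(markdown: str) -> list[tuple[str, str]]:
    """Return unique (owner, repo) pairs in order of first appearance."""
    seen: set[tuple[str, str]] = set()
    repos: list[tuple[str, str]] = []
    for owner, repo in GITHUB_REPO_RE.findall(markdown):
        repo = repo.removesuffix(".git")  # tolerate clone-style URLs
        key = (owner.lower(), repo.lower())  # GitHub names are case-insensitive
        if key not in seen:
            seen.add(key)
            repos.append((owner, repo))
    return repos


sample = """
- [example/server-a](https://github.com/example/server-a) - A database server.
- [Other/server_b.py](https://github.com/Other/server_b.py) - Search tools.
- Duplicate: https://github.com/example/server-a
"""
print(extract_github_repos(sample))  # [('example', 'server-a'), ('Other', 'server_b.py')]
```

A pattern this loose would also match non-repository paths such as `github.com/topics/...`, so a real extractor would likely need an exclusion list.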

Attributes

console

Classes

`ExtractionStats`	Statistics for extraction process.
`MCPCategory`	MCP Server Categories.
`MCPLanguage`	Programming Languages.
`MCPPlatform`	Supported Platforms.
`MCPRepositoryExtractor`	Enhanced MCP Repository Extractor.
`MCPScope`	Server Scope.
`MCPServerDocument`	Complete MCP Server Document.
`MCPServerMetadata`	Metadata for an MCP Server.

Functions

`create_agent_loader`(output_dir) → callable	Create a loader function for agents to access MCP documents.
`main`()	Main function.

Module Contents

class mcp.utils.extract_mcp_github_repos.ExtractionStats(/, **data: Any)

Bases: pydantic.BaseModel

Statistics for extraction process.

categories: dict[str, int] = None

extraction_duration: float | None = None

failed_extractions: int = 0

languages: dict[str, int] = None

successfully_extracted: int = 0

total_found: int = 0
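The fields suggest per-category and per-language counters alongside success and failure totals. The sketch below uses a stdlib dataclass as a stand-in for the pydantic model; the `record` method and `success_rate` property are hypothetical helpers, not part of the documented class. `Counter` is a `dict` subclass, so it fits the documented `dict[str, int]` type.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StatsSketch:
    """Stdlib stand-in mirroring ExtractionStats' fields (helpers are hypothetical)."""

    total_found: int = 0
    successfully_extracted: int = 0
    failed_extractions: int = 0
    categories: Counter = field(default_factory=Counter)
    languages: Counter = field(default_factory=Counter)
    extraction_duration: float | None = None

    def record(self, ok: bool, category: str, languages: list[str]) -> None:
        # Tally category and languages only for successful extractions.
        if ok:
            self.successfully_extracted += 1
            self.categories[category] += 1
            self.languages.update(languages)
        else:
            self.failed_extractions += 1

    @property
    def success_rate(self) -> float:
        # Guard against division by zero when nothing was found.
        return self.successfully_extracted / self.total_found if self.total_found else 0.0


stats = StatsSketch(total_found=3)
stats.record(True, "Databases", ["Python"])
stats.record(True, "Databases", ["TypeScript/JavaScript", "Python"])
stats.record(False, "Search & Web", [])
print(stats.success_rate, dict(stats.categories), dict(stats.languages))
```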

class mcp.utils.extract_mcp_github_repos.MCPCategory

Bases: str, enum.Enum

MCP Server Categories.

AGGREGATORS = 'Aggregators'

AI_SERVICES = 'AI Services'

ART_LITERATURE = 'Art & Literature'

CLOUD_PLATFORMS = 'Cloud Platforms'

CLOUD_STORAGE = 'Cloud Storage'

COMMUNICATION = 'Communication'

DATABASES = 'Databases'

DATA_VISUALIZATION = 'Data Visualization'

DEVELOPMENT_TOOLS = 'Development Tools'

FILE_SYSTEMS = 'File Systems'

FINANCE = 'Finance'

GAMING = 'Gaming'

IDENTITY = 'Identity'

IOT = 'IoT'

LANGUAGE_TRANSLATION = 'Language & Translation'

LOCATION_SERVICES = 'Location Services'

MARKETING = 'Marketing'

MONITORING = 'Monitoring'

NOTE_TAKING = 'Note Taking'

OTHER = 'Other'

RESEARCH_DATA = 'Research & Data'

SANDBOX_VIRTUALIZATION = 'Sandbox & Virtualization'

SEARCH_WEB = 'Search & Web'

SECURITY = 'Security'

SOCIAL_MEDIA = 'Social Media'

SYSTEM_AUTOMATION = 'System Automation'

VERSION_CONTROL = 'Version Control'

WORKFLOW_AUTOMATION = 'Workflow Automation'
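Because the category enums mix in `str`, members can be looked up by their display value and compare equal to plain strings. The sketch below uses a trimmed copy of `MCPCategory` (only three members) and a hypothetical `parse_category` helper that falls back to `OTHER` for unknown headings; whether the extractor itself falls back this way is an assumption.

```python
from enum import Enum


class MCPCategory(str, Enum):
    """Trimmed copy of the documented enum (only a few members shown)."""

    DATABASES = "Databases"
    SEARCH_WEB = "Search & Web"
    OTHER = "Other"


def parse_category(heading: str) -> MCPCategory:
    """Map a section heading to a category, falling back to OTHER (hypothetical helper)."""
    try:
        # Calling the enum class looks a member up by *value*, not by name.
        return MCPCategory(heading.strip())
    except ValueError:
        return MCPCategory.OTHER


print(parse_category("Search & Web").value)     # Search & Web
print(parse_category("Unknown Section").value)  # Other
print(MCPCategory.DATABASES == "Databases")     # True: the str mixin compares by value
```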

class mcp.utils.extract_mcp_github_repos.MCPLanguage

Bases: str, enum.Enum

Programming Languages.

CSHARP = 'C#'

C_CPP = 'C/C++'

GO = 'Go'

JAVA = 'Java'

OTHER = 'Other'

PYTHON = 'Python'

RUST = 'Rust'

TYPESCRIPT_JAVASCRIPT = 'TypeScript/JavaScript'

class mcp.utils.extract_mcp_github_repos.MCPPlatform

Bases: str, enum.Enum

Supported Platforms.

CROSS_PLATFORM = 'Cross-Platform'

LINUX = 'Linux'

MACOS = 'macOS'

WINDOWS = 'Windows'

class mcp.utils.extract_mcp_github_repos.MCPRepositoryExtractor(output_dir: str = 'agent_resources/mcp_servers')

Enhanced MCP Repository Extractor.

async extract_all() → list[MCPServerDocument]: Main extraction method. Returns the processed documents.

async extract_repositories_from_readme() → list[MCPServerMetadata]: Extract repository information from the awesome-mcp-servers README.

async fetch_github_metadata(metadata: MCPServerMetadata) → None: Fetch additional metadata from the GitHub API.

async fetch_readme_content(metadata: MCPServerMetadata) → str | None: Fetch README content from GitHub. Returns the text, or None.

generate_statistics_report(documents: list[MCPServerDocument]) → None: Generate a statistics report.

async process_repository(metadata: MCPServerMetadata) → MCPServerDocument | None: Process a single repository. Returns a document, or None.

save_documents(documents: list[MCPServerDocument]) → None: Save documents in various formats.
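Since `process_repository` is a coroutine returning `MCPServerDocument | None`, a method like `extract_all` could fan it out over all repositories with bounded concurrency and drop the `None` results. This is a sketch of that common asyncio pattern, not the documented implementation; `process_all`, `max_concurrency`, and `fake_process` are hypothetical, and the sketch assumes failures are reported as `None` rather than raised.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_all(
    items: list[T],
    process: Callable[[T], Awaitable[Optional[R]]],
    max_concurrency: int = 10,
) -> list[R]:
    """Run `process` over items with at most `max_concurrency` calls in flight.

    None results (failed repositories) are dropped; input order is preserved.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> Optional[R]:
        async with semaphore:  # limits simultaneous requests to GitHub
            return await process(item)

    # gather returns results in the same order as its arguments.
    results = await asyncio.gather(*(guarded(item) for item in items))
    return [r for r in results if r is not None]


async def fake_process(repo: str) -> Optional[str]:
    # Hypothetical stand-in for process_repository: "fails" for names starting with "x".
    await asyncio.sleep(0)
    return None if repo.startswith("x") else f"doc:{repo}"


docs = asyncio.run(process_all(["a/one", "x/broken", "b/two"], fake_process, max_concurrency=2))
print(docs)  # ['doc:a/one', 'doc:b/two']
```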

category_mappings

docs_dir

language_indicators

metadata_dir

output_dir

platform_indicators

raw_dir

scope_indicators

session = None

source_url = 'https://github.com/TensorBlock/awesome-mcp-servers'

stats
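The `language_indicators`, `platform_indicators`, and `scope_indicators` attributes suggest lookup tables that map markers in a README list entry to enum values. Their actual contents are not documented here, so the marker strings below are placeholders, and `detect` is a hypothetical helper showing one way such tables could be applied.

```python
from __future__ import annotations

# Hypothetical indicator tables: marker substring -> documented enum value.
LANGUAGE_INDICATORS: dict[str, str] = {
    "🐍": "Python",
    "📇": "TypeScript/JavaScript",
    "🏎️": "Go",
    "🦀": "Rust",
}
SCOPE_INDICATORS: dict[str, str] = {
    "☁️": "cloud",
    "🏠": "local",
}


def detect(line: str, indicators: dict[str, str], default: list[str] | None = None) -> list[str]:
    """Return the values whose marker occurs in `line`, in table order, without duplicates."""
    found: list[str] = []
    for marker, value in indicators.items():
        if marker in line and value not in found:
            found.append(value)
    # Fall back to the default (e.g. ["Other"]) when no marker matched.
    return found or list(default or [])


line = "- [acme/db-mcp](https://github.com/acme/db-mcp) 🐍 🏠 - Query a local database."
print(detect(line, LANGUAGE_INDICATORS, default=["Other"]))  # ['Python']
print(detect(line, SCOPE_INDICATORS))                          # ['local']
```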

class mcp.utils.extract_mcp_github_repos.MCPScope

Bases: str, enum.Enum

Server Scope.

CLOUD = 'cloud'

EMBEDDED = 'embedded'

LOCAL = 'local'

class mcp.utils.extract_mcp_github_repos.MCPServerDocument(/, **data: Any)

Bases: pydantic.BaseModel

Complete MCP Server Document.

compute_content_hash() → str: Compute the SHA-256 hash of the README content.

to_langchain_document() → langchain_core.documents.Document: Convert to a LangChain Document.

content_hash: str | None = None

extracted_at: datetime.datetime = None

metadata: MCPServerMetadata = None

model_config: Configuration for the model; a dictionary conforming to `pydantic.config.ConfigDict`.

readme_content: str | None = None
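`compute_content_hash` is documented as a SHA-256 hash of the README content. A minimal sketch with `hashlib` follows; treating a missing README (`None`) as empty text is an assumption, as is the idea that the hash is used to detect unchanged READMEs between runs.

```python
from __future__ import annotations

import hashlib


def compute_content_hash(readme_content: str | None) -> str:
    """Hex SHA-256 digest of README text (sketch: None is assumed to hash like "")."""
    data = (readme_content or "").encode("utf-8")
    return hashlib.sha256(data).hexdigest()


old = compute_content_hash("# Server\nv1")
new = compute_content_hash("# Server\nv2")
print(old == new)  # False: any change to the text changes the digest
```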

class mcp.utils.extract_mcp_github_repos.MCPServerMetadata(/, **data: Any)

Bases: pydantic.BaseModel

Metadata for an MCP Server.

get_unique_id() → str: Generate a unique ID for this server.

to_langchain_metadata() → dict[str, Any]: Convert to LangChain Document metadata format.

classmethod validate_repo_url(v: str) → str: Validate a GitHub repository URL.

api_base_url: str | None = None

category: MCPCategory = None

description: str | None = None

is_official: bool = None

languages: list[MCPLanguage] = None

last_updated: datetime.datetime | None = None

license: str | None = None

model_config: Configuration for the model; a dictionary conforming to `pydantic.config.ConfigDict`.

name: str = None

owner: str = None

platforms: list[MCPPlatform] = None

readme_url: str | None = None

repo_name: str = None

repo_url: str = None

scopes: list[MCPScope] = None

stars: int | None = None
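The three methods above can be sketched as plain functions. All rules here are assumptions: the accepted URL shape, the lowercase `owner/repo` ID scheme, and the flattening of list values (some vector stores accept only scalar metadata values). In a pydantic validator, a `ValueError` raised like this surfaces as a `ValidationError`.

```python
from __future__ import annotations

import re

# Hypothetical rule: exactly https://github.com/<owner>/<repo> after normalization.
REPO_URL_RE = re.compile(r"^https://github\.com/[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def validate_repo_url(v: str) -> str:
    """Normalize and validate a GitHub repository URL; raise ValueError if invalid."""
    v = v.strip().rstrip("/").removesuffix(".git")
    if not REPO_URL_RE.match(v):
        raise ValueError(f"not a GitHub repository URL: {v!r}")
    return v


def get_unique_id(owner: str, repo_name: str) -> str:
    """Case-insensitive 'owner/repo' ID (hypothetical scheme)."""
    return f"{owner}/{repo_name}".lower()


def to_flat_metadata(meta: dict[str, object]) -> dict[str, object]:
    """Join list values into comma-separated strings and drop None values."""
    flat: dict[str, object] = {}
    for key, value in meta.items():
        if value is None:
            continue
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        flat[key] = value
    return flat


print(validate_repo_url("https://github.com/acme/tool.git"))  # https://github.com/acme/tool
print(get_unique_id("Acme", "DB-MCP"))                        # acme/db-mcp
```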

mcp.utils.extract_mcp_github_repos.create_agent_loader(output_dir: str = 'agent_resources/mcp_servers') → callable: Create a loader function for agents to access MCP documents.
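A factory that returns a `callable` is typically a closure over `output_dir`. The sketch below assumes, hypothetically, one JSON object per `*.json` file directly under the directory, each with a `"metadata"` mapping; the real on-disk layout written by `save_documents` is not documented here. The name `create_agent_loader_sketch` marks it as illustrative.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def create_agent_loader_sketch(output_dir: str) -> Callable[..., list[dict]]:
    """Return a loader that reads saved JSON documents on first use and caches them."""
    cache: list[dict] | None = None

    def load(**filters: str) -> list[dict]:
        nonlocal cache
        if cache is None:  # hit the disk only on the first call
            cache = [
                json.loads(path.read_text(encoding="utf-8"))
                for path in sorted(Path(output_dir).glob("*.json"))
            ]
        # Keep documents whose metadata matches every keyword filter exactly.
        return [
            doc for doc in cache
            if all(doc.get("metadata", {}).get(k) == v for k, v in filters.items())
        ]

    return load


with tempfile.TemporaryDirectory() as tmp:
    for i, cat in enumerate(["Databases", "Search & Web"]):
        doc = {"page_content": f"README {i}", "metadata": {"category": cat}}
        Path(tmp, f"doc{i}.json").write_text(json.dumps(doc), encoding="utf-8")
    load = create_agent_loader_sketch(tmp)
    print(len(load()), [d["page_content"] for d in load(category="Databases")])  # 2 ['README 0']
```

Caching inside the closure means an agent can call the loader repeatedly without rereading files, at the cost of not seeing documents written after the first call.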

async mcp.utils.extract_mcp_github_repos.main(): Async entry point of the script.
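Because `main` is a coroutine, calling it directly only creates a coroutine object; a script entry point has to drive it with an event loop. A minimal sketch, using a hypothetical stand-in for `main` (the real one presumably runs the extractor) and a hypothetical synchronous `run` wrapper:

```python
import asyncio


async def main() -> int:
    # Hypothetical stand-in for the module's async main().
    await asyncio.sleep(0)
    return 0


def run() -> int:
    """asyncio.run creates an event loop, runs main() to completion, then closes the loop."""
    return asyncio.run(main())


if __name__ == "__main__":
    run()
```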

mcp.utils.extract_mcp_github_repos.console