mcp.utils.extract_mcp_github_repos

Enhanced MCP Repository Extractor with README Processing.

This script:

1. Extracts repository URLs from awesome-mcp-servers
2. Downloads and processes README files
3. Converts them to LangChain Documents with metadata
4. Organizes resources for agent access
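The first step above (pulling repository URLs out of a README) could be sketched roughly as follows. This is a minimal illustration only: the regular expression, the `find_repo_urls` helper, and the sample Markdown are assumptions made for the example, not the module's actual parsing logic.

```python
import re

# Assumed pattern: https://github.com/<owner>/<repo>, with conservative name characters.
GITHUB_REPO_RE = re.compile(
    r"https://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)


def find_repo_urls(markdown: str) -> list[str]:
    """Return unique GitHub repository URLs in order of first appearance."""
    seen: set[str] = set()
    urls: list[str] = []
    for owner, repo in GITHUB_REPO_RE.findall(markdown):
        # Normalize clone-style links such as .../repo.git (Python 3.9+).
        repo = repo.removesuffix(".git")
        url = f"https://github.com/{owner}/{repo}"
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical README excerpt used only for demonstration.
sample = """
- [acme/files-mcp](https://github.com/acme/files-mcp): file access
- [acme/files-mcp](https://github.com/acme/files-mcp) duplicate entry
- [example/db-server](https://github.com/example/db-server.git)
"""
repo_urls = find_repo_urls(sample)
```

Deduplicating while preserving order keeps the result stable across runs, which is convenient if the URLs are later used as keys for saved documents.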

Attributes

Classes

ExtractionStats

Statistics for extraction process.

MCPCategory

MCP Server Categories.

MCPLanguage

Programming Languages.

MCPPlatform

Supported Platforms.

MCPRepositoryExtractor

Enhanced MCP Repository Extractor.

MCPScope

Server Scope.

MCPServerDocument

Complete MCP Server Document.

MCPServerMetadata

Metadata for an MCP Server.

Functions

create_agent_loader(→ callable)

Create a loader function for agents to access MCP documents.

main()

Main function.

Module Contents

class mcp.utils.extract_mcp_github_repos.ExtractionStats(/, **data: Any)

Bases: pydantic.BaseModel

Statistics for extraction process.

categories: dict[str, int] = None
extraction_duration: float | None = None
failed_extractions: int = 0
languages: dict[str, int] = None
successfully_extracted: int = 0
total_found: int = 0
class mcp.utils.extract_mcp_github_repos.MCPCategory

Bases: str, enum.Enum

MCP Server Categories.

AGGREGATORS = 'Aggregators'
AI_SERVICES = 'AI Services'
ART_LITERATURE = 'Art & Literature'
CLOUD_PLATFORMS = 'Cloud Platforms'
CLOUD_STORAGE = 'Cloud Storage'
COMMUNICATION = 'Communication'
DATABASES = 'Databases'
DATA_VISUALIZATION = 'Data Visualization'
DEVELOPMENT_TOOLS = 'Development Tools'
FILE_SYSTEMS = 'File Systems'
FINANCE = 'Finance'
GAMING = 'Gaming'
IDENTITY = 'Identity'
IOT = 'IoT'
LANGUAGE_TRANSLATION = 'Language & Translation'
LOCATION_SERVICES = 'Location Services'
MARKETING = 'Marketing'
MONITORING = 'Monitoring'
NOTE_TAKING = 'Note Taking'
OTHER = 'Other'
RESEARCH_DATA = 'Research & Data'
SANDBOX_VIRTUALIZATION = 'Sandbox & Virtualization'
SEARCH_WEB = 'Search & Web'
SECURITY = 'Security'
SOCIAL_MEDIA = 'Social Media'
SYSTEM_AUTOMATION = 'System Automation'
VERSION_CONTROL = 'Version Control'
WORKFLOW_AUTOMATION = 'Workflow Automation'
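
Because the enum derives from both str and enum.Enum, its members compare equal to their string values and can be looked up by value. The sketch below uses a hypothetical three-member stand-in named `Category` rather than the real MCPCategory, and the `parse_category` helper with its fallback to OTHER is an assumption for illustration, not documented behaviour of this module.

```python
from enum import Enum


class Category(str, Enum):
    """Hypothetical stand-in mirroring three of the MCPCategory members."""

    DATABASES = "Databases"
    SEARCH_WEB = "Search & Web"
    OTHER = "Other"


def parse_category(heading: str) -> Category:
    """Look up a member by its value; unknown headings map to OTHER (assumed policy)."""
    try:
        return Category(heading.strip())
    except ValueError:
        return Category.OTHER


known = parse_category(" Databases ")
unknown = parse_category("Quantum Toasters")
# The str mixin makes a member compare equal to its plain string value.
is_plain_string_equal = Category.SEARCH_WEB == "Search & Web"
```

The str mixin is what lets such members be stored directly in string-keyed structures, for example the dict[str, int] tallies in ExtractionStats.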
class mcp.utils.extract_mcp_github_repos.MCPLanguage

Bases: str, enum.Enum

Programming Languages.

CSHARP = 'C#'
C_CPP = 'C/C++'
GO = 'Go'
JAVA = 'Java'
OTHER = 'Other'
PYTHON = 'Python'
RUST = 'Rust'
TYPESCRIPT_JAVASCRIPT = 'TypeScript/JavaScript'
class mcp.utils.extract_mcp_github_repos.MCPPlatform

Bases: str, enum.Enum

Supported Platforms.

CROSS_PLATFORM = 'Cross-Platform'
LINUX = 'Linux'
MACOS = 'macOS'
WINDOWS = 'Windows'
class mcp.utils.extract_mcp_github_repos.MCPRepositoryExtractor(output_dir: str = 'agent_resources/mcp_servers')

Enhanced MCP Repository Extractor.

async extract_all() → list[MCPServerDocument]

Main extraction method.

async extract_repositories_from_readme() → list[MCPServerMetadata]

Extract repository information from the awesome-mcp-servers README.

async fetch_github_metadata(metadata: MCPServerMetadata) → None

Fetch additional metadata from GitHub API.

async fetch_readme_content(metadata: MCPServerMetadata) → str | None

Fetch README content from GitHub.

generate_statistics_report(documents: list[MCPServerDocument]) → None

Generate statistics report.

async process_repository(metadata: MCPServerMetadata) → MCPServerDocument | None

Process a single repository.

save_documents(documents: list[MCPServerDocument]) → None

Save documents in various formats.

category_mappings
docs_dir
language_indicators
metadata_dir
output_dir
platform_indicators
raw_dir
scope_indicators
session = None
source_url = 'https://github.com/TensorBlock/awesome-mcp-servers'
stats
class mcp.utils.extract_mcp_github_repos.MCPScope

Bases: str, enum.Enum

Server Scope.

CLOUD = 'cloud'
EMBEDDED = 'embedded'
LOCAL = 'local'
class mcp.utils.extract_mcp_github_repos.MCPServerDocument(/, **data: Any)

Bases: pydantic.BaseModel

Complete MCP Server Document.

compute_content_hash() → str

Compute SHA256 hash of README content.
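
A SHA-256 digest of README text can be computed with the standard library as sketched below. The free function `content_hash`, the UTF-8 encoding, and the handling of a missing README are assumptions for illustration; the actual method may differ in these details.

```python
import hashlib
from typing import Optional


def content_hash(readme_content: Optional[str]) -> Optional[str]:
    """Return the hex SHA-256 digest of the text, or None when there is no README."""
    if readme_content is None:
        return None
    # Hash the UTF-8 bytes so the digest does not depend on platform defaults.
    return hashlib.sha256(readme_content.encode("utf-8")).hexdigest()


digest_a = content_hash("# My MCP server\n")
digest_b = content_hash("# My MCP server\n")
digest_c = content_hash("# My MCP server (edited)\n")
```

Identical content always yields the same digest, so comparing a stored content_hash with a freshly computed one is a cheap way to detect whether a README changed between extraction runs.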

to_langchain_document() → langchain_core.documents.Document

Convert to LangChain Document.

content_hash: str | None = None
extracted_at: datetime.datetime = None
metadata: MCPServerMetadata = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

readme_content: str | None = None
class mcp.utils.extract_mcp_github_repos.MCPServerMetadata(/, **data: Any)

Bases: pydantic.BaseModel

Metadata for an MCP Server.

get_unique_id() → str

Generate unique ID for this server.

to_langchain_metadata() → dict[str, Any]

Convert to LangChain Document metadata format.

classmethod validate_repo_url(v: str) → str

Validate GitHub repository URL.
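
One plausible shape for such a check is sketched below as a plain function. The accepted URL pattern, the normalization of trailing slashes and `.git` suffixes, and the error message are all assumptions; the real validator is a classmethod on a pydantic model and may apply different rules.

```python
import re

# Assumed rule: exactly https://github.com/<owner>/<repo> with conservative name characters.
_REPO_URL_RE = re.compile(r"^https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def validate_repo_url(v: str) -> str:
    """Return a normalized repository URL or raise ValueError."""
    cleaned = v.strip().rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    if not _REPO_URL_RE.match(cleaned):
        raise ValueError(f"not a GitHub repository URL: {v!r}")
    return cleaned


normalized = validate_repo_url(" https://github.com/acme/files-mcp/ ")
```

Raising ValueError matches how pydantic validators conventionally signal invalid input, so a function like this could be adapted into a field validator with little change.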

api_base_url: str | None = None
category: MCPCategory = None
description: str | None = None
is_official: bool = None
languages: list[MCPLanguage] = None
last_updated: datetime.datetime | None = None
license: str | None = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str = None
owner: str = None
platforms: list[MCPPlatform] = None
readme_url: str | None = None
repo_name: str = None
repo_url: str = None
scopes: list[MCPScope] = None
stars: int | None = None
mcp.utils.extract_mcp_github_repos.create_agent_loader(output_dir: str = 'agent_resources/mcp_servers') → callable

Create a loader function for agents to access MCP documents.

async mcp.utils.extract_mcp_github_repos.main()

Main function.

mcp.utils.extract_mcp_github_repos.console