mcp.utils.extract_mcp_github_repos
==================================

.. py:module:: mcp.utils.extract_mcp_github_repos

.. autoapi-nested-parse::

   Enhanced MCP Repository Extractor with README Processing.

   This script:

   1. Extracts repository URLs from awesome-mcp-servers
   2. Downloads and processes README files
   3. Converts to LangChain Documents with metadata
   4. Organizes resources for agent access

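The first step listed above, pulling repository URLs out of an awesome-list README, can be sketched with a regular expression. This is a minimal illustration under stated assumptions, not the module's actual parser: the pattern, the ``extract_repo_urls`` helper, and the sample text are all hypothetical.

.. code-block:: python

   import re

   # Hypothetical pattern: matches https://github.com/<owner>/<repo>.
   GITHUB_REPO_RE = re.compile(
       r"https://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
   )


   def extract_repo_urls(markdown: str) -> list[str]:
       """Return unique repository URLs in order of first appearance."""
       seen: dict[str, None] = {}
       for owner, repo in GITHUB_REPO_RE.findall(markdown):
           repo = repo.removesuffix(".git")  # treat clone URLs like web URLs
           seen.setdefault(f"https://github.com/{owner}/{repo}", None)
       return list(seen)


   # Invented sample input; the third link points inside the first repository.
   sample = """
   - [alpha](https://github.com/example-org/alpha-server) - first entry
   - [beta](https://github.com/example-org/beta.git) - second entry
   - [alpha again](https://github.com/example-org/alpha-server/tree/main)
   """
   urls = extract_repo_urls(sample)

Deep links and ``.git`` suffixes collapse to the same repository URL, so ``urls`` holds two entries for the three links in the sample.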

Attributes
----------

.. autoapisummary::

   mcp.utils.extract_mcp_github_repos.console


Classes
-------

.. autoapisummary::

   mcp.utils.extract_mcp_github_repos.ExtractionStats
   mcp.utils.extract_mcp_github_repos.MCPCategory
   mcp.utils.extract_mcp_github_repos.MCPLanguage
   mcp.utils.extract_mcp_github_repos.MCPPlatform
   mcp.utils.extract_mcp_github_repos.MCPRepositoryExtractor
   mcp.utils.extract_mcp_github_repos.MCPScope
   mcp.utils.extract_mcp_github_repos.MCPServerDocument
   mcp.utils.extract_mcp_github_repos.MCPServerMetadata


Functions
---------

.. autoapisummary::

   mcp.utils.extract_mcp_github_repos.create_agent_loader
   mcp.utils.extract_mcp_github_repos.main


Module Contents
---------------

.. py:class:: ExtractionStats(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Statistics for extraction process.


   .. py:attribute:: categories
      :type:  dict[str, int]
      :value: None


   .. py:attribute:: extraction_duration
      :type:  float | None
      :value: None


   .. py:attribute:: failed_extractions
      :type:  int
      :value: 0


   .. py:attribute:: languages
      :type:  dict[str, int]
      :value: None


   .. py:attribute:: successfully_extracted
      :type:  int
      :value: 0


   .. py:attribute:: total_found
      :type:  int
      :value: 0

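How these counters get populated is not documented on this page. As a rough sketch, assuming one increment per processed repository, the tallies could be built as follows. ``StatsSketch`` is a plain dataclass stand-in that reuses the field names above, not the pydantic model itself, and the ``results`` data is invented for illustration.

.. code-block:: python

   from collections import Counter
   from dataclasses import dataclass, field


   @dataclass
   class StatsSketch:
       """Stand-in mirroring the documented field names (not the real model)."""

       total_found: int = 0
       successfully_extracted: int = 0
       failed_extractions: int = 0
       categories: dict[str, int] = field(default_factory=dict)
       languages: dict[str, int] = field(default_factory=dict)


   # Invented results: (category, languages) on success, None on failure.
   results = [
       ("Databases", ["Python"]),
       None,
       ("Databases", ["Go", "Python"]),
   ]

   stats = StatsSketch(total_found=len(results))
   categories, languages = Counter(), Counter()
   for result in results:
       if result is None:
           stats.failed_extractions += 1
           continue
       stats.successfully_extracted += 1
       category, langs = result
       categories[category] += 1
       languages.update(langs)
   stats.categories = dict(categories)
   stats.languages = dict(languages)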

.. py:class:: MCPCategory

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`


   MCP Server Categories.


   .. py:attribute:: AGGREGATORS
      :value: 'Aggregators'


   .. py:attribute:: AI_SERVICES
      :value: 'AI Services'


   .. py:attribute:: ART_LITERATURE
      :value: 'Art & Literature'


   .. py:attribute:: CLOUD_PLATFORMS
      :value: 'Cloud Platforms'


   .. py:attribute:: CLOUD_STORAGE
      :value: 'Cloud Storage'


   .. py:attribute:: COMMUNICATION
      :value: 'Communication'


   .. py:attribute:: DATABASES
      :value: 'Databases'


   .. py:attribute:: DATA_VISUALIZATION
      :value: 'Data Visualization'


   .. py:attribute:: DEVELOPMENT_TOOLS
      :value: 'Development Tools'


   .. py:attribute:: FILE_SYSTEMS
      :value: 'File Systems'


   .. py:attribute:: FINANCE
      :value: 'Finance'


   .. py:attribute:: GAMING
      :value: 'Gaming'


   .. py:attribute:: IDENTITY
      :value: 'Identity'


   .. py:attribute:: IOT
      :value: 'IoT'


   .. py:attribute:: LANGUAGE_TRANSLATION
      :value: 'Language & Translation'


   .. py:attribute:: LOCATION_SERVICES
      :value: 'Location Services'


   .. py:attribute:: MARKETING
      :value: 'Marketing'


   .. py:attribute:: MONITORING
      :value: 'Monitoring'


   .. py:attribute:: NOTE_TAKING
      :value: 'Note Taking'


   .. py:attribute:: OTHER
      :value: 'Other'


   .. py:attribute:: RESEARCH_DATA
      :value: 'Research & Data'


   .. py:attribute:: SANDBOX_VIRTUALIZATION
      :value: 'Sandbox & Virtualization'


   .. py:attribute:: SEARCH_WEB
      :value: 'Search & Web'


   .. py:attribute:: SECURITY
      :value: 'Security'


   .. py:attribute:: SOCIAL_MEDIA
      :value: 'Social Media'


   .. py:attribute:: SYSTEM_AUTOMATION
      :value: 'System Automation'


   .. py:attribute:: VERSION_CONTROL
      :value: 'Version Control'


   .. py:attribute:: WORKFLOW_AUTOMATION
      :value: 'Workflow Automation'

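Because ``MCPCategory`` derives from both ``str`` and ``enum.Enum``, its members compare equal to their string values, can be looked up by value, and serialize to JSON as plain strings. The sketch below reproduces three of the members above in a local ``CategorySketch`` enum, a hypothetical stand-in so that the example runs without the package installed. The fallback to ``OTHER`` in ``parse_category`` is an assumed policy, not documented behavior.

.. code-block:: python

   import json
   from enum import Enum


   class CategorySketch(str, Enum):
       """Local stand-in with three of the documented members."""

       DATABASES = "Databases"
       SEARCH_WEB = "Search & Web"
       OTHER = "Other"


   def parse_category(label: str) -> CategorySketch:
       """Look a member up by value, falling back to OTHER (assumed policy)."""
       try:
           return CategorySketch(label)
       except ValueError:
           return CategorySketch.OTHER


   by_value = parse_category("Search & Web")
   fallback = parse_category("Unknown Heading")
   # str-based members are encoded as ordinary JSON strings.
   as_json = json.dumps({"category": CategorySketch.DATABASES})

The same behavior applies to ``MCPLanguage``, ``MCPPlatform``, and ``MCPScope``, which share the ``str`` and ``enum.Enum`` bases.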

.. py:class:: MCPLanguage

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`


   Programming Languages.


   .. py:attribute:: CSHARP
      :value: 'C#'


   .. py:attribute:: C_CPP
      :value: 'C/C++'


   .. py:attribute:: GO
      :value: 'Go'


   .. py:attribute:: JAVA
      :value: 'Java'


   .. py:attribute:: OTHER
      :value: 'Other'


   .. py:attribute:: PYTHON
      :value: 'Python'


   .. py:attribute:: RUST
      :value: 'Rust'


   .. py:attribute:: TYPESCRIPT_JAVASCRIPT
      :value: 'TypeScript/JavaScript'


.. py:class:: MCPPlatform

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`


   Supported Platforms.


   .. py:attribute:: CROSS_PLATFORM
      :value: 'Cross-Platform'


   .. py:attribute:: LINUX
      :value: 'Linux'


   .. py:attribute:: MACOS
      :value: 'macOS'


   .. py:attribute:: WINDOWS
      :value: 'Windows'


.. py:class:: MCPRepositoryExtractor(output_dir: str = 'agent_resources/mcp_servers')

   Enhanced MCP Repository Extractor.


   .. py:method:: extract_all() -> list[MCPServerDocument]
      :async:


      Main extraction entry point. Returns the extracted servers as a list of :py:obj:`MCPServerDocument`.


   .. py:method:: extract_repositories_from_readme() -> list[MCPServerMetadata]
      :async:


      Extract repository information from the awesome-mcp-servers README.


   .. py:method:: fetch_github_metadata(metadata: MCPServerMetadata) -> None
      :async:


      Fetch additional metadata from GitHub API.


   .. py:method:: fetch_readme_content(metadata: MCPServerMetadata) -> str | None
      :async:


      Fetch README content from GitHub.


   .. py:method:: generate_statistics_report(documents: list[MCPServerDocument]) -> None

      Generate statistics report.


   .. py:method:: process_repository(metadata: MCPServerMetadata) -> MCPServerDocument | None
      :async:


      Process a single repository.


   .. py:method:: save_documents(documents: list[MCPServerDocument]) -> None

      Save documents in various formats.


   .. py:attribute:: category_mappings


   .. py:attribute:: docs_dir


   .. py:attribute:: language_indicators


   .. py:attribute:: metadata_dir


   .. py:attribute:: output_dir


   .. py:attribute:: platform_indicators


   .. py:attribute:: raw_dir


   .. py:attribute:: scope_indicators


   .. py:attribute:: session
      :value: None


   .. py:attribute:: source_url
      :value: 'https://github.com/TensorBlock/awesome-mcp-servers'


   .. py:attribute:: stats

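Most ``MCPRepositoryExtractor`` methods are coroutines, so they have to be awaited from an event loop. The following sketch shows the calling pattern only. ``ExtractorSketch`` is a hypothetical stand-in that returns canned strings instead of contacting GitHub, and running the per-repository work concurrently with ``asyncio.gather`` is an assumption about how such a pipeline might be organized, not a description of the real class.

.. code-block:: python

   import asyncio
   from typing import Optional


   class ExtractorSketch:
       """Stand-in with coroutine methods named after the documented ones."""

       def __init__(self, output_dir: str = "agent_resources/mcp_servers") -> None:
           self.output_dir = output_dir

       async def process_repository(self, name: str) -> Optional[str]:
           await asyncio.sleep(0)  # placeholder for network I/O
           # Return None to signal a failed repository.
           return None if name.startswith("broken") else f"doc:{name}"

       async def extract_all(self) -> list[str]:
           names = ["alpha", "broken-repo", "beta"]  # invented input
           # Run the per-repository coroutines concurrently, then drop failures.
           results = await asyncio.gather(
               *(self.process_repository(n) for n in names)
           )
           return [doc for doc in results if doc is not None]


   # asyncio.run drives the coroutine from synchronous code.
   documents = asyncio.run(ExtractorSketch().extract_all())

With the real class, the equivalent call would presumably be ``asyncio.run(MCPRepositoryExtractor().extract_all())``, assuming no additional setup is required.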

.. py:class:: MCPScope

   Bases: :py:obj:`str`, :py:obj:`enum.Enum`


   Server Scope.


   .. py:attribute:: CLOUD
      :value: 'cloud'


   .. py:attribute:: EMBEDDED
      :value: 'embedded'


   .. py:attribute:: LOCAL
      :value: 'local'


.. py:class:: MCPServerDocument(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Complete MCP Server Document.


   .. py:method:: compute_content_hash() -> str

      Compute SHA256 hash of README content.


   .. py:method:: to_langchain_document() -> langchain_core.documents.Document

      Convert to LangChain Document.


   .. py:attribute:: content_hash
      :type:  str | None
      :value: None


   .. py:attribute:: extracted_at
      :type:  datetime.datetime
      :value: None


   .. py:attribute:: metadata
      :type:  MCPServerMetadata
      :value: None


   .. py:attribute:: model_config

      Configuration for the model, should be a dictionary conforming to :py:obj:`pydantic.config.ConfigDict`.


   .. py:attribute:: readme_content
      :type:  str | None
      :value: None

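A content hash such as the one ``compute_content_hash`` returns is typically used to detect whether a README changed between runs. The sketch below is a standalone function rather than the method itself. The UTF-8 encoding, the hexadecimal digest format, and the treatment of missing content as an empty string are assumptions.

.. code-block:: python

   import hashlib
   from typing import Optional


   def compute_content_hash(readme_content: Optional[str]) -> str:
       """Hex SHA-256 digest of the README text (empty text if missing)."""
       text = readme_content or ""
       return hashlib.sha256(text.encode("utf-8")).hexdigest()


   first = compute_content_hash("# Example\n")
   second = compute_content_hash("# Example\n")
   changed = compute_content_hash("# Example (edited)\n")

Identical content always yields the same 64-character digest, so comparing a stored ``content_hash`` with a freshly computed one is enough to decide whether a document needs reprocessing.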

.. py:class:: MCPServerMetadata(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Metadata for an MCP Server.


   .. py:method:: get_unique_id() -> str

      Generate unique ID for this server.


   .. py:method:: to_langchain_metadata() -> dict[str, Any]

      Convert to LangChain Document metadata format.


   .. py:method:: validate_repo_url(v: str) -> str
      :classmethod:


      Validate GitHub repository URL.


   .. py:attribute:: api_base_url
      :type:  str | None
      :value: None


   .. py:attribute:: category
      :type:  MCPCategory
      :value: None


   .. py:attribute:: description
      :type:  str | None
      :value: None


   .. py:attribute:: is_official
      :type:  bool
      :value: None


   .. py:attribute:: languages
      :type:  list[MCPLanguage]
      :value: None


   .. py:attribute:: last_updated
      :type:  datetime.datetime | None
      :value: None


   .. py:attribute:: license
      :type:  str | None
      :value: None


   .. py:attribute:: model_config

      Configuration for the model, should be a dictionary conforming to :py:obj:`pydantic.config.ConfigDict`.


   .. py:attribute:: name
      :type:  str
      :value: None


   .. py:attribute:: owner
      :type:  str
      :value: None


   .. py:attribute:: platforms
      :type:  list[MCPPlatform]
      :value: None


   .. py:attribute:: readme_url
      :type:  str | None
      :value: None


   .. py:attribute:: repo_name
      :type:  str
      :value: None


   .. py:attribute:: repo_url
      :type:  str
      :value: None


   .. py:attribute:: scopes
      :type:  list[MCPScope]
      :value: None


   .. py:attribute:: stars
      :type:  int | None
      :value: None

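The rules that ``validate_repo_url`` enforces are not spelled out on this page. As an illustration of what a GitHub URL check can look like, the standalone function below accepts only ``https://github.com/<owner>/<repo>`` style URLs and normalizes them. The accepted scheme, the normalization, and the error message are assumptions. Raising ``ValueError`` is the usual way for a pydantic validator to signal a rejected value.

.. code-block:: python

   from urllib.parse import urlparse


   def validate_repo_url(url: str) -> str:
       """Return a normalized repository URL or raise ValueError."""
       parsed = urlparse(url.strip())
       parts = [p for p in parsed.path.split("/") if p]
       if (
           parsed.scheme != "https"
           or parsed.netloc.lower() != "github.com"
           or len(parts) < 2
       ):
           raise ValueError(f"not a GitHub repository URL: {url!r}")
       # Keep only <owner>/<repo>; drop deeper paths and a .git suffix.
       owner, repo = parts[0], parts[1].removesuffix(".git")
       return f"https://github.com/{owner}/{repo}"


   ok = validate_repo_url("https://github.com/TensorBlock/awesome-mcp-servers/")
   try:
       validate_repo_url("https://example.com/not/github")
       rejected = False
   except ValueError:
       rejected = True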

.. py:function:: create_agent_loader(output_dir: str = 'agent_resources/mcp_servers') -> callable

   Create a loader function for agents to access MCP documents.


.. py:function:: main()
   :async:


   Main function.


.. py:data:: console