About
MCPDocSearch crawls websites into Markdown, chunks and embeds the content, then serves semantic search tools over MCP for agents like Cursor.
Capabilities
Overview
The MCPDocSearch server transforms static website documentation into a searchable knowledge base that can be queried directly by AI assistants such as Claude or Cursor. By crawling a target site, converting its pages to Markdown, and embedding the resulting text into vector space, the server turns any online documentation into a semantic search engine that can be accessed through the Model Context Protocol. This enables developers to let their AI agents answer questions about internal APIs, configuration guides, or product manuals without hard‑coding the knowledge into prompts.
At its core, MCPDocSearch offers two complementary components. First, a web crawler walks through the site hierarchy, filters URLs by depth or keyword, cleans extraneous HTML elements, and stitches all pages into a single Markdown file. Second, the MCP server loads these files from its output directory, splits them into meaningful chunks based on headings, and generates vector embeddings for each chunk. The resulting vectors are cached, so after the initial indexing the server can start almost instantly even for large documentation sets. The cache is refreshed automatically whenever any Markdown file changes, so updates to the source site are reflected in search results without manual intervention.
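A minimal sketch of that indexing step, assuming the sentence-transformers library, a `storage/` directory of crawled Markdown, and a pickle cache keyed by content hash; these names, the model choice, and the cache format are illustrative assumptions rather than the project's exact layout:

```python
import hashlib
import pickle
import re
from pathlib import Path

from sentence_transformers import SentenceTransformer  # assumed embedding backend

DOCS_DIR = Path("storage")           # assumption: directory holding crawled .md files
CACHE_FILE = Path("embeddings.pkl")  # assumption: on-disk embedding cache


def chunk_by_headings(markdown: str) -> list[str]:
    """Split a Markdown document into chunks at heading boundaries."""
    # Each chunk starts at a line beginning with one to six '#' characters.
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    return [p.strip() for p in parts if p.strip()]


def build_index() -> dict:
    """Embed every chunk, reusing cached vectors while a file's hash is unchanged."""
    cache = pickle.loads(CACHE_FILE.read_bytes()) if CACHE_FILE.exists() else {}
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: small general-purpose model
    index = {}
    for md_file in DOCS_DIR.glob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode()).hexdigest()
        cached = cache.get(md_file.name)
        if cached and cached["digest"] == digest:
            index[md_file.name] = cached             # file unchanged: reuse embeddings
            continue
        chunks = chunk_by_headings(text)
        vectors = model.encode(chunks)               # re-embed only changed files
        index[md_file.name] = {"digest": digest, "chunks": chunks, "vectors": vectors}
    CACHE_FILE.write_bytes(pickle.dumps(index))
    return index
```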
The server exposes three intuitive MCP tools to clients (a registration sketch follows the list):
- A document-listing tool returns all crawled Markdown files, allowing an assistant to discover which documents are available.
- A document-structure tool provides the hierarchical heading outline of a selected document, useful for navigation or summarization.
- A semantic-search tool searches the embedded chunks and returns the most relevant passages along with their source context.
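Below is a hedged sketch of how tools like these could be registered with the FastMCP helper from the official MCP Python SDK; the tool names, the `storage/` path, and the embedding model are placeholders, since the project's actual identifiers are not reproduced on this page:

```python
from pathlib import Path

import numpy as np
from mcp.server.fastmcp import FastMCP
from sentence_transformers import SentenceTransformer

DOCS_DIR = Path("storage")                       # assumption: crawled .md files live here
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: same model as the indexing sketch
mcp = FastMCP("doc-search")


def load_chunks() -> list[dict]:
    """Collect heading-delimited chunks from every crawled file (no caching here)."""
    chunks = []
    for md_file in DOCS_DIR.glob("*.md"):
        current: list[str] = []
        for line in md_file.read_text(encoding="utf-8").splitlines():
            if line.startswith("#") and current:
                chunks.append({"file": md_file.name, "text": "\n".join(current)})
                current = []
            current.append(line)
        if current:
            chunks.append({"file": md_file.name, "text": "\n".join(current)})
    return chunks


CHUNKS = load_chunks()
VECTORS = model.encode([c["text"] for c in CHUNKS], normalize_embeddings=True)


@mcp.tool()
def list_documents() -> list[str]:
    """Return the names of all crawled Markdown files."""
    return sorted({c["file"] for c in CHUNKS})


@mcp.tool()
def get_document_structure(file_name: str) -> list[str]:
    """Return the heading outline of one crawled document."""
    text = (DOCS_DIR / file_name).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.startswith("#")]


@mcp.tool()
def search_documentation(query: str, top_k: int = 5) -> list[dict]:
    """Return the chunks whose embeddings are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = VECTORS @ q                            # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [
        {"file": CHUNKS[i]["file"], "text": CHUNKS[i]["text"], "score": float(scores[i])}
        for i in best
    ]


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport used by Cursor and similar clients
```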
These tools make it straightforward for an AI workflow to retrieve precise answers from internal docs, suggest next steps in a support conversation, or generate documentation summaries on demand. Because the server communicates over MCP's standard stdio transport, it can be launched directly within Cursor or any other MCP-compatible client, eliminating the need for a separate REST API.
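As a rough illustration of that launch path, here is a client-side sketch using the stdio client from the official MCP Python SDK; the `server.py` entry point and the `search_documentation` tool name are hypothetical:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Assumption: "server.py" is the script that calls mcp.run() over stdio.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])   # discover the exposed tools
            result = await session.call_tool(
                "search_documentation",            # hypothetical tool name
                {"query": "how do I configure the authentication module?"},
            )
            print(result.content)


asyncio.run(main())
```

Cursor performs the equivalent handshake itself when the server is added to its MCP configuration.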
In practice, teams can use MCPDocSearch to keep their AI assistants up‑to‑date with the latest product documentation, support knowledge bases, or developer guides. It is especially valuable in environments where docs are frequently updated and developers want to avoid manual prompt engineering—an assistant can simply query the server for “how do I configure the authentication module?” and receive a contextual excerpt from the latest docs. The combination of automated crawling, semantic embedding, and MCP tooling provides a robust, low‑maintenance solution that scales from small internal wikis to large public documentation portals.
Related Servers
n8n
Self‑hosted, code‑first workflow automation platform
FastMCP
TypeScript framework for rapid MCP server development
Activepieces
Open-source AI automation platform for building and deploying extensible workflows
MaxKB
Enterprise‑grade AI agent platform with RAG and workflow orchestration
Filestash
Web‑based file manager for any storage backend
MCP for Beginners
Learn Model Context Protocol with hands‑on examples
Explore More Servers
Imagen3-MCP
Generate photorealistic images via Google's Imagen 3.0 through MCP
Pagefind MCP Server
Fast static site search via Pagefind integration
SPINE2D Animation MCP Server
Create Spine2D animations from PSDs with natural language
MCP Server For LLM
Fast, language-agnostic Model Context Protocol server for Claude and Cursor
Gemini MCP Integration Server
AI-powered tool orchestration with Google Gemini and MCP
Mcp Trial
Prototype MCP server for testing and experimentation