MCP Docs Reader

MCP Server

Semantic PDF search for Claude Desktop

Updated Apr 25, 2025

About

A lightweight MCP server that loads local PDFs, chunks them into semantic embeddings with SentenceTransformer, builds a FAISS index, and returns relevant passages to Claude for document-based question answering.

Capabilities

Resources – access data sources
Tools – execute functions
Prompts – pre-built templates
Sampling – AI model interactions

Overview

The MCP Docs Reader is a lightweight Model Context Protocol (MCP) server that turns a collection of locally stored PDF files into an interactive knowledge base for Claude Desktop. It loads documents from a designated docs/ folder, extracts their text, and builds a semantic search index, so Claude can answer questions grounded in the content of those PDFs. No manual indexing or custom webhooks are needed; the server plugs directly into Claude Desktop’s MCP support.

At its core, the server addresses a common pain point for developers and researchers: querying large volumes of unstructured PDF data. Traditional approaches require converting PDFs to plain text, cleaning the output, and feeding it into a separate retrieval system. MCP Docs Reader wraps those steps in a single, easy‑to‑configure service. When Claude receives a user prompt, the server retrieves the most semantically relevant passages, stitches them together with the question into a prompt, and returns it to Claude for answer generation. This keeps the assistant’s responses anchored in the source material, which is especially valuable for compliance, technical documentation, or academic research.
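The stitching step described above can be sketched in a few lines. This is a minimal illustration, not the server's actual code: the function name build_prompt, the instruction wording, and the numbered-passage layout are all assumptions made for the example.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt.

    Hypothetical helper: the name and the prompt wording are illustrative only.
    """
    # Number each passage so an answer can point back to its source.
    context = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question.strip()}"
    )


prompt = build_prompt(
    "What is the warranty period?",
    ["The warranty lasts two years.", " Returns need a receipt. "],
)
print(prompt)
```

Numbering the passages is a design choice for the sketch: it gives the model a compact way to cite which passage supports each part of its answer.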

Key features include:

Automatic PDF ingestion – the server watches a local docs/ directory and processes any PDFs found there without manual intervention.
Semantic chunking – extracted text is split into meaningful blocks that preserve context, improving retrieval quality.
Vector embeddings with SentenceTransformer – each chunk is converted into a dense vector, enabling fast semantic similarity searches.
FAISS‑based index – the vectors are stored in a highly efficient, in‑memory search structure that scales to thousands of documents.
Top‑k retrieval – the server returns the most relevant passages for a given query, ensuring Claude receives concise, pertinent information.
Prompt construction – the server automatically builds a prompt that combines retrieved passages with the user’s question, streamlining Claude’s reasoning process.
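The chunk, embed, and retrieve steps in the list above can be sketched with the standard library alone. To keep the example runnable without dependencies, it swaps in two stand-ins: a toy word-count vector replaces the SentenceTransformer embedding, and a brute-force cosine scan replaces the FAISS index. All function names, the window sizes, and the sample chunks are assumptions for illustration, not the server's own implementation.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a simple stand-in for semantic chunking)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real server would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first (brute-force scan)."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


sample_chunks = [
    "Hold the power button for ten seconds to reset.",
    "The warranty covers two years of repairs.",
    "Install the driver before connecting a printer.",
]
for score, chunk in top_k("how long is the warranty", sample_chunks, k=2):
    print(f"{score:.3f}  {chunk}")
```

The overlap between neighbouring windows is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk. A dense embedding model would also match paraphrases that share no words with the query, which the word-count stand-in cannot do.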

Developers can leverage MCP Docs Reader in a variety of scenarios. A product team might use it to power an internal FAQ bot that references the latest design spec PDFs. Researchers can query a corpus of academic papers stored locally, obtaining summaries or specific data points without leaving Claude. Technical support teams can quickly locate troubleshooting steps from user manuals. Because the server exposes its capabilities through MCP, it fits seamlessly into existing Claude workflows, allowing developers to add document‑based reasoning without rewriting their assistant logic.

What sets MCP Docs Reader apart is its minimal footprint and zero‑code setup. It requires only a local PDF folder, a single configuration change in Claude Desktop, and a working Python environment. Once running, the service is invisible to the user: Claude simply receives enriched prompts, while the server handles the retrieval work behind the scenes. This combination of simplicity, performance, and direct integration makes MCP Docs Reader a good fit for developers who want to query their PDF archives from Claude.
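The configuration change mentioned above is typically an entry under the mcpServers key of Claude Desktop's claude_desktop_config.json file. The sketch below generates such an entry; the server name docs-reader, the python launcher, and the script path are placeholders, since this page does not state the project's actual entry point.

```python
import json


def mcp_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build an mcpServers block describing one locally launched MCP server."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Hypothetical values: the server name and script path are placeholders.
entry = mcp_server_entry(
    "docs-reader",
    "python",
    ["/path/to/mcp-docs-reader/main.py"],
)
print(json.dumps(entry, indent=2))
```

The printed JSON would be merged into the existing configuration file by hand, and Claude Desktop restarted so it launches the server.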