About
FS-MCP is a Model Context Protocol server that automatically detects, reads, and converts files from multiple formats into searchable text. It builds AI embeddings for semantic vector search while enforcing directory security and efficient range reading.
Capabilities
FS‑MCP: Universal File Reader & Intelligent Search MCP Server
FS‑MCP is a lightweight, Python‑based Model Context Protocol server that turns any file system into an AI‑ready knowledge base. By exposing a set of MCP tools, the server allows Claude or other LLMs to read files, convert documents into plain text or Markdown, and perform semantic searches across large collections—all while keeping access strictly confined to a user‑defined safe directory. The result is a secure, high‑performance gateway that lets developers build AI workflows around existing documents without writing custom parsers or embedding pipelines.
Problem Solved
Many organizations need to integrate unstructured documents—Word reports, PDFs, spreadsheets, and raw text files—into conversational AI. Traditional approaches require manual extraction of text, custom ingestion scripts, and separate vector‑search services, which quickly become maintenance burdens. FS‑MCP consolidates these steps into a single MCP server: it automatically detects text content, handles multiple formats, converts documents to Markdown for easy consumption, and builds an embedding index on the fly. This removes the need for separate ETL pipelines and allows developers to focus on higher‑level AI logic.
Core Capabilities
- Intelligent File Detection: The server inspects file contents to determine if they contain readable text, bypassing reliance on extensions.
- Multi‑Format Support: From plain text files to complex Office documents and PDFs, FS‑MCP extracts text using industry-standard libraries (python‑docx, openpyxl, PyPDF2).
- Range Reading: For large files, clients can request specific line ranges, reducing data transfer and processing time.
- Document Conversion & Caching: Documents are automatically converted to Markdown, with results cached so repeated reads are instantaneous.
- Semantic Vector Search: Using embeddings (default BAAI/bge‑m3), the server builds an in‑memory vector index that supports similarity queries across all ingested files.
- Security Controls: A configurable safe-directory setting limits file access to a single directory tree, and a configurable maximum file size protects against resource exhaustion.
- High Performance: Batch processing, intelligent caching, and optional external vector databases enable the server to handle thousands of documents with low latency.
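The content-based detection described above can be sketched in a few lines. The heuristic below (a NUL-byte check plus UTF-8 decodability on a small sample) is an illustrative assumption, not FS-MCP's actual implementation:

```python
def looks_like_text(path, sample_size=4096):
    """Guess whether a file contains readable text by inspecting its
    bytes rather than trusting its extension. Illustrative heuristic only."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:  # NUL bytes almost always indicate binary data
        return False
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A production detector would also handle other encodings and the edge case of a multi-byte character truncated at the sample boundary.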
Real‑World Use Cases
- Enterprise Knowledge Management: Deploy FS‑MCP behind a corporate intranet to let employees query policy documents, technical manuals, and internal reports via an LLM.
- Legal Document Search: Law firms can index case files, contracts, and precedent documents, enabling lawyers to ask contextual questions and retrieve relevant passages quickly.
- Academic Research: Researchers can ingest research papers, theses, and datasets to allow assistants to summarize findings or locate specific experimental details.
- Customer Support: Technical support teams can load product manuals and troubleshooting guides, giving agents instant access to the most relevant information when answering tickets.
Integration with AI Workflows
Developers embed FS‑MCP into their LLM pipelines by adding its MCP tools to the agent’s tool set. A typical flow involves:
- File Retrieval: The assistant calls the file-reading tool, optionally specifying a line range.
- Content Processing: The server returns plain text or Markdown, ready for the LLM to analyze.
- Semantic Search: The assistant invokes the search tool with a query; FS‑MCP returns the most relevant file snippets, which the LLM can then summarize or quote.
- Iterative Refinement: The agent may request additional ranges or re‑search with updated prompts, all within the same conversation context.
Because FS‑MCP speaks MCP natively, there is no need for custom adapters—developers can leverage existing FastMCP tooling and focus on crafting prompts that exploit the server’s capabilities.
Unique Advantages
- Zero‑Configuration Parsing: No need to write format‑specific parsers; the server auto‑detects and extracts text from all common document types.
- Security by Design: The safe‑directory restriction and file‑size limits provide a robust defense against accidental or malicious data exposure.
- Seamless Embedding Integration: By wrapping LangChain’s vector store, FS‑MCP offers powerful semantic search without exposing the underlying complexity.
- Extensibility: Developers can extend the server with custom converters or integrate external vector databases (e.g., Pinecone, Weaviate) for larger deployments.
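The similarity query at the heart of the semantic search capability reduces to nearest-neighbor lookup over embedding vectors. The toy index below uses hand-made vectors and cosine similarity to show the mechanics; in FS-MCP the vectors would come from an embedding model such as BAAI/bge-m3 and the store from LangChain, so this is a conceptual sketch only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorIndex:
    """Minimal in-memory index mapping document ids to embedding vectors."""
    def __init__(self):
        self.docs = {}

    def add(self, doc_id, vector):
        self.docs[doc_id] = vector

    def search(self, query_vector, k=3):
        """Return the k most similar document ids, best match first."""
        scored = sorted(self.docs.items(),
                        key=lambda item: cosine(query_vector, item[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in scored[:k]]
```

Swapping the dictionary for an external vector database (Pinecone, Weaviate) changes only the storage layer; the query shape stays the same.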
FS‑MCP is therefore an all‑in‑one solution that transforms raw files into AI‑friendly knowledge, combining format detection, document conversion, semantic search, and access control in a single MCP server.