PDF Extraction MCP Server

MCP Server

Extract PDF content with OCR support for Claude Code

About

A lightweight MCP server that extracts text and performs OCR on PDF files, supporting page ranges and negative indexing. It integrates seamlessly with the Claude Code CLI for quick content retrieval.

PDF Extraction MCP Server (Claude Code Fork)

The PDF Extraction MCP Server bridges the gap between AI assistants and static PDF documents. By exposing a single, well‑defined tool, the server allows Claude to retrieve text, tables, and other content from PDFs located on the local filesystem. This addresses a common pain point for developers who need to feed document data into language‑model pipelines without writing custom parsing logic or hosting heavyweight OCR services.
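
Because the tool is exposed over MCP, Claude never calls it directly; the MCP client sends a JSON‑RPC 2.0 `tools/call` message to the server. The sketch below builds that message with the standard library only. The tool name `extract_pdf`, the argument names `file_path` and `pages`, the example path, and the `"1-3,-1"` page syntax are hypothetical placeholders, not names confirmed for this server; a client would discover the real ones from the server's `tools/list` response.

```python
import json
from typing import Any, Dict


def build_tools_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names; check the server's tools/list for the real ones.
request = build_tools_call(
    1,
    "extract_pdf",
    {"file_path": "/path/to/report.pdf", "pages": "1-3,-1"},
)
print(request)
```

The server answers with a JSON‑RPC response whose `result.content` list would typically carry the extracted text as text items, which the client then hands back to Claude.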

At its core, the server accepts a file path and an optional page specification. The page argument supports comma‑separated ranges, individual page numbers, and negative indices that count back from the end of the document, so the last page can be selected without knowing the page count. Internally it combines PDF‑parsing libraries with optional OCR for scanned images. The result is a plain‑text payload that Claude can immediately consume, annotate, or transform. Because the tool is exposed through MCP, developers can invoke it with a simple conversational prompt, such as asking Claude to extract pages 1‑3 and the last page of a given PDF, without leaving their workflow.
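
The page‑selection rules can be illustrated with a small resolver. This is a minimal sketch under stated assumptions, not the server's actual parser: the function name `parse_page_spec`, the inclusive `a-b` range syntax, 1‑based numbering, and the choice to reject out‑of‑range pages with an error are all assumptions.

```python
import re
from typing import List, Optional

# Inclusive range of positive page numbers, e.g. "2-4" (assumed syntax).
_RANGE = re.compile(r"^(\d+)-(\d+)$")


def parse_page_spec(spec: Optional[str], page_count: int) -> List[int]:
    """Resolve a page specification into sorted, de-duplicated 1-based pages.

    Hypothetical syntax: comma-separated tokens, each either a single page
    ("5"), an inclusive range ("2-4"), or a negative index ("-1" = last page).
    An empty or missing spec selects the whole document.
    """
    if page_count < 1:
        raise ValueError("document has no pages")
    if spec is None or not spec.strip():
        return list(range(1, page_count + 1))

    def resolve(n: int) -> int:
        # Negative indices count back from the end: -1 -> page_count.
        page = page_count + 1 + n if n < 0 else n
        if not 1 <= page <= page_count:
            raise ValueError(f"page {n} is out of range (1-{page_count})")
        return page

    pages = set()
    for raw in spec.split(","):
        token = raw.strip()
        if not token:
            continue  # tolerate stray commas such as "1,,3"
        match = _RANGE.match(token)
        if match:
            start = resolve(int(match.group(1)))
            end = resolve(int(match.group(2)))
            if start > end:
                raise ValueError(f"descending range {token!r}")
            pages.update(range(start, end + 1))
        else:
            # Single page number, positive or negative; int() rejects junk.
            pages.add(resolve(int(token)))
    return sorted(pages)


print(parse_page_spec("1-3,-1", 10))  # [1, 2, 3, 10]
```

Keeping ranges positive and treating each negative token as a single page avoids confusing the range hyphen with a minus sign; the real server may resolve that ambiguity differently.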

Key capabilities include:

Local file access: No need to upload PDFs to cloud storage; the tool reads directly from disk, preserving privacy and reducing latency.
Flexible page selection: Supports ranges, individual pages, or the entire document, giving fine‑grained control over extraction.
OCR fallback: Automatically switches to OCR for scanned or image‑based PDFs, so text can still be recovered from pages that have no embedded text layer.
CLI integration: Designed to work seamlessly with the Claude Code command‑line interface, which developers use to add and manage the server.
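
One plausible way to implement the OCR fallback is a per‑page check: if a page's embedded text is nearly empty, treat the page as scanned and hand it to an OCR engine. The sketch keeps the PDF and OCR libraries out of the picture so it runs on its own. The function name `extract_with_ocr_fallback`, the `ocr_page` callback, and the 20‑character threshold are hypothetical, and the server may decide differently (for example, per document rather than per page).

```python
from typing import Callable, Dict, List, Tuple


def extract_with_ocr_fallback(
    text_layer: Dict[int, str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> Tuple[Dict[int, str], List[int]]:
    """Fill in pages whose embedded text looks empty by calling an OCR hook.

    text_layer maps 1-based page numbers to text already pulled from the PDF.
    ocr_page is a stand-in for rasterizing one page and running OCR on it.
    Returns the per-page text (in page order) and the pages that needed OCR.
    """
    results: Dict[int, str] = {}
    ocr_used: List[int] = []
    for number in sorted(text_layer):
        text = text_layer[number]
        if len(text.strip()) < min_chars:
            # Little or no selectable text: likely a scanned image, so OCR it.
            text = ocr_page(number)
            ocr_used.append(number)
        results[number] = text
    return results, ocr_used


layer = {1: "Quarterly revenue grew across all segments.", 2: "   ", 3: ""}
texts, ocr_used = extract_with_ocr_fallback(layer, lambda n: f"[OCR text of page {n}]")
print(ocr_used)  # [2, 3]
```

Passing the OCR step in as a function keeps the decision logic easy to test; in a real server it would wrap whichever OCR library is installed.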

Typical use cases span several domains. In research, a scientist can ask Claude to pull specific sections from technical reports for summarization or citation extraction. Legal teams can retrieve relevant clauses from contracts, while finance professionals might extract tables from quarterly earnings PDFs for automated reporting. Because the server runs locally and is invoked through MCP, it integrates naturally into existing Claude workflows—whether in a terminal session or within a larger automation pipeline that chains multiple MCP tools together.

What sets this fork apart is its focus on reliability with Claude Code. A dedicated entry point makes the package runnable as a module, and the detailed installation guidance lets developers add the server to their Claude environment without friction. This combination of robust PDF handling, ease of deployment, and tight integration with the MCP ecosystem makes the PDF Extraction MCP Server a valuable asset for any developer looking to enrich AI conversations with document content.