MCP Web Extractor

MCP Server

Clean web content extraction for MCP-enabled workflows

Updated Apr 6, 2025

About

An MCP server that uses Readability.js to fetch and strip web pages, returning clean text and metadata. Ideal for saving readable articles into Obsidian or other note-taking applications.

Capabilities

Resources – Access data sources
Tools – Execute functions
Prompts – Pre-built templates
Sampling – AI model interactions

Overview

The MCP Web Extractor is a lightweight server that brings Readability.js into Model Context Protocol workflows. It exposes a single capability that turns a public web page into a clean, structured data object that AI assistants can ingest directly. Ads, navigation bars, and other non‑essential elements are stripped out, leaving a distilled article that is ready for summarization, citation, or note creation.

Developers working with AI assistants often need to pull content from the web in a format that is both human‑readable and machine‑friendly. Traditional scraping tools return raw HTML, which requires additional parsing steps. The Web Extractor abstracts this complexity: you supply a URL, and the server returns an object containing the article title, the cleaned content as HTML, the same content as plain text, and further metadata such as the site name. This uniform structure allows downstream services, such as knowledge‑base builders or content recommendation engines, to consume the data without custom parsing logic.
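
As a rough illustration of consuming such an object, the sketch below defines a result type and a runtime check for it. The field names (title, content, textContent, excerpt, siteName) are borrowed from the result of Readability.js's parse() and are assumptions here; the server's actual schema may use different names or allow null values.

```typescript
// Hypothetical result shape. Field names follow Readability.js's parse()
// output and are assumptions, not the server's confirmed schema.
interface ExtractedArticle {
  title: string;
  content: string;     // cleaned article body as HTML
  textContent: string; // the same body as plain text
  excerpt: string;
  siteName: string;
}

const REQUIRED_FIELDS = ["title", "content", "textContent", "excerpt", "siteName"] as const;

// Type guard: true only if every assumed field is present and is a string.
function isExtractedArticle(value: unknown): value is ExtractedArticle {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return REQUIRED_FIELDS.every((key) => typeof record[key] === "string");
}

// Parse a JSON payload and fail loudly if it does not match the assumed shape.
function parseExtractedArticle(json: string): ExtractedArticle {
  const parsed: unknown = JSON.parse(json);
  if (!isExtractedArticle(parsed)) {
    throw new Error("Response does not match the assumed ExtractedArticle shape");
  }
  return parsed;
}
```

Validating at the boundary like this means downstream code can rely on the typed fields instead of re-checking them at every use.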

Key capabilities include:

Ad‑free extraction – The underlying Readability algorithm removes sidebars, pop‑ups, and other distractions, ensuring that the returned text focuses on the author’s intent.
Metadata enrichment – In addition to raw content, the server supplies contextual metadata like the article title and site name, which is invaluable for indexing or linking.
Seamless Obsidian integration – A ready‑made integration script demonstrates how to hook the server into an Obsidian plugin, enabling users to turn a URL into a polished note with a single click.
MCP‑ready – The server follows MCP conventions, exposing the capability at a standard endpoint. This makes it trivial to chain the extraction step into larger AI workflows, such as summarization or question‑answering pipelines.
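
To make the MCP side more concrete, here is a minimal sketch of the request a client might send. MCP tool invocations travel as JSON-RPC 2.0 messages with the method tools/call; the tool name "extract" and the argument name "url" are hypothetical placeholders, since the server's real identifiers are not shown here.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical tool name; replace with whatever the server actually registers.
const EXTRACT_TOOL_NAME = "extract";

// Build a tools/call request for a given page URL.
// The "url" argument name is also an assumption.
function buildExtractRequest(id: number, url: string): ToolCallRequest {
  // Only accept http(s) URLs so obviously wrong input fails before any network call.
  if (!/^https?:\/\/\S+$/i.test(url)) {
    throw new Error("Expected an http(s) URL, got: " + url);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: EXTRACT_TOOL_NAME, arguments: { url } },
  };
}
```

In practice an MCP client library would normally build and send this envelope for you; the sketch only shows what goes over the wire.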

Typical use cases include:

Knowledge‑base construction – Automatically pull clean articles into a note system or database for later retrieval by an AI assistant.
Content summarization – Feed extracted text into a summarizer model to generate concise overviews that the assistant can present to users.
Web‑to‑PDF or Markdown conversion – Use the plain text and metadata to generate formatted documents for archiving or sharing.
Research assistance – Quickly pull research papers or news articles into an AI‑augmented workspace, removing the need to manually copy and paste.
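
For the note-taking and Markdown use cases above, the conversion step could look roughly like the following sketch. It assumes an extraction result with title, siteName, and textContent fields plus the source URL (all hypothetical names) and emits a Markdown note with YAML front matter of the kind Obsidian reads as properties. It is not the project's own integration script.

```typescript
// Hypothetical input: a subset of the extraction result plus the source URL.
interface NoteSource {
  title: string;
  siteName: string;
  textContent: string;
  url: string;
}

// Double-quote a YAML scalar so colons or quotes in a title cannot break the front matter.
function yamlString(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Render the article as a Markdown note with YAML front matter.
function toMarkdownNote(article: NoteSource): string {
  // Split on blank lines, trim each paragraph, and drop empty ones.
  const paragraphs = article.textContent
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return [
    "---",
    `title: ${yamlString(article.title)}`,
    `site: ${yamlString(article.siteName)}`,
    `source: ${yamlString(article.url)}`,
    "---",
    "",
    `# ${article.title}`,
    "",
    paragraphs.join("\n\n"),
    "",
  ].join("\n");
}
```

Keeping the source URL and site name in the front matter lets the note be indexed or linked back to the original page later.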

By providing a consistent, noise‑free content source, the MCP Web Extractor empowers developers to build richer, more reliable AI experiences that interact with the web without wrestling with HTML intricacies.