Webpage Screenshot MCP Server

Capture web pages in a snap with Puppeteer

About

An MCP server that uses Puppeteer to take full‑page or element screenshots, supports multiple formats, authentication, and base64 output for AI agents to visually verify web applications.

The Webpage Screenshot MCP Server gives AI‑driven web development workflows an automated, programmatic way to capture visual snapshots of a webpage. Instead of relying on manual browser interactions or third‑party screenshot services, the server uses Puppeteer to launch a headless (or visible) browser session, navigate to the target URL, and return an image encoded in Base64. For developers building conversational agents or automated QA pipelines, this means an agent can retrieve a page’s visual state from within the model context: it can “see” what the user sees, verify UI changes, and compare screenshots to detect visual regressions.
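As a rough illustration of the Base64 hand-off described above, the sketch below wraps an already encoded screenshot in the image content shape that MCP tool results use (`type`, `data`, `mimeType`). The helper name `toImageContent` and the sample string are assumptions made for this example; whether this server builds its responses exactly this way is not stated in the description.

```typescript
type ImageFormat = "png" | "jpeg" | "webp";

// Shape of an image item inside an MCP tool result.
interface ImageContent {
  type: "image";
  data: string; // Base64-encoded image bytes
  mimeType: string;
}

// Hypothetical helper: wrap a Base64 string (for example, what Puppeteer
// returns when a screenshot is requested with encoding "base64") so an
// MCP client can pass it to the model as an image.
function toImageContent(base64: string, format: ImageFormat): ImageContent {
  return { type: "image", data: base64, mimeType: `image/${format}` };
}

// Placeholder payload: only the 8-byte PNG signature, not a real screenshot.
const result = { content: [toImageContent("iVBORw0KGgo=", "png")] };
console.log(JSON.stringify(result));
```

Because the image travels as a string inside the tool result, no temporary file has to be written or cleaned up on the agent side.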

At its core, the server exposes two primary tools: login-and-wait and screenshot-page. The former opens a real browser window so the user can authenticate manually, then captures the resulting cookies for subsequent requests. This is especially useful for sites that require multi‑factor authentication or dynamic session handling, because later screenshots are taken in an authenticated context. The latter tool offers granular control over the capture process: developers can choose full‑page or viewport screenshots, set custom dimensions, select the image format (PNG, JPEG, WebP), and fine‑tune load conditions with wait‑condition or delay parameters. The server also supports session persistence, so a single browser instance can be reused across multiple screenshot requests, reducing overhead and speeding up multi‑step workflows.
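To make the two-step flow concrete, here is a minimal sketch of the JSON-RPC `tools/call` requests an MCP client might send. The tool names come from the description above; the argument names (`url`, `fullPage`, `format`) and the example URLs are assumptions for illustration, not the server's documented schema.

```typescript
// Envelope of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Step 1: open a visible browser so the user can sign in manually.
// The "url" argument is hypothetical.
const login = buildToolCall(1, "login-and-wait", {
  url: "https://example.com/login",
});

// Step 2: capture a page, relying on the session established in step 1.
// "fullPage" and "format" are hypothetical argument names.
const shot = buildToolCall(2, "screenshot-page", {
  url: "https://example.com/dashboard",
  fullPage: true,
  format: "png",
});

console.log(JSON.stringify([login, shot], null, 2));
```

In practice an MCP client library constructs these envelopes for you; the sketch only shows what crosses the wire.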

Key capabilities of this MCP server include:

Full‑page and element screenshots: Capture entire pages or target specific DOM elements via CSS selectors.
Multiple image formats and quality settings: Optimize for bandwidth or fidelity depending on the use case.
Base64 output: Eliminates file I/O, allowing AI agents to embed images directly into responses or logs.
Authentication and session persistence: Maintain logged‑in state across requests, critical for testing dashboards or user portals.
Default browser integration: Optionally use the system’s default browser for a more natural rendering environment.
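The capabilities above could be mapped onto Puppeteer's screenshot options roughly as follows. This is a sketch under stated assumptions: the `ScreenshotArgs` field names are hypothetical, and `CaptureOptions` mirrors only the subset of Puppeteer's options used here (`type`, `encoding`, `fullPage`, `quality`). Element capture would additionally need the selector resolved to an element handle, which is outside this snippet.

```typescript
// Hypothetical tool arguments; not the server's actual schema.
interface ScreenshotArgs {
  url: string;
  fullPage?: boolean;
  selector?: string; // CSS selector for element capture
  format?: "png" | "jpeg" | "webp";
  quality?: number; // 0-100, meaningful for lossy formats only
}

// Subset of Puppeteer's screenshot options used in this sketch.
interface CaptureOptions {
  type: "png" | "jpeg" | "webp";
  encoding: "base64";
  fullPage: boolean;
  quality?: number;
}

function toCaptureOptions(args: ScreenshotArgs): CaptureOptions {
  const type = args.format ?? "png";
  const options: CaptureOptions = {
    type,
    encoding: "base64",
    // When a single element is targeted, full-page capture does not apply.
    fullPage: args.selector ? false : (args.fullPage ?? true),
  };
  // PNG is lossless, so a quality value is only forwarded for JPEG and WebP.
  if (type !== "png" && args.quality !== undefined) {
    options.quality = Math.min(100, Math.max(0, Math.round(args.quality)));
  }
  return options;
}

console.log(toCaptureOptions({ url: "https://example.com", format: "webp", quality: 70 }));
```

Defaulting to full-page PNG is a choice made for this example; lowering `quality` on JPEG or WebP trades fidelity for a smaller Base64 payload.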

The server suits several real‑world scenarios. In continuous integration pipelines, an AI assistant can take screenshots of a newly deployed web app and compare them against baseline images to detect visual regressions. During UX research, agents can capture landing pages at each step of different user flows and present the screenshots in stakeholder reports. For automated web scraping or data extraction, confirming that a page has rendered correctly before extracting content improves reliability. Developers building voice‑controlled or chat‑based development tools can also let the assistant “look” at a page and describe its layout, which improves accessibility for visually impaired users.
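The baseline comparison in the CI scenario can be sketched in its crudest form as an exact match on the Base64 strings. The function name and the page-name keys are assumptions for this example. An exact match flags any rendering difference at all, so a real pipeline would more likely use a pixel-diff tool with a tolerance threshold.

```typescript
// Maps a page name to its Base64-encoded screenshot.
type Snapshots = Record<string, string>;

// Returns the names of pages whose screenshot differs from the baseline,
// including pages that exist on only one side.
function findChangedPages(baseline: Snapshots, current: Snapshots): string[] {
  const names = Object.keys({ ...baseline, ...current });
  return names.filter((name) => baseline[name] !== current[name]).sort();
}

// Placeholder strings stand in for real Base64 image data.
const changed = findChangedPages(
  { home: "AAA", about: "BBB" },
  { home: "AAA", about: "CCC", pricing: "DDD" }
);
console.log(changed); // [ 'about', 'pricing' ]
```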

By integrating with existing MCP‑enabled AI assistants such as Claude or Cursor, the Webpage Screenshot server extends the assistant’s capabilities from purely textual analysis to visual comprehension. Developers can then combine code generation, UI testing, and visual feedback in richer, multimodal interactions, all within a single conversational context.