steipete

Peekaboo MCP Server

MCP Server

Fast macOS screenshots and AI-powered GUI automation

645 stars · Updated 11 days ago

About

The Peekaboo MCP Server provides lightning‑fast screen captures, AI image analysis, and full GUI automation for macOS. It enables AI assistants to interact with any app using natural language commands and precise UI element detection.

Capabilities

  • Resources: Access data sources
  • Tools: Execute functions
  • Prompts: Pre-built templates
  • Sampling: AI model interactions

Overview

Peekaboo is a macOS‑centric Model Context Protocol (MCP) server that transforms raw visual information into actionable AI context. It resolves a long‑standing gap in AI workflows: the ability to see what’s on the screen and then act upon it without leaving the assistant. By exposing a rich set of GUI‑related capabilities—fast screenshots, AI vision analysis, and full‑blown automation—Peekaboo lets developers embed visual intelligence into their AI agents with minimal friction.
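
As a concrete illustration, the sketch below connects an MCP client to Peekaboo over stdio using the official TypeScript SDK. The launch command and the @steipete/peekaboo-mcp package name are assumptions based on the project's naming, not confirmed details; check the repository README for the exact invocation.

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    // Assumption: the server can be launched via npx under the package name
    // @steipete/peekaboo-mcp; verify the exact name and flags in the README.
    const transport = new StdioClientTransport({
      command: "npx",
      args: ["-y", "@steipete/peekaboo-mcp"],
    });

    const client = new Client({ name: "peekaboo-example", version: "1.0.0" });
    await client.connect(transport);

    // Discover whatever capture, analysis, and automation tools the server
    // actually exposes before calling anything by name.
    const { tools } = await client.listTools();
    console.log(tools.map((tool) => tool.name));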

What Problem Does Peekaboo Solve?

Traditional AI assistants operate purely on textual input and output, making it difficult to interact with desktop applications that rely on visual cues. Developers often resort to brittle scripting or manual workarounds when an assistant needs to open a file, click a button, or read data from a graph. Peekaboo eliminates this friction by providing an MCP interface that offers instant, reliable access to the screen state and precise control over GUI elements. This enables agents that can, for example, parse a spreadsheet directly from the display or automatically fill out forms in a native app—all within a single conversation.

Core Capabilities and Value

  • Lightning‑fast screenshot capture of windows, screens, or custom regions without disrupting focus.
  • AI‑powered image analysis that supports GPT‑4.1 Vision, Claude, Grok, or local Ollama models, turning pixel data into structured text.
  • Full GUI automation (v3) with click, type, scroll, and drag primitives that work on any macOS application.
  • Natural‑language automation via an embedded AI agent that interprets commands like “Open TextEdit and write a poem.”
  • Smart UI element detection that maps buttons, text fields, links, and menu items to coordinates, enabling zero‑click extraction of menus and shortcuts.
  • Multi‑screen awareness for window placement and display management.
  • Privacy‑first design with optional local inference, keeping visual data on the machine.

These features give developers a single, unified API to observe and control the desktop, dramatically reducing the boilerplate needed for visual AI tasks.
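
To make the observation side concrete, here is a hedged sketch that asks the server for a screen capture and then for a vision analysis, reusing the connected client from the earlier sketch. The tool names ("image", "analyze") and argument shapes are placeholders rather than Peekaboo's confirmed schema; list the real tools with listTools() first.

    import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

    // Sketch only: the tool names ("image", "analyze") and argument shapes are
    // placeholders, not Peekaboo's confirmed schema; check listTools() output.
    export async function captureAndDescribe(client: Client) {
      // Hypothetical capture tool: grab the frontmost window of an app.
      const capture = await client.callTool({
        name: "image",
        arguments: { app_target: "Safari" },
      });

      // Hypothetical vision tool: ask a model about what was captured.
      const analysis = await client.callTool({
        name: "analyze",
        arguments: { question: "What error message is visible?" },
      });

      return { capture, analysis };
    }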

Use Cases & Real‑World Scenarios

  • Automated UI testing: Agents can capture screenshots of test runs, analyze error dialogs with vision models, and automatically click “Retry” or “Close.”
  • Data extraction from legacy apps: A bot can read tables rendered in a proprietary desktop app, convert them to CSV, and pass the data back into a workflow.
  • Assistive technology: Vision‑enabled assistants can describe screen content to visually impaired users or perform actions on their behalf.
  • Developer tooling: IDE extensions like Cursor can leverage Peekaboo to let an assistant navigate the UI of external tools, install plugins, or trigger builds.
  • Remote support: Agents can provide step‑by‑step guidance by capturing the current screen, analyzing it, and instructing users with precise click coordinates.

Integration into AI Workflows

Peekaboo is designed to plug directly into existing MCP‑compatible assistants. An agent can request a screenshot, feed it to an on‑device or cloud vision model, and then issue automated actions—all within the same conversational context. The server’s automatic session resolution ensures that commands always target the most recent window or application, removing the need for manual state tracking. Developers can chain commands into scripts, embed them in prompts, or expose them as custom tools for end‑users.
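
The chain below sketches that observe-then-act flow with the same client. Again, the tool names ("see", "click", "type") and their arguments are illustrative placeholders, not Peekaboo's documented API; the intent is to show several calls sharing one session so the server, not the client, tracks which window each action targets.

    import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

    // Hedged sketch of an observe-then-act chain. The tool names ("see",
    // "click", "type") and arguments are illustrative placeholders; the point
    // is that successive calls share one session, so the server resolves which
    // window each action targets without manual state tracking.
    export async function writePoemInTextEdit(client: Client): Promise<void> {
      await client.callTool({ name: "see", arguments: { app_target: "TextEdit" } });
      await client.callTool({ name: "click", arguments: { query: "New Document" } });
      await client.callTool({
        name: "type",
        arguments: { text: "Roses are red, violets are blue..." },
      });
    }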

Unique Advantages

  • Zero dependency on external services: All core functionality runs locally, preserving privacy and eliminating network latency.
  • Unified API surface: The same MCP endpoints cover both observation (screenshots, UI element lists) and manipulation (clicks, typing), simplifying client code.
  • Performance‑oriented architecture: The native macOS app component delivers a 100× speed boost over pure CLI spawning, making real‑time interaction feasible.
  • Extensible design: The PeekabooCore library can be reused in other projects, and new tools (e.g., additional AI models) can be added without altering the server contract.

In short, Peekaboo equips AI assistants with visual awareness and desktop control, turning passive conversation into active, context‑rich interaction on macOS.