By jhead

macOS Screen View & Control MCP Server

MCP Server

Capture macOS window screenshots and control windows via LLMs

10 stars · Updated Aug 13, 2025

About

This MCP server lets large language models capture screenshots of specific macOS windows by title or ID, list and find windows, and send keystrokes or type text for automated UI interactions.

Capabilities

Resources: Access data sources
Tools: Execute functions
Prompts: Pre-built templates
Sampling: AI model interactions

[Screenshot of macOS Screen View & Control MCP Server in action]

The macOS Screen View & Control MCP Server gives AI assistants a direct bridge to the visual and interactive state of a macOS desktop. By exposing window‑specific screenshot capture, window enumeration, and input simulation tools, the server solves a common bottleneck in AI‑driven automation: the lack of reliable, programmatic access to what is actually displayed on a user’s screen. For developers building conversational agents that need to verify UI states, generate visual reports, or perform end‑to‑end testing, this server turns a series of shell scripts and AppleScript calls into clean, reusable MCP tools.

At its core, the server offers five primitives that map closely to everyday desktop tasks. A screenshot tool lets a model request an image of any visible window by title or ID, delivering the result either as raw binary data or as a base64 string for easy embedding. Window listing and lookup tools provide discovery, enabling a model to locate windows before acting on them. The remaining two tools cover input: one sends key presses, including modifier combinations, to the active window or a focused element, while the other types full strings with configurable delays. Together these primitives let an assistant orchestrate complex UI workflows: opening a document, typing content, and taking a screenshot for verification, all within a single conversational turn.
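For a concrete feel of the call shapes, here is a minimal client sketch using the official MCP Python SDK. The tool names (capture_window, type_text) and argument keys are illustrative assumptions rather than the server's documented API, and the URL assumes the local port 8000 mentioned below; list_tools reveals what the server actually exposes.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    # Assumed SSE endpoint; the server listens on port 8000 per the docs,
    # but the exact path may differ.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover what the server actually exposes.
            tools = await session.list_tools()
            print("available tools:", [t.name for t in tools.tools])

            # Hypothetical tool name and arguments for a window capture.
            shot = await session.call_tool(
                "capture_window", {"title": "TextEdit", "format": "base64"}
            )
            print(shot.content[0])

            # Hypothetical text-typing call with a per-keystroke delay.
            await session.call_tool(
                "type_text", {"text": "Hello from MCP", "delay_ms": 20}
            )


asyncio.run(main())
```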

Developers can integrate the server into their AI pipelines by adding it to the MCP server configuration in Claude Desktop or Cursor. Once registered, an assistant can invoke these tools via the MCP API, receiving structured responses that include image data or status confirmations. Because the server runs locally on port 8000, latency is minimal and privacy is preserved: no screen data leaves the machine. The server's design also makes it straightforward to extend; contributors can add new tools such as window resizing or clipboard access, broadening the assistant's control surface.
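As an illustration of the registration step, the sketch below appends an entry to Cursor's mcp.json. The file path, key names, and SSE URL are assumptions drawn from common client conventions; check your client's documentation for the exact format.

```python
import json
import pathlib

# Hypothetical registration sketch for Cursor. Claude Desktop uses a
# similar "mcpServers" block in its own config file.
config_path = pathlib.Path.home() / ".cursor" / "mcp.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}
config.setdefault("mcpServers", {})["macos-screen-control"] = {
    # The server runs locally on port 8000; "/sse" is an assumed path.
    "url": "http://localhost:8000/sse",
}
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
```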

Real‑world scenarios that benefit from this server include automated UI testing, where a model captures a screenshot after each interaction and compares it against the expected layout. Content creators can use the server to generate annotated screenshots for tutorials or documentation without leaving their conversational interface. Accessibility tools might rely on the server to capture screen states for audit or reporting purposes. In each case, the ability to request a window's visual snapshot on demand eliminates manual screenshotting and enables reproducible, scriptable workflows.
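A verify-after-action testing loop built on these tools could look like the sketch below, which reuses a ClientSession from the connection example above. The tool name and argument shape are the same assumptions as before, and the byte-for-byte comparison only holds when rendering is fully deterministic; real suites usually prefer a perceptual diff.

```python
import base64
import pathlib

from mcp import ClientSession


async def verify_window(
    session: ClientSession, title: str, golden: pathlib.Path
) -> bool:
    """Capture a window and compare it against a stored golden screenshot."""
    # Hypothetical tool name/arguments; assumes the image comes back as a
    # base64-encoded text content block.
    result = await session.call_tool(
        "capture_window", {"title": title, "format": "base64"}
    )
    image = base64.b64decode(result.content[0].text)
    # Exact byte equality is brittle; swap in a pixel-tolerance or
    # perceptual diff for anything beyond a quick smoke test.
    return image == golden.read_bytes()
```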

What sets this server apart is its focus on window granularity and input simulation while maintaining a lightweight, native macOS implementation. By avoiding external dependencies beyond standard Python libraries and Apple’s accessibility APIs, it offers a stable, low‑overhead solution that can run on any recent macOS version. The combination of precise window targeting, flexible output formats, and direct keyboard interaction makes it a powerful addition to any AI assistant that needs to see and act on the macOS desktop.
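For readers curious what those native APIs look like in practice, here is a minimal window-enumeration sketch using the Quartz bindings from pyobjc. It shows the standard macOS route for listing on-screen windows by title and ID; it is an illustration, not the server's actual code.

```python
import Quartz

# Enumerate on-screen windows via the CoreGraphics window list.
windows = Quartz.CGWindowListCopyWindowInfo(
    Quartz.kCGWindowListOptionOnScreenOnly
    | Quartz.kCGWindowListExcludeDesktopElements,
    Quartz.kCGNullWindowID,
)
for win in windows:
    # Window titles are only populated once the process has been granted
    # Screen Recording permission (macOS 10.15+).
    title = win.get(Quartz.kCGWindowName)
    if title:
        print(win[Quartz.kCGWindowNumber], win[Quartz.kCGWindowOwnerName], title)
```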