Computer Control MCP

MCP Server

Remote desktop automation with mouse, keyboard, and OCR

About

A lightweight MCP server that enables programmatic control of a computer’s mouse, keyboard, and screen. It offers screenshot capture, OCR extraction, window management, and drag-and-drop actions, and needs no external binaries beyond its Python packages.

Capabilities

Resources: access data sources

Tools: execute functions

Prompts: pre-built templates

Sampling: AI model interactions

The Computer Control MCP connects conversational AI to real desktop interaction. By exposing tools that mimic the core functions of a human operator (mouse movement, keyboard entry, screen capture, and OCR), the server lets AI assistants manipulate applications, automate workflows, and extract information directly from the user’s environment. This is particularly valuable for developers building AI-powered productivity agents, remote support bots, or automated testing suites that must act on a live desktop rather than a simulated environment.

At its core, the server implements a small set of actions that map directly onto common GUI operations. Mouse tools allow precise clicks, drags, and button state control; keyboard utilities enable typing arbitrary text or pressing individual keys. Screen tools provide full-screen or window-specific screenshots, while the integrated OCR engine (RapidOCR on ONNX Runtime) can pull text from those images, returning both the extracted string and its coordinates. Window management commands list open windows and bring a chosen window to the foreground, so an AI agent can switch context or target a specific application.
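
Because OCR returns coordinates along with text, an agent can turn a recognized label into a click position. The following is a minimal sketch of that step, not the server's own code. It assumes each OCR hit is a `(box, text, score)` triple, where `box` holds four corner points, which is modeled on how RapidOCR results are commonly shaped. The names `box_center`, `find_click_target`, and `sample_hits` are hypothetical, and the sample data is made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# Assumed shape of one OCR hit: (four corner points, recognized text, confidence).
OcrHit = Tuple[Sequence[Point], str, float]


def box_center(box: Sequence[Point]) -> Tuple[int, int]:
    """Return the integer center of a box given its corner points."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))


def find_click_target(
    hits: List[OcrHit], label: str, min_score: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return the center of the best-scoring hit whose text contains `label`.

    Matching is case-insensitive. Returns None when nothing matches.
    """
    wanted = label.casefold()
    matches = [h for h in hits if wanted in h[1].casefold() and h[2] >= min_score]
    if not matches:
        return None
    best = max(matches, key=lambda h: h[2])
    return box_center(best[0])


# Hypothetical OCR output for a login form (illustrative data only).
sample_hits: List[OcrHit] = [
    ([(100, 40), (220, 40), (220, 60), (100, 60)], "Username", 0.98),
    ([(100, 90), (210, 90), (210, 110), (100, 110)], "Password", 0.97),
    ([(120, 150), (200, 150), (200, 180), (120, 180)], "Sign in", 0.91),
]

target = find_click_target(sample_hits, "sign in")
print(target)  # (160, 165)
```

The returned point could then be passed to whichever mouse tool the server exposes for clicking. A `None` result is a signal to take a fresh screenshot or fall back to another strategy.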

Developers can apply these capabilities in several real-world scenarios. An AI assistant could navigate a spreadsheet, fill out forms, or pull data from a web dashboard by first taking a screenshot and running OCR to locate the relevant fields. In testing, an agent could simulate user interactions across multiple applications, validate UI states via OCR, and report failures back to a CI pipeline. Remote support bots can guide users through complex setups by controlling the host machine, provided the deployment logs each action so that sessions remain auditable.
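
The testing scenario can be sketched as a check that compares OCR text against the strings a screen is expected to show. This is an illustrative sketch under stated assumptions: the OCR text is already available as a list of lines, and `UiCheckResult`, `check_ui_text`, and the sample strings are hypothetical names and data, not part of the server.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UiCheckResult:
    """Outcome of one UI text check: the expected strings that were not found."""

    missing: List[str]

    @property
    def passed(self) -> bool:
        return not self.missing


def check_ui_text(ocr_lines: Iterable[str], expected: Iterable[str]) -> UiCheckResult:
    """Report which expected strings do not appear in any OCR line.

    Matching is a case-insensitive substring test, which tolerates extra
    text on the same line but not OCR misreads.
    """
    haystack = [line.casefold() for line in ocr_lines]
    missing = [e for e in expected if not any(e.casefold() in line for line in haystack)]
    return UiCheckResult(missing=missing)


# Hypothetical OCR text from a screenshot of a settings dialog.
ocr_lines = ["Settings", "Dark mode: On", "Save changes"]
result = check_ui_text(ocr_lines, ["Dark mode: On", "Saved successfully"])
print(result.passed, result.missing)  # False ['Saved successfully']

# A CI job could use this value as its process exit status.
exit_code = 0 if result.passed else 1
```

Exact substring matching is a deliberate simplification. Real OCR output can contain misread characters, so a production check would likely need fuzzy matching or a confidence threshold.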

The server fits into existing MCP workflows. It exposes its tools through the standard MCP tool interface, so Claude or any other MCP client can discover them with `tools/list` and invoke them with `tools/call`. Because it relies only on Python libraries (PyAutoGUI, RapidOCR, ONNX Runtime) and has no external binaries, the server can be deployed in isolated environments or containerized setups without installing separate system tools, although a container still needs a real or virtual display for PyAutoGUI to drive. Keeping the dependency set this small also reduces attack surface and simplifies compliance checks.
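
To make the invocation path concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends for `tools/call`. The envelope follows the MCP specification, but the tool name `click_screen` and its `x` and `y` arguments are hypothetical placeholders; the real names should be taken from the server's `tools/list` response. The helper `build_tool_call` is likewise an assumption made for illustration.

```python
import json
from itertools import count
from typing import Any, Dict

# Monotonic request ids, as JSON-RPC requires each request to carry an id.
_request_ids = count(1)


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "click_screen" and its arguments are hypothetical; check tools/list for real names.
request = build_tool_call("click_screen", {"x": 160, "y": 165})
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and transports this message, so hand-writing it is mainly useful for debugging or for understanding what crosses the wire.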

In summary, the Computer Control MCP lets an AI assistant act as a desktop operator. Its combination of mouse, keyboard, screenshot, and OCR tools gives developers the means to automate complex GUI tasks, extract data on the fly, and build agents that act on a live desktop as part of the same conversation.