MLX Whisper MCP Server

MCP Server

Apple Silicon Whisper transcription on demand

About

A lightweight Model Context Protocol server that uses the MLX Whisper large-v3-turbo model to transcribe audio files, base64 data, and YouTube videos directly on Apple Silicon Macs.

The MLX Whisper MCP Server delivers high-quality, on-device audio transcription to AI assistants by running OpenAI's open-source Whisper model through MLX, Apple's machine-learning framework for Apple Silicon. Instead of sending audio to a remote cloud service, the server runs locally on an M-series Mac, which keeps recordings private, avoids upload latency, and incurs no per-request API costs. For developers building conversational agents that need to understand spoken content, whether a podcast, a meeting recording, or a user-uploaded clip, the server provides a straightforward bridge between the assistant and the Mac's GPU acceleration.

The core value lies in its native integration with Claude Desktop (and any other MCP-compatible client). By exposing a small set of tools (transcribe a file, transcribe base64-encoded audio, and download and transcribe a YouTube video), the server lets assistants answer questions like “What was said in this recording?” or “Translate this Spanish audio to English” with minimal overhead. The large-v3-turbo model gives accurate transcription. Translation to English is also exposed, although OpenAI notes that the turbo variant was not trained for translation, so the full large-v3 model may translate better. The server's automatic dependency handling keeps setup friction low.
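The option handling behind these tools can be sketched as a small helper that maps an assistant's request onto keyword arguments for `mlx_whisper.transcribe`. This is a minimal sketch, assuming the `mlx-whisper` package's `transcribe(audio, path_or_hf_repo=..., **decode_options)` entry point, which accepts Whisper's `language` and `task` decode options, and the `mlx-community/whisper-large-v3-turbo` weights on Hugging Face. The helper name `build_transcribe_kwargs` and its defaults are hypothetical and not taken from this server's source.

```python
from typing import Any, Optional

# Assumption: Hugging Face repo holding the MLX port of Whisper large-v3-turbo.
DEFAULT_MODEL = "mlx-community/whisper-large-v3-turbo"


def build_transcribe_kwargs(
    language: Optional[str] = None,
    translate: bool = False,
    model: str = DEFAULT_MODEL,
) -> dict[str, Any]:
    """Hypothetical helper: map tool options onto mlx_whisper.transcribe kwargs."""
    kwargs: dict[str, Any] = {
        "path_or_hf_repo": model,
        # Whisper's "translate" task always targets English.
        "task": "translate" if translate else "transcribe",
    }
    # Forcing a language skips Whisper's automatic language detection.
    if language is not None:
        kwargs["language"] = language
    return kwargs
```

On an Apple Silicon Mac with `mlx-whisper` installed, a call might then look like `mlx_whisper.transcribe("interview.mp3", **build_transcribe_kwargs(language="es", translate=True))["text"]`. Leaving `language` unset lets Whisper detect the spoken language itself.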

Key capabilities include:

File‑based transcription: Directly process local audio files, preserving folder structure and generating accompanying text outputs.
Base64 audio handling: Accept base64-encoded audio data, useful for in-app recordings or other audio that a client passes inline rather than as a file path.
YouTube integration: Download videos on demand and transcribe them, eliminating the need for external download tools.
Language control: Force a specific language or translate content to English, giving assistants flexibility in multilingual contexts.
Rich console output: Immediate feedback during development and debugging, helping developers spot issues quickly.
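The file and base64 capabilities above can be sketched with the standard library alone: decode an inline payload to a temporary file that a Whisper model can read, and compute where a transcript should go so that the input folder layout is mirrored in an output folder. This is a minimal sketch under stated assumptions: the function names are hypothetical, the `.wav` default suffix is an assumption, and mirroring input paths into an output root is only one plausible reading of "preserving folder structure".

```python
import base64
import binascii
import tempfile
from pathlib import Path


def save_base64_audio(data: str, suffix: str = ".wav") -> Path:
    """Hypothetical helper: decode base64 audio into a temp file and return its path."""
    # Accept data URLs such as "data:audio/wav;base64,..." by dropping the prefix.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    try:
        raw = base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio payload is not valid base64") from exc
    # delete=False keeps the file after closing so a model can open it by path;
    # the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(raw)
        return Path(tmp.name)


def transcript_path(audio: Path, input_root: Path, output_root: Path) -> Path:
    """Hypothetical helper: mirror audio's location under input_root, as a .txt file."""
    relative = audio.relative_to(input_root)
    return (output_root / relative).with_suffix(".txt")
```

For example, `transcript_path(Path("/in/talks/day1.mp3"), Path("/in"), Path("/out"))` places the transcript at `/out/talks/day1.txt`.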

Typical use cases span a wide spectrum:

Customer support – automatically transcribe recorded calls for searchable logs.
Content creation – convert interviews or lectures into text for captions or articles.
Accessibility tools – generate subtitles for recorded media in apps that run on macOS.
Research assistants – transcribe conference talks or academic recordings for note‑taking.
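For the caption and subtitle use cases, plain text is often not enough; subtitles need timing. Whisper-style transcription results (including those from `mlx-whisper`) carry a `segments` list whose entries have `start` and `end` times in seconds plus `text`. Assuming the server, or a wrapper around it, exposes those segments, a minimal sketch of converting them to the SRT subtitle format looks like this; the function names are hypothetical.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Hypothetical helper: turn Whisper-style segments into numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Whisper segment text usually starts with a space; strip it for display.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```

Because the server as described returns plain text, this would sit as a post-processing step on the raw segments rather than on the final transcript string.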

Integration is straightforward: once the MCP server is registered in Claude Desktop, the assistant can call the corresponding tool whenever a prompt references an audio file or requests a YouTube transcription. The server returns plain text, which the assistant can then summarize, analyze, or translate further. Because transcription runs locally on Apple Silicon, developers get fast turnaround times and complete control over the pipeline: local files are processed without external APIs, and network access is needed only to download YouTube videos (and the model weights, if they are not already on disk).
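To make "returns plain text" concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP server sends back for a `tools/call` request, following the MCP specification's tool-result shape: a `content` list of typed items plus an `isError` flag. The function name `tool_result` is hypothetical, and a real server would normally let an MCP SDK build this message rather than assembling JSON by hand.

```python
import json
from typing import Any


def tool_result(request_id: int, transcript: str, is_error: bool = False) -> str:
    """Hypothetical helper: wrap a transcript in a JSON-RPC response to tools/call."""
    response: dict[str, Any] = {
        "jsonrpc": "2.0",
        # The response id must echo the id of the client's request.
        "id": request_id,
        "result": {
            # Plain-text tool output is a single content item of type "text".
            "content": [{"type": "text", "text": transcript}],
            "isError": is_error,
        },
    }
    return json.dumps(response)
```

The client hands the `text` item to the model, which is what lets the assistant summarize or translate the transcript in its next turn.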