
Fast Whisper MCP Server

High‑performance audio transcription with GPU acceleration

About

A lightweight MCP server that uses Faster Whisper to provide fast, batch‑enabled speech recognition. It supports multiple model sizes, CUDA acceleration, and outputs in VTT, SRT, or JSON formats.
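
As a rough sketch of what the server wraps, the underlying faster-whisper library can be driven as shown below; the model size, file name, and options here are illustrative, not the server's fixed choices:

```python
# Minimal faster-whisper usage sketch; the MCP server wraps logic like this.
# "large-v3" and "audio.mp3" are placeholders, not the server's defaults.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5, vad_filter=True)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    # Each segment carries start/end timestamps, usable for SRT/VTT output.
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text.strip()}")
```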

The Fast Whisper MCP server turns a powerful speech‑to‑text model into an AI‑ready service that can be called from Claude or any other MCP‑compatible assistant. By exposing Whisper's transcription logic through a lightweight MCP interface, developers can add accurate, multilingual audio understanding to chat flows without managing model weights or GPU resources themselves. The server is built on Faster Whisper, which optimizes the original Whisper architecture for speed and memory efficiency, making it suitable for both real‑time and batch transcription workloads.

At its core, the server offers three high‑level tools: get_model_info, transcribe, and batch_transcribe. get_model_info lets clients discover which Whisper variants (tiny, base, medium, large‑v3, etc.) are available and what their performance characteristics are. transcribe handles a single audio file, automatically selecting the best model size for the requested language and returning output in one of several common formats: VTT subtitles, SRT captions, or JSON transcripts with timestamps. batch_transcribe extends this to folders of audio files, using a dynamic batching strategy that adapts the number of simultaneous inferences to the GPU's memory capacity, maximizing throughput while preventing out‑of‑memory errors.
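
A hypothetical sketch of how such tools could be registered with the MCP Python SDK's FastMCP helper; the tool names follow the description above, but the signatures and bodies are illustrative:

```python
# Hypothetical tool registration using the MCP Python SDK (FastMCP).
# Tool names mirror those above; parameters and bodies are assumptions.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("fast-whisper")

@mcp.tool()
def get_model_info() -> dict:
    """Report which Whisper variants are available on this host."""
    return {"models": ["tiny", "base", "small", "medium", "large-v3"]}

@mcp.tool()
def transcribe(audio_path: str, language: str = "auto",
               output_format: str = "vtt") -> str:
    """Transcribe one audio file; output_format may be vtt, srt, or json."""
    ...  # load or reuse a cached model, run inference, render the format

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```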

The server's value lies in its seamless integration with AI workflows. A Claude conversation can simply invoke transcribe to convert a user‑uploaded voice note into text, or use batch_transcribe to process an entire podcast episode split into segments. Because the MCP protocol handles authentication, request routing, and response formatting automatically, developers can focus on higher‑level logic, such as summarizing transcripts, feeding them into downstream language models, or generating subtitles on the fly, without writing custom inference code. Thanks to CUDA auto‑detection, the server accelerates processing when a compatible GPU is present and falls back to CPU execution otherwise, ensuring broad hardware compatibility.
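
The CUDA fallback described above takes only a few lines; this sketch probes for a GPU via torch, which is one common approach rather than necessarily this server's:

```python
# One common way to implement CUDA auto-detection with CPU fallback;
# whether this server probes via torch is an assumption.
import torch
from faster_whisper import WhisperModel

if torch.cuda.is_available():
    device, compute_type = "cuda", "float16"  # fast half-precision on GPU
else:
    device, compute_type = "cpu", "int8"      # quantized weights on CPU

model = WhisperModel("base", device=device, compute_type=compute_type)
```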

Key advantages include model caching (so the same Whisper instance is reused across requests), automatic batch‑size tuning based on available GPU memory, and optional Voice Activity Detection (VAD) filtering that trims silence for longer recordings. These optimizations translate to lower latency and higher throughput, making the server suitable for both interactive chat scenarios and large‑scale transcription pipelines. Whether you’re building a voice‑enabled customer support bot, generating closed captions for video content, or simply adding speech input to a data‑analysis workflow, the Fast Whisper MCP server provides an out‑of‑the‑box, high‑performance solution that plugs directly into existing AI toolchains.
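
A plausible shape for the model caching and memory‑based batch tuning follows; the cache keying and the per‑stream memory constant are assumptions for illustration, not the server's actual values:

```python
# Illustrative model cache plus GPU-memory-based batch sizing;
# the keying scheme and the 1 GiB/stream heuristic are assumptions.
import torch
from faster_whisper import WhisperModel

_model_cache: dict[tuple[str, str], WhisperModel] = {}

def get_model(size: str, device: str) -> WhisperModel:
    """Reuse one WhisperModel per (size, device) across requests."""
    key = (size, device)
    if key not in _model_cache:
        compute_type = "float16" if device == "cuda" else "int8"
        _model_cache[key] = WhisperModel(size, device=device,
                                         compute_type=compute_type)
    return _model_cache[key]

def pick_batch_size(per_stream_bytes: int = 1 << 30) -> int:
    """Scale concurrent inferences to free GPU memory (guessed 1 GiB each)."""
    if not torch.cuda.is_available():
        return 1
    free, _total = torch.cuda.mem_get_info()
    return max(1, free // per_stream_bytes)
```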