About
A lightweight MCP server that uses Faster Whisper to provide fast, batch‑enabled speech recognition. It supports multiple model sizes, CUDA acceleration, and outputs in VTT, SRT, or JSON formats.
Capabilities
Fast Whisper MCP Server
The Fast Whisper MCP server turns a powerful speech‑to‑text model into an AI‑ready service that can be called from Claude or any other MCP‑compatible assistant. By exposing Whisper's transcription logic through a lightweight MCP interface, developers can add accurate, multilingual audio understanding to chat flows without managing model weights or GPU resources themselves. The server is built on Faster Whisper, which optimizes the original Whisper architecture for speed and memory efficiency, making it suitable for both real‑time and batch transcription workloads.
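To make that concrete, here is a minimal sketch of how such a tool could be exposed with the official MCP Python SDK (FastMCP) on top of Faster Whisper. It is illustrative only; the parameter names are assumptions, not the server's actual schema.

```python
# Illustrative sketch: exposing a Whisper transcription tool over MCP.
# Parameter names here are assumptions about the real server's schema.
from faster_whisper import WhisperModel
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("fast-whisper")

@mcp.tool()
def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a single audio file and return plain text."""
    model = WhisperModel(model_size)  # a real server would cache this; see below
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text for segment in segments)

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```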
At its core, the server offers three high‑level tools: get_model_info, transcribe, and batch_transcribe. get_model_info lets clients discover which Whisper variants (tiny, base, medium, large‑v3, etc.) are available and what their performance characteristics are. transcribe handles a single audio file, automatically selecting the best model size for the requested language and returning output in one of several common formats: VTT subtitles, SRT captions, or JSON transcripts with timestamps. batch_transcribe extends this to folders of audio files, leveraging a dynamic batching strategy that adapts the number of simultaneous inferences to the GPU's memory capacity, maximizing throughput while preventing out‑of‑memory errors.
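For illustration, a client might invoke the transcribe tool through the MCP Python SDK as in the sketch below. The server script name and the argument keys (audio_path, output_format) are assumptions about the schema that the server's tool listing would confirm.

```python
# Hedged sketch of calling the transcribe tool from an MCP client session.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # "whisper_server.py" is a hypothetical entry point for this server
    params = StdioServerParameters(command="python", args=["whisper_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "transcribe",
                {"audio_path": "meeting.mp3", "output_format": "srt"},
            )
            print(result.content)

asyncio.run(main())
```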
The server's value lies in its seamless integration with AI workflows. A Claude conversation can simply invoke transcribe to convert a user‑uploaded voice note into text, or use batch_transcribe to process an entire podcast episode split into segments. Because the MCP protocol handles authentication, request routing, and response formatting automatically, developers can focus on higher‑level logic, such as summarizing transcripts, feeding them into downstream language models, or generating subtitles on the fly, without writing custom inference code. The CUDA auto‑detection feature means that, if a compatible GPU is present, the server accelerates processing; otherwise it falls back to CPU execution, ensuring broad hardware compatibility.
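One plausible way to implement that GPU fallback with Faster Whisper is to attempt a CUDA load and drop down to CPU on failure. This is a sketch under that assumption, not the server's actual code.

```python
# Illustrative CUDA auto-detection: try the GPU first, fall back to CPU.
from faster_whisper import WhisperModel

def load_model(model_size: str = "large-v3") -> WhisperModel:
    try:
        # float16 roughly halves memory use on GPUs that support it
        return WhisperModel(model_size, device="cuda", compute_type="float16")
    except (RuntimeError, ValueError):
        # No compatible GPU or missing CUDA libraries: run on CPU instead
        return WhisperModel(model_size, device="cpu", compute_type="int8")
```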
Key advantages include model caching (so the same Whisper instance is reused across requests), automatic batch‑size tuning based on available GPU memory, and optional Voice Activity Detection (VAD) filtering that trims silence for longer recordings. These optimizations translate to lower latency and higher throughput, making the server suitable for both interactive chat scenarios and large‑scale transcription pipelines. Whether you’re building a voice‑enabled customer support bot, generating closed captions for video content, or simply adding speech input to a data‑analysis workflow, the Fast Whisper MCP server provides an out‑of‑the‑box, high‑performance solution that plugs directly into existing AI toolchains.
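The caching and VAD behaviors could look roughly like the following sketch. The cache structure and the VAD parameter value are assumptions; vad_filter and vad_parameters themselves are standard Faster Whisper options.

```python
# Sketch of the optimizations above: a simple model cache keyed by size and
# device, plus Faster Whisper's built-in VAD filter that skips silent spans.
from faster_whisper import WhisperModel

_model_cache: dict[tuple[str, str], WhisperModel] = {}

def get_cached_model(model_size: str, device: str = "cpu") -> WhisperModel:
    # Reuse the same Whisper instance across requests for a given config
    key = (model_size, device)
    if key not in _model_cache:
        _model_cache[key] = WhisperModel(model_size, device=device)
    return _model_cache[key]

def transcribe_with_vad(audio_path: str) -> list[str]:
    model = get_cached_model("base")
    # vad_filter trims silence; min_silence_duration_ms tunes aggressiveness
    segments, _ = model.transcribe(
        audio_path,
        vad_filter=True,
        vad_parameters={"min_silence_duration_ms": 500},
    )
    return [segment.text for segment in segments]
```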