About
A Gradio MCP server that downloads media from a given URL, converts it to WAV, and uses OpenAI Whisper to produce English transcriptions. It exposes a single MCP tool, transcribe_url, for easy integration with MCP clients.
Overview
This server is a ready-to-run Gradio application that doubles as an MCP server, offering a single, well-defined tool: transcribe_url. It solves the common problem of turning arbitrary online audio or video into clean, machine-readable text. Using OpenAI's Whisper model and dedicated media-handling utilities, the server downloads content from a public URL, normalizes it to WAV, and produces an English transcription. This spares developers from building their own download-and-transcribe pipelines, saving time and reducing complexity.
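The normalization step can be sketched as follows. This is a minimal illustration, not the server's actual code: it assumes ffmpeg is the converter (the description does not name the tool), and the helper names `build_wav_command` and `wav_path_for` are hypothetical. The 16 kHz mono target is chosen because Whisper models operate on 16 kHz audio.

```python
from pathlib import Path


def wav_path_for(source: str) -> str:
    """Return the source path with its extension swapped for .wav."""
    return str(Path(source).with_suffix(".wav"))


def build_wav_command(source: str, target: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argument list that converts `source` to mono 16-bit PCM WAV.

    Assumption: ffmpeg is installed and on PATH; the original description
    does not say which converter the server uses.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", source,             # input media in any format ffmpeg can decode
        "-vn",                    # drop any video stream, keep audio only
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        target,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an exception if ffmpeg exits with a non-zero status.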
The server’s value lies in its simplicity and portability. It can run locally on a developer’s machine or be hosted as a public Hugging Face Space, making it accessible to both internal teams and external clients. Because the tool is exposed via MCP, any AI assistant that understands the protocol—such as Claude or custom agents—can invoke it directly from a conversation, seamlessly integrating transcription into larger workflows (e.g., summarizing meeting recordings or extracting insights from podcasts).
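As a rough sketch of how a Gradio app can expose a function as an MCP tool, the snippet below wraps a placeholder `transcribe_url` in a `gr.Interface` and starts it with `mcp_server=True`. It assumes a recent Gradio release installed with the MCP extra (`pip install "gradio[mcp]"`); the function body, the labels, and the `build_app` and `main` helpers are illustrative and not taken from the server's source.

```python
def transcribe_url(url: str) -> str:
    """Transcribe the media at a public URL to English text.

    Args:
        url: Public link to an audio or video resource.

    Returns:
        The English transcription of the media.
    """
    # Placeholder: the real tool would download, convert to WAV, and run Whisper.
    raise NotImplementedError("wire in download, conversion, and Whisper here")


def build_app():
    """Create the Gradio interface; gradio is imported lazily so the module loads without it."""
    import gradio as gr

    return gr.Interface(
        fn=transcribe_url,
        inputs=gr.Textbox(label="Media URL"),
        outputs=gr.Textbox(label="Transcription"),
    )


def main() -> None:
    # mcp_server=True asks Gradio to serve the function as an MCP tool
    # alongside the regular web UI.
    build_app().launch(mcp_server=True)
```

Gradio derives the tool description from the function's docstring and type hints, which is why the sketch documents the `url` argument explicitly. Calling `main()` starts the app locally; the same code can be deployed as a Hugging Face Space.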
Key features include:
- URL‑based input: Accepts any reachable media link, whether it’s a YouTube video, an MP3 stream, or a hosted clip.
- Automatic format conversion: Downloads the media and converts diverse source formats into a single, Whisper-friendly WAV file.
- Device flexibility: Detects and uses GPU acceleration when available and falls back to CPU otherwise, so the same server runs on a wide range of hardware.
- Robust error handling: Provides clear, user‑friendly messages if the download fails or if Whisper encounters an issue.
- SSE‑compatible MCP endpoint: The Gradio app exposes the tool via a Server‑Sent Events URL, making it straightforward to plug into any MCP client that supports streaming responses.
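The device fallback and error handling described in the list above might look something like this. It is a sketch under assumptions: it presumes Whisper runs locally on PyTorch, and `pick_device`, `safe_transcribe`, and the injected `download` and `transcribe` callables are hypothetical names used only for illustration.

```python
import importlib.util
from typing import Callable


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so only CPU is possible
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def safe_transcribe(
    url: str,
    download: Callable[[str], str],
    transcribe: Callable[[str, str], str],
) -> str:
    """Run download then transcription, turning failures into readable messages.

    `download` maps a URL to a local audio path; `transcribe` maps
    (audio_path, device) to text. Both are injected stand-ins for the
    server's real steps.
    """
    try:
        audio_path = download(url)
    except Exception as exc:
        return f"Download failed for {url}: {exc}"
    try:
        return transcribe(audio_path, pick_device())
    except Exception as exc:
        return f"Transcription failed: {exc}"
```

Returning a message instead of raising keeps the tool's output a plain string, so an MCP client receives something it can show to the user whether or not the pipeline succeeds.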
Typical use cases range from content creators who need quick subtitles, to customer support teams transcribing recorded calls, to researchers converting lecture recordings into searchable text. In an AI workflow, a conversational agent could ask the MCP server to transcribe a new video link, then pass the resulting text to downstream summarization or sentiment-analysis tools, all within a single dialogue. The combination of Gradio's UI and MCP's extensibility makes the server a practical choice for projects that need on-demand audio or video transcription.
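Before an agent can call the tool in a workflow like the one above, its MCP client needs the server's SSE URL. The helper below builds a client configuration entry. It assumes Gradio's usual MCP path (`/gradio_api/mcp/sse`) and the common `mcpServers` layout; the server name `transcriber` and the function `mcp_client_config` are hypothetical, and the exact path should be checked against the running app.

```python
def mcp_client_config(base_url: str, name: str = "transcriber") -> dict:
    """Build an MCP client config entry pointing at a Gradio app's SSE endpoint.

    Assumption: the app serves MCP at /gradio_api/mcp/sse, as Gradio apps
    launched with mcp_server=True typically do.
    """
    sse_url = base_url.rstrip("/") + "/gradio_api/mcp/sse"
    return {"mcpServers": {name: {"url": sse_url}}}
```

For a local run, `mcp_client_config("http://127.0.0.1:7860")` yields an entry that can be serialized with `json.dumps` and merged into the client's configuration file.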