MCPSERV.CLUB
bsmnyk

Gradio Transcript MCP Server

MCP Server

Transcribe audio/video from URLs via Whisper

Updated May 4, 2025

About

A Gradio MCP server that downloads media from a given URL, converts it to WAV, and uses OpenAI Whisper to produce English transcriptions. It exposes a single MCP tool, transcribe_url, for easy integration with MCP clients.

Gradio Transcript MCP in Action

Overview

Gradio Transcript MCP Server is a ready‑to‑run Gradio application that doubles as an MCP server, offering a single, well‑defined tool: transcribe_url. It addresses a common problem: turning online audio or video into machine‑readable text. Using OpenAI’s Whisper model together with media download and conversion utilities, the server fetches content from a public URL, normalizes it to WAV, and produces an English transcription. Developers therefore do not need to build their own download‑and‑transcribe pipeline.
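
The download, convert, and transcribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the three step functions (`download`, `to_wav`, `transcribe`) are hypothetical callables passed in by the caller, and the error strings are invented for the example. In a real deployment the last step would presumably wrap a Whisper model, but that wiring is left out here so the control flow can run without network access or model weights.

```python
from typing import Callable


def transcribe_url(
    url: str,
    download: Callable[[str], str],
    to_wav: Callable[[str], str],
    transcribe: Callable[[str], str],
) -> str:
    """Download media from `url`, convert it to WAV, and return the transcript.

    Each step is an injected callable (hypothetical helper) that takes a
    string and returns a string: a local path for the first two steps and
    the transcript text for the last one.
    """
    if not url.strip():
        return "Error: no URL provided."
    try:
        media_path = download(url)
    except Exception as exc:  # report the failure instead of raising
        return f"Error: download failed: {exc}"
    try:
        wav_path = to_wav(media_path)
    except Exception as exc:
        return f"Error: conversion to WAV failed: {exc}"
    try:
        return transcribe(wav_path).strip()
    except Exception as exc:
        return f"Error: transcription failed: {exc}"


if __name__ == "__main__":
    # Stand-in steps that only manipulate strings.
    text = transcribe_url(
        "https://example.com/talk.mp4",
        download=lambda u: "/tmp/talk.mp4",
        to_wav=lambda p: p.rsplit(".", 1)[0] + ".wav",
        transcribe=lambda p: f" transcript of {p} ",
    )
    print(text)
```

Returning a message rather than raising keeps the tool's output a plain string in every case, which is one simple way to surface failures to an MCP client; the real server may handle errors differently.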

The server’s value lies in its simplicity and portability. It can run locally on a developer’s machine or be hosted as a public Hugging Face Space, making it accessible to both internal teams and external clients. Because the tool is exposed via MCP, any AI assistant that understands the protocol, such as Claude or a custom agent, can invoke it directly from a conversation and fold transcription into larger workflows (e.g., summarizing meeting recordings or extracting insights from podcasts).

Key features include:

  • URL‑based input: Accepts any reachable media link, whether it’s a YouTube video, an MP3 stream, or a hosted clip.
  • Automatic format conversion: Downloads the media and converts diverse source formats into a single, Whisper‑friendly WAV file.
  • Device flexibility: Detects and uses GPU acceleration when available and falls back to CPU otherwise, so the same server runs on a wide range of hardware.
  • Robust error handling: Provides clear, user‑friendly messages if the download fails or if Whisper encounters an issue.
  • SSE‑compatible MCP endpoint: The Gradio app exposes the tool via a Server‑Sent Events URL, making it straightforward to plug into any MCP client that supports streaming responses.
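
Two of the features above lend themselves to a short sketch: the GPU/CPU fallback and the SSE endpoint. The helper names below are hypothetical. The `/gradio_api/mcp/sse` path and port 7860 follow what Gradio documents for apps launched as MCP servers, and the `mcpServers` layout is the configuration shape many MCP clients accept; both are assumptions here, since the listing does not state the exact URL or client format.

```python
import json


def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu".

    In practice the flag would likely come from torch.cuda.is_available();
    it is a plain argument here so the sketch has no dependencies.
    """
    return "cuda" if cuda_available else "cpu"


def mcp_client_config(server_name: str, base_url: str) -> dict:
    """Build a client config entry pointing at the app's SSE endpoint.

    Assumption: the Gradio app serves MCP over SSE at /gradio_api/mcp/sse.
    """
    sse_url = base_url.rstrip("/") + "/gradio_api/mcp/sse"
    return {"mcpServers": {server_name: {"url": sse_url}}}


if __name__ == "__main__":
    print(pick_device(cuda_available=False))
    config = mcp_client_config("transcript", "http://localhost:7860/")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into a client's MCP configuration file, adjusting the base URL if the app is hosted on a Hugging Face Space rather than locally.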

Typical use cases range from content creators who need quick subtitles, to customer support teams transcribing recorded calls, to researchers converting lecture recordings into searchable text. In an AI workflow, a conversational agent could ask the MCP server to transcribe a new video link, then pass the resulting text to downstream summarization or sentiment‑analysis tools, all within a single dialogue. The combination of Gradio’s UI and MCP’s extensibility makes Gradio Transcript MCP Server a practical option for projects that need on‑demand audio/video transcription.