About
A lightweight MCP server that downloads videos from platforms like YouTube, Bilibili, TikTok, and Twitter, extracts audio, and transcribes it using multiple STT providers (Deepgram, Gladia, Speechmatics, AssemblyAI). It supports asynchronous processing, speaker separation, and automatic provider selection based on API keys.
Capabilities

MCP Video Digest is a dedicated MCP server that turns online videos into searchable, speaker-aware transcripts. It integrates with transcription providers such as Deepgram, Gladia, Speechmatics, and AssemblyAI, and it handles the whole pipeline: downloading media from more than a thousand sites (including YouTube, Bilibili, TikTok, and Twitter), extracting the audio track, and converting it into plain text. Developers can therefore add video understanding to AI assistants without implementing media processing themselves.
The server exposes a single, well-defined MCP tool. When invoked with a video URL, the tool first uses an asynchronous downloader (built on yt-dlp) to fetch the best available audio stream. It then selects a speech-to-text backend based on which API keys are configured, following a fixed priority order (Deepgram → Gladia → Speechmatics → AssemblyAI). The chosen backend performs the transcription and can optionally return speaker diarization data. All intermediate files are cleaned up automatically, and detailed logs are kept for troubleshooting. This modular architecture allows developers to swap or add new transcription services simply by extending the class.
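The priority-based provider selection described above can be sketched as follows. This is a minimal illustration and not the server's actual code: the environment variable names and the `select_provider` helper are assumptions made for the example, and only the provider order comes from the description.

```python
import os
from typing import Mapping, Optional

# Priority order taken from the description above.
# The environment variable names are assumptions for illustration only.
PROVIDER_PRIORITY = [
    ("deepgram", "DEEPGRAM_API_KEY"),
    ("gladia", "GLADIA_API_KEY"),
    ("speechmatics", "SPEECHMATICS_API_KEY"),
    ("assemblyai", "ASSEMBLYAI_API_KEY"),
]


def select_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider in priority order that has a non-empty API key.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test. Returns None when no key is configured.
    """
    env = os.environ if env is None else env
    for name, key_var in PROVIDER_PRIORITY:
        if env.get(key_var, "").strip():
            return name
    return None
```

With only Gladia and AssemblyAI keys present, this sketch picks Gladia, because it comes earlier in the priority list.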
For AI workflows, the tool can be called from Claude or another MCP-enabled assistant to generate concise summaries, extract key phrases, or feed the transcript into downstream NLP pipelines. Because the server handles concurrency internally, multiple requests can be processed in parallel, which suits high-throughput environments such as content moderation platforms or educational material generators. Speaker labels are especially useful for meeting transcription, podcast analysis, or any scenario where it matters who said what.
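One common way to process several requests in parallel while capping the load is an `asyncio` semaphore around each job. The sketch below shows that general pattern only; how the server actually schedules work is not documented here, and `digest_many`, `fake_digest`, and `max_concurrent` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def digest_many(
    urls: Sequence[str],
    digest_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> List[str]:
    """Run digest_one over all URLs, at most max_concurrent at a time.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        # The semaphore limits how many jobs are in flight at once.
        async with semaphore:
            return await digest_one(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_digest(url: str) -> str:
    """Stand-in for a real download-and-transcribe call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"transcript for {url}"


if __name__ == "__main__":
    print(asyncio.run(digest_many(["video-a", "video-b"], fake_digest)))
```

A bounded semaphore keeps a burst of requests from exhausting bandwidth or provider rate limits, which is a typical concern when downloads and transcription calls run side by side.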
Real‑world use cases include:
- Content creators who need quick captions or subtitles for their videos.
- Educators who want searchable lecture transcripts without manual transcription.
- Accessibility services that generate closed captions for compliance.
- Analytics teams extracting sentiment or topic trends from social media videos.
The MCP Video Digest server stands out because it combines broad download coverage, flexible provider selection, and asynchronous processing, all within the MCP framework. It can be plugged into existing AI assistant ecosystems with little integration work, so developers can focus on higher-level logic rather than media handling.