MCPSERV.CLUB
R-lz

MCP Video Digest

MCP Server

Extract and transcribe video content from any site

Updated 26 days ago

About

A lightweight MCP server that downloads videos from platforms like YouTube, Bilibili, TikTok, and Twitter, extracts audio, and transcribes it using multiple STT providers (Deepgram, Gladia, Speechmatics, AssemblyAI). It supports asynchronous processing, speaker separation, and automatic provider selection based on API keys.

MCP Video Digest is a dedicated MCP server that turns online videos into searchable, speaker‑aware transcripts. It downloads media from more than a thousand sites (including YouTube, Bilibili, TikTok, and Twitter), extracts the audio track, and converts it to plain text through a transcription provider such as Deepgram, Gladia, Speechmatics, or AssemblyAI. This end‑to‑end pipeline is useful for developers who want AI assistants to understand video content without building the media processing themselves.
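The download and audio-extraction step can be sketched with yt‑dlp's Python API. This is a minimal illustration, not the server's actual code: the helper names `build_audio_options` and `download_audio`, the MP3 target format, and the output template are all assumptions, and the download itself needs yt‑dlp and ffmpeg installed.

```python
from pathlib import Path


def build_audio_options(out_dir: str) -> dict:
    """Return yt-dlp options that fetch the best audio stream and convert it to MP3.

    Hypothetical helper: the MP3 codec and file naming are illustrative choices.
    """
    return {
        "format": "bestaudio/best",  # prefer an audio-only stream when one exists
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"},
        ],
        "quiet": True,
        "noplaylist": True,
    }


def download_audio(url: str, out_dir: str) -> Path:
    """Download one video's audio track; requires yt-dlp and ffmpeg."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_audio_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    # After the FFmpegExtractAudio post-processor runs, the file has an .mp3 extension.
    return Path(out_dir) / f"{info['id']}.mp3"
```

The resulting audio file would then be handed to whichever transcription provider is selected.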

The server exposes a single, well‑defined MCP tool. When invoked with a video URL, the tool first uses an asynchronous downloader (built on yt‑dlp) to fetch the best available audio stream. It then selects a speech‑to‑text backend from those with a configured API key, following a fixed priority order (Deepgram → Gladia → Speechmatics → AssemblyAI). The chosen backend performs the transcription and can optionally return speaker diarization data. Intermediate files are cleaned up automatically, and detailed logs are kept for troubleshooting. Because the design is modular, developers can swap in or add transcription services by extending the server's transcription service class.
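The priority-based backend selection described above might look roughly like the following. The priority order comes from the description; the environment variable names and the `select_provider` function are hypothetical and may not match what the server actually reads.

```python
import os
from typing import Mapping, Optional

# Priority order as stated in the description.
# The environment variable names are assumptions for illustration.
PROVIDER_PRIORITY = [
    ("deepgram", "DEEPGRAM_API_KEY"),
    ("gladia", "GLADIA_API_KEY"),
    ("speechmatics", "SPEECHMATICS_API_KEY"),
    ("assemblyai", "ASSEMBLYAI_API_KEY"),
]


def select_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider in priority order whose API key is set."""
    env = os.environ if env is None else env
    for name, key_var in PROVIDER_PRIORITY:
        if env.get(key_var, "").strip():
            return name
    raise RuntimeError("no speech-to-text API key configured")
```

With only Gladia and AssemblyAI keys present, this sketch picks Gladia, because it comes earlier in the priority list.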

For AI workflows, the tool can be called from Claude or another MCP‑enabled assistant to generate concise summaries, extract key phrases, or feed the transcript into downstream NLP pipelines. Because the server handles concurrency internally, multiple requests can be processed in parallel, which suits high‑throughput environments such as content moderation platforms or educational material generators. Speaker labels are especially useful for meeting transcription, podcast analysis, and any other scenario where it matters who said what.
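One common way to process several requests in parallel while capping the load is an `asyncio.Semaphore`. The sketch below assumes that pattern; it is not taken from the server's source. `transcribe_all` and `fake_transcribe` are hypothetical names, and the stand-in coroutine replaces the real download-and-transcribe call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def transcribe_all(
    urls: List[str],
    transcribe: Callable[[str], Awaitable[str]],
    max_concurrent: int = 4,
) -> List[str]:
    """Run `transcribe` over all URLs with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url: str) -> str:
        async with semaphore:
            return await transcribe(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))


async def fake_transcribe(url: str) -> str:
    """Stand-in for the real download-and-transcribe call (illustration only)."""
    await asyncio.sleep(0.01)
    return f"transcript of {url}"
```

Calling `asyncio.run(transcribe_all(urls, fake_transcribe, max_concurrent=2))` would process the URLs two at a time and return the transcripts in input order.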

Real‑world use cases include:

  • Content creators who need quick captions or subtitles for their videos.
  • Educators who want searchable lecture transcripts without manual transcription.
  • Accessibility services that generate closed captions for compliance.
  • Analytics teams extracting sentiment or topic trends from social media videos.

The MCP Video Digest server stands out because it combines broad download coverage, flexible provider selection, and asynchronous processing within the MCP framework. It can therefore be plugged into existing AI assistant ecosystems with little integration work, letting developers focus on higher‑level logic rather than media handling.