Mcp Tts Kokoro

MCP Server

Text-to-Speech via Gradio with SSE MCP support

Updated Jun 3, 2025

About

A lightweight MCP server that converts English text into speech using the Kokoro TTS engine, featuring gender selection and adjustable voice speed. It runs on Apple M1 silicon and can expose a public URL through Gradio Share.

MCP‑TTS‑Kokoro Demo

Overview

The MCP‑TTS‑Kokoro server is a lightweight, Gradio‑based text‑to‑speech (TTS) service that exposes the Kokoro TTS engine through the Model Context Protocol (MCP). It turns plain English text into spoken audio, so AI assistants can produce natural speech responses without external TTS services or complex infrastructure. It communicates over MCP's SSE (Server‑Sent Events) transport and streams audio back with low latency, so Claude or other MCP clients can consume the output directly.
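As a rough illustration of how a client might be pointed at such a server, the sketch below builds an `mcpServers`-style configuration entry. Everything specific in it is an assumption rather than something this project documents: the base URL is Gradio's usual local default, the `/gradio_api/mcp/sse` path follows Gradio's convention for MCP-enabled apps, and `build_client_config` and the `kokoro-tts` server name are hypothetical. Config key names also differ between MCP clients, so check the client's own documentation.

```python
import json

# Assumptions: Gradio's usual local address and its conventional MCP SSE path.
# Neither value is taken from this project's documentation.
ASSUMED_BASE_URL = "http://127.0.0.1:7860"
ASSUMED_SSE_PATH = "/gradio_api/mcp/sse"


def build_client_config(server_name: str, base_url: str = ASSUMED_BASE_URL) -> dict:
    """Return an mcpServers-style entry pointing at the assumed SSE endpoint."""
    url = base_url.rstrip("/") + ASSUMED_SSE_PATH
    return {"mcpServers": {server_name: {"url": url}}}


if __name__ == "__main__":
    # "kokoro-tts" is a hypothetical label chosen for this example.
    print(json.dumps(build_client_config("kokoro-tts"), indent=2))
```

If the server is exposed through a Gradio share link instead, the same helper could be called with that link as `base_url`.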

Problem Solved

Many conversational agents need to vocalize their outputs, but integrating commercial TTS APIs often introduces latency, cost, or licensing constraints. The MCP‑TTS‑Kokoro server removes these hurdles by running entirely locally on modest hardware, including Apple M1 silicon. Developers can deploy it on personal machines, edge devices, or private cloud environments. Because synthesis happens locally, text does not have to be sent to a third‑party TTS service.

Core Functionality

Text ingestion: Accepts English text strings via MCP requests.
Audio generation: Uses the Kokoro TTS engine to synthesize speech, supporting both male and female voices.
Speed control: Allows voice speed adjustment from 0.5× to 1.5×, giving fine‑grained control over pacing.
Streaming output: Sends audio back to the client as a continuous SSE stream, enabling real‑time playback without waiting for the entire file.
Public access: The embedded Gradio interface can generate a shareable URL, simplifying quick demonstrations or collaborative testing.
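The input handling listed above can be sketched as a small validation step that runs before text reaches the TTS engine. This is a minimal sketch, not the server's actual code: `SpeechRequest`, `normalize_request`, and the `"female"`/`"male"` labels are hypothetical names, and clamping out-of-range speeds (rather than rejecting them) is an assumed design choice. Only the 0.5× to 1.5× range and the two voice genders come from the description.

```python
from dataclasses import dataclass

# Speed range and voice options as described above; all names are hypothetical.
MIN_SPEED = 0.5
MAX_SPEED = 1.5
VOICES = ("female", "male")


@dataclass(frozen=True)
class SpeechRequest:
    text: str
    voice: str = "female"
    speed: float = 1.0


def normalize_request(text: str, voice: str = "female", speed: float = 1.0) -> SpeechRequest:
    """Validate the inputs and clamp speed into the supported range."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    voice_key = voice.strip().lower()
    if voice_key not in VOICES:
        raise ValueError(f"voice must be one of {VOICES}, got {voice!r}")
    # Assumed behavior: out-of-range speeds are clamped instead of rejected.
    clamped = min(MAX_SPEED, max(MIN_SPEED, float(speed)))
    return SpeechRequest(text=cleaned, voice=voice_key, speed=clamped)
```

For example, `normalize_request("  Hello  ", "Male", 2.0)` yields a request with text `"Hello"`, voice `"male"`, and speed `1.5`. The normalized request would then be handed to whatever function wraps the Kokoro engine.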

Use Cases

Voice‑enabled chatbots: Convert AI responses into spoken dialogue for accessibility or immersive experiences.
Educational tools: Provide auditory explanations of text content, aiding language learners and visually impaired users.
Interactive storytelling: Generate narration for games or virtual tours, with adjustable pacing to match scene dynamics.
Internal testing: Quickly prototype voice outputs during development without incurring external API costs.

Integration with AI Workflows

Developers can embed the MCP‑TTS‑Kokoro server into existing pipelines by calling its endpoint via standard MCP tooling. The SSE stream can be piped directly into audio playback libraries or forwarded to downstream services (e.g., speech‑to‑text for confirmation). Because it follows the MCP specification, any Claude instance or other AI assistant that supports MCP can natively invoke the TTS service as part of a larger chain, such as generating a spoken summary after processing user input.
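To make the "piping the SSE stream" step more concrete, here is a minimal sketch of a parser for the SSE wire format (`event:` and `data:` fields, with a blank line ending each event). It assumes nothing about what this server puts in the `data` payload, which the description does not specify; `SSEEvent` and `parse_sse` are hypothetical names. In practice a maintained MCP client library would normally handle this layer.

```python
from typing import Iterable, Iterator, List, NamedTuple


class SSEEvent(NamedTuple):
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from an iterable of text lines in SSE wire format."""
    event_type = "message"  # default event type when no "event:" field is sent
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if it carried no data.
            if data_parts:
                yield SSEEvent(event_type, "\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

The function accepts any iterable of decoded lines, so it could be fed from an HTTP response's line iterator. Each yielded event can then be decoded and passed to a playback library or a downstream service.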

Distinct Advantages

Hardware friendliness: Designed to run efficiently on Apple M1 chips, making it ideal for developers with limited resources.
Open‑source transparency: Built on the Kokoro engine, which is fully open source and free from commercial licensing.
Simplicity: The Gradio UI lowers the barrier to entry, allowing quick visual validation while still exposing a robust MCP API.
Real‑time streaming: SSE support ensures minimal delay between request and audible output, a critical factor for conversational applications.

In summary, MCP‑TTS‑Kokoro is a locally run, privacy‑preserving TTS server that plugs into AI assistants via MCP, letting developers add voice output with little setup while keeping control over where the audio is generated.