angrysky56

Zonos TTS MCP for Linux

MCP Server

Linux‑native Claude TTS via Zonos API


About

A Model Context Protocol server that lets Claude generate and play speech on Linux using the Zonos TTS system, supporting multiple languages, emotions, and PulseAudio/PipeWire playback.
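As a rough illustration of the playback path described above, the sketch below fetches synthesized audio from a locally running Zonos endpoint and hands it to PulseAudio/PipeWire via paplay. The URL, request fields (text, language, emotion), and response format are assumptions for illustration, not the project's documented API.

```python
# Illustrative only: the Zonos endpoint URL and request/response shape below
# are assumptions; consult the actual Zonos API for the real interface.
import subprocess
import tempfile

import requests


def speak_via_zonos(text: str, language: str = "en-us", emotion: str = "neutral") -> None:
    # Ask a (hypothetical) local Zonos server for a WAV rendering of the text.
    resp = requests.post(
        "http://localhost:8000/tts",  # assumed local Zonos endpoint
        json={"text": text, "language": language, "emotion": emotion},
        timeout=60,
    )
    resp.raise_for_status()

    # Write the audio to a temp file and play it through PulseAudio/PipeWire.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(resp.content)
        wav_path = f.name
    subprocess.run(["paplay", wav_path], check=True)


if __name__ == "__main__":
    speak_via_zonos("Hello from Claude on Linux.")
```

Because paplay talks to the PulseAudio protocol, the same call works unchanged on PipeWire systems that provide the pipewire-pulse compatibility layer.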

Capabilities

Resources: Access data sources
Tools: Execute functions
Prompts: Pre-built templates
Sampling: AI model interactions


Overview

The mcp‑tts server brings full‑featured text‑to‑speech (TTS) capabilities to AI assistants that communicate via the Model Context Protocol. It exposes four distinct TTS tools, so developers can choose the most appropriate voice engine for their workflow, whether that is a quick prototype with macOS' built‑in voices or a production system that requires premium AI voices. The server's design focuses on seamless integration: each engine is exposed as a standard MCP tool, so any Claude‑compatible client can invoke speech output with a single function call and receive audio in real time.
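To make the integration concrete, here is a minimal sketch of how a speech tool can be registered with the official MCP Python SDK's FastMCP helper. The tool name, its parameters, and the espeak-ng fallback engine are illustrative assumptions, not this project's actual implementation.

```python
# Sketch of exposing a speech tool over MCP with the Python SDK's FastMCP.
# Tool name, parameters, and the espeak-ng stand-in engine are illustrative only.
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("tts-sketch")


@mcp.tool()
def speak(text: str, speed: float = 1.0) -> str:
    """Synthesize `text` and play it on the local audio device."""
    # Stand-in engine: espeak-ng ships with most Linux distributions.
    # A real server would call its configured TTS backend instead.
    wpm = int(175 * speed)  # espeak-ng expresses speed in words per minute
    subprocess.run(["espeak-ng", "-s", str(wpm), text], check=True)
    return f"Spoke {len(text)} characters at {speed}x speed"


if __name__ == "__main__":
    mcp.run()  # serve over stdio so an MCP client such as Claude Desktop can connect
```

Once the server is registered in the client's MCP configuration, the assistant can call the tool by name and the audio plays locally with no extra plumbing.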

The server solves the practical problem of delivering audible feedback from AI agents. In IDEs like Cursor or desktop assistants such as Claude Desktop, users often need to hear prompts, error messages, or conversational responses without diverting their eyes from the code editor. mcp‑tts eliminates this friction by turning textual responses into spoken audio, enabling hands‑free interaction and accessibility for users with visual impairments or multitasking needs.

Key features include a sequential speech queue that prevents overlapping audio across multiple agents, ensuring clarity in collaborative environments; developers can override this behavior with a simple environment variable or flag to allow concurrent playback when desired. Each TTS tool offers fine‑grained control: voice selection, speed adjustment (0.25x–4.0x), and custom voice instructions for OpenAI's models. The server also supports three quality tiers, letting teams balance latency, bandwidth, and audio fidelity.
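The queueing behavior can be pictured with the following minimal sketch of one way to serialize playback behind a lock while honoring an opt-out environment variable. The MCP_TTS_ALLOW_CONCURRENT variable name and the play() stub are invented for illustration and do not come from the project's code.

```python
# Sketch of a sequential speech queue with an environment-variable override.
# MCP_TTS_ALLOW_CONCURRENT and play() are illustrative stand-ins.
import asyncio
import os

_playback_lock = asyncio.Lock()
ALLOW_CONCURRENT = os.getenv("MCP_TTS_ALLOW_CONCURRENT", "") == "1"


async def play(text: str) -> None:
    # Stand-in for real audio playback; sleeps roughly as long as speech would take.
    await asyncio.sleep(len(text) * 0.05)


async def speak(text: str) -> None:
    if ALLOW_CONCURRENT:
        await play(text)  # multiple agents may talk over each other
    else:
        async with _playback_lock:
            await play(text)  # requests are serialized, one utterance at a time


async def main() -> None:
    # Two concurrent requests: with the lock held, the second waits for the first.
    await asyncio.gather(speak("First message."), speak("Second message."))


if __name__ == "__main__":
    asyncio.run(main())
```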

Typical use cases span from automated code review assistants that read diffs aloud, to chatbots in customer support that speak responses directly to users’ headphones. In educational settings, mcp‑tts can read documentation or coding tutorials aloud, enhancing inclusivity. By integrating TTS into the MCP ecosystem, developers can build richer multimodal experiences without managing separate speech synthesis services or handling audio pipelines manually.