MCPSERV.CLUB
j3k0

Speech.sh TTS MCP Server

MCP Server

Command-line text-to-speech via OpenAI, ready for AI assistants

Stale (50) · 4 stars · 1 view · Updated Aug 4, 2025

About

Speech.sh is a lightweight shell utility that converts text to speech using OpenAI's TTS models, offering voice selection, speed control, caching, retries, and built‑in MCP support for seamless integration with AI assistants.

Capabilities

  • Resources: access data sources
  • Tools: execute functions
  • Prompts: pre‑built templates
  • Sampling: AI model interactions

Speech.sh – A Command‑Line TTS Engine with MCP Support

Speech.sh turns plain text into high‑quality audio using OpenAI’s TTS APIs. It is designed for developers who need a lightweight, scriptable solution that can be dropped into CI pipelines, voice‑enabled bots, or local utilities. By exposing the TTS functionality through a Model Context Protocol (MCP) server, Speech.sh lets AI assistants such as Claude request speech generation on demand without exposing the OpenAI key or handling HTTP requests directly.

The core workflow is simple: a user supplies text, optionally selects one of six voices (onyx, alloy, echo, fable, nova, shimmer), sets a speech speed between 0.25 and 4.0, and chooses a model tier (tts‑1 or tts‑1‑hd). The script constructs a JSON payload, sends it to OpenAI’s endpoint via curl, and streams the resulting MP3 back. It automatically caches each unique combination of text, voice, speed, and model, so repeated requests hit the local cache instead of incurring new API calls, which is great for batch processing or looping dialogues. A configurable retry mechanism with exponential backoff adds robustness against transient network failures.
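
To make that flow concrete, here is a minimal bash sketch of a cached, retried request against OpenAI’s /v1/audio/speech endpoint. The cache directory, key derivation, and retry count below are illustrative assumptions, not speech.sh’s actual implementation.

    #!/usr/bin/env bash
    # Minimal sketch of a cached, retried TTS request. Cache layout, key
    # derivation, and retry count are illustrative assumptions, not the
    # actual speech.sh code.
    set -euo pipefail

    TEXT="Hello from speech.sh"
    VOICE="alloy"        # onyx, alloy, echo, fable, nova, shimmer
    SPEED="1.0"          # 0.25 to 4.0
    MODEL="tts-1"        # or tts-1-hd

    CACHE_DIR="${HOME}/.cache/speech-sh"
    mkdir -p "$CACHE_DIR"

    # One cached MP3 per unique (text, voice, speed, model) combination.
    KEY=$(printf '%s|%s|%s|%s' "$TEXT" "$VOICE" "$SPEED" "$MODEL" | sha256sum | cut -d' ' -f1)
    OUT="${CACHE_DIR}/${KEY}.mp3"

    if [[ ! -f "$OUT" ]]; then
      for attempt in 1 2 3; do                      # exponential backoff on failure
        if curl -sS --fail https://api.openai.com/v1/audio/speech \
             -H "Authorization: Bearer ${OPENAI_API_KEY:?OPENAI_API_KEY must be set}" \
             -H "Content-Type: application/json" \
             -d "$(jq -n --arg m "$MODEL" --arg i "$TEXT" --arg v "$VOICE" --argjson s "$SPEED" \
                   '{model: $m, input: $i, voice: $v, speed: $s}')" \
             -o "$OUT"; then
          break
        fi
        sleep $((2 ** attempt))
      done
    fi

    echo "Audio ready: $OUT"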

Integration into AI workflows is straightforward. The companion script exposes the TTS functionality as an MCP server; launching it starts a lightweight HTTP service that listens for JSON requests and returns the synthesized audio. An AI assistant can then invoke this server through a standard MCP “tool” call, passing the desired parameters and receiving the audio file path or binary data. This removes the need for custom SDKs, keeps API keys out of assistant code, and lets developers mix Speech.sh with other MCP services (e.g., image generation or text summarization) in a single orchestrated workflow.
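
The request that reaches such a server is a standard MCP tools/call JSON‑RPC message. The sketch below shows roughly what that looks like when posted to a locally running instance; the port, path, tool name (speak), and argument names are assumptions for illustration, not the server’s documented interface.

    # Hypothetical tool call against a locally running speech.sh MCP service.
    # Port, path, tool name, and argument names are illustrative assumptions.
    curl -sS http://localhost:8080/mcp \
      -H "Content-Type: application/json" \
      -d '{
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
              "name": "speak",
              "arguments": {
                "text": "Your build finished successfully.",
                "voice": "nova",
                "speed": 1.0
              }
            }
          }'

An MCP‑aware assistant issues the same call through its own client library, so none of this plumbing has to appear in assistant code.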

Key features that set Speech.sh apart include:

  • Multi‑voice and speed control: choose from six distinct voices and fine‑tune pacing for natural dialogue or announcements.
  • Model tier flexibility: switch between the standard tts‑1 and the higher‑quality tts‑1‑hd model without code changes.
  • Dual player support: playback with ffmpeg or mplayer, automatically selecting the best available tool (sketched after this list).
  • Security‑first design: JSON parsing with jq, strict parameter handling, and no shell injection risks.
  • MCP compatibility: plug the TTS capability into any MCP‑aware assistant, enabling seamless voice generation in conversational agents.
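
The dual player support mentioned above boils down to probing for whichever player is installed. A rough sketch follows; speech.sh’s actual selection logic may differ.

    # Illustrative player fallback: prefer ffplay (part of ffmpeg), then mplayer.
    play_audio() {
      local file="$1"
      if command -v ffplay >/dev/null 2>&1; then
        ffplay -nodisp -autoexit -loglevel quiet "$file"
      elif command -v mplayer >/dev/null 2>&1; then
        mplayer -really-quiet "$file"
      else
        echo "No supported audio player found; install ffmpeg or mplayer." >&2
        return 1
      fi
    }

    play_audio ./hello.mp3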

Typical use cases include:

  • Voice‑enabled chatbots: convert assistant replies into spoken responses in real time.
  • Accessibility tools: read aloud documents or notifications for visually impaired users.
  • Automated announcements: generate dynamic audio messages for IoT devices or smart home systems.
  • Content creation pipelines: batch‑convert scripts, news articles, or e‑books into audio formats for podcasts (see the loop sketched below).
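
For the batch‑conversion case, a thin wrapper loop is usually all that is needed. The flag names in this sketch (--model, --voice, --output) are placeholders, not the script’s documented interface; check its help output for the real options.

    # Hypothetical batch pipeline: one MP3 per chapter text file.
    # The speech.sh flags below are assumed for illustration.
    mkdir -p audio
    for chapter in chapters/*.txt; do
      ./speech.sh --model tts-1-hd --voice fable \
                  --output "audio/$(basename "${chapter%.txt}").mp3" \
                  "$(cat "$chapter")"
    done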

By combining a minimal command‑line interface with robust caching, retry logic, and MCP integration, Speech.sh offers developers a dependable, secure, and versatile TTS solution that fits neatly into modern AI‑driven workflows.