About
A lightweight MCP server that converts text to speech with Voicevox Engine and plays the result. It communicates via JSON‑RPC over stdio, which lets it integrate with tools like Cursor or Claude.
Overview
Voicevox MCP Light is a lightweight Model Context Protocol (MCP) server that connects the Voicevox speech synthesis engine to AI assistants such as Claude, Cursor, or any other MCP‑compatible client. The server exposes a JSON‑RPC interface over stdio, so a conversational agent can turn natural language text into spoken audio and have it played back immediately. This avoids writing a custom adapter for each platform when integrating a local or remote TTS engine into an AI workflow.
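To make the JSON‑RPC over stdio interface concrete, here is a minimal sketch of the kind of message an MCP client would write to the server's stdin. The `tools/call` method and the newline-delimited framing come from the MCP specification; the tool name `speak` and the argument names `text` and `speaker` are assumptions for illustration, not names confirmed by this server.

```python
import json


def make_tool_call(request_id: int, text: str, speaker: int = 1) -> str:
    """Serialize one JSON-RPC 2.0 `tools/call` request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            # Hypothetical tool and argument names; check the server's
            # `tools/list` response for the real ones.
            "name": "speak",
            "arguments": {"text": text, "speaker": speaker},
        },
    }
    # The stdio transport sends one JSON message per line, so the
    # serialized message must not contain raw newlines. json.dumps
    # escapes any newline inside a string value.
    return json.dumps(message, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    # Print the line a client would write to the server's stdin.
    print(make_tool_call(1, "こんにちは"), end="")
```

In practice an MCP client library builds and sends this message for you; the sketch only shows what travels over the pipe.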
The server performs three steps: text → audio query, audio query → WAV, and WAV playback. When an assistant calls the tool, Voicevox MCP Light sends the text to the Voicevox Engine (local or remote) and receives an audio query, has that query synthesized into WAV data, and sends the result to a PulseAudio server for immediate playback. The whole pipeline is exposed as a single MCP tool, so adding voice output to an existing LLM‑driven application takes one tool call.
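The three steps can be sketched against the Voicevox Engine HTTP API, which exposes `POST /audio_query` and `POST /synthesis` and listens on port 50021 by default. This is an illustrative sketch, not the server's actual code: the function names are hypothetical, and playback through the `paplay` command-line client is one assumed way to reach PulseAudio.

```python
import json
import subprocess
import tempfile
import urllib.parse
import urllib.request

ENGINE_URL = "http://localhost:50021"  # default engine port; change for a remote host


def audio_query_request(text: str, speaker: int, base: str = ENGINE_URL) -> urllib.request.Request:
    """Step 1 (text -> audio query): parameters go in the query string."""
    qs = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return urllib.request.Request(f"{base}/audio_query?{qs}", method="POST")


def synthesis_request(query: dict, speaker: int, base: str = ENGINE_URL) -> urllib.request.Request:
    """Step 2 (audio query -> WAV): the audio query is posted back as JSON."""
    qs = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{base}/synthesis?{qs}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak(text: str, speaker: int = 1, base: str = ENGINE_URL) -> None:
    """Run all three steps; needs a running engine and the paplay client."""
    with urllib.request.urlopen(audio_query_request(text, speaker, base)) as resp:
        query = json.load(resp)
    with urllib.request.urlopen(synthesis_request(query, speaker, base)) as resp:
        wav = resp.read()
    # Step 3 (WAV playback): write the WAV to a temporary file and hand it
    # to paplay, which sends it to the PulseAudio server.
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        f.write(wav)
        f.flush()
        subprocess.run(["paplay", f.name], check=True)


if __name__ == "__main__":
    speak("こんにちは")
```

Separating request construction from the network calls keeps the first two steps easy to inspect and test without a running engine.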
Key features include:
- MCP‑compatible JSON‑RPC: Works out of the box with any MCP client, enabling seamless integration into existing workflows.
- Configurable speaker and engine host: Choose among the available pre‑trained voices, and point the server at any Voicevox Engine instance, including ones run from the CPU or GPU Docker images.
- Automatic playback: The server handles audio output via PulseAudio, so developers need not write additional code for sound rendering.
- Cross‑platform support: Playback requires PulseAudio, which is native to Linux. The server can run in Docker or as a local process, so it can also be deployed on Windows and macOS with minimal adjustments, provided a PulseAudio server is reachable.
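As a rough illustration of the configurable speaker and engine host, the sketch below reads settings from environment variables. `VOICEVOX_HOST` and `VOICEVOX_SPEAKER` are hypothetical names chosen for this example and may not match what the server actually reads; `PULSE_SERVER` is the standard variable PulseAudio clients use to locate a server.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    engine_url: str              # base URL of the Voicevox Engine
    speaker: int                 # numeric speaker (voice) ID
    pulse_server: Optional[str]  # PulseAudio server address, if set


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, with local defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Hypothetical variable names; 50021 is the engine's default port.
        engine_url=env.get("VOICEVOX_HOST", "http://localhost:50021").rstrip("/"),
        speaker=int(env.get("VOICEVOX_SPEAKER", "1")),
        # Standard PulseAudio client variable, e.g. "tcp:host:4713".
        pulse_server=env.get("PULSE_SERVER"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes it easy to try different engine hosts or speakers without changing the process environment.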
Typical use cases include:
- Conversational agents that need to read responses aloud, enhancing accessibility and user engagement.
- Interactive storytelling or game NPCs where dynamic text must be voiced in real time.
- Educational tools that convert explanations or lessons into spoken form for auditory learners.
- Assistive technologies where an LLM provides information that is then spoken to users with visual impairments.
By providing a ready‑made, protocol‑standard bridge between Voicevox and AI assistants, Voicevox MCP Light removes the need for custom TTS integration. Developers can focus on the dialogue itself and rely on a single pipeline that converts text to speech with minimal configuration.
Related Servers
MarkItDown MCP Server
Convert documents to Markdown for LLMs quickly and accurately
Context7 MCP
Real‑time, version‑specific code docs for LLMs
Playwright MCP
Browser automation via structured accessibility trees
BlenderMCP
Claude AI meets Blender for instant 3D creation
Pydantic AI
Build GenAI agents with Pydantic validation and observability
Chrome DevTools MCP
AI-powered Chrome automation and debugging
Explore More Servers
ZIN MCP Client
Lightweight CLI & Web UI for MCP server interaction
Notes MCP
Sync Apple Notes with a cross‑platform MCP server
Bluetooth MCP Server
AI‑powered Bluetooth device detection and interaction
MCP Server Semgrep
AI‑powered static analysis with Semgrep via conversational interface
OpenStreetMap MCP Server
Seamless OSM integration via Model Context Protocol
OpenProject MCP Server
Smart project report generation for teams