Voicevox MCP Light

MCP Server

MCP‑compliant Voicevox text‑to‑speech server

Updated Jun 29, 2025

About

A lightweight MCP server that converts text to audio using Voicevox Engine and plays the result. It communicates via JSON‑RPC over stdio, enabling integration with tools like Cursor or Claude.

Overview

Voicevox MCP Light is a lightweight Model Context Protocol (MCP) server that bridges the high‑quality Voicevox speech synthesis engine with AI assistants such as Claude, Cursor, or any MCP‑compatible client. By exposing a JSON‑RPC over stdio interface, the server allows conversational agents to transform natural language text into spoken audio and play it back in real time. This solves the common problem of integrating local or remote TTS engines into AI workflows without requiring custom adapters for each platform.
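
As a rough illustration of what "JSON‑RPC over stdio" means for a client, the sketch below builds the kind of `tools/call` request an MCP client writes to the server's standard input, one JSON message per line. The tool name `speak` and the argument name `text` are hypothetical placeholders; the source does not state the real tool name or its parameters.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Return one newline-terminated JSON-RPC 2.0 message for a stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside string values, so the whole
    # message stays on a single line, as line-delimited framing requires.
    return json.dumps(message, ensure_ascii=False) + "\n"


# "speak" and "text" are assumed names, used here only for illustration.
request_line = build_tool_call(1, "speak", {"text": "こんにちは"})
```

A real client would first complete the MCP `initialize` handshake and would normally discover the actual tool name through `tools/list` rather than hard-coding it.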

The server performs three core steps: text → audio query, audio query → WAV, and WAV playback. When an assistant calls the tool, Voicevox MCP Light sends the text to the Voicevox Engine (either local or remote), receives an audio query, turns that query into WAV data, and streams the result to a PulseAudio server for immediate playback. The entire process is encapsulated in a single MCP tool, which makes it straightforward to add voice output to an existing LLM‑driven application.
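
The three steps above could be sketched roughly as follows. This is not the project's actual code: it assumes a Voicevox Engine reachable over HTTP at `http://localhost:50021` and uses the engine's `/audio_query` and `/synthesis` endpoints. The helper names (`post_bytes`, `synthesize`, `play`) are made up for this example, and playing through the `paplay` command-line tool is an assumption about how audio might reach PulseAudio.

```python
import subprocess
import tempfile
import urllib.parse
import urllib.request


def post_bytes(url: str, body: bytes = b"") -> bytes:
    """POST a body to a URL and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def synthesize(text, speaker=1, host="http://localhost:50021", send=post_bytes) -> bytes:
    """Run text -> audio query -> WAV against a Voicevox Engine."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    # Step 1: text -> audio query (a JSON document describing the utterance).
    audio_query = send(f"{host}/audio_query?{params}")
    # Step 2: audio query -> WAV bytes.
    return send(f"{host}/synthesis?speaker={speaker}", audio_query)


def play(wav: bytes) -> None:
    """Step 3: play WAV bytes through PulseAudio (assumes `paplay` is installed)."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as wav_file:
        wav_file.write(wav)
        wav_file.flush()
        subprocess.run(["paplay", wav_file.name], check=True)
```

With an engine running and PulseAudio available, `play(synthesize("こんにちは"))` would speak the text. The `send` parameter exists only so the HTTP layer can be swapped out, for example in tests.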

Key features include:

MCP‑compatible JSON‑RPC: Works out of the box with any MCP client, enabling seamless integration into existing workflows.
Configurable speaker and engine host: Choose from a range of pre‑trained voice models and point the server at any Voicevox Engine instance, including the CPU and GPU Docker images.
Automatic playback: The server handles audio output via PulseAudio, so developers need not write additional code for sound rendering.
Cross‑platform support: The server can run in Docker or as a local process. Playback requires PulseAudio on Linux; deployment on Windows and macOS is possible with minimal adjustments.
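
To make the "configurable speaker and engine host" idea concrete, here is one way such settings might be read at startup. The environment variable names `VOICEVOX_HOST` and `VOICEVOX_SPEAKER`, the defaults, and the `Settings` class are all hypothetical; the source does not document how the server is actually configured.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    engine_host: str  # base URL of the Voicevox Engine
    speaker_id: int   # integer ID of the voice to use


def load_settings(env=os.environ) -> Settings:
    """Read settings from environment-style variables (names are assumptions)."""
    host = env.get("VOICEVOX_HOST", "http://localhost:50021").rstrip("/")
    speaker = int(env.get("VOICEVOX_SPEAKER", "1"))
    return Settings(engine_host=host, speaker_id=speaker)
```

Passing a plain dictionary, such as `load_settings({"VOICEVOX_SPEAKER": "8"})`, makes it possible to try different values without touching the real environment.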

Typical use cases include:

Conversational agents that need to read responses aloud, enhancing accessibility and user engagement.
Interactive storytelling or game NPCs where dynamic text must be voiced in real time.
Educational tools that convert explanations or lessons into spoken form for auditory learners.
Assistive technologies where an LLM provides information that is then spoken to users with visual impairments.

By providing a ready‑made, protocol‑standard bridge between Voicevox and AI assistants, Voicevox MCP Light removes the need for custom TTS integration. Developers can focus on crafting dialogue while the server handles the text‑to‑speech pipeline with minimal configuration.