By mbailey

VoiceMode MCP Server


Real‑time voice conversations for AI assistants

Active (80) · 351 stars · 1 view · Updated 11 days ago

About

VoiceMode provides low‑latency, natural‑voice interactions for Claude Code and other MCP clients. It supports local STT/TTS services, silence detection, and multiple transports for seamless voice-enabled AI workflows.

Capabilities

  • Resources: access data sources
  • Tools: execute functions
  • Prompts: pre‑built templates
  • Sampling: AI model interactions

VoiceMode Demo

VoiceMode is a Model Context Protocol (MCP) server that brings natural, real‑time voice conversations to Claude Code and other MCP‑compatible AI assistants. By acting as a bridge between speech input/output services and the MCP framework, it lets developers treat voice like any other first‑class tool—no custom integrations required. The server’s core goal is to eliminate the friction that normally accompanies adding spoken interaction to an AI workflow, enabling developers to prototype voice‑enabled assistants quickly and deploy them across Windows, macOS, Linux, or WSL environments.

At its heart, VoiceMode captures microphone audio, streams it to a speech‑to‑text (STT) engine (either a local OpenAI‑compatible model or an external API), and feeds the resulting text into the MCP conversation context. The assistant’s response is then rendered through a text‑to‑speech (TTS) service and played back to the user. This round trip happens with low latency, thanks to automatic transport selection and a lightweight event loop that ends each listening turn as soon as silence is detected. The server also supports multiple transports, including direct microphone access and LiveKit room‑based communication, so teams can collaborate in shared voice spaces or keep interactions private on a single machine.
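
To make that round trip concrete, here is a minimal sketch of one conversational turn. It assumes sounddevice for microphone capture, an OpenAI‑compatible endpoint on localhost for STT/TTS, and a crude RMS threshold standing in for VoiceMode’s real silence detection; none of these specifics come from VoiceMode itself:

```python
# A sketch of one VoiceMode-style turn: record until silence, transcribe,
# then speak a reply. The libraries, endpoint, and RMS threshold are
# illustrative assumptions, not VoiceMode's actual internals.
import io
import wave

import numpy as np
import sounddevice as sd
from openai import OpenAI

RATE = 16_000        # sample rate (Hz)
CHUNK = 1_024        # frames read per loop iteration
SILENCE_RMS = 500    # below this RMS, a chunk counts as silence
QUIET_CHUNKS = 30    # ~2 s of consecutive quiet chunks ends the recording

# A local OpenAI-compatible STT/TTS service; point base_url/api_key at the
# hosted OpenAI API to mimic the fallback behaviour described below.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def record_until_silence() -> bytes:
    """Capture 16-bit mono PCM until a sustained run of quiet chunks."""
    frames, quiet = [], 0
    with sd.InputStream(samplerate=RATE, channels=1, dtype="int16") as mic:
        while quiet < QUIET_CHUNKS:
            chunk, _overflowed = mic.read(CHUNK)
            frames.append(chunk.copy())
            rms = np.sqrt(np.mean(chunk.astype(np.float64) ** 2))
            quiet = quiet + 1 if rms < SILENCE_RMS else 0
    return np.concatenate(frames).tobytes()

def as_wav(pcm: bytes) -> io.BytesIO:
    """Wrap raw PCM in a WAV container for the transcription endpoint."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(RATE)
        w.writeframes(pcm)
    buf.seek(0)
    buf.name = "utterance.wav"  # the SDK infers the upload format from this
    return buf

# One turn: transcribe the user's speech, then render a spoken reply.
text = client.audio.transcriptions.create(
    model="whisper-1", file=as_wav(record_until_silence())
).text
reply = f"You said: {text}"  # stand-in for the MCP assistant's answer

client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply
).write_to_file("reply.mp3")
```

The quiet-chunk counter is what ends the turn: recording stops once roughly two seconds of silence accumulate rather than after a fixed timeout, which is what keeps the exchange feeling conversational.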

Key capabilities include:

  • Silence detection that automatically ends recording when the user stops speaking, preventing awkward pauses.
  • Local voice model support, allowing developers to run STT/TTS on the edge without depending on cloud services.
  • Automatic fallback to the hosted OpenAI API when local models are unavailable, providing a seamless fail‑over path.
  • MCP integration that exposes VoiceMode as a tool, making it discoverable and usable from any MCP client without additional code (see the configuration sketch after this list).
  • Cross‑platform compatibility, with pre‑compiled binaries and a single Python package that works on Linux, macOS, Windows (WSL), and NixOS.
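
Because the server speaks standard MCP, wiring it into a client is a matter of configuration rather than code. The entry below is a hypothetical example in the common mcpServers JSON format used by Claude Code and other clients; the uvx voice-mode launch command and the OPENAI_API_KEY variable (for the API fallback noted above) are assumptions, so check the project’s README for the exact invocation:

```json
{
  "mcpServers": {
    "voicemode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```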

Real‑world use cases are plentiful. A product manager can dictate feature requests to a Claude Code assistant, receive spoken code suggestions, and iterate in a single conversation. A customer support bot can answer user queries through voice, improving accessibility for users who prefer spoken interactions. In collaborative environments, VoiceMode can enable a shared LiveKit room where multiple developers discuss architecture decisions while the AI assistant provides on‑the‑fly documentation or code snippets. Because VoiceMode is an MCP server, it can be combined with other tools—such as file editors or data fetchers—to create sophisticated multimodal workflows that blend speech, text, and code.
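
To illustrate what that composability looks like from code, the sketch below drives the server with the official MCP Python SDK: it launches VoiceMode over stdio, lists whatever tools the server advertises, and calls one. The launch command and the converse tool name are illustrative assumptions:

```python
# Hypothetical client session: launch command and tool name are assumed.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="uvx", args=["voice-mode"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools VoiceMode actually exposes.
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])
            # "converse" is an assumed name; substitute one from the list above.
            result = await session.call_tool("converse", {"message": "Hello!"})
            print(result.content)

asyncio.run(main())
```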

For developers familiar with MCP, VoiceMode offers a plug‑and‑play solution that removes the usual boilerplate of setting up audio pipelines. By treating voice as just another tool, it aligns with the MCP philosophy of composability and modularity, allowing teams to iterate on voice features without reinventing the wheel. Its combination of low latency, silence detection, and local model support gives it a distinctive edge over generic speech libraries that require manual integration.