MCPSERV.CLUB
mbailey

VoiceMode

MCP Server

Real‑time voice conversations for AI assistants

Active (80)
350 stars
1 view
Updated 11 days ago

About

VoiceMode delivers natural, low‑latency voice interactions to Claude Code and other MCP clients, supporting local or OpenAI‑compatible STT/TTS services with silence detection and multi‑transport options.

Capabilities

  • Resources: access data sources
  • Tools: execute functions
  • Prompts: pre-built templates
  • Sampling: AI model interactions

VoiceMode Demo

VoiceMode is a Model Context Protocol (MCP) server that transforms the way developers interact with AI assistants by adding natural, real‑time voice conversations. Instead of typing prompts into Claude Code or other MCP clients, VoiceMode captures spoken input from a local microphone, transcribes it via any OpenAI‑compatible speech‑to‑text service, and streams the resulting text to the AI. The assistant’s replies are then spoken back through a text‑to‑speech engine, creating a seamless two‑way dialogue that feels more like talking to a human colleague than sending commands over the terminal. This capability is especially valuable for rapid prototyping, debugging, or when a developer’s hands are occupied with other tasks.
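For orientation, that round trip can be sketched directly against an OpenAI‑compatible API. The snippet below is an illustrative example, not VoiceMode’s internal code; the model names, voice, and file paths are placeholder assumptions.

    # Illustrative STT -> chat -> TTS round trip (not VoiceMode's internal code).
    from openai import OpenAI

    # Defaults to api.openai.com; a local OpenAI-compatible STT/TTS service could be
    # substituted by passing base_url=... to the client.
    client = OpenAI()

    # 1. Speech to text: transcribe a recorded voice prompt.
    with open("prompt.wav", "rb") as audio:
        text = client.audio.transcriptions.create(model="whisper-1", file=audio).text

    # 2. Send the transcript to the assistant as an ordinary chat message.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
    ).choices[0].message.content

    # 3. Text to speech: synthesize the reply for playback.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    with open("reply.mp3", "wb") as out:
        out.write(speech.content)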

The server’s design focuses on low latency and ease of integration. It automatically selects the most efficient transport, whether a local microphone or a LiveKit‑based room, and uses silence detection to stop recording when the user pauses, eliminating unnecessary wait times. VoiceMode runs on any major operating system with Python 3.10+, making it accessible to a wide range of development environments. Developers can point it at local STT/TTS services or fall back to OpenAI’s hosted APIs with an API key, so conversations keep working even if a particular backend is temporarily unavailable.
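The silence‑detection behaviour can be pictured as a simple energy threshold on incoming audio frames. The sketch below is a generic illustration using the sounddevice and numpy packages, not VoiceMode’s actual implementation; the sample rate, frame size, and thresholds are made‑up values that would need tuning per microphone.

    # Generic energy-based "stop on silence" recorder (illustrative only).
    import numpy as np
    import sounddevice as sd

    SAMPLE_RATE = 16_000       # 16 kHz mono, a common rate for STT input
    FRAME_MS = 30              # analysis window length
    SILENCE_RMS = 0.01         # energy threshold (assumption; tune per microphone)
    MAX_SILENT_FRAMES = 25     # ~750 ms below the threshold ends the recording

    def record_until_silence() -> np.ndarray:
        frame_len = int(SAMPLE_RATE * FRAME_MS / 1000)
        frames, silent = [], 0
        with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
            while silent < MAX_SILENT_FRAMES:
                block, _ = stream.read(frame_len)   # blocking read of one frame
                frames.append(block)
                rms = float(np.sqrt(np.mean(block ** 2)))
                silent = silent + 1 if rms < SILENCE_RMS else 0
        return np.concatenate(frames)               # audio ready to send to STT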

Key capabilities of VoiceMode include:

  • Real‑time, low‑latency voice input and output that keeps the conversational flow natural.
  • Support for multiple transport layers, allowing both simple local mic usage and more complex room‑based collaboration via LiveKit.
  • Automatic silence detection, which stops recording when the user stops speaking, saving bandwidth and improving responsiveness.
  • Full MCP integration, so any client that understands the protocol (Claude Code, Claude Chat, or custom tools) can plug in VoiceMode without additional adapters; see the sketch after this list.
  • Open‑source and extensible: developers can replace the STT/TTS backends with any compatible service, making it adaptable to privacy‑first or low‑cost scenarios.
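To make the MCP integration point concrete, here is a hypothetical sketch of exposing a voice‑style tool with the official MCP Python SDK (FastMCP). The server name, tool name, and behaviour are invented for illustration and do not reflect VoiceMode’s real tool surface.

    # Hypothetical MCP server exposing one voice-style tool (illustrative only).
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("voice-demo")

    @mcp.tool()
    def converse(message: str) -> str:
        """Speak `message` aloud, wait for the user's spoken reply, return it as text."""
        # A real implementation would call TTS here, then record, detect silence,
        # and transcribe the user's answer before returning it.
        return f"(transcribed reply to: {message})"

    if __name__ == "__main__":
        mcp.run()  # defaults to the stdio transport that clients like Claude Code launch

An MCP client configured to launch such a script over stdio discovers its tools automatically, which is the same discovery path a client uses for VoiceMode itself.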

Typical use cases range from hands‑free code review sessions, where a developer can ask for explanations or refactoring suggestions while coding, to collaborative pair programming in remote teams that rely on LiveKit rooms. In educational settings, VoiceMode can serve as an interactive tutor, letting students speak questions and receive spoken answers in real time. For accessibility, it opens up AI assistance to users who prefer or require voice input over text.

By embedding natural voice interactions directly into the MCP workflow, VoiceMode removes a significant friction point in AI‑powered development. It turns the assistant from a passive text interface into an active conversational partner, thereby accelerating iteration cycles and making AI tools more approachable for a broader developer audience.