VyacheslavVanin

LLM Chat Server

MCP Server

FastAPI-powered chat interface for LLMs

Updated Aug 24, 2025

About

A lightweight FastAPI server that provides real-time chat interactions with large language models served through Ollama or an OpenAI-compatible backend. It manages sessions, tool approvals, and configurable LLM settings via environment variables or CLI arguments.
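
As a rough sketch of that configuration layering, the snippet below reads defaults from environment variables and lets CLI arguments override them. The variable names, flags, and defaults (LLM_PROVIDER, --model, and so on) are illustrative assumptions, not the project's documented interface.

    import argparse
    import os


    def load_settings() -> argparse.Namespace:
        """Read LLM settings from env vars, letting CLI flags override them (hypothetical names)."""
        parser = argparse.ArgumentParser(description="LLM chat server settings (illustrative)")
        parser.add_argument("--provider", choices=["ollama", "openai"],
                            default=os.environ.get("LLM_PROVIDER", "ollama"))
        parser.add_argument("--model",
                            default=os.environ.get("LLM_MODEL", "llama3"))
        parser.add_argument("--base-url",
                            default=os.environ.get("LLM_BASE_URL", "http://localhost:11434"))
        parser.add_argument("--temperature", type=float,
                            default=float(os.environ.get("LLM_TEMPERATURE", "0.7")))
        return parser.parse_args()


    if __name__ == "__main__":
        print(vars(load_settings()))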

Capabilities

  • Resources: Access data sources
  • Tools: Execute functions
  • Prompts: Pre-built templates
  • Sampling: AI model interactions

Overview

The MCP HTTP Host is a lightweight, FastAPI-based server that exposes an LLM chat interface to AI assistants through the Model Context Protocol. It bridges a local or cloud-based language model (an Ollama instance or an OpenAI-compatible endpoint) with the MCP ecosystem, allowing assistants to send user messages, receive model responses, and invoke external tools in a structured manner. Because the details of model invocation are abstracted behind standard HTTP endpoints, developers can plug the server into any MCP-compliant workflow without modifying their assistant code.
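
For illustration only, a client-side exchange over those HTTP endpoints might look like the sketch below; the endpoint paths and JSON field names (/session, /chat, session_id, message, text, tool_call, request_id) are assumptions made for the example, not the server's documented API.

    import requests  # third-party HTTP client: pip install requests

    BASE = "http://localhost:8000"  # assumed local address of the MCP HTTP host

    # Start a new chat session (hypothetical endpoint and response shape).
    session = requests.post(f"{BASE}/session").json()

    # Forward a user message to the configured LLM and read the structured reply.
    reply = requests.post(
        f"{BASE}/chat",
        json={"session_id": session["session_id"], "message": "Summarize README.md"},
    ).json()

    print(reply["text"])           # the assistant's text
    print(reply.get("tool_call"))  # proposed tool usage with arguments, if any
    print(reply["request_id"])     # identifier referenced later during approval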

At its core, the server handles four key interactions: starting a new chat session, forwarding user messages to the LLM, receiving tool call suggestions from the model, and approving or denying those calls. When a user message arrives, the server forwards it to the configured LLM provider, optionally streams the reply, and returns a JSON payload that includes the assistant’s text, any proposed tool usage (with arguments), and a unique request identifier. The assistant can then decide whether to execute the suggested tool; if it chooses to, it posts an approval request back to the server, which in turn triggers the tool execution pipeline. This two‑step approval flow keeps sensitive operations under explicit user control while still enabling the model to suggest powerful actions.
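
Continuing the hypothetical client above, the two-step approval could be handled as sketched below; the /approve path and the request_id and approved field names are again assumptions for the example, not the server's documented API.

    import requests

    BASE = "http://localhost:8000"  # assumed local address of the MCP HTTP host


    def handle_reply(reply: dict) -> None:
        """Ask the user before the server is allowed to execute a model-proposed tool call."""
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return  # plain text answer, nothing to approve

        print(f"Model proposes {tool_call['name']} with arguments {tool_call['arguments']}")
        approved = input("Approve this tool call? [y/N] ").strip().lower() == "y"

        # Post the decision back; an approval triggers the tool execution pipeline.
        requests.post(
            f"{BASE}/approve",
            json={"request_id": reply["request_id"], "approved": approved},
        )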

Key capabilities of the server include:

  • Provider agnosticism: Switch between Ollama and OpenAI backends simply by setting environment variables or CLI flags.
  • Dynamic configuration: All settings—model name, temperature, context window size, base URLs—are adjustable at launch time, facilitating experimentation and rapid iteration.
  • Session management: Each chat session is isolated with its own working directory and state, ensuring that file‑system interactions remain contained.
  • Tool integration: The server exposes a standard tool approval endpoint, allowing assistants to leverage external commands (e.g., filesystem access, code execution) while maintaining safety.
  • Streaming support: When enabled, the server streams partial responses from the LLM, reducing perceived latency for long outputs (see the sketch after this list).
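
As a rough illustration of the streaming mode, the client sketch below assumes the server emits newline-delimited JSON chunks with a delta field from a hypothetical /chat endpoint when a stream flag is set; none of these names are documented behavior.

    import json

    import requests

    BASE = "http://localhost:8000"  # assumed local address of the MCP HTTP host

    # Request a streamed reply and print partial chunks as they arrive (all names hypothetical).
    with requests.post(
        f"{BASE}/chat",
        json={"session_id": "demo", "message": "Explain this repository", "stream": True},
        stream=True,
    ) as response:
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)  # assumed newline-delimited JSON
            print(chunk.get("delta", ""), end="", flush=True)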

Typical use cases span rapid prototyping of code assistants, educational tooling where students interact with a local LLM, and internal DevOps pipelines that need to run model-guided scripts on a server. For example, an engineering team can deploy the MCP host locally, configure it to use a high-capacity coder model, and let their AI assistant fetch files, run tests, or generate documentation, all mediated through MCP. The server's clear separation of concerns and minimal configuration overhead make it an attractive choice for developers who want to harness LLMs in a controlled, tool-aware environment.