LLM Analysis Assistant

MCP Server by xuzexin-hz

Proxy server that logs and analyzes LLM interactions

Updated Aug 22, 2025

About

LLM Analysis Assistant captures request parameters and responses from OpenAI, Ollama, or other LLM APIs, providing real‑time log display, mock data support, and MCP client functionality for debugging and product market fit analysis.

Capabilities

  • Resources – Access data sources
  • Tools – Execute functions
  • Prompts – Pre-built templates
  • Sampling – AI model interactions

Overview of the llm‑analysis‑assistant MCP Server

The llm‑analysis‑assistant server is a lightweight, asynchronous proxy that sits between an AI client (such as Claude) and any large‑model inference endpoint, whether that is Ollama, OpenAI, vLLM, or a custom service. By intercepting requests and responses in real time, it records the full set of parameters used to invoke the model and the exact payloads returned. This turns a black‑box interaction into a transparent, analyzable workflow, allowing developers to audit and understand the logic of their client code without modifying the client itself.

What problem does it solve?

Large‑model clients often encapsulate API calls, masking the request/response details and making it hard to debug or optimize usage. The server exposes a Model Context Protocol (MCP) interface that mirrors the OpenAI specification, enabling any MCP‑compliant assistant to tap into this proxy. Developers can therefore:

  • Inspect every parameter sent (temperature, top‑p, max tokens, etc.) and the corresponding output.
  • Compare behavior across different backends (Ollama vs. OpenAI) with a single, unified view.
  • Identify inconsistencies in API support or mis‑configured arguments that lead to unexpected results.
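
For a concrete picture of how this inspection works, the sketch below points an OpenAI‑style client at the proxy instead of the real backend. The port, the /v1 path, and the model name are assumptions for illustration; check the project's documentation for the actual listen address.

```python
# Hedged sketch: route an OpenAI-compatible call through the proxy so that the
# sampling parameters and the reply end up in its logs. The base_url, port,
# and model name below are assumptions, not values taken from the project.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")

response = client.chat.completions.create(
    model="llama3",            # whatever model the upstream backend serves
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    temperature=0.7,           # these sampling parameters are what the proxy records
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the proxy mirrors the OpenAI specification, nothing else in the client changes; only the base URL is redirected.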

Core capabilities

  • MCP client support – Handles stdio, SSE, and streamable HTTP calls natively.
  • Initialization detection – Automatically probes the target backend to determine its capabilities (e.g., which sampling options it supports).
  • Interface detection & logging – Detects whether the backend follows Ollama or OpenAI conventions and logs every interaction for later analysis.
  • Mocking – Can replace real responses with deterministic mock data, useful for testing or when the backend is unavailable.
  • Asynchronous architecture – Built on Uvicorn/ASGI with full async support, ensuring low latency even under heavy load.
  • Real‑time log UI – A web interface that refreshes logs live and allows breakpoint pauses for step‑by‑step debugging.
  • Socket‑based HTTP client – Uses Python sockets to send GET/POST requests and stream responses, giving fine‑grained control over network traffic.
  • Packaging – The entire Python package can be compiled into a standalone executable, simplifying deployment.
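
To make the interception-and-logging idea concrete, here is a minimal, illustrative ASGI sketch of the same pattern: receive a chat‑completion request, log it, forward it unchanged, and log the reply. It is not the project's actual implementation; the upstream URL is an assumed Ollama‑style endpoint, and the real server adds streaming, interface detection, mocking, and the log UI on top of this idea.

```python
# Minimal ASGI proxy sketch (illustrative only): log the request parameters,
# forward the call to an upstream OpenAI-compatible endpoint, log the reply,
# and return it to the client. Could be run with: uvicorn proxy:app --port 8000
import json
import httpx

UPSTREAM = "http://localhost:11434/v1/chat/completions"  # assumed backend (Ollama-style)

async def app(scope, receive, send):
    assert scope["type"] == "http"

    # Drain the ASGI receive channel to get the full request body.
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body"):
            break

    print("request:", json.loads(body or b"{}"))      # parameters sent by the client

    # Forward the payload unchanged to the upstream model endpoint.
    async with httpx.AsyncClient(timeout=60) as client:
        upstream = await client.post(
            UPSTREAM, content=body,
            headers={"content-type": "application/json"},
        )

    print("response:", upstream.json())               # exact payload returned

    await send({"type": "http.response.start", "status": upstream.status_code,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": upstream.content})
```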

Real‑world use cases

  • Model debugging – Developers can see exactly why a model returns a particular answer, adjusting parameters on the fly.
  • Cross‑platform validation – Run the same prompt against Ollama, OpenAI, or vLLM and compare outputs side‑by‑side (see the sketch after this list).
  • Compliance & auditing – Store a complete audit trail of all model calls for regulatory or security reviews.
  • Rapid prototyping – Mock responses allow front‑end teams to develop UI components before the backend is ready.
  • Educational tooling – Instructors can demonstrate how different sampling settings affect output quality in a controlled environment.
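
As an illustration of the cross‑platform validation workflow, the loop below sends one prompt to two OpenAI‑compatible endpoints and prints the answers for comparison. The ports, the assumption of one proxy instance per backend, and the model name are all hypothetical.

```python
# Hedged sketch: run the same prompt against two backends (e.g., one proxy
# instance in front of Ollama and one in front of OpenAI) and compare replies.
# The endpoint addresses and model name are assumptions for illustration.
from openai import OpenAI

ENDPOINTS = {
    "ollama": "http://localhost:8000/v1",   # assumed proxy in front of Ollama
    "openai": "http://localhost:8001/v1",   # assumed proxy in front of OpenAI
}

PROMPT = "Summarize the Model Context Protocol in one sentence."

for name, base_url in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key="replace-or-leave-unused")
    reply = client.chat.completions.create(
        model="llama3",                     # use whatever each backend actually serves
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0.2,
    )
    print(f"[{name}] {reply.choices[0].message.content}")
```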

Integration with AI workflows

The server presents itself as an MCP endpoint, so any assistant that can issue standard OpenAI‑style requests (e.g., Claude) can point to it instead of the real backend. The assistant then receives a fully compatible response, while the proxy logs everything behind the scenes. Because it supports SSE and streamable HTTP, real‑time streaming responses remain unchanged, preserving the user experience.
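
For assistants that speak MCP directly, a connection sketch might look like the following. It uses the official MCP Python SDK; the /sse path and port are assumptions, and the primitives exposed (tools, resources, prompts) depend on how the server is configured.

```python
# Hedged sketch: connect to the proxy's MCP endpoint over SSE and list what it
# exposes. The URL below is an assumption; check the server's documentation for
# the actual transport (stdio, SSE, or streamable HTTP) and address.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://localhost:8000/sse") as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()             # MCP handshake
            tools = await session.list_tools()     # discover exposed tools
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```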

Unique advantages

  • Zero client modification – Existing clients continue to work unchanged; the proxy simply intercepts traffic.
  • Unified API surface – Regardless of whether you’re using Ollama, OpenAI, or another provider, the MCP interface remains consistent.
  • Built‑in mocking – No external test harness needed; you can switch between real and fake data with a single flag (a sketch of the pattern follows this list).
  • Minimal footprint – Powered by Uvicorn and uv, the server starts quickly and consumes little memory, making it suitable for local dev machines or lightweight cloud instances.
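
The "single flag" mocking could look roughly like the following. This is a hypothetical illustration of the pattern, not the project's actual switch or response shape; consult its documentation for the real configuration.

```python
# Hypothetical illustration of flag-controlled mocking (the flag name and the
# payload shape are assumptions): return deterministic, OpenAI-style data
# instead of contacting the backend when an environment variable is set.
import os

MOCK_MODE = os.environ.get("LLM_PROXY_MOCK") == "1"   # assumed flag name

def mock_chat_completion(model: str) -> dict:
    """Deterministic stand-in reply, handy when the backend is unavailable."""
    return {
        "id": "chatcmpl-mock-1",
        "object": "chat.completion",
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "This is a mocked reply."},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```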

In summary, the llm‑analysis‑assistant MCP server turns opaque model interactions into a transparent, analyzable process. It empowers developers to debug, audit, and experiment with large‑model inference across multiple backends without altering their existing client code.