Minima

MCP Server by dmayboroda

Local RAG on Docker with ChatGPT, Claude, or fully offline

1.0k stars · Updated 12 days ago

About

Minima is an open‑source RAG server that runs in Docker containers, enabling local, on‑premises retrieval‑augmented generation. It can operate fully offline, integrate with ChatGPT as a custom GPT, or connect to Anthropic Claude, so local documents can be queried securely in every mode.

Capabilities

  • Resources – Access data sources
  • Tools – Execute functions
  • Prompts – Pre‑built templates
  • Sampling – AI model interactions


Minima is a versatile, open‑source Retrieval‑Augmented Generation (RAG) server that can run entirely on‑premises or integrate seamlessly with popular AI assistants such as ChatGPT and Anthropic Claude. Its core mission is to give developers a secure, self‑hosted solution for indexing and querying private documents while still leveraging the power of large language models (LLMs). By keeping all neural networks—embedding, reranker, and the primary LLM—within a local environment, Minima eliminates data‑leakage risks that accompany cloud‑based RAG services. At the same time, it offers flexible deployment modes so teams can choose between a fully isolated stack or hybrid setups that offload heavy LLM inference to cloud services.
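
To make the deployment modes concrete, a stack like this is typically launched with Docker Compose, one compose file per mode. The file names below mirror the three modes described here but are assumptions; the repository's README has the exact invocation.

```bash
# Fully offline mode: local Ollama LLM, local embeddings, Qdrant. Nothing leaves the host.
docker compose -f docker-compose-ollama.yml --env-file .env up --build

# Custom GPT mode: retrieval stays local; answer generation is delegated to ChatGPT.
docker compose -f docker-compose-chatgpt.yml --env-file .env up --build

# MCP mode: exposes the local index to Anthropic Claude or any MCP-compliant client.
docker compose -f docker-compose-mcp.yml --env-file .env up --build
```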

The server exposes a rich set of MCP endpoints, allowing AI assistants to perform document search, context retrieval, and answer generation on the fly. Developers can point Minima at any folder containing PDFs, Word files, Markdown, CSVs, and more; the indexer recursively scans these files, creates embeddings with a Sentence‑Transformer model, stores vectors in Qdrant, and optionally reranks results using a BAAI reranker. When integrated with ChatGPT or Claude, the assistant can query this index in real time, returning answers that are grounded in the user’s own data rather than generic knowledge bases. For purely local use, Minima can even launch an Electron UI that lets users interact with the index directly.
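
To make that flow concrete, here is a minimal sketch of the embed, store, and search loop using the sentence‑transformers and qdrant‑client libraries. It illustrates the architecture described above rather than Minima's actual indexer: the model, collection name, and folder path are assumptions, and the optional BAAI reranking stage is left out for brevity.

```python
# Minimal sketch of the pipeline described above: scan a folder, embed each
# file, store vectors in Qdrant, then answer a query by similarity search.
# Model, collection name, and paths are illustrative assumptions, not Minima's code.
from pathlib import Path

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")  # Qdrant container from the stack

COLLECTION = "local_docs"  # hypothetical collection name
client.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # 384 dims for MiniLM
)

# Index: recursively scan the folder and upsert one vector per document.
points = [
    PointStruct(
        id=i,
        vector=embedder.encode(path.read_text(encoding="utf-8")).tolist(),
        payload={"path": str(path)},
    )
    for i, path in enumerate(Path("/data/docs").rglob("*.md"))
]
client.upsert(collection_name=COLLECTION, points=points)

# Query: embed the question and fetch the closest documents; a reranker
# (e.g. a BAAI cross-encoder) would reorder these hits before LLM prompting.
hits = client.search(
    collection_name=COLLECTION,
    query_vector=embedder.encode("What changed in the Q3 architecture review?").tolist(),
    limit=3,
)
for hit in hits:
    print(hit.score, hit.payload["path"])
```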

Key capabilities include:

  • On‑premises isolation – run all components in Docker containers or locally, ensuring data never leaves the network.
  • Hybrid LLM support – choose between a local Ollama model or a cloud‑hosted LLM (ChatGPT, Claude) while keeping the retrieval stack local.
  • MCP integration – expose custom endpoints that any MCP‑compliant client can consume, enabling seamless tool calls from assistants.
  • Scalable vector storage – Qdrant handles high‑dimensional embeddings, supporting efficient similarity search across large corpora.
  • Easy configuration – a single file controls paths, models, and authentication details for ChatGPT custom GPTs (see the sketch after this list).
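
By way of illustration, that configuration file might look like the sketch below. The variable names follow Minima's documented setup, but treat them as assumptions and verify against the current repository.

```
# Hypothetical .env sketch: variable names follow Minima's documented setup,
# but may differ between releases, so check the repository README.

# Folder the indexer scans recursively
LOCAL_FILES_PATH=/Users/you/Documents/

# Embedding model and its output dimension (the two must match)
EMBEDDING_MODEL_ID=sentence-transformers/all-mpnet-base-v2
EMBEDDING_SIZE=768

# Local LLM used in fully offline mode
OLLAMA_MODEL=qwen2:0.5b

# Optional reranking stage
RERANKER_MODEL=BAAI/bge-reranker-base

# Credentials for the ChatGPT custom GPT integration
USER_ID=you@example.com
PASSWORD=change-me
```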

Real‑world scenarios benefit from Minima’s design: a legal firm can index confidential case files and query them through ChatGPT without exposing sensitive data; an R&D team can search internal research papers with Claude, while keeping the entire stack behind a corporate firewall; or an individual developer can experiment locally with their own notes and codebases. In each case, Minima turns a static document collection into an interactive knowledge base that AI assistants can harness instantly.