About
Minima is an open‑source RAG server that runs in Docker containers, enabling local or on‑premises retrieval‑augmented generation. It supports a fully offline mode, ChatGPT custom GPT integration, or Anthropic Claude integration for querying local documents securely.
Capabilities
Minima is a versatile, open‑source Retrieval‑Augmented Generation (RAG) server that can run entirely on‑premises or integrate seamlessly with popular AI assistants such as ChatGPT and Anthropic Claude. Its core mission is to give developers a secure, self‑hosted solution for indexing and querying private documents while still leveraging the power of large language models (LLMs). By keeping all neural networks—embedding, reranker, and the primary LLM—within a local environment, Minima eliminates data‑leakage risks that accompany cloud‑based RAG services. At the same time, it offers flexible deployment modes so teams can choose between a fully isolated stack or hybrid setups that offload heavy LLM inference to cloud services.
The server exposes a rich set of MCP endpoints, allowing AI assistants to perform document search, context retrieval, and answer generation on the fly. Developers can point Minima at any folder containing PDFs, Word files, Markdown, CSVs, and more; the indexer recursively scans these files, creates embeddings with a Sentence‑Transformer model, stores vectors in Qdrant, and optionally reranks results using a BAAI reranker. When integrated with ChatGPT or Claude, the assistant can query this index in real time, returning answers that are grounded in the user’s own data rather than generic knowledge bases. For purely local use, Minima can even launch an Electron UI that lets users interact with the index directly.
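The index‑then‑search flow described above can be sketched in a few lines. This is a minimal illustration only: the bag‑of‑words `embed` function stands in for the real Sentence‑Transformer model, an in‑memory list stands in for Qdrant, and all names (`add_document`, `search`, the sample vocabulary) are illustrative, not Minima's actual API.

```python
import math
from collections import Counter

# Stand-in for the Sentence-Transformer model: a normalized
# bag-of-words vector over a tiny fixed vocabulary (illustrative only).
VOCAB = ["contract", "liability", "patent", "deadline", "budget"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Stand-in for the Qdrant collection: an in-memory list of
# (document, embedding) pairs.
index: list[tuple[str, list[float]]] = []

def add_document(doc: str) -> None:
    """Embed a document and store it, as the indexer does per file."""
    index.append((doc, embed(doc)))

def search(query: str, top_k: int = 2) -> list[str]:
    """Embed the query and return the top_k most similar documents."""
    q = embed(query)
    scored = sorted(
        index,
        key=lambda pair: -sum(a * b for a, b in zip(q, pair[1])),
    )
    return [doc for doc, _ in scored[:top_k]]

add_document("The contract sets a liability cap for the vendor.")
add_document("Patent filing deadline is next quarter.")
add_document("Budget review notes from the last meeting.")

print(search("When is the patent deadline?", top_k=1))
# → ['Patent filing deadline is next quarter.']
```

In the real stack, `embed` is a Sentence‑Transformer forward pass, `index` is a Qdrant collection queried by cosine similarity, and the candidate set is optionally re‑scored by the BAAI reranker before being returned to the assistant.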
Key capabilities include:
- On‑premises isolation – run all components in Docker containers or locally, ensuring data never leaves the network.
- Hybrid LLM support – choose between a local Ollama model or a cloud‑hosted LLM (ChatGPT, Claude) while keeping the retrieval stack local.
- MCP integration – expose custom endpoints that any MCP‑compliant client can consume, enabling seamless tool calls from assistants.
- Scalable vector storage – Qdrant handles high‑dimensional embeddings, supporting efficient similarity search across large corpora.
- Easy configuration – a single file controls paths, models, and authentication details for ChatGPT custom GPTs.
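The single configuration file mentioned above might look like the following `.env` fragment. The variable names and values here are assumptions for illustration; consult Minima's README for the exact keys it expects.

```shell
# Illustrative .env sketch -- variable names are assumptions,
# check Minima's README for the exact keys it expects.
LOCAL_FILES_PATH=/home/user/docs        # folder the indexer scans recursively
EMBEDDING_MODEL_ID=sentence-transformers/all-mpnet-base-v2
RERANKER_MODEL=BAAI/bge-reranker-base   # optional reranking stage
OLLAMA_MODEL=qwen2:0.5b                 # local LLM for fully isolated mode
USER_ID=you@example.com                 # auth for the ChatGPT custom GPT
PASSWORD=change-me
```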
Minima’s design suits a range of real‑world scenarios: a legal firm can index confidential case files and query them through ChatGPT without exposing sensitive data; an R&D team can search internal research papers with Claude while keeping the entire stack behind a corporate firewall; an individual developer can experiment locally with their own notes and codebases. In each case, Minima turns a static document collection into an interactive knowledge base that AI assistants can query instantly.
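To wire an MCP‑compliant client such as Claude Desktop to the server, an entry is added under `mcpServers` in the client's configuration file (`claude_desktop_config.json` for Claude Desktop). The `mcpServers` structure is the client's standard schema; the `command` and `args` shown for launching Minima are placeholders, not the project's documented invocation.

```
{
  "mcpServers": {
    "minima": {
      "command": "uv",
      "args": ["run", "minima"]
    }
  }
}
```

Once registered, the assistant can invoke Minima's search and retrieval endpoints as ordinary tool calls.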
Related Servers
Netdata
Real‑time infrastructure monitoring for every metric, every second.
Awesome MCP Servers
Curated list of production-ready Model Context Protocol servers
JumpServer
Browser‑based, open‑source privileged access management
OpenTofu
Infrastructure as Code for secure, efficient cloud management
FastAPI-MCP
Expose FastAPI endpoints as MCP tools with built‑in auth
Pipedream MCP Server
Event‑driven integration platform for developers
Explore More Servers
OCI Registry MCP Server
Query OCI registries with LLM-powered tools
Pattern Cognition MCP Server
Analyze conversational patterns to reveal cognitive DNA
1scan MCP Server
Unified blockchain explorer gateway for AI assistants
mcptool.sh
Unified CLI for managing, running, and integrating MCP servers effortlessly
SendGrid MCP Server
Email automation via AI assistants
GitHub Support Assistant
Find similar GitHub issues quickly for faster troubleshooting