
Py-MCP Qdrant RAG Server

MCP Server

Semantic search and RAG powered by Qdrant, with Ollama or OpenAI embeddings

Updated Sep 13, 2025

About

This MCP server enables Retrieval‑Augmented Generation by indexing documents into a Qdrant vector database, supporting multiple formats and web scraping. It offers fast semantic search with either local Ollama embeddings or OpenAI models, fully integrated into Claude Desktop.

Capabilities

  • Resources – Access data sources
  • Tools – Execute functions
  • Prompts – Pre-built templates
  • Sampling – AI model interactions

Overview

The py‑mcp‑qdrant‑rag server implements the Model Context Protocol (MCP) to provide a full Retrieval‑Augmented Generation (RAG) backend powered by the Qdrant vector database. Its primary purpose is to let AI assistants, such as Claude Desktop, query a rich semantic knowledge base built from arbitrary documents and web resources. By exposing a small set of MCP endpoints, the server translates natural‑language commands into vector searches and returns context‑aware snippets that the assistant can use directly in its responses.
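
To make that concrete, here is a minimal sketch of what such an endpoint could look like, written with the FastMCP helper from the official Python MCP SDK. The tool name, collection name, and embedding model here are illustrative assumptions, not the server's actual identifiers.

    # Hypothetical sketch of an MCP tool that performs semantic search.
    import requests
    from mcp.server.fastmcp import FastMCP
    from qdrant_client import QdrantClient

    mcp = FastMCP("qdrant-rag")                        # MCP server instance
    qdrant = QdrantClient(url="http://localhost:6333") # default local Qdrant

    def embed(text: str) -> list[float]:
        # Local Ollama embedding endpoint; the model name is an assumption.
        resp = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": text},
        )
        return resp.json()["embedding"]

    @mcp.tool()
    def search_documentation(query: str, limit: int = 5) -> list[str]:
        """Return the best-matching passages for a natural-language query."""
        hits = qdrant.search(
            collection_name="documentation",           # assumed collection name
            query_vector=embed(query),
            limit=limit,
        )
        return [hit.payload["text"] for hit in hits]

    if __name__ == "__main__":
        mcp.run()                                      # stdio transport by default

Claude Desktop launches such a process over stdio, invokes the tool, and receives the matching passages as structured results.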

What sets this MCP apart is its flexible embedding pipeline. Developers can choose between a local Ollama model (e.g., nomic‑embed‑text) and the OpenAI embeddings API, giving them control over latency, cost, and data privacy. The server automatically handles the conversion of documents into vector embeddings before storing them in Qdrant, ensuring that queries are resolved through fast cosine‑similarity or approximate nearest neighbor searches. This makes it suitable for scenarios where up‑to‑date documentation must be queried in real time, such as internal developer portals or customer support knowledge bases.
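
A rough sketch of that dual-backend pipeline, assuming a local Ollama instance, the OpenAI Python SDK, and a pre-created Qdrant collection (model and collection names are illustrative):

    # Sketch: embed a document with either backend, then upsert it into Qdrant.
    import os
    import requests
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    def embed(text: str, backend: str = "ollama") -> list[float]:
        if backend == "ollama":
            # Local embedding keeps documents on-machine (latency and privacy win).
            resp = requests.post(
                "http://localhost:11434/api/embeddings",
                json={"model": "nomic-embed-text", "prompt": text},
            )
            return resp.json()["embedding"]
        # Otherwise call the hosted OpenAI embeddings API.
        client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        result = client.embeddings.create(model="text-embedding-3-small", input=text)
        return result.data[0].embedding

    qdrant = QdrantClient(url="http://localhost:6333")
    qdrant.upsert(
        collection_name="documentation",
        points=[PointStruct(
            id=1,
            vector=embed("Authentication is configured via the AUTH_TOKEN setting."),
            payload={"text": "Authentication is configured via the AUTH_TOKEN setting."},
        )],
    )

Note that the two backends produce vectors of different dimensionality (768 for nomic-embed-text, 1536 for text-embedding-3-small), so a collection should be created for one backend and kept consistent.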

Key capabilities include:

  • Semantic Search – The vector index allows fuzzy, meaning‑based retrieval rather than keyword matching.
  • Multi‑Format Import – PDFs, Markdown, DOCX, plain text and other common formats are parsed out of the box.
  • Web Scraping – A single command can pull and index a website or GitHub README, streamlining onboarding of external resources.
  • Bulk Import – Entire directories can be ingested in one shot, making it trivial to index a large codebase or documentation tree.
  • Fast Retrieval – Qdrant’s approximate‑nearest‑neighbor engine keeps lookups in the millisecond range even across millions of vectors, so assistant latency stays low (see the sketch after this list).
  • MCP Integration – The server registers itself in Claude Desktop’s configuration, so the assistant can issue RAG commands through natural language without any custom tooling.
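
A brief sketch of the Qdrant side of that retrieval path, assuming a 768‑dimensional nomic‑embed‑text collection (all names are placeholders):

    # Sketch: create a cosine-distance collection and run an ANN search.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams

    qdrant = QdrantClient(url="http://localhost:6333")
    qdrant.create_collection(
        collection_name="documentation",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
    hits = qdrant.search(
        collection_name="documentation",
        query_vector=[0.0] * 768,   # placeholder; pass a real query embedding here
        limit=5,
    )
    for hit in hits:
        print(hit.score, hit.payload)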

Typical use cases involve:

  • Developer Assistance – A software engineer can ask the assistant to “search for how to configure authentication in our API” and receive a precise excerpt from internal docs.
  • Customer Support – A support agent can pull up relevant FAQ entries or troubleshooting steps by querying the knowledge base.
  • Product Documentation – Product managers can keep the latest feature docs indexed and let the assistant surface them during stakeholder meetings.
  • Compliance Audits – Legal teams can query policy documents to verify adherence, all through a conversational interface.

Integration into an AI workflow is straightforward: the MCP server runs as a separate process (often in its own Conda environment), while Claude Desktop talks to it via the standard MCP command/argument protocol. The assistant issues a search or add‑documentation request, the server processes it and returns JSON containing the best‑matching passages, and the assistant can then weave those passages into its response, achieving a true RAG experience without any custom code in the client.
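
For reference, Claude Desktop discovers MCP servers through entries in its claude_desktop_config.json. A plausible entry for this server might look like the following, where the interpreter path and module name are assumptions for illustration:

    {
      "mcpServers": {
        "qdrant-rag": {
          "command": "/path/to/conda/envs/mcp-rag/bin/python",
          "args": ["-m", "py_mcp_qdrant_rag"]
        }
      }
    }

Once such an entry is in place, restarting Claude Desktop makes the server’s search and ingestion tools available directly in conversation.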