Local RAG MCP Server

Live web search and context injection for LLMs, no external APIs

About

The Local RAG MCP Server performs real‑time web searches, generates embeddings with Google's MediaPipe Text Embedder, ranks results by relevance, extracts Markdown from the top URLs, and returns fresh context to language models, all running locally without external API calls.

Capabilities

- Resources: access data sources
- Tools: execute functions
- Prompts: pre-built templates
- Sampling: AI model interactions

RAG Workflow

mcp‑local‑rag is a lightweight, self‑contained Model Context Protocol (MCP) server that brings real‑time web search capabilities directly into AI assistants without relying on external APIs. Because it runs locally, it avoids the privacy concerns and third‑party dependencies associated with cloud‑based search services, allowing developers to embed up‑to‑date knowledge into LLM responses on their own infrastructure.
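As a rough sketch of what such a server looks like, the official MCP Python SDK's FastMCP helper can expose a search function as a callable tool. The tool name, signature, and body below are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of an MCP server exposing one tool, using the official
# Python SDK (pip install "mcp[cli]"). The tool name and signature are
# illustrative; mcp-local-rag's real interface may differ.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-rag")

@mcp.tool()
def rag_search(query: str, num_results: int = 10) -> str:
    """Search the live web and return fresh, Markdown-formatted context."""
    # The real server would run the search -> embed -> rank -> extract
    # pipeline described below; a placeholder stands in for it here.
    return f"(context for: {query}, from {num_results} results)"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which MCP clients expect
```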

When a language model receives a prompt that requires recent or domain‑specific information, it can invoke the server's search tool. The server performs a DuckDuckGo query, retrieves the top ten results, and uses Google's MediaPipe Text Embedder to generate an embedding for each snippet. These embeddings are compared against an embedding of the original query, and the most relevant results are selected. The server then scrapes the HTML content of those URLs, extracts clean Markdown‑formatted context, and returns it to the model. The LLM can incorporate this fresh evidence into its final answer, effectively extending its knowledge base on the fly. A sketch of this pipeline appears below.
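A minimal version of that pipeline might look like the following. It assumes the duckduckgo_search, markdownify, mediapipe, and requests packages and a locally downloaded embedder model file; the function and file names are illustrative, and the project's actual code will differ in detail:

```python
import requests
from duckduckgo_search import DDGS
from markdownify import markdownify
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import text as mp_text

# Load MediaPipe's Text Embedder from a locally stored .tflite model
# (the path is an assumption; any compatible embedder model works).
embedder = mp_text.TextEmbedder.create_from_options(
    mp_text.TextEmbedderOptions(
        base_options=mp_python.BaseOptions(model_asset_path="embedder.tflite")
    )
)

def fetch_context(query: str, top_k: int = 3) -> str:
    # 1. Live DuckDuckGo search for the top ten results.
    results = DDGS().text(query, max_results=10)

    # 2. Embed the query once, embed each result snippet, and rank
    #    results by cosine similarity to the query embedding.
    query_emb = embedder.embed(query).embeddings[0]
    ranked = sorted(
        results,
        key=lambda r: mp_text.TextEmbedder.cosine_similarity(
            embedder.embed(r["body"]).embeddings[0], query_emb
        ),
        reverse=True,
    )

    # 3. Fetch the most relevant pages and convert their HTML to Markdown.
    sections = []
    for r in ranked[:top_k]:
        html = requests.get(r["href"], timeout=10).text
        sections.append(f"## {r['title']}\n\n{markdownify(html)}")
    return "\n\n".join(sections)
```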

Key capabilities include: live web search, embedding‑based relevance ranking, Markdown extraction from arbitrary webpages, and tool‑calling integration that works with any MCP‑compatible client such as Claude Desktop, Cursor, or Goose. Because the entire pipeline runs locally, developers can audit every step, control resource usage, and avoid third‑party data exposure. The server also supports Docker deployment for consistent environments and is audited by MseeP, providing an additional layer of security assurance.
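Wiring the server into a client is typically a one-entry configuration change. The snippet below is a hypothetical Claude Desktop entry, assuming the project is run with uv's uvx directly from its GitHub repository; consult the project's README for the exact command and arguments:

```json
{
  "mcpServers": {
    "mcp-local-rag": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/nkapila6/mcp-local-rag",
        "mcp-local-rag"
      ]
    }
  }
}
```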

Typical use cases range from customer‑support bots that need the latest product release notes, to research assistants that pull recent academic papers, to personal knowledge bases that stay current without manual updates. By integrating this server into an AI workflow, developers can transform static LLMs into dynamic agents that browse the web, retrieve evidence, and generate context‑aware responses, all while keeping data processing on premises.