About
A lightweight MCP server that stores text passages as vector embeddings generated by Ollama and indexed in ChromaDB, enabling semantic search and retrieval of relevant chunks, with support for PDF processing and conversational chunking.
Capabilities
Overview
The RAG Local MCP server is a lightweight, self‑contained solution for semantic memory management that lets AI assistants like Claude store and retrieve text based on meaning rather than keyword matches. By integrating Ollama for embedding generation with ChromaDB’s vector store, the server transforms arbitrary passages into high‑dimensional vectors that can be queried efficiently for similarity. This enables developers to build conversational agents that “remember” facts, documents, or user‑generated content without relying on external cloud services or expensive APIs.
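As a rough sketch of what this store pipeline might look like in Python (the embedding model `nomic-embed-text`, the collection name `memories`, and the storage path are assumptions for illustration, not details taken from the server's source):

```python
# Hypothetical sketch: embed a passage locally with Ollama, then persist
# the vector and the original text in a local ChromaDB collection.
import uuid

import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")       # assumed storage path
collection = client.get_or_create_collection("memories")     # assumed collection name

def memorize(text: str) -> str:
    # Generate the embedding via a local Ollama model (model name is an assumption).
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
    doc_id = str(uuid.uuid4())
    collection.add(ids=[doc_id], embeddings=[embedding], documents=[text])
    return doc_id
```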
When a user asks the LLM to memorize a piece of text, the MCP server automatically creates an embedding using the selected Ollama model and writes the vector into ChromaDB. The process is fully conversational: a single natural‑language prompt can trigger the memorization of several sentences, an entire PDF (processed 20 pages at a time to avoid memory spikes), or a long document that the LLM first chunks into passages before storage. The server also supports incremental PDF ingestion, allowing users to start from a specific page or resume after an interruption. This flexibility makes it suitable for onboarding large knowledge bases such as policy documents, training manuals, or research papers.
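The listing does not show the ingestion code itself, but the paged behaviour described above could be sketched as follows, reusing the `memorize()` helper from the earlier sketch; `pypdf`, the function name, and the `start_page` parameter are illustrative assumptions, while the 20‑page batch size mirrors the description:

```python
# Illustrative sketch of paged PDF ingestion: read 20 pages at a time,
# optionally starting from a given page, and memorize each page's text.
from pypdf import PdfReader

PAGE_BATCH = 20  # batch size described above

def ingest_pdf(path: str, start_page: int = 0) -> int:
    reader = PdfReader(path)
    pages = reader.pages[start_page:]
    stored = 0
    # Walk the document one 20-page batch at a time to keep memory usage flat.
    for batch_start in range(0, len(pages), PAGE_BATCH):
        for page in pages[batch_start:batch_start + PAGE_BATCH]:
            text = page.extract_text() or ""
            if text.strip():
                memorize(text)   # store helper from the earlier sketch
                stored += 1
    return stored
```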
Retrieval is equally intuitive. A simple question like “What is Singapore?” prompts the server to perform a nearest‑neighbor search in vector space, returning the most relevant stored passages along with a human‑readable relevance score. The LLM can then present the answer, optionally augmenting it with contextual notes or highlighting why a particular passage was chosen. Because embeddings capture semantic nuance, the system can surface information even when exact keywords are absent, improving recall over traditional keyword‑based retrieval.
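A retrieval sketch in the same spirit: embed the question, run a nearest‑neighbour query against ChromaDB, and convert the raw distance into a readable score. The scoring formula and names here are assumptions, not necessarily how the server computes the relevance figure it reports:

```python
# Illustrative retrieval sketch: embed the question, query ChromaDB for the
# nearest stored passages, and attach a simple relevance score to each hit.
import chromadb
import ollama

collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("memories")

def recall(question: str, n_results: int = 3) -> list[tuple[str, float]]:
    query_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    result = collection.query(query_embeddings=[query_vec], n_results=n_results)
    hits = []
    for doc, distance in zip(result["documents"][0], result["distances"][0]):
        # With ChromaDB's default L2 distance, smaller is better; 1 / (1 + d)
        # is one simple way to turn it into a human-readable score.
        hits.append((doc, 1.0 / (1.0 + distance)))
    return hits

# Example: recall("What is Singapore?") returns the stored passages most
# similar in meaning to the question, ranked by score.
```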
Key capabilities include:
- Semantic Memorization: Store arbitrary text or PDFs as vectors for later retrieval.
- Chunking Automation: Let the LLM split long texts into meaningful segments before storage.
- Incremental PDF Ingestion: Process large PDFs page by page, with optional start points.
- Vector Search Retrieval: Fast similarity queries via ChromaDB.
- LLM‑Driven Workflow: All commands are issued through natural language, keeping the interaction seamless (see the tool sketch after this list).
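One way such a natural‑language workflow can be wired up is by exposing the store and search helpers as MCP tools that the assistant calls from ordinary conversation. The sketch below uses the Python MCP SDK's FastMCP class and reuses the `memorize()` and `recall()` helpers sketched earlier; the tool names and signatures are assumptions, not the server's actual interface:

```python
# Hypothetical MCP wiring: expose memorize/recall as tools the LLM can call
# during a conversation (tool names and signatures are assumptions).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag-local")

@mcp.tool()
def memorize_text(text: str) -> str:
    """Store a passage as an embedding and return its id."""
    return memorize(text)

@mcp.tool()
def search_memory(question: str, n_results: int = 3) -> list[str]:
    """Return the stored passages most similar to the question."""
    return [doc for doc, _score in recall(question, n_results)]

if __name__ == "__main__":
    mcp.run()   # serves the tools over stdio by default
```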
In practice, this server powers use cases such as personal knowledge bases for developers, context‑aware chatbots that remember user preferences, or internal tooling where data privacy mandates local storage. By keeping the entire stack on‑premise and open‑source, it offers a secure, cost‑effective alternative to commercial RAG services while still delivering the flexibility and responsiveness that modern AI assistants demand.
Related Servers
n8n
Self‑hosted, code‑first workflow automation platform
FastMCP
TypeScript framework for rapid MCP server development
Activepieces
Open-source AI automation platform for building and deploying extensible workflows
MaxKB
Enterprise‑grade AI agent platform with RAG and workflow orchestration.
Filestash
Web‑based file manager for any storage backend
MCP for Beginners
Learn Model Context Protocol with hands‑on examples
Explore More Servers
Paragon MCP Server
Integrate SaaS actions into agents effortlessly
Binance Cryptocurrency MCP
Real‑time crypto market data for AI agents
Browser-use-claude-mcp
AI‑powered browser automation for Claude, Gemini, and OpenAI
MCP Server Manager
Manage MCP servers for Claude and other LLM clients effortlessly
HeyBeauty MCP Server
Virtual try‑on powered by HeyBeauty API
Confluence Wiki MCP Server Extension
Seamless Confluence integration for AI chat tools