About
A Python-based search engine that integrates the Exa API, FireCrawl, LangChain, and Retrieval-Augmented Generation to deliver web search results through a standardized MCP server. It supports local Ollama or cloud OpenAI LLMs and can run in direct, agentic, or MCP server mode.
Capabilities
Overview
The Search Engine with RAG and MCP server is a unified platform that blends web search, retrieval‑augmented generation (RAG), and the Model Context Protocol (MCP) to deliver a ready‑to‑use, agentic AI service. It solves the common developer pain point of wiring together disparate APIs—search engines, web‑crawlers, vector stores, and language models—into a single, protocol‑compliant tool that can be invoked by any MCP‑capable assistant such as Claude. By exposing its capabilities through a standard MCP server, the solution eliminates custom integrations and allows AI agents to request up‑to‑date information from the web in a structured, reliable way.
What it does and why it matters
At its core, the server performs three interconnected tasks:
- Web Search & Retrieval – It queries the Exa search API and uses FireCrawl to fetch full‑text content from the top results, ensuring that agents have access to fresh, real‑world data beyond static knowledge bases.
- RAG Processing – Retrieved documents are chunked, embedded, and stored in a FAISS vector store. When an agent asks a question, the server retrieves the most relevant snippets and feeds them to the language model, dramatically improving answer relevance and factual accuracy (a minimal sketch of this flow follows the list).
- MCP Exposure – All these operations are wrapped in a lightweight MCP server, providing standardized tool invocation endpoints. Developers can spin up the server once and let any MCP‑enabled assistant call its search, retrieve, or RAG functions without writing custom adapters.
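The first two steps can be pictured as a short pipeline: search with Exa, scrape with FireCrawl, then chunk, embed, and index the text for retrieval. The sketch below is illustrative only; it assumes the exa_py, firecrawl, and LangChain packages with OpenAI embeddings, and names such as build_index are hypothetical rather than taken from the project's code.

```python
# Illustrative sketch of the search -> crawl -> chunk -> embed -> retrieve flow.
# Assumes exa_py, firecrawl-py, langchain-community, and langchain-openai are installed.
from exa_py import Exa
from firecrawl import FirecrawlApp
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

exa = Exa(api_key="EXA_API_KEY")                    # placeholder credentials
crawler = FirecrawlApp(api_key="FIRECRAWL_API_KEY")

def build_index(query: str) -> FAISS:
    """Search the web, scrape the top hits, and build a FAISS index over their text."""
    hits = exa.search(query, num_results=5)          # Exa web search
    pages = []
    for hit in hits.results:
        scraped = crawler.scrape_url(hit.url)        # FireCrawl full-text extraction
        # Return shape differs across firecrawl versions; handle dict or object defensively.
        text = scraped.get("markdown", "") if isinstance(scraped, dict) else getattr(scraped, "markdown", "")
        if text:
            pages.append(text)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.create_documents(pages)        # chunk scraped pages into documents
    return FAISS.from_documents(chunks, OpenAIEmbeddings())  # embed and store

# Retrieval step: pull the snippets most relevant to the user's question.
index = build_index("latest MCP specification changes")
context = index.similarity_search("What changed in the MCP spec?", k=4)
```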
This design is invaluable for developers building conversational AI experiences that require up‑to‑date knowledge. Instead of hardcoding search logic or maintaining separate services, the MCP server offers a single contract that AI assistants can rely on for consistent behavior and error handling.
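As a rough illustration of that single contract, a search tool could be exposed with the FastMCP helper from the official MCP Python SDK. The tool name and placeholder body below are assumptions for the sketch, not the project's actual implementation.

```python
# Minimal sketch of exposing a search tool over MCP with the official Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("search-engine")

@mcp.tool()
def web_search(query: str, num_results: int = 5) -> str:
    """Search the web and return relevant snippets for the query."""
    # In the real server this would call the Exa/FireCrawl/RAG pipeline;
    # a placeholder string keeps the example self-contained.
    return f"Results for: {query} (top {num_results})"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so any MCP-capable assistant can call it
```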
Key Features
- Multi‑source search: Combines Exa’s fast, high‑quality results with FireCrawl’s deep content extraction.
- Vector‑based RAG: Uses FAISS for low‑latency similarity search, enabling precise retrieval of context.
- Dual LLM support: Seamlessly switches between local Ollama models and cloud‑based OpenAI APIs, giving teams flexibility in cost and privacy (see the configuration sketch after this list).
- Agentic mode: A LangChain agent orchestrates search, retrieval, and generation steps automatically based on user intent.
- Asynchronous architecture: Non‑blocking I/O lets the server handle many concurrent queries without stalling on slow network calls.
- Graceful error handling: Built‑in fallbacks and detailed logs help developers diagnose issues quickly.
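The dual LLM support can be imagined as a small factory that picks a backend from configuration. The sketch below assumes the langchain_ollama and langchain_openai packages; the environment variable names and default models are illustrative, not the project's documented settings.

```python
# Hedged sketch of selecting a local or cloud LLM from configuration.
import os
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

def get_llm():
    """Return a local Ollama model or a hosted OpenAI model based on configuration."""
    provider = os.getenv("LLM_PROVIDER", "ollama")   # assumed variable name
    if provider == "ollama":
        return ChatOllama(model=os.getenv("OLLAMA_MODEL", "llama3"))
    return ChatOpenAI(model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
```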
Real‑world Use Cases
- Customer support bots that need to fetch the latest product documentation or policy changes from the web.
- Research assistants that pull recent academic papers, synthesize findings, and answer queries in natural language.
- Knowledge‑base builders that continuously crawl a company’s intranet, embed content, and expose it to AI agents for internal use.
- Education tools that retrieve up‑to‑date learning resources and generate explanations tailored to a student’s question.
Integration with AI Workflows
Developers can integrate the server into existing MCP‑based pipelines by simply pointing their assistant’s tool registry to the server’s endpoints. The agent can then request a search, receive structured results, and optionally trigger RAG to refine the response—all without modifying the assistant’s core logic. Because MCP standardizes request and response schemas, adding or updating capabilities is as simple as deploying a new version of the server.
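For example, an MCP client can launch the server over stdio, discover its tools, and call the search tool directly. The sketch below uses the official mcp Python SDK; the server.py entry point and the web_search tool name are assumptions for illustration.

```python
# Hedged sketch of an MCP client calling the server's search tool over stdio.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["server.py"])  # hypothetical entry point
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()                   # MCP handshake
            tools = await session.list_tools()           # discover the server's tools
            result = await session.call_tool("web_search", {"query": "latest MCP spec"})
            print(result)

asyncio.run(main())
```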
Standout Advantages
- Protocol‑first design: Eliminates vendor lock‑in and promotes interoperability across different AI assistants.
- Modular architecture: Each component (search, RAG, agent) can be swapped or upgraded independently.
- Local‑first option: With Ollama, teams can keep data and inference on premises, addressing privacy concerns.
- Developer‑friendly tooling: Built with type hints, async patterns, and extensive logging, making debugging and extension straightforward.
In summary, the Search Engine with RAG and MCP server provides a turnkey, protocol‑compliant solution for embedding real‑time web search into AI assistants, dramatically enhancing their usefulness in dynamic, data‑driven contexts.
Related Servers
MarkItDown MCP Server
Convert documents to Markdown for LLMs quickly and accurately
Context7 MCP
Real‑time, version‑specific code docs for LLMs
Playwright MCP
Browser automation via structured accessibility trees
BlenderMCP
Claude AI meets Blender for instant 3D creation
Pydantic AI
Build GenAI agents with Pydantic validation and observability
Chrome DevTools MCP
AI-powered Chrome automation and debugging
Explore More Servers
Uberall MCP Server
Bridge AI assistants to Uberall’s business listings
Google Tasks MCP Server
Manage Google Tasks directly from Claude
Aligo SMS MCP Server
Send and query SMS via Aligo API with MCP compatibility
Wikipedia Summary MCP Server
FastAPI MCP server delivering Wikipedia summaries via Colab and Ngrok
MySQL MCP Server Generator
Batch‑generate MySQL MCP servers with stdio and SSE support
Petel MCP Server
MCP server for teachers accessing PETEL