acryldata

DataHub MCP Server

Unified metadata access via Model Context Protocol

About

The DataHub MCP Server exposes DataHub’s rich metadata through the Model Context Protocol, enabling search, lineage traversal, and query listing across all entity types. It serves as a bridge for tools to consume DataHub data programmatically.

Capabilities

Resources: Access data sources
Tools: Execute functions
Prompts: Pre-built templates
Sampling: AI model interactions

The DataHub Model Context Protocol (MCP) server bridges the gap between a powerful metadata catalog and AI assistants. It exposes DataHub’s rich entity graph—datasets, tables, columns, jobs, and more—to Claude or any MCP‑compliant client. By turning the catalog into a conversational interface, developers can query metadata, trace lineage, and surface context for data-driven questions without writing custom integrations.

At its core, the server provides three high‑level capabilities. First, it supports universal search across all entity types using arbitrary filters; a user can ask for “all tables with more than 10 million rows in the sales namespace” and receive a structured list. Second, it offers entity metadata retrieval: a single call can return the full description, schema, ownership, and configuration for any DataHub entity. Third, it enables lineage traversal—both upstream (sources) and downstream (consumers)—so an assistant can explain how a dataset flows through pipelines, which jobs read or write it, and where its derived tables appear.
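The upstream and downstream traversal described above can be illustrated with a small sketch. This is not the server's implementation: it is a toy breadth-first walk over an in-memory graph, and the entity names, the `DOWNSTREAM` mapping, and the `max_hops` parameter are all hypothetical, chosen only to show what "walking lineage in both directions" means.

```python
from collections import deque

# Toy lineage edges (hypothetical names): each key feeds the entities in its list.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["analytics.revenue", "analytics.order_counts"],
    "analytics.revenue": ["ml.revenue_forecast"],
}


def invert(edges):
    """Build the upstream view (consumer -> sources) from downstream edges."""
    upstream = {}
    for source, consumers in edges.items():
        for consumer in consumers:
            upstream.setdefault(consumer, []).append(source)
    return upstream


def traverse(edges, start, max_hops=None):
    """Breadth-first walk; returns {entity: hop distance}, excluding the start node."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Stop expanding once the hop limit is reached.
        if max_hops is not None and seen[node] >= max_hops:
            continue
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


# Upstream: everything that feeds the forecast model.
upstream_of_forecast = traverse(invert(DOWNSTREAM), "ml.revenue_forecast")
# Downstream: consumers of the raw table, at most two hops away.
downstream_of_raw = traverse(DOWNSTREAM, "raw.orders", max_hops=2)
```

Returning hop distances alongside entity names lets a caller distinguish direct dependencies from transitive ones, which is the kind of detail an assistant would need when explaining how a dataset flows through pipelines.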

These features are especially valuable for data engineers, analysts, and ML scientists who need instant context about their assets. For example, an AI assistant can answer “Which datasets feed into the revenue forecast model?” by walking the lineage graph. The server also lists the SQL queries associated with a dataset, so an assistant can show every query that references a particular table, which gives auditors a quick usage trail and supports reproducibility.

Integrating the DataHub MCP server into an AI workflow is straightforward. A developer configures the assistant to point at the server’s endpoint, and the MCP client automatically exposes the server’s tools, which cover search, entity metadata retrieval, lineage traversal, and dataset query listing. The assistant can invoke these tools as part of a conversation, and the server returns JSON-structured responses that the assistant can incorporate into explanations or downstream processing. This coupling removes the need for hand-written API calls and keeps the assistant working with up-to-date catalog data.
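To make the tool invocation step concrete, the sketch below builds the kind of message an MCP client sends when it calls a tool. MCP is built on JSON-RPC 2.0 and tool calls use the `tools/call` method; everything specific to this server is an assumption here. The tool name `search`, its `query` argument, and the sample response are hypothetical placeholders, not the server's documented interface or real output.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call_request(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_from_result(response):
    """Join the text parts of a `tools/call` result; raise if the tool reported an error."""
    result = response["result"]
    texts = [
        part["text"]
        for part in result.get("content", [])
        if part.get("type") == "text"
    ]
    if result.get("isError"):
        raise RuntimeError("\n".join(texts))
    return "\n".join(texts)


# Hypothetical tool name and arguments, for illustration only.
request = tool_call_request("search", {"query": "revenue"})
wire_message = json.dumps(request)

# Hand-written stand-in for a server reply; not real DataHub output.
sample_response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "content": [
            {"type": "text", "text": json.dumps({"total": 0, "results": []})}
        ]
    },
}
payload = json.loads(text_from_result(sample_response))
```

In practice an MCP client library handles this framing and the transport, so a developer rarely writes these messages by hand; the sketch only shows what travels between the assistant and the server.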

Unique advantages of this implementation include its comprehensive coverage—every entity type in DataHub is searchable—and the ability to apply arbitrary filters directly within queries, giving developers fine‑grained control over results. The server also supports lineage graph traversal natively, a feature that many other metadata tools expose only through complex queries or custom code. For teams already invested in DataHub, the MCP server turns a static catalog into an interactive knowledge base that AI assistants can query in real time, accelerating data discovery, governance, and troubleshooting across the organization.