KunihiroS

Kv Extractor MCP Server

Unstructured text to type-safe key-value pairs in seconds

Updated Jul 16, 2025

About

This MCP server automatically discovers and extracts key-value pairs from noisy or unstructured text using an LLM, multilingual spaCy NER, and Pydantic validation. It outputs structured data as JSON, YAML, or TOML, with every value validated against its annotated type.

Capabilities

  • Resources: Access data sources
  • Tools: Execute functions
  • Prompts: Pre-built templates
  • Sampling: AI model interactions

Overview

The Kv Extractor MCP Server is a specialized tool that turns unstructured or noisy text into clean, type-safe key-value data. It uses a large language model (GPT-4.1-mini) together with pydantic-ai to identify, annotate, and validate the extracted fields. Because it discovers keys automatically rather than requiring them in advance, it is well suited to documents whose structure is unknown or highly variable, such as customer emails, scraped web pages, or multilingual logs.

The server offers three output formats (JSON, YAML, and TOML) for compatibility with a wide range of downstream systems. Each format is produced by the same pipeline: spaCy-based multilingual NER pre-processing, LLM-driven type annotation, iterative refinement of types and values, and final validation against a Pydantic schema. The result is always a well-formed document that data pipelines, configuration managers, or other AI services can consume without additional parsing logic.
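The stages above can be illustrated with a minimal sketch. It is not the server's implementation: a regular expression stands in for spaCy NER and LLM key discovery, a heuristic stands in for LLM type annotation, and plain Python coercion stands in for Pydantic validation. All function names here are hypothetical, and only JSON output is shown.

```python
import json
import re

# Stage 1 (stand-in for spaCy NER + LLM key discovery): find "key: value"
# or "key = value" fragments anywhere in the text.
PAIR_RE = re.compile(
    r"(?P<key>[A-Za-z][A-Za-z _-]*?)\s*[:=]\s*(?P<value>[^,;\n]+)"
)


def discover_pairs(text: str) -> dict:
    """Return raw string pairs with keys normalized to snake_case."""
    pairs = {}
    for match in PAIR_RE.finditer(text):
        key = match.group("key").strip().lower().replace(" ", "_")
        pairs[key] = match.group("value").strip()
    return pairs


# Stage 2 (stand-in for LLM type annotation): guess a type name per value.
def annotate_type(value: str) -> str:
    if value.lower() in {"true", "false", "yes", "no"}:
        return "bool"
    if re.fullmatch(r"[+-]?\d+", value):
        return "int"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "float"
    return "str"


# Stage 3 (stand-in for Pydantic validation): coerce to the annotated type.
# int() and float() raise ValueError if the annotation was wrong.
def coerce(value: str, type_name: str):
    if type_name == "bool":
        return value.lower() in {"true", "yes"}
    if type_name == "int":
        return int(value)
    if type_name == "float":
        return float(value)
    return value


def extract(text: str) -> str:
    """Run all stages and serialize the typed result as JSON."""
    typed = {
        key: coerce(raw, annotate_type(raw))
        for key, raw in discover_pairs(text).items()
    }
    return json.dumps(typed, ensure_ascii=False, indent=2)


sample = "Hi team, order id: 4821, total = 19.90; express: yes\nnotes: leave at door"
print(extract(sample))
```

In this sketch the numeric and boolean values come back as JSON numbers and booleans rather than strings, which is the property the type annotation and validation stages are meant to provide. The regex is far weaker than NER plus an LLM and will mis-split keys in ordinary sentences, so treat it purely as a placeholder for the discovery step.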

Key capabilities include:

  • Automatic key discovery: The model scans the text for meaningful phrases and treats them as candidate keys, allowing extraction from arbitrary content.
  • Robustness to noise: The multi‑step pipeline corrects misinterpretations and normalizes values, making it reliable even when the input contains typos or informal language.
  • Multilingual support: spaCy NER runs in Japanese, English, and Chinese (Simplified and Traditional), providing language-aware candidate generation that boosts accuracy.
  • Type safety and schema enforcement: Pydantic validation guarantees that every output field conforms to the expected type, reducing downstream errors.
  • Consistent responses: Even when extraction is incomplete, the server returns a valid structure, which is critical for automated workflows that cannot tolerate malformed or absent responses.
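The "consistent responses" property can be sketched as a wrapper that always returns the same envelope, whether extraction succeeds, finds nothing, or raises. The envelope fields (`success`, `result`, `error`) and the names `safe_extract` and `toy_extractor` are assumptions made for illustration, not the server's documented response format.

```python
from typing import Any, Callable, Dict


def safe_extract(
    text: str, extractor: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Call an extractor and always return the same three-field envelope."""
    try:
        result = extractor(text)
    except Exception as exc:
        # Failure path: same keys, empty result, error message filled in.
        return {
            "success": False,
            "result": {},
            "error": f"{type(exc).__name__}: {exc}",
        }
    if not result:
        # Nothing extracted: still a valid, parseable structure.
        return {"success": False, "result": {}, "error": "no key-value pairs found"}
    return {"success": True, "result": result, "error": None}


def toy_extractor(text: str) -> Dict[str, Any]:
    """Hypothetical extractor: parses whitespace-separated k=v tokens."""
    if not text.strip():
        raise ValueError("empty input")
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


for candidate in ("a=1 b=2", "no pairs here", ""):
    print(safe_extract(candidate, toy_extractor))
```

A caller can then branch on one boolean field and never needs to guard against a missing key or an unparseable payload.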

Typical use cases include data ingestion (converting free-text logs into structured metrics), content moderation (feeding extracted fields to rule engines), and customer support automation (parsing emails into ticket attributes). By integrating this MCP server into an AI assistant's toolset, developers can offload text interpretation to a type-safe service and keep their own logic focused on higher-level business rules and user interactions.