MCPSERV.CLUB
Opik

Self-Hosted

Open-source LLM evaluation platform for tracing, metrics, and guardrails

Active (100)
15.0k stars
Updated 21 hours ago

Overview

Opik is an open‑source platform for building, evaluating, and optimizing large language model (LLM) applications. For developers, it serves as both a tracing engine and an evaluation framework: it captures every request, response, and internal span of an LLM pipeline. The system exposes a RESTful API, a lightweight Python SDK, and an extensible event bus that can be hooked into a RAG chatbot, code assistant, or multi‑agent workflow. Once Opik is integrated into your deployment pipeline, you can log traces automatically, run metrics against a held‑out test set, and feed the results back into a continuous optimization loop.

Architecture & Technical Stack

Opik’s core is written in Python 3.10+ and built on top of the FastAPI framework, which provides asynchronous request handling and automatic OpenAPI documentation. The backend stores metadata in a PostgreSQL database, while trace payloads are persisted to an Amazon S3 / MinIO object store or a local filesystem, depending on the deployment mode. The service layer is split into three micro‑services:

  1. API Gateway – Exposes /api/v1 endpoints for logging, querying, and evaluation.
  2. Worker Pool – A Celery queue that processes background jobs such as metric computation, agent optimization, and guardrail checks.
  3. Dashboard – A React/Next.js SPA that consumes the API and renders real‑time dashboards, trace tables, and metric visualizations.
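
The split between the gateway and the worker pool can be illustrated with a minimal in-process sketch. It swaps the Celery queue and Redis broker named above for Python's standard-library queue and a single thread, so it shows only the hand-off pattern, not Opik's actual code. The function names, the job fields, and the exact-match scoring rule are all assumptions made for illustration.

```python
import queue
import threading


def submit_trace(jobs, trace_id, output, reference):
    """Stand-in for the API gateway: accept a trace and enqueue a scoring job."""
    jobs.put({"trace_id": trace_id, "output": output, "reference": reference})


def worker(jobs, results):
    """Stand-in for one worker: score jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        # Hypothetical metric: 1.0 on exact match with the reference, else 0.0.
        matched = job["output"].strip() == job["reference"].strip()
        results[job["trace_id"]] = float(matched)


def run_demo():
    """Log two traces, let the background worker score them, return the scores."""
    jobs = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    submit_trace(jobs, "t1", "Paris", "Paris")
    submit_trace(jobs, "t2", "Lyon", "Paris")
    jobs.put(None)  # tell the worker to stop after draining the queue
    thread.join()
    return results
```

The point of the pattern is that the request path only enqueues work, so slow jobs such as metric computation never block trace logging.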

Opik can be deployed with Docker Compose or, on Kubernetes, with Helm charts. The stack uses Redis as a message broker, Elasticsearch for full‑text search of traces, and optional Prometheus/Grafana exporters for infrastructure monitoring.

Core Capabilities

  • Tracing & Spans – Capture hierarchical spans with context propagation, allowing developers to drill down into each LLM call.
  • Evaluation Metrics – Pre‑bundled metrics (BLEU, ROUGE, F1) and a flexible “metric definition” DSL enable custom scoring of outputs against reference data.
  • Agent Optimizer – Built‑in optimizers (Few‑Shot Bayesian, MIPRO, evolutionary, MetaPrompt) can be triggered via API or scheduled jobs.
  • Guardrails – Plug‑in architecture lets you swap between Opik’s native guardrail models and third‑party libraries (e.g., OpenAI Moderation, Perplexity Guardrails).
  • Webhooks & SDK – The Python SDK (opik) exposes a client.log_trace() method, while webhooks allow external services to react to new traces or metric thresholds.
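
To make "hierarchical spans with context propagation" concrete, here is a small sketch of how a tracer can link child spans to their parent using Python's contextvars module. It does not use the opik package, and none of these names come from Opik's SDK: span, recorded_spans, and the answer pipeline are hypothetical stand-ins.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The span that is currently open in this execution context, if any.
_current_span = contextvars.ContextVar("current_span", default=None)
recorded_spans = []  # finished spans, in completion order


@contextmanager
def span(name):
    """Open a span whose parent is whichever span is active right now."""
    parent = _current_span.get()
    record = {
        "id": uuid.uuid4().hex,
        "name": name,
        "parent_id": parent["id"] if parent else None,
        "start": time.perf_counter(),
    }
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - record["start"]
        _current_span.reset(token)  # restore the parent as the active span
        recorded_spans.append(record)


def answer(question):
    """Hypothetical RAG pipeline: one root span with two child spans."""
    with span("rag_pipeline"):
        with span("retrieve"):
            docs = ["doc-1", "doc-2"]
        with span("llm_call") as call:
            call["input"] = question
            call["output"] = f"answer based on {len(docs)} documents"
            return call["output"]
```

After answer("What is Opik?") runs, recorded_spans holds three records. The retrieve and llm_call records carry the root span's id as their parent_id, which is what lets a dashboard rebuild the tree and drill into each call.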

Deployment & Infrastructure

Opik is designed for self‑hosting on‑premises or in private clouds. The Docker images are lightweight (~200 MB), and the stack can be started with a single docker compose up command. For production, you will typically provision:

  • A PostgreSQL cluster (replicated for HA).
  • An object storage service (S3/MinIO) for large trace payloads.
  • A Redis instance as the broker and cache layer.
  • Optional Elasticsearch for high‑performance search across millions of traces.

Horizontal scaling is achieved by increasing the number of worker replicas, with a Kubernetes StatefulSet used for persistence. Configuration is declarative (YAML/JSON), so deployment manifests can be version‑controlled and updates rolled out through CI/CD pipelines.

Integration & Extensibility

Opik’s plugin system is exposed through Python entry points, allowing developers to write custom guardrail or metric plugins that are discovered at runtime. The SDK supports context propagation via OpenTelemetry, so you can integrate Opik with existing observability stacks. Webhooks expose events such as trace_created, metric_aggregated, and optimizer_completed, enabling downstream services (e.g., Slack alerts, CI jobs) to react automatically.
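
Runtime discovery through Python entry points generally looks like the sketch below, which relies only on the standard-library importlib.metadata module. The group name opik.metrics is an assumption, since this page does not say which group Opik reads, and exact_match is a hypothetical built-in used to show how discovered plugins would be merged into a registry.

```python
from importlib.metadata import entry_points

# Hypothetical group name; the group Opik actually uses is not stated here.
METRIC_GROUP = "opik.metrics"


def exact_match(output, reference):
    """Hypothetical built-in metric: 1.0 when the strings match, else 0.0."""
    return float(output.strip() == reference.strip())


def discover_plugins(group=METRIC_GROUP):
    """Load every installed entry point in `group`, keyed by its name."""
    eps = entry_points()
    if hasattr(eps, "select"):       # Python 3.10 and later
        selected = eps.select(group=group)
    else:                            # older dict-style return value
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def build_registry(group=METRIC_GROUP):
    """Merge built-in metrics with any plugins found at runtime."""
    registry = {"exact_match": exact_match}
    registry.update(discover_plugins(group))
    return registry
```

A plugin package would advertise itself under the same group in its packaging metadata (for example, a [project.entry-points."opik.metrics"] table in pyproject.toml), so installing the package is enough for build_registry() to pick it up.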

Developer Experience

The SDK is documented with inline type hints and auto‑generated API docs. The platform’s CLI (opikctl) provides commands for schema migrations, health checks, and debugging. Community support is available on Slack and GitHub Discussions, with a dedicated bounty system for feature requests. The Apache 2.0 license permits modification and redistribution without copyleft obligations, provided its attribution and notice terms are met.

Use Cases

  • RAG Chatbots – Log every vector lookup and LLM response, then compute relevance metrics against a test set.
  • Agentic Workflows – Capture tool‑use spans, evaluate policy compliance via guardrails, and auto‑optimize prompts.
  • Model Benchmarking – Run parallel experiments with different LLMs or prompt variants, aggregate metrics, and compare performance across deployments.
  • Compliance Auditing – Use guardrails to redact PII and log incidents for audit trails.
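
For the chatbot and benchmarking cases, scoring outputs against a test set and comparing variants can be sketched as follows. This is an illustrative token-level F1 written from scratch, not Opik's bundled F1 metric; the function names and the shape of the runs dictionary are assumptions.

```python
from collections import Counter
from statistics import mean


def token_f1(output, reference):
    """Token-overlap F1 between a model output and a reference answer."""
    out_tokens = output.lower().split()
    ref_tokens = reference.lower().split()
    if not out_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty side scores zero.
        return float(out_tokens == ref_tokens)
    overlap = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def compare_variants(runs, references):
    """Average token F1 per prompt variant over the same reference set."""
    return {
        variant: mean(token_f1(o, r) for o, r in zip(outputs, references))
        for variant, outputs in runs.items()
    }
```

Given one reference, "paris is the capital of france", a variant that reproduces it exactly averages 1.0. A variant that answers "the capital is lyon" shares three of its four tokens with the six-token reference, giving precision 0.75, recall 0.5, and an F1 of 0.6.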

Advantages

Opik offers low‑latency tracing without the overhead of full observability stacks, a rich set of built‑in optimizers, and an extensible guardrail framework that can be swapped out for any third‑party model. Its open‑source nature and permissive license make it a compelling alternative to proprietary LLM monitoring solutions, especially for teams that require full control over data residency and customization.
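
One way to picture a swappable guardrail framework is a shared interface that every guardrail implements, so that native and third‑party checks can be chained interchangeably. The sketch below is an assumption about the general pattern rather than Opik's guardrail API: the Guardrail protocol, both classes, and apply_guardrails are hypothetical, and the email regex is deliberately simplistic.

```python
import re
from typing import Protocol


class Guardrail(Protocol):
    """Anything with check(text) -> (possibly rewritten text, incidents)."""

    def check(self, text: str) -> tuple[str, list[str]]: ...


class EmailRedactor:
    """Replaces email addresses and reports each one as an incident."""

    _pattern = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def check(self, text):
        incidents = self._pattern.findall(text)
        return self._pattern.sub("[REDACTED_EMAIL]", text), incidents


class BlocklistGuardrail:
    """Flags blocklisted terms without modifying the text."""

    def __init__(self, terms):
        self._terms = [t.lower() for t in terms]

    def check(self, text):
        lowered = text.lower()
        return text, [t for t in self._terms if t in lowered]


def apply_guardrails(text, guardrails):
    """Run guardrails in order, collecting (guardrail name, incident) pairs."""
    audit_log = []
    for guardrail in guardrails:
        text, incidents = guardrail.check(text)
        audit_log.extend((type(guardrail).__name__, i) for i in incidents)
    return text, audit_log
```

Because apply_guardrails depends only on the check method, replacing EmailRedactor with a wrapper around an external moderation service would not require changes to the calling code, and the audit log doubles as the incident trail mentioned under compliance auditing.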

Information

Category: development-tools
License: Apache-2.0
Stars: 15.0k

Technical Specs

Pricing: Open Source
Database: PostgreSQL
Docker: Official
Supported OS: Linux, Docker
Author: comet-ml
Last Updated: 21 hours ago