Langfuse

Self-Hosted

Open-source observability for LLM applications and agents


Overview


Langfuse is an open‑source observability platform tailored for large language model (LLM) applications. At its core, it captures end‑to‑end traces of every LLM call—inputs, outputs, embeddings, and internal metadata—and stores them in a structured database. Developers can then query these traces through a web UI, SDKs (Python, JavaScript/TypeScript), or REST API to debug failures, compare model versions, and run automated evaluations. The platform’s lightweight wrapper around OpenTelemetry makes it trivial to instrument any LLM client; the decorator automatically creates linked spans for every nested call, yielding a complete trace graph without manual instrumentation.
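
As a concrete illustration of that pattern, here is a minimal sketch using the Python SDK's @observe decorator. Treat the import path and decorator details as assumptions to verify against the current docs, since they have shifted between SDK versions.

```python
from langfuse import observe  # older SDKs use: from langfuse.decorators import observe

@observe()  # opens a trace span each time this function is called
def answer(question: str) -> str:
    context = retrieve(question)
    return generate(question, context)

@observe()  # nested call: this span is linked under the parent trace automatically
def retrieve(question: str) -> str:
    return "relevant documents for: " + question  # placeholder for a vector-store lookup

@observe()  # another nested span in the same trace graph
def generate(question: str, context: str) -> str:
    return f"answer to {question!r} using {context!r}"  # placeholder for an LLM call

print(answer("What does Langfuse trace?"))
```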


Architecture

Langfuse is built on a modern micro‑service stack:

  • API & UI – Next.js (React) + TypeScript for the web front‑end; FastAPI (Python) exposes a REST API.
  • Business Logic – Python services orchestrate data ingestion, evaluation pipelines, and metric aggregation.
  • Data Store – PostgreSQL (relational) for metadata and trace logs; Redis for caching and pub/sub during evaluation runs.
  • Tracing – OpenTelemetry SDKs instrument LLM calls; traces are exported to a local collector and persisted in PostgreSQL (see the sketch below).
  • Containerization – All services are Dockerized; a Helm chart and docker‑compose files enable quick deployment on Kubernetes or local Docker.
  • Observability – Prometheus metrics and Grafana dashboards are bundled for infrastructure monitoring.

The platform is designed to run in a single Docker compose stack for local development, or as a set of Kubernetes deployments for production. Horizontal scaling is achieved by running multiple API replicas behind an ingress, with PostgreSQL configured as a read‑replica cluster for high availability.
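
To make the tracing layer concrete, the sketch below hand-instruments a call with the stock OpenTelemetry Python SDK and exports spans over OTLP/HTTP. The collector endpoint and attribute names are assumptions; point the exporter at whatever OTLP receiver your deployment exposes.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Assumed local OTLP/HTTP collector endpoint; adjust for your deployment.
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-llm-app")
with tracer.start_as_current_span("llm.completion") as span:
    # attach whatever attributes you want to see alongside the trace
    span.set_attribute("llm.model", "gpt-4o-mini")
    span.set_attribute("llm.prompt_tokens", 42)
```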

Core Capabilities

  • Tracing & Metrics – Every LLM request is a trace span; developers can drill down to prompt tokens, latency, and cost per token. Aggregated metrics (e.g., average latency per model) are exposed via a Prometheus endpoint.
  • Prompt & Evaluation Management – A UI for storing prompt templates, tagging them with metadata, and running automated evaluations against multiple models or endpoints.
  • Public API & SDKs – Drop‑in decorators (@observe) wrap any LLM client (OpenAI, Anthropic, Cohere) and expose the same API across Python, JS/TS, and Go; see the REST sketch after this list.
  • Annotations & Playgrounds – Interactive playgrounds allow developers to test prompts in real time; annotations can be attached to traces for debugging context.
  • Webhooks & Events – External services can subscribe to trace events (e.g., on failure) via configurable webhooks, enabling CI/CD integration or alerting.
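
As a rough sketch of the public REST API, the snippet below lists recent traces. The /api/public/traces path and basic‑auth scheme (public key as username, secret key as password) follow the usual Langfuse convention, but verify both against your deployment; the keys and host shown are placeholders.

```python
import requests

LANGFUSE_HOST = "http://localhost:3000"  # assumption: default self-hosted address

resp = requests.get(
    f"{LANGFUSE_HOST}/api/public/traces",
    auth=("pk-lf-...", "sk-lf-..."),  # project public key / secret key pair
    params={"limit": 10},             # page size; further filters are available
    timeout=10,
)
resp.raise_for_status()

for t in resp.json().get("data", []):
    print(t.get("id"), t.get("name"), t.get("latency"))
```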

Deployment & Infrastructure

Langfuse ships as a set of Docker images on Docker Hub (langfuse/langfuse). A single docker‑compose.yml file spins up the API, UI, PostgreSQL, and Redis. For production, a Helm chart (langfuse-helm) is available; it supports:

  • Scalable API – Configurable replica count and autoscaling based on CPU/memory.
  • Stateful PostgreSQL – Uses a StatefulSet with persistent volumes; supports read replicas for high throughput.
  • Redis Cluster – Optional sharding for large evaluation workloads.
  • Ingress & TLS – Automatic integration with Ingress controllers and cert‑manager for HTTPS.

The platform is cloud‑agnostic: it runs on bare metal, AWS ECS/EKS, GKE, Azure AKS, or even local laptops.

Integration & Extensibility

Langfuse exposes a public REST API and an OpenTelemetry collector. Developers can:

  • Build custom adapters for any LLM provider by implementing the LLMAdapter interface in Python or JS.
  • Extend the evaluation engine with custom scoring metrics (e.g., BLEU, ROUGE) by publishing a plugin to the langfuse-evaluations registry.
  • Hook into trace data via webhooks for CI pipelines, Slack alerts, or custom dashboards (a receiver sketch follows this list).
  • Use the SDKs to embed observability directly into micro‑services or serverless functions.
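
For the webhook path, a receiver can be as small as the FastAPI sketch below. The route, event name, and payload fields are illustrative assumptions; match them to the event schema you configure in your webhook settings. Run it with uvicorn and point the webhook URL at /hooks/langfuse.

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/hooks/langfuse")
async def on_trace_event(request: Request) -> dict:
    event = await request.json()
    # "type" and "trace.failed" are hypothetical names; check them against
    # the event schema configured for your Langfuse webhook.
    if event.get("type") == "trace.failed":
        print("alerting on failed trace:", event.get("traceId"))
    return {"received": True}
```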

Developer Experience

The documentation is thorough, covering quickstarts, SDK usage, deployment guides, and advanced use cases. The community is active on GitHub Discussions and Discord, so questions and issues tend to be resolved quickly. The MIT license gives full freedom to modify or redistribute the codebase.

Key points for developers:

  • Zero‑config instrumentation – Decorate a function and all nested LLM calls are traced automatically.
  • Rich UI & API – No need to write custom loggers; everything is available out of the box.
  • Extensible evaluation – Add new metrics or integrate with external testing frameworks without touching core code.
  • Self‑hosted control – Full data ownership, GDPR compliance, and the ability to run on private networks.

Use Cases

  1. LLM‑Powered SaaS – A startup building a chatbot platform can instrument every user request, analyze latency per model, and roll back problematic prompts automatically.
  2. Research & Experimentation – Data scientists can run automated benchmarks across multiple models, compare prompt variants, and publish trace dashboards for peer review.
  3. Enterprise Compliance – Companies requiring strict data residency can host Langfuse on-premises, ensuring all prompt logs stay within corporate boundaries.
  4. CI/CD for LLM Apps – Webhook events on trace failures or evaluation regressions can be wired into existing CI pipelines to gate releases.

Ready to get started?

Join the community and start self-hosting Langfuse today