Overview
Wayback is a self‑hosted web archiving platform written in Go that turns any machine into a full‑featured archival service. It captures, stores, and serves web pages, offering both a command‑line interface for bulk ingestion and a suite of messaging bots (Telegram, Discord, Matrix, IRC, XMPP) for real‑time interaction. The core engine crawls URLs, normalizes resources, and writes a structured snapshot to disk or an object store. A lightweight HTTP API exposes search, playback, and metadata endpoints, while Prometheus metrics allow operators to monitor crawl health and resource usage.
Key Features
- Batch ingestion: Accepts lists of URLs via CLI or API, enabling large‑scale capture jobs that are automatically throttled and retried.
- Multi‑protocol integration: Results can be forwarded to Telegram channels, Mastodon timelines, GitHub Issues, or stored in IPFS and Telegraph.
- Privacy‑first hosting: Supports running as a Tor hidden service or on localhost, with optional HTTPS via self‑signed certificates.
- Media support: Uses FFmpeg to stream and archive video/audio, preserving dynamic content that traditional crawlers miss.
- Metrics & observability: Exposes Prometheus metrics for latency, queue depth, and storage consumption.
Technical Stack
| Layer | Technology |
|---|---|
| Language | Go 1.22+ (static binaries, excellent concurrency support) |
| Web framework | net/http + gorilla/mux for routing; optional Gin in experimental branches |
| Storage | Local filesystem (default), with pluggable backends: S3, MinIO, IPFS via go‑ipfs-api |
| Search | Elasticlunr (in-memory) for full‑text search; optional integration with Elasticsearch or Meilisearch |
| Metrics | Prometheus client_golang, exposing /metrics endpoint |
| Bots & Webhooks | IRC (goirc), Matrix (gomatrix), Telegram Bot API, Discord (DiscordGo), XMPP (gosrc) |
| CI/CD | GitHub Actions with automated release workflow, goreportcard, codecov integration |
The architecture is deliberately modular: the core crawler runs as a daemon, while each bot or webhook registers itself via a simple plugin interface. Developers can drop in custom handlers by implementing the wayback.Handler interface and adding them to the service’s dispatcher.
Deployment & Infrastructure
Wayback ships as a single statically linked binary, making containerization straightforward. A Dockerfile is included in the repository, and a minimal container can be started with docker run -p 8080:8080 wabarc/wayback. For production, the recommended stack is:
- Container orchestration (Docker Compose or Kubernetes) – exposes HTTP, WebSocket, and bot ports.
- Persistent storage – bind mount /data to a durable volume; optional S3-compatible object store for scalability.
- Reverse proxy – Traefik or Nginx to handle TLS termination and route bot webhooks.
- Monitoring – Prometheus scrape config plus Grafana dashboards (pre‑built in the repo).
The single binary and declarative configuration make scaling straightforward: add workers by increasing WAYBACK_WORKERS, or run additional replicas behind a load balancer.
Integration & Extensibility
Wayback’s API is RESTful and documented via GoDoc. Key endpoints include:
- POST /archive – enqueue URLs for crawling.
- GET /search?q= – full‑text search over archived titles and content.
- GET /playback/{id} – retrieve the stored snapshot as a static bundle.
Developers can hook into these endpoints from external services or build custom dashboards on top of them. The bot framework exposes a Bot interface; implementing it allows you to create bespoke message parsers, reply formats, or even new protocols. Webhooks can subscribe to the archive.completed event for downstream processing.
Developer Experience
The project follows Go best practices: comprehensive unit tests (90%+ coverage), a tidy module layout, and an open‑source license (MIT). Documentation is split between the README, a dedicated docs site (https://docs.wabarc.eu.org/), and GoDoc. The community is active on Telegram, Discord, Matrix, and GitHub Discussions, providing quick feedback loops for feature requests or bug reports. The release process is automated, with semantic versioning and changelog generation baked into CI.
Use Cases
- Academic research: Capture snapshots of scholarly websites, conference proceedings, or policy documents for longitudinal studies.
- Digital preservation: Archival institutions can run Wayback on a private network, preserving local intranet pages while keeping captured data in-house, which helps with GDPR compliance.
- Compliance & audit: Companies can archive public-facing web assets to prove historical availability during legal investigations.
- Content moderation: Moderators can automatically archive user‑generated URLs before removal, preserving evidence for appeals.
Advantages
- Performance: Go’s goroutine model handles many concurrent crawls with low overhead, keeping crawl latency low even on modest hardware, while static binaries keep deployment lightweight.
- Flexibility: Pluggable storage and bot systems let you tailor the stack to your infrastructure.
- Licensing: MIT license removes any deployment restrictions, making it safe for commercial use.
- Community & tooling: Active maintainers, comprehensive CI, and a mature API ecosystem reduce integration friction.
Related Apps in other
Immich
Self‑hosted photo and video manager
Syncthing
Peer‑to‑peer file sync, no central server
Strapi
Open-source headless CMS for modern developers
reveal.js
Create stunning web‑based presentations with HTML, CSS and JavaScript
Stirling-PDF
Local web PDF editor with split, merge, convert and more
MinIO
Fast, S3-compatible object storage for AI and analytics
Explore More Apps
ERPNext
Open‑source ERP for end‑to‑end business management
Plane
Open‑source project management for teams
Bubo Reader
Minimalist RSS feed aggregator for self‑hosted sites
MediaCMS
Open‑source video & media CMS for self‑hosted portals
Loomio
Collaborative decision-making for groups and organizations
Gaseous Server
Self-hosted other
