Overview
Discover what makes Webarchive powerful
Own Webarchive is a lightweight, self‑hosted web archiving service written in **Go 1.19+** that captures live webpages into multiple persistent formats (headers, single‑file HTML, PDF). It exposes a RESTful API for programmatic ingestion and retrieval, while optionally serving a minimal web UI. The project is intentionally small to fit home‑network or personal use cases, yet it exposes enough hooks for developers to extend its functionality.
Multi‑format capture
Configurable via environment variables
REST API
Optional web UI
Key Features
- Multi‑format capture – choose from raw HTTP headers, a single‑file HTML bundle, or PDF snapshots rendered by `wkhtmltopdf`.
- Configurable via environment variables – everything from database path to PDF rendering options (`LANDSCAPE`, `GRAYSCALE`, `DPI`) can be tuned without code changes.
- REST API – `/api/v1/pages` supports `POST` for ingestion (with query‑string or JSON body) and `GET` to fetch metadata.
- Optional web UI – toggled with `UI_ENABLED`; the UI is theme‑aware (basic), configurable via prefix and address.
- Container ready – a Docker Compose service is bundled, making it trivial to run in isolated environments.
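As a rough illustration of environment-driven configuration, the sketch below reads the three PDF variables named above. The `PDFOptions` struct, the helper names, and the fallback defaults (including 300 DPI) are assumptions made for this example; the project's own parsing code and defaults may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// PDFOptions groups the PDF settings named in the feature list.
// The struct and its default values are illustrative assumptions.
type PDFOptions struct {
	Landscape bool
	Grayscale bool
	DPI       int
}

// lookupFunc has the same shape as os.LookupEnv, so callers can
// substitute their own source of values.
type lookupFunc func(key string) (string, bool)

// boolVar returns def when the variable is unset or not a valid boolean.
func boolVar(lookup lookupFunc, key string, def bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// intVar returns def when the variable is unset, malformed, or not positive.
func intVar(lookup lookupFunc, key string, def int) int {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// LoadPDFOptions reads LANDSCAPE, GRAYSCALE and DPI through lookup.
func LoadPDFOptions(lookup lookupFunc) PDFOptions {
	return PDFOptions{
		Landscape: boolVar(lookup, "LANDSCAPE", false),
		Grayscale: boolVar(lookup, "GRAYSCALE", false),
		DPI:       intVar(lookup, "DPI", 300), // assumed default
	}
}

func main() {
	opts := LoadPDFOptions(os.LookupEnv)
	fmt.Printf("%+v\n", opts)
}
```

Passing the lookup function as a parameter keeps the parsing logic independent of the process environment, which makes it easy to exercise with fixed values.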
Technical Stack
| Layer | Technology |
|---|---|
| Runtime | Go (standard library, net/http, JSON handling) |
| Persistence | BoltDB or similar key‑value store under ./db (path configurable) |
| PDF Rendering | External binary wkhtmltopdf invoked via Go’s os/exec |
| Web UI | Static assets served by the Go HTTP server; minimal templating |
| Deployment | Docker Compose, one‑click templates for AWS CloudFormation, DigitalOcean, Render |
The choice of Go yields a single, statically linked binary, which simplifies deployment on any Linux host. An embedded, file‑based store such as BoltDB keeps all records in a local directory with no separate database server to operate, which suits the modest data volumes of a personal archive. wkhtmltopdf produces PDF output as an external command, without the overhead of driving a headless browser.
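To make the `os/exec` integration concrete, here is a minimal sketch of how rendering options could be turned into a wkhtmltopdf command line. The function names `pdfArgs` and `renderPDF` are hypothetical and not taken from the project; the flags `--orientation`, `--grayscale`, and `--dpi` are standard wkhtmltopdf options.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
)

// pdfArgs builds the wkhtmltopdf argument list for one capture.
// Options come first, followed by the input URL and the output file.
func pdfArgs(landscape, grayscale bool, dpi int, url, out string) []string {
	var args []string
	if landscape {
		args = append(args, "--orientation", "Landscape")
	}
	if grayscale {
		args = append(args, "--grayscale")
	}
	if dpi > 0 {
		args = append(args, "--dpi", strconv.Itoa(dpi))
	}
	return append(args, url, out)
}

// renderPDF runs wkhtmltopdf with the given arguments. The context lets
// the caller bound how long a slow page may take to render.
func renderPDF(ctx context.Context, args []string) error {
	cmd := exec.CommandContext(ctx, "wkhtmltopdf", args...)
	if output, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("wkhtmltopdf failed: %w: %s", err, output)
	}
	return nil
}

func main() {
	args := pdfArgs(true, false, 150, "https://example.com", "example.pdf")
	// Only print the command here; call renderPDF(ctx, args) on a host
	// where the wkhtmltopdf binary is installed.
	fmt.Println("wkhtmltopdf", args)
}
```

Passing arguments as a slice rather than a shell string avoids quoting problems when a URL contains characters such as `&` or `?`.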
Core Capabilities & API
- Page ingestion – `POST /api/v1/pages` accepts a JSON body or query parameters specifying the target URL, desired formats, and optional description.
- Metadata retrieval – `GET /api/v1/pages/{id}` returns a JSON payload containing capture timestamps, format list, and storage paths.
- Search & filtering – although not exposed yet, the underlying BoltDB keys can be queried programmatically for custom search logic.
- Extensibility – developers can wrap the API in a reverse proxy, add authentication middleware, or hook into the capture pipeline to inject custom headers or post‑processing steps.
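The ingestion call described above could be built from a Go client roughly as follows. This is a sketch under assumptions: the JSON field names (`url`, `formats`, `description`) and the format value `pdf` are guesses for illustration, so check the project README for the actual request schema before relying on them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ingestRequest mirrors the fields the text mentions (target URL, formats,
// description). The JSON key names are assumptions, not the documented schema.
type ingestRequest struct {
	URL         string   `json:"url"`
	Formats     []string `json:"formats,omitempty"`
	Description string   `json:"description,omitempty"`
}

// newIngestRequest builds a POST to /api/v1/pages with a JSON body.
func newIngestRequest(baseURL string, body ingestRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/pages", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newIngestRequest("http://localhost:5001", ingestRequest{
		URL:         "https://example.com",
		Formats:     []string{"pdf"}, // assumed format identifier
		Description: "example capture",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it makes it straightforward to add headers (for example, credentials expected by a reverse proxy) before the call goes out.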
Deployment & Infrastructure
Running the service on a host requires only the compiled binary and the wkhtmltopdf binary; the Go toolchain is needed only when building from source. Docker images are pre‑built, and a Compose file is provided for quick spin‑up:

    services:
      webarchive:
        image: derfenix/webarchive:latest
        environment:
          - API_ADDRESS=0.0.0.0:5001
          - DB_PATH=/data/db
        volumes:
          - ./data:/data

For production deployments, the database directory should be mounted on persistent storage; the service can run behind TLS termination or a reverse proxy. Several instances can be load‑balanced for API traffic, but an embedded store such as BoltDB holds an exclusive lock on its data file, so each instance needs its own database directory. Durability then comes from backing up or snapshotting that directory rather than from sharing it between running instances.
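For readers who prefer to put a small Go process in front of the service instead of a general-purpose proxy, the sketch below combines a bearer-token check with `httputil.NewSingleHostReverseProxy`. It is a demonstration only: the upstream is an in-process stand-in created with `httptest`, and the token value and helper names (`authorized`, `requireToken`) are hypothetical. A real deployment would point the proxy at the address configured in `API_ADDRESS` and serve over TLS.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// authorized reports whether the Authorization header carries the expected
// bearer token. The comparison is constant-time to avoid timing leaks.
func authorized(header, token string) bool {
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// requireToken rejects requests without the token before they reach next.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Stand-in for the archive service, used only for this demonstration.
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok from upstream")
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		panic(err)
	}
	front := httptest.NewServer(requireToken("s3cret", httputil.NewSingleHostReverseProxy(target)))
	defer front.Close()

	// First request has no token, second one carries it.
	for _, tok := range []string{"", "s3cret"} {
		req, err := http.NewRequest(http.MethodGet, front.URL+"/api/v1/pages/1", nil)
		if err != nil {
			panic(err)
		}
		if tok != "" {
			req.Header.Set("Authorization", "Bearer "+tok)
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		fmt.Println(resp.StatusCode)
	}
}
```

Keeping authentication in a wrapper like this leaves the archive service itself unchanged, which matters when it is run from the published image.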
Integration & Extensibility
- Plugin hooks – while the current codebase does not expose a plugin API, its modular design (separate packages for capture, storage, and HTTP handling) makes it straightforward to inject custom logic.
- Webhooks – developers can extend the API layer to emit events after a page is archived, enabling downstream CI/CD pipelines or notification services.
- Custom formatters – adding a new output (e.g., JSON snapshot, HAR file) involves implementing the `Capture` interface and registering it in the format map.
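The custom-formatter idea can be pictured with the sketch below. Everything in it is hypothetical: the source does not show the real `Capture` interface, so the method signature, the `Register` helper, the `formats` map, and the `jsonSnapshot` type are invented here purely to illustrate the registration pattern.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"sort"
)

// Capture is an assumed shape for the interface mentioned in the text;
// the project's actual method set may differ.
type Capture interface {
	Capture(ctx context.Context, url string) ([]byte, error)
}

// formats maps a format name to its implementation (hypothetical registry).
var formats = map[string]Capture{}

// Register adds a formatter under name and refuses duplicates.
func Register(name string, c Capture) error {
	if _, exists := formats[name]; exists {
		return fmt.Errorf("format %q already registered", name)
	}
	formats[name] = c
	return nil
}

// jsonSnapshot is a toy formatter that records only the URL as JSON.
type jsonSnapshot struct{}

func (jsonSnapshot) Capture(_ context.Context, url string) ([]byte, error) {
	return json.Marshal(map[string]string{"url": url})
}

func main() {
	if err := Register("json", jsonSnapshot{}); err != nil {
		fmt.Println(err)
		return
	}

	// List registered formats in a stable order.
	names := make([]string, 0, len(formats))
	for name := range formats {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println("formats:", names)

	out, err := formats["json"].Capture(context.Background(), "https://example.com")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(out))
}
```

A registry keyed by format name keeps the ingestion path unaware of individual formatters, so a new output only needs one registration call.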
Developer Experience
Configuration is entirely environment‑driven, so there are no config files to maintain. The documentation (README) covers usage patterns and Docker deployment, and the API follows conventional REST semantics (`POST` to ingest, `GET` to retrieve). Community support is modest but growing, with GitHub issues for feature requests and bug reports. The codebase follows idiomatic Go conventions, making it approachable for developers familiar with the language.
Use Cases
- Personal research archives – automatically snapshot webpages referenced in notes or bookmarks.
- Home‑network documentation – preserve local web interfaces (routers, NAS) for troubleshooting.
- Compliance & audit – store public-facing pages in a tamper‑evident format for regulatory review.
- CI/CD artifact – capture build documentation or changelogs as PDFs during release pipelines.
Advantages Over Alternatives
- Minimal dependencies – a single binary with no headless browser or external database service; the only external tool is wkhtmltopdf for PDF output.
- Fast startup and low memory footprint – ideal for Raspberry Pi or other edge devices.
- Open source, permissive license – no licensing fees for commercial or personal use.
- Fine‑grained PDF options – control over orientation, DPI, viewport, and media type directly from environment variables.
For developers seeking a simple yet extensible web archiving tool that can run on any Linux host without complex orchestration, Own Webarchive offers a compelling blend of performance, configurability, and developer friendliness.