Webarchive

Self-Hosted

Fast, simple web page archiving for personal use

Stale (55)
166 stars
0 views
Updated Mar 13, 2025

Overview

Own Webarchive is a lightweight, self‑hosted web archiving service written in Go (1.19 or later) that captures live webpages into multiple persistent formats (headers, single‑file HTML, PDF). It exposes a RESTful API for programmatic ingestion and retrieval, while optionally serving a minimal web UI. The project is intentionally small to fit home‑network or personal use cases, yet it exposes enough hooks for developers to extend its functionality.

Key Features

  • Multi‑format capture – choose from raw HTTP headers, a single‑file HTML bundle, or PDF snapshots rendered by wkhtmltopdf.
  • Configurable via environment variables – everything from database path to PDF rendering options (LANDSCAPE, GRAYSCALE, DPI) can be tuned without code changes.
  • REST API – /api/v1/pages supports POST for ingestion (with query‑string or JSON body) and GET to fetch metadata.
  • Optional web UI – toggled with UI_ENABLED; the UI supports themes (a basic theme is included), and its prefix and address are configurable.
  • Container ready – a Docker Compose service is bundled, making it trivial to run in isolated environments.

Technical Stack

Layer          Technology
Runtime        Go (standard library, net/http, JSON handling)
Persistence    BoltDB or similar key‑value store under ./db (path configurable)
PDF rendering  External binary wkhtmltopdf invoked via Go's os/exec
Web UI         Static assets served by the Go HTTP server; minimal templating
Deployment     Docker Compose; one‑click templates for AWS CloudFormation, DigitalOcean, Render

The choice of Go ensures a single binary with static linking, facilitating deployment on any Linux host. BoltDB provides an embedded, file‑based store that scales to millions of records on modest hardware, while wkhtmltopdf gives high‑quality PDF output without the overhead of a headless browser.
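
To make the os/exec step concrete, here is a minimal sketch of how PDF options could be translated into a wkhtmltopdf command line. The PDFOptions type, the function names, and the 60-second timeout are assumptions for illustration; the --orientation, --grayscale, and --dpi flags are standard wkhtmltopdf options, but how the project maps its settings onto them is not stated on this page.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"time"
)

// PDFOptions is a hypothetical container for the rendering settings.
type PDFOptions struct {
	Landscape bool
	Grayscale bool
	DPI       int // 0 means "leave the wkhtmltopdf default"
}

// wkhtmltopdfArgs builds the argument list: options first, then input URL and output file.
func wkhtmltopdfArgs(opts PDFOptions, url, outPath string) []string {
	var args []string
	if opts.Landscape {
		args = append(args, "--orientation", "Landscape")
	}
	if opts.Grayscale {
		args = append(args, "--grayscale")
	}
	if opts.DPI > 0 {
		args = append(args, "--dpi", strconv.Itoa(opts.DPI))
	}
	return append(args, url, outPath)
}

// RenderPDF runs the external binary with a timeout so a hung render cannot block forever.
func RenderPDF(ctx context.Context, opts PDFOptions, url, outPath string) error {
	ctx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, "wkhtmltopdf", wkhtmltopdfArgs(opts, url, outPath)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("wkhtmltopdf failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	// Only print the command line here; calling RenderPDF requires wkhtmltopdf on PATH.
	opts := PDFOptions{Landscape: true, DPI: 150}
	fmt.Println(wkhtmltopdfArgs(opts, "https://example.com", "example.pdf"))
}
```

Passing arguments as a slice rather than a shell string avoids quoting problems when the URL contains characters such as & or ?.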

Core Capabilities & API

  • Page ingestion – POST /api/v1/pages accepts a JSON body or query parameters specifying the target URL, desired formats, and optional description.
  • Metadata retrieval – GET /api/v1/pages/{id} returns a JSON payload containing capture timestamps, format list, and storage paths.
  • Search & filtering – although not exposed yet, the underlying BoltDB keys can be queried programmatically for custom search logic.
  • Extensibility – developers can wrap the API in a reverse proxy, add authentication middleware, or hook into the capture pipeline to inject custom headers or post‑processing steps.

Deployment & Infrastructure

Running the compiled binary directly requires only the wkhtmltopdf binary on the host; the Go toolchain is needed just to build from source. Docker images are pre‑built, and a Compose file is provided for quick spin‑up:

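services:
  webarchive:
    image: derfenix/webarchive:latest
    environment:
      # Listen on all interfaces inside the container
      - API_ADDRESS=0.0.0.0:5001
      # Store the database on the mounted volume
      - DB_PATH=/data/db
    ports:
      # Publish the API port to the host
      - "5001:5001"
    volumes:
      - ./data:/data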

For production deployments, mount the database directory on persistent storage and run the service behind TLS termination or a reverse proxy. Because the store is embedded and file‑based, instances should not share one database directory: embedded stores such as BoltDB hold an exclusive lock on the database file, so scaling out means giving each instance its own data directory and protecting the data with a backup strategy (for example, periodic volume snapshots) rather than shared storage.

Integration & Extensibility

  • Plugin hooks – while the current codebase does not expose a plugin API, its modular design (separate packages for capture, storage, and HTTP handling) makes it straightforward to inject custom logic.
  • Webhooks – developers can extend the API layer to emit events after a page is archived, enabling downstream CI/CD pipelines or notification services.
  • Custom formatters – adding a new output (e.g., JSON snapshot, HAR file) involves implementing the Capture interface and registering it in the format map.

Developer Experience

Configuration is entirely environment‑driven, eliminating complex config files. The documentation (README) covers usage patterns and Docker deployment; the API is self‑describing via standard HTTP verbs. Community support is modest but growing, with GitHub issues for feature requests and bug reports. The codebase follows idiomatic Go conventions, making it approachable for developers familiar with the language.

Use Cases

  • Personal research archives – automatically snapshot webpages referenced in notes or bookmarks.
  • Home‑network documentation – preserve local web interfaces (routers, NAS) for troubleshooting.
  • Compliance & audit – store public-facing pages in a tamper‑evident format for regulatory review.
  • CI/CD artifact – capture build documentation or changelogs as PDFs during release pipelines.

Advantages Over Alternatives

  • Few dependencies – apart from the wkhtmltopdf binary, there is no need for a headless browser or external database service.
  • Fast startup and low memory footprint – ideal for Raspberry Pi or other edge devices.
  • Open source under the permissive BSD-3-Clause license – no licensing fees for commercial or personal use.
  • Fine‑grained PDF options – control over orientation, DPI, viewport, and media type directly from environment variables.

For developers seeking a simple yet extensible web archiving tool that can run on any Linux host without complex orchestration, Own Webarchive offers a compelling blend of performance, configurability, and developer friendliness.

Information

Category: other
License: BSD-3-Clause
Stars: 166
Author: derfenix
Last Updated: Mar 13, 2025

Technical Specs

Pricing: Open Source
Database: SQLite
Docker: Dockerfile
Supported OS: Linux, Docker