Overview
Discover what makes txtdot powerful
txtdot is a lightweight, self‑hosted HTTP proxy that performs **server‑side page simplification**. Using Mozilla’s Readability.js, it extracts only the essential content of a target page (text, links, images, and tables), strips away ads, trackers, and heavy scripts, and returns a minimal HTML document. The smaller payload renders faster on slow or limited‑bandwidth connections while preserving the original page’s readable structure and media. The application is built in **Node.js** and relies on asynchronous I/O, which suits handling many concurrent proxy requests.
Key Features
- Server‑side simplification: All parsing happens on the backend, so clients receive a clean HTML page without needing JavaScript.
- Media proxy & compression: Images are fetched, compressed with the Sharp library, and served from txtdot’s own cache, reducing bandwidth usage.
- Client‑side rendering support: The proxy can render SPA frameworks (React, Vue, etc.) by using the embedded webder engine, which injects a lightweight runtime into the simplified page.
- Integrated search: A built‑in SearXNG instance allows users to perform metasearch directly through the proxy.
- Extensible plugin system: Developers can add custom behavior via plugins defined in `@txtdot/sdk` and published under `@txtdot/plugins`.
Technical Stack
| Layer | Technology |
|---|---|
| Runtime | Node.js (v18+) with Express‑style routing |
| Parsing | Mozilla Readability.js (JavaScript) |
| Image processing | Sharp (Node bindings to libvips) |
| Search engine | SearXNG (Python, Docker container) |
| Containerization | Docker Compose orchestrates the proxy and SearXNG services |
| Database / Persistence | None; stateless HTTP + optional in‑memory cache for images |
The core code is written in TypeScript, compiled to JavaScript for production. The plugin SDK exposes lifecycle hooks (e.g., onRequest, onResponse) and a simple API for manipulating the parsed DOM before it is sent to the client.
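The hook model can be pictured with a small sketch. This is not the actual `@txtdot/sdk` interface: the `Plugin`, `PageContext`, and `runPipeline` names are hypothetical, and only the hook names `onRequest` and `onResponse` come from the description above. The DOM is stood in for by a plain HTML string so the example runs without dependencies.

```typescript
// Hypothetical types: the real @txtdot/sdk interface may differ.
interface PageContext {
  url: string;
  html: string; // simplified HTML produced by the parser
}

interface Plugin {
  name: string;
  // Runs before the page is fetched; may rewrite the target URL.
  onRequest?(url: string): string;
  // Runs after parsing; may modify the simplified output.
  onResponse?(ctx: PageContext): PageContext;
}

// Applies every plugin's hooks in registration order.
function runPipeline(
  plugins: Plugin[],
  url: string,
  parse: (url: string) => string,
): PageContext {
  const finalUrl = plugins.reduce((u, p) => (p.onRequest ? p.onRequest(u) : u), url);
  const initial: PageContext = { url: finalUrl, html: parse(finalUrl) };
  return plugins.reduce((ctx, p) => (p.onResponse ? p.onResponse(ctx) : ctx), initial);
}

// Example plugins (illustrative only).
const forceHttps: Plugin = {
  name: "force-https",
  onRequest: (url) => url.replace(/^http:\/\//, "https://"),
};

const addFooter: Plugin = {
  name: "add-footer",
  onResponse: (ctx) => ({ ...ctx, html: ctx.html + "<footer>via txtdot</footer>" }),
};

// Stand-in parser so the sketch runs without network access.
const fakeParse = (url: string): string => `<article>${url}</article>`;

const result = runPipeline([forceHttps, addFooter], "http://example.com/a", fakeParse);
console.log(result.html);
```

Running hooks in registration order keeps the behavior predictable when several plugins touch the same page; a real SDK might also offer priorities or asynchronous hooks.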
Core Capabilities & APIs
- /api/parse – Returns a JSON object containing the parsed content, links, and metadata. Ideal for building custom front‑ends or integrating with other services.
- /api/raw-html – Provides the raw HTML of the simplified page, useful for caching or further processing.
- /get – Browser‑friendly endpoint that accepts a `url` query parameter and optional `engine` or `format` flags.
- Webhooks – Plugins can register webhook handlers to react to specific events (e.g., cache miss, image fetch failure).
- Rate limiting – Built‑in 2 requests/second limit per IP, configurable via environment variables.
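As a rough illustration of calling the /get endpoint, the helper below assembles a request URL. The parameter names `url`, `engine`, and `format` are taken from the list above; the instance address, the `buildGetUrl` helper, and the example engine value are assumptions made for the sketch, since the accepted flag values are not listed here.

```typescript
// Optional flags named in the text; their accepted values are not specified,
// so they are typed as plain strings here.
interface GetOptions {
  engine?: string;
  format?: string;
}

// Builds a /get URL for a txtdot instance. `instance` is a hypothetical base URL.
function buildGetUrl(instance: string, target: string, opts: GetOptions = {}): string {
  const endpoint = new URL("/get", instance);
  // searchParams percent-encodes the target so its own query string survives.
  endpoint.searchParams.set("url", target);
  if (opts.engine) endpoint.searchParams.set("engine", opts.engine);
  if (opts.format) endpoint.searchParams.set("format", opts.format);
  return endpoint.toString();
}

// "readability" is an assumed engine name, used only for illustration.
const exampleUrl = buildGetUrl("http://localhost:8080", "https://example.com/post?id=1", {
  engine: "readability",
});
console.log(exampleUrl);
```

Encoding the target through `searchParams` matters because the proxied address often carries its own query string, which would otherwise be mistaken for parameters of the proxy request.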
Deployment & Infrastructure
txtdot is intentionally lightweight and can run on any machine that supports Node.js:
- Docker: A single `docker-compose.yml` brings up the proxy and SearXNG in a few seconds. The image is built from source, so developers can inspect the Dockerfile for customizations.
- Scalability: Because txtdot is stateless, horizontal scaling is straightforward: deploy multiple instances behind a load balancer or use Kubernetes with an autoscaler.
- Persistence: Image caching is in-memory by default; for production, a Redis or local filesystem cache can be wired via environment variables.
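To make the in-memory caching idea concrete, here is a minimal sketch of a cache with per-entry expiry. It is an illustration only: the `TtlCache` class and its injectable clock are assumptions, and txtdot's actual cache implementation is not described in this document.

```typescript
// Minimal in-memory cache with per-entry expiry. Illustrative only:
// txtdot's real cache implementation is not shown in the source.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// Usage with a fake clock: the entry is served until the TTL elapses.
let clock = 0;
const imageCache = new TtlCache<string>(1000, () => clock);
imageCache.set("https://example.com/logo.png", "compressed-bytes");
console.log(imageCache.get("https://example.com/logo.png"));
clock = 1500;
console.log(imageCache.get("https://example.com/logo.png"));
```

A cache like this lives inside one process, so each instance behind a load balancer would keep its own copy; that is one reason a shared store becomes attractive in production.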
Integration & Extensibility
- Plugin architecture: The `@txtdot/sdk` package defines a simple interface for plugins. Developers can write TypeScript modules that hook into request/response pipelines, add new parsing engines, or modify the output HTML.
- Custom engines: While Readability.js is the default parser, plugins can register alternative parsing libraries (e.g., `turndown`, `cheerio`) to handle edge cases.
- Webder integration: For sites that rely heavily on client‑side rendering, the embedded webder engine injects a minimal runtime to hydrate the page after simplification.
- API consumption: The JSON endpoints are fully documented (see `docs/`), enabling developers to build custom dashboards, mobile clients, or integrate with other content‑crawling tools.
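One way to picture registering alternative engines alongside a default parser is a small registry with a fallback. Everything named here (`Engine`, `EngineRegistry`, the engine names) is hypothetical and not taken from `@txtdot/sdk`; the regex-based engines are crude stand-ins for real libraries such as Readability.js.

```typescript
// Hypothetical engine contract: takes raw HTML and returns simplified output,
// or null when the engine cannot handle the page.
type Engine = (rawHtml: string) => string | null;

class EngineRegistry {
  private engines = new Map<string, Engine>();
  private defaultName: string;

  constructor(defaultName: string, defaultEngine: Engine) {
    this.defaultName = defaultName;
    this.engines.set(defaultName, defaultEngine);
  }

  register(name: string, engine: Engine): void {
    this.engines.set(name, engine);
  }

  // Uses the requested engine when it is registered and succeeds;
  // otherwise falls back to the default engine.
  parse(rawHtml: string, name?: string): string {
    const chosen = name ? this.engines.get(name) : undefined;
    const output = chosen ? chosen(rawHtml) : null;
    if (output !== null) return output;
    const fallback = this.engines.get(this.defaultName)!(rawHtml);
    return fallback ?? rawHtml;
  }
}

// Crude stand-ins for real parsers, used only to make the sketch runnable.
function removeScripts(html: string): string {
  return html.replace(/<script[\s\S]*?<\/script>/gi, "");
}
const defaultEngine: Engine = (html) => removeScripts(html);
const textOnly: Engine = (html) => removeScripts(html).replace(/<[^>]+>/g, "");

const registry = new EngineRegistry("default", defaultEngine);
registry.register("text-only", textOnly);

const page = "<p>Hi</p><script>track()</script>";
console.log(registry.parse(page));
console.log(registry.parse(page, "text-only"));
console.log(registry.parse(page, "not-registered"));
```

Falling back to the default engine means an unknown or failing engine name degrades to the standard simplified page instead of an error, which seems a reasonable choice for a reading proxy.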
Developer Experience
- Configuration: Most settings are exposed as environment variables (e.g., `PORT`, `CACHE_TTL`, `RATE_LIMIT`). The Docker Compose file includes sensible defaults.
- Documentation: A comprehensive website (https://tempoworks.github.io/documentation) covers API usage, plugin development, and deployment guides.
- Community: The project is open source under the MIT license. Issue trackers are active, and the `@txtdot/plugins` repository hosts community‑contributed extensions.
- Testing: The repo includes performance tests against real sites (Habr, Medium) to demonstrate bandwidth savings and load times.
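Environment-driven configuration of this kind is commonly read once at startup and validated. The sketch below shows one way to do that for the three variables named above. The `loadConfig` helper, the default port 8080, the default TTL of 3600, and the assumption that `CACHE_TTL` is in seconds are all illustrative guesses; only the default of 2 requests per second mirrors the rate limit mentioned earlier.

```typescript
interface Config {
  port: number;
  cacheTtlSeconds: number; // assumes CACHE_TTL is expressed in seconds
  rateLimitPerSecond: number;
}

// Parses a positive integer, falling back to a default when unset or invalid.
function intFromEnv(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Takes the environment as a plain record so it can be tested without globals.
function loadConfig(env: Record<string, string | undefined>): Config {
  return {
    port: intFromEnv(env["PORT"], 8080), // assumed default
    cacheTtlSeconds: intFromEnv(env["CACHE_TTL"], 3600), // assumed default
    rateLimitPerSecond: intFromEnv(env["RATE_LIMIT"], 2),
  };
}

const config = loadConfig({ PORT: "3000", RATE_LIMIT: "abc" });
console.log(config);
```

In a Node.js process the record passed in would typically be `process.env`; keeping it a parameter makes invalid values easy to test.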
Use Cases
| Scenario | Why txtdot? |
|---|---|
| Low‑bandwidth browsing | Reduces page size by ~70–90 %, ideal for mobile or satellite connections. |
| Ad‑free reading | Strips out ads and trackers, providing a clean reading experience. |
| Content aggregation | APIs allow automated ingestion of cleaned text |