txtdot

Self-Hosted

Text‑only web proxy for faster, ad‑free browsing

Stale (63)
192 stars
Updated May 4, 2025

Overview

txtdot is a lightweight, self-hosted HTTP proxy that performs server-side page simplification. Using Mozilla's Readability.js, it extracts the essential content of a target page (text, links, images, and tables), strips ads, trackers, and heavy scripts, and returns a minimal HTML document. The smaller payload renders faster on slow or limited-bandwidth connections while preserving the page's content structure and media. The application is built on Node.js and relies on asynchronous I/O, so a single instance can serve many concurrent proxy requests.

Key Features

  • Server‑side simplification: All parsing happens on the backend, so clients receive a clean HTML page without needing JavaScript.
  • Media proxy & compression: Images are fetched, compressed with the Sharp library, and served from txtdot’s own cache, reducing bandwidth usage.
  • Client-side rendering support: The proxy can handle pages built with SPA frameworks (React, Vue, etc.) through the embedded webder engine, which injects a lightweight runtime into the simplified page.
  • Integrated search: A built‑in SearXNG instance allows users to perform metasearch directly through the proxy.
  • Extensible plugin system: Developers can add custom behavior via plugins defined in @txtdot/sdk and published under @txtdot/plugins.

Technical Stack

| Layer | Technology |
|---|---|
| Runtime | Node.js (v18+) with Express-style routing |
| Parsing | Mozilla Readability.js (JavaScript) |
| Image processing | Sharp (Node bindings to libvips) |
| Search engine | SearXNG (Python, Docker container) |
| Containerization | Docker Compose orchestrates the proxy and SearXNG services |
| Database / Persistence | None; stateless HTTP + optional in-memory cache for images |

The core code is written in TypeScript, compiled to JavaScript for production. The plugin SDK exposes lifecycle hooks (e.g., onRequest, onResponse) and a simple API for manipulating the parsed DOM before it is sent to the client.
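The hook model described above can be sketched as a small pipeline. This is a minimal illustration only: the hook names onRequest and onResponse come from the text, but the ProxyRequest, SimplifiedPage, and Plugin types, the helper functions, and the example plugin are all hypothetical and are not the real @txtdot/sdk API.

```typescript
// Hypothetical plugin shape. The hook names come from the description above;
// every type and function here is an assumption, NOT the real @txtdot/sdk API.
interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

interface SimplifiedPage {
  title: string;
  html: string;
}

interface Plugin {
  name: string;
  onRequest?(req: ProxyRequest): ProxyRequest;
  onResponse?(page: SimplifiedPage): SimplifiedPage;
}

// Run each plugin's onRequest hook in registration order.
function applyRequestHooks(plugins: Plugin[], req: ProxyRequest): ProxyRequest {
  return plugins.reduce((acc, p) => (p.onRequest ? p.onRequest(acc) : acc), req);
}

// Run each plugin's onResponse hook in registration order.
function applyResponseHooks(plugins: Plugin[], page: SimplifiedPage): SimplifiedPage {
  return plugins.reduce((acc, p) => (p.onResponse ? p.onResponse(acc) : acc), page);
}

// Example plugin: removes utm_* query parameters before the fetch and
// appends a footer to the simplified HTML afterwards.
const stripTracking: Plugin = {
  name: "strip-tracking",
  onRequest(req) {
    const u = new URL(req.url);
    const tracked: string[] = [];
    // Collect keys first so the parameter list is not mutated while iterating.
    u.searchParams.forEach((_value, key) => {
      if (key.startsWith("utm_")) tracked.push(key);
    });
    tracked.forEach((key) => u.searchParams.delete(key));
    return { ...req, url: u.toString() };
  },
  onResponse(page) {
    return { ...page, html: page.html + "<footer>via txtdot</footer>" };
  },
};
```

Keeping each hook a pure function from input to output makes plugins easy to test in isolation and lets the proxy compose them in a fixed order.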

Core Capabilities & APIs

  • /api/parse – Returns a JSON object containing the parsed content, links, and metadata. Ideal for building custom front‑ends or integrating with other services.
  • /api/raw-html – Provides the raw HTML of the simplified page, useful for caching or further processing.
  • /get – Browser‑friendly endpoint that accepts a url query parameter and optional engine or format flags.
  • Webhooks – Plugins can register webhook handlers to react to specific events (e.g., cache miss, image fetch failure).
  • Rate limiting – Built-in limit of 2 requests per second per IP, configurable via environment variables.
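To make the per-IP limit concrete, here is a minimal fixed-window limiter. It is a sketch of the general technique, not txtdot's actual limiter: the class name, the fixed-window strategy, and the injectable clock are assumptions made for illustration.

```typescript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each key.
// Illustrative sketch only; txtdot's real implementation may differ.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false;
  }
}

// 2 requests per second per client IP, matching the limit quoted above.
const perIpLimiter = new FixedWindowLimiter(2, 1000);
```

A request handler would call `perIpLimiter.allow(clientIp)` and answer with HTTP 429 when it returns false. A fixed window is the simplest variant; a sliding window or token bucket smooths out bursts at window boundaries.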

Deployment & Infrastructure

txtdot is intentionally lightweight and can run on any machine that supports Node.js:

  • Docker: A single docker-compose.yml brings up the proxy and SearXNG in a few seconds. The image is built from source, so developers can inspect the Dockerfile for customizations.
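  • Scalability: Because txtdot keeps no persistent state, it scales horizontally: deploy multiple instances behind a load balancer, or run it on Kubernetes with an autoscaler. Each instance holds its own in-memory image cache unless a shared cache is configured.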
  • Persistence: Image caching is in-memory by default; for production, a Redis or local filesystem cache can be wired via environment variables.
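The default in-memory caching mentioned above can be pictured as a map with per-entry expiry. The sketch below is an assumption about the general shape of such a cache; the TtlCache class and its methods are hypothetical and do not come from the txtdot codebase.

```typescript
// Minimal in-memory cache with time-based expiry.
// Hypothetical sketch; not txtdot's actual cache.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Store a value; it expires ttlMs after `now`. `now` is injectable for testing.
  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

This version has no size bound and is lost on restart, which is why a shared or on-disk cache is the better fit once several instances run behind a load balancer.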

Integration & Extensibility

  • Plugin architecture: The @txtdot/sdk package defines a simple interface for plugins. Developers can write TypeScript modules that hook into request/response pipelines, add new parsing engines, or modify the output HTML.
  • Custom engines: While Readability.js is the default parser, plugins can register alternative parsing libraries (e.g., turndown, cheerio) to handle edge cases.
  • Webder integration: For sites that rely heavily on client‑side rendering, the embedded webder engine injects a minimal runtime to hydrate the page after simplification.
  • API consumption: The JSON endpoints are fully documented (see docs/), enabling developers to build custom dashboards, mobile clients, or integrate with other content‑crawling tools.
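A client consuming these endpoints mostly needs to build request URLs correctly. The helper below is a sketch under stated assumptions: the endpoint paths and the url, engine, and format parameter names are taken from this page, while the function name, the base URL, and the assumption that the JSON endpoints take the same url parameter as /get are illustrative.

```typescript
// Endpoint paths as listed on this page.
type Endpoint = "/get" | "/api/parse" | "/api/raw-html";

interface RequestOptions {
  engine?: string; // optional parsing engine name
  format?: string; // optional output format
}

// Build a request URL for a txtdot instance.
// Assumption: every endpoint accepts the target page in a `url` query parameter.
function buildTxtdotUrl(
  base: string,
  endpoint: Endpoint,
  target: string,
  opts: RequestOptions = {}
): string {
  const u = new URL(endpoint, base);
  // searchParams.set percent-encodes the target, so its own query string survives.
  u.searchParams.set("url", target);
  if (opts.engine) u.searchParams.set("engine", opts.engine);
  if (opts.format) u.searchParams.set("format", opts.format);
  return u.toString();
}

// Hypothetical local instance; adjust the host and port to your deployment.
const parseUrl = buildTxtdotUrl(
  "http://localhost:8080",
  "/api/parse",
  "https://example.com/article?id=1"
);
```

The resulting string can be passed to `fetch` as-is. Letting `URLSearchParams` do the encoding avoids the common mistake of concatenating an unescaped target URL that carries its own query string.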

Developer Experience

  • Configuration: Most settings are exposed as environment variables (e.g., PORT, CACHE_TTL, RATE_LIMIT). The Docker Compose file includes sensible defaults.
  • Documentation: A comprehensive website (https://tempoworks.github.io/documentation) covers API usage, plugin development, and deployment guides.
  • Community: The project is open source under the MIT license. The issue tracker is active, and the @txtdot/plugins repository hosts community-contributed extensions.
  • Testing: The repo includes performance tests against real sites (Habr, Medium) to demonstrate bandwidth savings and load times.
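Environment-based configuration of the kind listed above is usually read once at startup and validated. The following sketch assumes the three variable names mentioned on this page; the default values, the unit of CACHE_TTL (seconds), and the function names are assumptions, apart from the 2 requests per second quoted earlier.

```typescript
interface Config {
  port: number;
  cacheTtlSeconds: number; // assumption: CACHE_TTL is expressed in seconds
  rateLimitPerSecond: number;
}

type Env = Record<string, string | undefined>;

// Read a non-negative integer from the environment, falling back to a default.
function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

// The defaults for PORT and CACHE_TTL are placeholders for illustration,
// not txtdot's documented defaults.
function loadConfig(env: Env): Config {
  return {
    port: readInt(env, "PORT", 8080),
    cacheTtlSeconds: readInt(env, "CACHE_TTL", 3600),
    rateLimitPerSecond: readInt(env, "RATE_LIMIT", 2),
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Failing fast on a malformed value surfaces configuration mistakes at startup rather than at the first request.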

Use Cases

| Scenario | Why txtdot? |
|---|---|
| Low-bandwidth browsing | Reduces page size by ~70–90%, ideal for mobile or satellite connections. |
| Ad-free reading | Strips out ads and trackers, providing a clean reading experience. |
| Content aggregation | APIs allow automated ingestion of cleaned text |
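To put the ~70–90% figure in perspective, the arithmetic can be worked through for a sample page. The 2 MB page size and the 256 kbit/s link speed below are hypothetical inputs chosen for illustration, not measurements from the project.

```typescript
// Size of a page after removing a given fraction of its bytes.
function simplifiedSize(originalBytes: number, reduction: number): number {
  if (reduction < 0 || reduction > 1) {
    throw new RangeError("reduction must be between 0 and 1");
  }
  return Math.round(originalBytes * (1 - reduction));
}

// Ideal transfer time, ignoring latency and protocol overhead.
function transferSeconds(bytes: number, bitsPerSecond: number): number {
  return (bytes * 8) / bitsPerSecond;
}

const originalBytes = 2000000; // hypothetical 2 MB page
const linkBps = 256000;        // hypothetical 256 kbit/s connection

const at70 = simplifiedSize(originalBytes, 0.7); // 600000 bytes
const at90 = simplifiedSize(originalBytes, 0.9); // 200000 bytes

console.log(transferSeconds(originalBytes, linkBps)); // 62.5
console.log(transferSeconds(at70, linkBps));          // 18.75
console.log(transferSeconds(at90, linkBps));          // 6.25
```

On that link the original page needs about a minute to transfer, while the simplified version needs roughly 6 to 19 seconds.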

Information

Category: other
License: MIT
Stars: 192
Author: TempoWorks
Last Updated: May 4, 2025

Technical Specs

Pricing: Open Source
Database: None
Docker: Dockerfile
Supported OS: Linux, Docker