Docspell

Personal document organizer with AI‑powered tagging and OCR

Overview

Docspell is a fully self-hosted Document Management System (DMS) aimed at small teams, families or single users who need a robust way to ingest, classify and retrieve digital documents. From a developer standpoint it consists of two server components written in Scala: a REST server, which exposes a documented HTTP API and serves a single-page web application written in Elm, and a job executor (joex), which handles long-running work such as text extraction, OCR and text analysis. Both components share one SQL database, usually PostgreSQL, and can run as separate Docker containers that notify each other over HTTP.

Technical Stack & Architecture

  • Language & Frameworks: The back-end is a pure functional Scala codebase built on Cats Effect and FS2. Routing is handled by Http4s, while persistence uses Doobie for type-safe JDBC access. The web UI is an Elm application that consumes the same REST API used by the CLI (dsc) and the Android client.
  • Database: PostgreSQL is the usual data store (MariaDB and H2 are also supported). It holds the metadata (tags, correspondents, custom fields) and, by default, the binary files as well. Schema migrations are managed with Flyway.
  • Containerization: The server components are published as separate Docker images (docspell/restserver, docspell/joex); the web UI is served by the REST server rather than by its own container. A docker-compose file is provided for quick prototyping, but the components can also be split across multiple hosts or orchestrated with Kubernetes.
  • External Services: OCR relies on the open-source Tesseract engine, and text analysis (NLP) uses the Stanford CoreNLP library, which runs inside the job executor. Mail import connects to IMAP mailboxes and submits messages and their attachments for processing.

Core Capabilities & APIs

  • Metadata Extraction: Tag, date and correspondent suggestions are produced during processing and exposed through the item endpoints of the REST API (under /api/v1/sec/item). Developers can request reprocessing of an item on demand.
  • Search & Retrieval: Full-text search is backed by either Apache Solr or PostgreSQL's built-in text search and is queried via /api/v1/sec/item/search?q=…. The API also supports pagination (limit and offset) and filtering by tags, correspondents and custom fields through its query language.
  • Event Hooks: Notification hooks can be registered for events such as tag changes, custom field changes and finished jobs. Event payloads are JSON and can be delivered to an HTTP endpoint (as well as to e-mail, Matrix or Gotify), where external systems can consume them for synchronization or analytics.
  • Extensibility: The core can be extended through addons. An addon is an external program, written in any language, that the job executor runs at defined points such as after an item has been processed; it can read the item's data and send instructions back to Docspell, for example to set tags or fields.
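As a rough illustration of the search capability listed above, the following Python sketch builds (but does not send) an authenticated search request. The default `search_path`, the `X-Docspell-Auth` header name, the `limit`/`offset` parameter names and the port are assumptions made for this example; check them against the OpenAPI spec of the installed version.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(
    base_url: str,
    token: str,
    query: str,
    limit: int = 20,
    offset: int = 0,
    search_path: str = "/api/v1/sec/item/search",  # assumed path, verify in the spec
) -> Request:
    """Build a GET request for the item search endpoint without sending it."""
    params = urlencode({"q": query, "limit": limit, "offset": offset})
    url = f"{base_url.rstrip('/')}{search_path}?{params}"
    # Assumed header name for the session token.
    return Request(url, headers={"X-Docspell-Auth": token}, method="GET")


# Example: first five hits for a two-word query on a local instance.
req = build_search_request("http://localhost:7880/", "dummy-token", "invoice 2023", limit=5)
print(req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the URL and headers easy to inspect first.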

Deployment & Infrastructure

Docspell's modular design makes it suitable for both single-box deployments and multi-node setups. The REST server can be load-balanced behind a reverse proxy (NGINX/Traefik) while the PostgreSQL instance can be replicated for high availability. Job executors are horizontally scalable: the job queue lives in the shared database rather than in a separate broker, and each executor picks up waiting jobs from it. The Docker images support multi-arch builds, so they also run on ARM devices such as a Raspberry Pi.

Integration & Extensibility

  • SDKs: While no official SDK exists, the public REST API is documented with an OpenAPI specification. Third-party clients can be generated in any language using the published spec.
  • CLI & Android: The dsc command-line tool and the Android app demonstrate how to interact programmatically with the API, providing patterns for authentication (a password login that returns a session token, sent with each request in the X-Docspell-Auth header) and file uploads.
  • Custom Fields: Developers can define typed custom fields (for example text, numeric, date, boolean or money) and set their values per item through the UI or API. This allows integration with external taxonomy systems without modifying the core schema.
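For third-party clients, one possible token-based login flow looks like the sketch below. The login path, the request fields (`account`, `password`) and the response fields (`success`, `token`, `message`) are assumptions for illustration and should be confirmed against the published OpenAPI spec. The request is built but not sent, so the example runs without a server.

```python
import json
from urllib.request import Request


def build_login_request(
    base_url: str,
    account: str,
    password: str,
    login_path: str = "/api/v1/open/auth/login",  # assumed path, verify in the spec
) -> Request:
    """Build a JSON POST request carrying the login credentials."""
    body = json.dumps({"account": account, "password": password}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + login_path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    """Return the session token from a login response, or raise on failure."""
    data = json.loads(response_body)
    if not data.get("success"):
        raise ValueError(data.get("message", "login failed"))
    return data["token"]


login = build_login_request("http://localhost:7880", "family/alice", "secret")
# Hypothetical response body, shaped after the assumed field names above.
token = extract_token('{"success": true, "token": "abc123", "message": "ok"}')
print(login.full_url, token)
```

The returned token would then accompany every later request, which keeps credentials out of all calls except the first.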

Developer Experience

The project's source code is hosted on GitHub, with one directory per module (for example restserver, joex, webapp). Documentation lives on the project website (docspell.org) and covers installation, the API reference, addons and advanced configuration. The community is active on Gitter, and Scala Steward keeps dependencies up to date. Licensing is AGPL-3.0, which may be a consideration for commercial deployments, since modified versions offered over a network must also make their source available.

Use Cases

  • Home / Family: A single Raspberry Pi running the Docker stack can act as a personal "paper-to-digital" hub, automatically processing scanned receipts, bills and school reports.
  • Small Business: Teams can use Docspell as a lightweight contract repository, leveraging the automatic tag extraction to classify contracts by client and project.
  • Enterprise Integration: Through its webhook system, Docspell can feed a corporate knowledge base or trigger downstream workflows (e.g., sending extracted data to a CRM).
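A downstream consumer for such event notifications could be structured as a small dispatcher. The payload shape used here (`eventType`, `itemId`) and the event name `item-added` are hypothetical placeholders, not Docspell's actual schema; inspect a real payload before relying on any field.

```python
import json

HANDLERS = {}


def on(event_type: str):
    """Register a handler function for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register


@on("item-added")  # hypothetical event name
def sync_to_crm(event: dict) -> str:
    # A real handler would call the CRM here; this one only reports the intent.
    return f"CRM sync queued for item {event['itemId']}"


def dispatch(raw_body: bytes) -> str:
    """Parse a JSON event body and route it; unknown events are ignored."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("eventType"))
    return handler(event) if handler else "ignored"


print(dispatch(b'{"eventType": "item-added", "itemId": "42"}'))
print(dispatch(b'{"eventType": "something-else"}'))
```

Ignoring unknown event types, rather than failing on them, keeps the consumer working when new events are introduced upstream.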

Advantages Over Alternatives

  • Performance: The Scala/Cats Effect stack provides non-blocking request handling and efficient background job processing.
  • Flexibility: Addons and the open REST API allow developers to tailor ingestion pipelines and integrate external tools such as proprietary NLP models.
  • Licensing: AGPL-3.0 ensures that any derived work remains free, which is attractive for open-source projects and educational environments.
  • **Self

Information

  • Category: other
  • License: AGPL-3.0
  • Stars: 1.9k

Technical Specs

  • Pricing: Open Source
  • Database: PostgreSQL
  • Docker: Official
  • Supported OS: Linux, Docker
  • Author: eikek
  • Last Updated: 7 days ago