ArchivesSpace

Open‑source archives management for archivists

Overview

ArchivesSpace is an open-source, web-based archives information management system that covers the lifecycle of archival collections, from accessioning and arrangement through description, preservation, and public access, in a single extensible platform. Designed by archivists for archivists, it exposes its data model through a RESTful JSON API and exports to standard formats such as EAD and MARCXML, so developers can connect archival workflows to discovery layers, digital repositories, and workflow automation tools. The application is released under the Educational Community License v2.0, so institutions can deploy and customize it without licensing fees while drawing on an active contributor community.

Technical Stack & Architecture

Core Runtime: JRuby on the Java Virtual Machine. The backend is a Sinatra application that serves the REST API; the staff and public interfaces are separate Rails applications that call it. The required Java version depends on the release, so check the release notes before installing.
Persistence: MySQL is the supported production database (the bundled embedded database is intended for evaluation only). Tables map closely to archival concepts (e.g., resource, archival_object, accession), and the backend reaches them through the Sequel ORM rather than hand-written SQL.
Web Front-End: Server-rendered staff and public interfaces built with Rails views, jQuery, and Bootstrap. Both front ends obtain their data from the backend's REST endpoints, which return JSON to any API client.
Search & Indexing: Apache Solr provides full-text search, faceted navigation, and advanced querying. An indexer process keeps Solr in sync with the database, and the Solr schema ships with the application and follows the archival data model. Recent releases expect an external Solr instance.
Containerization: Docker images are published for the application, and a docker-compose configuration can bring up the application together with MySQL and Solr. Treat the compose file as a starting point and review it before production use; larger deployments can run the same containers under Kubernetes.

Core Capabilities & APIs

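RESTful API: Exposes CRUD operations for archival records. Most records are scoped to a repository (/repositories/:repo_id/resources, /repositories/:repo_id/archival_objects, /repositories/:repo_id/accessions, etc.), and list endpoints support pagination (page, page_size, all_ids, id_set) and inline resolution of linked records (resolve[]). Clients authenticate by posting credentials to /users/:username/login and sending the returned token in the X-ArchivesSpace-Session header. The published API reference lists each endpoint with example requests.
XML Export/Import: Exports descriptions as EAD, MARCXML, and other standard formats (e.g., EAC-CPF for agents, METS and MODS for digital objects), and imports EAD, MARCXML, and EAC-CPF through background import jobs, so records can be exchanged with other systems or fed into digital repositories.
Change Notification: The core application does not document a webhook subscription mechanism. Integrations usually poll the API for recently modified records (list endpoints accept a modified_since timestamp) or implement their own callbacks in a backend plugin.
Plugin Architecture: Plugins are directories under plugins/ that are enabled with the AppConfig[:plugins] setting. A plugin can add or override backend endpoints, staff or public interface views, and background jobs without modifying core code.
Authentication & Authorization: Supports local accounts and LDAP/Active Directory; OAuth and other single sign-on schemes are available through plugins. Permissions are granted to groups within each repository, giving role-based access control (RBAC) that is stored in the database.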
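
As a rough illustration of the API flow described above, the following Python sketch builds the two requests a client typically needs: a login call and a paginated listing. The base URL, the helper names, and the repository ID are assumptions made for this example, not part of ArchivesSpace; only the endpoint shapes and the X-ArchivesSpace-Session header come from the API itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local install with the backend on its usual port.
BASE_URL = "http://localhost:8089"


def login_request(base_url, username, password):
    """Build the POST request that exchanges credentials for a session token."""
    url = f"{base_url}/users/{urllib.parse.quote(username)}/login"
    body = urllib.parse.urlencode({"password": password}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def list_request(base_url, session, repo_id, record_type, page=1, page_size=50):
    """Build a paginated GET for one record type inside a repository."""
    query = urllib.parse.urlencode({"page": page, "page_size": page_size})
    url = f"{base_url}/repositories/{repo_id}/{record_type}?{query}"
    return urllib.request.Request(
        url, headers={"X-ArchivesSpace-Session": session}
    )


def fetch_json(request):
    """Send a request and decode the JSON body (needs a reachable backend)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a running backend, a client would call fetch_json(login_request(...)), read the "session" value from the response, and pass it to list_request. The builders are kept separate from the network call so they can be checked without a server.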

Deployment & Infrastructure

Self-Hosting: ArchivesSpace is designed for on-premises deployment. It requires a Java runtime, MySQL, and Solr; all components can be provisioned on virtual machines or bare metal.
Scalability: Horizontal scaling is achieved by load-balancing multiple application instances behind a reverse proxy (e.g., Nginx) and using MySQL replication for the database tier. Solr can be scaled separately, for example with SolrCloud.
Backup & Disaster Recovery: Back up the MySQL database (e.g., with mysqldump or the bundled backup script) together with the Solr index; the index can also be rebuilt from the database if it is lost. Restore time depends on data volume, so test the procedure before relying on it as part of an institutional preservation strategy.
CI/CD & Testing: The project includes a GitHub Actions pipeline that runs unit tests, integration tests against test MySQL and Solr instances, and builds Docker images. Developers can fork the repo, run tests locally, and contribute via pull requests.

Integration & Extensibility

External Systems: Common integrations include Koha (library catalog), Fedora Commons (digital repository), and OpenSearch-based discovery layers. The REST API makes it straightforward to sync metadata or trigger ingestion pipelines from scheduled jobs.
Custom Workflows: Using the plugin system, developers can add custom approval workflows, automated metadata enrichment (e.g., via external authority services), or UI widgets for specific institutional needs.
Community & Support: An Atlassian Confluence space hosts user documentation and a knowledge base, while technical documentation and the API reference are published from the project's GitHub repositories. Bug reports and feature requests are tracked in the project's Jira instance.
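
To make the metadata sync mentioned above more concrete, here is a minimal Python sketch that flattens one resource record, as decoded from the API's JSON, into a document for a discovery index. The output field names and the hyphen used to join identifier parts are assumptions for illustration; the input keys (uri, title, id_0 to id_3, publish) follow the resource record as returned by the API.

```python
def to_discovery_doc(resource):
    """Flatten a resource record into a small, index-friendly dictionary."""
    # A resource identifier is stored in up to four parts; skip empty ones.
    parts = [resource[key] for key in ("id_0", "id_1", "id_2", "id_3")
             if resource.get(key)]
    return {
        "id": resource["uri"],               # the URI is unique within an instance
        "title": resource["title"],
        "identifier": "-".join(parts),       # assumption: hyphen-joined display form
        "public": bool(resource.get("publish", False)),
    }


sample = {
    "uri": "/repositories/2/resources/7",
    "title": "Smith Family Papers",
    "id_0": "MS",
    "id_1": "042",
    "publish": True,
}
doc = to_discovery_doc(sample)
```

A real mapping would carry many more fields (dates, extents, notes, linked agents and subjects); the point here is only that the sync step is a plain transformation of the API's JSON.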

Use Cases

Institutional Archives: A university library can deploy ArchivesSpace to manage thousands of manuscript collections, expose metadata through a public portal, and integrate with their existing discovery system.
Digital Preservation Projects: A heritage organization can use the EAD or METS exports to feed metadata into a digital repository, while using Solr to search descriptions of both physical and digital materials.
Workflow Automation: A research institute can poll the API for newly created accessions and trigger image capture workflows, ingesting scans into a digital asset management system.
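
Where push-style events are not available, the accession trigger in the last use case can be approximated by polling. The sketch below assumes the caller has already fetched a page of accession records from the API and keeps a checkpoint between runs; the function names and the checkpoint handling are hypothetical, while create_time is the timestamp field carried on each record.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a record timestamp such as '2024-03-01T00:00:00Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def new_since(records, checkpoint):
    """Return records created after the checkpoint, plus the next checkpoint."""
    fresh = [r for r in records if parse_ts(r["create_time"]) > checkpoint]
    # Advance the checkpoint only as far as the newest record actually seen.
    latest = max([checkpoint] + [parse_ts(r["create_time"]) for r in fresh])
    return fresh, latest


records = [
    {"uri": "/repositories/2/accessions/1", "create_time": "2024-01-01T00:00:00Z"},
    {"uri": "/repositories/2/accessions/2", "create_time": "2024-03-01T00:00:00Z"},
]
checkpoint = datetime(2024, 2, 1, tzinfo=timezone.utc)
fresh, checkpoint = new_since(records, checkpoint)
```

Each record in fresh would then be handed to the downstream workflow, and the returned checkpoint stored so the next run skips what has already been processed.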

Advantages for Developers

Open Source & No Vendor Lock‑In: Full source code access allows deep customization and ensures that institutions are not tied to proprietary solutions.
Standard‑Based Data Model: Alignment with archival standards (e.g., EAD, ISAD(G)) reduces the learning curve for developers familiar with archival