Overview
Discover what makes InvenioRDM powerful
InvenioRDM is a full‑stack, open‑source platform for building institutional repositories and digital libraries. At its core it exposes a RESTful API that manages **records** (datasets, publications, code) and **communities** (groups of users with shared metadata schemas). The system is built on top of the Invenio framework, which provides a modular architecture for authentication, workflow, search, and data ingestion. Developers interact with InvenioRDM through well‑documented Python APIs, a GraphQL layer (optional), and an extensible plugin system that allows custom modules to hook into the lifecycle of records, metadata validation, and harvesting.
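As a rough illustration of the record-oriented REST API described above, the sketch below builds (but does not send) a request that would create a draft record. The host name, token, and metadata values are placeholders, and the payload shape is an assumption based on the commonly documented `/api/records` endpoint; check it against the API reference for your installed version.

```python
import json
import urllib.request


def build_create_draft_request(base_url, token, title, family_name, given_name):
    """Build, without sending, a POST request that would create a draft record.

    Assumptions: the instance exposes POST /api/records and accepts a
    bearer token; all field values here are illustrative placeholders.
    """
    payload = {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": False},  # metadata-only record, no file uploads
        "metadata": {
            "title": title,
            "publication_date": "2024-01-01",  # placeholder date
            "resource_type": {"id": "dataset"},
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "family_name": family_name,
                        "given_name": given_name,
                    }
                }
            ],
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical host and token; nothing is sent over the network here.
request = build_create_draft_request(
    "https://repo.example.org", "TOKEN", "Sample dataset", "Doe", "Jane"
)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the payload easy to inspect first.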
Technical Stack
- Language & Framework: Python 3.11+, Flask for the web layer, Celery for asynchronous tasks, and SQLAlchemy as the ORM.
- Search & Indexing: Elasticsearch 8.x powers full‑text search, faceting, and advanced query syntax.
- Databases: PostgreSQL 15+ stores relational data (users, records, communities), while Redis is used for caching and as the Celery broker.
- Containerization: Official Docker images are provided; the project ships a docker-compose.yml that bundles all services (web, worker, search, db).
- CI/CD & Testing: GitHub Actions orchestrate linting, unit tests, and integration tests across Python, Docker, and E2E test suites.
Core Capabilities
- REST API: CRUD operations on records and communities, batch ingestion via JSON‑LD, OAI‑PMH harvesting endpoints.
- Metadata Model: Extensible JSON‑LD schemas; developers can define custom vocabularies, controlled terms, and validation rules via YAML files or Python code.
- Workflow Engine: Declarative workflow definitions (e.g., “review → publish”) that trigger Celery tasks and send notifications.
- Authentication & Authorization: Supports OAuth2, SAML, LDAP, and JSON Web Tokens. Fine‑grained role‑based access control (RBAC) is exposed via the API.
- Search & Facets: Elasticsearch integration provides faceted navigation, aggregations, and custom analyzers (e.g., stemming, synonym filters).
- Harvesting & Export: OAI-PMH, DOI minting via DataCite, and metadata export in multiple formats (XML, JSON-LD, MARC21).
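To make the search capability above more concrete, here is a minimal sketch that composes a search URL for the records endpoint. The parameter names (`q`, `sort`, `page`, `size`) and the `bestmatch` sort value are assumptions to verify against your instance's API reference; the host is a placeholder.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, page=1, size=10, sort="bestmatch"):
    """Compose a GET URL for a full-text record search.

    Assumes the instance serves searches at /api/records and accepts
    the q, sort, page, and size query parameters.
    """
    params = {"q": query, "sort": sort, "page": page, "size": size}
    return f"{base_url.rstrip('/')}/api/records?{urlencode(params)}"


# Hypothetical host; the query string is URL-encoded by urlencode.
url = build_search_url("https://repo.example.org", "climate data")
```

Facet filters, where supported, would typically be added as further query parameters; their names depend on how the instance is configured.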
Deployment & Infrastructure
InvenioRDM is designed for self‑hosting on a variety of infrastructures:
- Bare metal / VMs: A single machine can run the web, worker, and search nodes; for production, horizontal scaling is achieved by adding more workers or Elasticsearch shards.
- Kubernetes: Helm charts are available; the platform can be deployed as a set of stateless pods with persistent volumes for PostgreSQL and Elasticsearch.
- CI/CD Pipelines: The Docker images can be pulled into any CI system; environment variables control configuration (e.g., INVENIO_SECRET_KEY, ELASTICSEARCH_URL).
- High Availability: Elasticsearch clustering, PostgreSQL replication, and multiple Celery workers provide fault tolerance.
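Environment-driven configuration of the kind mentioned above can be sketched as follows. The two variable names come from the text; the `load_settings` helper, the returned dictionary keys, and the fallback search URL are hypothetical and only illustrate a fail-fast pattern for deployment scripts.

```python
import os


def load_settings(environ=None):
    """Read deployment settings from environment variables.

    load_settings and its return shape are illustrative, not part of
    InvenioRDM. The default search URL is an assumed local fallback.
    """
    env = os.environ if environ is None else environ
    secret_key = env.get("INVENIO_SECRET_KEY")
    if not secret_key:
        # Fail early rather than starting with an unset secret.
        raise RuntimeError("INVENIO_SECRET_KEY must be set")
    return {
        "secret_key": secret_key,
        "search_url": env.get("ELASTICSEARCH_URL", "http://localhost:9200"),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"INVENIO_SECRET_KEY": "change-me"})
```

Accepting an optional mapping keeps the helper easy to test in CI without touching the real environment.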
Integration & Extensibility
- Plugin System: Any Python package that follows the Invenio plugin contract can be installed via pip and registered in setup.cfg. Plugins may add new UI components, API endpoints, or modify existing workflows.
- Webhooks & Callbacks: External services can subscribe to events (record created, community updated) via HTTP callbacks.
- Custom Code: Developers can override templates, add new blueprints, or extend the InvenioRDM class to inject domain logic.
- Third‑Party Integrations: Built‑in connectors for GitHub, Zenodo, and ORCID; easy to add others by implementing the Invenio-API interface.
Developer Experience
The project follows a well-structured documentation model: user guides, API references, and developer tutorials are available on the official site. The codebase is split into modular packages (invenio-records, invenio-communities) that can be developed independently. Community support is active on GitHub, Discord, and the mailing list; a dedicated PR board tracks feature requests. The MIT license allows commercial deployment without royalties, which is attractive for institutional developers.
Use Cases
- Academic Libraries: Host a searchable repository of theses, datasets, and publications with fine‑grained access control.
- Research Groups: Manage collaborative projects, share code and data, and publish metadata to external services (e.g., Crossref).
- Open Science Platforms: Expose OAI‑PMH endpoints for harvesting by other repositories; mint DOIs via Zenodo integration.
- Custom Digital Archives: Extend the metadata model to include domain‑specific vocabularies (e.g., museum collections) and deploy on Kubernetes for scalability.
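For the harvesting use case above, a harvester ultimately issues standard OAI-PMH requests. The sketch below composes a ListRecords URL; the verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH protocol, while the `/oai2d` path and the host are assumptions to confirm for your deployment.

```python
from urllib.parse import urlencode


def build_oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Compose an OAI-PMH ListRecords request URL.

    Assumes the repository serves OAI-PMH at /oai2d. The optional
    set_spec restricts harvesting to one OAI set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url.rstrip('/')}/oai2d?{urlencode(params)}"


# Hypothetical host; oai_dc is the Dublin Core prefix every OAI-PMH server must support.
url = build_oai_list_records_url("https://repo.example.org")
```

A real harvester would also follow the resumption tokens returned in each response until the result set is exhausted.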
Advantages
- Performance: Elasticsearch + PostgreSQL provide fast search and robust transactional guarantees.
- Flexibility: Declarative workflows, extensible metadata schemas, and a plugin architecture allow tailoring to any domain.
- Licensing: The MIT license removes barriers for commercial use and internal deployment.
- Community & Maintenance: Active development ensures timely security patches, continuous integration, and backward compatibility.
In short, InvenioRDM offers a battle‑tested, modular stack that lets developers build feature‑rich, scalable repositories while keeping full control over data, workflows