InvenioRDM

Open-source research data repository platform

Overview

InvenioRDM is a full‑stack, open‑source platform for building institutional repositories and digital libraries. At its core, it exposes a RESTful API that manages records (datasets, publications, code) and communities (groups of users with shared metadata schemas). The system is built on the Invenio framework, which provides a modular architecture for authentication, workflows, search, and data ingestion. Developers work with InvenioRDM through its documented Python APIs and an extensible plugin system that lets custom modules hook into the record lifecycle, metadata validation, and harvesting.

Technical Stack

  • Language & Framework: Python 3.11+, Flask for the web layer, Celery for asynchronous tasks, and SQLAlchemy as the ORM.
  • Search & Indexing: Elasticsearch 8.x powers full‑text search, faceting, and advanced query syntax.
  • Databases: PostgreSQL 15+ stores relational data (users, records, communities), while Redis serves as the cache and the Celery broker.
  • Containerization: Official Docker images are provided; the project ships a docker-compose.yml that bundles all services (web, worker, search, db).
  • CI/CD & Testing: GitHub Actions orchestrate linting, unit tests, and integration tests across Python, Docker, and E2E test suites.

Core Capabilities

  • REST API: CRUD operations on records and communities, batch ingestion via JSON‑LD, OAI‑PMH harvesting endpoints.
  • Metadata Model: Extensible JSON‑LD schemas; developers can define custom vocabularies, controlled terms, and validation rules via YAML files or Python code.
  • Workflow Engine: Declarative workflow definitions (e.g., “review → publish”) that trigger Celery tasks and send notifications.
  • Authentication & Authorization: Supports OAuth2, SAML, LDAP, and JSON Web Tokens. Fine‑grained role‑based access control (RBAC) is exposed via the API.
  • Search & Facets: Elasticsearch integration provides faceted navigation, aggregations, and custom analyzers (e.g., stemming, synonym filters).
  • Harvesting & Export: OAI‑PMH, DataCite DOI minting, and metadata export in multiple formats (XML, JSON‑LD, MARC21).
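
As a rough illustration of the REST API bullet above, the following sketch builds (but does not send) two requests with the Python standard library: a record search and the creation of a metadata-only draft. The host name, the token, and the sample metadata values are placeholders invented for this example; the /api/records path and the payload layout follow the public InvenioRDM REST API as commonly documented, and should be checked against the version you deploy.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical instance URL; replace with your own deployment.
BASE_URL = "https://repo.example.org"


def build_search_request(query, size=10, page=1):
    """Build a GET request for the record search endpoint."""
    params = urlencode({"q": query, "size": size, "page": page})
    return Request(f"{BASE_URL}/api/records?{params}", method="GET")


def build_create_draft_request(title, token):
    """Build a POST request that would create a metadata-only draft."""
    # Illustrative payload only; a real instance may require more fields.
    payload = {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": False},
        "metadata": {
            "title": title,
            "resource_type": {"id": "dataset"},
            "publication_date": "2024-01-01",
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "given_name": "Ada",
                        "family_name": "Example",
                    }
                }
            ],
        },
    }
    return Request(
        f"{BASE_URL}/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    search = build_search_request("climate data", size=5)
    print(search.get_method(), search.full_url)
    draft = build_create_draft_request("Ocean temperature series", "my-token")
    print(draft.get_method(), draft.full_url)
```

Passing either request object to urllib.request.urlopen would send it; the builders are kept separate from the network call so the URLs and payloads can be inspected or tested offline.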

Deployment & Infrastructure

InvenioRDM is designed for self‑hosting on a variety of infrastructures:

  • Bare metal / VMs: A single machine can run the web, worker, and search nodes; for production, horizontal scaling is achieved by adding more workers or Elasticsearch shards.
  • Kubernetes: Helm charts are available; the platform can be deployed as a set of stateless pods with persistent volumes for PostgreSQL and Elasticsearch.
  • CI/CD Pipelines: The Docker images can be pulled into any CI system; environment variables control configuration (e.g., INVENIO_SECRET_KEY, ELASTICSEARCH_URL).
  • High Availability: Elasticsearch clusters, PostgreSQL replication, and Celery workers can be replicated to achieve fault tolerance.
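
The environment-variable configuration mentioned above can be pictured with a small sketch. It assumes a loader that picks up variables carrying an INVENIO_ prefix, strips the prefix, and parses Python literals where possible; the function name load_prefixed_env and the INVENIO_DEBUG and INVENIO_MAX_ITEMS variables are made up for this example, and the loader shipped with Invenio may behave differently in detail.

```python
import ast


def load_prefixed_env(environ, prefix="INVENIO_"):
    """Collect prefixed variables into a config dict, stripping the prefix."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            # Variables without the prefix are ignored entirely.
            continue
        key = name[len(prefix):]
        try:
            # Parse Python literals such as False, 25, or ['a', 'b'].
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a literal stays a plain string.
            config[key] = raw
    return config


if __name__ == "__main__":
    sample_env = {
        "INVENIO_SECRET_KEY": "change-me",
        "INVENIO_DEBUG": "False",       # hypothetical variable
        "INVENIO_MAX_ITEMS": "25",      # hypothetical variable
        "ELASTICSEARCH_URL": "http://search:9200",
    }
    print(load_prefixed_env(sample_env))
```

Under this assumption a variable without the prefix, such as ELASTICSEARCH_URL in the sample, is not picked up, which is worth verifying when wiring settings into a container or CI job.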

Integration & Extensibility

  • Plugin System: Any Python package that follows the Invenio plugin contract can be installed via pip and registered through entry points declared in its setup.cfg. Plugins may add new UI components, API endpoints, or modify existing workflows.
  • Webhooks & Callbacks: External services can subscribe to events (record created, community updated) via HTTP callbacks.
  • Custom Code: Developers can override templates, add new blueprints, or extend the InvenioRDM class to inject domain logic.
  • Third‑Party Integrations: Built‑in connectors for GitHub, Zenodo, and ORCID; easy to add others by implementing the Invenio-API interface.
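
To make the plugin mechanism more concrete, here is a minimal sketch of how installed packages can be discovered through Python entry points, using only the standard library. The group name invenio_base.apps is one of the entry point groups Invenio uses for application extensions; the helper find_plugins is a name chosen for this example, and the sketch only lists what is installed rather than loading anything.

```python
from importlib.metadata import entry_points

# A plugin package would declare itself in its setup.cfg, for example:
#
#   [options.entry_points]
#   invenio_base.apps =
#       my_plugin = my_plugin.ext:MyPlugin
#
# (my_plugin and MyPlugin are hypothetical names.)


def find_plugins(group):
    """Return sorted (name, target) pairs registered under an entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in find_plugins("invenio_base.apps"):
        print(f"{name} -> {target}")
```

On a machine without Invenio packages installed the loop prints nothing, since no distribution registers that group.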

Developer Experience

The documentation is organized into user guides, API references, and developer tutorials, all available on the official site. The codebase is split into modular packages (invenio-records, invenio-communities) that can be developed independently. Community support is active on GitHub, Discord, and the mailing list; a dedicated PR board tracks feature requests. The MIT license allows commercial deployment without royalties, which is attractive for institutional developers.

Use Cases

  • Academic Libraries: Host a searchable repository of theses, datasets, and publications with fine‑grained access control.
  • Research Groups: Manage collaborative projects, share code and data, and publish metadata to external services (e.g., DataCite).
  • Open Science Platforms: Expose OAI‑PMH endpoints for harvesting by other repositories; mint DOIs through the DataCite integration.
  • Custom Digital Archives: Extend the metadata model to include domain‑specific vocabularies (e.g., museum collections) and deploy on Kubernetes for scalability.
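
For the harvesting use case, a harvester ultimately issues plain OAI‑PMH requests. The sketch below builds such request URLs; the six verbs come from the OAI‑PMH 2.0 protocol, while the /oai2d path is an assumption about where an instance exposes its endpoint, and the host name and the build_oai_url helper are placeholders for this example.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_url(base_url, verb, arguments=None):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    # Arguments are passed as a dict because 'from' is a Python keyword.
    params.update(arguments or {})
    # Assumed endpoint path; adjust to match your instance.
    return f"{base_url.rstrip('/')}/oai2d?{urlencode(params)}"


if __name__ == "__main__":
    print(build_oai_url("https://repo.example.org", "Identify"))
    print(
        build_oai_url(
            "https://repo.example.org",
            "ListRecords",
            {"metadataPrefix": "oai_dc", "from": "2024-01-01"},
        )
    )
```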

Advantages

  • Performance: Elasticsearch + PostgreSQL provide fast search and robust transactional guarantees.
  • Flexibility: Declarative workflows, extensible metadata schemas, and a plugin architecture allow tailoring to any domain.
  • Licensing: The MIT license removes barriers for commercial use and internal deployment.
  • Community & Maintenance: Active development ensures timely security patches, continuous integration, and backward compatibility.

In short, InvenioRDM offers a battle‑tested, modular stack that lets developers build feature‑rich, scalable repositories while keeping full control over data and workflows.

Information

  • Category: development-tools
  • License: MIT
  • Stars: 139
  • Author: inveniosoftware
  • Last Updated: 1 day ago

Technical Specs

  • Pricing: Open Source
  • Database: PostgreSQL
  • Docker: Official
  • Min RAM: 1GB
  • Min Storage: 5GB
  • Supported OS: Linux, Docker