Zenodo and Invenio as a provider

Hi, I work on a project called Renku (SwissDataScienceCenter/renku on GitHub: "Renku provides a platform and tools for reproducible and collaborative data analysis."). We are building a platform that enables people to work on data science projects and on research in general. We have been using rclone to let our users mount data from many cloud providers in their data science projects.

We have a lot of users who publish their research datasets on Zenodo (https://zenodo.org/). Since Zenodo is essentially a large data repository, it would be really useful if we could access it the same way as all other data. So I was wondering whether you would be open to me and some other developers on the project contributing a Zenodo provider to rclone. Is this something you would be interested in?

Zenodo uses Invenio (a research data management system developed by CERN) to store and manage its data repositories. So I think that, with the work we plan to contribute, we could have a Zenodo provider and a generic Invenio provider that share quite a bit of logic and code. Beyond Zenodo, Invenio is used by many other academic and similar institutions that host research data, so it would allow a lot of users who work in research and academia to use rclone.
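To illustrate the kind of logic a Zenodo provider and a generic Invenio provider could share, here is a minimal Go sketch. It assumes the only per-instance difference is the base URL, and that record metadata is served at `/api/records/{id}`, the records endpoint that InvenioRDM and Zenodo expose. The `invenio` type, `recordAPIURL` and `newZenodo` are hypothetical names, not code from rclone or Renku.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// invenio holds what varies between Invenio-based repositories in this
// sketch: only the base URL. Hypothetical type, not rclone's implementation.
type invenio struct {
	baseURL string
}

// recordAPIURL returns the REST endpoint for a record's metadata.
func (i invenio) recordAPIURL(recordID string) string {
	return strings.TrimSuffix(i.baseURL, "/") + "/api/records/" + url.PathEscape(recordID)
}

// newZenodo treats Zenodo as one configured instance of the generic provider.
func newZenodo() invenio {
	return invenio{baseURL: "https://zenodo.org"}
}

func main() {
	fmt.Println(newZenodo().recordAPIURL("4060431"))

	// A self-hosted Invenio instance reuses the same code with another base
	// URL (invenio.example.org is a placeholder).
	other := invenio{baseURL: "https://invenio.example.org/"}
	fmt.Println(other.recordAPIURL("abcde-12345"))
}
```

Keeping Zenodo as configuration rather than a separate code path is what would let both providers share most of their logic.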


There is a limit of 2 links per post, so here is the link to the Invenio website: About Invenio (inveniosoftware.org).

Hi Tasko,

Welcome to the rclone forum!

We would be more than happy for you to contribute a new Zenodo and Invenio provider.

There is a guide on how to approach this on our GitHub (writing a new backend).

This is also something Rclone Services Ltd would be happy to help with if you need some professional troubleshooting or help (or we could even potentially write it for you). If this is interesting to you, or you just want to say hi, drop Nick and me a message at info@rclone.com.

Thanks!


That is great, @Edcw. I will reach out when we start implementing, and I will let you know if we need your support.


Hi there :wave:
We have started the implementation of a doi backend for rclone, with initial support for Zenodo as a provider.

The Renku fork of rclone: SwissDataScienceCenter/rclone on GitHub
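A `doi` backend first has to find out which repository hosts a given DOI. One plausible approach, sketched below in Go, is to ask the doi.org resolver (which redirects to the dataset's landing page) and pick a provider from the landing page's host. The host table and the function names are hypothetical, and the HTTP redirect itself is left out so the sketch runs offline.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolverURL returns the doi.org address that redirects to a DOI's landing page.
func resolverURL(doi string) string {
	return "https://doi.org/" + strings.TrimPrefix(strings.TrimSpace(doi), "doi:")
}

// providerFor picks a provider from the landing-page URL that doi.org
// redirected to. The host table is a hypothetical example, not rclone's list.
func providerFor(landingPage string) (string, error) {
	u, err := url.Parse(landingPage)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	switch {
	case host == "zenodo.org" || strings.HasSuffix(host, ".zenodo.org"):
		return "zenodo", nil
	case host == "dataverse.harvard.edu":
		return "dataverse", nil
	default:
		return "", fmt.Errorf("no provider known for host %q", host)
	}
}

func main() {
	fmt.Println(resolverURL("10.5281/zenodo.4060431"))

	// In a real backend this URL would come from following the doi.org redirect.
	provider, err := providerFor("https://zenodo.org/records/4060431")
	if err != nil {
		panic(err)
	}
	fmt.Println(provider)
}
```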

Example: for the FSD50K dataset, we can write the following rclone config:

[FSD50K]
type = doi
doi = 10.5281/zenodo.4060431
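For Zenodo specifically, the DOI itself encodes the record: Zenodo DOIs use the prefix `10.5281` and a suffix of the form `zenodo.<record id>`, so `10.5281/zenodo.4060431` points at record `4060431`. A small Go helper for that shortcut might look like this; the function name is hypothetical, and a backend could equally rely on the doi.org redirect instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// zenodoDOI matches Zenodo DOIs such as 10.5281/zenodo.4060431, capturing the
// numeric record ID. An optional "doi:" prefix and any letter case are accepted.
var zenodoDOI = regexp.MustCompile(`(?i)^(?:doi:)?10\.5281/zenodo\.(\d+)$`)

// zenodoRecordID returns the record ID encoded in a Zenodo DOI (hypothetical helper).
func zenodoRecordID(doi string) (string, bool) {
	m := zenodoDOI.FindStringSubmatch(strings.TrimSpace(doi))
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := zenodoRecordID("10.5281/zenodo.4060431")
	fmt.Println(id, ok) // 4060431 true

	_, ok = zenodoRecordID("10.7910/DVN/WBDKN6") // a Dataverse DOI
	fmt.Println(ok) // false
}
```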

And then we can access the dataset:

$ rclone lsl FSD50K:
3221225472 2025-01-19 19:25:20.000000000 FSD50K.dev_audio.z01
3221225472 2025-01-19 17:25:00.000000000 FSD50K.dev_audio.z02
3221225472 2025-01-19 17:24:56.000000000 FSD50K.dev_audio.z03
3221225472 2025-01-19 17:24:56.000000000 FSD50K.dev_audio.z04
3221225472 2025-01-19 17:24:55.000000000 FSD50K.dev_audio.z05
2306663327 2025-01-19 17:24:55.000000000 FSD50K.dev_audio.zip
     6984 2025-01-19 17:24:55.000000000 FSD50K.doc.zip
3221225472 2025-01-19 17:24:55.000000000 FSD50K.eval_audio.z01
3037675767 2025-01-19 17:24:55.000000000 FSD50K.eval_audio.zip
   334701 2025-01-19 17:24:55.000000000 FSD50K.ground_truth.zip
  6700838 2025-01-19 17:24:55.000000000 FSD50K.metadata.zip

Note: Zenodo was down at the time of writing, so we could not paste the actual output of the lsl command.

We are also exploring support for more providers of published datasets, e.g. Dataverse:

[crimp]
type = doi
doi = 10.7910/DVN/WBDKN6

$ rclone lsl crimp:
149289365 2025-03-25 11:02:47.000000000 crimp_force_curves_dataset.pkl
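Dataverse has its own API, so it would need provider-specific code alongside the Zenodo one. Below is a Go sketch assuming the Dataverse native API: dataset metadata at `/api/datasets/:persistentId/?persistentId=doi:<DOI>`, files listed under `data.latestVersion.files[].dataFile`, and downloads at `/api/access/datafile/{id}`. The base URL `https://dataverse.harvard.edu` is our assumption for the `10.7910/DVN` prefix, and the file ID in the sample response is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// datasetURL builds the (assumed) Dataverse native API URL for a dataset.
func datasetURL(base, doi string) string {
	return base + "/api/datasets/:persistentId/?persistentId=" + url.QueryEscape("doi:"+doi)
}

// dataverseDataset keeps only the fields this sketch reads.
type dataverseDataset struct {
	Data struct {
		LatestVersion struct {
			Files []struct {
				DataFile struct {
					ID       int64  `json:"id"`
					Filename string `json:"filename"`
					Filesize int64  `json:"filesize"`
				} `json:"dataFile"`
			} `json:"files"`
		} `json:"latestVersion"`
	} `json:"data"`
}

// entry is a hypothetical listing item: name, size and where to download it.
type entry struct {
	Name        string
	Size        int64
	DownloadURL string
}

// listDataset turns a dataset response into listing entries.
func listDataset(base string, body []byte) ([]entry, error) {
	var ds dataverseDataset
	if err := json.Unmarshal(body, &ds); err != nil {
		return nil, err
	}
	var out []entry
	for _, f := range ds.Data.LatestVersion.Files {
		df := f.DataFile
		out = append(out, entry{
			Name:        df.Filename,
			Size:        df.Filesize,
			DownloadURL: fmt.Sprintf("%s/api/access/datafile/%d", base, df.ID),
		})
	}
	return out, nil
}

func main() {
	base := "https://dataverse.harvard.edu"
	fmt.Println(datasetURL(base, "10.7910/DVN/WBDKN6"))

	// Hand-written sample response; the file ID 12345 is made up.
	body := []byte(`{"status":"OK","data":{"latestVersion":{"files":[
		{"dataFile":{"id":12345,"filename":"crimp_force_curves_dataset.pkl","filesize":149289365}}]}}}`)
	entries, err := listDataset(base, body)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%9d %s -> %s\n", e.Size, e.Name, e.DownloadURL)
	}
}
```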

I have opened a PR here: "doi: add new doi backend" by leafty (Pull Request #8510 on rclone/rclone, GitHub).
