Motuz and Rclone on AWS blog today

Over the last few months we worked with AWS to create a blog post that discusses how we work with Motuz, the multi-user cloud data mover that uses rclone under the hood. There is also a shoutout to rclone in the blog.
Motuz is basically the result of discussions with several vendors who implemented their own cloud copy code. We kept asking them: why are you doing this? There is rclone. It works, it is fast, it is supported and it has already moved many, many petabytes to the cloud. What more do you want?

Nice one :slight_smile: Thanks for showing this!

I stuck a link on twitter: https://twitter.com/njcw/status/1250824091260653569

Let me know if you need anything from rclone to make your life easier.

Please stick Motuz on this page: https://github.com/rclone/rclone/wiki/Third-Party-Integrations-with-rclone

Also if you need more specialist help, note that I'm available for consultancy projects :slight_smile:

Awesome, thanks Nick.

I added Motuz to "Third Party Integrations" and will keep the consultancy offer in mind. Motuz has been odd in that you don't desperately need it, but once you have it you don't want to give it up any more. Our users love it and we are committed to supporting it.
We discussed commercializing Motuz as a web service for cloud-to-cloud movements, but it would require a lot of redesign. There are some commercial services in this space, such as https://mover.io/ (recently acquired by MS). None of them use rclone and most of them are slower, which is funny.
We have not really had any issues with rclone. It continues to run many processes at High Performance Computing centers around the world. For example, the Broad Institute in Boston moved many petabytes to Google Cloud using rclone.

Motuz looks like a great product :slight_smile:

Does it use the remote control of rclone to transfer things? If so it occurs to me that you could wrap it up into an alternative web ui for rclone. I don't know if you've investigated the web ui, but it is pluggable - rclone can open a browser for you on some static html/js which depending on the architecture of motuz

I just had a deeper look at the source and it looks like you are using a database and message queue so I don't think that approach would work.
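
For anyone following along, driving a transfer through the rclone remote control API could look roughly like the sketch below. It assumes an `rclone rcd` instance listening on the default `localhost:5572` without auth, and the source and destination are made-up placeholders. It only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

RC_URL = "http://localhost:5572"  # assumption: default rc address, no auth


def build_copy_request(src_fs, dst_fs, base_url=RC_URL):
    """Build a POST request for the rc `sync/copy` call, run as an async job."""
    payload = {
        "srcFs": src_fs,   # e.g. a local directory
        "dstFs": dst_fs,   # e.g. "remote:bucket/path"
        "_async": True,    # return a job id immediately instead of blocking
    }
    return urllib.request.Request(
        f"{base_url}/sync/copy",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# Hypothetical source and destination, for illustration only.
request = build_copy_request("/data/run42", "s3remote:my-bucket/run42")
# urllib.request.urlopen(request) would submit it to a running rclone rcd.
```

With `_async` set, the response carries a `jobid` that can be polled through `job/status`, which is what lets a front end return before the copy finishes.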

What do you think is missing from the current product? Stuff around user authentication probably!

:slight_smile:

Nice one! I like hearing stories like this :slight_smile:

Yes, rclone in its current form is an application that is launched by a single user who then runs it in their own security context (with their own permissions). Motuz is a set of containers, and some of them run as root to facilitate a multi-user environment. For example, you need to drop to the user's privileges when you run something like "ls" on behalf of the user, to name just one of the complexities. Also, the queuing system (Celery with MQ) is needed so that users can log out after they submit their copy job, as large jobs can take days to complete even if you have a 10G internet connection like we do. The database is mostly used as a profile store for user settings, last used folders and previous copy jobs, but also credentials. For example, we use an encrypted Postgres DB in AWS RDS for this.
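
To illustrate the privilege drop, here is a minimal sketch. It is not the actual Motuz code, and the function names are hypothetical. It assumes Python 3.9 or later, where `subprocess.run` accepts `user`, `group` and `extra_groups`, and it assumes the calling process runs as root.

```python
import os
import pwd
import subprocess


def demotion_kwargs(username):
    """Build subprocess keyword arguments that drop privileges to `username`."""
    entry = pwd.getpwnam(username)  # raises KeyError for unknown users
    return {
        "user": entry.pw_uid,
        "group": entry.pw_gid,
        # Supplementary groups, so group-based file permissions still apply.
        "extra_groups": os.getgrouplist(username, entry.pw_gid),
    }


def list_dir_as(username, path):
    """Run `ls -la` on `path` as `username`; the caller must be root."""
    result = subprocess.run(
        ["ls", "-la", "--", path],
        capture_output=True,
        text=True,
        check=True,
        **demotion_kwargs(username),
    )
    return result.stdout.splitlines()
```

Because the child process switches identity before `ls` starts, the kernel enforces the user's own file permissions and the service does not have to re-implement access checks.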

The current implementation is geared towards deployments in enterprises and HPC centers that work with a central POSIX file system and can log in via PAM, so a typical use case is the login node of an HPC cluster.
If we wanted to offer this as a service it would have to be a more distributed application. The main UI would be a scalable containerized service in the cloud, and then you would have optional distributed worker nodes in each of the important clouds as well as on prem, depending on your priorities. If you are often moving from AWS to Google it may not make sense to pipe it through on prem. Authentication is of course another factor. If you don't have a POSIX file system at all you could just use Google or any other auth to log in and then manage your other cloud creds through the application.

Raw genomes are very silly datasets, often consisting of 50k files (basecalls) and several hundred GB. A parallel copy with rclone makes that less of a torture.
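
As a rough sketch of what such a parallel copy could look like when launched from Python: `--transfers` and `--checkers` are real rclone flags, but the values, the paths and the `s3remote` remote name below are assumptions for illustration, not our production settings.

```python
def build_copy_command(source, destination, transfers=32, checkers=64):
    """Return an rclone argv that copies with many parallel transfers."""
    return [
        "rclone", "copy", source, destination,
        "--transfers", str(transfers),  # files copied in parallel (rclone default is 4)
        "--checkers", str(checkers),    # files compared in parallel (rclone default is 8)
    ]


# Hypothetical source directory and remote, for illustration only.
command = build_copy_command("/seq/run42/basecalls", "s3remote:genomes/run42")
# Pass `command` to subprocess.run(command, check=True) to start the copy.
```

With tens of thousands of small files, raising the number of parallel transfers matters more than raw bandwidth, since per-file overhead dominates.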

One of the unexpected benefits was that people who were using rsync noticed that rclone is much faster, even though it does not handle permissions as well, so we now have several people using Motuz to copy data from one folder to another in the same POSIX file system. We realize that duplicating data in the same POSIX file system is not the best approach, but we also have de-duplication, so at least it does not waste any space.

Nice writeup - thanks!

Quite a lot of rclone users do this too (me included!).

As for the permissions, at some point I intend to fix that but it is still on the todo list!
