What is the size of your storage backends and level of traffic endured by rclone?

Marc_Riera · June 10, 2022, 4:05pm

Dear all,

Apologies for the unorthodox approach.

Rclone has appeared on our radar as a solution that could solve many of the problems we are currently experiencing in our institution. The documentation and the quality of the code are so good that I don't understand why I did not know about it before, so that is why I'm here: to ask if I'm missing something.

Our institution provides access to open data to any researcher in the world. For example, you can go to http://ftp.ebi.ac.uk/ and, as you click through the links, you have access to a total of no less than ~40 Petabytes of data. Petabytes, that is not a typo.

We usually see more than 5 PB of downloads every month through the several protocols that we support. I was planning to retire some of them, but the scientific community wants to keep their pipelines untouched, and changing them would be a cost that I'd like to avoid if I can.

Also, we have several kinds of backends:

Lustre
NetApp
Isilon
Weka
in-house S3 implementation with an HGST (ActiveScale X100) backend

The protocols I wanted to retire are supported by rclone, and in fact it appears that we could replace the existing services handled by vsftpd, httpd and nginx with rclone. That is why I have to ask the following question:

Can you (members of the rclone community) share with us (the possible next member of the community) answers to the following list of questions?

What are the sizes of your deployments?
How do you scale rclone in real life?
What problems have you had and how easy was it to solve them?
How much traffic does your rclone instance handle?

Thank you.

Best regards
Marc

ncw · June 10, 2022, 10:46pm

Rclone is a very efficient server and it will generally max out the network before you run out of CPU.

The busiest server I have is beta.rclone.org, which serves about 50 TB of data a month from an OpenStack Swift cluster. That doesn't tax the server at all. I know others run much bigger servers.

Scaling the servers horizontally shouldn't be a problem for most protocols. For FTP you'd need a connection-tracking load balancer; given that, I don't think any of the protocols would be a problem.

I offer rclone-related consultancy if you'd like more help and you've got the budget. Rclone has been quite popular in the academic world and I've done a couple of projects there.

Marc_Riera · June 11, 2022, 9:40pm

Dear Nick,
Thanks for your answer.

The FTP load-balancing requirement is something that we already address with our current vsftpd deployment on k8s, so it should not be a problem.

The rest of the protocols, especially rsync, to my surprise, are used in anger in the academic world, so our team will invest time in testing rclone as a replacement for our current setup. If those tests are successful, I'll investigate whether the consultancy is something we can do, at least to validate the setup before we go live.
It would be nice to hear from other petabyte-scale users, since going from TB to PB is usually a stretch, but rclone looks quite promising by itself.

Best regards
Marc

random404 · June 13, 2022, 4:47pm

I do 200 Gbps+ with rclone just fine, but I scale horizontally against a GlusterFS backend.
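
For scale, the three traffic figures quoted in this thread (5 PB/month, 50 TB/month and 200 Gbps) can be put on a common footing. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes), a 30-day month, and traffic spread evenly over the month, so real peaks will be higher than these averages. The function names are illustrative and not part of rclone.

```python
# Back-of-the-envelope conversion between monthly volume and average rate.
# Assumptions: decimal units, 30-day month, perfectly even traffic.
SECONDS_PER_MONTH = 30 * 24 * 3600


def avg_gbps(bytes_per_month: float) -> float:
    """Average throughput in Gbit/s for a monthly volume given in bytes."""
    return bytes_per_month * 8 / SECONDS_PER_MONTH / 1e9


def pb_per_month(gbps: float) -> float:
    """Monthly volume in PB moved by a sustained rate given in Gbit/s."""
    return gbps * 1e9 / 8 * SECONDS_PER_MONTH / 1e15


if __name__ == "__main__":
    print(f"5 PB/month   ~ {avg_gbps(5e15):.1f} Gbit/s average")
    print(f"50 TB/month  ~ {avg_gbps(50e12):.2f} Gbit/s average")
    print(f"200 Gbit/s   ~ {pb_per_month(200):.1f} PB/month if sustained")
```

Under these assumptions, 5 PB/month averages roughly 15 Gbit/s, 50 TB/month roughly 0.15 Gbit/s, and a sustained 200 Gbit/s would correspond to about 65 PB/month.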

Marc_Riera · June 20, 2022, 8:23am

Dear @random404,

Could you share some of your problems, issues or interesting configuration bits with us?

Thank you very much.

Best regards
Marc