We have 22 TB of production data that we want to rclone from an on-prem environment to Azure cloud storage via a service principal.
We ran 'rclone size' against the MinIO storage out of curiosity to see what it would output, and it killed our MinIO instance (most likely there were too many files for it to list, causing it to run out of memory / exhaust its available CPU).
My questions are the following:
1: Does initiating an actual copy require rclone to read the full directory listing of the source? Will it cause our instance to crash again?
2: Can we limit the read speed for the listing, if it is required (or omit it entirely)?
3: Can we limit the transfer speed to Azure?
Thanks, really appreciated
Run the command 'rclone version' and share the full output of the command.
rclone v1.70.2
Which cloud storage system are you using? (eg Google Drive)
Azure
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone size
Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.
You would have to look at the S3 protocol details. Every API call is a transaction.
If you do not know what the max API call limit of your MinIO is, then you will have to experiment a bit.
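One way to experiment gently is to use rclone's built-in throttles. A minimal sketch, assuming a remote named "minio:" and a bucket named "data" (both placeholders): --tpslimit caps API transactions per second, --tpslimit-burst disallows bursts above that rate, and --checkers reduces how many listing operations run in parallel.

```shell
# Placeholder remote "minio:" and bucket "data"; adjust to your config.
# Start with a low transaction rate and raise it until you find what
# the server tolerates.
rclone size minio:data --tpslimit 10 --tpslimit-burst 0 --checkers 2
```

The same flags apply to a copy, so whatever rate you find safe for the listing can be carried over to the migration itself.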
On the other hand, you should seriously consider looking into your MinIO configuration. If a simple listing can bring it down, I would assume there is something fundamentally wrong with the way it is set up. MinIO is a well-established and mature project used in deployments of pretty much any scale, but setting it up correctly requires some work.
It was set up by someone who has now left. One of the reasons for moving the data is to set up a new MinIO instance.
The data repository for MinIO is effectively one large 22 TB store holding a huge volume of files. If it were a few large files, listing wouldn't be an issue; the problem is the huge number of small files.
Which is why I was asking if we can limit rclone's "listing" speed.
Can you break the migration into parts based on directories, using filters? For example:

rclone size src: --include-from=include.lst
rclone copy src: dst: --include-from=include.lst
contents of include.lst
/dir01/**
/dir02/**
Another option might be to get a list of files from MinIO and feed that to rclone.
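A sketch of that idea, with placeholder remote, bucket, and container names: generate the object list once (throttled, so the listing itself stays gentle), then feed it to the copy with --files-from. --no-traverse stops rclone from listing the destination, and --bwlimit caps bandwidth to Azure (question 3 above).

```shell
# Generate the object list once, throttled; remote names are placeholders.
rclone lsf minio:data --recursive --files-only --tpslimit 10 > files.txt

# Copy only the listed objects, without traversing the destination.
rclone copy minio:data azure:container \
    --files-from files.txt --no-traverse --bwlimit 50M
```

If you already have the MinIO client set up, 'mc ls --recursive' is another way to produce the raw listing, though its output would need trimming down to bare object paths first.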
You can't scan one by one; you have to know what to scan. Rclone lists files on a per-folder basis, so in your case that means everything has to be read.
Alternatively, you can try what @asdffdsa suggested: get the list of files using some other means and feed it to rclone. It is probably a good idea to do this in batches.
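The batching step above can be sketched like this. The generated files.txt stands in for your real object list, the remote names are placeholders, and the rclone command is echoed rather than run so the loop is safe to try as-is.

```shell
#!/bin/sh
# Stand-in object list; replace with your real listing (e.g. from rclone lsf).
seq 1 250000 | sed 's|^|data/file-|' > files.txt

# Split into batches of 100k objects each: batch-aa, batch-ab, ...
split -l 100000 files.txt batch-

for b in batch-*; do
    # Drop the leading "echo" to actually run each copy.
    echo rclone copy minio:data azure:container --files-from "$b" --no-traverse
done
```

Batching this way also gives you natural checkpoints: if one batch fails, you rerun just that batch instead of restarting the whole 22 TB transfer.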