Question about rate limits/read limits

What is the problem you are having with rclone?

We have 22 TB of production data that we want to rclone from an onprem environment to Azure cloud storage via a service principal.

We ran rclone size on the MinIO storage out of curiosity, to see what it would output, and it killed our MinIO instance (most likely there were too many files for it to list, causing it to run out of memory / exhaust its available CPU).

My questions are the following:

1: Does initiating an actual copy require rclone to read the full directory listing of the source? Will it cause our instance to crash again?
2: Can we limit the read speed for listing, if listing is required (or skip it entirely)?
3: Can we limit the transfer speed to Azure?

Thanks, really appreciated

Run the command 'rclone version' and share the full output of the command.

rclone v1.70.2

Which cloud storage system are you using? (eg Google Drive)

Azure

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone size

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.


[blobstorage]
type = azureblob
account = XXX
env_auth = true

[minio]
type = s3
provider = minio
access_key_id = XXX
secret_access_key = XXX
endpoint = https://minio.meddbase.com:9000

A log from the command that you were trying to run with the -vv flag

More of a question than a log query

welcome to the forum,

you can limit bandwidth with --bwlimit

might also try --disable ListR to turn off fast list
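for example, something like this. untested sketch, "bucket" is a placeholder for your actual bucket name and the flag values are just starting points to experiment with:

```shell
# size the bucket without fast list, throttled to 5 API calls per second
rclone size minio:bucket --disable ListR --tpslimit 5 -vv
```

--disable ListR forces plain per-page listing instead of the recursive fast list, and --tpslimit caps how hard rclone hits the minio API.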


well, rclone did not kill minio. in fact, minio killed itself doing a simple list of files.
you should contact meddbase.com and complain to them.

And how do you think rclone can determine what to transfer? :slight_smile:

Only you can answer this question. It has nothing to do with rclone...

To throttle rclone you can limit bandwidth (as already mentioned) and/or limit the rate of transactions with --tpslimit.
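a rough sketch of a throttled copy, using the remote names from your config. the bucket and container names are placeholders, and the limits are illustrative, not recommendations:

```shell
# throttle both bandwidth and API transaction rate
rclone copy minio:bucket blobstorage:container \
  --bwlimit 10M \
  --tpslimit 10 --tpslimit-burst 0 \
  --transfers 4 --checkers 4 \
  --progress
```

--tpslimit-burst 0 makes the transaction limit strict rather than allowing short bursts, and lowering --transfers/--checkers further reduces concurrent load on minio.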

That's a fair point. With --tpslimit, is rclone "listing" the source folder counted as a transaction? Is each file classified as a transaction in that sense?

That may be the best way forward. Is there any way to tell from the logs what speed it was going at (before it crashed our instance)?

You would have to look at S3 protocol details. Every API call is a transaction.

If you do not know the maximum API call rate your minio can handle, then you will have to experiment a bit.

On the other hand, you should seriously consider looking into your minio configuration. If a simple listing can bring it down, then I would assume there is something fundamentally wrong with the way it is set up. Minio is a well-established and mature project used for deployments of pretty much any scale, but setting it up correctly requires some work.

It was set up by someone who has since left. One of the reasons for moving the data is to set up a new minio instance.

The data repository for minio is effectively one large 22 TB store holding a huge number of files. If it were a few large files, listing wouldn't be an issue; the problem is the large volume of small files.

Which is why I was asking if we can limit rclone's "listing" speed.

can you break the migration into parts based on directories? using filters, for example.
rclone size src: --include-from=include.lst
rclone copy src: dst: --include-from=include.lst

contents of include.lst

/dir01/**
/dir02/**

another option might be to get a list of files from minio and feed that to rclone.
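for example, a sketch using rclone itself to dump the list once and then feed it back in. untested, bucket and container names are placeholders:

```shell
# dump the full object list once (this is the expensive listing step), throttled
rclone lsf minio:bucket --files-only --tpslimit 5 > files.txt

# then copy using that list; --files-from means no further source listing is needed
rclone copy minio:bucket blobstorage:container --files-from files.txt --no-traverse
```

--no-traverse also stops rclone from listing the destination, so the only heavy listing is the one-off lsf.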

Unfortunately there are no subfolders. It's 22 TB in one large folder.

Does rclone need to list the whole directory before starting transfers? Or does it scan files one by one as it goes?

You can't scan one by one; you have to know what to scan. Rclone lists files on a per-folder basis, so in your case that means everything has to be read.

Alternatively you can try what @asdffdsa suggested: get a list of files using some other means and feed it to rclone. It is probably a good idea to do this in batches.
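a rough sketch of batching, assuming files.txt already holds the full object list (one name per line); remote, bucket and container names are placeholders:

```shell
# split the list into batches of 100k names: batch_aa, batch_ab, ...
split -l 100000 files.txt batch_

# copy one batch at a time, throttled
for f in batch_*; do
  rclone copy minio:bucket blobstorage:container \
    --files-from "$f" --no-traverse \
    --tpslimit 10 --bwlimit 10M
done
```

batching also gives you natural checkpoints: if a batch fails you can rerun just that batch.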
