Rclone taking a long time to start the copy when there are billions of files in the source

What is the problem you are having with rclone?

I have lots of files in my S3 bucket (billions of them, about 50TB in total) and I am using the rclone copy command to copy them to a PVC. When I run the command it does not start transferring immediately but first scans all the files in the source, which takes a long time. I have given the option --no-traverse but the copy still did not start. Is there any option in rclone to start the copy immediately while the source listing is still going on?

Run the command 'rclone version' and share the full output of the command.

Version 1.69.1

Which cloud storage system are you using? (eg Google Drive)

AWS S3

Rclone command

rclone --config /tmp/rclone.conf copy -v --stats=1s --no-traverse --transfers $CONCURRENCY $DRY_RUN_FLAG "${SOURCE}" "${DESTINATION}"

Rclone config

[source]
type = s3
provider = AWS
env_auth = true
profile = source
region = us-east-2

A log from the command that you were trying to run with the -vv flag

0B / 0B, -, 0B/s, ETA -

The --no-traverse flag applies to the destination only.

Rclone syncs on a directory-by-directory basis, so if you have many directories with a small number of files each, the transfer will start much faster than if all the files are in one directory.
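If the bucket is already split into top-level prefixes, one possible workaround is to run a separate copy per prefix, so transfers in the first prefixes can begin while the rest are still being listed. A rough sketch (bucket and mount paths are placeholders, and in practice you would want to cap the parallelism):

# start one rclone copy per top-level prefix; early prefixes begin
# transferring while later ones are still being listed
for prefix in $(rclone lsf --dirs-only source:bucket); do
  rclone copy --no-traverse "source:bucket/${prefix}" "/mnt/pvc/${prefix}" &
done
wait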

In addition, you need enough RAM to hold the listing of all the files in a given directory - 100 million files require about 100GB of RAM. You can try the latest rclone beta (v1.70), which handles this much better (see more details here).

You can improve speed by using the --size-only or --checksum flags, and the sync should start much quicker. But it means that files' modtimes will be ignored.
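For example, adding --size-only to your command (remote name and destination path are placeholders):

rclone copy --size-only --transfers 32 source:bucket /mnt/pvc

On S3 this should also skip the per-object metadata reads that are otherwise needed to check modtimes, which speeds up the comparison considerably.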

I went through the rclone code and it is necessary to list all the files in order to copy the data, whether you use the --include-from flag, specify the file names explicitly, or do neither. Either way it takes a lot of time. Is there any way to stream the file list and start the copy immediately?
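For instance, I was imagining a two-step approach along these lines (an untested sketch; the bucket, paths, and chunk size are placeholders):

# step 1: stream the full source listing to a file
rclone lsf -R --files-only source:bucket > all-files.txt
# step 2: split the listing and copy each chunk separately; with
# --files-from-raw and --no-traverse rclone should not re-scan
# the source or the destination
split -l 1000000 all-files.txt chunk-
for f in chunk-*; do
  rclone copy --files-from-raw "$f" --no-traverse source:bucket /mnt/pvc
done

This still pays the full listing cost once in step 1, but the copies themselves start instantly, and the listing could be reused across retries.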

You might test --max-backlog.
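e.g. (the value is arbitrary; the remote and path are placeholders):

rclone copy --max-backlog 100000 source:bucket /mnt/pvc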

I tried setting different values from 20 to 20000000 for --max-backlog, but it still has the same behavior.

@ncw, do you think you could help me with this case?