Folder with millions of files

ncw · November 9, 2018, 11:09am

6 million files in one directory? That is a lot! However, it should work provided your computer has enough memory.

rclone fetches directory listings in chunks of 1,000 files (an API limitation), and it has to fetch the entire directory before it starts the sync, so it will take ~6,000 HTTP transactions before anything in that directory starts transferring. rclone doesn't list a directory in parallel (S3 doesn't support that, alas), so you need to wait for those 6,000 HTTP transactions to happen one after another.
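To get a feel for how long the listing phase takes, here is a back-of-the-envelope calculation. The 200 ms per list request is purely an assumed figure for illustration; measure your own round-trip time and substitute it.

```shell
# Rough estimate only: latency_ms is an ASSUMED value, not a measurement.
files=6000000        # objects in the directory
page_size=1000       # objects returned per list request
latency_ms=200       # assumed round-trip time of one list request

# Ceiling division: number of list requests needed.
requests=$(( (files + page_size - 1) / page_size ))

# The requests run one after another, so their times simply add up.
total_s=$(( requests * latency_ms / 1000 ))

echo "list requests: $requests"
echo "estimated listing time: ${total_s}s"
```

With these assumed numbers that is 6000 requests and 1200 seconds, i.e. about 20 minutes before the first file transfers. Halving the latency halves the wait.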

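You want to use --checksum or --size-only to avoid metadata reads on the individual objects. Without one of them rclone compares modification times, and on S3 the modification time is stored in object metadata that is not returned by the listing, so it costs an extra request per object.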

I'm not sure --fast-list will really help here.

Setting --transfers higher will help once files start transferring; try --transfers 64.

I think doing this but waiting longer is probably your best bet.
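Putting the flags above together, the straightforward approach might look something like this sketch. The helper name sync_bucket is hypothetical, and the remote names s3:bucket and wasabi:bucket are just the ones used elsewhere in this thread; adjust them to your setup.

```shell
# Hypothetical wrapper around a plain rclone sync with the flags suggested above.
# Any extra arguments (for example --dry-run) are passed through to rclone.
sync_bucket() {
    rclone sync --checksum --transfers 64 "$@" s3:bucket wasabi:bucket
}

# Preview what would be transferred, then run it for real:
#   sync_bucket --dry-run
#   sync_bucket
```

Expect no visible progress until the whole directory has been listed.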

Where are you doing this transfer from? You want to get the time of each HTTP list transaction as low as possible. Maybe try running it from EC2?

One thing you could try is to list the bucket to a file. This will likely take a very long time, but you will see output immediately.

rclone lsf --files-only s3:bucket > file-list

Then use this file as the input to a copy:

rclone copy --checksum --files-from file-list s3:bucket wasabi:bucket

You will need to use the latest beta for this. With it, rclone will not list the directory and will transfer only the files listed. It will still do a metadata read for each file, but that happens as part of the transfer.

That will take the same time or longer than the first approach, but does have the advantage that it is easy to restart.
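One way to make restarts cheaper still is to work through the file list in pieces and remember which pieces have finished. The following is only a sketch: the helper name copy_in_chunks, the chunk size and the ".done" marker files are my own illustration, not an rclone feature.

```shell
# Hypothetical helper: copy the names in a file list chunk by chunk,
# leaving a "<chunk>.done" marker so a restart skips finished chunks.
# Usage: copy_in_chunks file-list 100000
copy_in_chunks() {
    cic_list=$1
    cic_lines=$2

    # Produces file-list.part-aa, file-list.part-ab, ...
    # Re-running split on the same list recreates identical chunks.
    split -l "$cic_lines" "$cic_list" "$cic_list.part-" || return 1

    for cic_part in "$cic_list".part-*; do
        # Ignore the marker files themselves and an unmatched glob.
        case $cic_part in *.done) continue ;; esac
        [ -e "$cic_part" ] || continue
        # Skip chunks that completed on an earlier run.
        [ -e "$cic_part.done" ] && continue

        # Stop at the first failure so the chunk is retried next time.
        rclone copy --checksum --files-from "$cic_part" s3:bucket wasabi:bucket || return 1
        touch "$cic_part.done"
    done
}
```

If the transfer is interrupted, running copy_in_chunks again with the same arguments picks up at the first chunk without a marker.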

Are you planning on doing this sync regularly, or is it a one-off?