Long delay when copying millions of tiny objects

nkemnitz · December 4, 2021, 4:20pm

So in summary: rclone first lists the entire content of the source "leaf" directory before starting the transfer. If that is the only directory to be transferred and it contains several million objects, no data moves until the whole listing has finished, which can add up to several hours for millions of objects. I created a feature request on GitHub.

For now, I ended up with:

1. rclone lsf --absolute Cloudian:/bucket/prefixC/ > list_of_object_names.txt (for 90M objects, the result is a 4 GiB text file)
2. Split the file into 90 files with 1M lines/objects each
3. Run a for loop around rclone copy --checksum --s3-no-head --s3-no-head-object --no-traverse --no-check-dest --files-from-raw list_of_object_files_part00.txt, once for each of the 90 files
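The three steps can be sketched as a small script. This is only a rough outline under assumptions, not the exact commands I ran: the destination remote dest:bucket/prefixC/, the part-file prefix list_part_, and the .done marker files are made up for illustration, and the list is passed with rclone's --files-from-raw flag. It uses POSIX split, so the parts are named list_part_aa, list_part_ab, and so on instead of part00.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Source is taken from the post; everything else below is a hypothetical default.
SRC="${SRC:-Cloudian:/bucket/prefixC/}"
DST="${DST:-dest:bucket/prefixC/}"          # assumed destination remote
RCLONE="${RCLONE:-rclone}"
LIST="${LIST:-list_of_object_names.txt}"
PREFIX="${PREFIX:-list_part_}"              # assumed prefix for the split files
LINES_PER_PART="${LINES_PER_PART:-1000000}"

# Step 1: list every object once (the slow part, roughly 3 hours for 90M objects).
list_objects() {
  "$RCLONE" lsf --absolute "$SRC" > "$LIST"
}

# Step 2: cut the list into chunks of LINES_PER_PART lines (suffixes aa, ab, ...).
split_list() {
  split -l "$LINES_PER_PART" -a 2 "$LIST" "$PREFIX"
}

# Step 3: copy one chunk at a time; a .done marker lets a rerun skip finished chunks.
copy_parts() {
  local part
  for part in "$PREFIX"??; do
    if [ ! -e "$part" ]; then continue; fi       # glob matched nothing
    if [ -e "$part.done" ]; then continue; fi    # chunk already copied
    "$RCLONE" copy --checksum --s3-no-head --s3-no-head-object \
      --no-traverse --no-check-dest \
      --files-from-raw "$part" "$SRC" "$DST"
    touch "$part.done"
  done
}

main() {
  # Reuse an existing list so a crash does not force a new listing.
  if [ ! -s "$LIST" ]; then list_objects; fi
  if ! ls "$PREFIX"?? >/dev/null 2>&1; then split_list; fi
  copy_parts
}

# Run all three steps only when invoked as: ./chunked_copy.sh run
if [ "${1:-}" = "run" ]; then main; fi
```

After a crash, running the script again with the run argument picks up at the first chunk without a .done marker and does not repeat the listing, as long as the list file and the part files are still in the working directory.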

As @Ole noted, trying to do it in a single operation did not work well either. I stopped it when rclone had reached 24 GB of memory consumption and no transfer had started.

This way, I still have to list the entire source directory once, but if something unexpected happens during the transfer and rclone or the node crashes, I can reuse the saved list and skip the 3-hour listing step.

Edit: Thanks @asdffdsa for the --absolute parameter - I added it to the steps.