I am seeing a long idle time between submitting the copy command and the start of the actual transfer. Source and target are two different S3-compatible cloud object storages. There are millions of tiny (few KB) keys under the specified source prefix. The destination bucket also contains millions of objects, but none share the prefix that I am trying to copy.
about the source, what is the name of the host provider?
the source provider might impose some sort of api throttling or other such limits...
as a test, might run this and see how long it takes and what is inside the rclone.log debug file:
rclone ls source: --fast-list --log-level=DEBUG --log-file=rclone.log
The entire log content with --log-level=DEBUG and --fast-list after 28 minutes:
2021/12/03 13:00:41 DEBUG : rclone: Version "v1.57.0-DEV" starting with parameters ["rclone" "copy" "--fast-list" "--s3-no-check-bucket" "--checksum" "--s3-no-head" "--s3-no-head-object" "--no-check-dest" "--no-traverse" "--transfers" "200" "--checkers" "200" "--retries" "1" "--progress" "source:/bucket/prefixC/" "destination:/bucket/prefixC/" "--low-level-retries" "3" "--timeout" "10s" "--contimeout" "10s" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/12/03 13:00:41 DEBUG : Creating backend with remote "source:/bucket/prefixC/"
2021/12/03 13:00:41 DEBUG : Using config file from "/usr/people/nkemnitz/.config/rclone/rclone.conf"
2021/12/03 13:00:41 DEBUG : source: detected overridden config - adding "{ECHgI}" suffix to name
2021/12/03 13:00:41 DEBUG : fs cache: renaming cache item "source:/bucket/prefixC/" to be canonical "source{ECHgI}:bucket/prefixC"
2021/12/03 13:00:41 DEBUG : Creating backend with remote "destination:/bucket/prefixC/"
2021/12/03 13:00:41 DEBUG : destination: detected overridden config - adding "{ECHgI}" suffix to name
2021/12/03 13:00:41 DEBUG : fs cache: renaming cache item "destination:/bucket/prefixC/" to be canonical "destination{ECHgI}:bucket/prefixC"
Source is Cloudian (on-premise). There are no artificially imposed limits on the source. Based on previous cosbench results, the source should be capable of ~10k GET op/s, the destination of ~4k PUT op/s using those tiny objects.
I can see that bandwidth increases by ~30 Mbit/s during this time. So something is happening... I don't see a difference with/without --fast-list, but just to be sure: --fast-list is attempting to fit all the 90 million object headers into memory before attempting to upload the first file, correct? And without it rclone should start the transfer after the first batch of (probably 1000) objects that the list operation returns?
ok, there are only so many reasons for the initial slowdown, need to eliminate the basic reasons first.
is the issue the source, the dest or both?
so i would run a simple list command on the source and dest, such as rclone ls with and without --fast-list with a debug log and see what kind of delay there is.
to get a deeper look, add --dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log
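for example, something along these lines - using the remote names and prefix from your debug log, the log file names are just placeholders:
# plain listing of the source, with and without --fast-list
rclone ls source:/bucket/prefixC/ --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_list.log
rclone ls source:/bucket/prefixC/ --fast-list --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_list_fast.log
# same listing on the dest
rclone ls destination:/bucket/prefixC/ --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=dest_list.log
# deeper look at the actual api calls on the source
rclone ls source:/bucket/prefixC/ --dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_headers.log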
not sure about your use case, do you plan to run the rclone copy on a schedule or is this a one-time copy?
sometimes, to reduce the api calls, can use --max-age
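for example, if this were a daily scheduled copy, something like this would only consider source objects modified in the last 24 hours - the 24h window is just an illustration, not sure it fits your case:
# only look at recently modified source objects; --no-traverse avoids listing the destination
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --max-age 24h --no-traverse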
you are using so many flags, not sure how they all interact.
i like to use the simplest command possible, using defaults and get a base line performance and add flags one at a time, test and repeat.
Ooh, thank you for the --dump=headers suggestion. I can see that the logs just contain the list requests / responses from the source: https://pastebin.com/pcxURhWh
So I guess I misunderstood rclone's default behavior and the purpose of --fast-list: I thought that was the flag that controls whether or not rclone first collects the complete list of all source objects before starting to transfer the first objects.
Is there maybe another flag I missed that does that? I.e. let rclone collect object names in the background (up to 1000 per list request) and transfer the first few objects from the queue to keep the destination busy, as well?
If not, I will split it into two steps (roughly as sketched below):
1. A list operation to collect all object names - for 90 million objects and ~11 list operations per second, that's a bit over 2 hours.
2. The same copy command that I started with, but with --files-from to avoid having rclone list the source directory again (and again in case of errors).
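Roughly like this (the list file name is a placeholder, flags copied from my original command):
# Step 1: collect all object names under the prefix (files only, no pseudo-directories)
rclone lsf --files-only source:/bucket/prefixC/ > object_list.txt
# Step 2: copy using the pre-built list instead of listing the source again
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --files-from object_list.txt --checksum --s3-no-check-bucket --s3-no-head --s3-no-head-object --no-check-dest --no-traverse --transfers 200 --checkers 200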
the most i have ever tested is rclone sync from the local file system to wasabi, an s3 clone known for hot storage, with 1,000,000 files.
if the source and dest match, then rclone sync /path/to/local/folder dest: --transfers=256 takes approx 36 seconds.
not sure the exact optimized command.
--- as i suggested, take a subdir with a smaller number of files, the simplest command possible, test, add flags and test until the optimized command is achieved.
--- tho for sure i would use --checksum
--- never used it but might try https://rclone.org/docs/#max-backlog-n
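for example, to experiment with the size of the in-memory queue (the default is 10000; the value here is arbitrary, just for testing):
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --max-backlog 100000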
rclone normally lists a directory and processes it before using more directory lists to process any subdirectories. This can be parallelised and works very quickly using the least amount of memory.
This statement implicitly says that --fast-list will list multiple directories before processing, and that this listing will be performed by a single checker. It will therefore take longer before transfers start.
--fast-list will therefore be slower if transfers take more time than the checking. It may be faster if there is little to transfer and most of the time is spent on checking – and the increased speed from the reduced number of requests is enough to compensate for the decreased speed caused by the loss of parallelization.
Tough call, far outside my area of expertise. I would probably try increasing --s3-list-chunk?
But first I would select a smaller dataset (a subfolder or subprefix) and then perform a copy from the source to local and another from local to dest: - just to find out at which end (source or destination) I may have a performance/data issue.
I am seeing the same behavior for a prefix with 50,719 objects: a 7-second pause before the transfer starts
The pause happens because rclone collects all objects with the prefix - I see 51 list requests before the first object is downloaded
The destination does not play any role, I replaced it with the memory remote and see the same behavior
Removed all parameters that were meant to reduce number of requests/checks, so the entire call (without logging) now looks like rclone copy Cloudian:/bucket/prefixC/ Memory:/ --progress, but no change
Setting --max-backlog 1 has no effect, which is kind of surprising?
Setting --fast-list has no effect - it's a leaf directory, so nothing that can be easily parallelized by rclone
--s3-list-chunk by default already uses the maximum value (1000 objects per page). I also lowered it to 500, but that just causes rclone to send 102 list requests instead of 51
This makes good sense to me. rclone reads the entire directory content before processing the individual entries (comparing, moving into the backlog and transferring).
This is the best (and often the only) possibility in the typical rclone sync/copy situation. Your situation is special because you have a lot of small files in each folder and no need for checking/comparing to destination.
Still, you do get very good throughput; you are scanning the folders at approx. 10k files/s. This is an order of magnitude better than my OneDrive.
You may get better error resilience (smaller retries) with a series of smaller jobs at the expense of increased complexity. It is a balance as always.
You mentioned --files-from earlier. It didn’t scale well with many files in the scenarios I tried a while ago, so you may consider load testing it before moving in that direction.
So in summary: Rclone will first list the entire content of the source "leaf" directory before initiating the transfer. If this is the only directory that needs to be transferred and it contains several million objects, there will be a delay equal to the time it takes to list the entire source directory, which can add up to several hours. I created a feature request on GitHub.
For now, I ended up with:
1. rclone lsf --absolute Cloudian:/bucket/prefixC/ > list_of_object_names.txt - for 90M objects, the result is a 4 GiB text file
2. Split the file into 90 files with 1M lines/objects each
3. A for loop around rclone copy --checksum --s3-no-head --s3-no-head-object --no-traverse --no-check-dest --files-from-raw list_of_object_files_part00.txt for each of the 90 files (roughly as sketched below)
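In shell terms, roughly this - assuming GNU split for the numeric suffixes, and the remote names used earlier in the thread:
# 1. list all object names (leading / so the list works with --files-from-raw)
rclone lsf --absolute Cloudian:/bucket/prefixC/ > list_of_object_names.txt
# 2. split into 90 chunks of 1M names each: list_of_object_files_part00.txt ... part89.txt
split -l 1000000 -d --additional-suffix=.txt list_of_object_names.txt list_of_object_files_part
# 3. one copy job per chunk
for f in list_of_object_files_part*.txt; do
    rclone copy Cloudian:/bucket/prefixC/ destination:/bucket/prefixC/ --files-from-raw "$f" --checksum --s3-no-head --s3-no-head-object --no-traverse --no-check-dest
done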
Like @Ole noted, trying to do it in a single operation did not work well, either. I stopped it when rclone was at 24 GB memory consumption and no transfer had started.
This way, I still have to list the entire source directory, but at least if something unexpected happens during the transfer and rclone or the node crashes, I can skip the 3 hour long list operation step.
Edit: Thanks @asdffdsa for the --absolute parameter - I added it to the steps.
--absolute Put a leading / in front of path names
"Note that the --absolute parameter is useful for making lists of files to pass to an rclone copy with the --files-from-raw flag."
i guess that this is a one-time copy operation, so --max-age is not suitable here
Yes, it's hopefully a one-time transfer, so unfortunately --max-age won't work in this case. I added your note about --absolute - that already reduces the manual processing a bit. Thanks!