I am seeing a long idle time between submitting the copy command and the start of the actual transfer. Source and target are two different S3-compatible cloud object storages. There are millions of tiny (few KB) keys under the specified source prefix. The destination bucket also contains millions of objects, but none share the prefix that I am trying to copy.
about the source, what is the name of the host provider?
the source provider might impose some sort of api throttling or other such limits...
as a test, might run this and see how long it takes and what is inside the rclone.log debug file:
rclone ls source: --fast-list --log-level=DEBUG --log-file=rclone.log
The entire log content with --log-level=DEBUG and --fast-list after 28 minutes:
2021/12/03 13:00:41 DEBUG : rclone: Version "v1.57.0-DEV" starting with parameters ["rclone" "copy" "--fast-list" "--s3-no-check-bucket" "--checksum" "--s3-no-head" "--s3-no-head-object" "--no-check-dest" "--no-traverse" "--transfers" "200" "--checkers" "200" "--retries" "1" "--progress" "source:/bucket/prefixC/" "destination:/bucket/prefixC/" "--low-level-retries" "3" "--timeout" "10s" "--contimeout" "10s" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/12/03 13:00:41 DEBUG : Creating backend with remote "source:/bucket/prefixC/"
2021/12/03 13:00:41 DEBUG : Using config file from "/usr/people/nkemnitz/.config/rclone/rclone.conf"
2021/12/03 13:00:41 DEBUG : source: detected overridden config - adding "{ECHgI}" suffix to name
2021/12/03 13:00:41 DEBUG : fs cache: renaming cache item "source:/bucket/prefixC/" to be canonical "source{ECHgI}:bucket/prefixC"
2021/12/03 13:00:41 DEBUG : Creating backend with remote "destination:/bucket/prefixC/"
2021/12/03 13:00:41 DEBUG : destination: detected overridden config - adding "{ECHgI}" suffix to name
2021/12/03 13:00:41 DEBUG : fs cache: renaming cache item "destination:/bucket/prefixC/" to be canonical "destination{ECHgI}:bucket/prefixC"
Source is Cloudian (on-premise). There are no artificially imposed limits on the source. Based on previous cosbench results, the source should be capable of ~10k GET op/s, the destination of ~4k PUT op/s using those tiny objects.
I can see that bandwidth increases by ~30 Mbit/s during this time. So something is happening... I don't see a difference with/without --fast-list, but just to be sure: --fast-list is attempting to fit all the 90 million object headers into memory before attempting to upload the first file, correct? And without it rclone should start the transfer after the first batch of (probably 1000) objects that the list operation returns?
ok, there are only so many reasons for the initial slowdown, need to eliminate the basic reasons first.
is the issue the source, the dest or both?
so i would run a simple list command on the source and dest, such as rclone ls with and without --fast-list with a debug log and see what kind of delay there is.
to get a deeper look, add --dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=rclone.log
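for example, something along these lines - using the remote names and prefix from your debug log, the log file names are just placeholders:
# plain listing of the source, with and without --fast-list
rclone ls source:/bucket/prefixC/ --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_list.log
rclone ls source:/bucket/prefixC/ --fast-list --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_list_fast.log
# same listing on the dest
rclone ls destination:/bucket/prefixC/ --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=dest_list.log
# deeper look at the actual api calls on the source
rclone ls source:/bucket/prefixC/ --dump=headers --retries=1 --low-level-retries=1 --log-level=DEBUG --log-file=source_headers.log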
not sure about your use case, do you plan to run the rclone copy on a schedule or is this a one-time copy?
sometimes, to reduce the api calls, can use --max-age
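for example, if this were a daily scheduled copy, something like this would only consider source objects modified in the last 24 hours - the 24h window is just an illustration, not sure it fits your case:
# only look at recently modified source objects; --no-traverse avoids listing the destination
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --max-age 24h --no-traverse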
you are using so many flags, not sure how they all interact.
i like to use the simplest command possible, using defaults and get a base line performance and add flags one at a time, test and repeat.
Ooh, thank you for the --dump=headers suggestion. I can see that the logs just contain the list requests / responses from the source: https://pastebin.com/pcxURhWh
So I guess I misunderstood rclone's default behavior and the purpose of --fast-list: I thought that was the flag that controls whether or not rclone first collects the complete list of all source objects before starting to transfer the first objects.
Is there maybe another flag I missed that does that? I.e. let rclone collect object names in the background (up to 1000 per list request) and transfer the first few objects from the queue to keep the destination busy, as well?
If not, I will split it into two steps (roughly as sketched below):
1. A list operation to collect all object names - for 90 million objects and ~11 list operations per second, that's a bit over 2 hours.
2. The same copy command that I started with, but with --files-from to avoid having rclone list the source directory again (and again in case of errors).
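Roughly like this (the list file name is a placeholder, flags copied from my original command):
# Step 1: collect all object names under the prefix (files only, no pseudo-directories)
rclone lsf --files-only source:/bucket/prefixC/ > object_list.txt
# Step 2: copy using the pre-built list instead of listing the source again
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --files-from object_list.txt --checksum --s3-no-check-bucket --s3-no-head --s3-no-head-object --no-check-dest --no-traverse --transfers 200 --checkers 200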
the most i have ever tested is rclone sync from the local file system to wasabi, an s3 clone known for hot storage, with 1,000,000 files.
if the source and dest match, then rclone sync /path/to/local/folder dest: --transfers=256 takes approx 36 seconds.
not sure the exact optimized command.
--- as i suggested, take a subdir with a smaller number of files, the simplest command possible, test, add flags and test until the optimized command is achieved.
--- tho for sure i would use --checksum
--- never used it but might try https://rclone.org/docs/#max-backlog-n
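for example, to experiment with the size of the in-memory queue (the default is 10000; the value here is arbitrary, just for testing):
rclone copy source:/bucket/prefixC/ destination:/bucket/prefixC/ --max-backlog 100000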
rclone normally lists a directory and processes it before using more directory lists to process any subdirectories. This can be parallelised and works very quickly using the least amount of memory.
This statement implicitly says that --fast-list will list multiple directories before processing, and that this listing will be performed by a single checker. It will therefore take longer before transfers start.
--fast-list will therefore be slower if transfers take more time than the checking. It may be faster if there is little to transfer and most of the time is spent on checking – and the increased speed from the reduced number of requests is enough to compensate for the decreased speed caused by the loss of parallelization.
Tough call, far outside my area of expertise. I would probably try increasing --s3-list-chunk?
But first I would select a smaller dataset (a subfolder or subprefix) and then perform a copy from the source to local and another from local to dest: - just to find out at which end (source or destination) I may have a performance/data issue.
I am seeing the same behavior for a prefix with 50,719 objects: a 7-second pause before the transfer starts
The pause happens because rclone collects all objects with the prefix - I see 51 list requests before the first object is downloaded
The destination does not play any role, I replaced it with the memory remote and see the same behavior
Removed all parameters that were meant to reduce number of requests/checks, so the entire call (without logging) now looks like rclone copy Cloudian:/bucket/prefixC/ Memory:/ --progress, but no change
Setting --max-backlog 1 has no effect, which is kind of surprising?
Setting --fast-list has no effect - it's a leaf directory, so nothing that can be easily parallelized by rclone
--s3-list-chunk by default already uses the maximum value (1000 objects per page). I also lowered it to 500, but that just causes rclone to send 102 list requests instead of 51
This makes good sense to me. rclone reads the entire directory content before processing the individual entries (comparing, moving into the backlog and transferring).
This is the best (and often the only) possibility in the typical rclone sync/copy situation. Your situation is special because you have a lot of small files in each folder and no need for checking/comparing to destination.
Still, you do get very good throughput; you are scanning the folders at approx. 10k files/s. This is an order of magnitude better than my OneDrive.
You may get better error resilience (smaller retries) with a series of smaller jobs at the expense of increased complexity. It is a balance as always.
You mentioned --files-from earlier. It didn’t scale well with many files in the scenarios I tried a while ago, so you may consider load testing it before moving in that direction.
So in summary: Rclone will first list the entire content of the source "leaf" directory before initiating the transfer. If this is the only directory that needs to be transferred and it contains several million objects, there will be a delay equal to the time it takes to list the entire source directory, which can add up to several hours. I created a feature request on GitHub.
For now, I ended up with:
1. rclone lsf --absolute Cloudian:/bucket/prefixC/ > list_of_object_names.txt - for 90M objects, the result is a 4 GiB text file
2. Split the file into 90 files with 1M lines/objects each
3. A for loop around rclone copy --checksum --s3-no-head --s3-no-head-object --no-traverse --no-check-dest --files-from-raw list_of_object_files_part00.txt for each of the 90 files (roughly as sketched below)
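In shell terms, roughly this - assuming GNU split for the numeric suffixes, and the remote names used earlier in the thread:
# 1. list all object names (leading / so the list works with --files-from-raw)
rclone lsf --absolute Cloudian:/bucket/prefixC/ > list_of_object_names.txt
# 2. split into 90 chunks of 1M names each: list_of_object_files_part00.txt ... part89.txt
split -l 1000000 -d --additional-suffix=.txt list_of_object_names.txt list_of_object_files_part
# 3. one copy job per chunk
for f in list_of_object_files_part*.txt; do
    rclone copy Cloudian:/bucket/prefixC/ destination:/bucket/prefixC/ --files-from-raw "$f" --checksum --s3-no-head --s3-no-head-object --no-traverse --no-check-dest
done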
Like @Ole noted, trying to do it in a single operation did not work well, either. I stopped it when rclone was at 24 GB memory consumption and no transfer had started.
This way, I still have to list the entire source directory, but at least if something unexpected happens during the transfer and rclone or the node crashes, I can skip the 3 hour long list operation step.
Edit: Thanks @asdffdsa for the --absolute parameter - I added it to the steps.
--absolute Put a leading / in front of path names
"Note that the --absolute parameter is useful for making lists of files to pass to an rclone copy with the --files-from-raw flag."
i guess that this is a one-time copy operation, so --max-age is not suitable here
Yes, it's hopefully a one-time transfer, so unfortunately --max-age won't work in this case. I added your note about --absolute - that already reduces the manual processing a bit. Thanks!