Rclone copy dry-run of big S3 bucket is taking super long time to start

https://beta.rclone.org/v1.64.0-beta.7128.22a14a8c9/

1 Like

After update at least initially it's doing about 15 requests per 10 sec...
But should it be faster with or without --fast-list, if you have the memory available?

with --fast-list + fix applies at the later stage when indexing listed data - so it should not make any difference until listing is finished.

Now every request I think contains like 1000 items so it should finish listing in few hours - then I guess ncw wants to see the last 10 headers

You should run:

rclone copy hcp:bucket1/RES/tblIMAGE/ scality:spring-bucket/RES/tblIMAGE/ -vv --checksum --s3-versions --progress --create-empty-src-dirs --log-file=tblIMAGE_migration_2023.06.15.13.20.txt --dry-run --fast-list --dump headers

1 Like

okay, running it pretty vanilla now, without --fast-list and --files-only, in a screen.
Seems to be getting about 1 request per second (with a 1000 items per request).
So I'll see if/when it finishes and I'll post the output.

Thanks for all help!

Great.

It sounds like the speed is limited by the source to 3-5 requests per second. Certainly rclone can go much faster.

So this should take between 2.4 and 4.1 hours if it maintains the same speed for 44.1 million objects.

If rclone gets stuck in a loop what you will see is rclone fetching the same GET request over and over again. Here is an example from your log above. Note how the URLs are different - the marker parameter is different - so rclone is not stuck here.

Ah, thanks for the clarification.
With the command from above from kapitainsky, it seems to be doing way better!

Counted it out now to about 109 Requests per 10 sec. So that's a maaajor improvement.
Funny how lsf and actual copy differs that much, but maybe that's me not understanding the underlying mechanics.

1 Like

Let us know how it goes!

when it is running now

BTW how much memory this VM has? 45m objects will require almost 5GB (1KB per object). If it starts swapping things will slow down enormously - so you should also watch RAM utilisation.

Kept getting OutOfMemory issues in the beginning, mso I upped it quite a lot to 4 cpu's and 16gb ram.
So in my mind that should be sufficient.

I see now that performance has dropped to that 1 request/sec again...
I'll have to investigate to see if I can find any resource bottlenecks

maybe problem is not VM and rclone but S3 source? and it can not sustain long lasting listing?

Yeah, starting to agree with you that the problem is external to both VM and rclone...
Now I'm down to a request per 3 seconds...

If I start another copy it goes in the same speed as initially, so something degrades over time.
The process on the machine isn't heavy, so it's not that...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
98305 martins 20 0 6409592 5.2g 17268 S 1.3 33.9 11:57.23 rclone

e.g.

2023/07/05 09:50:24 DEBUG : HTTP REQUEST (req 0xc0008f7400)
...
...
2023/07/05 09:50:25 DEBUG : HTTP RESPONSE (req 0xc0008f7400)

if time between HTTP REQUEST and HTTP RESPONSE is 3s it is clearly S3 server. rclone sends request and waits for response...

if there is time gap between HTTP RESPONSE and subsequent HTTP REQUEST than it would be rclone problem

Looks like it to me.... Maybe I need to contact that support and get that checked out!

2023/07/05 13:31:33 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2023/07/05 13:31:33 DEBUG : HTTP REQUEST (req 0xc060fded00)
2023/07/05 13:31:33 DEBUG : GET /bucket1?delimiter=&key-marker=RES%2FtblIMAGE%2F15975853&max-keys=1000&prefix=RES%2FtblIMAGE%2F&version-id-marker=101269065008193&versions= HTTP/1.1
Host: my.s3.endpoint.com
User-Agent: rclone/v1.64.0-beta.7128.22a14a8c9
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20230705T113133Z
Accept-Encoding: gzip

2023/07/05 13:31:33 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2023/07/05 13:31:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2023/07/05 13:31:36 DEBUG : HTTP RESPONSE (req 0xc060fded00)
2023/07/05 13:31:36 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Cache-Control: no-cache,no-store,must-revalidate
Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline'; connect-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline'; object-src 'self'; frame-ancestors 'self';
Content-Type: application/xml;charset=utf-8
Date: Wed, 05 Jul 2023 11:31:33 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Vary: Accept-Encoding, User-Agent
X-Content-Type-Options: nosniff
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block

2023/07/05 13:31:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

yeah...

2023/07/05 13:31:33 DEBUG : HTTP REQUEST (req 0xc060fded00)
2023/07/05 13:31:36 DEBUG : HTTP RESPONSE (req 0xc060fded00)

rclone has to wait 3s for server response

Thanks for debugging help!

and next HTTP REQUEST is immediate? or also delayed?

Looks pretty immediately after

2023/07/05 13:58:18 DEBUG : HTTP REQUEST (req 0xc0c0564800)
2023/07/05 13:58:23 DEBUG : HTTP RESPONSE (req 0xc0c0564800)

2023/07/05 13:58:23 DEBUG : HTTP REQUEST (req 0xc0c0565100)
2023/07/05 13:58:27 DEBUG : HTTP RESPONSE (req 0xc0c0565100)

So looks mostly like the response

IMO this is very clear evidence that some "throttling" happens on S3 server.