Rclone delete on S3 very slow

What is the problem you are having with rclone?

Rclone Delete is very slow

Run the command 'rclone version' and share the full output of the command.

1.62.2 on Linux AMD64

Which cloud storage system are you using? (eg Google Drive)

S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone delete remote:<path> --min-age 548d --rmdirs --transfers 1000 --checkers 2000 --max-backlog=10000000 --progress

The rclone config contents with secrets removed.

[remote]
type = s3
provider = AWS
env_auth = true
region = <>
location_constraint = <same value as above>
acl = private
server_side_encryption = AES256
storage_class = STANDARD

Issue

We want to copy data older than X months from a source path to a destination path, and then delete the data older than X months from the source path.

The 'rclone copy remote:src_path remote:dest_path --min-age 548d --transfers 1000 --checkers 2000 --max-backlog=10000000' command takes 7 minutes, BUT
the 'rclone delete remote:dest_path --min-age 548d --rmdirs --transfers 1000 --checkers 2000 --max-backlog=10000000 --progress' command takes 12 hours, which is very high.
The S3 src_path contains 34 TB of data, but only 6 TB of it satisfies the min-age criterion.

Please suggest how we can speed up the delete.

How many files are there in total and how many are being deleted (roughly)? It's the number of files which will be the limiting factor.

If you have enough memory you could try --fast-list which might speed things up if you have a complicated directory structure.
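For example, with --fast-list the delete would look something like this (a sketch reusing the flags from your original command):

rclone delete remote:<path> --min-age 548d --rmdirs --fast-list --transfers 1000 --checkers 2000 --progress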

Are all the files in one directory?

In theory --checkers 2000 should be deleting 2000 files at once.

It would be useful to see what the limiting factor is - is traversing the directory the slow part, or is it deleting the files?

How long does this take? (This is equivalent to the delete with --fast-list.)

rclone size remote:dest_path

And how long does this take? (This is equivalent to the delete without --fast-list.)

rclone size remote:dest_path --disable ListR

How many files are there in total and how many are being deleted (roughly)? It's the number of files which will be the limiting factor.

Total: 1.9 million objects. Getting deleted: 0.7 million objects.

If you have enough memory you could try --fast-list which might speed things up if you have a complicated directory structure.

Will try that. Other than using more memory, is the result the same with or without --fast-list?
If it is the same, does rclone delete with --fast-list consider filters such as the --min-age we are using?

Are all the files in one directory?

No. The file structure is nested: the path contains N directories, each of which has X directories, each of which in turn contains files. The depth cannot be predetermined.

In theory --checkers 2000 should be deleting 2000 files at once.

It does not look like it. From the logs, it's deleting 100 files at a time.

rclone size remote:dest_path

Tried it on a small bucket containing 95 GB and 50k objects. It took 16 minutes.

rclone size remote:dest_path --disable ListR

It took 2 minutes.

For your directory structure, it looks like --fast-list will be slower (as it can't fetch the directories in parallel).

It doesn't look like the directory traversal is the cause of the slow deletions, so it must just be the deletions themselves.

It might be that 2,000 checkers is triggering some kind of rate limit. Try a much smaller number, say 64, and double it until the delete performance drops off.
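For example (illustrative values only, keeping your other flags), start at 64 and double from there, timing each run:

rclone delete remote:<path> --min-age 548d --rmdirs --checkers 64 --transfers 64 --progress
rclone delete remote:<path> --min-age 548d --rmdirs --checkers 128 --transfers 128 --progress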

It might be that 2,000 checkers is triggering some kind of rate limit. Try a much smaller number, say 64, and double it until the delete performance drops off.

That's strange. Copy also scans all the files, yet it is fast while delete is slow.

The 'rclone copy remote:src_path remote:dest_path --min-age 548d --transfers 1000 --checkers 2000 --max-backlog=10000000' takes 7 minutes.
The 'rclone delete remote:src_path --min-age 548d --rmdirs --transfers 1000 --checkers 2000 --max-backlog=10000000 --progress' takes 12 hours.

Is the combination of transfers, checkers, and max-backlog configured correctly for delete?

Hmm, I wonder if it is reading the modtime that is causing the problem. On S3 reading the modtime takes an extra transaction as it needs to HEAD the object to read the metadata.

Can you time

rclone size remote:dest_path

vs

rclone size remote:dest_path --min-age 548d

If that is the cause of the slowdown you can use --use-server-modtime which will use the time uploaded to S3 rather than the modification time of the object. This does not require a HEAD request.

rclone size remote:dest_path --min-age 548d --use-server-modtime

rclone size with the min-age filter takes 6 minutes vs 4 seconds without it, for 5 TB of data.

That's the cause of the slowness. Again, it gets me thinking: rclone copy with the filter is fast. Shouldn't copy and delete have the same performance?

I would like to filter based on the last modification time of the object. Is rclone copy using the 'time uploaded to S3' or the 'modification time of the object'?

So it's the HEAD requests to read the modification time that are the problem.

Note that rclone copy reads the source modification time - it doesn't have to read the destination modification time unless there is a matching file, and I'm guessing the destination doesn't already have all the files from the source? (If so then you might try --no-traverse to speed this up too.)
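If that's the case, adding --no-traverse to the copy might look like this (a sketch based on your command):

rclone copy remote:src_path remote:dest_path --min-age 548d --no-traverse --transfers 1000 --checkers 2000 --max-backlog=10000000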

By default rclone will use the modification date of the file when it was uploaded. This needs an extra HEAD request on each object to read.

If you use --use-server-modtime then it will use the "time uploaded to S3", which will be much faster as it doesn't have to HEAD each object.
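If the upload time is acceptable for your use case, the delete could use it too, along these lines (a sketch based on your original command):

rclone delete remote:<path> --min-age 548d --rmdirs --use-server-modtime --transfers 1000 --checkers 2000 --progress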

If I want to proceed using the last modification time of the object, with the extra HEAD request, how can I speed up the process?

Setting transfers/checkers to 20K is not speeding things up either. CPU utilisation is hardly 1%.

The problem here is that rclone delete uses recursive listing by default, which essentially means it does each HEAD request single threaded. You can disable this with --disable ListR.

According to my experiments with a normal directory structure (generated with rclone test makefiles --files 10000 --max-file-size 10b 10000files) the speed of listing with an age filter seems to max out at about --checkers 16 with --disable ListR.

What you are doing

$ time rclone delete --min-age 100d TestS3MinioManual:test --checkers 32

real	0m10.957s
user	0m4.254s
sys	0m0.760s

Now with --disable ListR

$ time rclone delete --min-age 100d TestS3MinioManual:test --checkers 32 --disable ListR

real	0m1.588s
user	0m5.489s
sys	0m0.732s

So I think you should be able to get a significant speedup by adding --disable ListR to your rclone delete.
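Applied to your original command, that would be something like (a sketch):

rclone delete remote:<path> --min-age 548d --rmdirs --disable ListR --transfers 1000 --checkers 2000 --max-backlog=10000000 --progress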

You can also try this with just plain rclone delete (no --disable ListR)

v1.63.0-beta.6959.b87a9c835.fix-listr-performance on branch fix-listr-performance (uploaded in 15-30 mins)

This does the HEAD requests in parallel so it is a lot faster. However it still only fetches one page of objects from AWS at once, so it will top out at 1024 HEADs in parallel.

rclone delete with --disable ListR is much faster with transfers 1k and checkers 2k.
It completed in 16 minutes vs >17 hours for rclone delete without --disable ListR.
Thanks a lot @ncw

I could not use the beta version, as it's not allowed in a production setting, so I used 1.62.0 itself.

Please confirm that the end result is the same with or without --disable ListR, i.e. the same files get deleted in both cases.

Great

OK

Yes you will get the same result.
