[remote]
type = s3
provider = AWS
env_auth = true
region = <>
location_constraint = <same value as above>
acl = private
server_side_encryption = AES256
storage_class = STANDARD
Issue
We want to copy all data older than X months from a source path to a destination path, and then delete everything older than X months from the source path.
The 'rclone copy remote:src_path remote:dest_path --min-age 548d --transfers 1000 --checkers 2000 --max-backlog=10000000' command takes 7 minutes, BUT
the 'rclone delete remote:dest_path --min-age 548d --rmdirs --transfers 1000 --checkers 2000 --max-backlog=10000000 --progress' command takes 12 hours, which is far too long.
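The copy-then-delete workflow described above can be sketched as a small shell function, assuming the "remote" config shown at the top. The name archive_old is hypothetical (not part of rclone); the subcommands and flags are the ones used in this thread, and any extra flags are passed through to both commands. It follows the stated goal of deleting from the source path, and skips the delete entirely if the copy fails.

```shell
# Hypothetical helper (not part of rclone): copy objects older than AGE
# from SRC to DST, then delete objects older than AGE from SRC.
# Extra arguments (e.g. --checkers 16 --progress) are passed to both commands.
archive_old() {
  local src="$1" dst="$2" age="$3"
  shift 3
  # Copy first; return without deleting anything if the copy reports errors.
  rclone copy "$src" "$dst" --min-age "$age" "$@" || return 1
  # Delete the old objects from the source and remove emptied directories.
  rclone delete "$src" --min-age "$age" --rmdirs "$@"
}
```

Usage: archive_old remote:src_path remote:dest_path 548d --progress

Because each command evaluates --min-age on its own, an object that crosses the 548-day cutoff between the copy and the delete could be deleted without having been copied, so the two runs should be kept close together.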
The S3 src_path contains 34 TB of data, but only 6 TB of it satisfies the --min-age criterion.
If you have enough memory, you could try --fast-list, which might speed things up if you have a complicated directory structure.
I will try that. Apart from using more memory, is the result the same with and without --fast-list?
If so, does rclone delete with --fast-list still honour filters such as the --min-age we are using?
Are all the files in one directory?
No. The file structure is nested: the path contains N directories, each of those contains X directories, and each of those contains files. The depth cannot be predetermined.
In theory, --checkers 2000 should be deleting 2000 files at once.
It doesn't look like it. From the logs, it seems to be deleting files 100 at a time.
rclone size remote:dest_path
I tried it on a small bucket containing 95 GB and 50k objects. It took 16 minutes.
Hmm, I wonder if reading the modtime is what is causing the problem. On S3, reading the modtime takes an extra transaction, as rclone needs to HEAD the object to read its metadata.
Can you time
rclone size remote:dest_path
vs
rclone size remote:dest_path --min-age 548d
If that is the cause of the slowdown, you can use --use-server-modtime, which uses the time the object was uploaded to S3 rather than the modification time of the file. This does not require a HEAD request.
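To make the comparison above repeatable, a small timing wrapper can run the same rclone size with different flags. This is a sketch: timed_size is a hypothetical name, the timing is whole seconds from date +%s, and the third variant assumes you also want to measure --use-server-modtime.

```shell
# Hypothetical helper: run "rclone size" with the given arguments, then print
# the elapsed wall-clock time (whole seconds, from date +%s).
# Returns rclone's own exit status.
timed_size() {
  local start end rc
  start=$(date +%s)
  rclone size "$@"
  rc=$?
  end=$(date +%s)
  echo "took $((end - start))s: rclone size $*"
  return "$rc"
}
```

Usage:

timed_size remote:dest_path
timed_size remote:dest_path --min-age 548d
timed_size remote:dest_path --min-age 548d --use-server-modtime

If the second run is much slower than the first, and the third is close to the first, the per-object HEAD requests are the likely cause.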
So it's the HEAD requests to read the modification time that are the problem.
Note that rclone copy reads the source modification time; it doesn't have to read the destination modification time unless there is a matching file there, and I'm guessing that the destination doesn't already contain all of the source files? (If so, you might try --no-traverse to speed the copy up too.)
By default, rclone uses the file's modification time as recorded when it was uploaded. Reading this needs an extra HEAD request on each object.
If you use --use-server-modtime, then it will use the "time uploaded to S3", which is much faster, as rclone doesn't have to HEAD each object to read it.
The problem here is that rclone delete uses recursive listing (ListR) by default, and this essentially means that it does the HEAD requests one at a time, single-threaded. You can disable this with --disable ListR.
According to my experiments with a normal directory structure (generated with rclone test makefiles --files 10000 --max-file-size 10b 10000files), the speed of listing with an age filter seems to max out at about --checkers 16 when using --disable ListR.
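A similar experiment can be run against your own bucket with a loop that times an age-filtered listing at several --checkers values. This is a hedged sketch: bench_checkers is a hypothetical name, it uses rclone size (which only reads) as the age-filtered listing rather than an actual delete, and the timings are whole seconds.

```shell
# Hypothetical benchmark: time an age-filtered listing of TARGET with
# --disable ListR at each --checkers value given after the first two arguments.
# Uses "rclone size", which only reads, so nothing is deleted.
bench_checkers() {
  local target="$1" age="$2" n start end
  shift 2
  for n in "$@"; do
    start=$(date +%s)
    rclone size "$target" --min-age "$age" --disable ListR --checkers "$n" > /dev/null
    end=$(date +%s)
    echo "checkers=$n took $((end - start))s"
  done
}
```

Usage: bench_checkers remote:dest_path 548d 4 8 16 32 64

The point where adding checkers stops reducing the time is a reasonable starting value for the real delete on that bucket.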
What you are doing
$ time rclone delete --min-age 100d TestS3MinioManual:test --checkers 32
real 0m10.957s
user 0m4.254s
sys 0m0.760s
Now with --disable ListR
$ time rclone delete --min-age 100d TestS3MinioManual:test --checkers 32 --disable ListR
real 0m1.588s
user 0m5.489s
sys 0m0.732s
So I think you should be able to get a significant speedup by adding --disable ListR to your rclone delete.
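The suggested delete can be wrapped as follows. This is a sketch: delete_old is a hypothetical name, and the default of 16 checkers comes from the experiment described above rather than from tuning against your bucket.

```shell
# Hypothetical helper: delete objects older than AGE under TARGET using
# directory-by-directory listing (--disable ListR), so the HEAD requests
# needed to read modtimes can run in parallel across --checkers.
delete_old() {
  local target="$1" age="$2" checkers="${3:-16}"
  rclone delete "$target" --min-age "$age" --rmdirs \
    --disable ListR --checkers "$checkers" --progress
}
```

Usage: delete_old remote:dest_path 548d (or delete_old remote:dest_path 548d 2000 to match the --checkers value used in this thread). The best value depends on the bucket layout, so it is worth measuring rather than assuming.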
You can also try the beta with just plain rclone delete (no --disable ListR).
In the beta, this does the HEAD requests in parallel, so it is a lot faster. However, it still only fetches one page of listings from AWS at a time, so it will top out at 1024 HEADs in parallel.
rclone delete with --disable ListR is much faster with --transfers 1000 --checkers 2000.
It completed in 16 minutes, versus more than 17 hours for rclone delete without --disable ListR.
Thanks a lot @ncw
I could not use the beta version, as it's not allowed in a production setting, so I used 1.62.0 instead.
Please confirm that the end result is the same with or without --disable ListR, i.e. that exactly the same files get deleted either way.
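One way to check this empirically is to do a --dry-run of both variants and compare the lists of objects rclone says it would delete. This is a sketch under an assumption: it expects the dry-run log lines to look like "... NOTICE: <object>: Skipped delete as --dry-run is set ...", so check that against your rclone version's output before relying on it. The function names are hypothetical. The default (ListR) dry run still has to HEAD every object, so it is best tried first on a small bucket, such as the 50k-object one mentioned above.

```shell
# Hypothetical helper: print, sorted, the objects that "rclone delete" would
# remove. Assumes --dry-run log lines contain "NOTICE: <object>: Skipped delete";
# adjust the sed pattern if your rclone version logs differently.
dry_run_list() {
  rclone delete --dry-run "$@" 2>&1 |
    sed -n 's/.*NOTICE: \(.*\): Skipped delete.*/\1/p' |
    sort
}

# Hypothetical helper: compare the default listing with --disable ListR.
# Returns 0 if both dry runs select exactly the same objects, 1 if they differ.
compare_delete_sets() {
  local target="$1" age="$2" a b
  a=$(mktemp) && b=$(mktemp) || return 2
  dry_run_list "$target" --min-age "$age" > "$a"
  dry_run_list "$target" --min-age "$age" --disable ListR > "$b"
  if cmp -s "$a" "$b"; then
    echo "same: $(wc -l < "$a" | tr -d ' ') objects"
    rm -f "$a" "$b"
    return 0
  fi
  # Keep both lists so the differences can be inspected.
  echo "different: diff $a $b"
  return 1
}
```

Usage: compare_delete_sets remote:dest_path 548d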