One HEAD request is to see whether the file already exists (because of the --no-traverse) and one is to confirm it was uploaded properly.
First make sure you have v1.48 - the latest release.
Depending on exactly how your files are laid out, removing --no-traverse may reduce the number of queries. If all the files you are uploading are in one directory, for instance, that will definitely be a win. You'll need to use this in conjunction with --size-only or --checksum for syncing; that avoids rclone doing a HEAD on each file to read the metadata.
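For example (a sketch using the remote and paths from your script, so adjust to taste), a sync without --no-traverse but with --checksum would look like:

```shell
# One listing of the remote (cheap with --fast-list) replaces the
# per-file HEAD existence checks, and --checksum compares hashes taken
# from that listing instead of HEADing each object for its metadata.
rclone sync "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" \
    --fast-list --checksum \
    --exclude "#recycle/**" --exclude "@eaDir/**" -v
```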
Why are you using --ignore-times? That is probably causing rclone to transfer stuff it doesn't need to.
Also --track-renames only works with rclone sync so I suggest you remove that.
One HEAD request is to see whether the file already exists (because of the --no-traverse)
Is it possible to avoid this request and just copy the file (for new and modified files) to the bucket without checking the existence?
The script that runs monthly is: /usr/bin/rclone sync "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --fast-list --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf" --checksum --track-renames
One folder I back up to an Amazon S3 bucket has 1224 files and 124 directories (388 MB).
The other folder has 238508 files and 52433 directories. (40 GB)
Every day only a few files (let us say 10 files) are changed or added.
Because Amazon charges for requests I would like to limit the requests as much as possible.
If I have for example 5 new files I would like to see only 5 PUT requests. Maybe also 5 HEAD requests to confirm if the files are properly uploaded. But in my case these 5 HEAD requests are not necessary because I would check the whole folder with the monthly script. So is it possible to avoid all the HEAD requests?
I've tested removing --no-traverse and adding --checksum: /usr/bin/rclone copy "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --max-age 24h --checksum --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"
but this is definitely not giving the right results. I've added 1 file which needed to be copied to the S3 bucket for my smallest folder (1224 files, 124 directories, 388 MB), but now I have a total of 767 requests:
Version of rclone:
1 file was less than 24h old (I've just added 1 new file).
But reading the following link https://forum.rclone.org/t/no-traverse-for-dummies/2992/2, apparently I have to use the --no-traverse option: without it, rclone loads the definitions of all the remote files before discovering whether a newly added local file needs to be uploaded. With --no-traverse, rclone just checks the newly added file on the remote.
I've been doing some more investigation into this. I've discovered the problem. It is that rclone is doing the age filtering on the source and the destination, so each file considered is using a HEAD request.
My workaround above with --files-from should work to fix the issue.
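That workaround can be sketched as follows. The demo directory and file names here are placeholders created just so the snippet is self-contained; in the real script SRC would be /volume1/myfolder and the rclone line would be run rather than echoed:

```shell
#!/bin/sh
# Build the list of recently changed files locally, then hand only that
# list to rclone, so it neither lists the whole bucket nor filters the
# destination by age.
SRC="$(mktemp -d)"     # placeholder for /volume1/myfolder
LIST="$(mktemp)"

# demo content: one "new" file
touch "$SRC/new-file.txt"

# collect files modified in the last 24h, paths relative to SRC as
# rclone's --files-from expects
(cd "$SRC" && find . -type f -mtime -1 | sed 's|^\./||') > "$LIST"
cat "$LIST"

# the actual transfer (shown, not run here): --no-traverse stops rclone
# listing the remote, and the list already restricts what is considered
echo rclone copy "$SRC" "AmazonS3DeepGlacier:mybucket/myfolder" \
    --files-from "$LIST" --no-traverse
```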
Another way around this would be to use --use-server-modtime
--use-server-modtime Use server modified time instead of object metadata
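A sketch of that approach, per rclone's S3 docs, again using the remote and paths from the original script: --use-server-modtime makes rclone take the modification time from the S3 listing (i.e. the upload time) instead of issuing a HEAD per object to read the metadata, and --update then skips files the remote already has a newer copy of.

```shell
# No per-object HEAD requests: modtimes come from the bucket listing.
rclone sync "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" \
    --update --use-server-modtime --fast-list \
    --exclude "#recycle/**" --exclude "@eaDir/**" -v
```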
Can you please make a new issue on GitHub about this and put a link to this page in? I think this needs fixing properly at some point - I don't think the destination file list should be filtered by age at all, though there are rather a lot of things to consider there!
Because I'm using the --max-age option, it is not necessary to check whether the file is already there remotely. In my opinion the script should just upload it, which would save the extra HEAD request.
To ignore the destination completely would mean that repeating the command would make the upload again which is not desirable in the general case.
When you first generate a list of modified/new files with the first command in the script, it is completely unnecessary to check whether the file exists on Amazon or whether the file has been modified, because that first command already gives us only the files that are new or modified.