One HEAD request is to see whether the file already exists (because of the --no-traverse) and one is to confirm it was uploaded properly.
First make sure you have v1.48 - the latest release.
Depending on exactly how your files are laid out, removing --no-traverse may reduce the number of queries. If all the files you are uploading are in one directory for instance then that will definitely be a win. You'll need to use this in conjunction with --size-only or --checksum for syncing. This will avoid rclone doing a HEAD on the file to read the metadata.
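For example, a daily run without --no-traverse might look something like this (just a sketch using the remote, paths and excludes from your monthly script; --checksum works in place of --size-only):
/usr/bin/rclone copy "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --max-age 24h --size-only --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"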
Why are you using --ignore-times? That is probably causing rclone to transfer stuff it doesn't need to.
Also --track-renames only works with rclone sync so I suggest you remove that.
When you do the full sync, if you have enough memory then --fast-list will make it run a lot quicker. You want to use --size-only, --checksum or --update --use-server-modtime with that.
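For example, the full monthly sync could use the server modtime comparison instead of --checksum, something like this (just a sketch based on your monthly script; keeping --checksum as you have it now works too):
/usr/bin/rclone sync "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --fast-list --update --use-server-modtime --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"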
One HEAD request is to see whether the file already exists (because of the --no-traverse)
Is it possible to avoid this request and just copy the file (for new and modified files) to the bucket without checking the existence?
The script that runs monthly is: /usr/bin/rclone sync "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --fast-list --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf" --checksum --track-renames
One folder I back up to an Amazon S3 bucket has 1224 files and 124 directories (388 MB).
The other folder has 238508 files and 52433 directories. (40 GB)
Every day only a few files (let us say 10 files) are changed or added.
Because Amazon charges for requests I would like to limit the requests as much as possible.
If I have for example 5 new files I would like to see only 5 PUT requests. Maybe also 5 HEAD requests to confirm that the files were properly uploaded. But in my case these 5 HEAD requests are not necessary because I would check the whole folder with the monthly script. So is it possible to avoid all the HEAD requests?
Rclone reads the metadata for the file to see if it needs to copy it. The file might be there already, in which case one HEAD request is much better than an unnecessary upload.
Depending on how often you run the script this might be a situation you are facing.
As I said above, removing --no-traverse and using --size-only or --checksum might use fewer requests, as rclone will list a small number of directories and do no HEAD requests.
I've tested it with removing --no-traverse and adding --checksum: /usr/bin/rclone copy "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --max-age 24h --checksum --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"
but this is definitely not giving the right results. I've added 1 file which needed to be copied to the S3 bucket for my smallest folder (1224 files and 124 directories, 388 MB), but now I have a total of 767 requests:
694 HeadRequests
72 ListRequests
1 PutRequests
Version of rclone:
rclone v1.47.0-098-gac4c8d8d-beta
1 file was less than 24h old (I've just added 1 new file).
But when reading the following link https://forum.rclone.org/t/no-traverse-for-dummies/2992/2 it appears I have to use the --no-traverse option, because without it rclone will load the definitions of all the remote files before discovering whether the newly added local file needs to be uploaded. With --no-traverse rclone will just check the newly added file on the remote.
I've been doing some more investigation into this. I've discovered the problem. It is that rclone is doing the age filtering on the source and the destination, so each file considered is using a HEAD request.
My workaround above with --files-from should work to fix the issue.
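Something along these lines (a rough sketch, not necessarily the exact workaround from earlier in the thread; it assumes your find supports -mtime and -path, and /tmp/changed.txt is just an illustrative temp file):
# build a list of files changed in the last day, with paths relative to the source root
cd /volume1/myfolder && find . -type f -mtime -1 ! -path './#recycle/*' ! -path './@eaDir/*' | sed 's|^\./||' > /tmp/changed.txt
# upload only those files; with --no-traverse rclone checks just those objects on the remote
/usr/bin/rclone copy "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --files-from /tmp/changed.txt --no-traverse -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"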
Another way around this would be to use --use-server-modtime
--use-server-modtime Use server modified time instead of object metadata
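For example, the daily command could become something like this (just a sketch; the idea is that the destination modtime then comes from the bucket listing rather than a HEAD per object):
/usr/bin/rclone copy "/volume1/myfolder" "AmazonS3DeepGlacier:mybucket/myfolder" --max-age 24h --update --use-server-modtime --exclude "#recycle/**" --exclude "@eaDir/**" -v --config="/var/services/homes/admin/.config/rclone/rclone.conf"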
Can you please make a new issue on github about this and put a link to this page in it. I think this needs fixing properly at some point - I don't think the destination file list should be being filtered by age at all, though there are rather a lot of things to consider there!
and now I have the minimum number of requests (only 1 HEAD request per PUT request, no longer 2 HEAD requests as before).
I had 9 new files added (in a folder of 124 subdirectories and 1249 files) and with CloudWatch on Amazon I can see I have 20 requests:
1 LIST request for the bucket.
1 HEAD request for the folder myfolder
18 = (9 PUT + 9 HEAD) requests for the 9 files I've added.
Are these 9 HEAD requests there to check that the files were uploaded OK, or are they made before the upload?
Is there a possibility to avoid also these 9 HEAD requests?
Because I'm using the --max-age option it is not necessary to check whether the file is already there remotely. The script should just upload it, in my opinion. This would save the extra HEAD request.
I can't think of any at the moment. To ignore the destination completely would mean that repeating the command would make the upload again which is not desirable in the general case.
but this generates even more HEAD requests. It is generating 2 HEAD requests per PUT request.
So if 1 file is changed I get in total 5 requests on Amazon:
1 PUT request + 2 HEAD requests for the file MyFile.xlsm
1 of these HEAD requests is looking for MyFile.xlsm at the root of the bucket; strange that it is looking there ...
1 HEAD request for the bucket (with --no-traverse it is a HEAD request for the bucket, in place of the LIST request you get without --no-traverse)
1 HEAD request for the folder
These are the requests at Amazon for 1 modified file:
To ignore the destination completely would mean that repeating the command would make the upload again which is not desirable in the general case.
In the case where the first command in the script generates a list of modified/new files, it is completely unnecessary to check whether a file exists on Amazon or whether it has been modified, because that first command already gives us only the new and modified files.