Why so many requests?

With the following script:

/usr/bin/rclone lsf -R --files-only "/volume1/Data"  --max-age 24h --exclude "#recycle/**" --exclude "@eaDir/**" -v  > file-list
/usr/bin/rclone copy "/volume1/Data" "AmazonS3DeepGlacier:firstlookbackupsynology/Data" --files-from file-list -v --config="/var/services/homes/adminalef/.config/rclone/rclone.conf" --log-file="/volume1/homes/adminalef/rcloneLogs/`date +%Y%m%d_%H%M%S`BackupToAmazonS3DeepGlacier_Data.log"

I thought I would limit the number of requests as much as possible, because the first statement gathers only the files modified during the last 24 hours.
Even when there are no modified files, I can see many requests in Amazon CloudWatch:

AllRequests : 240
ListRequests : 239
HeadRequests : 1

On Amazon, the Data folder contains 238508 files.

If you use the option --fast-list with rclone sync, you get 1 request per 1000 files.
So that would mean 239 requests for the 238508 files. Is it a coincidence, or is rclone copy doing the same as rclone sync?
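That back-of-the-envelope figure can be checked directly (a sketch; it assumes S3 returns at most 1000 keys per list call, which is what the "1 request per 1000 files" rule of thumb reflects):

```shell
# Minimum number of list requests for a flat listing:
# ceil(files / keys_per_request), using shell integer arithmetic.
FILES=238508
PER_REQUEST=1000
REQUESTS=$(( (FILES + PER_REQUEST - 1) / PER_REQUEST ))
echo "$REQUESTS"   # 239 - matching the ListRequests count above
```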

When I have no modified files, I would expect my script to make 0 requests.

Firstly, listing in itself uses requests, so that's why it's not 0.

When you list from the root of a folder, rclone has to map everything below it. The filtering you apply happens locally in rclone, so it won't really help you save on requests - it only affects which actions are applied to the filtered files.

--fast-list (for backends that support it - not all do) packages many list operations into one request and can thus (potentially) save greatly on API calls. To the best of my understanding, without --fast-list you issue one list per folder. With --fast-list you package all (or most of) the list requests for the folders rclone currently knows about into one, so for 10 folders on the same level it would be 1 request instead of 10. However, rclone naturally still needs to get those first listings back before it can issue lists for any sub-folders they contain - otherwise it doesn't yet know to ask for them.

So for that many files, I wouldn't say that is necessarily an unnaturally high number of requests - especially if there are a lot of subfolders within subfolders within subfolders etc. (which you probably need to keep that many files organized).

EDIT: I don't see that the Amazon S3 backend has a --list-chunk option. My Google Drive does, but the Google API doesn't provide statistics specific enough to see the exact number of requests. If you know a way to count the requests in rclone, I would do a comparison test.

What I don't know (and would like to have clarified) is whether you can optimize this a little more by using a larger --list-chunk (for backends that provide the option). I don't think it can fix the issue of rclone needing separate requests for separate levels of the folder structure, but in theory I could see it saving some requests in very large folders that would otherwise get chunked.

Disclaimer: This info is only to the best of my understanding, so take it with a grain of salt. I'm not an expert on cloud storage APIs, and there may be mechanisms at work which I am not aware of.

rclone will be listing directories to find files and see if they need to be transferred.

If you add --no-traverse then rclone will do a HEAD request to see if the file is there, which may be more efficient.

How many files were in the list?

And how many directories?

I think you'll get 1 - a list of the root directory.

rclone already uses the maximum allowed in a listing, which is 1,000.

Thanks. I wasn't sure if setting it to 0 ("disable", according to the documentation) would actually remove the chunking altogether. After testing I'm still not really sure what it did - but it was significantly slower anyway.

When using the option --no-traverse, the number of requests drops to 1 HEAD request (when there is nothing to upload). Super. Thanks!
Just out of curiosity: why is there still 1 request, since if file-list is empty there is nothing to do?

/usr/bin/rclone lsf -R --files-only "/volume1/Data"  --max-age 24h --exclude "#recycle/**" --exclude "@eaDir/**" -v  > file-list
/usr/bin/rclone copy "/volume1/Data" "AmazonS3DeepGlacier:firstlookbackupsynology/Data" --files-from file-list --no-traverse -v --config="/var/services/homes/adminalef/.config/rclone/rclone.conf" --log-file="/volume1/homes/adminalef/rcloneLogs/`date +%Y%m%d_%H%M%S`BackupToAmazonS3DeepGlacier_Data.log"

So the number of subfolders doesn't really matter, as "thestigma" mentioned?

I have 40.74 GB of data.
238508 files.
52433 folders.
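For what it's worth, those numbers can be compared against the two listing strategies (a rough sketch; it assumes one list call per folder without --fast-list, and pages of up to 1000 keys with a flat recursive listing):

```shell
FILES=238508
FOLDERS=52433
PAGE=1000    # S3 returns at most 1000 keys per list call

# Per-folder listing: roughly one request per folder
echo "per-folder estimate: ~$FOLDERS requests"

# Flat recursive listing: pages of up to 1000 keys
FLAT=$(( (FILES + PAGE - 1) / PAGE ))
echo "flat-list estimate:  ~$FLAT requests"
```

The second estimate is the one that matches the 239 ListRequests I saw.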


It is just the way the filtering works... It could be optimized away, but I haven't felt the need!

So the listing lists the root directory unconditionally, then filters everything out of it, then decides it doesn't need to list any further.
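If that one request matters, a workaround (hypothetical - not something rclone does for you) is to skip the copy step entirely when the file list is empty:

```shell
# Skip rclone copy when file-list has no entries
# (-s tests that the file exists and is non-empty)
FILE_LIST="file-list"
if [ -s "$FILE_LIST" ]; then
  echo "file list has entries - running rclone copy"
  # /usr/bin/rclone copy "/volume1/Data" "AmazonS3DeepGlacier:firstlookbackupsynology/Data" \
  #   --files-from "$FILE_LIST" --no-traverse -v
else
  echo "file list is empty - nothing to do"
fi
```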

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.