Rclone S3 API calls

makksi · February 28, 2021, 3:57pm

Amazon S3 pricing is a bit complicated: it depends not only on the amount of storage but also on the number of API calls. Is it possible, during a sync or copy operation, to list the API calls made (just for testing) so that we can count them and:

work out whether different flags would reduce the API calls for our transfer
estimate the cost of each sync/copy operation

Thanks a lot

asdffdsa · February 28, 2021, 4:04pm

this can reduce the amount of api calls.
https://rclone.org/docs/#fast-list

makksi · February 28, 2021, 4:06pm

Yes, but my question was more general: I'm interested in a full report of the API calls.
Moreover, --fast-list cannot be used on low-memory machines for large syncs.

Thanks in any case

asdffdsa · February 28, 2021, 4:39pm

aws s3 website has many options to monitor and log api calls.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/logging-with-S3.html

so for testing:

create a bucket
enable logging
create a rclone remote for that bucket
run rclone
read/parse the logs.
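
For the last step, a minimal Python sketch of how the parsing might look. It assumes the usual S3 server access log layout, in which the bracketed timestamp is followed by the remote IP, the requester, the request ID and then the operation name (for example REST.GET.BUCKET). The function name and the sample records are made up for illustration; check a real log line against the AWS documentation before relying on the regular expression.

```python
import re
from collections import Counter

# Assumed layout of an S3 server access log record: after the bracketed
# timestamp come remote IP, requester, request ID, then the operation.
_OPERATION = re.compile(r"\]\s+\S+\s+\S+\s+\S+\s+(\S+)")


def count_operations(lines):
    """Count access-log records per operation name; skip lines that don't match."""
    counts = Counter()
    for line in lines:
        match = _OPERATION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        'owner mybucket [28/Feb/2021:21:55:24 +0000] 192.0.2.3 requester REQ1 '
        'REST.GET.BUCKET - "GET /?delimiter=%2F&max-keys=1000 HTTP/1.1" 200 -',
        'owner mybucket [28/Feb/2021:21:55:25 +0000] 192.0.2.3 requester REQ2 '
        'REST.HEAD.OBJECT a.jpg "HEAD /a.jpg HTTP/1.1" 200 -',
    ]
    for operation, n in count_operations(sample).most_common():
        print(operation, n)
```

Feeding it the lines of the downloaded log files gives a per-operation total that can be compared between runs with different rclone flags.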

makksi · February 28, 2021, 4:59pm

Thanks,
I was thinking of a flag in rclone to enable logging of API calls. That could be very helpful.
What about the pricing of the monitoring service in AWS? I saw that the logs are stored in S3, so storing them has a cost, and the logging itself probably also has a cost per entry. Have you had the chance to try this service?

asdffdsa · February 28, 2021, 5:28pm

it has not been a concern of mine.

the most recent backup files, such as veeam backup files and recent backups with lots of small files, go to wasabi, an s3 clone known for hot storage, which does not charge for api calls.

for long term storage, backups go to aws s3 deep glacier using rclone copy --immutable.
if there is a folder with lots of small files, they are zipped, and then that single zip is copied.

about a flag, not sure the value of it, as aws offers detailed logging.
if that is something you want, start a new topic using the feature category, make the request and see how other rcloners respond.

ncw · February 28, 2021, 6:27pm

You can use -vv with --dump headers

This will show all the http transactions.

It is too verbose for normal use but you can investigate with it.

makksi · February 28, 2021, 10:12pm

Thanks a lot Nick. Using that command during a sync with no objects to update (everything was already synced), I just see these headers once for every 1000 files listed:

2021/02/28 21:55:24 DEBUG : HTTP REQUEST (req 0x24e0700)
2021/02/28 21:55:24 DEBUG : GET /?delimiter=%2F&encoding-type=url&max-keys=1000&prefix=Album_2011%2FBeachVolley%2F HTTP/1.1
Host: myhost.s3.us-east-2.amazonaws.com
User-Agent: rclone/v1.53.3-DEV
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4f8996fb92asdfae41e4649b934ca49599437852b855
X-Amz-Date: 20210228T215524Z
Accept-Encoding: gzip

Yes, this is really useful, but I can't find a reference to the API call.
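
To turn a dump like this into numbers, the request lines can be counted with a short script. The following Python sketch assumes the log was written with -vv --dump headers and that request lines look like the one quoted above. The labels it produces are this sketch's own naming, and treating a GET on "/" as a listing assumes virtual-hosted-style requests (bucket name in the Host header), as in the example.

```python
import re
from collections import Counter

# Matches request lines such as
# "2021/02/28 21:55:24 DEBUG : GET /?delimiter=%2F&max-keys=1000 HTTP/1.1"
_REQUEST = re.compile(r"DEBUG : (GET|HEAD|PUT|POST|DELETE) (\S+) HTTP/1\.[01]")


def classify(method, target):
    """Rough label for a request; the labels are hypothetical, not S3 terms."""
    path = target.split("?", 1)[0]
    if method == "GET" and path == "/":
        return "list (GET on bucket)"
    return f"{method} object"


def count_requests(lines):
    """Count HTTP requests in an rclone header dump, grouped by rough label."""
    counts = Counter()
    for line in lines:
        match = _REQUEST.search(line)
        if match:
            counts[classify(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_requests.py rclone.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for label, n in count_requests(handle).most_common():
            print(label, n)
```

Response lines and header lines do not match the pattern, so only outgoing requests are counted.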

ncw · February 28, 2021, 10:37pm

That is a GET on the bucket, which is the ListObjects call.

makksi · February 28, 2021, 10:50pm

Yes Nick, you are right!

Sorry for my poor attention to the S3 documentation. Yes, this is that API call, and the HTTP request is exactly what is described in the documentation. So the --dump headers option is perfect for this debugging purpose! Thanks a lot again.
I was worried that a sync made an API call for every object in the bucket. Now I know there is just one API call per 1000 objects, so I can work out how many sync operations I can run in a month, considering that Amazon charges 5 cents per 1000 ListObjects API calls.
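
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this post: one listing call per 1000 objects, no other requests counted, and the price passed in as a parameter (the 0.05 used below is the figure quoted above, not a verified AWS price). The function names and the example bucket size are made up.

```python
from decimal import Decimal


def list_calls(num_objects, page_size=1000):
    """Listing calls needed if each call returns up to page_size objects."""
    return max(1, -(-num_objects // page_size))  # integer ceiling, at least 1


def cost_per_sync(num_objects, price_per_1000_calls):
    """Listing cost of one no-change sync; price is per 1000 calls."""
    return list_calls(num_objects) * price_per_1000_calls / Decimal(1000)


def syncs_within_budget(num_objects, price_per_1000_calls, monthly_budget):
    """How many such syncs fit into a monthly budget (whole syncs only)."""
    return int(monthly_budget // cost_per_sync(num_objects, price_per_1000_calls))


if __name__ == "__main__":
    price = Decimal("0.05")  # assumption: the per-1000-calls figure from the post
    objects = 250_000        # hypothetical bucket size
    print(list_calls(objects), "listing calls per sync")
    print(cost_per_sync(objects, price), "per sync")
    print(syncs_within_budget(objects, price, Decimal("1.00")), "syncs per 1.00 budget")
```

Decimal is used instead of float so that small per-call prices do not pick up rounding noise.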

Thanks a lot again

asdffdsa · February 28, 2021, 10:57pm

this is interesting as i use deep glacier, which has higher cost per api call.

Amazon S3
"The modification time is used by default for all operations that require checking the time a file was last updated. It allows rclone to treat the remote more like a true filesystem, but it is inefficient on S3 because it requires an extra API call to retrieve the metadata."

so for a rclone cryptcheck of 1000 files, there would be 1000 additional api calls.

@ncw will know for sure.

ncw · March 1, 2021, 9:54am

Using --fast-list with --size-only or --checksum should only do one API call per 1000 objects.

If you aren't using --size-only or --checksum then you will get one HEAD request per object.

However rclone check and rclone cryptcheck don't read modification times and effectively work like a sync with --checksum enabled.

makksi · March 1, 2021, 1:17pm

Nick,
I'm just using --size-only without --fast-list, and from my understanding of the HTTP request I posted before, it should be only one API call per 1000 objects even without --fast-list. Is that true, or am I missing something?

This is my rclone script:
/opt/bin/rclone sync -v /mnt/NAS/$1 s3:$2 \
  --log-file /mnt/NAS/linux/rclone_s3_$1.log \
  --delete-excluded \
  --size-only \
  --s3-no-check-bucket \
  --s3-storage-class DEEP_ARCHIVE \
  --transfers 1 \
  --s3-chunk-size 32M \
  --s3-upload-concurrency 1 \
  --checkers 1 \
  --use-mmap \
  --filter-from /jffs/myscripts/rclone_s3_filter.txt

ncw · March 1, 2021, 2:10pm

It is one API call per directory + 1 per additional 1000 entries in the directory if not using --fast-list.

If you have directories with 1000s of objects it won't make much difference, but for lots of directories with a small number of things --fast-list will be much better.
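
A rough way to see the difference is to compute both counts for a given directory layout. The Python sketch below follows the rule described above: without --fast-list, one call per directory plus one per additional 1000 entries; with --fast-list, one call per 1000 objects overall. The function names and the example layouts are hypothetical, and the model is simplified: it only counts listing calls and treats each directory's entry count as given.

```python
def pages(entries, page_size=1000):
    """Listing calls for one listing: at least 1, plus 1 per extra page."""
    return max(1, -(-entries // page_size))  # integer ceiling, at least 1


def calls_without_fast_list(entries_per_dir):
    """Each directory is listed separately, page by page."""
    return sum(pages(n) for n in entries_per_dir.values())


def calls_with_fast_list(entries_per_dir):
    """The whole tree is listed in one paged pass."""
    return pages(sum(entries_per_dir.values()))


if __name__ == "__main__":
    # Hypothetical layouts: many small directories vs. a few big ones.
    many_small = {f"dir{i}": 10 for i in range(500)}
    few_big = {"a": 40_000, "b": 25_000}
    for name, layout in (("many small", many_small), ("few big", few_big)):
        print(name, calls_without_fast_list(layout), calls_with_fast_list(layout))
```

In this simplified model the second layout gives the same count either way, while the first shows a large gap, which matches the description above.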

makksi · March 1, 2021, 3:10pm

Perfect Nick, I see it.

Looking at the log, I noticed that in most cases there was one API call per 1000 entries, but in a few cases there were extra API calls (few in my case, because most of my directories contain far more than 1000 entries). Now, reading your comment, I see that this happens on directory changes!

Now everything is clear and I understand the advantage of using the --fast-list flag.

Thanks a lot again
