Reduce API Requests for Sync

I’m using rclone to back up a folder with lots of small files to S3 every 30 minutes. I’m paying ~100x as much for requests (primarily ListBucket and HeadObject) as I am for the actual storage. What’s the best way to reduce those requests?

One thought I had was to do a selective copy of files modified in the last X minutes, then follow up with a full sync daily to catch any files that were deleted. Are there better options?
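To make the "modified in the last X minutes" idea concrete, here is a minimal Python sketch of the selection step only. The function name recently_modified and its parameters are hypothetical and not part of rclone; rclone's own filtering options include a --max-age flag, which may well cover this without any custom code.

```python
import time
from pathlib import Path


def recently_modified(root, max_age_minutes, now=None):
    """Return sorted relative paths of files under root whose local
    modification time falls within the last max_age_minutes."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    recent = []
    for path in Path(root).rglob("*"):
        # Only regular files count; directories are skipped.
        if path.is_file() and path.stat().st_mtime >= cutoff:
            recent.append(path.relative_to(root).as_posix())
    return sorted(recent)
```

A selection like this never notices deletions, which is why the daily full sync would still be needed.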

--fast-list will reduce the number of ListBucket requests enormously, if you have the memory to store the listing.

--checksum or --size-only will stop rclone doing the HeadObject to read the modified time.

I should probably write this on the S3 docs page…

Thanks for the response @ncw! Is it possible to use the same technique the awscli does for syncing?

A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.

The ListBucket API provides all this info without using extra HeadObject.
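As a rough illustration of the rule quoted above, the decision could be sketched in Python like this. Entry and needs_upload are hypothetical names, not part of rclone or the awscli, and the sketch assumes the remote size and last-modified time were already read from a bucket listing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Entry:
    size: int            # size in bytes
    modified: datetime   # timezone-aware last modified time


def needs_upload(local: Entry, remote: Optional[Entry]) -> bool:
    """Apply the three conditions from the quoted awscli rule."""
    if remote is None:
        # The local file does not exist under the bucket and prefix.
        return True
    if local.size != remote.size:
        # Sizes differ.
        return True
    # The local file is newer than the S3 object.
    return local.modified > remote.modified
```

Note that the remote time here is the time of upload, so a file restored locally with an older modification time would not be re-uploaded unless its size changed.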

Checksums and size-only are suboptimal for my use case (too slow and possibly inaccurate).

Interesting idea… You can’t set the modification time of an object once it has been uploaded, which is why rclone uses metadata to store its own last modified time. This is so the time can be preserved when you restore the file, and so you can sync from multiple places.

However, if all you are interested in is preserving the data, the awscli method would work quite well.

Rclone has part of this already:

-u, --update                              Skip files that are newer on the destination.

But that will read the modification time from the metadata.

If rclone had a flag, say --s3-use-server-modtime or something like that, then you could use that with -u to get the effect you wanted. The flag would then use LastModified from the ListBucket API.

What do you think, would that be useful? If you think so, then please make a new issue on GitHub. Some ideas for what the flag should be called would be useful as well!

It could possibly be a global flag, say --use-server-modtime, as this could apply to several backends; s3 and swift are two examples which come to mind immediately.