Avoiding AWS API costs by using both `--size-only` and `--update --use-server-modtime`?

234u5h3l4kj5h4k5b · June 19, 2023, 7:43pm

So API calls are quite expensive for AWS, and these costs ramp up significantly when using rclone. This issue has been discussed multiple times on these forums. The recommendation has been to use --size-only, --checksum, or --update --use-server-modtime (along with --fast-list) to avoid doing one API call per object. As the guide (at Amazon S3) discusses, these options have trade-offs.

My question is: what if I did two identical rclone sync commands to AWS, but the first with --size-only --fast-list and the second with --update --use-server-modtime --fast-list. Are there any downsides to doing this? The first run should capture (1) all new files and (2) existing files with a different local file size than remote file size, but it will miss existing files whose local file size is the same as the remote file even after being modified. The second run will capture any files missed by the first run because the remote server modtime on the missed files will be different than the file modified time on the local file.
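For reference, the two passes described here could be scripted roughly as follows. This is only a sketch: the `two_pass_sync` function name and the `RCLONE` override variable are made up for illustration, and the source and destination are whatever you pass in. The flags are the ones named in this post.

```shell
#!/bin/sh
# Two-pass sync sketch (hypothetical helper, not part of rclone).
# RCLONE can be overridden, e.g. RCLONE=echo, to print the commands
# instead of running them.
RCLONE="${RCLONE:-rclone}"

two_pass_sync() {
    src="$1"
    dst="$2"
    # Pass 1: transfer new files and files whose size differs.
    "$RCLONE" sync --size-only --fast-list "$src" "$dst" || return 1
    # Pass 2: transfer files whose local modtime is newer than the
    # time the remote object was uploaded.
    "$RCLONE" sync --update --use-server-modtime --fast-list "$src" "$dst"
}
```

Calling `two_pass_sync /path/to/source s3:bucket` runs both passes in order. Setting `RCLONE=echo` first just prints the two commands, which is a cheap way to check the flags before spending any API calls.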

Right? Is there a risk of double uploading changed files? I do not think so, but please correct me if I am wrong.

The only downside I can think of is now you are using 2x as many API calls to AWS (so, effectively 2 API calls per 1,000 objects rather than 1 per 1,000 objects).

Because I know someone will ask: I do not want to use --checksum because that unnecessarily uses local SSDs to run a checksum on thousands of objects... which can be painful since I am running rclone every 15 mins.

jwink3101 · June 19, 2023, 10:49pm

This is only true if the modification is after the last upload time. That is a safe but by no means bulletproof assumption.

However, I am not sure I see the point in this. If two files have different sizes, regardless of ModTime, they will get updated. Maybe I am not understanding but I think you can skip the first one. Or skip the second and not catch mods that don't change size (not very common).

Totally fair. Why not wrap local with hasher? It's not perfect but may be good enough for your uses.

asdffdsa · June 19, 2023, 10:54pm

Wasabi and some other S3 providers do not charge for API calls.

234u5h3l4kj5h4k5b · June 19, 2023, 11:02pm

You are correct that the two-step process I am proposing would miss files whose modification time is not after the last upload, but it would have to be files that have also not changed in size. So this is slightly more bulletproof than either of the two commands by themselves.
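To make the coverage of the two passes concrete, here is a rough model of the per-file decision in shell. It is a simplification and not rclone's actual logic: `would_transfer` and `missed_by_both` are hypothetical helpers, times are plain epoch seconds, the file is assumed to already exist on the remote, and any extra checks rclone may apply (such as comparing checksums) are ignored.

```shell
#!/bin/sh
# Simplified model of the skip decision, NOT rclone's real code.
# Usage: would_transfer MODE LOCAL_SIZE REMOTE_SIZE LOCAL_MTIME REMOTE_UPLOADED
would_transfer() {
    mode="$1"; lsize="$2"; rsize="$3"; lmtime="$4"; ruploaded="$5"
    case "$mode" in
        size-only)
            # --size-only: transfer only when the sizes differ.
            [ "$lsize" -ne "$rsize" ] ;;
        update)
            # --update --use-server-modtime: transfer only when the local
            # modtime is newer than the time the object was uploaded.
            [ "$lmtime" -gt "$ruploaded" ] ;;
        *)
            return 2 ;;
    esac
}

# Succeeds when neither pass would transfer the file.
# Usage: missed_by_both LOCAL_SIZE REMOTE_SIZE LOCAL_MTIME REMOTE_UPLOADED
missed_by_both() {
    ! would_transfer size-only "$@" && ! would_transfer update "$@"
}
```

Under this model, `missed_by_both 100 100 500 1000` succeeds (same size, local modtime older than the upload), while a file with a different size or a newer local modtime is picked up by one of the two passes.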

Thanks for the hasher tip. I am a bit hesitant to add more complexity to the process, but I am also unsure how hasher would work with crypt.

234u5h3l4kj5h4k5b · June 19, 2023, 11:03pm

I do use wasabi as well.

234u5h3l4kj5h4k5b · June 19, 2023, 11:19pm

Is there a risk of double uploading changed files? I do not think so, but please correct me if I am wrong.

For anyone wondering, I tested it and there does not seem to be a risk of double uploading if --size-only is run first and then --update --use-server-modtime

jwink3101 · June 19, 2023, 11:54pm

I am 99.9% sure it doesn’t matter. The update will also include changed size regardless of modification time. I’m almost certain!

234u5h3l4kj5h4k5b · June 20, 2023, 5:41am

Using --size-only will miss files that do not change file size:

--size-only

Only checks the size of files.

Uses no extra transactions.

If the file doesn't change size then rclone won't detect it has changed.

rclone sync --size-only /path/to/source s3:bucket

jwink3101 · June 20, 2023, 1:01pm

Yes, but the update call will also catch changes in size