Rclone and AWS Glacier Deep Archive request costs

I am considering using AWS Glacier Deep Archive to back up my NAS (tens of thousands of files eventually) and I am curious about how rclone/AWS handles LIST requests to determine what needs to be uploaded. I see it's $0.05 per 1,000 PUT, COPY, POST, or LIST requests, which could get expensive pretty quickly if I'm trying to sync every day (a lot of LIST requests).

Do the LIST requests just list the contents of a directory? Could I lower the number of LIST requests by reducing the total number of directories I have? I know I'll have to bite the bullet and pay for a PUT request for each file I upload, but that isn't really a concern of mine since it'll only happen once per file. My main concern is the LIST and GET requests that will keep happening. I'd love to understand how many of those occur for each file/directory.

Thanks

This should give you a few pointers and a general idea. Fewer directories would be better as a general rule, since the more you have, the more times rclone has to list them out.

So it sounds like there's no real way to know exactly what calls will be made (and thus how many), so I should use --fast-list and --checksum to try to minimize the number of requests. I understand how --fast-list can minimize the number, but how does --checksum lower the number of required requests? Does --fast-list come back with the checksum included for each file, so we don't have to make a separate request for the modification time?

Thanks

Sure, the code can be stepped through and the exact calls figured out, as it's not random by any means. If you want to share the directory layout and number of files, I'm sure someone can help calculate it.
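
If you'd rather measure it than step through the code, a quick sketch (with /path/to/local and s3:bucket as placeholders for your source and bucket) is to do a dry-run with the HTTP requests dumped to the log:

rclone sync --dry-run --fast-list -vv --dump headers /path/to/local s3:bucket

Each request logged in the debug output is one API call, so this gives a realistic request count for your actual layout without transferring anything.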

I'm not 100% sure on this so hopefully someone else can chime in.

That is exactly right. Modification time is stored as metadata on the object, which doesn't come back in the object listing (it needs an extra HEAD request per object), whereas the size and MD5 checksum do come back in the listing. You can also use --size-only, or you could use the --update flag along with --use-server-modtime in place of --checksum, as --checksum will md5sum your local files, which can take quite a long time.
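
For example, either of these avoids the per-object HEAD requests for modtime; this is just a sketch with the same placeholder paths as above:

rclone sync --fast-list --size-only /path/to/local s3:bucket

rclone sync --fast-list --update --use-server-modtime /path/to/local s3:bucket

--size-only compares only file sizes, while --update with --use-server-modtime compares against the object's upload time from the listing rather than the modtime metadata, so neither needs to HEAD each object.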

Another way of reducing the requests is to do a partial sync, like so:

rclone copy --max-age 24h --no-traverse /path/to/local s3:bucket

This will make rclone find any files newer than 24h and copy those to the bucket without listing the remote bucket.

You could run this a couple of times a day, and then once a week run a full rclone sync, which will also delete from the archive any files you have deleted locally.
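
The weekly full pass might look something like this (a sketch, reusing the flags discussed above and the same placeholder paths):

rclone sync --fast-list --checksum /path/to/local s3:bucket

Unlike copy, sync removes files from the destination that no longer exist in the source, which is what catches the local deletions.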

You might like to check out this blog I found, which has some cost figures in it: https://noellh.com/blog/rclone-to-s3-glacier/
