Special handling for S3 Glacier?

Looking at https://aws.amazon.com/s3/pricing/, the most expensive price component for Glacier is "PUT requests to Glacier $0.05 per 1,000 requests".

I'm backing up 2.5 million small files, so the initial upload alone would cost me $125 (2,500,000 ÷ 1,000 × $0.05). Is it possible for rclone to automatically group multiple files and upload them in chunks to minimize the number of requests? Is there some other strategy you'd advocate?

Thanks,
Gili

I don't think S3 supports any sort of batch upload, so if you want fewer PUT requests you'll have to batch the files yourself.

You could tar everything, back that up, keep a note of the date, and then back up incrementals with the --max-age flag (annoyingly, rclone only has a relative --max-age, not an absolute one).
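Something like this, as a rough sketch only, assuming an rclone S3 remote named glacier: pointing at a Glacier-class bucket (the remote name, bucket, and paths are placeholders):

    # Initial upload: one tarball instead of millions of individual PUT requests
    tar -cf full-backup.tar /path/to/data
    rclone copy full-backup.tar glacier:my-bucket/full

    # Nightly incrementals: only copy files modified in the last 24 hours
    # (--max-age is relative, so keep it in step with the backup schedule)
    rclone copy --max-age 24h /path/to/data glacier:my-bucket/incremental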

What sort of files are you backing up? Do you want to be able to access them individually on the cloud storage? If not, you could use restic, which bundles small files together. It can use rclone as a backend, which is how I use it.

For example, my laptop has 1,323,707 files totalling 168.1 GiB, which restic backs up as 38,759 files.
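If you go that route, a minimal sketch, assuming an existing rclone remote named remote: and a placeholder bucket path:

    # Create a restic repository on the rclone remote, then back up into it.
    # restic packs small files into larger pack files before uploading,
    # so the number of objects (and PUT requests) is far smaller.
    restic -r rclone:remote:backup-bucket init
    restic -r rclone:remote:backup-bucket backup /path/to/data

Subsequent runs only upload new pack files, so nightly backups stay small.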

I think this almost gets me there.

I'd create incremental snapshots of my data using restic and upload them to S3 every night. There is one problem, however:

  • I back up to an external hard drive once a month using rclone. This drive is physically disconnected the rest of the time to protect against ransomware attacks.
  • I don't have the space to back up locally to any other drive.
  • I want to back up to S3 nightly using restic, but it doesn't support Glacier natively (https://github.com/restic/restic/issues/541), so you're forced to back up locally prior to each upload.

It's a bit of a catch-22. If my external drive were connected 24/7, I would just back up restic snapshots to it nightly and then use rclone to upload the changes to S3 Glacier, but that would open the door to ransomware attacks.
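For reference, the nightly flow I'm describing would look roughly like this (paths and the glacier: remote name are placeholders, and it assumes the external drive stays mounted, which is exactly what I want to avoid):

    # One-time: create a restic repository on the external drive
    restic -r /mnt/external/restic-repo init

    # Nightly: snapshot into the repository on the external drive...
    restic -r /mnt/external/restic-repo backup /path/to/data
    # ...then mirror the repository to S3 Glacier with rclone
    rclone sync /mnt/external/restic-repo glacier:my-bucket/restic-repo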
