Keeping the last n versions in Wasabi buckets

I'm managing my corporate backups with Wasabi, an S3-compatible cloud provider that lets me back up large numbers of small files without the file-creation throttling that Google Drive imposes.

As it's a backup, my ultimate goal is to keep a consistent snapshot of what my local systems contain at a given moment, retained for a month (and I'm backing up every day).

Looking at Wasabi's KB here, they mention an impractical solution of using the aws CLI to remove old versions of a file, but that doesn't seem to scale; here Wasabi suggests using compliance mode to retain the oldest version of a file combined with retention, which should delete the file as soon as it goes n days without being updated.
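For reference, the aws CLI route from the KB boils down to listing every version of a key and deleting stale ones one by one, which is what makes it impractical at scale. A rough sketch (bucket and key names are placeholders):

aws s3api list-object-versions --bucket my-backup-bucket --prefix path/to/file --endpoint-url=https://s3.wasabisys.com
aws s3api delete-object --bucket my-backup-bucket --key path/to/file --version-id <VersionId> --endpoint-url=https://s3.wasabisys.com

You'd have to script this over every key and every stale version in the bucket.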

How can I keep the latest (available) versions on Wasabi, regardless of their last update date?

what is the total size of the files and how many files are there?

i have a script that uses rclone, 7zip, fastcopy, veeam, and VSS snapshots.
it uses a combination of expensive wasabi s3 and cheap aws s3 deep glacier.

with rclone, this is an example of a full backup and a forever forward incremental, which i manually prune as needed.
the command that the script generates looks something like this; here is a simplified version of the actual command.

each time, the script will use the current date/time for the archive sub folder.
so each time the command is run, there is a new archive sub folder.

rclone sync b:\veeam.br.en07\ wasabi01:veeam.br.en07/backup --backup-dir=wasabi01:veeam.br.en07/archive/20200909.151333
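pruning is then just a matter of listing the dated archive folders and deleting the ones you no longer want. for example, something like this (the folder name below is made up):

rclone lsd wasabi01:veeam.br.en07/archive
rclone purge wasabi01:veeam.br.en07/archive/20200808.090000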

what is the total size of the files and how many files are there?

300GB+, 500k+ files, from different sources across the network (local, SMB shares, NAS).

I think I see what you did here (in a very elegant way): basically, you have neither versioning nor retain/compliance enabled on Wasabi, but you push all the previous versions into folders with a timestamp. If this is the rationale, why are you additionally using fastcopy, veeam and VSS snapshots? Is there something important that I'm missing in your solution?

that is the rationale.

well, if you are copying files, they could be in use, so the script enables VSS.
for example, on the local backup server there is a folder with veeam backup files.
so if rclone is uploading a large veeam backup file over a slow internet connection, and during that upload, a scheduled veeam backup starts, veeam might modify or add/remove files that rclone is trying to upload.
that is a recipe for all sorts of problems and unintended consequences.
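a rough sketch of the VSS part of the script, assuming windows server; the shadow copy device name below is made up and would be parsed from vssadmin's output:

vssadmin create shadow /for=B:
mklink /d C:\vss-snap \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy9\
rclone sync C:\vss-snap\veeam.br.en07\ wasabi01:veeam.br.en07/backup --backup-dir=wasabi01:veeam.br.en07/archive/20200909.151333
rmdir C:\vss-snap
vssadmin delete shadows /for=B: /oldest /quiet

that way rclone reads from a frozen point-in-time copy instead of the live volume.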

about fastcopy and 7zip
if i need to back up files from one computer to the backup server, on the same lan or vpn, i use fastcopy.
if i want to archive a set of files stored on that backup server, the script will 7zip them and rclone that zip file to cheap aws s3 deep glacier.
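the archive step is roughly this (paths and remote names are just examples):

7z a -mx=9 b:\archives\projectx.7z b:\data\projectx\
rclone copy b:\archives\projectx.7z aws01:my-archive-bucket/projectx/ --s3-storage-class DEEP_ARCHIVE

the --s3-storage-class DEEP_ARCHIVE flag tells rclone to write the object straight into glacier deep archive, so there is no lifecycle transition to wait for.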

Don't get me wrong. I love rclone and I think it is a great tool for many things, including some forms of backup.

But what you're asking is really not where rclone shines. You would likely be far better served by a backup-first program, especially one that uses content-defined blocks to efficiently back up files and keep snapshots by maintaining a database of those blocks.

Restic is a popular one and can also use rclone to interface with more providers (though it natively supports S3). Duplicacy is generally well-regarded but not free (except the CLI, for personal use).

It may be worth keeping them in mind.
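For your stated goal, keeping the last n snapshots regardless of their age, restic's retention policy maps onto it almost directly. A minimal sketch, with a placeholder bucket name and assuming your Wasabi keys are set in the usual AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables:

restic -r s3:https://s3.wasabisys.com/my-backup-bucket init
restic -r s3:https://s3.wasabisys.com/my-backup-bucket backup D:\data
restic -r s3:https://s3.wasabisys.com/my-backup-bucket forget --keep-last 30 --prune

Run the backup daily and the forget/prune weekly, and you always have the 30 most recent snapshots, deduplicated at the block level.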

Also, while it has its downsides (notably, you pay for egress when downloading files), B2 is slightly cheaper than Wasabi for storage alone and doesn't have any minimum storage-duration requirements. I am a big fan! It can speak the S3 API or its own custom one.

that is why i use veeam backup and replication: bare metal backups, file backups, sql server backups and so much more.
i use a combination of paid version and free versions at the offices i manage.
and use the free version for my home server.
all backup servers use the awesome, free microsoft hyper-v server 2019 with ReFS.

as for cost, the combination of wasabi for recent backup data and aws s3 deep glacier cannot be beat.

I think you might be right; I was sticking to this tool, which I agree is awesome, but it might not be the best solution for my problem. As @asdffdsa pointed out, there are possible workarounds to achieve something similar to an incremental backup, although the proposed configuration seems to fit a different scenario than mine.

I'll definitely give Veeam a shot, given the Windows environment and the prospect of better control over the backup structure.

Thank you both for your valuable suggestions and time!