Comments on incremental backup strategy

I plan on having the following very simple incremental backup strategy. It is so simple I wonder if there are major drawbacks I am overlooking. The idea is this:

rclone sync a:project_dir remote:project_dir/current --backup-dir remote:project_dir/$DATE
Then rclone purge the dated folders that are more than a year old (sketch below).
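For the purge step, something like this is what I have in mind (untested sketch, assuming bash, GNU date, and dated folders named like 2024-05-01):

# purge dated archive folders older than ~1 year; ISO dates sort lexicographically
cutoff=$(date -d '1 year ago' +%Y-%m-%d)
rclone lsf --dirs-only remote:project_dir | grep -v '^current/$' | while read -r d; do
  [ "${d%/}" \< "$cutoff" ] && rclone purge "remote:project_dir/${d%/}"
done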

This means full copies of files are stored in the archive, which is not a big deal in my case (e.g. no big video editing being done). But it seems very simple and easy to maintain and restore from.

Any drawbacks or problems I might encounter (data loss)?

Thanks for any input.

hello and welcome to the forum,

that is a common strategy many at the forum use, including myself.
it is a simple but effective solution against ransomware.

what is your concern about data loss?

The concern is mainly something I might have overlooked. E.g. at some point I am going to delete the last backup of an unchanged file. But the current version will still be in the 'current' directory, so that is not a problem. Still, there might be scenarios I haven't thought about.

I use this strategy. It is less optimal than backup-first programs like restic, duplicacy, borg, etc., but it has some very real advantages: it is less complex, storage is 1:1 with your files, browsing is easy, and (if you don't use encryption) there is no lock-in.

I think it is, overall, a very good strategy!

However, it is far from perfect. The two biggest issues are (1) you can't recover to a point in time, and (2) you can't safely prune automatically.

A snapshot-based backup like those of restic (which works on blocks), or rsnapshot or Time Machine (which work with hardlinks), lets you (efficiently) create a full snapshot of the backup as it stands at a point in time. Each snapshot is self-contained in the sense that you can delete one without affecting the others. So you can restore your machine to what it looked like at a specific time. With the rclone strategy, you cannot. You can try to mitigate this by keeping an up-to-date hash of every file at backup time, but you would still have to hunt around your backups to restore to that state.
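A sketch of that hash-manifest mitigation, run right after each sync (manifest naming and paths are just an illustration, and it assumes your remote supports MD5):

# record what every file in 'current' looked like at this point in time
DATE=$(date +%Y-%m-%d)
rclone hashsum MD5 remote:project_dir/current > "manifest-$DATE.md5"
rclone copy "manifest-$DATE.md5" remote:project_dir/manifests/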

Which leads me to issue (2). If you prune a backup, you may lose the piece you need at a later time to fully recover a state of the backup. With snapshot tools, you can snapshot once an hour and then prune later without worrying. You may not need every copy of every file you back up, but you may want the state of it at a certain time. So to prune, you need to look at what's in the backup directory and decide whether you can live without it ever again.
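In practice that means eyeballing each dated folder before it goes, e.g. (the folder name here is illustrative):

# see exactly which file versions this archive folder still holds
rclone lsf -R remote:project_dir/2024-05-01
# only when you can live without all of them:
rclone purge --dry-run remote:project_dir/2024-05-01   # drop --dry-run once satisfied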

Those are major drawbacks in my opinion, but they are not so major that this is a bad strategy. I use it because I know and trust rclone. It is (super) actively developed and supported, and I know many of its idiosyncrasies. I also think it is much (!!!) less complex than the other options, and I will happily take simple over efficient when it comes to backups.

There are other, less-major drawbacks, including but not limited to: no deduplication of small changes, limited file-move tracking (esp. with crypt), manual retention, checksum verification requires rehashing everything, not many guardrails, file-name length limits (esp. for crypt), etc.
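On the checksum point, verification is possible, just not cheap. E.g. (cryptremote: is a made-up name):

# verify 'current' against the local source by hash (size-only where no common hash exists)
rclone check a:project_dir remote:project_dir/current
# a crypt remote stores different underlying hashes, so use cryptcheck instead:
rclone cryptcheck a:project_dir cryptremote:project_dir/current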

On the security front, it is good but not impervious. I think it is fine against any automated attack like ransomware: it would either back up your files or not even sync them.

It is less good against a non-automated malicious actor. Someone who gains access to your system and wants to wreak havoc will be able to access your configs (see later note) and purge it all. You can mitigate this in a few ways. One is to encrypt your config, but then you need to manually back it up and/or figure out a safe way to hold the password in memory. Another option, for certain providers, is to use an API token that doesn't allow hard deletes. The problem there is that the --backup-dir process moves files (usually a copy + delete), so if you don't allow deletes, you end up with duplicates. You may be able to mitigate that with lifecycle rules instead (e.g. keep all files for 10 days), but that is provider-dependent.
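A sketch of the encrypted-config route (the secret-store command is a placeholder; getting the password into memory safely is the real problem):

# rclone.conf encrypted beforehand via: rclone config -> 's) Set configuration password'
# feed the password in at run time instead of keeping the config readable on disk
export RCLONE_CONFIG_PASS="$(my-secret-store get rclone-config)"   # placeholder command
rclone sync a:project_dir remote:project_dir/current --backup-dir remote:project_dir/$DATE
unset RCLONE_CONFIG_PASS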

Personally, I am much less worried about a coordinated, manual attack on my backups. I am mostly worried about user error (i.e. ME): deleting something by accident, or even nuking the whole system (of course, that didn't stop me from accidentally nuking the backups once...). Ransomware is a secondary concern, as this is what I use on my server and not my personal computer.

I hope this helps. Good luck!

those are good points, i agree with all of that.
i would not rely on rclone as the only backup of important files.

in my case, i always have two backups per server.

  • a bare-metal recovery image of the entire server: full and block-based incremental snapshots using veeam, copied to a local backup server, so i can revert the entire server, disk partition(s), or individual file(s) to a point in time. then i rclone those veeam files to a cloud provider.

  • a file-based backup for data files, using rclone sync --backup-dir direct to a crypted remote on a different cloud provider (sketch below).
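something like this, with made-up remote names:

# sketch: file-based backup direct to a crypt remote on a second provider
DATE=$(date +%Y%m%d)
rclone sync /srv/data secondcrypt:data/current --backup-dir secondcrypt:data/archive/$DATE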