Scripted --backup-dir pruning

What is the problem you are having with rclone?

Automagically removing old reverse incremental backups. I'm using the --backup-dir flag with dated backup folders, so every time the backup runs, a new reverse increment is created.
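i.e. each run does something along these lines (paths are made up for illustration):

rclone sync /home/me/data b2:mybucket/current \
    --backup-dir "b2:mybucket/old/$(date +%Y-%m-%d_%H%M%S)"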

Looking for some battle-tested code (preferably bash?) that prunes the backups to, say, at most 1 copy per day, 1 per week, 1 per month and 1 per year, i.e. a roughly logarithmic retention spanning over a year.

I'm not looking to restore the full state at a precise point in time, I just want to be able to pull old versions of individual files out over a decent span of time if necessary.

Barring that, maybe some hints/constructive challenges to my pseudocode? I'm currently thinking of:

  • Set up cron to trigger recurring daily, weekly, monthly and yearly jobs
  • Use an initial reference time Tref = 4 AM, or some other time when the load is non-intrusive
  • cron calls the backup script with an arg flag like -D, -DW, or -DWM (and I suppose -DM and -WM are also possible variants)
  • Allow for some smarts in cron and/or the code to deal with overlapping calls, using a lock or something.
    e.g. -D and -DW would "always" overlap once every 7 days; -D should be inhibited.
  • Each time the backup script is called it does its rclone --backup-dir thing. Each backup folder is marked with a flag (D, DW, DWM, etc. as per the arg) to indicate the applicable pool.
  • Depending on the arg flags, have rclone delete older entries on a per-pool basis:
    * arg contains the D flag: it's been 1 day since the last D call. Delete all folders carrying the D mark except for the most recent X copies.
    * arg contains the W flag: it's been 1 week since the last W call. Delete all folders carrying the W mark except for the most recent Y copies.
    etc. X and Y values are pre-set inside the backup script.

The trick here is dealing with the overlapping calls and, obviously, only deleting the right files, because deletion is irreversible. Hence the hunt for battle-tested code.

I suppose I could test with cron running at fake, accelerated frequencies and dummy calls... but I really would rather start from something existing (or use something already done) than just my own pseudocode.

Note: this is very much like timeshift's retain scheme.
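In rough script form, I'm picturing something like this (completely untested sketch; the paths, the pool-suffix naming, the lock file and the keep counts are made-up placeholders):

#!/usr/bin/env bash
# backup.sh -- untested sketch of the pool-marking idea.
# Called from cron as: backup.sh D | backup.sh DW | backup.sh DWM ...
set -euo pipefail

POOLS="$1"                        # pool flags for this run, e.g. "DW" on Mondays
SRC="/home/me/data"               # placeholder source
DEST="b2:mybucket/current"        # placeholder live copy
BACKUP_ROOT="b2:mybucket/old"     # placeholder root for --backup-dir increments
STAMP="$(date +%Y-%m-%d_%H%M%S)"

# Serialise overlapping cron invocations with a lock file.
exec 9>/tmp/backup.lock
flock -n 9 || { echo "another backup is running, skipping"; exit 0; }

# Each increment folder carries its pool flags in the name, e.g. 2024-01-15_040000_DW
rclone sync "$SRC" "$DEST" --backup-dir "$BACKUP_ROOT/${STAMP}_${POOLS}"

# Per-pool pruning: keep the newest $keep folders whose name contains that flag.
prune_pool() {
    local flag="$1" keep="$2"
    rclone lsf "$BACKUP_ROOT" --dirs-only \
        | grep "_[A-Z]*${flag}[A-Z]*/$" \
        | sort \
        | head -n -"$keep" \
        | while read -r dir; do
              rclone purge "$BACKUP_ROOT/${dir%/}"
          done || true              # an empty pool is not an error
}

if [[ "$POOLS" == *D* ]]; then prune_pool D 7;  fi    # ~a week of dailies
if [[ "$POOLS" == *W* ]]; then prune_pool W 4;  fi    # ~a month of weeklies
if [[ "$POOLS" == *M* ]]; then prune_pool M 12; fi    # ~a year of monthlies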

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.1
- os/version: linuxmint 21.2 (64 bit)
- os/kernel: 5.15.0-91-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

most likely b2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

n/a

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

n/a

A log from the command that you were trying to run with the -vv flag

n/a

Maybe the flag checks for deletion must be exclusive, and X-1 and Y-1 used as the keep counts instead.

Monday 0000 -> initial DW copy

Tuesday 0000 -> initial D-only copy. Can't delete Monday's, because it carries the W flag.

Wednesday 0000 -> second D-only copy. Don't delete Tuesday's copy: if we did and something happened at Wednesday 0100, we would only be able to go back 1 hour.

Thursday 0000 -> now we can delete Tuesday's copy. If something happens at Thursday 0100, we still have the run from Wednesday 0000.
...

Next Monday 0000 -> "second" DW copy. Same logic as above: keep the previous Monday's DW copy.

....

A fortnight from Tref -> delete the initial DW copy. If something happens just after this point in time, the previous Monday's DW copy is still available.

This increases the required storage, but keeping only 1 copy in each pool does seem to be on the not-so-helpful side of things, especially for the weekly+ pools.
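In sketch form, the prune step would then match the pool suffix exactly and keep at least 2 copies per pool (again completely untested; BACKUP_ROOT and the suffix naming are the same made-up placeholders as above):

# Exclusive-pool pruning: only folders whose flag suffix is exactly this pool are
# considered, so e.g. a "_DW" folder is never deleted by the D-pool prune, and at
# least 2 copies are kept so there is always one that is a full period old.
BACKUP_ROOT="b2:mybucket/old"        # placeholder root holding the dated folders

prune_exact_pool() {
    local flags="$1" keep="${2:-2}"
    rclone lsf "$BACKUP_ROOT" --dirs-only \
        | grep "_${flags}/$" \
        | sort \
        | head -n -"$keep" \
        | while read -r dir; do
              rclone purge "$BACKUP_ROOT/${dir%/}"
          done || true               # an empty pool is not an error
}

# e.g. Thursday's "D" run deletes Tuesday's "..._D" folder only once both
# Wednesday's and Thursday's "..._D" folders exist; Monday's "..._DW" is untouched.
prune_exact_pool D 2
prune_exact_pool DW 2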

My advice here is not to use rclone. Use the right tool for the job.

rclone is a fantastic tool but definitely not a fully fledged backup program. Pretty much everything you are asking for is supported by well-established and battle-hardened programs like restic, kopia or duplicacy, among many others. Some of them even use rclone as part of their solution, as a tool to exchange data with cloud storage.

That's a good point re: complexity. I think I'm right on the edge between a sync and a backup approach, and my preference is to go with the lighter of the two. And I kinda have the base script set up already.

I'm just going to cron it at a fixed weekly frequency (this bit I know works) and have the script delete backup folders older than X weeks. The copies are spread out uniformly rather than logarithmically, so higher storage requirements for a given span; on the other hand, much simpler too.
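Roughly this (untested sketch; paths and the retention value are placeholders):

# One dated --backup-dir folder per weekly run, then drop anything older than
# KEEP_WEEKS. Folder names are YYYY-MM-DD, so the cutoff compares as a plain string.
KEEP_WEEKS=26                       # placeholder retention
SRC="/home/me/data"                 # placeholder paths
DEST="b2:mybucket/current"
BACKUP_ROOT="b2:mybucket/old"

rclone sync "$SRC" "$DEST" --backup-dir "$BACKUP_ROOT/$(date +%Y-%m-%d)"

cutoff="$(date -d "$KEEP_WEEKS weeks ago" +%Y-%m-%d)"
rclone lsf "$BACKUP_ROOT" --dirs-only | sort | while read -r dir; do
    if [[ "${dir%/}" < "$cutoff" ]]; then
        rclone purge "$BACKUP_ROOT/${dir%/}"
    fi
done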

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.