Rsync --link-dest how-to

Hello!

Many, many thanks for the great tool rclone. I have spent a lot of time over the past couple of months exploring cloud storage, and I learned a bunch of fun stuff. But you can’t always play and explore, you also have to work. So I am setting up a backup system for all my clients (I only have 4 or 5 that could use this).

In the past I have always worked with rsync, and I lately discovered the --link-dest option, which is really cool: it lets me offer customers versions of all their files for the last 30 days, taken at 06:00, 09:00, 12:00 and 17:00 each day, while using only about 300% of the original data set (instead of the 12000% an equivalent plain copy of every version would take).

With rsync my command is (I don’t know how to quote code here):

rsync -av --delete --exclude /backup --exclude /media --exclude .bash_history --exclude lost+found --exclude .cache --link-dest=$LINK_DEST /srv/ $DEST > $FILE_LIST

DEST and LINK_DEST are defined as follows; I think of LINK_DEST as basically the most recent backup:

DEST=$BASE/srv-$(date +%d-%H)

LINK_DEST=$BASE/$(ls -1 --sort=time $BASE/*/timestamp.txt | head -1 | grep -Eo 'srv-[0-9]{2}-[0-9]{2}')

I make sure that $DEST is empty before the rsync command.
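Putting the pieces together, one run looks roughly like this (just a sketch: the value of BASE and the log file location are examples, not my real setup; the timestamp.txt written at the end is what the LINK_DEST lookup above relies on):

BASE=/backup/rsync                                   # hypothetical location
DEST=$BASE/srv-$(date +%d-%H)
LINK_DEST=$BASE/$(ls -1 --sort=time $BASE/*/timestamp.txt | head -1 | grep -Eo 'srv-[0-9]{2}-[0-9]{2}')
FILE_LIST=$BASE/srv-$(date +%d-%H).log               # hypothetical log location

rm -rf "$DEST" && mkdir -p "$DEST"                   # make sure $DEST is empty
rsync -av --delete --exclude /backup --exclude /media --exclude .bash_history --exclude lost+found --exclude .cache --link-dest=$LINK_DEST /srv/ $DEST > $FILE_LIST
date > "$DEST/timestamp.txt"                         # mark this backup as the newest one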

This is great because it’s all local on the Samba server, but I want to implement something in the cloud that would keep a dozen copies of the data, rotated in the following way:

day, day-1, week, week-1, month, month-1, Jan, Apr, Jul, Dec, year, year-1

So this way the customer could go back quite a bit in time. I’m pretty cost-conscious and love grinding things, so I found ovh.com, which offers cold storage at 0.0034 CAD/GB/month, pretty much the cheapest “reliable” option (read: should not shut down in the next 5 years). I implemented everything with rclone and it works fine; the only problem is that I’m using up 12 x 150 GB, whereas with rsync and --link-dest this would take up roughly 1.2-2.5 x 150 GB. At least I’m not paying for the traffic 12 times, because rclone does a “server side copy” when I do the rotations, so that’s great. But I was wondering if I could grind my rclone command to work a bit like the rsync one and use up much less space in the cloud? For the moment I’m using a very simple “rclone sync /srv pca-enc:[customerId]/day/srv” and a rotation script that does “rclone sync pca:[customerId]/day pca:[customerId]/[slotName]/srv” when required for each day, week, month, …
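To give the idea, the whole setup boils down to something like this (a simplified sketch; customer01 stands in for the real customer ID and the cron timing of each rotation is up to the schedule):

CUST=customer01                                  # placeholder for the real customer ID

# daily upload through the encrypted remote
rclone sync /srv pca-enc:$CUST/day/srv

# rotations, run from cron when each slot is due (server-side copies), e.g.:
rclone sync pca:$CUST/day pca:$CUST/week/srv     # e.g. every Sunday
rclone sync pca:$CUST/day pca:$CUST/Jan/srv      # e.g. on the 1st of January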

Hope my question is not too long! Good day to all, thanks, Louis

As far as I’m aware, rclone does not support anything like the --link-dest option of rsync. And rsync can only do this, I believe, on file systems that support the creation of hard links.

Most remote or ‘cloud’ systems don’t expose low-level things like this, so it might not make sense for rclone to implement it just for the systems that do expose it. For systems that do expose it, you have rsync!

So use rclone for cloud -> local, then rsync --link-dest that local repo (or a copy of it) into a --link-dest hierarchy.
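Roughly like this, say (just a sketch; the remote name and the local paths are made up):

# pull the cloud data down to a local mirror
rclone sync remote:data /srv/mirror

# snapshot the mirror locally, hard-linking unchanged files against the previous snapshot
SNAP=/srv/snapshots/$(date +%F-%H%M)
rsync -a --link-dest=/srv/snapshots/latest /srv/mirror/ "$SNAP"/
ln -sfn "$SNAP" /srv/snapshots/latest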

I think you might be able to use --link-dest with rsync.net, and perhaps some other services (elastichosts perhaps) and virtual servers, but for the most part, I guess you can only use --link-dest on your local side.

It might be an idea to look at ZFS and btrfs for other snapshot options.

===Rich

Many thanks Rich, I see what you mean: this feature is more of a file system design thing, and it would be hard to support on a remote side that does not have the same file system. Thanks also for pointing out the remotes where it could maybe work using rsync (instead of rclone); I will have a look into that when I have a minute. All the best, Louis

rclone can’t do that directly, however have you spotted the --backup-dir flag? It allows you to put any files which changed or were deleted into a backup directory.

With a bit of scripting you can keep dated backups quite easily. These are only “deltas”, unlike the hard-linked approach with rsync, but all the data is there for going back in time and it is reasonably easy to access.
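For example, something along these lines (the remote name and paths are just placeholders):

# keep remote:current in sync with the source; anything changed or deleted
# since the last run is moved into a dated folder instead of being lost
rclone sync /srv remote:current --backup-dir remote:archive/$(date +%Y-%m-%d_%H%M%S)

# old dated folders can be listed and pruned later, e.g.
rclone lsf remote:archive
rclone purge remote:archive/2018-01-05_060000    # an old snapshot you no longer need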

Yes, using something like --backup-dir remote:/somewhere/`date "+%Y-%m-%d-%H-%M-%S"` works miracles. At each run, all changed/removed files land in a separate folder specific to that rclone run. It is ALMOST as good as an incremental-forever backup; the only thing missing is a record of which (new) files were transferred to the destination. At each run nothing is really lost (even if some malware overwrites the originals, even if the filesystem malfunctions and you sync from an almost empty folder, nuking most of your destination), but you still can’t easily revert to precisely the state from any given run.

Maybe we should write a wrapper that keeps track of all the new files transferred at each run. That would make it possible to revert all changes: just remove the new files, move everything back from the corresponding backup-dir, and that’s it.
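Such a wrapper could work something like this (only a sketch, nothing built into rclone; the remote name and paths are made up): take a listing of the destination before and after the sync, and the difference is the list of new files for that run:

RUN=$(date +%Y-%m-%d)
rclone lsf -R remote:current | sort > /tmp/before.txt
rclone sync /srv remote:current --backup-dir remote:archive/$RUN
rclone lsf -R remote:current | sort > /tmp/after.txt
# paths present after but not before = the new files of this run
comm -13 /tmp/before.txt /tmp/after.txt > /var/backups/new-files-$RUN.txt

To revert that run you would then delete the files listed in new-files-$RUN.txt from the destination and move everything back from remote:archive/$RUN.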

Ok, many thanks for bringing up this option. Just to make sure my logic makes sense: assuming src is my production directory on my server, I would run rclone sync src rem-enc:day/src --backup-dir rem-enc:changes/$(date +%Y-%m-%d)/src each day. Do you think the logic to bring the contents back to exactly what they were right after the backup on a given date could be something like this?

target_date=2018-10-28
first_backup_date=2018-03-17
rclone purge rem-enc:rebuild_bucket
work_date=$first_backup_date
while [[ ! "$work_date" > "$target_date" ]]; do    # all calendar dates from first to target
  rclone copy rem-enc:changes/$work_date/src rem-enc:rebuild_bucket/src
  work_date=$(date -I -d "$work_date + 1 day")
done

Just want to make sure I understand how it works; also please note that rem-enc: is an encrypted remote, I hope that’s not a problem. I think we could end up with extra files in the rebuild (ones that had actually been deleted by the target date), but that should not be too much of an issue. Thanks, good day, Louis

edit (addition): hum, looking at this, I’m afraid that “changes” could end up doubly encrypted (or is my imagination looking too far ahead)? Basically the remote rem-enc: points to the remote rem:. Do you think rclone would tolerate the command rclone sync src rem-enc:day/src --backup-dir rem:changes/$(date +%Y-%m-%d)/src instead of the one above?

Yes, that’s the mechanism (that’s not to say I know that precise script will work as it is; I haven’t tested it).

It works as in your first paragraph (not the edit): you use the encrypted remote for both the destination and --backup-dir. rclone will just move the (deleted/changed) files on the encrypted remote from one folder to another (well… for some remotes, like B2 I think, a server-side move is not directly possible, so it’s probably a download/upload of each file instead of a remote move).

Other than that, I put the “description” of the backup before the timestamp, i.e. changes/src/$(date +%Y-%m-%d)/ for example (assuming you would also have something like changes/pics/$(date +%Y-%m-%d)/). That makes it easy to keep all the backups that belong to the same batch together in their own folder.

The remote used as the destination and the one used in --backup-dir need to be the same. The files won’t get double encrypted - the encrypted files will be moved as-is into the backup-dir when required.
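So, with the names from your question, the daily command keeps the crypt remote on both sides, e.g.:

# destination and --backup-dir on the same encrypted remote, so the
# changed/deleted files are moved within it and not encrypted twice
rclone sync src rem-enc:day/src --backup-dir rem-enc:changes/$(date +%Y-%m-%d)/src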