Rundown of my backup setup, and a suggestion

This post exists to outline my current backup situation in case anyone wants to adapt it to their own needs, and to highlight the rclone features that made such a solution possible.

So, I like zpaq (well, zpaqfranz now) for my backups. It serves my purposes very well, and I trust it quite a bit (especially since I generate .par2 files that go along with each archive version). I use rclone to upload the files, but it can't resume uploads, and the first archive version was over 200 GB. Given my Internet connection, that's about a four-day upload, and it kept failing for various reasons. This is the solution I came up with:

Inside my local backup directory, I have two subdirectories named "e" and "m". I use the "chunker" backend layered on top of the "crypt" backend layered on top of the "local" backend pointed at the "e" directory, which ends up containing the (encrypted) chunks. (I also have the chunker's hash_type set to sha1all for paranoia's sake.) I absolutely love how I can create a stack that does exactly what I need for a particular situation like this.
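
For anyone who wants to adapt it, the stack amounts to something like this in rclone.conf (remote names, paths, and the obscured passwords here are placeholders, not my actual config):

[e-crypt]
type = crypt
remote = /path/to/backup/e
password = <obscured password>
password2 = <obscured salt>

[e-chunker]
type = chunker
remote = e-crypt:
hash_type = sha1all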

The backup script uses "rclone mount" with vfs-cache-mode set to "writes" to mount that chunker onto the directory "m". Two subdirectories have been created inside: "d" and "i".
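
That part boils down to a single command, something like this (paths and remote name are the placeholders from above; whether you background it or use --daemon is up to you):

rclone mount e-chunker: /path/to/backup/m --vfs-cache-mode writes &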

zpaq is then run with an external index in "m/i" and the actual zpaq archive in "m/d" using a wildcard in the name so that it generates a new file for each version rather than altering any of the existing archives. These separate directories are useful for exploiting an rclone feature later.
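
The invocation is along these lines (the index name, data path, and compression level here are illustrative rather than copied from my script; check the zpaq docs for the exact multipart naming rules):

zpaqfranz add "m/d/enas*.zpaq" /path/to/data -index m/i/enas000.zpaq -m4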

After this, the script removes all par2 files from "m/i" (because they are invalid for the updated index) before walking through "m" generating par2 files for any zpaq file that does not already have them.
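
In spirit, that housekeeping is nothing more than:

rm -f m/i/*.par2                          # the index changed, so its old par2 files are now invalid
find m -name '*.zpaq' | while read -r f; do
    [ -e "$f.par2" ] || par2 create "$f"  # default settings; skip archives that already have recovery data
done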

The script then calls for a sync and waits 60 seconds to let the backends finish before unmounting "m". (FEATURE REQUEST: It would be better here if I could just pass rclone an argument that tells rclone mount to finish all operations waiting in the cache before actually unmounting.)
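
For completeness, the step itself is just:

sync
sleep 60                          # arbitrary grace period for the VFS cache to flush
fusermount -u /path/to/backup/m   # or umount, depending on the system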

At this point, I have my backup files chunked into reasonable sizes and encrypted in "e". The script then uses rclone copy to put the zpaq archives (formerly "m/d/") into (a similarly encrypted location in) the remote storage before using rclone sync to copy the index (formerly "m/i/") into the remote storage.
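
In rclone terms, the two stages are roughly this shape (the remote names are placeholders for my local chunker stack and the encrypted cloud destination):

rclone copy e-chunker:d cloud-crypt:backup/d   # data: copy never propagates deletions
rclone sync e-chunker:i cloud-crypt:backup/i   # index: sync replaces the old index and its par2 files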

The cool thing about the two-stage upload is that I can delete some or all of the local data files anytime I start running low on space. So long as the detached index is there, I can still add new versions whenever I want and continue to benefit from the global deduplication. Since the new versions upload through copy, the deletions are never propagated, so I keep all the data remotely. Since the index uploads through sync, each run replaces the old version of the index and gets rid of its now-invalid par2 files.

The ideal addition would be using a storage system where you can make files temporarily read-only when they are uploaded, so you could block an active attacker from deleting your backups, but this is adequate for my purposes, and I wouldn't be too surprised if it was enough for many others, as well.

My biggest concern, of course, is 60 seconds not being long enough for the backends to finish. In my testing, it has been more than enough every time, but the machine my (Rockstor) NAS runs on isn't exactly old, so I can't say that applies to everyone.

Admittedly, this is quite a bit of indirection just to get partial uploads working (the local chunker would not be needed if rclone could resume partial uploads, but even putting the chunker on the remote backend does not make partial uploads resumable; it will simply try uploading the file again from the first part...), but I'm honestly pretty impressed with how well it works as a workaround. I get a local copy I can trim as needed and a stable remote version that can upload even when my Internet (or power, unfortunately) is unreliable.

This is almost certainly not optimal, and I would welcome constructive criticism, but I searched this forum for a good while for a way to resume the upload of a very large file and never found one. When I found a workaround that consistently functioned (testing backups is important), I figured I should share.

Thanks for the description :slight_smile:

I think the par2 format is very clever. We've thought about an erasure encoded backend for rclone in the past.

What cloud storage are you using?

A backend with erasure encoding would be an amazing thing. I would have to reupload everything, but it would be worth it to further simplify the script and remove an external dependency.

I'm a bit paranoid, I suppose, so I upload to two cloud providers and keep one copy on my local NAS. The cloud providers are b2 and pCloud right now. I've been quite impressed with both for the price.

The best part is that while these two are the providers I started this particular scheme with, they weren't my first providers overall. It still amazes me that a few simple changes to my rclone config were all it took to swap the old ones out, with a frankly bewildering array of alternatives to choose from. Why most of the people I know stay stuck on programs tied to a single provider, I'll never know. pCloud, for instance, even charges extra for E2E encryption in their native app. In a world where even my phone can run rclone with the crypt backend, I just don't get paying extra for that.

Also, is there an option I missed to tell rclone to make the unmount wait until the backend is in sync? I was quite surprised it didn't just work that way until I realized that the backing remote is almost never going to be local, and making users wait for hours to unmount a filesystem would be a pretty brutal gotcha of its own... Honestly, I was surprised I had to enable the VFS cache at all for writes to work correctly on a local mount, but that's really the same situation: optimization for such an unusual use case amounts to little more than bloat.

I agree. Just need some calm time to think about it properly!

In the meantime you could use the tardigrade/Storj backend, which has erasure coding built into the protocol... The price per GB is good too.

It would be nice to be able to do that.

I guess there could be an rclone option, let's say --vfs-sync-on-close, which means: make sure all files are uploaded before unmounting. I'm not sure how well that would work in practice, though, as things get upset (the macOS kernel in particular) if you take too long doing FUSE operations. Then again, we could shut down the mount and carry on uploading stuff in the background. So maybe that is two options: --vfs-sync-on-close hard or --vfs-sync-on-close background, maybe...

The other way would be to make an rclone rc command which showed the number of pending uploads. In fact we added one recently, vfs/stats, in the latest beta.

You could use this to query "uploadsInProgress" and "uploadsQueued" and only unmount when both are 0.
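
Something like this in a shell script would do it (untested, and the field matching is deliberately loose, so adjust to taste):

# needs the mount to be started with --rc so the API is reachable
while rclone rc vfs/stats | grep -E 'uploadsInProgress|uploadsQueued' | grep -qv ': 0,\?$'; do
    sleep 2
done
fusermount -u /path/to/mount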

Thanks for sharing your setup!
I'm also trying to figure something out and I'm not sure about some of the things you mentioned.

I generate .par2 files that go along with each archive version.

Are these files a "paranoid" type protection or do they have a more reasonable purpose in your scenario?

Since the new versions upload through copy, it won't propagate the deletion, so I keep all the data remotely.

How would you go about deleting files if your remote got too big?
Would it be possible to see what files you have deleted locally that only exist in the remote?

zpaq is then run [...] using a wildcard in the name so that it generates a new file for each version rather than altering any of the existing archives.

From what I've read, does that mean you cannot use the -until command to download from a specific version?
So if you deleted a file locally, could you access that file remotely after having added new versions?

Hope you can help me out a bit.

PS: Would you mind sharing the scripts you use for this?

They're part of my paranoia, yes, but I think it's a part which people would do well to use in this circumstance. zpaq is big on deduplication. No matter how many times a unique chunk of data shows up in one or more files across one or more of your backup generations, it is stored only once. This is great, since it greatly reduces the storage space you must use. It's also terrible, though, since minor corruption on disk can blow away the only version of that data no matter how many times your backup has contained it.

par2 files use relatively little space to add some of that redundancy back in a uniform way.

I'll use an example to illustrate what I mean by "little space". The first archive in my new archive set is 182 GiB. The vast majority of that data will be shared with all future archives, so it will remain important for a long time. Even a single flipped bit will cause some damage, and if I'm unlucky, it could be damage to something important. I generated my par2 files with the default settings since that seemed reasonable enough, so the ~7 GiB in par2 files alongside the archive can restore it to a pristine state with up to 5% of it corrupted. That's far more corruption than is likely in any foreseeable circumstance, so bit rot should not be too dangerous. Of course, the par2 files themselves could be subject to bit rot. This particular par2 set is stored as eight separate files, broken into a total of 100 recovery blocks. Damage to those blocks reduces the amount of corruption in the original archive that can be fixed, but quite a few of them could be lost before there's too much to fear. In other words, you use a bit of extra storage to greatly improve the resilience of your archives.
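
For anyone who hasn't used par2 before, the commands involved are about as simple as it gets (the filename is just an example):

par2 create enas1.zpaq        # default settings: roughly 5% redundancy spread across several recovery volumes
par2 verify enas1.zpaq.par2   # check the archive against the recovery data
par2 repair enas1.zpaq.par2   # rebuild the archive if verification finds damage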

For files that can withstand a bit of corruption without you caring (I once had a bad disk that chewed up a pile of video files pretty badly, for instance. The diff against the original was pretty large in binary terms, but they still mostly played fine with very little noticeably wrong.), there's no need. If every bit could be critical, though, I always recommend using par2 for the extra assurance. I can't even tell you how many times my old habit of padding CD-Rs and DVD-Rs with par2 files (You can specify the size of the resulting file instead of specifying the percentage, so it's easy to take all the otherwise-wasted space at the end of a disk and turn it into protection.) has saved data I cared about.

If you're using an archive format like the one FreeARC uses, which has recovery built in, that would be another reason it might not matter. For an aggressively deduplicating archive format, though, it's a very good idea.

In my case, too big isn't the only consideration. The archive for my server is also kept in zpaq format, and it hasn't grown as much as I expected it to. I'm going to have to address it soon, though, because it's about to reach generation 1,300, and the time it takes to open a zpaq archive (even just to list files) is partly proportional to the number of generations. My original plan was to occasionally blow away the newer generations (I'd have to regenerate the index file, but erasing newer data is as simple as removing the newer files). I may still do this, but I've also considered writing a small script that keeps some of the data in a new archive so I can move the new archive aside and start building a fresh zpaq archive from generation one again. The idea is simple enough: extract a certain generation, add it to the new archive, repeat with a different generation.

I figure the data closest to the end is the most likely to be something I'll need to come back to. (Since files largely exist to be used, the odds that I have already used something in the meantime increase with time, and files that were missing or damaged when I needed them have already been restored and should also be in more recent generations. For this reason, files that only exist in newer generations are the ones most likely to have gone missing without my having had occasion to notice yet.) So I'm planning to restore just the first generation, then move forward half the remaining generations and restore that, then move forward half the generations that are still missing, and so on until I have added the most recent data as the final generation.

This way, I generate something like the results of the Tower of Hanoi backup strategy, which I've always found quite useful in the past. I haven't written the script yet (in part because I'm wondering if I can find some fun way to use -repack to add a generation to an existing archive, but I've been a bit busy and haven't tried), but it shouldn't be terribly hard. The reduction in space usage will only be proportional to the ephemeral data in the generations I'm dropping, but the number of generations kept grows only logarithmically with the current count, which will be great for restoring quick access to the data that's still in there.
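
If it helps to picture the plan, the skeleton is something like this (entirely untested, archive names are made up, and it assumes a single-file archive where -until behaves):

last=1300                                # newest generation in the current archive
gens="1"; g=1
while [ "$g" -lt "$last" ]; do
    g=$(( g + (last - g + 1) / 2 ))      # jump half the remaining distance each time
    gens="$gens $g"
done
for g in $gens; do
    rm -rf staging && mkdir staging
    zpaqfranz x old.zpaq -until "$g" -to staging   # restore the state as of generation g
    zpaqfranz a new.zpaq staging                   # append it as the next generation of the new archive
done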

So long as I use the -index option, I can keep a local index of the contents of the archive regardless of whether I have local access to the actual contents. In addition to keeping content hashes so it can recognize and deduplicate blocks it has seen before in the data files that are not physically present, it can list all of their contents. If you're referring to getting a list of which files to download to restore a particular generation of a particular file, I don't think zpaq has any tool for identifying that. On the other hand, rclone mount will let it download the necessary data while not touching the files it doesn't need, which basically gets the job done for you.

Yes, -until won't work. What you have to do is move the files that come after the generation you're interested in out of the way. On a remote backend without move support, that would be very painful. Locally, it's pretty trivial. If you need to restore a particular generation from the remote copy, it just means you only need to download up to that file (and not use the index, of course, since that reflects the current state). Honestly, zpaqfranz (the only actively developed fork I know of, since the original author retired) has added some quality-of-life improvements. I wouldn't be too surprised if -until were extended to support multipart archives there, though I haven't checked.
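
To make the "move them out of the way" part concrete, rolling a local multipart archive back to, say, generation 4 amounts to this (part names are hypothetical):

mkdir -p set-aside
mv m/d/enas5.zpaq m/d/enas6.zpaq m/d/enas7.zpaq set-aside/   # hide everything newer than the target generation
zpaqfranz x "m/d/enas*.zpaq" -to restore                      # now extracts the state as of generation 4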

My free time varies widely, and my expertise, to the extent it can be called that, is often specific to what I've needed to solve particular problems. To the extent that these limitations are not a problem, I'm glad to help if you have further questions.

... I can post it, but I would like to stress that it is still in the "backup manually and watch for problems" stage right now. The current script is a proof of concept that is working well enough that I consider the concept proven and wanted to share it as my small way of giving back to a project that has saved my bacon on several occasions.

Given the information @ncw provided in his latest response, it seems I'll be moving to the beta and adding a loop with some rc calls in the near future. The idea that the chunker will someday fail to finish before I unmount and begin copying is definitely what has me most concerned. Once that's been tested, I can get around to a proper rewrite of this kludge.

One last time: DO NOT take these files to be an immediately useful solution. They are a proof of concept with little to no error handling and no ability to back up different sets based on passed arguments (in fact, they accept no arguments at all).

back.txt (940 Bytes)
rclone.conf.txt (483 Bytes)

EDIT:
Fixed a broken quote tag and added responses to parts I missed in the original.

I knew Storj existed, but I never really looked into it. Looks like that was a significant mistake. Thanks for the heads up.

--vfs-sync-on-close is exactly what I was looking for, but now I realize how foolish I was. Waiting on the umount could be a nuisance for the whole system.

The background option, though... I never even considered that approach, and I have no idea why. The tradeoff seems ideal: The mount stops cluttering the list immediately, doesn't risk any other annoyances in the system, and is still extremely simple to use in scripts: Just wait on the job or PID of the rclone instance that handled the mount. A one-liner that doesn't need a loop or anything.

It's beautiful.

What an astonishing coincidence. I'll certainly take "works now, but requires some extra steps", and this looks like it's perfectly usable. Time to move to the beta, I suppose. Thank you for this excellent news. I was going to look into the rc options, but I was mostly hoping there was some kind of vfs-sync command that would wait until it was done. Querying the progress isn't something that had occurred to me yet. Of course, if it's only in the beta, I wouldn't have found it in my copy, so the time would have been wasted in the most ironic of ways.

Again, thanks a bunch!

Thank you for the detailed reply!

By "necessary data", do you mean a specific file or only parts of a specific file?

So if I have a 50 GiB zpaq version, e.g. the initial version...
When mounted via rclone, is it supposed to download the whole file or only the necessary parts?
In my test with a smaller archive, zpaq downloaded the whole file before listing the content or extracting a file.

If this is the way it works, I guess it would be easier to have a local zpaq backup that I can access quickly and that would be synced to a remote.

can you post the rclone mount command?

that is what i do, keep local copies of backups and rclone copy --immutable to cloud.

I use the "Rclone Browser" software and I have set the mount options to:
--vfs-cache-mode writes --log-file=rclone-mount-log.txt --log-level INFO

I would like that but I would also have to buy a lot more storage...
I'm still waiting for the DNA storage revolution :face_holding_back_tears:

for 7zip, from a rclone mount, i can open archives without having to download the entire file.
never used zpaq and .par2 files.

It should only download the required parts in chunks according to VFS Chunked Reading. For the purpose of experimentation, I pulled a single file from a rather large archive and the amount it downloaded seems quite reasonable given that it will certainly need to obtain the entire index (stored at the ends of files as usual) before grabbing the actual data. (For reference, a detached index can be used for querying or for adding new generations, but it cannot be used during extraction.)

enas:/mnt2/zpaq # time zpaqfranz x m/d/enas\* -only /mnt2/users/eriix/zpaqfranz.exe -to o -space
zpaqfranz v54.11-experimental archiver,  compiled Jan 25 2022
franz:Do not check free space/writeability
m/d/enas*.zpaq:
7 versions, 164.549 files, 3.086.995 fragments, 198.775.210.520 bytes (185.12 GB)
Long filenames (>255)       134
Extracting 1.844.736 bytes (1.76 MB) in 1 files -threads 8

79.314 seconds (000:01:19)  (all OK)

real    1m19.317s
user    0m37.317s
sys     0m0.213s

This was run with a clean cache directory (c) with --vfs-cache-mode full on a filesystem that supports sparse files so I could see at a glance how much it needed.

131M    c
1.8M    o 

o contains the output file, while c is the cache directory.

Inside the cache directory, we can see exactly which files were downloaded and how much of them:

121M    enas1.zpaq
4.0K    enas2.zpaq
4.1M    enas3.zpaq
2.1M    enas4.zpaq
4.0K    enas5.zpaq
3.0M    enas6.zpaq
1.3M    enas7.zpaq

Full file sizes for reference:

182G    m/d/enas1.zpaq
1.0K    m/d/enas2.zpaq
11M     m/d/enas3.zpaq
20M     m/d/enas4.zpaq
512     m/d/enas5.zpaq
3.2G    m/d/enas6.zpaq
259M    m/d/enas7.zpaq

In light of that, I can only say it seems to work fine for me.

You said you were using a smaller archive for your test. How small was the archive? Larger than the 128M chunk size for reading, right? How did you determine how much was downloaded? I'm pretty curious as to why we seem to have different results.

Edit: Minor grammatical fixups.

fwiw,
you might know this but the OP might not.

chunked reading works without any --vfs-cache-mode, in effect --vfs-cache-mode=off

Thanks for the heads up. I was using it in this case so I could examine what was actually downloaded, but it's good to get that confirmation.

yeah,
some rcloners, including myself, who use a rclone mount to stream from, do not use --vfs-cache-mode,
and in some cases, like a rclone mount on a raspberry pi zero, the sdcard is an issue.

Just implemented this in my code (In the laziest, most pathetic way: I check if "$(rclone rc vfs/stats|grep uploads|grep -v 0)" is empty. Doomed to fail if any new "uploads" stats show up, but good enough for a test.), and it seems the sleep 60 I was using to give the chunker time to finish on the local mount could have been replaced with a sleep 2 pretty easily :slight_smile:.

With this in place, I have no further concerns about the applicability of this approach, so I'm going to go ahead and make a more permanent backup script next time I can get to it. I just wanted to say "thank you" again for pointing out that there is a simple way to monitor the process. Works wonderfully.

One interesting bit I ran into is that I had to drop --daemon. If you pass both --rc and --daemon, the mount never actually happens. It just launches the remote control and sits there until you close it. So long as you don't use --daemon, it works great. If this is intended behavior, it might be worth mentioning in the docs for daemon mode.
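
In other words, the mount now ends up shaped something like this (remote name and paths are placeholders):

rclone mount e-chunker: m --vfs-cache-mode writes --rc &
MOUNT_PID=$!
# ... zpaq run, par2 generation, vfs/stats polling ...
fusermount -u m
wait "$MOUNT_PID"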

Looks good to me :slight_smile:

:+1:

I think there is a bug report for that already... See this issue for the master list of all the things that could be fixed with --daemon!

It is actually quite difficult to daemonize Go programs and there are loads of corner cases. My preference is to use something like systemd instead.

Thank you, I will try and test again with another remote. Storj seems to work fine.

It must have been due to the default "128M chunk size". The speeds on the remote I was testing (a free jottacloud) were not good enough to try with a bigger file.

I'm using a VPN and have tried different IPs, but I don't have this problem on any other sites. I can normally upload at my max speed of 1 Mbps. It seemed to work fine at first but has been very inconsistent, and it's not getting better.

Output of my last upload test:

2022/02/12 10:34:18 INFO  :
Transferred:  2.439 MiB / 9.018 MiB, 27%, 4.558 KiB/s, ETA 24m37s
Transferred:  0 / 1, 0%
Elapsed time: 10m37.6s
Transferring:
* test.zpaq:  1% /6.673Mi, 4.885Ki/s, 22m58s

2022/02/12 10:34:19 ERROR : test.zpaq: Failed to copy: Post "https://053-up-e.jotta.cloud/files/v1/upload/xxx": write tcp xxx.xxx.xxx.xxx:xxx->xxx.xxx.xxx.xxx:xxx: wsasend: An existing connection was forcibly closed by the remote host.
2022/02/12 10:34:19 ERROR : Attempt 3/3 failed with 1 errors and: Post "https://053-up-e.jotta.cloud/files/v1/upload/xxx": write tcp xxx.xxx.xxx.xxx:xxx->xxx.xxx.xxx.xxx:xxx: wsasend: An existing connection was forcibly closed by the remote host.
2022/02/12 10:34:19 INFO  :
Transferred:   	    2.439 MiB / 2.439 MiB, 100%, 4.558 KiB/s, ETA 0s
Errors:                 1 (retrying may help)
Elapsed time:     10m38.3s

2022/02/12 10:34:19 Failed to copy: Post "https://053-up-e.jotta.cloud/files/v1/upload/xxx": write tcp xxx.xxx.xxx.xxx:xxx->xxx.xxx.xxx.xxx:xxx: wsasend: An existing connection was forcibly closed by the remote host.

I see, this looks pretty good.
So your download speed was roughly 1.6 MiB/s? Is this your ISP speed or is it because of other limitations?

My house was built in the 70s and both the cable and DSL providers refuse to replace the lines that run through the neighborhood, so my Internet has some serious limitations.

One upside, at least: I can easily use -m4 in zpaq without worrying too much about slowing down a remote restore, and keeping the file size low that way saves me both money (on the storage payments) and time (since upload and download are both sluggish). -m5 is even slower than my connection, but a faster CPU may make that an option eventually.
