Rclone as destination for Borgbackup

Hello @dr.mcgillicuddy,

Please be aware that I'm not an expert on borg; I'm still waiting for my first backup to finish.

With that caveat, I will try to answer your questions as best I can:

I'm not sure what you mean by "deduping the repo" -- if you mean the borg repo, it's already deduped 'before' rclone sees it.

You mean a micro instance running borg in server mode, writing to storage attached to the instance, and then using rclone to copy it over to the cloud? That would certainly work. You can even run rclone in a while loop so that you don't need to wait for borg to finish before starting to copy, as I did above.
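That loop can be sketched like this; the repository path and remote name are assumptions, and the whole thing is guarded so it's a no-op on machines without rclone installed:

```shell
#!/bin/sh
# Sketch: keep copying the borg repo to the cloud while borg is still
# running, then do one final pass when it exits.
# REPO and REMOTE are assumptions -- substitute your own.
REPO=/backup/borg-repo
REMOTE=gdrive:borg-repo

sync_repo() {
    # rclone copy only transfers new/changed files, so repeated
    # passes are cheap
    rclone copy "$REPO" "$REMOTE"
}

if command -v rclone >/dev/null 2>&1; then
    while pgrep -x borg >/dev/null 2>&1; do
        sync_repo
        sleep 300    # wait five minutes between passes
    done
    sync_repo        # final pass after borg finishes
fi
```

The intermediate passes mainly reduce how much the final pass has to transfer; it's the final pass after borg exits that leaves the remote copy consistent.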

Be sure you have enough RAM for borg to operate on your dataset; so far for me borg is using a lot less than I've seen with restic, but nevertheless memory is the main resource used for deduping, and it grows linearly with the number of files and their size. More info here: https://borgbackup.readthedocs.io/en/1.1.10/internals/data-structures.html#indexes-caches-memory-usage
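As a back-of-envelope illustration of that linear growth (the ~2 MiB average chunk size matches borg 1.1 defaults, but the 44 bytes per index entry is an assumption; see the linked docs for the real formulas):

```shell
#!/bin/sh
# Rough chunk-index RAM estimate for a deduplicated data set.
# ENTRY_BYTES is an assumed per-chunk overhead, not an exact borg figure.
DATA_BYTES=$((1024 * 1024 * 1024 * 1024))   # 1 TiB of unique data
CHUNK_SIZE=$((2 * 1024 * 1024))             # ~2 MiB average chunk
ENTRY_BYTES=44                              # assumed bytes per index entry

CHUNKS=$((DATA_BYTES / CHUNK_SIZE))
INDEX_BYTES=$((CHUNKS * ENTRY_BYTES))
echo "~${CHUNKS} chunks, index ~$((INDEX_BYTES / 1024 / 1024)) MiB"
# prints: ~524288 chunks, index ~22 MiB
```

Even under these assumptions the index stays small for 1 TiB, but it scales with chunk count, so millions of small files push it up fast.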

I'm not familiar with Arq, but regarding borg, please note that it only does the first two right now ("local backup" and "NAS backup", the latter presuming your NAS is either running borg in server mode and being accessed via SSH from the client, or exporting a network share -- NFS, SMB, etc. -- which borg treats just like local storage). To do "cloud storage" with borg right now, you'd either have to try running it against an rclone mount mountpoint (with the caveat @ncw noted above), or set up a VM instance (like the micro instance you mentioned above) to run borg in server mode, with the "cloud storage" either attached to it (currently not possible with Google Drive storage AFAIK, so you'd have to use Google Cloud Storage, which is expensive in the long run), or writing to attached instance storage and then copying from there to some "cloud" like Google Drive using rclone separately.

(and kudos for a fantastic utility @durval).

Thanks, but please note that I didn't write borg, rclone, or any of the other "utilities" mentioned here :wink: If you mean rclone, then the guy to thank is @ncw.

As @ncw mentioned near the top of this thread, people trying to do that are having trouble, and he has so far been unable to diagnose the problem, much less fix it. But if you can afford the time for testing/debugging, please set it up and then contact @ncw with the issues so he can ask you for more details/tests in order to eventually fix this.

Cheers,
-- Durval.

Thanks, but please note that I didn't write borg, rclone, or any of the other "utilities" mentioned here :wink: If you mean rclone, then the guy to thank is @ncw.

Apologies, I was under the impression you were involved in borg development. But obviously, many, many thanks to @ncw for his amazing work and involvement with the community.

run borg in server mode with the "cloud storage" either attached to it (currently not possible with Google Drive storage AFAIK, so you'd have to use Google Cloud Storage, which is expensive in the long run), or writing to attached instance storage and then copying from there to some "cloud" like Google Drive using rclone separately.

My intention was to do this with an rclone mount, but should that prove impossible, I would use locally mounted storage for the initial repo, move that repo to Google Drive, and attempt to run just the incremental backups against the rclone mount.

As @ncw mentioned near the top of this thread, people trying to do that are having trouble, and he has so far been unable to diagnose the problem, much less fix it.

That is not my understanding of his post. My understanding was just that there was some issue using borg with rclone mount; it seems that even if the original poster correctly identified a reproducible issue, one solution might just be single-threaded operation of both borg and rclone.

My comment, however, was referring to the ability to use the borg mount command to view the contents of the borg repo itself, not simply to mount the remote filesystem on which the repo is stored.

@ncw In the thread that was referenced earlier (https://github.com/rclone/rclone/issues/3641) it appears you were never able to reproduce the original poster's issue?

I haven't reproduced it yet... I ran out of disk space when running the test and got distracted by something else :frowning:

Okay-- here's where I'm at:

$ borg init --encryption=repokey ~/Desktop/borg
Enter new passphrase: ***
Enter same passphrase again: ***
Do you want your passphrase to be displayed for verification? [yN]: n

Failed to securely erase old repository config file (hardlinks not supported). Old repokey data, if any, might persist on physical storage.

Exception ignored in: <bound method Repository.__del__ of <Repository /Users/lebowski/Desktop/borg>>
Traceback (most recent call last):
File "borg/repository.py", line 178, in __del__
File "borg/repository.py", line 427, in close
File "borg/locking.py", line 383, in release
File "borg/locking.py", line 282, in modify
KeyError: (('MacBook-Pro.local@132155098991334', 94879, 0),)
Local Exception
Traceback (most recent call last):
File "borg/locking.py", line 234, in load
File "json/__init__.py", line 268, in load
File "json/__init__.py", line 319, in loads
File "json/decoder.py", line 342, in decode
json.decoder.JSONDecodeError: Extra data: line 1 column 18 (char 17)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "borg/archiver.py", line 4501, in main
File "borg/archiver.py", line 4433, in run
File "borg/archiver.py", line 166, in wrapper
File "borg/repository.py", line 203, in __exit__
File "borg/repository.py", line 427, in close
File "borg/locking.py", line 384, in release
File "borg/locking.py", line 271, in empty
File "borg/locking.py", line 271, in
File "borg/locking.py", line 267, in get
File "borg/locking.py", line 234, in load
OSError: [Errno 5] Input/output error

Platform: Darwin MacBook-Pro.local 19.0.0 Darwin Kernel Version 19.0.0: Wed Sep 25 20:18:50 PDT 2019; root:xnu-6153.11.26~2/RELEASE_X86_64 x86_64
Borg: 1.1.10 Python: CPython 3.5.7 msgpack: 0.5.6
PID: 94879 CWD: /Users/lebowski
sys.argv: ['borg', 'init', '--encryption=repokey', '/Users/lebowski/Desktop/borg']
SSH_ORIGINAL_COMMAND: None

$ borg create ~/Desktop/borg::User ~/
Killed stale lock MacBook-Pro.local@132155098991334.94879-0.
Failed to create/acquire the lock /Users/lebowski/Desktop/borg/lock.exclusive (timeout).


And this is something I've never seen before:

$ rclone mount borg: ~/Desktop/borg

2019/11/22 21:10:43 ERROR : lock.roster: ReadFileHandle.Flush error: corrupted on transfer: MD5 hash differ "86ac3e3eb9cdb2734fbdc48c3afba048" vs "bef43722f88b89cdf9d2ff19fc375e73"

2019/11/22 21:10:43 ERROR : lock.roster: ReadFileHandle.Release error: corrupted on transfer: MD5 hash differ "86ac3e3eb9cdb2734fbdc48c3afba048" vs "bef43722f88b89cdf9d2ff19fc375e73"


However, when I started over with --vfs-cache-mode writes I got no errors. I only include all of this because of the "corrupted on transfer" errors, the origin of which I do not understand. Note that at no point did rclone throw an error specifying that an action was being requested that required --vfs-cache-mode writes.

Any insight on to what would cause the "corrupted on transfer" errors?

That probably means the file changed while rclone was transferring it.

--vfs-cache-mode writes will keep that file on disk so its value will always be consistent and it will read it back from there too. So if there is a consistency problem with the file changing while it is being uploaded then --vfs-cache-mode writes will fix that.

@ncw So, you would have no concerns about data integrity with --vfs-cache-mode writes in use?

I've now had this running for about a day, have about 360GB uploaded, and the only issue I've seen so far is that rclone's cache size has gotten fairly large:

2019/11/24 04:28:22 INFO : Cleaned the cache: objects 27 (was 27), total size 4.770G (was 4.770G)

My hope is that this is simply because the last few things backed up have been large files (VMs), so rclone has to do some "catching up" with moving files out of cache? Because borg also keeps a local cache, this is actually turning out to be pretty space intensive, at least on first run.

I'm not seeing anything like the issue in #3641

It's pretty reliable I think. I'm not saying it is perfect though :slight_smile:

borg has a check mode, doesn't it? I'd run that every now and again.
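Something like this, for instance (the repo path is an assumption, and the snippet is guarded so it's a no-op where borg isn't installed):

```shell
#!/bin/sh
# Sketch: periodic integrity check of the repository.
# /path/to/borg-repo is an assumption -- substitute your own.
if command -v borg >/dev/null 2>&1; then
    borg check /path/to/borg-repo
    # add --verify-data for a deeper (and much slower) pass that
    # decrypts and re-hashes every chunk
fi
```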

You control the size of the cache with these flags

  --vfs-cache-max-age duration             Max age of objects in the cache. (default 1h0m0s)
  --vfs-cache-max-size SizeSuffix          Max total size of objects in the cache. (default off)

So by default rclone is keeping the files for 1 hour locally. You can set the max size also.

Rclone doesn't guarantee to keep within the limits; if you set it to 1G and upload a 5G file then it will be at 5G for a while...
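Putting those flags together, a capped-cache mount might look like this (remote name, mountpoint, and limits are assumptions; guarded so it's a no-op where rclone isn't installed):

```shell
#!/bin/sh
# Sketch: borg-over-rclone mount with a bounded write cache.
# Remote, mountpoint, and limits below are assumptions.
if command -v rclone >/dev/null 2>&1; then
    rclone mount borg: /path/to/mountpoint \
        --vfs-cache-mode writes \
        --vfs-cache-max-age 1h \
        --vfs-cache-max-size 10G \
        -v
fi
```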

Rclone doesn't guarantee to keep within the limits; if you set it to 1G and upload a 5G file then it will be at 5G for a while...

That seems to be working as expected.

I tried putting the laptop to sleep momentarily, and then waking it. Things seemed to resume without interruption. Would you anticipate any data integrity issues? I will run borg check once all is complete; however, because I can't use rclone check in this instance (there is only one extant copy of the repo), I realize I have no way of verifying the integrity of the transfer process -- merely that the final copy is readable.

@ncw From the borgbackup guys:

One thing with a FUSE file system that bit a user recently is that borg sometimes needs to mmap files. So I'd check that your cloud filesystem supports that on top of every other check.

The repository layer depends on the preserved sequence of file creation in its transactional layer. If it didn't, it would have to scan all previous segments to see if something has gone missing, which would be way too expensive.

A quick skim suggests that this is possible within the cache used by --vfs-cache-mode writes, so long as the operation is performed on a file that has not yet been streamed to Google Drive. (In other words, as long as the chunk is within the VFS cache, it functions identically to a local, fully sane filesystem.)

After the chunk has been uploaded, I'm not sure anymore, as I don't understand the way "sequence of file creation" would work on google drive once the chunk has been uploaded (does that make sense?)

I believe --vfs-cache-mode full would be unnecessary in this instance, as once borg writes its chunk, it simply moves on to the next one and never writes back to old chunks. However, perhaps it's better to use mode full just in case? (I assume I'll see an error in a log somewhere if there is an issue, though.)

Your best hope would be a persistent local write-back cache and an explicit option for rclone to flush everything to the external storage.

I understand this to be a description of how --vfs-cache-mode writes works, so this should be right on target.

EDIT:
...had to restart this because my mount point was ~/Desktop/borg and I told borg to back up all of ~/

So it opened a black hole and tried to suck the universe into it. Oops.

Remounted at /Users/Shared/borg since / and /Volumes aren't writable in Catalina.
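One way to guard against that recursion is to exclude the mountpoint from borg's source set explicitly. A sketch (the paths are assumptions; {hostname} and {now} are standard borg archive-name placeholders, and the snippet is guarded so it's a no-op where borg isn't installed):

```shell
#!/bin/sh
# Sketch: never let the backup descend into its own repository.
# The repo/mountpoint path is an assumption -- substitute your own.
if command -v borg >/dev/null 2>&1; then
    borg create --exclude /Users/Shared/borg \
        /Users/Shared/borg::'{hostname}-{now}' \
        "$HOME"
fi
```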

You might get a failed transaction if you sleep the laptop for too long. Depending on exactly what it is, rclone will either retry it or report an error to FUSE and hence to borg.

Using borg check sounds like a good idea.

I think rclone should support mmap read-only for any type of mount. For read/write, rclone will need --vfs-cache-mode writes, and if the file isn't in the cache rclone will download it first.

Yes that is pretty much exactly how --vfs-cache-mode writes works.

No problems re-waking.

I did encounter an issue where rclone quit silently, however: the mount remained on the desktop, but the terminal where I ran rclone mount returned to the command prompt.

No issues resuming there either, but worth noting that -v was silent about it.

So far I'm very impressed with this combination, and I think your implementation of VFS is sidestepping a lot (all?) of the issues with other FUSE mounts.

However, being able to mount the drive outside of the home directory seems necessary for this to run properly.

(Also, on macOS if you create a mountpoint on the desktop, the Finder will display your drive twice, which isn't a problem with rclone; it's an issue with FUSE. Nor is it an issue in most cases, but it's just weird. Happens with SSHFS mounts via macFUSE as well.)

If I open a feature request, is mounting to /Volumes (or otherwise outside the home directory) something you'd consider adding?

Because Mac users can't create directories in /Volumes, we couldn't supply a path to rclone mount in that fashion. I believe rclone would have to create the path at runtime and mount to it.

Cryptomator does it, as an example, but I'm not clear how they are getting privileges to do so based on skimming their code...

I suspect that you'll see in the kernel log that macOS killed rclone for some reason...

macOS doesn't like FUSE calls taking a very long time, so you need to set --daemon-timeout bigger than the time it takes to upload your largest file.
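So on macOS the mount invocation grows a timeout flag, roughly like this (the 10m value, remote name, and mountpoint are assumptions; size the timeout above your largest file's upload time):

```shell
#!/bin/sh
# Sketch: macOS mount with a FUSE timeout long enough for big uploads.
# Remote, mountpoint, and timeout value are assumptions.
if command -v rclone >/dev/null 2>&1; then
    rclone mount borg: /Users/Shared/borg \
        --vfs-cache-mode writes \
        --daemon-timeout 10m \
        -v
fi
```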

Absolutely! Especially if you can work out how to do it! I don't have a Mac, so I'll be relying on you for testing.

Presumably rclone won't have permission to create a directory in /Volumes either?

Thanks for that info. I've been following restic for a while and keep seeing inklings of this. I wonder if some of the problem is the core design of pack files. While it removes the issues with small files, it seems to introduce so much overhead.

I really like Borg but I do not want to be restricted to local or SSH. Duplicacy also looks promising but (a) it's not free (not a blocking issue. I don't mind paying for useful software) and (b) the name is really annoying and confusing (small issue too).

I also like that Borg is mostly in Python, so I can hack around if I ever need to.

Especially if you can work out how to do it! I don't have a Mac, so I'll be relying on you for testing.

Happy to do any testing you like (or provide an RDP connection if you would like).
I have not yet had the time to read all the docs I would need to really be of help here. I'm honestly not even sure whether this would properly be addressed in rclone or in macFUSE.

I'm a very satisfied user of Duplicacy. The CLI version is completely free (and open source).

It would be very useful if you could please open a new issue on GitHub about this and collect as much documentation as possible in the issue. That would be a good start!

Hi all,

I'm new to the forum but have been watching the development of rclone with interest.

And seeing borgbackup as my currently favoured backup solution, I've asked myself the OP's question too and was delighted when I found this thread.

Something that nobody seems to have come up with yet is whether rclone serve sftp or rclone mount is the better solution. I feel like I'm missing something obvious here but thought I might ask anyway...
Possibly serve would just be another layer of indirection?

@TowerBR: as I understand the Duplicacy website, even the CLI version is free only for non-commercial use. To me this means that I cannot use it without a license even for backups of my private laptop that also include documents belonging to the small company I occasionally work for.

Cheers,
red

Hi. Please open a new topic with your questions rather than bump old threads.