Rclone as destination for Borgbackup

As long as it chunks to files, that's fine :smiley:
But again, I've never used borg, so someone else will have to chime in on the finer details here...

Borg does work with rclone mount, though not perfectly: see https://github.com/rclone/rclone/issues/3641

I haven't managed to replicate the problem there yet.

You can do a borgbackup to local disk then use rclone copy to sync it to google.

You could also use (for example) restic which can use rclone as a backend to backup to drive.
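As a rough sketch of the restic route: restic accepts repository locations of the form `rclone:<remote>:<path>`. The wrapper name `restic_backup_via_rclone` and the remote name `gdrive` in the usage note are placeholders, not anything from this thread, and restic will still want a repository password (for example via `RESTIC_PASSWORD`).

```shell
# RESTIC can be overridden, e.g. to point at a specific binary.
RESTIC="${RESTIC:-restic}"

# Back up one directory to a restic repo that lives on an rclone remote.
# $1 = rclone remote and path (hypothetical example: gdrive:restic-repo)
# $2 = local directory to back up
restic_backup_via_rclone() {
    local remote_path="$1" source_dir="$2"
    "$RESTIC" -r "rclone:${remote_path}" backup "$source_dir"
}
```

Something like `restic_backup_via_rclone gdrive:restic-repo /home` would then run the backup, assuming the repo was already created with `restic -r rclone:gdrive:restic-repo init`.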

It's not terribly hard to set up a script that just transfers the data once borg has finished working with it locally though - if that turns out to be needed.
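A minimal sketch of such a script, assuming a local borg repo that already exists. The function name and the paths in the usage note are made up for illustration. It uses `rclone sync` rather than `rclone copy` so that files borg removes locally are also removed on the remote.

```shell
# BORG and RCLONE can be overridden, e.g. with stubs for a dry run.
BORG="${BORG:-borg}"
RCLONE="${RCLONE:-rclone}"

# $1 = directory to back up, $2 = local borg repo, $3 = rclone destination
backup_then_upload() {
    local source_dir="$1" local_repo="$2" remote_repo="$3"
    # {hostname} and {now} are placeholders expanded by borg itself.
    # borg exits non-zero on warnings as well as errors; this sketch stops on either.
    "$BORG" create --stats "${local_repo}::{hostname}-{now}" "$source_dir" || return 1
    # Upload only after borg has finished writing to the local repo.
    "$RCLONE" sync "$local_repo" "$remote_repo"
}
```

For example: `backup_then_upload /home /mnt/backup/borg-repo GDRIVE:BORG_REPO`.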

@ncw I think what we really need to solve a whole lot of these problems is a delayed upload timer flag for the VFS cache. I would also add a modtime test to that so recently modded files are delayed.

Because currently there are a lot of apps that have various problems with working temporary files and the like - due to the VFS cache being too aggressive and uploading them immediately.

I've got that on the list :slight_smile: But you are right it will probably solve a lot of these problems.

I think that will probably come for free, as that is what the VFS will be measuring.

The only list longer than Krampus's naughty-list :wink:

Hello everyone, and specifically @ncw,

I strongly recommend against using restic right now, at least to anyone trying to back up a large number of bytes and a large number of files (I have ~60M files here, occupying ~32TB of space on the source file systems being backed up), because:

  • restic backup uses a ton of memory when you have a large number of files (to the point of 64GB -- yes, you read that correctly: 64 gigabytes -- of RAM not being enough).
  • restic prune is mandatory: if you let your restic repo run too large (eg, a month of daily backups, with just ~1M files and ~50GB changing on the source), your remote repo will get corrupted (ask me how I know... :expressionless: ).
  • restic prune currently takes a ton of time and uses a ton of memory: even on an 8-vCPU + 128GB RAM Google Compute node I created specifically for the prune, after 2 full days running, it aborted with OOM... :expressionless:
  • restic development is pretty much stuck right now: I'm not complaining (on the contrary, I'm grateful to fd0 and the other restic developers for all the time they spent on the project) but it's simply not moving forward; for example, a patch to make restic prune minimally workable in large-repo situations has been stuck for over 2 months now, except for people reporting repo corruption issues. Another patch to likewise make restic restore workable for large repos has been stuck for almost as long, despite lots of reports (including mine) that it's working perfectly and it only needs to be approved for merging.

So, despite having invested literally months trying to get restic working on my setup, I'm sadly being forced to give up on it :expressionless:, and I'm moving to borg (BTW, that's how I found this topic).

I think you meant rclone sync, right? Otherwise, lots of useless gunk would remain on the rclone remote after each borg prune.

Anyway, that's exactly how I'm proceeding here: I finished setting up borg last night and since then I'm running a borg create to back that 62M files / 32T Bytes source to a local Borg repo, and simultaneously I'm running rclone sync on a loop copying it to Google Drive, something like this:

# Re-run rclone sync until one pass exits successfully
while ! rclone -v --checkers=12 --transfers=2 \
               --multi-thread-cutoff=1k --multi-thread-streams=8 \
               --low-level-retries=1000 --rc \
               sync LOCAL_BORG_REPO GDRIVE:BORG_REPO
do
    sleep 5
    echo "=== $(date)"
done

So, it just runs one rclone sync after the other, until one of them finishes successfully; not exactly what I wanted, which would be to stop the loop after an rclone sync runs without copying anything (because the local repo was not modified, ie, the parallel borg create command has finished).

How could one go about checking that, without having to parse rclone's output? Perhaps a specific exit code (enabled by a separate option like '--special-exit-if-nothing-copied' so as not to break compatibility with current scripts)?

Thanks,
-- Durval.

Yes that would be better

You could use the --rc to read the transfer stats.

How about running rclone with rclone rcd instead and scripting the syncs with rclone rc - that could be a possibility.
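A hedged sketch of the rc idea: run one sync through a long-lived `rclone rcd` and compare the `transfers` counter from `core/stats` before and after. It assumes that `core/stats` called without a group returns cumulative totals, and that the daemon was started so that `rclone rc` is allowed to call `sync/sync` (for instance with `--rc-no-auth` on a trusted machine). The function names are invented for this example.

```shell
RCLONE="${RCLONE:-rclone}"

# Read core/stats JSON on stdin and print the value of its "transfers" field.
transfers_from_stats() {
    grep -o '"transfers": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Run one sync via the rc API and print how many files that pass transferred.
# $1 = source, $2 = destination
sync_and_count() {
    local src="$1" dst="$2" before after
    before="$("$RCLONE" rc core/stats | transfers_from_stats)"
    "$RCLONE" rc sync/sync srcFs="$src" dstFs="$dst" >/dev/null || return 1
    after="$("$RCLONE" rc core/stats | transfers_from_stats)"
    echo $((after - before))
}
```

A loop could then keep calling `sync_and_count LOCAL_BORG_REPO GDRIVE:BORG_REPO` until it prints `0`.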

That is not a bad idea... So a special exit code if no transfers were made. Maybe --error-if-no-transfers?
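For readers finding this later: current rclone documents a flag very close to this suggestion, `--error-on-no-transfer`, which makes rclone exit with code 9 when a run succeeded but transferred nothing. Check that your version has it before relying on it. A sketch of the loop built on that flag follows; the function name and the pause argument are illustrative.

```shell
RCLONE="${RCLONE:-rclone}"

# Repeat "rclone sync" until a pass transfers nothing.
# $1 = source, $2 = destination, $3 = seconds to wait between passes (default 5)
sync_until_idle() {
    local src="$1" dst="$2" pause="${3:-5}" rc
    while true; do
        rc=0
        "$RCLONE" sync --error-on-no-transfer "$src" "$dst" || rc=$?
        # Exit code 9: the sync succeeded and nothing needed copying.
        if [ "$rc" -eq 9 ]; then
            return 0
        fi
        # 0 means files were copied, anything else is an error: go around again.
        echo "=== $(date) (rclone exit code $rc)"
        sleep "$pause"
    done
}
```

Note that a pass with nothing to copy only shows the repo did not change between two passes. It does not prove that borg create has finished, so a final sync after borg exits is still worth doing.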

Hello @ncw,

Perfect name for the option! :+1: Should I open an issue for this enhancement on github?

Cheers,
-- Durval.

Why are you running rclone sync on a loop instead of just letting borg create finish and then syncing once?

I have limited local space, so I am considering using this option, but having as much local space for a borg repo as I do for local storage isn't really an option on my portable devices (as I think it probably isn't for most people).

Howdy @dr.mcgillicuddy,

The plan is to sync the borg repo continuously to the rclone remote, so when borg create finishes, a final quick rclone sync will be enough to copy the remaining differences. In other words, I don't have to wait until borg create finishes and then wait for the full rclone sync to run; when borg create is done, most of the data will be copied already.
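One way to sketch that plan without parsing rclone's output is to tie the loop to the lifetime of the borg process: keep syncing while borg is alive, then do one last pass. This assumes borg create was started in the background from the same shell, and the function name is made up for the example.

```shell
RCLONE="${RCLONE:-rclone}"

# Keep syncing while the process with PID $1 is alive, then sync once more.
# $1 = PID of the running "borg create", $2 = source, $3 = destination,
# $4 = seconds to wait between passes (default 5)
sync_while_running() {
    local pid="$1" src="$2" dst="$3" pause="${4:-5}"
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        "$RCLONE" sync "$src" "$dst" || echo "sync failed, will retry" >&2
        sleep "$pause"
    done
    # borg has exited, so the repo is no longer changing: final pass.
    "$RCLONE" sync "$src" "$dst"
}
```

Usage would be along the lines of `borg create ... &` followed by `sync_while_running "$!" LOCAL_BORG_REPO GDRIVE:BORG_REPO`.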

I feel your pain. Unfortunately, my method of simultaneously running borg create and a sequence of rclone syncs until the former is finished doesn't do anything to help you reduce the storage requirements :frowning:

For you, currently the only option is to try and run borg directly onto an rclone mount -- but as @ncw has clarified, this is not working perfectly yet.

@ncw, what would it take to get a special rclone server mode for borg so as to obtain as seamless an integration as in restic? That would be a really nice feature to have....

Cheers,
-- Durval.

I haven't worked out why this is...

Any insights most appreciated!

I think borg supports ssh, but it is expecting to run a borg binary at the other end. So rclone would have to re-implement borg serve

I asked a question on a borg issue about this: https://github.com/borgbackup/borg/issues/1070#issuecomment-553627730

If I understand their docs correctly it will need to perform a fair number of read/write ops as part of its dedupe process. To work properly with rclone, that would mean rclone has to basically pull your whole repo every time borg runs.

In other words, because borg doesn’t just check an index, but rather hashes and compares chunks (and those chunks may or may not line up with rclone’s chunks) the overhead would be tremendous on any file system that didn’t allow random access.

Or am I way off base here?

I don't know enough about borg to say one way or another! I would hope that borg builds a local cache of files and hashes, because hashing everything will be slow whatever the storage.

I use borg backup and rclone on my unraid server. I use a script to handle this, a slightly modified version of this script. It works surprisingly well, though it does require you to store the borg backup files locally.

Hello Nick,

You are right on both counts.

Great! Just went there and posted my support for your request, let's hope the borg developers are willing to help.

Cheers,
-- Durval.

Hello Nick, @dr.mcgillicuddy,

Yes, borg-backup maintains a local cache of the directory, including indexes. I know this because I'm monitoring it right now while it's backing up my production server.

Not sure about caching hashes... but I would think it does (no sense caching a dedup'ing repo without also caching the hashes of the deduped data).

Cheers,
-- Durval.

@durval
So in practice, deduping the repo would not be especially transport-intensive, as long as rclone was "smart" about which chunks to grab? Should I set rclone's chunk size and borg's chunk size to be the same?

I intend to spin this up on a google compute micro instance over the weekend and just see what happens. Any hurdles that come to mind to put on the checklist would be appreciated.

Arq manages to do basically this task, so I'm confident it is possible to interact with the API in such a way as to keep a deduped, incremental backup on my gdrive. Arq is, unfortunately, too limited. Borg, on the other hand, would let me do a local backup, a NAS backup, and a cloud backup all with one tool, so it is far superior (and kudos for a fantastic utility @durval).

My next thought is that using borg mount on top of rclone mount sounds like it's begging for trouble, but with proper hashing I guess it should be fine, and that is like step 4, not step 1.

Hello @dr.mcgillicuddy,

Please be aware that I'm not an expert on borg, I'm still literally waiting for my first backup to finish.

With that caveat, I will try to answer your questions the best I can:

I'm not sure what you mean by "deduping the repo" -- if you mean the borg repo, it's already deduped 'before' rclone sees it.

You mean, a micro instance running borg in server mode, and writing to storage attached to the instance, and then using rclone to copy it over to the cloud? That would certainly work. You can even run rclone in a while loop so that you don't need to wait for borg to finish to start copying, as I did above.

Be sure you have enough RAM for borg to operate on your dataset; so far for me borg is using a lot less than I've seen with restic, but nevertheless memory is the main resource used for deduping, and it grows linearly with the number of files and their size. More info here: https://borgbackup.readthedocs.io/en/1.1.10/internals/data-structures.html#indexes-caches-memory-usage

I'm not familiar with Arq, but regarding Borg please note that it only does the first two right now ("local backup" and "NAS backup", and the latter presuming your NAS is either running borg in server mode and being accessed via SSH from the client, or exporting a network -- NFS, SMB, etc -- share to the client, where borg would handle it just like local storage). To do "cloud storage" with borg right now, you'd either have to try and run it writing to an rclone mount mountpoint (with the caveat @ncw noted above), or set up a VM instance (like the micro instance you mentioned above) to run borg in server mode with the "cloud storage" either attached to it (currently not possible with Google Drive storage AFAIK, so you'd have to use Google Cloud storage which is expensive in the long run), or writing to attached instance storage and then copying from there to some "cloud" like Google Drive using rclone separately.

(and kudos for a fantastic utility @durval).

Thanks, but please note that I didn't write borg nor rclone nor any other "utilities" mentioned here :wink: If you mean rclone, then the guy to thank is @ncw.

As @ncw mentioned near the top of this thread, people trying to do that are having trouble, and he was so far unable to diagnose, much less fix it. But if you can afford the time for the testing/debugging, please set it up and then contact @ncw with the issues so he can ask you for more details/tests in order to eventually fix this.

Cheers,
-- Durval.

Thanks, but please note that I didn't write borg nor rclone nor any other "utilities" mentioned here :wink: If you mean rclone, then the guy to thank is @ncw.

Apologies, I was under the impression you were involved in borg development. But obviously, many many thanks to @ncw for his amazing work, and involvement with the community.

run borg in server mode with the "cloud storage" either attached to it (currently not possible with Google Drive storage AFAIK, so you'd have to use Google Cloud storage which is expensive in the long run), or writing to attached instance storage and then copying from there to some "cloud" like Google Drive using rclone separately.

My intention was to do this with an rclone mount, but should that prove impossible, I would use a locally mounted storage for the initial repo, move that repo to google drive, and attempt to merely run the incremental backups against rclone mount.

As @ncw mentioned near the top of this thread, people trying to do that are having trouble, and he was so far unable to diagnose, much less fix it.

That is not my understanding of his post. My understanding of his post was just that there was some issue using borg with rclone mount; it seems that even if the original poster correctly identified a reproducible issue, one solution might just be single-threaded operation of both borg and rclone.

My comment, however, was referring to the ability to use the borg mount command to view the contents of the borg repo itself, not to simply add the remote filesystem upon which the repo is stored.

@ncw In the thread that was referenced earlier (https://github.com/rclone/rclone/issues/3641) it appears you were never able to reproduce the original poster's issue?

I haven't reproduced it yet... I ran out of disk space when running the test and got distracted by something else :frowning: