This is very much still a work-in-progress, but I hope this might be useful!
I recently started using Backblaze B2 as my offsite backup mechanism. Unlike their Personal Backup service, B2 isn’t client-side encrypted by default, which means the user needs to manage the encryption.
Additionally, I had a couple of requirements that made rclone's crypt backend unsuitable. I wanted to use GPG, since I already use it heavily for my passwords (pass) and bookmarks (I wrote my own little wrapper for that).
I ended up writing a wrapper that passes the directory through tar (no compression, preserving permissions), zstd (compression level 19), split (creating numbered 50 MB chunks), and finally gpg. The encrypted filename is the sha1sum of the path relative to ${HOME} plus a salt, which is stored (encrypted, of course) in the directory that will be synced with the remote. To make it easier to find the archive you're looking for, the wrapper also writes the path (again, relative to ${HOME}) to an encrypted file in the same directory.
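Conceptually, the pipeline boils down to something like the following rough sketch; the directory, key ID, and naming here are simplified placeholders, and the actual script also handles the metadata files and incremental backups:

salt="$(tr -cd 'a-f0-9' < /dev/urandom | head -c 32)"
name="$(sha1sum <<< ".config${salt}" | cut -d' ' -f1)"
tar -cpf - -C "${HOME}" .config |
zstd -19 |
split -a 6 -b 50M -d --filter='gpg -e -r "<key id>" -o "${FILE}".gpg' - "${name}.tar.zst."

The full script: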
#!/usr/bin/env bash
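# bkup: back up a directory under ${HOME} as encrypted, split archives.
# Usage: bkup <directory relative to ${HOME}> <gpg key id>...
# Archives land in the local cache directory (~/.cache/bkup by default),
# ready to be pushed to B2 with rclone.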
gpgopts=( "--no-encrypt-to" "--yes" "--quiet" "--compress-algo=none" )
gpgids=()
datadir="${XDG_DATA_HOME:-${HOME}/.local/share}/bkup"
cachedir="${XDG_CACHE_HOME:-${HOME}/.cache}/bkup"
dirpath=""
salt=""
backupbucket="<bucket name>"
splitsize="50M"
shopt -s nullglob
set -o pipefail
gen_salt()
{
tr -cd 'a-f0-9' < /dev/urandom | head -c 32
}
get_checksum()
{
sha1sum <<< "$(realpath --relative-to="${HOME}" "${datadir}")${salt}" | cut -d' ' -f1
}
make_repository()
{
mkdir -p "${datadir}"
mkdir -p "${cachedir}"
if [ ! -e "${datadir}/.salt.gpg" ]
then
salt="$(gen_salt)"
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.salt.gpg" "${gpgopts[@]}" <<< "${salt}"
else
salt=$(gpg -d "${gpgopts[@]}" "${datadir}/.salt.gpg")
fi
if [ ! -e "${datadir}/.name.gpg" ]
then
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.name.gpg" "${gpgopts[@]}" <<< "$(realpath --relative-to="${HOME}" "${datadir}")"
fi
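# Hard-link the metadata into the cache directory so it is uploaded
# alongside the archive chunks.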
ln -f "${datadir}/.salt.gpg" "${cachedir}/.salt.gpg"
ln -f "${datadir}/.name.gpg" "${cachedir}/.name.gpg"
}
bkup_init()
{
local dir="$1"
shift
for i in "$@"
do
gpgids+=("$i")
done
datadir="${datadir}/$(realpath --relative-to="${HOME}" "$dir")"
cachedir="${cachedir}/$(realpath --relative-to="${HOME}" "$dir")"
dirpath="$(realpath --relative-to="${HOME}" "$dir")"
make_repository
}
bkup_backup()
{
bkdate="$(date -Is)"
ntdate="$(date -Is -d 1970-01-01)"
filename="$(get_checksum)"
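# Flatten the recipient and option arrays into strings and export them
# so that the subshell spawned by split's --filter can see them.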
gpgidss="${gpgids[@]/#/-r }"
gpgoptss="${gpgopts[@]}"
export -- gpgidss
export -- filename
export -- gpgoptss
if [ -e "${datadir}/.date.gpg" ]
then
ntdate="$(gpg -d "${gpgopts[@]}" "${datadir}/.date.gpg")"
fi
mapfile -t bkfiles < <(find ~/"${dirpath}" -newerct "${ntdate}" -exec realpath -s --relative-to="${HOME}" {} +)
if [ "${#bkfiles[@]}" -ne 0 ]
then
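# List everything that changed since the last run, archive it with tar
# (paths relative to ${HOME}, NUL-separated), compress with zstd, and
# split into encrypted chunks named after the checksum and timestamp.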
find ~/"${dirpath}" -newerct "${ntdate}" -exec realpath -z -s --relative-to="${HOME}" {} + |
tar --no-recursion -cp -C "${HOME}" --null -T - |
zstd -T8 -19 |
split -a 6 -b "${splitsize}" -d --filter='gpg ${gpgoptss} -e ${gpgidss} -o "${FILE}".gpg' - "${cachedir}/${filename}-$(date -u "+%F-%H-%M-%S" -d "${bkdate}")".tar.zst.
fi
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.date.gpg" "${gpgopts[@]}" <<< "${bkdate}"
ln -f "${datadir}/.date.gpg" "${cachedir}/.date.gpg"
}
bkup_sync()
{
rclone sync -P "${cachedir}" B2:"${backupbucket}/$(get_checksum)"
}
bkup_init "$@"
bkup_backup
# bkup_sync
echo "Backup done!"
As of right now, I don't automatically run bkup_sync when running the script, preferring instead to back up multiple directories locally (in ~/.cache/bkup) and then sync them when I'm not using my network.
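In practice that looks something like this (the directory names and key ID are placeholders, and the rclone remote is assumed to be called B2, as in the script):

bkup .config '<key id>'
bkup Documents '<key id>'
# later, on a connection I don't mind saturating:
rclone sync -P ~/.cache/bkup/.config B2:"<bucket name>/<sha1sum>"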
An example of usage would be bkup .config '3DF33DB92735EDAFA847FF74EA24DF493F2BDC3C!' '906662B4055AFB85DC797614D04E3D0A14252E37!', which generates the following files and directories:
- ~/.local/share/bkup/.config with .salt.gpg and .name.gpg
- ~/.cache/bkup/.config with .salt.gpg (hard link), .name.gpg (hard link), and the actual backup (named <sha1sum>-<timestamp>.tar.zst.nnnnnn.gpg).
Running rclone sync (or rclone copy) on ~/.cache/bkup/.config will back up everything.
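Restoring is the pipeline in reverse. A minimal sketch, assuming the chunks for one archive have been downloaded into the current directory (the prefix is a placeholder for the actual <sha1sum>-<timestamp> part of the name):

prefix="<archive prefix>"
# The fixed-width numeric suffixes sort correctly, so decrypt each chunk
# in order, reassemble the zstd stream, and unpack into ${HOME}:
for chunk in "${prefix}".tar.zst.*.gpg
do
gpg -d "${chunk}"
done | zstd -d | tar -xp -C "${HOME}"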
In principle (I haven't tried this), one could also copy directly to the remote (and never create the local copy); I decided not to do this for reliability reasons. What happens if one part of the archive fails to write properly but the others succeed? Would I have to start the backup again? It seems a lot easier to write everything locally and then transfer the files across; failure would then mean restarting the transfer rather than the backup itself.
The one thing I've been trying to figure out is how to handle incremental updates with this scheme. Even a small change in the directory will necessitate re-uploading most (if not all) of the files, if I'm not mistaken. The way I've been dealing with that is making large archives of directories I know will not change very much, and separate archives of the subfolders of directories I know change often, but this is very much an imperfect system and I would appreciate thoughts on it.
[edit] This new version handles incremental backups purely based on timestamps, so it's not entirely perfect, but good enough! It also saves the date of the last time the backup was run on that directory to .date.gpg (which could differ from the last time an incremental archive was actually made, since no archive is created if there are no files to include, but the timestamp is still updated). This also means, by the way, that just downloading the files .name.gpg, .salt.gpg, and .date.gpg gives you enough to reconstruct the last time the script was run on that directory, the original directory, and the hash (i.e. the corresponding name of the folder on the cloud). It is completely self-contained and reconstructable, so you can (accidentally) delete everything from ~/.local/share/bkup and ~/.cache/bkup and rebuild it from just those three files in each encrypted folder.
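As a sketch, run from inside one of the encrypted folders, the reconstruction mirrors what get_checksum does:

name="$(gpg -d .name.gpg)"      # the path stored by make_repository, relative to ${HOME}
salt="$(gpg -d .salt.gpg)"
lastrun="$(gpg -d .date.gpg)"   # last time the script ran on this directory
echo "directory: ${name}"
echo "last run: ${lastrun}"
echo "remote folder: $(sha1sum <<< "${name}${salt}" | cut -d' ' -f1)"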
[final edit] I have made a repository for the script.