This is very much still a work-in-progress, but I hope this might be useful!
I recently started using Backblaze B2 as my offsite backup mechanism. Unlike their Personal Backup service, B2 isn’t client-side encrypted by default, which means the user needs to manage the encryption.
Additionally, I had a couple of requirements that made rclone's crypt backend unsuitable. I wanted to use GPG, since I already use it heavily for my passwords (pass) and bookmarks (I wrote my own little wrapper for that).
I ended up writing a wrapper that passes the directory through tar (no compression, preserving permissions), zstd (compression level 19), split (creating numbered 50 MB chunks), and finally gpg. The encrypted filename is the sha1sum of the path relative to ${HOME} plus a salt, which is stored (encrypted, of course) in the directory that will be synced with the remote. To make it easier to find the archive you're looking for, the wrapper also writes the path (again, relative to ${HOME}) to an encrypted file in the same directory.
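Conceptually, the pipeline boils down to something like the following rough sketch; the directory, key ID, and naming here are simplified placeholders, and the actual script also handles the metadata files and incremental backups:

salt="$(tr -cd 'a-f0-9' < /dev/urandom | head -c 32)"
name="$(sha1sum <<< ".config${salt}" | cut -d' ' -f1)"
tar -cpf - -C "${HOME}" .config |
zstd -19 |
split -a 6 -b 50M -d --filter='gpg -e -r "<key id>" -o "${FILE}".gpg' - "${name}.tar.zst."

The full script: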
#!/usr/bin/env bash
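# bkup: back up a directory under ${HOME} as encrypted, split archives.
# Usage: bkup <directory relative to ${HOME}> <gpg key id>...
# Archives land in the local cache directory (~/.cache/bkup by default),
# ready to be pushed to B2 with rclone.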
gpgopts=( "--no-encrypt-to" "--yes" "--quiet" "--compress-algo=none" )
gpgids=()
datadir="${XDG_DATA_HOME:-${HOME}/.local/share}/bkup"
cachedir="${XDG_CACHE_HOME:-${HOME}/.cache}/bkup"
dirpath=""
salt=""
backupbucket="<bucket name>"
splitsize="50M"
shopt -s nullglob
set -o pipefail
gen_salt()
{
tr -cd 'a-f0-9' < /dev/urandom | head -c 32
}
get_checksum()
{
sha1sum <<< "$(realpath --relative-to="${HOME}" "${datadir}")${salt}" | cut -d' ' -f1
}
make_repository()
{
mkdir -p "${datadir}"
mkdir -p "${cachedir}"
if [ ! -e "${datadir}/.salt.gpg" ]
then
salt="$(gen_salt)"
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.salt.gpg" "${gpgopts[@]}" <<< "${salt}"
else
salt=$(gpg -d "${gpgopts[@]}" "${datadir}/.salt.gpg")
fi
if [ ! -e "${datadir}/.name.gpg" ]
then
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.name.gpg" "${gpgopts[@]}" <<< "$(realpath --relative-to="${HOME}" "${datadir}")"
fi
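# Hard-link the metadata into the cache directory so it is uploaded
# alongside the archive chunks.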
ln -f "${datadir}/.salt.gpg" "${cachedir}/.salt.gpg"
ln -f "${datadir}/.name.gpg" "${cachedir}/.name.gpg"
}
bkup_init()
{
local dir="$1"
shift
for i in "$@"
do
gpgids+=("$i")
done
datadir="${datadir}/$(realpath --relative-to="${HOME}" "$dir")"
cachedir="${cachedir}/$(realpath --relative-to="${HOME}" "$dir")"
dirpath="$(realpath --relative-to="${HOME}" "$dir")"
make_repository
}
bkup_backup()
{
bkdate="$(date -Is)"
ntdate="$(date -Is -d 1970-01-01)"
filename="$(get_checksum)"
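# Flatten the recipient and option arrays into strings and export them
# so that the subshell spawned by split's --filter can see them.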
gpgidss="${gpgids[@]/#/-r }"
gpgoptss="${gpgopts[@]}"
export -- gpgidss
export -- filename
export -- gpgoptss
if [ -e "${datadir}/.date.gpg" ]
then
ntdate="$(gpg -d "${gpgopts[@]}" "${datadir}/.date.gpg")"
fi
mapfile -t bkfiles < <(find ~/"${dirpath}" -newerct "${ntdate}" -exec realpath -s --relative-to="${HOME}" {} +)
if [ "${#bkfiles[@]}" -ne 0 ]
then
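# List everything that changed since the last run, archive it with tar
# (paths relative to ${HOME}, NUL-separated), compress with zstd, and
# split into encrypted chunks named after the checksum and timestamp.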
find ~/"${dirpath}" -newerct "${ntdate}" -exec realpath -z -s --relative-to="${HOME}" {} + |
tar --no-recursion -cp -C "${HOME}" --null -T - |
zstd -T8 -19 |
split -a 6 -b "${splitsize}" -d --filter='gpg ${gpgoptss} -e ${gpgidss} -o "${FILE}".gpg' - "${cachedir}/${filename}-$(date -u "+%F-%H-%M-%S" -d "${bkdate}")".tar.zst.
fi
gpg -e "${gpgids[@]/#/-r }" -o "${datadir}/.date.gpg" "${gpgopts[@]}" <<< "${bkdate}"
ln -f "${datadir}/.date.gpg" "${cachedir}/.date.gpg"
}
bkup_sync()
{
rclone sync -P "${cachedir}" B2:"${backupbucket}/$(get_checksum)"
}
bkup_init "$@"
bkup_backup
# bkup_sync
echo "Backup done!"
As of right now, I don't automatically run bkup_sync when running the script, preferring instead to back up multiple directories locally (in ~/.cache/bkup) and then sync them when I'm not using my network.
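In practice that looks something like this (the directory names and key ID are placeholders, and the rclone remote is assumed to be called B2, as in the script):

bkup .config '<key id>'
bkup Documents '<key id>'
# later, on a connection I don't mind saturating:
rclone sync -P ~/.cache/bkup/.config B2:"<bucket name>/<sha1sum>"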
An example of usage would be bkup .config '3DF33DB92735EDAFA847FF74EA24DF493F2BDC3C!' '906662B4055AFB85DC797614D04E3D0A14252E37!', which generates the following files and directories:
- ~/.local/share/bkup/.config with .salt.gpg and .name.gpg
- ~/.cache/bkup/.config with .salt.gpg (hard link), .name.gpg (hard link), and the actual backup (named <sha1sum>-<timestamp>.tar.zst.nnnnnn.gpg).
Running rclone sync (or rclone copy) on ~/.cache/bkup/.config will back up everything.
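Restoring is the pipeline in reverse. A minimal sketch, assuming the chunks for one archive have been downloaded into the current directory (the prefix is a placeholder for the actual <sha1sum>-<timestamp> part of the name):

prefix="<archive prefix>"
# The fixed-width numeric suffixes sort correctly, so decrypt each chunk
# in order, reassemble the zstd stream, and unpack into ${HOME}:
for chunk in "${prefix}".tar.zst.*.gpg
do
gpg -d "${chunk}"
done | zstd -d | tar -xp -C "${HOME}"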
In principle (I haven't tried this), one could also copy directly to the remote (and never create the local copy); I decided not to do this for reliability reasons. What happens if one part of the archive fails to write properly but the others succeed? Would I have to start the backup again? It seems a lot easier to write everything locally and then transfer the files across; failure would then mean restarting the transfer rather than the backup itself.
The one thing I've been trying to figure out is how to handle incremental updates with this scheme. Even a small change in the directory will necessitate re-uploading most (if not all) of the files, if I'm not mistaken. The way I've been dealing with that is making large archives of directories I know will not change very much, and separate archives of the subfolders of directories I know change often, but this is very much an imperfect system and I would appreciate thoughts on it.
[edit] This new version handles incremental backups purely based on timestamps, so it's not entirely perfect, but good enough! It also saves the date of the last time the backup was run on that directory to .date.gpg (which could differ from the last time an incremental archive was actually made, since no archive is created if there are no files to include, but the timestamp is still updated). This also means, by the way, that just downloading the files .name.gpg, .salt.gpg, and .date.gpg gives you enough to reconstruct the last time the script was run on that directory, the original directory, and the hash (i.e. the corresponding name of the folder on the cloud). It is completely self-contained and reconstructable, so you can (accidentally) delete everything from ~/.local/share/bkup and ~/.cache/bkup and rebuild it from just those three files in each encrypted folder.
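As a sketch, run from inside one of the encrypted folders, the reconstruction mirrors what get_checksum does:

name="$(gpg -d .name.gpg)"      # the path stored by make_repository, relative to ${HOME}
salt="$(gpg -d .salt.gpg)"
lastrun="$(gpg -d .date.gpg)"   # last time the script ran on this directory
echo "directory: ${name}"
echo "last run: ${lastrun}"
echo "remote folder: $(sha1sum <<< "${name}${salt}" | cut -d' ' -f1)"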
[final edit] I have made a repository for the script.