Backup Dropbox data directly to immutable S3

Hello everyone,

I'm currently trying to set up a backup solution to save my Dropbox data to AWS S3 in an immutable way, and being a complete beginner when it comes to rclone, I have a few questions. I have read a lot of posts, so I have an idea of a possible solution, but I would gladly hear the opinion of rclone experts.

I first wanted to use something like restic (to save space with deduplication), but I am afraid of repository corruption.

I plan to configure 5 rclone remotes:

  • dropbox
  • awss3 (storage_class = STANDARD)
  • awss3encrypted (type = crypt, remote = awss3:/encrypted)
  • awss3da (storage_class = DEEP_ARCHIVE)
  • awss3daencrypted (type = crypt, remote = awss3da:/encrypted)
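In rclone.conf terms, I imagine the five remotes looking something like this (the bucket name and region are my placeholders, the crypt passwords would be generated by rclone config, and I believe the crypt remote's path needs an actual bucket name, not just /encrypted):

```
[dropbox]
type = dropbox

[awss3]
type = s3
provider = AWS
env_auth = true
region = eu-west-1
storage_class = STANDARD

[awss3encrypted]
type = crypt
remote = awss3:my-backup-bucket/encrypted
password = <generated by rclone config>

[awss3da]
type = s3
provider = AWS
env_auth = true
region = eu-west-1
storage_class = DEEP_ARCHIVE

[awss3daencrypted]
type = crypt
remote = awss3da:my-backup-bucket/encrypted
password = <generated by rclone config>
```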

On awss3encrypted (S3 Standard), I plan to keep:

  • 6 daily incrementals (I read that --backup-dir can help with that), with a 1-week retention
  • 3 weekly full backups, with a 1-month retention

On awss3daencrypted (S3 Glacier Deep Archive), I plan to keep the following (since Deep Archive has a minimum storage duration of 180 days, I can't use it for the daily/weekly backups):

  • 11 monthly full backups, with a 1-year retention
  • 5 yearly full backups, with a 5-year retention

And I plan to set up IAM roles that allow only read/write access (no deletion, which would be handled by retention policies).
I read that I can have immutable data with the --immutable flag.
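As a sketch, the IAM policy I have in mind would look something like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-backup-bucket",
        "arn:aws:s3:::my-backup-bucket/*"
      ]
    }
  ]
}
```

One thing I'm not sure about: --backup-dir moves files on the destination, which I believe needs s3:DeleteObject under the hood, so it might conflict with this no-delete policy.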

So basically I plan to run the following commands:

Full backup on Monday:
rclone copy dropbox: awss3encrypted:/full/20211026 --immutable --progress

Incremental backup Tuesday to Sunday:
rclone sync dropbox: awss3encrypted:/full/20211026 --backup-dir=awss3encrypted:/incremental/20211027 --immutable --progress
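As a sketch, the cron script I have in mind would look like this. The backup_cmd function only prints the rclone command line so I can check the scheduling logic without touching the remotes; the remote names match my list above, and the dates are examples:

```shell
#!/bin/sh
# Weekly schedule sketch: full copy on Mondays, --backup-dir incremental
# the rest of the week. Prints the command instead of running it.
backup_cmd() {
    day=$1     # today, YYYYMMDD
    dow=$2     # day of week, 1 = Monday (date +%u)
    monday=$3  # date of this week's full backup, YYYYMMDD
    if [ "$dow" -eq 1 ]; then
        echo "rclone copy dropbox: awss3encrypted:/full/$day --immutable --progress"
    else
        echo "rclone sync dropbox: awss3encrypted:/full/$monday --backup-dir=awss3encrypted:/incremental/$day --immutable --progress"
    fi
}

backup_cmd "$(date +%Y%m%d)" "$(date +%u)" 20211025
```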

And with another cron for the monthly/yearly Deep Archive backups:

Every month:
rclone copy dropbox: awss3daencrypted:/monthly/20211101 --immutable --progress

Every year:
rclone copy dropbox: awss3daencrypted:/yearly/20220101 --immutable --progress

I am not sure if rclone copy works with the DEEP_ARCHIVE storage class?

To save on the number of file transfers on the S3 side, is there a way to tar the Dropbox data during the operation? Another solution would be to first copy the data onto the AWS Fargate container, tar the files, then copy the archive to S3, but I was really hoping to do the backup in one command.

With that strategy, I would be using a lot of storage given the number of full backups (even though Deep Archive is cheap).

Due to how Glacier works, I guess I could not do the same kind of incremental backup as what I plan for the daily/weekly backups.

I am wondering if something like this could work:

rclone copy dropbox: awss3daencrypted:/monthly/20211101 --immutable --progress

The first month of the year, a full backup directly to S3 Standard instead of Glacier (still with a one-year retention). Then at the beginning of each of the next 11 months, I'd do an incremental sync, with a lifecycle transition policy to move the objects to Deep Archive after one year (as I guess I cannot sync directly to DEEP_ARCHIVE?):

rclone sync dropbox: awss3daencrypted:/monthly/20211101 --backup-dir=awss3daencrypted:/monthly-incremental/20211201

But I am really worried about data corruption. With the --backup-dir flag, have there already been cases of data corruption? And if it happens, what could be done to fix it (just run another incremental)?

I saw @asdffdsa mentioned he used Veeam to create his backups. In my case, I'd like to execute the backup directly from the cloud, since the data is pulled straight from Dropbox (maybe in AWS Fargate), so I would not retain the Veeam backup chain. That's why I was leaning towards rclone, which seems to handle the cloud-to-cloud transfer without having to keep the data on disk (just in memory).

What would be the advantage of using Veeam here instead of rclone incrementals? And in the case of pure full monthly/yearly backups, is there any point at all in using something like Veeam, or does its power only shine when you set up things like forever forward incremental / reverse incremental? Here I'm basically trying to do a GFS-style backup while managing the sync-to-cloud part myself.

One more thing: in the case of incremental backups with the --backup-dir flag, what would the restore process look like? Do I have to download the last full backup plus all the incremental backups (up to the day I want to restore), then manually copy the incrementals over the full backup, or is there another way?
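To make the question concrete, here is a local simulation of what I think the restore would look like, with plain directories and cp standing in for the remotes and rclone copy: start from the latest full backup, then overlay the incremental directories newest-first so the oldest displaced version of each file wins.

```shell
#!/bin/sh
# Local simulation of restoring a --backup-dir chain. "full" holds the
# latest synced state; each incremental/D holds the pre-sync versions of
# files changed by the sync on day D.
set -e
rm -rf /tmp/bdemo && mkdir -p /tmp/bdemo && cd /tmp/bdemo
mkdir -p full incremental/20211028 incremental/20211029 restore

printf 'v3' > full/a.txt                  # current version (changed on the 29th)
printf 'v2' > incremental/20211029/a.txt  # version displaced by the 29th's sync
printf 'v1' > incremental/20211028/a.txt  # version displaced by the 28th's sync

# Restore the state as of the end of 2021-10-27: start from the full
# backup, then overlay the incrementals newest-to-oldest, so the oldest
# displaced version of each file ends up winning.
cp -r full/. restore/
for day in 20211029 20211028; do
    cp -r "incremental/$day/." restore/
done
cat restore/a.txt   # prints v1
```

One wrinkle I see: files created after the restore point would still be present (nothing records that they did not exist yet), which is part of why I'm asking whether there is a better native way.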

Anyway thanks for creating such a great tool !

if the source is dropbox and the dest is aws, then veeam is not a solution.

how much data do you have in dropbox?
how much data do you add to dropbox per day on average?

aws does not charge for ingress, so you can rent a cheap virtual machine from aws, tar the files inside it, and push them to aws s3.
if you search the forum, there are some good topics about creating tar files using rclone rcat.
or you can use an rclone mount and command-line linux tools like tar.
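here is a rough sketch of the rcat idea: pipe tar's stdout straight into rclone so the archive never lands on disk. the rclone line is commented out and a local pipe stands in below, since the remote name is just a placeholder.

```shell
#!/bin/sh
# the real thing would be something like (remote name is a placeholder):
#   tar -cf - /path/to/data | rclone rcat awss3encrypted:/full/20211026.tar
#
# local stand-in to show the pipe itself without needing rclone:
set -e
rm -rf /tmp/tardemo && mkdir -p /tmp/tardemo/data
printf 'hello' > /tmp/tardemo/data/file.txt
tar -C /tmp/tardemo -cf - data | cat > /tmp/tardemo/backup.tar
tar -tf /tmp/tardemo/backup.tar   # lists the archived paths
```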

Thanks for your answer @asdffdsa !
But do you mean I can't copy directly from Dropbox to AWS S3, and that commands like rclone copy dropbox: awss3encrypted:/full/20211026 --immutable --progress would fail?
I don't have a lot of data (20 GB) but lots of files (200k-ish). Changes per day are really small (100-1000 files).
I'd rather not have a permanent VPS doing the backup; I'd prefer a cron scheduler that starts a container (in Fargate) to do the backup.
I could even live with doing full backups every day if that simplifies the restore process (as it seems there is no native rclone function that can pull the incrementals directly from the remote to restore a specific day).
The tar part is not the most crucial either; I'm more worried about being rate-limited by Dropbox (due to the number of files; that is why I asked @ncw about the possibility of integrating the download_zip endpoint that the Dropbox API exposes: Zip folder before download (Dropbox)).
Also, could you please explain further why you are using Veeam forever forward incremental instead of rclone incrementals (via --backup-dir)? I have another dataset for which I could configure a permanent local backup.

that will work

veeam is not an option for cloud to cloud.
it is for bare-metal backup of windows machines.
it can be used for backing up files but i never used that feature.

If you have an immutable object store, then --backup-dir will not be a good option, since it works by moving files that would be overwritten or deleted into that directory. Instead, you will want to use something like --compare-dest on the root and then sync to new directories. This makes it hard to figure out what has been deleted between backups, but it provides a way to copy only new/modified files into a new directory.
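A rough sketch of that pattern (remote names are placeholders; the runnable part below simulates the comparison locally with plain shell, since the real commands need live remotes): seed a baseline directory once, then each day copy only new or changed files into a fresh dated directory by comparing against the baseline.

```shell
#!/bin/sh
# the real commands would be something like:
#   rclone copy dropbox: awss3encrypted:/baseline                 # once
#   rclone copy dropbox: awss3encrypted:/incremental/20211027 \
#       --compare-dest awss3encrypted:/baseline                   # daily
#
# local simulation of the compare-dest idea: copy a file into the dated
# directory only if it is missing from, or differs from, the baseline.
set -e
rm -rf /tmp/cdemo && mkdir -p /tmp/cdemo/src /tmp/cdemo/baseline /tmp/cdemo/day2
printf 'old' > /tmp/cdemo/src/a.txt
cp /tmp/cdemo/src/a.txt /tmp/cdemo/baseline/   # day 1: seed the baseline
printf 'new' > /tmp/cdemo/src/a.txt            # a.txt modified since
printf 'x'   > /tmp/cdemo/src/b.txt            # b.txt is brand new
cd /tmp/cdemo/src
for f in *; do
    cmp -s "$f" "/tmp/cdemo/baseline/$f" || cp "$f" /tmp/cdemo/day2/
done
ls /tmp/cdemo/day2   # only the delta (a.txt and b.txt) was copied
```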

Or I am misunderstanding your question.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.