Best way to copy millions of files (Arq Backup) from ACD to Google Drive?

I’m trying to copy my Arq backup folder from Amazon Cloud Drive to Google Drive and want to figure out the best way to do it. Currently I have a VPS in Google Cloud Compute copying the files, but it is taking quite some time. There are over 3 million files in the Arq directory, all very small, roughly 1 MB to 5 MB each. Are there any flags I should be adding or changing to speed things up?

This is the command I’m using now in the Google Cloud Compute VM. The VM has 2 vCPUs and 7.5 GB of memory, running Ubuntu.

rclone copy --transfers=50 --buffer-size=1000M --no-traverse --checkers=50 -v amazon:"Arq Backup Data" googleapps:"Arq Backup Data"
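One thing worth checking before adding more flags: rclone allocates `--buffer-size` per transfer, so the command above can, in the worst case, try to buffer far more memory than the VM’s 7.5 GB of RAM. A quick sanity calculation (the tuned flags in the comment are a sketch with guessed values, not benchmarks):

```shell
# --buffer-size applies to each transfer in rclone, so the worst-case
# buffer memory for the command above is roughly:
transfers=50
buffer_mb=1000
echo "worst case: $((transfers * buffer_mb)) MB of buffers"   # far above 7.5 GB

# A gentler variant for this VM might look like (flags all exist in rclone,
# but the exact values here are guesses, not measured tuning):
#   rclone copy --transfers=16 --checkers=32 --buffer-size=16M \
#       --fast-list -v amazon:"Arq Backup Data" googleapps:"Arq Backup Data"
```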

I use GoodSync for Linux and make the TIX files with the Windows version (my Arq backup is around 500 GB).

I’d be interested to hear if you get this to work. I spoke to Arq support and they said this won’t work, or rather that “they don’t support it.” So I don’t imagine you can simply attach the copied backups to Arq.

Theoretically, it should work. Once all the files are copied over to Google Drive, you should be able to add Google Drive to the Arq application. You will see your backup under the Google Drive Restore Files list and should be able to click “adopt this backup set”. Arq should scan the files, compare them against local files, and then pick up where it last left off.

Contact Arq, because I do not think this will work. I’ve asked them this before (moving the raw files from one provider to another), and I’m quite certain they said it won’t work.

Just test it with a small backup.

I have also asked, and they told me there may be different implementations of Arq on different cloud providers, but they said to try it and that they would try to help me if there are issues.

I have tested a local Arq backup, uploaded it to Amazon Cloud Drive through the web interface, and it works: Arq sees the backup correctly (and restores it).
The next step is to copy this one to Google Drive, but I’m pretty sure it will work, as the files are in a simple format and it works when you upload local files (that may be different with S3 or Glacier, maybe).

I have just tried the copy from Amazon to GDrive now and it worked correctly, thanks to your command line. It will probably take a long time, but Arq is OK with that.

My test:

  • Take a 1 GB folder
  • Arq Backup to a local folder
  • Upload the files through the Amazon Cloud Drive web page (~700 files)
  • Connect Arq to ACD: it sees the backup and can restore it
  • Rclone from ACD to GDrive with the command given by the author of this post
  • Connect Arq to GDrive: it sees the backup and can restore it
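For a larger run, a cheap sanity check after the copy is to list both remotes and compare file names. The `rclone lsf -R` commands in the comments use the remote names from earlier in the thread; the listing data below is made up so the comparison itself is runnable:

```shell
# In practice the two listings would come from something like:
#   rclone lsf -R amazon:"Arq Backup Data"     | sort > acd.txt
#   rclone lsf -R googleapps:"Arq Backup Data" | sort > gdrive.txt
# Stand-in data here, so the comparison step can be seen on its own:
printf 'objects/o1\npacksets/p1\npacksets/p2\n' | sort > acd.txt
printf 'objects/o1\npacksets/p1\npacksets/p2\n' | sort > gdrive.txt

# comm -3 prints only lines unique to one side; empty output means the
# two trees hold the same file names.
if [ -z "$(comm -3 acd.txt gdrive.txt)" ]; then
  echo "listings match"
fi
```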

Well then, I might try it. I have mine stored in an appdata folder; hopefully that will work OK, or I might just delete all my Google Drive backups and start fresh.

On a 1 GB backup it worked correctly, but on a 1 TB backup, when I do the copy between Google and Amazon, some objects are reported as duplicates and some others don’t work. I don’t think the result will be correct. Maybe the best way would be to download everything and then upload everything.
Well, sorry to say, but maybe Arq has specific behaviour to avoid duplicate errors when uploading to the cloud. I will try to finish the copy and see if it works, but it may not be good.
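One possible source of those duplicate errors: Google Drive allows several files with the same name in one folder, and rclone ships a `dedupe` command aimed at exactly that. A sketch (the listing data is made up; the `dedupe` invocation is commented out because it is interactive by default):

```shell
# Real rclone command for cleaning up same-name duplicates on a Drive remote:
#   rclone dedupe googleapps:"Arq Backup Data"

# Spotting repeated names in a listing with plain shell tools; uniq -d
# prints only the lines that occur more than once:
printf 'pack1\npack2\npack2\npack3\n' | sort | uniq -d   # prints: pack2
```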

Actually, I would expect Arq to be more up front about the incompatibilities between targets, so I was very frustrated to learn the hard way (after moving terabytes of data) :frowning:

Since I’m in the File Stream beta, I’m using that to upload, since it seems to handle API limits differently. I have a Google Cloud Compute VM with 8 GB of RAM and 2 vCPUs, plus a 1.5 TB drive, and I use a symbolic link so the File Stream cache lives on that drive. Then I basically just download the Arq files at a steady rate into the “virtual” drive File Stream creates.
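The symlink trick might look something like this; the paths are purely illustrative (the real File Stream cache location varies by OS), so treat it as a sketch, not the actual setup:

```shell
# Illustrative paths only: park the File Stream cache on the big data disk
# by swapping the cache directory for a symlink.
mkdir -p big-disk/gdfs-cache                     # stand-in for the 1.5 TB mount
ln -sfn "$PWD/big-disk/gdfs-cache" drivefs-cache # cache path now points at big disk
readlink drivefs-cache                           # shows where the cache really lives
```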

If you run it too fast, it can crash the client, so I’m using 8 transfers. On my local machine I did 64 transfers but used a 100 KB max size to handle all the small files first. It did well for a while. I unlinked Arq from my Google Drive and deleted all the Arq data, so that I’m doing this transfer fresh; I’ll reconnect when all of them are done.
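The small-files-first pass maps onto rclone’s size filters (`--max-size` and `--min-size` are real rclone flags; the thresholds and transfer counts are just the ones mentioned above). The `find` lines demonstrate the same size split locally with stand-in files:

```shell
# Pass 1: many parallel transfers for files under 100 KB (small files first):
#   rclone copy --transfers=64 --max-size 100k amazon:"Arq Backup Data" googleapps:"Arq Backup Data"
# Pass 2: fewer transfers for everything at or above the threshold:
#   rclone copy --transfers=8 --min-size 100k amazon:"Arq Backup Data" googleapps:"Arq Backup Data"

# The same size split, shown locally with stand-in files:
mkdir -p demo && head -c 1024 /dev/zero > demo/small.bin
head -c 200000 /dev/zero > demo/big.bin
find demo -type f -size -100k   # matches only demo/small.bin
```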