I have a use case where I need to zip certain folders on certain Google Shared Drives [ORIG], then put the resulting zip file in another destination Shared Drive [DEST].
Since there are different ways to do this with rclone, I'd like some feedback on which option you think is the most reasonable in terms of safety and performance. I see 3 ways:
1. Do an rclone copy from ORIG to a local disk, compress the folder locally, then rclone copy the zips to DEST. Easy, but it will need lots of local disk space. Can be done in a cloud VM though.
2. Do an rclone mount of ORIG and compress the folder there, so the resulting zip file is already on the drive. Then move it to DEST from the Drive web interface. Not sure if this could cause bottlenecks. It is also possible that I hit the upload/download quota, and I am not sure rclone mount+zip will handle this well. I know rclone copy handles it well.
3. Do an rclone mount of both ORIG and DEST, and zip from ORIG with the output going to DEST. This could ease bottlenecks, but I am not sure it is appropriate.
Does anyone have some hints? It is difficult for me to benchmark all this in advance, since the use cases vary greatly. Sometimes it is literally 200k files totaling a dozen TB. Sometimes it is a few files but very big ones. And so on.
For a zip, all bytes have to be read client-side anyway, so mounting both drives does not really avoid the expensive part. I would not write the zip back into the mounted source either; with lots of small files that is usually the most fragile option. If you can, run this on a VM close to Google, write archives to local/attached disk first, then use rclone copy to send the finished zips to DEST. For dozen-TB cases I would also split per top-level folder or date range, so one failure does not mean rebuilding one huge zip.
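A rough sketch of that per-folder flow in Python, in case it helps. Everything named here is a placeholder I made up for illustration: the remotes `orig:` and `dest:archives`, the scratch layout, and the helper `archive_folder`. The only rclone command it relies on is `rclone copy`; the zip step uses the standard-library `shutil.make_archive`. Treat it as a starting point under those assumptions, not a tested production script.

```python
import shutil
import subprocess
from pathlib import Path


def archive_folder(orig_remote, folder, scratch_dir, dest_remote, run=subprocess.run):
    """Download one top-level folder, zip it on scratch disk, upload the zip.

    orig_remote must end in ':' or '/' (e.g. the hypothetical 'orig:').
    `run` is injectable so the flow can be dry-run without rclone installed.
    """
    scratch = Path(scratch_dir)
    local_src = scratch / "src" / folder
    zip_base = scratch / "zips" / folder
    zip_base.parent.mkdir(parents=True, exist_ok=True)

    # 1. Pull the folder from the source Shared Drive onto scratch disk.
    run(["rclone", "copy", f"{orig_remote}{folder}", str(local_src)], check=True)

    # 2. Compress locally; make_archive appends '.zip' and returns the path.
    zip_path = shutil.make_archive(str(zip_base), "zip", root_dir=str(local_src))

    # 3. Upload only the finished archive to the destination Shared Drive.
    run(["rclone", "copy", zip_path, dest_remote], check=True)

    # 4. Free scratch space. This is only reached if the upload did not raise.
    shutil.rmtree(local_src)
    Path(zip_path).unlink()
    return zip_path
```

Calling it once per top-level folder, for example `archive_folder("orig:", "FolderA", "/scratch", "dest:archives")`, keeps a failure limited to that folder, and scratch usage is bounded by the largest single folder plus its zip.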
Actually, I think I will use rar and split the archive into smaller .part files. It is really not feasible for me to micromanage the archiving of all those sub-folders. This is for an organization, not for personal use.
If you go with rar split parts on a VM, I would still write the archive to local/attached scratch disk first, then use `rclone copy` for the finished parts to DEST. Writing archive parts directly through a mount can make retries and partial files more annoying if the mount stalls. I would also do one small folder end-to-end and test extraction before starting the large set. For Shared Drives, clear names plus a small text log of the source folder next to the archive can save a lot of guessing if one chunk has to be rerun.
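For the small text log, something along these lines might be enough. It is only a sketch: the helper names `sha256_of` and `write_manifest`, the JSON layout, and the file names are my own assumptions, not anything rclone or rar produces. It records the source folder plus the size and SHA-256 of each finished part, so a rerun chunk can be compared against the earlier run.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so large parts never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(source_folder, part_paths, manifest_path):
    """Write a small JSON log describing which parts belong to which source folder."""
    parts = []
    for part in sorted(str(p) for p in part_paths):
        parts.append({
            "name": Path(part).name,
            "bytes": Path(part).stat().st_size,
            "sha256": sha256_of(part),
        })
    manifest = {"source_folder": source_folder, "parts": parts}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

The manifest file can then be uploaded with the same `rclone copy` that sends the parts, so it ends up next to the archive on DEST.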
a major possible advantage of using mount is that after each part is written to the mount, rclone can upload/move that part in real-time. so the total local storage requirements would be greatly reduced.
and if 7zip needs to access the headers/footers of the individual parts, using --vfs-cache-mode=full would greatly reduce the total local space required. tho i would tweak certain --vfs flags to reduce the size of each chunk that is downloaded from gdrive.
tho, all that needs to be tested using a debug log
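A possible starting point for such a test run, written as a small Python helper that only builds the command line. The flags are real `rclone mount` / global flags (`--vfs-cache-mode`, `--cache-dir`, `--vfs-cache-max-size`, `--vfs-read-chunk-size`, `--vfs-read-chunk-size-limit`, `--log-level`, `--log-file`), but the helper name `build_mount_command`, the remote, the paths, and the sizes are placeholder guesses that would need tuning against the debug log.

```python
def build_mount_command(remote, mountpoint, cache_dir, log_file,
                        cache_max_size="50G", read_chunk_size="32M"):
    """Build an rclone mount argv with full VFS caching and a debug log.

    All values are illustrative assumptions, not recommended settings.
    """
    return [
        "rclone", "mount", remote, mountpoint,
        # Cache file data on local disk so the archiver can seek in parts.
        "--vfs-cache-mode", "full",
        "--cache-dir", cache_dir,
        # Upper bound for the on-disk cache.
        "--vfs-cache-max-size", cache_max_size,
        # Size of each ranged read from Drive; the limit caps how far
        # rclone may grow that chunk size on sequential reads.
        "--vfs-read-chunk-size", read_chunk_size,
        "--vfs-read-chunk-size-limit", read_chunk_size,
        # Debug log to inspect what the mount actually does.
        "--log-level", "DEBUG",
        "--log-file", log_file,
    ]
```

For example, `build_mount_command("dest:", "/mnt/dest", "/scratch/cache", "/scratch/mount.log")` returns a list that can be passed to `subprocess.run` once the values look sensible.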
Yes. That's what I thought. However, if it is going to be more reliable with copy, I can live with having to provision more storage. It is not for personal use, and the advantage of cloud VMs is that you can delete/recreate the disk space easily. It is not going to be running 24/7, only when these types of archiving operations need to be done.
mount is most reliable. with copy and mount, if a file fails to upload, rclone will retry.
for 7+ years, i have been using a cheap hetzner cloud vm and hetzner storagebox.
i have yet to find a better deal than that.
with that combo, can easily and cheaply 7zip files.
the important thing is that the cloud vm can mount the storagebox as local storage.
do not pay for ingress, egress and so on.
i use it also as a backup repository for veeam backup and replication.