My scenario has about 25TB of unencrypted data on a Google Drive that needs to be copied to a crypt-based Google Drive remote.
Most are large files (> 100 MB), so I'm not expecting to hit the API request limits, only the daily 750GB upload limit.
Since I have no experience with Google's cloud platform, I'm throwing this out there for suggestions on what would be the cheapest VM configuration to achieve that.
Maybe a fast VM that would reach the 750GB limit in one hour (and then shuts off for the day if that can be automated) comes out cheaper than a slow, memory-starved configuration that runs 24/7.
Has anybody researched this?
Well, after a quick test, it seems that the absolute cheapest is literally free.
An f1-micro instance (free-tier, forever supposedly) is only 0.2 vCPU and 0.6GB RAM.
I launched a Debian 9 instance and managed about 78MB/sec on default rclone settings.
This is interesting. Do I understand correctly that you would need to download your data from your unencrypted "Google Drive" to a "Google Cloud Storage" that is connected to your VM instance, and then upload from "Google Cloud Storage" back to encrypted remote on Google Drive? Or, are you able to move directly from unencrypted folder to encrypted folder on Google Drive without the need for "Google Cloud Storage"?
I am also wondering what egress charges apply in either case.
No, I don't use Google Cloud Storage. I just selected the default 10 GB for the VM system disk (reported as Standard persistent disk) for the OS installation. No other storage is attached/mounted to the VM.
I rclone copy from remote (unencrypted) to remote (encrypted).
From what the documentation states, there are no ingress/egress charges in this case (drive <--> compute).
I made an rclone script to copy until it reaches a predetermined max-transfer size, and set up a cron job to launch the script once a day. Without a max-transfer size, I guess I would reach the daily 750GB limit in about 3-5 hours.
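As a rough sanity check on that guess, here is the arithmetic, assuming the 78MB/sec from my test held steady for the whole run (it may well not) and decimal units (1 GB = 1000 MB):

```shell
#!/bin/sh
# Rough estimate only: assumes a constant transfer rate and 1 GB = 1000 MB.
rate_mb_s=78        # rate observed in the quick test
daily_cap_gb=750    # daily Google Drive upload limit

seconds=$(( daily_cap_gb * 1000 / rate_mb_s ))
minutes=$(( seconds / 60 ))
echo "Time to reach ${daily_cap_gb}GB at ${rate_mb_s}MB/s: about ${minutes} minutes"
```

That comes out to roughly 160 minutes at a sustained rate, so 3-5 hours leaves room for slower stretches.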
Also, since the instance is free-tier, there is no need to automate further. I can leave it on permanently.
Are you planning on running check (cryptcheck) once you've finished copying the data? I'm in the same scenario, using an f1-micro instance on Google
I wasn't until you mentioned it, because this isn't critical data in my case.
But I guess it would be an interesting experiment since the VM is free and all. And because it's literally sitting there doing nothing for the majority of the time.
Now, I don't think the best way to approach this is to check once I've finished copying all the data (50 days from now).
It would be better, I think, if we were able to check incrementally after each batch of uploading.
In my case I have a cron script that does something like:
rclone copy unecrypted:path encrypted:path --max-transfer 500G --cutoff-mode=soft
Once it reaches the 500GB target the script throws an error and exits, and then it starts again 24 hours later.
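For what it's worth, the "error" here should be rclone's exit code 8 (transfer limit reached), so a wrapper could treat that as a normal end of the day's run. A minimal sketch, not my exact script: the function name daily_copy and the RCLONE override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch of a daily wrapper. RCLONE can be overridden, e.g. for a dry test.
RCLONE="${RCLONE:-rclone}"

daily_copy() {
    # $1 = unencrypted remote:path, $2 = crypt remote:path
    rc=0
    "$RCLONE" copy "$1" "$2" --max-transfer 500G --cutoff-mode=soft || rc=$?
    case "$rc" in
        0) echo "copy finished: nothing left to transfer" ;;
        # Exit code 8 means the --max-transfer limit was reached, which is expected here.
        8) echo "daily cap reached: will resume on the next run"; rc=0 ;;
        *) echo "rclone failed with exit code $rc" >&2 ;;
    esac
    return "$rc"
}

# Example call from the cron script (remote names as in the command above):
# daily_copy unecrypted:path encrypted:path
```

This way cron only sees a failure when something other than the cap stopped the copy.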
So the question is: how can I complement this to cryptcheck the encrypted remote, but also keep a list of files that have already been checked, so that the check can skip them? Is there a flag that can help, or can we put together a simple bash script that will help us get there?
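One possible shape for such a script, as an untested sketch: list what is on the encrypted remote, subtract the files already recorded as checked, and cryptcheck only the rest. It assumes an rclone version where cryptcheck accepts --match (I believe that arrived with the reporting flags on check), and the STATE_DIR location, file names and function name are all made up.

```shell
#!/bin/sh
RCLONE="${RCLONE:-rclone}"
STATE_DIR="${STATE_DIR:-$HOME/.cryptcheck-state}"   # hypothetical location

incremental_cryptcheck() {
    # $1 = unencrypted remote:path, $2 = crypt remote:path
    mkdir -p "$STATE_DIR"
    checked="$STATE_DIR/checked.txt"
    present="$STATE_DIR/present.txt"
    todo="$STATE_DIR/todo.txt"
    matched="$STATE_DIR/matched.txt"
    touch "$checked"

    # Files currently on the crypt remote (lsf shows the decrypted names).
    "$RCLONE" lsf -R --files-only "$2" > "$present" || return 1

    # Keep only files not yet verified; grep exits 1 when nothing is left.
    grep -vxFf "$checked" "$present" > "$todo" || true
    [ -s "$todo" ] || { echo "nothing new to check"; return 0; }

    # Check just those files; --match writes the paths that verified OK.
    rm -f "$matched"
    rc=0
    "$RCLONE" cryptcheck "$1" "$2" --files-from "$todo" --match "$matched" || rc=$?
    if [ -f "$matched" ]; then cat "$matched" >> "$checked"; fi
    return "$rc"
}

# Example: incremental_cryptcheck unecrypted:path encrypted:path
```

Only files that matched are added to the list, so anything that failed or errored gets retried on the next run. One caveat: --files-from treats lines starting with # or ; as comments, so file names like that would need different handling.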
Sorry for the late reply. The low-tech option would be to manually split your data into folders, then copy and check each folder sequentially. As long as each folder doesn't exceed 10TB there shouldn't be any problems. I say 10TB because that's the daily download limit on Google Drive, and cryptcheck involves downloading the source file, encrypting it, and comparing the resulting checksum with the destination's checksum.
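If you'd rather not do that entirely by hand, the folder-by-folder approach could be scripted along these lines. This is only a sketch: the function name, the DONE_LIST file and the folder names in the example are hypothetical, and it reuses the 500G max-transfer setting from the earlier post so a large folder gets copied over several daily runs.

```shell
#!/bin/sh
RCLONE="${RCLONE:-rclone}"
DONE_LIST="${DONE_LIST:-$HOME/verified-folders.txt}"   # hypothetical path

copy_then_check() {
    # $1 = unencrypted remote:path, $2 = crypt remote:path, rest = folder names
    src=$1; dst=$2; shift 2
    touch "$DONE_LIST"
    for folder in "$@"; do
        # Skip folders that were copied and verified on an earlier run.
        grep -qxF "$folder" "$DONE_LIST" && continue
        # Stop for the day if the copy is cut off (exit code 8) or fails.
        "$RCLONE" copy "$src/$folder" "$dst/$folder" \
            --max-transfer 500G --cutoff-mode=soft || return $?
        # Only reached once the whole folder has been copied.
        "$RCLONE" cryptcheck "$src/$folder" "$dst/$folder" || return $?
        echo "$folder" >> "$DONE_LIST"
    done
}

# Example with made-up folder names:
# copy_then_check unecrypted:path encrypted:path folder1 folder2
```

Run daily from cron, it picks up at the first folder that isn't in the done list yet.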