Parallel Migrations - API Service Account and rate limits

Hello All,

This is a planning question. I am preparing to run a migration with multiple servers, moving multiple on-prem drives to Google shared drives.

I will be creating service accounts to run these migrations, with domain-wide delegation so that different user accounts create the files on the target shared drives.
e.g.
--drive-impersonate file_admin1@yyy.com
--drive-impersonate file_admin2@yyy.com

Quotas show:
Google Drive API, queries per 100 seconds: 20,000
Google Drive API, queries per 100 seconds per user: 20,000
Google Drive API, queries per day: 1,000,000,000

I am trying to understand the critical path: is it going to be the Drive API calls made by the rclone service account, or the number of user accounts I specify with --drive-impersonate?

Scenario
I have 10 servers, and each server has 6-8 drives to be copied to 6-8 matching Google shared drives (per server).

I am thinking of running all 10 servers over the same period, but doing each drive in serial.
That would be rclone running 4 processes per server, across 10 servers.

Would you match service accounts to servers, or use only a couple of SAs?

10 file_admins (fewer or more?)

My thinking is fewer SAs and at least one file_admin per server (so 10 users copying files), but that is just my first thought.

Note: the goal is to estimate the requirements and time for the migration window. I have looked at the time to do one server at different sizes.

Has anyone done anything similar?

Thanks in advance.

Eric White.

Rclone will keep within the 20,000 queries per 100 seconds itself due to its own rate limiting. None of these 3 limits will be a problem in practice.

The problem you will hit is the 750 GB/day upload limit. This is per user or service account, I believe.

You can work around that in various ways, but using --bwlimit 8M is the easiest. Using more service accounts will help here: to maximise this you'll need 1 SA per running rclone. Note that 8 MiB/s is about 64 Mbit/s, so the number of rclones you run needs to take your upload bandwidth into account.
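As a rough check on why --bwlimit 8M fits under the daily cap, here is a small arithmetic sketch. It assumes the 750 GB limit is counted in decimal gigabytes and that rclone uploads continuously for 24 hours; the function names are just for illustration. Note that 8 MiB/s is 64 Mibit/s, or about 67 Mbit/s in decimal units.

```python
# Back-of-envelope check of --bwlimit against the 750 GB/day upload cap.
# Assumption: the cap is 750 decimal GB (750 * 10**9 bytes) per account.
DAILY_LIMIT_BYTES = 750 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60
MIB = 1024**2


def max_rate_mib_per_s(limit_bytes=DAILY_LIMIT_BYTES):
    """Highest sustained rate (MiB/s) that stays within the daily cap."""
    return limit_bytes / SECONDS_PER_DAY / MIB


def daily_bytes(bwlimit_mib_per_s):
    """Bytes uploaded in 24 hours when running flat out at the given rate."""
    return bwlimit_mib_per_s * MIB * SECONDS_PER_DAY


def mbit_per_s(bwlimit_mib_per_s):
    """The same rate expressed in decimal megabits per second."""
    return bwlimit_mib_per_s * MIB * 8 / 10**6


if __name__ == "__main__":
    print(f"max sustained rate: {max_rate_mib_per_s():.2f} MiB/s")
    print(f"8 MiB/s for a day:  {daily_bytes(8) / 10**9:.1f} GB")
    print(f"8 MiB/s on the wire: {mbit_per_s(8):.1f} Mbit/s")
```

Under these assumptions, 8 MiB/s leaves a little headroom below the cap, while 9 MiB/s would go over it. Multiply the per-process rate by the number of rclone processes to see what the site's uplink has to carry.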

Note also that there is a limit on the number of files on shared drives:

A shared drive can contain a maximum of 400,000 items, including files, folders, and shortcuts.

So that is worth bearing in mind too.

Thank you Nick,

Good points. We are doing a precheck for the 400k object limit and for folders nested 20 levels deep.

I had not factored in the 750 GB limit, as these are separate sites but all part of the same primary domain.

Do you know if that limit is domain-centric? That is, if I created multiple GCP projects so that the API key and SA were in different projects, would the limit apply per project? Each site has 8 drives, in the order of 100-200 GB, but yes, 10 sites would be well over 750 GB.
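For anyone wanting to do a similar precheck, a minimal sketch might look like the following. It is not the script used here; it simply walks a local source tree, counts files and folders, and records the deepest folder nesting, using the 400,000 item and 20 level figures mentioned in this thread as thresholds. The function name and the returned dictionary are assumptions made for illustration.

```python
import os

MAX_ITEMS = 400_000  # shared drive item limit quoted in this thread
MAX_DEPTH = 20       # folder nesting level mentioned in this thread


def precheck(root):
    """Count items under root and find the deepest folder nesting level.

    Folders directly under root are depth 1; root itself is not counted.
    """
    root = os.path.abspath(root)
    items = 0
    max_depth = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Every file and folder becomes one item on the shared drive.
        items += len(dirnames) + len(filenames)
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        max_depth = max(max_depth, depth)
    return {
        "items": items,
        "max_depth": max_depth,
        "ok": items <= MAX_ITEMS and max_depth <= MAX_DEPTH,
    }
```

Calling `precheck()` on each source drive before the migration window gives an early warning about drives that would need to be split across more than one shared drive.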

More to consider.

I think the limit will be 750 GB/day per SA, so you probably just need more SAs rather than creating a new domain. So one SA per site sounds like it would work, if I'm understanding you correctly.

Thank you.

I think we agree on an approach. I am going to write it up and pass it up to a Google rep, and get some input from their side.

Summary of Plan:

One GCP project
10 rclone service accounts with domain-wide delegation
file_admin[1-5] (not clear whether it will be important to distribute the file creation workload)

With My Drive there were issues with creating all the files with one account. I don't believe that is an issue with shared drives, but I think that is a question for Google to confirm.

Thanks for all the suggestions.

Note also that the single-server testing on 200 to 300 GB of data ran at about 8-10 GB per hour without any bwlimit, and slower on a server with lots of small files.

I have not done any parameter tuning on the number of threads etc. (I did some testing with more threads but that didn't seem to have much impact)
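To turn those test figures into a window estimate, a simple sketch like this can help. It assumes the measured 8-10 GB per hour holds for the whole copy and that a site's data is copied serially by one rclone process; the 200 GB site size is only an example input, and the function names are made up for illustration.

```python
# Rough migration-window estimate from measured throughput.
# Assumptions: throughput stays constant and the data is copied serially.
DAILY_CAP_GB = 750  # per-account upload cap discussed in this thread


def hours_to_copy(size_gb, rate_gb_per_hour):
    """Hours needed to copy size_gb at a steady rate."""
    return size_gb / rate_gb_per_hour


def daily_volume_gb(rate_gb_per_hour):
    """GB moved in 24 hours at a steady rate."""
    return rate_gb_per_hour * 24


if __name__ == "__main__":
    for rate in (8, 10):
        print(f"200 GB at {rate} GB/h: {hours_to_copy(200, rate):.1f} h, "
              f"{daily_volume_gb(rate)} GB/day")
```

At these rates a single process moves well under 750 GB in a day, so the daily cap only starts to matter once several processes share one account.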

The Drive API sucks at small files and creating them will be slow as you only get 2-3 per second.
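To see what that rate means for a drive full of small files, here is a quick sketch. It assumes the 2-3 files per second figure applies to a single rclone process and ignores transfer time for the file contents, so treat the result as a lower bound; the function name is hypothetical.

```python
def hours_to_create(n_files, files_per_second):
    """Hours spent on file creation alone at a fixed creation rate."""
    return n_files / files_per_second / 3600


if __name__ == "__main__":
    # A shared drive filled to the 400,000 item limit.
    for rate in (2, 3):
        print(f"400,000 files at {rate}/s: {hours_to_create(400_000, rate):.1f} h")
```

For drives dominated by small files, this estimate rather than the GB per hour figure is likely to set the length of the window.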

Can I just say to Nick and the rclone team:

Congratulations, rclone is such a super-strong product. I have mixed it in with Apache Airflow to make a killer migration suite.

Thanks heaps.
