Syncing up to Google Drive, speed with lots of files

Hello, I'm syncing a local drive to Google Drive. The drive has about 800 GB of data and 700,000 files. The initial copy to GDrive was relatively fast, about 3 days, even using the shared client ID.

Subsequent syncs take up to 4 hours for small transfers of around 2 GB. I got my own client ID and it's still very slow.

I notice that the rate is being limited by query volume. It appears that RClone is doing lots of directory queries that exceed Google's allowed queries-per-second rate.

Gsutil rsync to Google Cloud Storage is over 10x faster than RClone for me. I know GDrive and Cloud Storage are different entities, but I notice that gsutil makes a list of all files to be transferred before it starts transferring, and RClone doesn't do this as far as I can tell.

Is there a possibility that RClone making an initial transfer list and then batch transferring would be faster? Is there a way to tell RClone to do this currently? If there isn't, I wouldn't be averse to hacking in some Go code to add this feature if there's interest in doing it.

Thanks.

From what you are describing, --fast-list should do what you need.
By default rclone does not use this (as it's not supported by all cloud services) and queries each folder separately.

--fast-list instead bundles together all (or at least a lot) of the list requests. Not only is this way faster on a drive with many files, it is also much less API intensive.

Gdrive supports this, and in almost all cases it is better to use it - but especially for large and complex syncs.

Do be aware that due to how it works, all listing will happen before the command even starts working - so expect that it might take a minute (or two depending on the size of the directory structure) before you see the command start executing. Just don't freak out when it appears to have stalled - it's normal :slight_smile:
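
For illustration, a sync with the flag enabled might look like the sketch below. The local path and the remote name "gdrive:" are placeholders I made up, not something from your setup, and --dry-run is only there so nothing gets changed on a first try. The snippet just prints the command so you can review it; drop the "echo" to actually run it.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own path and remote name.
SRC="/path/to/local/drive"
DEST="gdrive:backup"

# --fast-list: fewer, larger listing requests instead of one per directory.
# --dry-run:   show what would be transferred without changing anything.
FLAGS="--fast-list --dry-run"

# Printed for review; remove "echo" to run it for real.
echo rclone sync "$SRC" "$DEST" $FLAGS
```

Once the dry run looks right, remove --dry-run as well.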

Cool. --fast-list works like 10x faster. Thanks!

:smiley:
This is why it's always worth reading the docs. There are some "hidden gems" of settings in there. Many of them can benefit you greatly when tweaked for your use case, and many are not enabled by default for various reasons.

Here are 2 more you absolutely should be aware of on Gdrive as it relates to performance:

--drive-chunk-size CHUNKSIZE
Recommend:
--drive-chunk-size 64M
(can be set directly in remote config too using alternate format, see docs)

This determines how large the chunks used for uploads (only) are. Default is 8M - presumably to keep rclone's default memory use low. Each chunk starts a TCP connection from scratch - and if you know how TCP works you know it has to ramp up in speed to reach its full potential. This results in a lot of "saw toothing" if you look at your network graph, where your bandwidth (especially if you have a lot of it) is not getting used anywhere near its full potential. An 8M chunk just finishes so fast that by the time it reaches full bandwidth utilization it has to start again - and again, and much more time is spent "ramping up" than staying at maximum efficiency.

Be aware that this uses more RAM. Specifically (chunk size x number of active transfers). Do not run out of RAM or rclone will crash. Example: 64M x the default 4 transfers = 256M.

64M is what I recommend as a minimum, assuming your system is not very short on RAM. 64M gives fairly close to optimal performance, at least on 100Mbit or so connections.
128M or 256M are viable options too, but each doubling gives you less benefit in return.
64-128M can easily almost double your effective upload speed on a fast connection. However, be aware that it only helps for files that are at least that large. It won't make any difference at all on an 8M file, for example.

I often use 128M, but wouldn't "recommend" it for most users to start with since it starts to be more of a tradeoff. 256M is "better" but only marginally, and at quite a high cost. Only use it if you have more RAM than you know what to do with. Beyond 256M I don't know if it actually continues to scale or if there's a hard cap. I am not able to detect any difference on my 162Mbit upload at least.
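
To put numbers on the RAM cost of a given chunk size, here is a small sketch of the chunk size x transfers arithmetic. The path and the remote name "gdrive:" are placeholders (assumptions), and the result is only a rough estimate of the upload buffers, not of rclone's total memory use. The command is printed rather than run; drop the "echo" to execute it.

```shell
#!/bin/sh
# Rough upload-buffer estimate: chunk size x number of parallel transfers.
CHUNK_MB=64      # value passed to --drive-chunk-size, in M
TRANSFERS=4      # rclone's default for --transfers

RAM_MB=$((CHUNK_MB * TRANSFERS))
echo "Estimated upload buffer RAM: ${RAM_MB}M"

# Placeholder path and remote name (assumptions); remove "echo" to run it.
echo rclone copy /path/to/local/drive gdrive:backup \
  --drive-chunk-size "${CHUNK_MB}M" --transfers "$TRANSFERS"
```

Change CHUNK_MB to 128 or 256 to see how quickly the estimate grows before you commit to a bigger value.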

--drive-server-side-across-configs TRUE/FALSE
Recommend:
--drive-server-side-across-configs=true
(can be set directly in remote config too using alternate format, see docs)

When set to "true" this enables Gdrive (and possibly other Google-based cloud systems?) to do server-side moves and copies between different drives so you don't need to download and re-upload. This is obviously hugely beneficial if you operate more than one drive and do syncing or similar things. The internal Google bandwidth is hundreds of MB or even GB a second and speed is only really limited by the sheer number of files (and the daily upload quota).
There are no special commands to do this. You just use the usual copy, move or sync commands and rclone will try to do it server-side if possible when this option is enabled.

It is disabled by default because it cannot be guaranteed to work between all setups and thus could be confusing, but the more similar your drives are the more likely it is to work just fine.
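
As a sketch of what that looks like in practice, here is a copy between two Drive remotes. The remote names "gdrive1:" and "gdrive2:" and the folder are hypothetical, so substitute the names from your own config. Whether the copy really happens server-side depends on your setup, as noted above. The command is printed for review; drop the "echo" to run it.

```shell
#!/bin/sh
# Hypothetical remote names (assumptions): two Drive remotes from your rclone config.
SRC_REMOTE="gdrive1:projects"
DST_REMOTE="gdrive2:projects"

# With the flag set, rclone attempts a server-side copy and avoids
# downloading and re-uploading when the two drives allow it.
# -v prints progress information so you can watch what it does.
echo rclone copy "$SRC_REMOTE" "$DST_REMOTE" --drive-server-side-across-configs -v
```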

Bonus pro-tip:
If you use a mount AND you normally only upload to Gdrive from one system at a time (not a multi-user environment) then there are some settings you can use to basically pre-cache the attribute and directory information for the whole drive. This means your mount can feel as snappy as a local drive when browsing or searching it, eliminating the "laggy feeling". It also greatly speeds up file access, as most of the time listings are not required and rclone can just ask for the file right away.

It should not be used in a multi-user environment and I would not recommend anyone use this unless they understand the (fairly minor) risks associated with it, so I will leave this as a separate topic. Please tell me if you want to know more and I will elaborate on the details :slight_smile:
