Bypassing Google Drive's 2 transfers per second limit on big files

What is the problem you are having with rclone?

I'm mounting my Google Team Drive using rclone. The drive stores very large files (multiple GBs each).

The problem is that I don't need access to the whole files - I'm only streaming small chunks of them at a time, and I'm hitting some kind of IOPS limit. I know that Google limits Drive usage to roughly 2-3 transfers per second, and that's probably what is causing the problem here.

However - has anyone here bypassed this limitation? Since it's a Team Drive, I could access it from multiple accounts at the same time and multiply that 2-3 transactions-per-second limit by the number of available accounts, right?

I've thought about writing my own FUSE file system that would sit on top of multiple rclone mounts and switch between them during reads (like RAID 1 over multiple Google Drive accounts :wink: ). However, I wonder whether I'm over-engineering this and whether there's a simpler way to achieve it?

Cheers

What is your rclone version (output from rclone version)

1.53.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 20.04

Which cloud storage system are you using? (eg Google Drive)

Google Drive

hello and welcome to the forum,

many rcloners stream video files from gdrive without problems, including myself.

it can be confusing, as the docs are about transferring multiple files, not streaming from one file.

can you post the rest of the requested info?

  • the config file, redacting id/pwd.
  • the rclone command.

what application is doing the streaming?
are you trying to stream from multiple files at the same time or just one file at a time?

Config:

[gdrive]
type = drive
client_id = ...
client_secret = ...
scope = drive
token = ...
team_drive =
root_folder_id =

rclone command:

rclone mount gdrive:/ /mnt/gdrive/

I'm using custom-written software that uses big files similarly to a database. It's basically performing a large number of small reads (10-50 KB each).

To simplify things, let's assume that I'm reading small chunks of a single, big file.

I've tried adjusting --vfs-read-chunk-size down to kilobytes, but it hasn't changed much. I'm running these tests on AWS EC2, so the network connection shouldn't be an issue there.
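For reference, the kind of invocation I tried looked something like this (the exact chunk size value varied between runs):

rclone mount gdrive:/ /mnt/gdrive/ --vfs-read-chunk-size 16k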

there are a lot of flags for rclone mount, and so a lot of ways to optimize.

i do not know how running a vm in aws affects rclone performance.
but on a local computer, i would use this
https://rclone.org/commands/rclone_mount/#vfs-cache-mode-full

might try --read-only to mount read-only.
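for example, something like this (using the same paths as your command above):

rclone mount gdrive:/ /mnt/gdrive/ --read-only --vfs-cache-mode full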

what is the file size of that single, big file?

All this sounds a lot like Nicolas' use case here:


Actually it sounds very similar to my problem - I could never solve it. I basically tried all possible flags... The problem got even worse the longer I tried - currently the speed is down to 3 MB/s... So if you find a solution, please let me know too!

Using --vfs-cache-mode full won't help me much, since the file I'm reading is rather big and the read chunks are rather random (so caching won't help much in that case).

To visualize my problem, I've written a simple script that randomly reads 16 KB chunks of data from a large file. For the sake of the problem, let's assume the file is 500 GB.

Script (python):
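A minimal sketch of what the script does, assuming the file lives on the rclone mount (the path, file size, and read count below are placeholders):

# random_read_bench.py - sketch of the random-read benchmark described above
import random
import time

PATH = "/mnt/gdrive/bigfile.bin"   # placeholder: a large file on the rclone mount
CHUNK_SIZE = 16 * 1024             # 16 KB per read
FILE_SIZE = 500 * 1024**3          # assume ~500 GB of addressable data
NUM_READS = 100                    # number of random reads to time

def main():
    with open(PATH, "rb") as f:
        start = time.time()
        for _ in range(NUM_READS):
            # seek to a random offset and read a single 16 KB chunk
            f.seek(random.randrange(0, FILE_SIZE - CHUNK_SIZE))
            f.read(CHUNK_SIZE)
        elapsed = time.time() - start
    print(f"{NUM_READS} reads in {elapsed:.1f}s -> {NUM_READS / elapsed:.2f} reads/s")

if __name__ == "__main__":
    main()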

From my tests it looks like I'm getting ~2 reads per second, which would align with the 2-3 transfers per second mentioned above.

Can I somehow speed this script up?

Interestingly, when I run the above script on multiple machines at the same time (with a different account on each), I still get ~2 reads per second on each.

So it seems that Google Drive serves this file more often overall when multiple accounts are reading it.

I've modified the script slightly to read this file from multiple accounts at the same time, but on a single machine - sadly, this still hits the same 2-reads-per-second wall :frowning:

parallel script: https://gist.github.com/Axadiw/be554f90d955b138314b937e901388a4
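In rough outline it does the following (a sketch rather than the exact gist contents; the mount points below are placeholders, one rclone mount per account):

# parallel_read_bench.py - sketch of the multi-account variant on one machine
import random
import threading
import time

MOUNTS = [                              # placeholder mounts, one per account
    "/mnt/gdrive_account1/bigfile.bin",
    "/mnt/gdrive_account2/bigfile.bin",
    "/mnt/gdrive_account3/bigfile.bin",
]
CHUNK_SIZE = 16 * 1024
FILE_SIZE = 500 * 1024**3
READS_PER_WORKER = 50

def worker(path, results, idx):
    # each thread issues random 16 KB reads against one mount (one account)
    start = time.time()
    with open(path, "rb") as f:
        for _ in range(READS_PER_WORKER):
            f.seek(random.randrange(0, FILE_SIZE - CHUNK_SIZE))
            f.read(CHUNK_SIZE)
    results[idx] = READS_PER_WORKER / (time.time() - start)

def main():
    results = [0.0] * len(MOUNTS)
    threads = [threading.Thread(target=worker, args=(p, results, i))
               for i, p in enumerate(MOUNTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for path, rate in zip(MOUNTS, results):
        print(f"{path}: {rate:.2f} reads/s")
    print(f"total: {sum(results):.2f} reads/s")

if __name__ == "__main__":
    main()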

Hmmm, sorry for a third post from me, but I've realized that since these reads are issued sequentially, the ~2 reads per second might just come from HTTP overhead and network latency, right? (If each request takes roughly half a second end to end, a sequential loop tops out at about 2 reads per second.) Maybe it isn't a limit set by Google capping API call throughput after all?
