I'm mounting my Google Team Drive using rclone. The drive stores very large files (multiple GBs each).
The problem is that I don't need access to the whole files - I'm only streaming small chunks of them at a time, and I'm hitting some kind of IOPS limit. I know that Google limits Drive usage to 2-3 transfers per second, and that's probably causing the problem here.
However - has anyone here bypassed this limitation? Since it's a Team Drive, I could access it from multiple accounts at the same time and multiply that 2-3 transactions per second limit by the number of available accounts, right?
I've thought about writing my own FUSE file system that would sit on top of multiple rclone mounts and switch between them during reads (like RAID 1 over multiple Google Drive accounts). However, I wonder whether I'm overengineering this, and maybe there's a simpler way to achieve it?
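The "RAID 1 over multiple mounts" idea could be prototyped in user space before committing to a full FUSE layer. A minimal sketch, assuming each account's rclone mount exposes the same file under a different path (the paths here are hypothetical):

```python
import itertools


class RoundRobinReader:
    """Rotate reads across several mounts of the same remote file.

    Each path should point at the same file through a different rclone
    mount (one per account), e.g.
    ["/mnt/gdrive-a/big.bin", "/mnt/gdrive-b/big.bin"].
    """

    def __init__(self, paths):
        self._files = [open(p, "rb") for p in paths]
        self._cycle = itertools.cycle(self._files)

    def read_at(self, offset, length):
        f = next(self._cycle)  # pick the next mount in rotation
        f.seek(offset)
        return f.read(length)

    def close(self):
        for f in self._files:
            f.close()
```

This only helps if the per-account limit is enforced per account rather than per file, which is exactly what the thread is trying to establish.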
Cheers
What is your rclone version (output from rclone version)
1.53.3
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Ubuntu 20.04
Which cloud storage system are you using? (eg Google Drive)
I'm using custom-written software that uses big files similarly to a database. It basically performs a large number of small reads (10-50 KB each).
To simplify things, let's assume I'm reading small chunks of a single big file.
I've tried adjusting --vfs-read-chunk-size down to kilobytes, but it didn't change much. I'm running this on AWS EC2, so the network connection shouldn't be an issue.
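For reference, a mount invocation with a small initial chunk size might look something like this (remote name, mount point, and values are assumptions, not what was actually used):

```shell
# Hypothetical mount: start with small ranged requests and cap their growth.
rclone mount teamdrive: /mnt/gdrive \
    --vfs-read-chunk-size 64K \
    --vfs-read-chunk-size-limit 1M \
    --daemon
```

Note that shrinking the chunk size reduces per-request transfer volume, not the number of requests - so if the bottleneck is requests per second, this flag alone won't help.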
Actually, this sounds very similar to my problem - I could never solve it. I basically tried all possible flags... The problem got even worse the longer I tried - currently speed is down to 3 MB/s... So if you find a solution, please let me know too!
Using --vfs-cache-mode full won't help me much, since the file I'm reading is rather big and the read chunks are fairly random, so caching won't do much in that case.
To visualize my problem, I've written a simple script that randomly reads 16K chunks of data from a large file. For the sake of the problem, let's assume the file is 500 GB.
Script (python):
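The script itself didn't survive in this copy of the thread; a minimal sketch of what such a random-read benchmark could look like (the mount path is a placeholder):

```python
import os
import random
import time


def benchmark_random_reads(path, num_reads=50, chunk_size=16 * 1024):
    """Read `num_reads` random 16 KiB chunks from `path`, return reads/sec."""
    size = os.path.getsize(path)
    start = time.time()
    with open(path, "rb") as f:
        for _ in range(num_reads):
            # Pick a random offset that leaves room for a full chunk.
            offset = random.randrange(0, max(1, size - chunk_size))
            f.seek(offset)
            f.read(chunk_size)
    elapsed = time.time() - start
    return num_reads / elapsed


# Usage: point it at a file on the rclone mount, e.g.
# print(benchmark_random_reads("/mnt/gdrive/bigfile.bin"))
```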
From my tests it looks like I'm getting ~2 reads per second, which would align with the 2-3 transfers per second mentioned above.
Interestingly, when I run the above script on multiple machines (with a different account on each) at the same time, I still get ~2 reads per second on each.
So it means Google Drive serves this file more frequently when multiple accounts are reading it.
I've modified the script slightly to read the file from multiple accounts at the same time on a single machine, but sadly it still hits the same 2-reads-per-second wall.
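One thing worth checking in the single-machine test is whether the reads across accounts were actually issued in parallel: if they run one after another, the per-request latency still dominates. A hedged sketch that fires the reads concurrently with a thread pool (mount paths are placeholders):

```python
import random
from concurrent.futures import ThreadPoolExecutor

CHUNK = 16 * 1024


def read_chunk(path, offset, length=CHUNK):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def concurrent_reads(paths, file_size, reads_per_mount=10):
    """Issue reads against every mount in parallel, not one after another."""
    jobs = []
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        for path in paths:
            for _ in range(reads_per_mount):
                offset = random.randrange(0, file_size - CHUNK)
                jobs.append(pool.submit(read_chunk, path, offset))
        return [j.result() for j in jobs]


# Usage, assuming one mount per account:
# concurrent_reads(["/mnt/gdrive-a/big.bin", "/mnt/gdrive-b/big.bin"],
#                  file_size=500 * 1024**3)
```

If the aggregate rate scales with the number of mounts here but not in the sequential version, the bottleneck is latency rather than a hard server-side cap.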
Hmmm, sorry for posting a third reply, but I've realized that since these reads are issued sequentially, the ~2 reads per second might just come from HTTP overhead and network latency, right? Maybe it isn't actually a Google-side limit on API call throughput at all?