Bypassing Google Drive's 2 transfers per second limit on big files

What is the problem you are having with rclone?

I'm mounting my Google Team Drive using rclone. The drive stores very large files (multiple GBs each).

The problem is that I don't need access to the whole files - I'm only streaming small chunks of them at a time, and I'm hitting some kind of IOPS limit. I know that Google limits Drive usage to roughly 2-3 transfers per second, and that's probably what is causing the problem here.

However - has anyone here bypassed this limitation? Since it's a Team Drive, I could access it from multiple accounts at the same time and multiply that 2-3 transactions-per-second limit by the number of available accounts, right?

I've thought about writing my own FUSE file system that would sit on top of multiple rclone mounts and switch between them during reads (like RAID 1 over multiple Google Drive accounts :wink: ). However, I wonder whether I'm over-engineering this and whether there's a simpler way to achieve it?

Cheers

What is your rclone version (output from rclone version)

1.53.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 20.04

Which cloud storage system are you using? (eg Google Drive)

Google Drive

hello and welcome to the forum,

many rcloners stream video files from gdrive without problems, including myself.

it can be confusing, as the docs are about transferring multiple files, not streaming from one file.

can you post the rest of the requested info?

  • the config file, redacting id/pwd.
  • the rclone command.

what application is doing the streaming?
are you trying to stream from multiple files at the same time or just one file at a time?

Config:

[gdrive]
type = drive
client_id = ...
client_secret = ...
scope = drive
token = ...
team_drive =
root_folder_id =

rclone command:

rclone mount gdrive:/ /mnt/gdrive/

I'm using custom-written software that uses big files similarly to a database. It's basically performing a large number of small reads (10-50 KB each).

To simplify things, let's assume that I'm reading small chunks of a single, big file.

I've tried adjusting --vfs-read-chunk-size down to kilobytes, but it hasn't changed much. I'm running these tests on AWS EC2, so the network connection shouldn't be an issue there.
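For reference, the kind of invocation I tried looked something like this (the exact chunk size value varied between runs):

rclone mount gdrive:/ /mnt/gdrive/ --vfs-read-chunk-size 16k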

there are a lot of flags for rclone mount, and so a lot of ways to optimize.

i do not know how running a vm in aws affects rclone performance.
but on a local computer, i would use this
https://rclone.org/commands/rclone_mount/#vfs-cache-mode-full

might try --read-only to mount read-only.
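for example, something like this (using the same paths as your command above):

rclone mount gdrive:/ /mnt/gdrive/ --read-only --vfs-cache-mode full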

what is the file size of that single, big file?

All this sounds a lot like Nicolas' use case here:


Actually it sounds very similar to my problem - I could never solve it. I basically tried all possible flags... The problem got even worse the longer I tried - currently the speed is down to 3 MB/s... So if you find a solution, please let me know too!

Using --vfs-cache-mode full won't help me much, since the file I'm reading is rather big and the read chunks are rather random (so caching won't help much in that case).

To visualize my problem, I've written a simple script that randomly reads 16 KB chunks of data from a large file. For the sake of the problem, let's assume the file is 500 GB.

Script (python):
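A minimal sketch of what the script does, assuming the file lives on the rclone mount (the path, file size, and read count below are placeholders):

# random_read_bench.py - sketch of the random-read benchmark described above
import random
import time

PATH = "/mnt/gdrive/bigfile.bin"   # placeholder: a large file on the rclone mount
CHUNK_SIZE = 16 * 1024             # 16 KB per read
FILE_SIZE = 500 * 1024**3          # assume ~500 GB of addressable data
NUM_READS = 100                    # number of random reads to time

def main():
    with open(PATH, "rb") as f:
        start = time.time()
        for _ in range(NUM_READS):
            # seek to a random offset and read a single 16 KB chunk
            f.seek(random.randrange(0, FILE_SIZE - CHUNK_SIZE))
            f.read(CHUNK_SIZE)
        elapsed = time.time() - start
    print(f"{NUM_READS} reads in {elapsed:.1f}s -> {NUM_READS / elapsed:.2f} reads/s")

if __name__ == "__main__":
    main()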

From my tests it looks like I'm getting ~2 reads per second, which would align with the 2-3 transfers per second mentioned above.

Can I somehow speed this script up?

Interestingly, when I run the above script on multiple machines at the same time (with a different account on each), I still get ~2 reads per second on each.

So it seems that Google Drive serves this file more often overall when multiple accounts are reading it.

I've modified the script slightly to read this file from multiple accounts at the same time, but on a single machine - sadly, this still hits the same 2-reads-per-second wall :frowning:

parallel script: https://gist.github.com/Axadiw/be554f90d955b138314b937e901388a4
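In rough outline it does the following (a sketch rather than the exact gist contents; the mount points below are placeholders, one rclone mount per account):

# parallel_read_bench.py - sketch of the multi-account variant on one machine
import random
import threading
import time

MOUNTS = [                              # placeholder mounts, one per account
    "/mnt/gdrive_account1/bigfile.bin",
    "/mnt/gdrive_account2/bigfile.bin",
    "/mnt/gdrive_account3/bigfile.bin",
]
CHUNK_SIZE = 16 * 1024
FILE_SIZE = 500 * 1024**3
READS_PER_WORKER = 50

def worker(path, results, idx):
    # each thread issues random 16 KB reads against one mount (one account)
    start = time.time()
    with open(path, "rb") as f:
        for _ in range(READS_PER_WORKER):
            f.seek(random.randrange(0, FILE_SIZE - CHUNK_SIZE))
            f.read(CHUNK_SIZE)
    results[idx] = READS_PER_WORKER / (time.time() - start)

def main():
    results = [0.0] * len(MOUNTS)
    threads = [threading.Thread(target=worker, args=(p, results, i))
               for i, p in enumerate(MOUNTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for path, rate in zip(MOUNTS, results):
        print(f"{path}: {rate:.2f} reads/s")
    print(f"total: {sum(results):.2f} reads/s")

if __name__ == "__main__":
    main()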

Hmmm, sorry for a third post from me, but I've realized that since these reads are issued sequentially, the ~2 reads per second might just come from HTTP overhead and network latency, right? (If each request takes roughly half a second end to end, a sequential loop tops out at about 2 reads per second.) Maybe it isn't a limit set by Google capping API call throughput after all?
