Optimize Gdrive mount for fast lookups

What is the problem you are having with rclone?

Hi,
I have many large databases (>100 GB!) on Gdrive, mounted read-only with rclone. I need to randomly look up some data (for example, a 256 KB chunk at a random offset in a file) and I need to do it as fast as possible (max 2 seconds allowed). How can I accomplish this with rclone? What options are advised? Would the cache backend be useful at all? I'm already using my own client id.
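For reference, a single lookup I need to serve looks roughly like this over the mount (the file name and path are made up for illustration):

    # time one 256 KB read at a random offset of a mounted database
    FILE=/home/pi/some-database.db                  # hypothetical file on the mount
    BLOCKS=$(( $(stat -c%s "$FILE") / 262144 ))     # file size in 256 KB blocks
    time dd if="$FILE" of=/dev/null bs=256K count=1 \
        skip=$(( (RANDOM * 32768 + RANDOM) % BLOCKS )) 2>/dev/null

If a read like that consistently finishes under 2 seconds, I'm good.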

What is your rclone version (output from rclone version)

rclone v1.45

  • os/arch: linux/arm
  • go version: go1.11.6
I can update, or compile the beta from scratch, no problem...

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Raspberry Pi 4 8 GB, 64-bit, so I can use 4 threads and a max of 6 GB of RAM.
I might install Gentoo for max performance, idk :wink:

Which cloud storage system are you using?

Google Drive

Thank you!

hello and welcome to the forum,

that is a very old version of rclone; you really need to update to the latest stable, v1.55.1.
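if it helps, the standard rclone install script can fetch the latest stable or beta in one step:

    # latest stable
    curl https://rclone.org/install.sh | sudo bash
    # latest beta
    curl https://rclone.org/install.sh | sudo bash -s beta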

it is really hard to give good advice, as you did not answer the questions from the help template.

I don't have logs or a config because my question is about a system that does not exist yet and is going to be built in the next few days. You can assume a typical config for Google Drive and a typical rclone mount command line.

i thought that meant you had an rclone command and config file?

that buggy cache backend, which never left beta, has been deprecated

i can make it up; the config looks like:

    [RO]
    type = drive
    client_id = lol
    client_secret = blabla
    scope = drive.readonly
    token = {"access_token":"blabla","token_type":"Bearer","refresh_token":"blabla","expiry":"2021-04-22T11:21:33.3364267+02:00"}
    root_folder_id = 1VK-blabla

And I guess the cmdline is like:

rclone mount RO:/ /home/pi/

What about the VFS cache?

  1. pi4 - if using an sd card and/or external usb drive, random access might be very slow. i find my pi4 very slow at many basic tasks.
  2. vfs cache - you need to update rclone to the latest stable first
  3. gdrive has lots of throttling

every use case is different, test using a simple command and establish a baseline.
rclone mount RO: /home/pi -vv

then try rclone mount RO: /home/pi -vv --vfs-cache-mode=full
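if full cache mode helps, the cache can be kept warm across restarts with a persistent cache dir; a sketch, where the paths and ages are just examples:

    rclone mount RO: /home/pi -vv \
        --vfs-cache-mode full \
        --cache-dir /home/pi/rclone-cache \
        --vfs-cache-max-age 720h \
        --dir-cache-time 1h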

At the moment, the fastest config and command line I could come up with in my experiments are the following:

    [ro]
    type = drive
    scope = drive.readonly
    v2_download_min_size = 0
    pacer_min_sleep = 1ms
    disable_http2 = false

    rclone mount ro: db/ --vfs-read-chunk-size=64k --poll-interval=1h --dir-cache-time=2h \
        --buffer-size=0 --cache-dir /tmp/rclone --vfs-cache-mode full --no-checksum \
        --no-modtime --read-only --vfs-read-wait 0 --max-read-ahead 0 --use-mmap \
        --fast-list --checkers 2 --no-check-certificate --multi-thread-cutoff 0 \
        --multi-thread-streams 2 --vfs-cache-max-age 10000h -q --use-cookies

The logs are similar to this snippet (log taken when the chunk size was set to 32k):

    ChunkedReader.Read at 31591882752 length 16384 chunkOffset 31591870464 chunkSize 32768
    Read: read=16384, err=<nil>
    Read: len=12288, offset=21037834240
    ReadFileHandle.seek from 31591899136 to 21037834240 (fs.RangeSeeker)
    ChunkedReader.RangeSeek from 31591899136 to 21037834240 length -1
    ChunkedReader.Read at -1 length 12288 chunkOffset 21037834240 chunkSize 32768
    ChunkedReader.openRange at 21037834240 length 32768
    Read: read=12288, err=<nil>
    Read: len=16384, offset=21037846528
    ChunkedReader.Read at 21037846528 length 16384 chunkOffset 21037834240 chunkSize 32768
    Read: read=16384, err=<nil>
    Read: len=12288, offset=7004708864
    ReadFileHandle.seek from 21037862912 to 7004708864 (fs.RangeSeeker)
    ChunkedReader.RangeSeek from 21037862912 to 7004708864 length -1
    ChunkedReader.Read at -1 length 12288 chunkOffset 7004708864 chunkSize 32768
    ChunkedReader.openRange at 7004708864 length 32768
    Read: read=12288, err=<nil>
    Read: len=16384, offset=7004721152
    ChunkedReader.Read at 7004721152 length 16384 chunkOffset 7004708864 chunkSize 32768
    Read: read=16384, err=<nil>

rclone version is the latest, compiled from scratch (rclone v1.56.0-DEV).
Any advice to improve further? /tmp is in RAM.
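For reference, the RAM-backed cache dir is just a tmpfs mount, roughly like this (the 4g size is what fits in my 6 GB RAM budget):

    sudo mkdir -p /tmp/rclone
    sudo mount -t tmpfs -o size=4g tmpfs /tmp/rclone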

--fast-list does nothing on a mount.

as for the RAM drive, not sure that would help, as the main delay is the slow network.

as a test,
i would get a free trial account at wasabi, an s3 clone known for hot storage.
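a wasabi remote is just an s3 config; a sketch (keys redacted, endpoint is the default us-east-1 one):

    [wasabi]
    type = s3
    provider = Wasabi
    access_key_id = XXX
    secret_access_key = XXX
    endpoint = s3.wasabisys.com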

Hi,
thank you for the suggestion.
I did some profiling of the disk reads in the DB application; I can rule out the app being slow. Each read takes from 430 to 600 ms, which I think is impressive, very fast, but unfortunately not fast enough for my use case (it needs to be at least 30% faster).
The developer console says their APIs answer in 100 ms. I don't know if they're telling the truth or not...
As for wasabi, I'm too lazy to reupload the files there :smiley: I will try to convince myself...

did you test and find that no buffer was faster than using the buffer?

Drive doesn't work for chia: the response times vary randomly, and you can reach the download limit easily, because when you download a small part of one file, google counts the whole file size as downloaded. With 100 GB files you get limited fast.

That's just not right.

wrong.
With the one-week-old chiapos update, which introduces parallel reads for complete proofs, these are the complete-proof lookup times in seconds:
14.73
17.31
13.13
14.34
24.92
13.77
14.99
14.61
As you can see, they are all below 30 seconds, so a complete proof can be looked up within the timeframe, and a reward can be won.
My bet is that a custom implementation of the GDrive APIs could further improve times. (difficult period for me, so i can't work on this stuff at the moment)
A good question is: does a k=33 plot have 2x the lookups, or does it have the same number of lookups with 2x the size? If it's the latter, you can further improve your chances by making larger plots.
Another question is how it will go with pools. We will see...

Do you get that with the mount commands you posted above? I have tried, but I get timeouts for the challenges.

Yep.
Pool farming is working; one day I even got 100% successful points.
I'm not getting api limits, idk why you are.
Good luck :slight_smile:

I see, I will try creating a pool plot. It wasn't an api limit; it was the official chia cli plot checker that wasn't working. I will test it on an actual pool plot then. Many thanks

I was searching for an answer as well, about whether the number of lookups changes with bigger plots.

Did you make any progress with this? I'm testing as well but getting times of 20-50 secs. I'm on dropbox.

Just test it. With rclone's debug options you can see how many reads are done and how big they are.
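for example, with the log strings from the snippet posted earlier (assuming the mount was run with -vv --log-file=/tmp/rclone.log):

    # roughly, each openRange is a new ranged request to the remote
    grep -c 'ChunkedReader.openRange' /tmp/rclone.log
    # seeks that forced the reader to jump to a new offset
    grep -c 'ChunkedReader.RangeSeek' /tmp/rclone.log
    # distribution of individual read sizes
    grep -o 'length [0-9]*' /tmp/rclone.log | sort | uniq -c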
