I need to optimize read of large 50GB file on google drive mounted as drive, read is done in parallel and at random file location at 64x places where is just 8-16K read to each place totaling <1MB of data
I tried just mounting drive crashed issue was --timeout so I set it to 1000s and now it works but needed time is still not gr8 to get that read some 30s is needed
I tried --drive-chunk-size 256K an did did not help, looking at other setting I can not think of any that would help
I know that google drive allow parallel read of big files and looks like that work but I guess issue is minimal chink of 256K ? as I need to read only 8-16K per location and up to 64 location at once or as fast as possible
What is your rclone version (output from rclone version)
v.1.55.1
latest fuse driver
Which OS you are using and how many bits (eg Windows 7, 64 bit)
ubuntu 20.04 64bit
Which cloud storage system are you using? (eg Google Drive)
Google drive
The command you were trying to run (eg rclone copy /tmp remote:tmp)
here is my log its quite large, I don`t think there is some issue with rclone I just want to optimize if possible, while i do test which last ~1minute 16s network hit 80MB/s issue is that data which need to be read is for top ~5MB but small chunks of 8-16kb as it would be read from HDD but I know gcloud is not same and file is chunked and accessed by 256K minimum I guess
From the log, I can see your reads are opening and closing the file a lot.
grep 'Flush: err=<nil>' rclon.log | wc -l
198
and
grep OpenReadOnly rclon.log | wc -l
198
You may want to check out vfs-cache-mode full as that uses sparse files to cache what you read from the file so if you are only using chunks, it'll not use that much data locally and it'll help for your use case.
So in my case, I limit the cache size to 750GB and you can see the difference in reported size and actual use:
root@gemini:/cache# du -sh
674G .
root@gemini:/cache# du -sh --apparent-size
1.1T .
yes test is done for 5x at once as I can not do just 1 test, each test read small 8-16k chunk randomly on 64-72 times that's why it read it 198 times, in real use case it would only read it 64-72x times once in 4-5h but each time read is on different place of file I think I only header of file is read up to 8 times rest 64x is random in the ~100Gb file
vfs-cache-mode full
not sure if it would help as it will never ever read same chunks from file... maybe only file header but I still do not know what part/size of header it is (1MB or more have no idea atm)
Each file open and close is dealing with latency as that's the challenge with any cloud based storage. If you aren't ever reading the same piece of data, you'll always have that time to open/seek/read/close and repeat the process many times if that's how it is supposed to work.
There aren't any flags to deal with that as you can't really tune latency.
Yeah I suppose is like that, just if read is done in parallel do file need to be open and close?
Maybe application I use can be optimized but they say it was mode for Backblaze B2 cloud but I wanted to try it with google drive
Are you able to tune the read performance? i am having the same limitation as well but i am using onedrive instead of google drive. the one which i can do it in a very fast way would be using azure file share which is very low latency io.
That would be my last option as the azure file share is not cost efficient for my large files.