Optimising mount for lots of small random reads across multiple files at once

@ncw I have been reading this with interest.

My use case requires lots of very small (1 or 2 MB) random reads (between 7 and 64 of them) from larger files, using an rclone mount with Google Drive. The reads are never the same, so --vfs-cache-mode full doesn't seem to help. I'm trying to optimise for the fastest possible seeks/reads.

I’ve spent quite some time reading and playing with different flags etc, so I don’t think I’m looking for specific help with that. I’m just wondering whether there is anything in the pipeline that might help me, particularly around concurrent reads without using a cache / ways to speed up random seeks/reads?

what is the total size of the large files?

perhaps consider testing at wasabi, an s3 clone, which is known for hot storage.
wasabi does not have all those gdrive api and other limits.
might get better performance.

The scheme outlined in the thread you linked to still seems to me to be quite a good one!

What we could do is something like this for --vfs-cache-mode off (rough sketch after the list):

  • we get a read for offset X - open the file and return data
  • we get a read for offset Y which needs seeking
    • currently we close the reader and reopen at Y
    • instead we open a new reader for this
  • we can now read at both places. If a reader is not read for 5 seconds we can close it.
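
To make that concrete, here is a minimal sketch in Go of the idea (not rclone code: `openAt`, `readerPool` and friends are made-up names, and `openAt` stands in for whatever opens the remote object at a byte offset, e.g. a ranged HTTP GET). It keeps one reader per position, reuses a reader whose offset matches the requested read, opens a new one otherwise, and closes any reader that has sat idle for 5 seconds.

```go
// Sketch only, not rclone code: a pool of readers so that reads at
// different offsets don't force a close-and-reopen of a single reader.
package main

import (
	"bytes"
	"io"
	"sync"
	"time"
)

// openAt stands in for whatever opens the remote object at a byte offset.
type openAt func(offset int64) (io.ReadCloser, error)

// stream is one open reader plus the offset its next Read will return.
type stream struct {
	rc       io.ReadCloser
	offset   int64
	busy     bool
	lastUsed time.Time
}

type readerPool struct {
	mu      sync.Mutex
	open    openAt
	streams []*stream
}

func newReaderPool(open openAt) *readerPool {
	p := &readerPool{open: open}
	go func() { // reap readers that have been idle for 5 seconds
		for range time.Tick(time.Second) {
			p.closeIdle(5 * time.Second)
		}
	}()
	return p
}

// ReadAt reuses a reader already positioned at off, or opens a new one.
func (p *readerPool) ReadAt(buf []byte, off int64) (int, error) {
	s, err := p.acquire(off)
	if err != nil {
		return 0, err
	}
	n, err := io.ReadFull(s.rc, buf)
	p.release(s, int64(n))
	return n, err
}

func (p *readerPool) acquire(off int64) (*stream, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, s := range p.streams {
		if !s.busy && s.offset == off {
			s.busy = true
			return s, nil
		}
	}
	rc, err := p.open(off) // no reader at this offset: open a new one
	if err != nil {
		return nil, err
	}
	s := &stream{rc: rc, offset: off, busy: true, lastUsed: time.Now()}
	p.streams = append(p.streams, s)
	return s, nil
}

func (p *readerPool) release(s *stream, n int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	s.offset += n // the reader is now positioned after the bytes just read
	s.busy = false
	s.lastUsed = time.Now()
}

// closeIdle closes readers that nobody has read from for longer than d.
func (p *readerPool) closeIdle(d time.Duration) {
	p.mu.Lock()
	defer p.mu.Unlock()
	keep := p.streams[:0]
	for _, s := range p.streams {
		if !s.busy && time.Since(s.lastUsed) > d {
			s.rc.Close()
			continue
		}
		keep = append(keep, s)
	}
	p.streams = keep
}

func main() {
	data := make([]byte, 64<<20) // stand-in "remote" object of 64 MB
	pool := newReaderPool(func(off int64) (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(data[off:])), nil
	})
	buf := make([]byte, 2<<20) // one 2 MB read at a 10 MB offset
	if _, err := pool.ReadAt(buf, 10<<20); err != nil {
		panic(err)
	}
}
```

The trade-off is a few extra open streams per file in exchange for not paying the close-and-reopen penalty on every seek; the idle timeout keeps the number of open streams bounded.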

There isn't any development on this in the works at the moment, but it is a nice idea...

I would have thought --vfs-cache-mode full would give better performance for you.

Note that opening files on Google Drive is quite slow at the best of times, and every time you seek you have to open the file afresh, so you may be at the limit of what is possible, I don't know.
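
To illustrate what "open the file afresh" costs: for an HTTP-style backend, reading from a new offset means issuing a brand new ranged request (plus, on Drive, whatever API work is needed to start the download), so each seek pays a full round trip before the first byte arrives. A rough sketch of that pattern in Go, with a placeholder URL and none of the Drive-specific auth:

```go
// Rough illustration only: each seek becomes a fresh ranged GET against
// the backend, which is why random reads pay a round trip every time.
package main

import (
	"fmt"
	"io"
	"net/http"
)

// openAtOffset starts a new download of url beginning at off by sending
// an HTTP Range header. Every seek repeats this round trip.
func openAtOffset(client *http.Client, url string, off int64) (io.ReadCloser, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", off))
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusPartialContent && resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return resp.Body, nil
}

func main() {
	// Placeholder URL; a real Drive download would also need auth headers.
	rc, err := openAtOffset(http.DefaultClient, "https://example.com/big-file", 10<<20)
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer rc.Close()
	buf := make([]byte, 2<<20) // read a 2 MB chunk at that 10 MB offset
	n, _ := io.ReadFull(rc, buf)
	fmt.Println("read", n, "bytes")
}
```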

Thanks for this. I tried Wasabi (only uploaded 1TB of data as that's the maximum allowed on the free trial). I am getting much better performance. Reads are quicker and much more predictable in terms of latency. The only issue is the price (which is amazing compared to most other storage solutions out there, but still much more expensive than Drive).

I think I've got some choices to make to balance price vs performance.

Cheers for the recommendation!

Thanks Nick.

I think you may be right. I tested GD against Wasabi (as recommended by @asdffdsa above) and got much more predictable reads (sometimes GD was quicker, but on average Wasabi was much better, and I didn't get some of the much longer reads that I do with GD).

I'll keep testing anyway, and will keep an eye out for any rclone development that may help optimise further.

Cheers!

good to know.

as for price, it depends on the use-case.
i have local backup servers, so i rarely access the cloud data.
i keep the latest veeam and other backups in wasabi,
and older backups in aws s3 deep glacier for $1.01/TB/month,
so the overall pricing is very cheap.

it is great that you can use one tool with both gdrive and s3 wasabi.

Interesting. You don't pay for IOs on Drive, so they throttle them so that you don't use too many.

However, you do pay for IOs on AWS S3, so they make them go as fast as possible.

I always find AWS S3 fast in my testing, if you want to try the reassuringly expensive cloud storage...

but not with wasabi
