Actually, being able to read ahead in a controlled way would be an advantage in a scenario where the user thread reads sequentially. It would be great if the data were always sitting in AsyncReader's buffer, ready to be consumed. Are you suggesting that we should not use downloaders in this case either?
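To make the read-ahead idea concrete, here is a minimal sketch of the kind of controlled read-ahead I mean, written against a plain io.Reader (just an illustration, not rclone's actual AsyncReader):

```go
package readahead

import "io"

// Reader pre-fetches from src in a background goroutine so that
// sequential Read calls usually find data already buffered.
type Reader struct {
	ch   chan []byte // filled buffers, ready to be consumed
	errc chan error  // first error from the filler (often io.EOF)
	cur  []byte      // buffer currently being drained
	err  error
}

// New starts a filler goroutine that keeps up to depth buffers of
// bufSize bytes read ahead of the consumer. No cancellation, for brevity.
func New(src io.Reader, bufSize, depth int) *Reader {
	r := &Reader{
		ch:   make(chan []byte, depth),
		errc: make(chan error, 1),
	}
	go func() {
		for {
			buf := make([]byte, bufSize)
			n, err := io.ReadFull(src, buf)
			if n > 0 {
				r.ch <- buf[:n]
			}
			if err != nil {
				if err == io.ErrUnexpectedEOF {
					err = io.EOF // a short final buffer is not an error
				}
				r.errc <- err
				close(r.ch)
				return
			}
		}
	}()
	return r
}

// Read drains the current buffer, pulling the next pre-fetched one
// from the channel when it runs out.
func (r *Reader) Read(p []byte) (int, error) {
	for len(r.cur) == 0 {
		buf, ok := <-r.ch
		if !ok {
			if r.err == nil {
				r.err = <-r.errc
			}
			return 0, r.err
		}
		r.cur = buf
	}
	n := copy(p, r.cur)
	r.cur = r.cur[n:]
	return n, nil
}
```

With a depth of a few buffers, the consumer's next Read is normally served from memory while the filler goroutine fetches further ahead.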
Since, in our use case, the streams are activated by processes running inside a VM, we potentially need a large number of ChunkedReaders. But I guess we can start with a slice.
The main question is whether we should avoid using downloaders if/when we do not use full cache mode.
Hmm, maybe the first thing to do is to have a play with the downloaders and see if they can be pressed into service with the buffer filling you suggested. I'll have a go at that. Maybe that will converge the designs, which would be nice, so we can use the downloaders in both places...
Upon a closer look at the trace of our use case, I found that the 2X performance gain we got from the new full cache mode came from cache hits. I ran experiments with the current master and with the current master with the mutex fix reverted. The time taken for the VM to boot from S3 storage was similar between the two; both are equally good.
It's not clear to me why allowing multiple VFS requests to be served concurrently did not boost the performance further. Was it due to a bottleneck elsewhere? Are the S3 requests queued on a single HTTP connection?
Does rclone use a single connection for all requests? What is the maximum aggregate bandwidth that has been observed with multi-thread downloads? And how many fetches per second? I just hope to understand whether there is room for further performance optimization.
The main open issue for our use case (if we use full cache mode) is the need to be able to purge the cache. We do not have the luxury of a cache space larger than the file that we are reading. The whole VM image is read sequentially in its entirety while it is booting, and it generates random reads at the same time. We have no writes through this FUSE mount. So it should be relatively easy to clear the cache when we run out of space?
$ rclone copyto -vv s3:rclone/50M --bwlimit 1M /tmp/50M --multi-thread-cutoff 10M
2020/07/03 11:40:07 DEBUG : rclone: Version "v1.52.2-195-ga7c1b32b-fix-drive-list-teamdrives-beta" starting with parameters ["rclone" "copyto" "-vv" "s3:rclone/50M" "--bwlimit" "1M" "/tmp/50M" "--multi-thread-cutoff" "10M"]
2020/07/03 11:40:07 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2020/07/03 11:40:07 INFO : Starting bandwidth limiter at 1MBytes/s
2020/07/03 11:40:07 DEBUG : Local file system at /tmp/50M: Waiting for checks to finish
2020/07/03 11:40:07 DEBUG : Local file system at /tmp/50M: Waiting for transfers to finish
2020/07/03 11:40:07 DEBUG : 50M: Starting multi-thread copy with 4 parts of size 12.500M
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 4/4 (39321600-52428800) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 3/4 (26214400-39321600) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 2/4 (13107200-26214400) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 1/4 (0-13107200) size 12.500M starting
And here is the output of netstat while it is running:
$ netstat -tuanp | grep rclone
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 984297 0 10.2.0.9:35074 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 814061 0 10.2.0.9:35078 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 924320 0 10.2.0.9:35070 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 906621 0 10.2.0.9:35072 52.95.148.82:443 ESTABLISHED 519130/rclone
So you can see 4 connections open.
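That is standard Go net/http behaviour rather than anything rclone-specific: a shared http.Client pools idle connections, but will dial a new TCP connection for any request that finds the pool busy. (S3 endpoints speak HTTP/1.1, where concurrent requests need separate connections; HTTP/2 would multiplex them instead.) Here is a minimal standalone sketch of that behaviour, with https://example.com/ standing in for a real object URL:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	client := &http.Client{} // one shared client with the default pooled Transport
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine asks for its own byte range, like one
			// stream of a multi-thread copy.
			req, err := http.NewRequest("GET", "https://example.com/", nil)
			if err != nil {
				fmt.Println(err)
				return
			}
			req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", i*1048576, (i+1)*1048576-1))
			resp, err := client.Do(req)
			if err != nil {
				fmt.Println(err)
				return
			}
			defer resp.Body.Close()
			n, _ := io.Copy(io.Discard, resp.Body)
			fmt.Printf("stream %d: %s, %d bytes\n", i, resp.Status, n)
		}(i)
	}
	wg.Wait() // run netstat while this is in flight to count connections
}
```

Running netstat alongside it should show one ESTABLISHED connection per in-flight request.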
Can you try netstat to see if you see the same? And with the mount?
I know that people quite regularly fill 10 Gbit pipes with rclone downloads. I haven't seen this myself though as I don't have the hardware!
Almost certainly! I haven't paid much attention to optimization. The Go tools are very good here if you want to help - there is a list of some of the things you can measure at https://rclone.org/rc/#other-profiles-to-look-at
The tricky part is clearing the backing store for Open files. Getting the writeback to the buffer working first would help here.
[Edit: I noticed that I probably misunderstood what you meant by "the writeback to the buffer" in the sentence above. When I first wrote my response, I thought you were referring to the approach we discussed earlier, where the downloader passes the newly requested data through a buffer supplied by the caller. But perhaps you actually meant the writeback of dirty data in the cache to the backing store (the S3 object storage in our case)? I am removing the last paragraph of my response, which might cause further confusion, until we clarify what we are discussing here.]
Thanks for the info on the performance tools! I tried them and found that rclone does fine at scaling up the number of connections on demand! The profile tool does not find any contended mutexes now, which is also great.
Other than making sure that the full cache mode is stable, the main open issue for our use case is that our files will exceed the cache size. In our use case, any data already accessed through a VFS read request is unlikely to be accessed again, so we don't need to keep that data in the cache. Although this use case is special, I believe this feature would be broadly useful for retrieving data from object storage to hydrate a new copy in faster storage.
Can we have a --vfs-cache-clear-read-data-first option that periodically clears data that has been read through the VFS from the cache? We could keep a range list of all VFS reads and compare it with the cached ranges to decide which ranges to clear first. We would need to use fallocate to punch holes in the cache file. Can we experiment with this and possibly support the option on Linux first?
I assume that, in use cases where the files are not overwritten, we can safely clear cache data that has already been read without interfering with writes (i.e. when there are none and item.IsDirty() is false).
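For the hole-punching part, here is a minimal Linux-only sketch using golang.org/x/sys/unix; the file path, the punchHole helper, and the 128 MiB range are all just placeholders for illustration, not existing rclone code:

```go
//go:build linux

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// punchHole deallocates [off, off+length) in the cache file, freeing the
// disk blocks while keeping the file size unchanged. Reads of the punched
// range return zeros, so the VFS layer must treat it as "not cached".
func punchHole(f *os.File, off, length int64) error {
	return unix.Fallocate(int(f.Fd()),
		unix.FALLOC_FL_PUNCH_HOLE|unix.FALLOC_FL_KEEP_SIZE,
		off, length)
}

func main() {
	f, err := os.OpenFile("/tmp/cachefile", os.O_RDWR, 0)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	// Example: drop the first 128 MiB once the VFS has delivered it to the
	// reader, e.g. a range taken from the "already read" list that is not
	// in the dirty set (item.IsDirty() == false).
	if err := punchHole(f, 0, 128<<20); err != nil {
		fmt.Println("punch hole:", err)
	}
}
```

Punching a hole frees the underlying disk blocks but keeps the apparent file size, so the cache file simply becomes sparse (du shrinks while ls -l stays the same), which should let the existing cache bookkeeping stay intact.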