Actually, being able to read ahead in a controlled way would be an advantage in a scenario where the user thread reads sequentially. It would be great if the data were always sitting in AsyncReader's buffer, ready to be consumed. Are you suggesting that we should not use downloaders in this case either?
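To make the read-ahead idea concrete, here is a minimal sketch of the kind of controlled read-ahead I mean, written against a plain io.Reader (just an illustration, not rclone's actual AsyncReader):

```go
package readahead

import "io"

// Reader pre-fetches from src in a background goroutine so that
// sequential Read calls usually find data already buffered.
type Reader struct {
	ch   chan []byte // filled buffers, ready to be consumed
	errc chan error  // first error from the filler (often io.EOF)
	cur  []byte      // buffer currently being drained
	err  error
}

// New starts a filler goroutine that keeps up to depth buffers of
// bufSize bytes read ahead of the consumer. No cancellation, for brevity.
func New(src io.Reader, bufSize, depth int) *Reader {
	r := &Reader{
		ch:   make(chan []byte, depth),
		errc: make(chan error, 1),
	}
	go func() {
		for {
			buf := make([]byte, bufSize)
			n, err := io.ReadFull(src, buf)
			if n > 0 {
				r.ch <- buf[:n]
			}
			if err != nil {
				if err == io.ErrUnexpectedEOF {
					err = io.EOF // a short final buffer is not an error
				}
				r.errc <- err
				close(r.ch)
				return
			}
		}
	}()
	return r
}

// Read drains the current buffer, pulling the next pre-fetched one
// from the channel when it runs out.
func (r *Reader) Read(p []byte) (int, error) {
	for len(r.cur) == 0 {
		buf, ok := <-r.ch
		if !ok {
			if r.err == nil {
				r.err = <-r.errc
			}
			return 0, r.err
		}
		r.cur = buf
	}
	n := copy(p, r.cur)
	r.cur = r.cur[n:]
	return n, nil
}
```

With a depth of a few buffers, the consumer's next Read is normally served from memory while the filler goroutine fetches further ahead.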
Since, in our use case, the streams are activated by processes running inside a VM, we potentially need a large number of ChunkedReaders. But I guess we can start with a slice.
The main question is whether we should avoid using downloaders if/when we do not use full cache mode.
Hmm, maybe the first thing to do is to have a play with the downloaders and see if they can be pressed into service with the buffer filling you suggested. I'll have a go at that. Maybe that will converge the designs, which would be nice, so we can use the downloaders in both places...
Upon a closer look at the trace of our use case, I found that the 2X performance gain we got from the new full cache mode came from cache hits. I ran experiments with the current master and with the current master with the mutex fix reverted. The time taken for the VM to boot from S3 storage was similar between the two; both are equally good.
It's not clear to me why allowing multiple VFS requests to be served concurrently did not boost the performance further. Was it due to a bottleneck elsewhere? Are the S3 requests queued on a single HTTP connection?
Does rclone use a single connection for all requests? What is the maximum aggregate bandwidth that has been observed with multi-thread downloads? And how many fetches per second? I just hope to understand whether there is room for further performance optimization.
The main open issue for our use case (if we use full cache mode) is the need to be able to purge the cache. We do not have the luxury of a cache space larger than the file that we are reading. The whole VM image is read sequentially in its entirety while it is booting, and it generates random reads at the same time. We have no writes through this FUSE mount. So it should be relatively easy to clear the cache when we run out of space?
$ rclone copyto -vv s3:rclone/50M --bwlimit 1M /tmp/50M --multi-thread-cutoff 10M
2020/07/03 11:40:07 DEBUG : rclone: Version "v1.52.2-195-ga7c1b32b-fix-drive-list-teamdrives-beta" starting with parameters ["rclone" "copyto" "-vv" "s3:rclone/50M" "--bwlimit" "1M" "/tmp/50M" "--multi-thread-cutoff" "10M"]
2020/07/03 11:40:07 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2020/07/03 11:40:07 INFO : Starting bandwidth limiter at 1MBytes/s
2020/07/03 11:40:07 DEBUG : Local file system at /tmp/50M: Waiting for checks to finish
2020/07/03 11:40:07 DEBUG : Local file system at /tmp/50M: Waiting for transfers to finish
2020/07/03 11:40:07 DEBUG : 50M: Starting multi-thread copy with 4 parts of size 12.500M
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 4/4 (39321600-52428800) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 3/4 (26214400-39321600) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 2/4 (13107200-26214400) size 12.500M starting
2020/07/03 11:40:07 DEBUG : 50M: multi-thread copy: stream 1/4 (0-13107200) size 12.500M starting
And here is the output of netstat while it is running:
$ netstat -tuanp | grep rclone
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 984297 0 10.2.0.9:35074 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 814061 0 10.2.0.9:35078 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 924320 0 10.2.0.9:35070 52.95.148.82:443 ESTABLISHED 519130/rclone
tcp 906621 0 10.2.0.9:35072 52.95.148.82:443 ESTABLISHED 519130/rclone
So you can see 4 connections open.
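That is standard Go net/http behaviour rather than anything rclone-specific: a shared http.Client pools idle connections, but will dial a new TCP connection for any request that finds the pool busy. (S3 endpoints speak HTTP/1.1, where concurrent requests need separate connections; HTTP/2 would multiplex them instead.) Here is a minimal standalone sketch of that behaviour, with https://example.com/ standing in for a real object URL:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	client := &http.Client{} // one shared client with the default pooled Transport
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine asks for its own byte range, like one
			// stream of a multi-thread copy.
			req, err := http.NewRequest("GET", "https://example.com/", nil)
			if err != nil {
				fmt.Println(err)
				return
			}
			req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", i*1048576, (i+1)*1048576-1))
			resp, err := client.Do(req)
			if err != nil {
				fmt.Println(err)
				return
			}
			defer resp.Body.Close()
			n, _ := io.Copy(io.Discard, resp.Body)
			fmt.Printf("stream %d: %s, %d bytes\n", i, resp.Status, n)
		}(i)
	}
	wg.Wait() // run netstat while this is in flight to count connections
}
```

Running netstat alongside it should show one ESTABLISHED connection per in-flight request.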
Can you try netstat to see if you see the same? And with the mount?
I know that people quite regularly fill 10 Gbit pipes with rclone downloads. I haven't seen this myself though as I don't have the hardware!
Almost certainly! I haven't paid much attention to optimization. The Go tools are very good here if you want to help - there is a list of some of the things you can measure at https://rclone.org/rc/#other-profiles-to-look-at
The tricky part is clearing the backing store for Open files. Getting the writeback to the buffer working first would help here.
[Edit: I noticed that I probably misunderstood what you meant by "the writeback to the buffer" in the sentence above. When I first wrote my response, I thought you were referring to the approach we discussed earlier, where the downloader passes the newly requested data through a buffer supplied by the caller. But perhaps you actually meant the writeback of dirty data in the cache to the backing store (the S3 object storage in our case)? I am removing the last paragraph of my response, which might cause further confusion, until we clarify what we are discussing here.]
Thanks for the info on the performance tools! I tried them and found that rclone does fine at scaling up the number of connections on demand! The profile tool does not find any contended mutexes now, which is also great.
Other than making sure that the full cache mode is stable, the main open issue for our use case is that our files will exceed the cache size. In our use case, any data already accessed through a VFS read request is unlikely to be accessed again, so we don't need to keep that data in the cache. Although this use case is special, I believe this feature would be broadly useful for retrieving data from object storage to hydrate a new copy in faster storage.
Can we have a --vfs-cache-clear-read-data-first option that periodically clears data that has been read through the VFS from the cache? We could keep a range list of all VFS reads and compare it with the cached ranges to decide which ranges to clear first. We would need to use fallocate to punch holes in the cache file. Can we experiment with this and possibly support the option on Linux first?
I assume that, in use cases where the files are not overwritten, we can safely clear cache data that has already been read without interfering with writes (i.e. when there are none and item.IsDirty() is false).
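For the hole-punching part, here is a minimal Linux-only sketch using golang.org/x/sys/unix; the file path, the punchHole helper, and the 128 MiB range are all just placeholders for illustration, not existing rclone code:

```go
//go:build linux

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// punchHole deallocates [off, off+length) in the cache file, freeing the
// disk blocks while keeping the file size unchanged. Reads of the punched
// range return zeros, so the VFS layer must treat it as "not cached".
func punchHole(f *os.File, off, length int64) error {
	return unix.Fallocate(int(f.Fd()),
		unix.FALLOC_FL_PUNCH_HOLE|unix.FALLOC_FL_KEEP_SIZE,
		off, length)
}

func main() {
	f, err := os.OpenFile("/tmp/cachefile", os.O_RDWR, 0)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	// Example: drop the first 128 MiB once the VFS has delivered it to the
	// reader, e.g. a range taken from the "already read" list that is not
	// in the dirty set (item.IsDirty() == false).
	if err := punchHole(f, 0, 128<<20); err != nil {
		fmt.Println("punch hole:", err)
	}
}
```

Punching a hole frees the underlying disk blocks but keeps the apparent file size, so the cache file simply becomes sparse (du shrinks while ls -l stays the same), which should let the existing cache bookkeeping stay intact.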