How to force sequential downloads in a full VFS mount?

What is the problem you are having with rclone?

Consider the following minimal mount command:
rclone mount remote1: /mount --vfs-cache-mode full --cache-dir /cache --vfs-read-chunk-size 128M --vfs-read-ahead 1T --buffer-size 0M --allow-other

Suppose I open a 500M file /mount/A.mkv. I would like rclone to sequentially download the entire file starting from the beginning to my cache directory.

Using a modified version of the command above, rclone should download A.mkv from 0-128M, 128-256M, 256-384M, and 384-500M strictly in that order regardless of the time I seek to in A.mkv.
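To make the desired order concrete, here is a small sketch of the chunk arithmetic only. It does not call rclone; chunk_ranges is a hypothetical helper, and sizes are given in whole megabytes for simplicity.

```shell
# Hypothetical helper: print the chunk ranges for a file, in download order.
# $1 = file size in MB, $2 = chunk size in MB (as in --vfs-read-chunk-size 128M).
chunk_ranges() {
  size=$1
  chunk=$2
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$((start + chunk))
    # The last chunk is cut short at the end of the file.
    if [ "$end" -gt "$size" ]; then
      end=$size
    fi
    echo "${start}-${end}M"
    start=$end
  done
}

# The 500M example with 128M chunks.
chunk_ranges 500 128
```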

Ideally, rclone would make data available as it is downloaded. For example, if rclone has finished downloading the first 256M of A.mkv, I should be able to seek anywhere inside the first 256M of A.mkv, but seeks to regions beyond 256M should be forced to wait until that data is available.

What flags should I append to or change in the above mount command to force such sequential downloads?

Note: In rclone 1.52.x, using the above mount command would prevent a file from being read until it was completely downloaded, which is almost perfect.

What is your rclone version (output from rclone version)

rclone v1.53.2

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux amd64

Which cloud storage system are you using? (eg Google Drive)

local (for testing)

The rclone config contents with secrets removed.

type = local
nounc = 

type = alias
remote = 1:/test

Setting a sufficiently large --vfs-read-ahead (larger than the size of the biggest file in your remote) should give you that behaviour. Is that not your observation?

When I tested it, I found that if I immediately seek to almost the end of A.mkv, the corresponding A.mkv in my cache folder contains the ending and whatever could be downloaded of the beginning before I seeked. Data from the middle of the file is not present.

I believe this means rclone prioritizes the most recently requested piece of data and downloads sequentially from that point on.

Rclone isn't really deciding that; the application requesting the file is.

If you request a file and it's being read sequentially by your player/device/application, rclone will buffer ahead sequentially and fetch as much as your vfs-read-ahead allows.

If you seek in a file you were reading sequentially, you are making a new request for a new position in the file, and you'll see all of this in the debug log.

If you test and share a debug log, we can point it out and show in the logs.

The only setting that really affects this in rclone is the read-ahead: as long as the file/process/player is reading sequentially, rclone reads ahead that far to help with any network hiccups, since the data will already be in the local cache on disk, assuming you are using cache-mode full.

Here is a debug log from a mount where I opened A.mkv and immediately skipped to about 3 seconds from the end: The log is approximately 16 MB when unzipped.

A small, unexceptional excerpt:

2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 262144 length 32768 chunkOffset 98304 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 262144 length 32768 chunkOffset 0 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 294912 length 32768 chunkOffset 0 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : &{A.mkv (rw)}: Read: len=49152, offset=4687044608
2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 491520 length 32768 chunkOffset 65536 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 294912 length 32768 chunkOffset 98304 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : A.mkv: ChunkedReader.Read at 524288 length 32768 chunkOffset 65536 chunkSize 134217728
2020/11/07 10:46:11 DEBUG : A.mkv(0xc00061c480): _readAt: size=49152, off=4687044608

My goal is for rclone to ignore position requests and to instead always fully download the file sequentially starting from the beginning of the file. The order and position in which data is requested by the application should be discarded by rclone until the file has been cached. This was the behavior of rclone under the aforementioned mount command prior to the 1.53.x iterations.

That's not how any file system works, though, so I'm a bit baffled. That behaviour was a legacy of previous versions: there was no chunked reading in the old vfs-cache-mode full, so it downloaded the whole file before anything else happened.

You can always use an old version if that is a requirement, but for the life of me I can't understand why that behaviour would be wanted, since you have to fetch the whole file before anything happens.

If you have a file 'cached' on disk, it operates almost identically to a local file, with the minor rclone FUSE layer latency on top. As @darthShadow mentioned, if you set vfs-read-ahead, it will continue to fetch the file until it is complete and then seek locally on disk, which for all intents and purposes does the same thing.

If you can run a mount and share a full debug log, it should be there. The few second clip just shows normal reading and no seeks or anything.

At the start of my previous message, I provided a download link to my full debug log which is approximately 16 MB when unzipped.

I don't want to exactly replicate the old rclone's vfs-cache-mode full. In a perfect world, rclone would discard the application's position requests but allow it to read data as it is sequentially downloaded live.

This behavior is preferable for VMs and databases since random reads are significantly quicker from the cache than from any remote. Additionally, random reads incur a significant performance penalty on my remote.

Thanks. I missed the link the first time.

It looks like it's doing what you'd expect as I only see one file open and it seems to be reading ahead.

 egrep 'Open|Flush' rclone.log
2020/11/07 10:45:46 DEBUG : A.mkv: Open: flags=OpenReadOnly+OpenNonblock
2020/11/07 10:45:46 DEBUG : A.mkv: Open: flags=O_RDONLY|0x800
2020/11/07 10:45:46 DEBUG : A.mkv: >Open: fd=A.mkv (rw), err=<nil>
2020/11/07 10:45:46 DEBUG : A.mkv: >Open: fh=&{A.mkv (rw)}, err=<nil>
2020/11/07 10:46:12 DEBUG : &{A.mkv (rw)}: Flush:
2020/11/07 10:46:12 DEBUG : A.mkv(0xc00061c480): RWFileHandle.Flush
2020/11/07 10:46:12 DEBUG : &{A.mkv (rw)}: >Flush: err=<nil>
2020/11/07 10:46:19 DEBUG : &{A.mkv (rw)}: Flush:
2020/11/07 10:46:19 DEBUG : A.mkv(0xc00061c480): RWFileHandle.Flush
2020/11/07 10:46:19 DEBUG : &{A.mkv (rw)}: >Flush: err=<nil>
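
The Open and Flush lines only show opens and closes, not seeks. A rough way to look for seeks in the same log is to list the offsets of the _readAt lines and watch for jumps. This is only a sketch: read_offsets is a hypothetical helper, and the pattern assumes the "_readAt: size=..., off=..." format seen in the excerpt earlier in this thread.

```shell
# Hypothetical helper: print the byte offset of every _readAt call in a debug log.
# Reads the named log files, or standard input if none are given.
read_offsets() {
  sed -n 's/.*_readAt: size=[0-9]*, off=\([0-9]*\).*/\1/p' "$@"
}

# Usage (assumes the log was saved as rclone.log):
#   read_offsets rclone.log | uniq
```

A large jump between consecutive offsets suggests the application moved to a new position rather than continuing a sequential read.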

That's a pretty different use case from seeking in an MKV file, and I'm not sure I'd run a VM or database on a cloud-mounted remote.

When you seek in a file, rclone chooses to open a new downloader if there isn't an existing one within a certain distance. If there is a downloader that is downloading stuff within --buffer-size of where the seek is requested then rclone will wait for that to complete, otherwise it will open a new downloader.
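
As a rough illustration of that rule (not rclone's actual code), the decision can be modelled as below. needs_new_downloader is a hypothetical helper, and reading "within --buffer-size" as "at or ahead of the downloader's current position, by no more than the buffer size" is an assumption.

```shell
# Hypothetical model of the decision described above; not taken from rclone's source.
# $1 = seek position, $2 = current position of an existing downloader,
# $3 = --buffer-size, all in bytes.
# Exit status 0 means a new downloader would be opened.
needs_new_downloader() {
  seek=$1
  pos=$2
  buffer=$3
  # Assumption: the seek target must lie at or ahead of the downloader,
  # and no more than $buffer bytes ahead, for rclone to wait on it.
  if [ "$seek" -ge "$pos" ] && [ $((seek - pos)) -le "$buffer" ]; then
    return 1  # an existing downloader is close enough; wait for it
  fi
  return 0    # too far away; open a new downloader
}
```

Under this model, the --buffer-size 0M in the mount command above would mean that almost any seek away from the current position opens a new downloader.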

What setting --vfs-read-ahead does is tell rclone that if you read from a point in a file, you want to read at least that much data.

So setting --vfs-read-ahead to larger than each file should cause each file to be downloaded entirely (provided it is kept open long enough).

However seeking will still cause a new downloader.

So I don't think it is possible to exactly emulate the download-it-all-sequentially behaviour that you want.

I think this would need a new VFS flag.

But what about the startup time? Say you wanted to open a 1G file and read the last 100 bytes: would you have to wait for the whole 1G to be downloaded?

You could warm up the cache manually by downloading all the files you want sequentially (you can use rclone md5sum for this).
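
A minimal sketch of such a warm-up follows. It swaps in plain cat for rclone md5sum, since all that matters is that each file is read through the mount from start to finish. warm_cache is a hypothetical helper, and whether the whole file then stays in the cache depends on your VFS cache settings.

```shell
# Hypothetical helper: read each given file sequentially and discard the data,
# so that a full-mode VFS cache sees one front-to-back read per file.
warm_cache() {
  for f in "$@"; do
    cat -- "$f" > /dev/null || return 1
  done
}

# Usage (path taken from the example above):
#   warm_cache /mount/A.mkv
```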

For VMs and databases, this behavior is desirable since the whole file will be read anyway. I access media files via another mount with animosity22's recommended settings.

I don't know how difficult this would be to implement, but it should be a very low priority for the rclone project since most people would not benefit from it. I am content with using the 1.52.3 binary for my VM images and the latest version of rclone for everything else if it is too time consuming to implement.

If you want, please make a new issue on GitHub about it so we don't forget about it.

I think it is worthwhile doing at some point and probably isn't too hard.

However the VFS layer is fearsomely complicated and things always seem harder than I expect!
