Range seeks do not trigger chunk caching

What is the problem you are having with rclone?

I have an application that repeatedly uses range seeks to read a 2G+ file from an rclone mount of an sftp remote.
I cannot change the application.
Inspecting the cache directory, I see that rclone is not downloading the file in chunks of the size given by --vfs-read-chunk-size or --vfs-read-ahead (both set to 128M).
The ideal behavior for me would be to cache the chunks that correspond to the ranges being read.
A compromise would be to cache the entire file on access. What would the rclone mount parameters be for that?

What is your rclone version (output from rclone version)

rclone v1.57.0

  • os/version: darwin 12.0.1 (64 bit)
  • os/kernel: 21.1.0 (arm64)
  • os/type: darwin
  • os/arch: arm64
  • go/version: go1.17.2
  • go/linking: dynamic
  • go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

sftp

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone cmount \
    -vv \
    --cache-dir ~/.cache/rclone/monolayer_data \
    --vfs-cache-mode full \
    --vfs-cache-max-age 1000h0m0s \
    --vfs-cache-max-size 200G \
    --vfs-cache-poll-interval 1h0m0s \
    --vfs-write-back 5s \
    --buffer-size 64M \
    --vfs-read-ahead 128M \
    --vfs-read-chunk-size 128M \
    --vfs-read-wait 1s \
    --transfers 16 \
    --multi-thread-streams 16 \
    :sftp,host="REDACTED",user="REDACTED",port="22",pass="",key_file="":source \
    target

The rclone config contents with secrets removed.

See connection string above.

A log from the command with the -vv flag

I don't see anything wrong in the log: it's reading from the cache.

It doesn't quite work like that. What gets cached depends on your access pattern and on what you actually read from the file.

If you open a file, only read a few bytes and close it, it won't continue to read ahead, because you closed the file.
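The access pattern described above can be sketched as follows. This is only an illustration: `read_range` is a hypothetical helper, not the application's code and not part of rclone. It shows why there is nothing left to read ahead for once the call returns, since the file handle is already closed.

```python
def read_range(path: str, pos: int, size: int) -> bytes:
    """Open the file, seek to pos, read up to size bytes, then close.

    Hypothetical helper that mimics an open/seek/read/close access pattern.
    """
    with open(path, "rb") as f:  # open
        f.seek(pos)              # range seek
        data = f.read(size)      # read only a few bytes
    # The handle is closed here, so a mount serving this file has no
    # open reader left for which read-ahead would make sense.
    return data
```

Calling this helper many times with different offsets reproduces the open/seek/close/open/seek/close sequence discussed in this thread.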

2021/11/22 13:09:03 DEBUG : vfs cache: looking for range={Pos:287522816 Size:16384} in [{Pos:0 Size:126976} {Pos:285556736 Size:2093056}] - present true

The "present true" at the end of the line tells you the requested range was read from the cache.
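To make the log line easier to read, here is a simplified sketch of such a range lookup. It is not rclone's actual implementation; `is_present` is a hypothetical name, and the sketch assumes the cached ranges are already coalesced, so a requested range counts as present only if it lies entirely inside a single cached range.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (pos, size), as in the log's {Pos:... Size:...}

def is_present(wanted: Range, cached: List[Range]) -> bool:
    """Return True if wanted lies entirely inside one cached range.

    Assumes cached ranges are coalesced (no two ranges touch or overlap).
    """
    pos, size = wanted
    end = pos + size
    for cached_pos, cached_size in cached:
        if cached_pos <= pos and end <= cached_pos + cached_size:
            return True
    return False

# Values taken from the log line above.
cached = [(0, 126976), (285556736, 2093056)]
print(is_present((287522816, 16384), cached))
```

With the values from the log, the requested range ends at byte 287539200 and the second cached range covers bytes 285556736 up to 287649792, so the lookup succeeds.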

None. No mount parameter caches the entire file on access; you have to trigger that behavior yourself. For example, you can cat the file to 'prime' your cache if that's what you want.
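If a script is more convenient than cat, the same priming can be sketched as a sequential read that discards the data. This is a minimal sketch under assumptions: `prime_cache` is a hypothetical helper, the 1 MiB block size is arbitrary, and it relies on the mount running with --vfs-cache-mode full so that the data read ends up in the cache.

```python
def prime_cache(path: str, block_size: int = 1024 * 1024) -> int:
    """Read the whole file sequentially and discard the data.

    Returns the number of bytes read. Equivalent in effect to
    `cat path > /dev/null`.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file
                break
            total += len(block)
    return total
```

You would call it with the path of the file inside the mount point (the `target` directory in the command above) before starting the application.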

So you are caching data and it is working properly based on your logs.

Thanks for the log analysis. If I understand you correctly:

  • This access pattern (repeated open/close) is not covered by rclone then.
  • cat-ing the files works, and apparently there's no built-in solution in rclone to do it on demand.

I'd say it is covered, but it doesn't do what you want it to do. If a file is closed, reading ahead and keeping buffers would be inefficient, because the application asked for it to be closed. Open/seek/close/open/seek/close is a very inefficient use case for cloud storage, which is mitigated a bit by cache mode full.

Correct. You'd have to build that into your workflow: cat the file, then work on it, and so on.

This might be of interest:

As it is not really an easy 'fix' per se.
