What's the suitable value to set for buffer-size with vfs-read-ahead?

What is the problem you are having with rclone?

I wonder what exactly the two parameters --buffer-size and --vfs-read-ahead do.
According to the documentation:

When reading a file rclone will read --buffer-size plus --vfs-read-ahead bytes ahead. The --buffer-size is buffered in memory whereas the --vfs-read-ahead is buffered on disk.
When using this mode it is recommended that --buffer-size is not set too large and --vfs-read-ahead is set large if required.

I'd like to know the details:
"is not set too large"?
What value counts as large? And what will happen if we set it too large? OOM or something else?
"is set large if required"
What does "required" mean? When should we increase its value?

And for caching, wouldn't it be better to cache in memory instead of on disk? Why not just set a large --buffer-size value to get the best performance, given that I have enough memory?

Assume the following use cases. What is expected to happen in each?
(Suppose the host has 64GB of memory and 40TB of disk space available for rclone. And what if the host has only limited resources for rclone, such as 512MB of memory and 20GB of disk space?)

  • --buffer-size=4G
  • --vfs-read-ahead=4G
  • --buffer-size=16M over --vfs-read-ahead=16M
  • --buffer-size=16M over --vfs-read-ahead=1G
  • --buffer-size=1G over --vfs-read-ahead=16M
  • --buffer-size=1G over --vfs-read-ahead=1G

BTW, I'm also wondering how it works. If I read a large file from the beginning, or from a certain position (peek), what will happen?
My guess is this: rclone first opens a connection to the server and writes data into memory. Once the --buffer-size limit is reached, it writes that data (from 0 up to the limit) to disk, and does the same up to the --vfs-read-ahead limit. After that it keeps writing to the real disk cache file until --vfs-cache-max-size is exceeded. When the file is closed, everything held under --buffer-size and --vfs-cache-max-size is dropped, but the downloaded parts of the real cache file are kept. Is this correct?
I hope to get this answered so I know how to balance these two parameters, thanks!

Run the command 'rclone version' and share the full output of the command.

rclone v1.63.0
- os/version: Microsoft Windows 11 Home China 22H2 (64 bit)
- os/kernel: 10.0.22621.1992 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.20.5
- go/linking: static
- go/tags: cmount

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount SCDISK: S: --cache-dir H:/Rclone \
  --no-check-certificate --file-perms=0700 \
  --buffer-size=32M \
  --vfs-read-ahead 1G \
  --use-mmap \
  --vfs-cache-mode full \
  --vfs-cache-max-age=72h \
  --vfs-cache-max-size=1024G

The rclone config contents with secrets removed.

[SCDISK]
type = webdav
host = https://192.168.1.2:1500
vendor = other
user = scruel
pass = ...

I am a big fan of not jumping to any of these settings.

There are just too many variables between use case, connections, remotes, locations, computers, etc.

Start with the most basic mount and cache and test in your use case. Identify the problems.

Then, read all of the documentation: general usage, your remote (and any wrappers you use), and mount. Use that to make educated guesses about which flags to add to address the problems.

Keep going until you have the smallest set of flags that work.

All too often, people copy/paste flags without any idea what they do and end up with nonsensical flags for irrelevant (and often deprecated) features.

You are 100% right and I use a similar approach. But this question is actually quite good given that the documentation is a bit vague.

I think that we will need @ncw here:) unless somebody else is also well familiar with VFS internals.

Some users have been experimenting with the size of --buffer-size, and for the VFS in full cache mode the optimum value may actually be --buffer-size 0. The default is 16M so I wouldn't make it bigger than that.

If you set it too large you'll get increased latency and likely more network usage.

The data is going to end up on disk in the VFS cache. However modern OSes will cache that in memory so the distinction between disk and RAM is a bit vague here!

Setting --vfs-read-ahead large will mean rclone downloads that much of the file speculatively (while the file remains open). This works great for streaming but it isn't the right thing to do in all cases.

As said above, I'd take the defaults then play around a bit. You could try --buffer-size 0 also.
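
A minimal sketch of that comparison, reusing the remote name, drive letter and cache directory from the original post. It only prints the two candidate commands so they can be tried one at a time, and it uses only flags already mentioned in this thread. The variable names are just for illustration.

```shell
# Sketch only: prints the mount commands to compare; it does not run rclone.
# SCDISK:, S: and H:/Rclone are taken from the original post.
base="rclone mount SCDISK: S: --cache-dir H:/Rclone --vfs-cache-mode full"

# Variant A: rely on the defaults (--buffer-size is 16M by default).
cmd_default="$base"

# Variant B: turn off the in-memory buffer and let the disk cache do the work.
cmd_nobuf="$base --buffer-size 0"

echo "$cmd_default"
echo "$cmd_nobuf"
```

Testing each variant against the real workload, and only then adding further flags, keeps the final flag set small.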

Everything is happening concurrently in rclone, which makes simple explanations hard! What happens is something like this:

  • the app opens the file at position 0
  • rclone requests the file from the server if it doesn't have that part of it on disk
  • rclone tries to keep --buffer-size of the file in memory
  • meanwhile rclone writes the data to disk but no more than --vfs-read-ahead from the place the application has read to.

If the application closes the file or seeks then rclone flushes all the buffers to disk and starts from the new position.
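
As a rough worked example of the arithmetic, taking the documentation's "--buffer-size plus --vfs-read-ahead" at face value: the furthest offset rclone would fetch is the application's read position plus both values. `read_ahead_window` is a hypothetical helper for illustration, not an rclone command, and all sizes are in MiB.

```shell
# Sketch: furthest offset (MiB) rclone may fetch for a given read position,
# assuming the limit is simply position + buffer-size + vfs-read-ahead.
read_ahead_window() {
  pos=$1; buffer=$2; ahead=$3
  echo $(( pos + buffer + ahead ))
}

# The mount from the original post: --buffer-size=32M, --vfs-read-ahead 1G (1024 MiB).
read_ahead_window 0 32 1024     # prints 1056
# Same flags once the application has read up to the 100 MiB mark.
read_ahead_window 100 32 1024   # prints 1156
```

Only the --buffer-size part of that window is held in RAM by rclone itself; the rest lands in the VFS cache on disk.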

Hope that helps!

I just mount it as a disk for daily usage. I know more about VFS now, and this is really helpful, thanks!

Does this mean the file is only cached to disk while it is open? If my network bandwidth is fast enough and a smaller --vfs-read-ahead is set, when will rclone start writing again after the first write finishes? For example:

  • We have --buffer-size=16M and --vfs-read-ahead=64M.
  • The user opens the file. rclone starts reading positions 0 to 16M into memory, and meanwhile reads positions 0 to 64M onto disk. The user then reads the file from 0 to 15M. Will rclone read more data into memory to keep the allocated buffer full, or will it do nothing until the user reads past position 16M?
  • Assume rclone has cached 0 to 32M in memory and 0 to 18M on disk.
  • The user closes the file, and rclone flushes 18M+1 to 32M to disk.
  • The user opens the file again and reads 0 to 16M, which is already cached on disk. What happens now? Will rclone copy the data from the disk cache into memory, re-request it from the server, or not put anything into memory at all?

Rclone will read more data into memory to keep the allocated --buffer-size memory full. This will be limited by the speed of the network.

Rclone will read the data from the disk and pass straight to the user without caching it in memory.

The OS will be caching the file reads so there is no point rclone caching them twice.
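
The reopen case can be sketched as a toy decision function. `serve_from` and its arguments are hypothetical names for illustration, and the model assumes the disk cache holds a single contiguous range starting at 0, which is a simplification of whatever rclone actually tracks.

```shell
# Toy model: decide where a read at a given offset (MiB) is served from.
# Assumes the disk cache holds one contiguous range [0, cached_upto).
serve_from() {
  offset=$1; cached_upto=$2
  if [ "$offset" -lt "$cached_upto" ]; then
    echo "disk cache"   # already downloaded: read from disk, no extra RAM buffering
  else
    echo "network"      # not on disk yet: request it from the server
  fi
}

# After the flush in the example above, 0 to 32M is on disk.
serve_from 8 32    # prints "disk cache"
serve_from 40 32   # prints "network"
```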
