Rclone mount read performance on Windows

What is the problem you are having with rclone?

Relatively slow read performance from S3 compared to known-to-be-multithreaded readers such as aws s3 cp on the same host, and goofys on Linux. Have tried a lot of different configurations with --vfs-cache-mode full and off, but can't seem to break 80 MB/s, which is roughly a third of what aws s3 cp gets with 10 threads and 10 MB read sizes.
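For reference, the aws s3 cp numbers above come from the usual AWS CLI S3 transfer settings; assuming the standard config keys, that baseline was set up roughly like this:

aws configure set default.s3.max_concurrent_requests 10
aws configure set default.s3.multipart_chunksize 10MB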

Have also tried providing a ramdisk as the VFS cache dir, so as to reduce latency and possible rate limiting of the "local" disk (this is an EC2 instance, so local drives are EBS volumes with burst capacity).

Watching network activity in Resource Monitor, am seeing multiple rclone threads at times, but only one appears to be doing any significant reading from AWS.

Am aware that multiple read threads are needed to get high bandwidth from S3, and have noticed that the --async-read flag is listed as not supported for Windows.

Would be interested in helping add such support if this is the case and the underlying WinFSP supports it.

What is your rclone version (output from rclone version)

PS Z:> rclone --version
rclone v1.54.0

- os/arch: windows/amd64
- go version: go1.15.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows Server 2016 64-bit

Which cloud storage system are you using? (eg Google Drive)

AWS S3
Compute is in the same region.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Example vfs config, one of many tried:

mount foo:bar/baz S: --dir-cache-time 1000m --attr-timeout 1000m --no-checksum --no-modtime --async-read --fast-list --buffer-size 2000M --vfs-cache-mode full --vfs-read-chunk-size 16M --vfs-read-chunk-size-limit 500M --vfs-read-ahead 500M --vfs-read-wait 5ms --vfs-cache-max-size 99000M --vfs-cache-max-age 30m --cache-dir Z:/rcloneCache2

Example non-vfs config:

mount foo:bar/baz S: --dir-cache-time 1000m --attr-timeout 1000m --no-checksum --no-modtime --async-read --fast-list --vfs-cache-mode off --buffer-size 1G

The rclone config contents with secrets removed.

[foo]
type = s3
provider = AWS
env_auth = true
region = us-east-1
acl = private

A log from the command with the -vv flag

Here's a snippet; can see my test app reading blocks of 80 KB and ChunkedReader.Read with 1 MB lengths (seems to be consistent no matter what settings I try). No indication of threaded reads (but maybe those would happen inside ChunkedReader.Read?).

2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=327680, fh=0x0
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 1044480 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=409600, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=491520, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=573440, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 2093056 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=655360, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=737280, fh=0x0
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 3141632 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 4190208 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=819200, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=901120, fh=0x0
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 5238784 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=983040, fh=0x0
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: >Read: n=81920
2021/02/11 21:32:19 DEBUG : 09DEC16101133-M2AS-200000269988_01_P001.TIF: ChunkedReader.Read at 6287360 length 1048576 chunkOffset 0 chunkSize 134217728
2021/02/11 21:32:19 DEBUG : /09DEC16101133-M2AS-200000269988_01_P001.TIF: Read: ofst=1064960, fh=0x0

Thanks in advance for any tips!

Have tried every credible-looking solution for Windows that I can find and have reached out to various vendors, but haven't found a suitably fast solution yet.

Again, am willing to help make rclone faster on Windows if it's within reach. I haven't gotten deep into the code yet, but have done a bit of S3, Golang, and performance-oriented software development.

Another alternative I'm considering is porting https://github.com/kahing/goofys to Windows using WinFSP or CrossMeta FUSE, but that feels more like going out on a limb.

Also, more backstory - am needing something like rclone mount instead of copy due to the large size of the objects/files to be read. It would be a large latency hit to have to copy whole files before the first bytes can be read.

Am also engaging application vendors about adding native/performant S3 integration, but need an interim solution.

You can post your interest here (or even help implement it) for multi-threaded downloads to mount:

How rclone reads your data depends a great deal on what your application does.

By default, if you open a file and read it sequentially, then rclone will read it sequentially too (see the issue @darthShadow posted above).

If your app reads from multiple places in the file at once, then rclone will open one download thread for each place you are reading from, if using --vfs-cache-mode full.
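To illustrate what "reads from multiple places at once" looks like from the application side, here is a minimal Go sketch; the path, worker count, and read size are made up, and whether rclone actually opens extra downloaders depends on how far apart the reads land:

package main

// Sketch only: read one file on the rclone mount from several well-separated
// offsets concurrently, which is the access pattern that can give
// --vfs-cache-mode full more than one downloader.
import (
	"fmt"
	"os"
	"sync"
)

func main() {
	const path = `S:\baz\bigfile.tif` // hypothetical file on the mount

	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	const workers = 4
	part := fi.Size() / workers

	var wg sync.WaitGroup
	for i := int64(0); i < workers; i++ {
		wg.Add(1)
		go func(offset int64) {
			defer wg.Done()
			buf := make([]byte, 1<<20) // 1 MiB per read
			pos, end := offset, offset+part
			for pos < end {
				// ReadAt is positional, so the goroutines don't share a seek pointer.
				n, err := f.ReadAt(buf, pos)
				pos += int64(n)
				if err != nil {
					break
				}
			}
			fmt.Printf("worker at offset %d done\n", offset)
		}(i * part)
	}
	wg.Wait()
}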

If you use --vfs-cache-mode off then every time your app seeks, rclone will close the stream and re-open it at the seek point - it won't do concurrent downloads.

It might be worth you trying

rclone copy -vv s3:bucket/bigfile c:\dir

That does do multi-threaded downloads by default - you can adjust the number of streams with --multi-thread-streams - it would be interesting to compare that with aws s3 cp and goofys.
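For example, something along these lines (the stream count and cutoff below are just starting points to experiment with, not tuned recommendations):

rclone copy -vv --multi-thread-streams 10 --multi-thread-cutoff 250M s3:bucket/bigfile c:\dir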

Good info!

To my knowledge, our app behavior at present is predominantly sequential reads of one large file at a time.

On the point that "If your app reads from multiple places at once in the file then rclone will open one download thread for each place you are reading from if using --vfs-cache-mode full":

Would the app have to have two different file descriptors/handles open for this to be the case, or does rclone figure out that non-contiguous reads are happening?

@darthShadow - In the thread you referenced, it mentions parallel reads being removed when other features were added:

this was present in the previous iteration of the full-mode cache but was removed to add support for partial downloads/streams

So the features weren't fundamentally at odds with each other, just a matter of finite dev resources and having to prioritize one feature over the other?

@ncw - I did try rclone copy and it handily beats the best aws s3 cp numbers I can get (350 MB/s vs 230 MB/s). The copy use case comes up often for other projects and this is great to know about. :slight_smile:

I did some testing with rclone 1.52.3 and am seeing rclone itself read fast/threaded but my app doesn't appear to get any data back until the entire file is read.

Ahh, so the ability to do so came with the "partial downloads/streams" support later on (1.53.0?).

Yes, that is correct.

v1.53.0 was a major update to vfs including adding sparse files.

Rclone 1.53 release
"This enables partial caching in --vfs-cache-mode full"

OK, so you'll just be getting one download stream.

No. If it is seeking, rclone will open new downloaders, one for each seek if they are sufficiently far from any other downloaders.

--vfs-cache-mode full used to download the entire file using the equivalent of rclone copy before letting the user read the file. This was great for download speed, but not so good for interactivity (as in you had to wait for the entire file!).

Great - good to know :slight_smile:

Yes, that is the way it works.

That's correct.
