Max Pages on Linux with rClone/MergerFS

I will also be submitting a similar PR to the go-fuse library for the mount2 variant, so could you please rebase that branch onto the latest master so that I can test that out too?

Is there anything holding you from adding mount2 as a separate option to the mainstream builds?

Do you think it would help with the iowaits too?

This is just one of the various tests performed:

Without the max pages patch:

26,224,173,044 100% 397.26MB/s 0:01:02 (xfr#1, to-chk=0/1)

With the max pages patch:

26,224,173,044 100% 509.32MB/s 0:00:49 (xfr#1, to-chk=0/1)

The mount command was the same as the above, only the binary was changed.

I've pushed it to go-fuse-mount

There will be a binary here

https://beta.rclone.org/branch/v1.51.0-022-gd10cf5fe-go-fuse-mount-beta/ (uploaded in 15-30 mins)

I'd switch from bazil to go-fuse apart from the inconvenient fact that go-fuse doesn't support FreeBSD :frowning:

I haven't figured out what to do about that yet!

I think go-fuse has the same problem - it is where I first noticed it.

The problem is that the kernel delivers out of order reads if you set the async flag.

Nice!

Last time I looked through the kernel source, max-read-ahead was limited to 128k. However, you should be able to see whether it is effective by running with -vv and checking whether the Read calls ask for more than 128k of data.

Just to keep it consolidated in 1 place, the build for this is available from: v1.51.0-014-g6a86ec70-pr-3949-max-pages-beta

Thanks. Expect a PR for that too in the next few days.

Isn't it possible to add it as a separate option, as mount2, on the OSes it already supports and push that change to master? Or do you see any issues with that approach?

That's disappointing. I had hoped that the fuse library would have handled the read and write wait times better, without needing so much tweaking from rclone.

Any suggestions or improvements for the performance testing and the mount command? Or is testing via the local mount the best judge for now?

Will do.

Sure that is possible. I didn't do it because of code bloat and general user confusion! Do you think I should? I could make the mount2 command hidden.

Testing via the local disk is what I usually do.

That sounds like a good idea. It avoids needing to keep the branch constantly up to date, and those who want to can try both with minimal changes.

Adding the mount2 commands adds 524k to the rclone binary, which is 1.3%, so that is probably acceptable!

If this builds OK with mount2 hidden, then I'll merge it.

https://beta.rclone.org/branch/v1.51.0-024-g02afb74b-go-fuse-mount-beta/ (uploaded in 15-30 mins)


Can I test this new mount2 option with Debian 10.3 (Kernel 4.19) too? Would this speed up my mounts?

@ncw I was going through this thread and trying to understand the logic for async reads.

I had a small question, if you don't mind: do you think there would be any advantage to serializing the requests after an aborted or successful read, as suggested and implemented in that thread?

The rclone implementation works pretty much like that one I think - the use of Cond locks shortens the code a lot so you don't need a central manager of the requests.

I made another implementation (not released yet) which serializes all the read requests into a goroutine. This makes for simpler code (less locking!), but it also has the great advantage that it doesn't need to wait until the IO has finished before figuring out that the offset is correct, so the IO wait times can be shorter. That is much more like the serialize_reads in that thread.

If you turn on --debug-fuse you can see the problem: the reads come in in the wrong order, but usually separated by <0.1 ms.

I just merged it to the latest beta (it will be there in 15 mins or so).

It should work fine on your machine and will probably be faster - I'm interested in your feedback.

Yeah, I did notice that, which was why I had suggested reducing the value by default earlier in the thread. I just never read the code closely enough to understand that it waits until the whole read is completed before unlocking the others.

Should be an excellent improvement w.r.t the iowait. Well done as always.


I've done some tests on my VPS...

mount options

--allow-other \
--umask 0007 \
--uid 120 \
--gid 120 \
--use-mmap \
--buffer-size 2G \
--fast-list \
--dir-cache-time 96h \
--drive-chunk-size 128M \
--vfs-cache-max-age 4h \
--vfs-cache-mode writes \
--vfs-cache-max-size 50G \
--vfs-read-chunk-size 128M \
--vfs-read-chunk-size-limit off

rclone v1.51.0

time mediainfo via mount Test-File1
real    0m1.778s
user    0m0.117s
sys     0m0.042s

time mediainfo via mount Test-File2
real    0m1.429s
user    0m0.350s
sys     0m0.021s

rsync copy via mount Test-File1
79,754,881,798 100%  103.07MB/s    0:12:17 (xfr#1, to-chk=0/1)

rsync copy via mount Test-File2
 6,064,000,647 100%   55.34MB/s    0:01:44 (xfr#1, to-chk=0/1)

copy rclone Test-File1 (--transfers 16)
Transferred:       74.278G / 74.278 GBytes, 100%, 341.912 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:      3m42.4s

copy rclone Test-File2 (--transfers 16)
Transferred:        5.648G / 5.648 GBytes, 100%, 160.473 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        36.0s
second one goes faster... coincidence?
Transferred:        5.648G / 5.648 GBytes, 100%, 370.925 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        15.5s

rclone v1.51.0-027-gec127181-beta

time mediainfo via mount Test-File1
real    0m1.818s
user    0m0.114s
sys     0m0.035s

time mediainfo via mount Test-File2
real    0m1.314s
user    0m0.297s
sys     0m0.018s

rsync copy via mount Test-File1
79,754,881,798 100%  101.23MB/s    0:12:31 (xfr#1, to-chk=0/1)

rsync copy via mount Test-File2
 6,064,000,647 100%   58.34MB/s    0:01:39 (xfr#1, to-chk=0/1)
 
rclone copy Test-File1 (--transfers 16)
Transferred:       74.278G / 74.278 GBytes, 100%, 343.332 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:      3m41.5s

copy rclone Test-File2 (--transfers 16)
Transferred:        5.648G / 5.648 GBytes, 100%, 322.325 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        17.9s

I think the tests don't say that much because of the local storage limitation.
It would be nice if someone with a dedicated root server could test this scenario.

One question: why is copying via the vfs mount so "slow"? Is it because rclone handles these stream connections via the mount in a different way?

It does look like max-pages has allowed the read-ahead to be increased to 1 MiB.
Without it, or with it set to 32 (i.e. the default value), I am seeing fuse reads of 131072 (i.e. 128k) in the logs, as expected. With it set to 256, the value is 1048576 (i.e. 1 MiB).


Some performance numbers with the bazil/fuse library:

For 1G File:

Without Max Pages (with 128k Reads)

darthshadow@server:~/max-pages$ time cp test-mount/1G.img .

real    0m15.251s
user    0m0.005s
sys     0m0.524s

With Max Pages (with 1M Reads)

darthshadow@server:~/max-pages$ time cp test-mount/1G.img .

real    0m3.668s
user    0m0.008s
sys     0m0.515s

For 10G File:

Without Max Pages (with 128k Reads)

darthshadow@server:~/max-pages$ time cp test-mount/10G.img .

real    2m25.094s
user    0m0.092s
sys     0m5.416s

With Max Pages (with 1M Reads)

darthshadow@server:~/max-pages$ time cp test-mount/10G.img .

real    0m32.152s
user    0m0.051s
sys     0m4.724s

The numbers seem to be pretty constant across multiple runs and show almost a 5x improvement w.r.t read speeds.


However, I have been unable to achieve similar results for my write tests. The writes still seem to be limited to 128k for some reason, even though the parameters appear to be set properly. Is there any buffer or chunk size in rclone that may be affecting it?

There isn't that much difference between them is there?

You could try rclone mount2 in the latest beta to see if there is a difference.

That should be something that @darthShadow's work helps with.

Great :slight_smile:

That is a lot of difference!

Max pages should be working for both reads and writes, I think. Did you change both the send path and the receive path in the bazil library? (I didn't check your patch, so that question may make no sense!)

Yeah, I would have thought the same.

There isn't anything to change per request. This is done only once, in the initial negotiation with the kernel, so this is particularly surprising.

My next step is to test the same with go-fuse instead and see if it works there. If it does, there are additional changes required for the bazil/fuse version. If it doesn't, perhaps I am missing something else that needs to be done too. Let's see...


About rclone mount2:
When I change to the mount directory I get an error:
-bash: cd: /mnt/mount/: Permission denied
ls, mv, or cp commands work, though.

With rclone mount everything works well.

edit: uid and gid are set to the rclone user/group. My personal user is a member of this group.

edit2: With the df command I get disk information about rclone mounts. When I use mount2 I don't get this... Some of my apps check this disk information, e.g. Nextcloud. Result: Nextcloud shows the failure message "disk full".

I have pushed the changes for go-fuse too. The read performance has similar improvements with max-pages cranked up but writes are still 128k.

I am out of ideas about what else may be causing this issue.

@trapexit Is there anything else required to increase the writes to 1 MiB instead of 128 KiB? The reads increased to 1 MiB after raising max pages to 256, but the writes are still 128k. And yes, big writes are already enabled.

The only thing that I can think of is ensuring that fuse_init_out.max_write is appropriately set.