Max Pages on Linux with rClone/MergerFS

Thanks for the suggestion. I checked that too; it looks correct and reflects any changes to the max pages setting appropriately.

Is there any way to get debug statements in mergerfs that show the read & write sizes?

I just want to be sure that this is something related to the FUSE implementation and not something in my system that's causing it. That will give me a specific area to debug further.

-d will run it in the foreground and spit out debug details. It's the standard libfuse debug info. I've been meaning to rewrite it to make it more useful, but it should give you the details you want.


Two observations that seem questionable to me:

  • cp issues writes of 128k whereas rsync issues writes of 256k. Could this be a limitation of the programs used rather than the fuse library itself? I was able to replicate the same behavior on both the rclone & mergerfs mounts.
  • Even though rsync uses double the write size, it takes roughly double the time that cp does. This was observed only with the mergerfs mount and not with the rclone mount. I am not sure how this is happening...

Mergerfs Mount Command: mergerfs -d -o async_read=true -o fsname=test-mount /home/darthshadow/max-pages/source=RW /home/darthshadow/max-pages/test-mount

Mergerfs Version:

mergerfs version: 2.28.3
FUSE library version: 2.9.7-mergerfs_2.29.0
fusermount version: 2.9.7
using FUSE kernel interface version 7.29

Commands for cp & rsync:

darthshadow@server:~/max-pages$ time rsync 1G.img test-mount/

real    0m2.468s
user    0m2.938s
sys     0m0.649s
darthshadow@server:~/max-pages$ time cp 1G.img test-mount/

real    0m0.962s
user    0m0.009s
sys     0m0.369s

Have you straced rsync/cp to see what sizes they are using to write? Best to check using dd and explicitly setting obs to the size you want per write call. 128K has traditionally been about the sweet spot for copying; with FUSE, given the increased latency, that is less true.
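For example, something along these lines (a rough sketch; the paths mirror your setup and the sizes are just illustrative):

# confirm the write sizes cp/rsync actually issue
strace -f -e trace=write -o cp.trace cp 1G.img test-mount/
strace -f -e trace=write -o rsync.trace rsync 1G.img test-mount/

# dd lets you pin the write size per call via obs, e.g. 128K vs 1M output blocks
dd if=1G.img of=test-mount/1G.img ibs=1M obs=128K
dd if=1G.img of=test-mount/1G.img ibs=1M obs=1M

Grepping the strace output for write( calls will show the exact buffer size each program uses.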

mergerfs doesn't (currently) support FUSE writeback caching. You'd have to use the master branch for that and enable it. That would make the kernel batch up to 1MB worth of writes and then send it to mergerfs. Details are in the docs.
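If you do test the master branch, enabling it should look roughly like the following; treat the option names (cache.writeback, cache.files) as a sketch and double-check them against the docs there, since writeback caching needs page caching left on:

mergerfs -o cache.files=full,cache.writeback=true,fsname=test-mount /home/darthshadow/max-pages/source=RW /home/darthshadow/max-pages/test-mount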

BTW... if you have caching enabled (which you do, since you aren't disabling it with cache.files=off or direct_io=true) then it is very likely you're getting a getxattr request after every write, which will seriously harm performance. Unfortunately, there isn't a good way to handle it. The kernel doesn't yet cache the results. mergerfs has security_capability=false which can short circuit that lookup so it doesn't go to the underlying filesystem, but it only helps so much. The best is xattr=nosys, but that turns off xattrs altogether. There's no in-between right now.
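For reference, those trade-offs map onto mount options roughly like this (a sketch using the options named above; pick one depending on whether you still need xattrs):

# only short-circuit the security.capability lookups
mergerfs -o security_capability=false,fsname=test-mount /home/darthshadow/max-pages/source=RW /home/darthshadow/max-pages/test-mount

# or disable xattr support entirely (bigger win, but loses xattr-based features)
mergerfs -o xattr=nosys,fsname=test-mount /home/darthshadow/max-pages/source=RW /home/darthshadow/max-pages/test-mount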

Thanks for the suggestions. strace did reveal that cp & rsync were sending the 128k & 256k writes. rsync also showed some blocking in select() which could explain the increased copy time even with the bigger write size.

dd is sending the expected read and write requests based on the params specified, so it is ideal for further testing now.

Unfortunately, I didn't notice any significant difference in the write speeds with & without max-pages on either rclone or mergerfs. However, mergerfs was showing almost 2x the speed of rclone (and could probably go even faster, since my drive's write throughput maxed out at those speeds).

Reads seemed to be similar with & without max-pages in mergerfs, but this is probably because my drive's read throughput is maxed out at those speeds.
rclone was slightly slower than mergerfs but still had a 3x-4x throughput increase compared to without max-pages.


What does ls -ld /mnt/mount look like? Does it look the same as when you use rclone mount?

Ah, that is broken... I'll investigate.

Rclone doesn't currently support xattrs so that might be a good option.

Unfortunately, the side effect right now is that turning xattrs off means no runtime config and loss of certain other features in mergerfs. My roadmap includes finding alternative ways to offer the same features given the general impact xattrs can have.

I was able to run the experiments on a significantly faster disk which shouldn't have the throughput bottlenecks, and the results appear to be similar:

MergerFS:

Read & Write Speeds of ~ 800 MB/s - 1 GB/s with fuse_msg_size set to 32. ~ 100-200 MB/s improvement after setting it to 256.

RClone:

Read Speeds of 150-200 MB/s (with max-pages set to 32 or without it) and an increase to 400-500 MB/s (with max-pages set to 256). Write Speeds of 400-500 MB/s, both with & without max-pages.

@ncw I think we can open this up for further testing once you merge the latest changes into the builds too.


PS: Is the 2x or greater difference between rclone & mergerfs (for both reads & writes) simply due to the fuse libraries or can something be done to improve the performance of rclone?

Good question... It is almost certainly due to excess data copying. A bit of careful profiling might reveal the problem! Go has excellent profiling tools.

Sounds good, I will spend some time the next few days to familiarize myself with those and see if there are any obvious bottlenecks. Any tips or guides you can recommend to get started?

In the meantime, this looks like a good enough performance boost to get started with for general testing by a few more folks.

Check out this bit of the rclone docs: https://rclone.org/rc/#debugging-rclone-with-pprof - that shows how to profile most things and there are some links elsewhere!
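For a quick start, the flow described there looks roughly like this (remote: and /mnt/point are placeholders; the rc server defaults to localhost:5572):

rclone mount remote: /mnt/point --rc &
# 30-second CPU profile while the copy test runs
go tool pprof -seconds 30 http://localhost:5572/debug/pprof/profile
# heap profile
go tool pprof http://localhost:5572/debug/pprof/heap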


I'd keep an eye out for cold spots. When dealing with IO, the problems often aren't things typical profiling will find; it's often cold spots and latency. I've been meaning to do cold spot profiling of mergerfs for a while but haven't gotten around to it, so unfortunately I can't offer you any practical suggestions.

Flamegraphs:

Read: https://drive.google.com/file/d/1m5nHayy07mXkJY_X14H3NQW10qtu0702/view
Write: https://drive.google.com/file/d/1GGix_KLzPRs4egM34T4M0xdvjpmk20Ep/view

Looks like ~75% of the time is spent calculating the MD5 Hash for writes and ~50% of the time for reads.

Ha! Try your tests with --ignore-checksum to stop rclone checking hashes.

Nice graphs! How did you make those - with something like this?

docker run uber/go-torch -u http://<host ip>:8080/debug/pprof -p -t=30 > torch.svg

Maybe I should put instructions into the rclone debugging section on how to do it.

Didn't change anything. Still the same results.

Yes.

I can add those as part of the PR, not a problem.

Sorry, try --no-checksum instead.

Not sure why there are two flags doing nearly the same thing... --no-checksum is a VFS flag

This seems to have helped with the read speeds and they are the same as the speeds with mergerfs now.

However, writes are unaffected, with the majority of the time still being spent in the same md5 block.

You might need to use both of those flags --ignore-checksum and --no-checksum

Which VFS cache mode are you using?

Uploads use the Rcat primitive when not using --vfs-cache-mode writes and use the Copy primitive when using it.

It might be the Rcat primitive is ignoring --ignore-checksum...
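For the next round of tests, a combined invocation might look like this (remote: and /mnt/point are placeholders; just a sketch pulling together the flags mentioned above):

rclone mount remote: /mnt/point --vfs-cache-mode writes --ignore-checksum --no-checksum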