Max Pages on Linux with rClone/MergerFS

@trapexit - I saw you had a pull request for this:

which moves the 128K up to 1M, but I wasn't sure how to take advantage of that. Is that something both mergerfs and rclone can take advantage of with a kernel newer than 4.20?

That would seem to greatly speed up sequential reads as it would remove a number of calls.

It needs support in the underlying communication to the kernel. mergerfs will inform the kernel to use 1MB if the kernel says it's supported. It's configurable to allow comparison and testing.

rclone would need to explicitly support it. Not sure what it uses to talk to the kernel but that would need to set the correct flags at the FUSE init stage (and allocate enough memory for incoming buffers).

@ncw - is that something to look at for rclone?

Rclone uses bazil/fuse as its FUSE library on Linux. I have to say it is looking a bit unmaintained, with only 2 commits since 2016 :frowning:

It does not appear to allow setting of max_pages.

The alternative library is go-fuse, and there is an issue about this by our very own @darthShadow

I have an experimental branch making rclone work with go-fuse which almost works!

Yeah, I had a bit of an experiment about this earlier.

Noticed the max_pages commit when it was pushed to mergerfs and was looking into how to use it in rclone. Figured it was something that needed to be supported in the FUSE backend used by rclone, so went looking for it in the corresponding library.

Saw that bazil/fuse was not being maintained so again went looking through the alternative libraries available for rclone. Found the go-fuse library and created this issue there. Unfortunately, haven't been able to make progress there since I am not sure what flag to pass and the author seems stumped too.

Perhaps @trapexit could provide some help here?

Just like any of the other negotiated features, you'd need to check in INIT whether FUSE_MAX_PAGES is available; if so, you would set it in the flags of the init_out struct and set the max_pages field.
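
Something like this, roughly. The structs below are simplified stand-ins for the kernel's fuse_init_in / fuse_init_out (include/uapi/linux/fuse.h), not the API of bazil/fuse or go-fuse, so treat it as a sketch of the handshake rather than library code:

    package main

    import "fmt"

    // FUSE_MAX_PAGES is the INIT flag kernels >= 4.20 use to advertise
    // support for more than 32 pages (128 KiB) per request.
    const (
        FUSE_MAX_PAGES = 1 << 22
        pageSize       = 4096
        wantMaxWrite   = 1 << 20 // ask for 1 MiB requests instead of 128 KiB
    )

    // Simplified stand-ins for the kernel's init structs (illustrative only).
    type fuseInitIn struct {
        Major, Minor, MaxReadahead, Flags uint32
    }

    type fuseInitOut struct {
        Major, Minor, MaxReadahead, Flags uint32
        MaxWrite                          uint32
        MaxPages                          uint16
    }

    // handleInit mirrors the negotiation described above: if the kernel
    // offers FUSE_MAX_PAGES, echo the flag back and say how many pages each
    // request may carry. The daemon's request buffers then need to be at
    // least MaxWrite plus header room, or large requests will fail.
    func handleInit(in *fuseInitIn, out *fuseInitOut) {
        out.MaxWrite = wantMaxWrite
        if in.Flags&FUSE_MAX_PAGES != 0 {
            out.Flags |= FUSE_MAX_PAGES
            out.MaxPages = uint16(wantMaxWrite / pageSize) // 256 pages = 1 MiB
        }
    }

    func main() {
        in := &fuseInitIn{Major: 7, Minor: 28, Flags: FUSE_MAX_PAGES}
        out := &fuseInitOut{}
        handleInit(in, out)
        fmt.Printf("negotiated max_write=%d, max_pages=%d\n", out.MaxWrite, out.MaxPages)
    }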

FYI I revamped the go-fuse interface to rclone. You can try it here with rclone mount2.

Any testing much appreciated! This will only work on Linux and macOS (not on FreeBSD).

https://beta.rclone.org/branch/v1.50.1-023-gd0ba9379-mount2-v2-beta/ (uploaded in 15-30 mins)

go-fuse seemed quite a bit quicker in my testing as it does async reads and writes.

I can probably add that to rclone mount too...

Did the builds fail for Linux?

Yes, it looks like they did! That will take a little while to fix up! The tests all pass locally, so I'm not sure why they didn't on the CI! https://github.com/rclone/rclone/runs/287962101

Looks like the tests finally succeeded? https://github.com/rclone/rclone/commit/48de5eb1bd4dd20becaf76fcd0464bd7efff0c47/checks?check_suite_id=305489468

Excellent work fixing the issues. Will try the new mount out soon. Any suggestions regarding the config flags that you think may require changes with mount2?

Current flags:

   --allow-other \
   --dir-cache-time=96h \
   --drive-chunk-size=64M \
   --vfs-read-chunk-size=128M \
   --vfs-read-chunk-size-limit=2G \
   --vfs-cache-mode=writes \
   --vfs-cache-max-age=30m \
   --vfs-cache-poll-interval=5m \
   --poll-interval=30s \
   --cache-dir=/home/darthshadow/rclone/vfs/TD/ \
   --buffer-size=512M \
   --attr-timeout=1s \
   --umask=002 \
   --log-file=/home/darthshadow/rclone/rclone-td.log \
   --log-level=INFO \
   --stats=30s \
   --stats-log-level=NOTICE \
   --rc \
   --rc-addr=127.0.0.1:5573 \

Yes, here is rclone with a mount2 command for testing

https://beta.rclone.org/branch/v1.50.1-036-g9aa70582-mount2-v2-beta/

Most of those flags are for the VFS layer which is unchanged. Potentially --attr-timeout might need tweaking, but 1s is the default value so probably not. I haven't tried --allow-other so that may be broken :wink: I think it should be pretty much 100% compatible flags-wise; just change mount to mount2!

I have been having pauses with the new async reads feature of the VFS. They seem to coincide with the 5ms waits for the in-order read to come in. This seems to be causing issues with smooth playback of files from the mount.

Do you think 5ms is a little too long to wait for an in-order read?

PS: This is with default mount. Haven't tried mount2 yet.

Following are the results of transferring a single file via rsync from mounts of the specified versions. Different files were transferred, so the sizes vary a little.

Mount Command:

rclone mount PersonalTD:Media /home/darth/TDMedia --allow-other --dir-cache-time=96h --drive-chunk-size=64M --vfs-read-chunk-size=128M --vfs-read-chunk-size-limit=2G --vfs-cache-mode=writes --vfs-cache-max-age=30m --vfs-cache-poll-interval=5m --poll-interval=30s --cache-dir=/home/darth/rclone/vfs/TD/ --buffer-size=512M --attr-timeout=1s --umask=002 --log-file=/home/darth/rclone/rclone-td.log --log-level=INFO --stats=30s --stats-log-level=NOTICE --rc --rc-addr=127.0.0.1:5573

v1.50.1-mount

rclone v1.50.1
- os/arch: linux/amd64
- go version: go1.13.4
2019/11/19 11:13:31 NOTICE:
Transferred:        8.076G / 8.076 GBytes, 100%, 34.572 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:     3m59.2s
  • No Errors

v1.50.1-053-g1e627855-mount

rclone v1.50.1-053-g1e627855-beta
- os/arch: linux/amd64
- go version: go1.13.4
2019/11/19 11:25:46 NOTICE:
Transferred:        9.743G / 9.743 GBytes, 100%, 15.035 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:     11m3.6s
  • Had multiple errors: ReadFileHandle.Read error: low level retry 1/10: EOF

v1.50.1-036-g48de5eb1-mount

rclone v1.50.1-036-g48de5eb1-mount2-v2-beta
- os/arch: linux/amd64
- go version: go1.13.4
2019/11/19 11:25:46 NOTICE:
Transferred:        9.462G / 9.462 GBytes, 100%, 14.520 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:     11m7.2s
  • Had multiple errors: ReadFileHandle.Read error: low level retry 1/10: EOF

v1.50.1-036-g48de5eb1-mount2

rclone v1.50.1-036-g48de5eb1-mount2-v2-beta
- os/arch: linux/amd64
- go version: go1.13.4
2019/11/19 11:46:24 NOTICE:
Transferred:        8.371G / 8.371 GBytes, 100%, 35.709 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:        4m0s
  • No Errors

Another observation is that if more than one process is accessing the same file in mount2, it brings all the existing transfers of that file to a crawl (~40 MB/s to ~5 MB/s).

Interesting...

The penalty of a 5ms wait for a read is offset by not having to re-open the stream from scratch, which takes more like 20ms for me. Without the wait, rclone would potentially end up re-opening three times for an arrival order like this (the sketch after the list walks through the arithmetic)...

  • 1
  • 2
  • 4 - seek and re-open
  • 3 - seek and re-open
  • 5 - seek and re-open
  • 6
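
Here's a toy simulation of that arithmetic (illustrative only, not rclone's actual code), assuming any block we wait for always turns up inside the wait window:

    package main

    import "fmt"

    // Toy model of the trade-off: waiting for the in-order read costs ~5ms,
    // while seeking forces the remote stream to be re-opened at ~20ms.
    const (
        waitCost   = 5  // ms, roughly --vfs-read-wait
        reopenCost = 20 // ms to seek and re-open the stream
    )

    // simulate returns the total overhead in ms for a given arrival order,
    // assuming any block we wait for arrives within the wait window.
    func simulate(order []int, waitForInOrder bool) (ms int) {
        next := order[0]          // next block the sequential stream can serve
        pending := map[int]bool{} // out-of-order blocks parked while waiting
        for _, b := range order {
            switch {
            case b == next:
                next++
            case waitForInOrder:
                ms += waitCost // park this read; the in-order one arrives soon
                pending[b] = true
            default:
                ms += reopenCost // seek immediately: re-open the stream
                next = b + 1
            }
            for pending[next] { // serve any parked reads that are now in order
                delete(pending, next)
                next++
            }
        }
        return ms
    }

    func main() {
        order := []int{1, 2, 4, 3, 5, 6} // the arrival order from the list above
        fmt.Println("overhead with a 5ms wait: ", simulate(order, true), "ms") // 5 ms
        fmt.Println("overhead with eager seeks:", simulate(order, false), "ms") // 60 ms
    }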

Did the block come in within the 5ms deadline or did rclone do seeking? What does the log say?

Do you think your speed variations are due to the out of order blocks? Can you see seeking in the logs in the slow cases?

I rebased and updated the mount2 branch here:

https://beta.rclone.org/branch/v1.50.1-056-gd1170db3-mount2-v2-beta/ (uploaded in 15-30 mins)

Is that the case for mount as well? Is there lots of seeking going on?

There were quite a lot of seeks in the log when I checked, yes.

Unfortunately, this is proving difficult to reproduce and I have already wiped the original log. Will spend some time this weekend, work permitting, to see if I can reproduce this.

Did you have any luck with this? I'm just wondering whether to revert the async changes for mount or not...

Unfortunately, no, I was not able to reproduce the speed issue. I am planning to switch over one of my regular mounts to this build with mount2 and hope that it will get reproduced eventually.

Hi,

I have sent a PR for the Max Pages changes, currently for the bazil/fuse version. The corresponding PR there is https://github.com/bazil/fuse/pull/237

I have also included the iowait branch changes since I wanted to test with async_read enabled and disabled to see if there were any issues either way.

The speeds seem better for me, but more testing would be helpful.

Nice one!

Do you think I should merge these into a point release?

Maybe with the wait time increased to 20 or 50ms?

:slight_smile: Measurements please!

It would probably be good to merge them into a point release so that people can test what works for them and we can arrive at sane defaults. 20ms may be too drastic a change without getting any solid data from the users.

I think we should just double it to 10ms for now and let the users chime in with their observations about what works for them before tuning it any further.

Any thoughts on what tests should be performed? I was trying out tests with Drive as the remote and was getting increases from 80 MB/s to 110-120 MB/s but that may also be due to varying speeds with Google.

I would ideally like to get some test cases that can be repeated across various setups and various users to see the benefits.

I was thinking about a local mount with rclone. Would that be reasonable?

Possible command for a local mount:

rclone mount ~/max-pages/source ~/max-pages/test-mount --stats=30s --log-level=INFO --buffer-size=512M --async-read=true --vfs-read-wait=20ms --max-read-ahead=1M --max-pages=256

PS: Now that max-write has increased to 1M, I have noticed improvements with setting max-read-ahead to 1M too (from the default 128k), but this may be just placebo.
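
If it helps to compare numbers across setups, something like this tiny sequential reader (just a sketch of mine, not an agreed test plan) could be run against a large file in the mount:

    package main

    import (
        "fmt"
        "io"
        "os"
        "time"
    )

    // Reads the given file sequentially in 1 MiB chunks and prints the
    // throughput, so runs against different mounts/flags can be compared.
    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: readbench <file-on-mount>")
            os.Exit(1)
        }
        f, err := os.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        defer f.Close()

        buf := make([]byte, 1<<20) // 1 MiB read size
        start := time.Now()
        var total int64
        for {
            n, err := f.Read(buf)
            total += int64(n)
            if err == io.EOF {
                break
            }
            if err != nil {
                panic(err)
            }
        }
        secs := time.Since(start).Seconds()
        fmt.Printf("read %d bytes in %.1fs = %.1f MB/s\n", total, secs, float64(total)/secs/1e6)
    }

Between runs it's probably worth using a file larger than RAM (or dropping the page cache) so results aren't just served from cache.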

Is there any way to get binaries built for the PR so I can direct a few folks to it for testing?