Constantly high IOWAIT (add log)

omg, I experienced a kinda similar issue, first with my dedicated server and the rtorrent MergerFS mount with an underlying Rclone mount, and now also on my home server running Plex directly on the Rclone mount. I just debugged it down to the same cause: unnecessarily high IOWait. I had already given up until I thought of checking here.

Linux deskmini 5.4.0-0.bpo.4-amd64 #1 SMP Debian 5.4.19-1~bpo10+1 (2020-03-09) x86_64 GNU/Linux

ExecStart=/usr/bin/rclone mount "x-gd:/" /mnt/google/x-gd \
   --allow-other \
   --attr-timeout 1000h \
   --buffer-size 32M \
   --dir-cache-time 1000h \
   --drive-chunk-size 32M \
   --log-level INFO \
   --log-file /home/scripts/logs/mount-x.log \
   --poll-interval 15s \
   --rc \
   --rc-addr 127.0.0.1:5573 \
   --stats 0 \
   --timeout 1h \
   --use-mmap

What would be the smartest thing to do here? Revert to the 1.50 settings or adjust --vfs-read-wait?

Update after raising --vfs-read-wait from 40ms to 50ms:

looks bad. I think...

--async-read=false resolves it for now. 0 IOWait.
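In the mount unit above that just means appending the flag to the end of the ExecStart line (sketch only, keeping the line-continuation backslashes; as discussed further down, the flag may need a beta build):

   --timeout 1h \
   --use-mmap \
   --async-read=false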

Will wait for now I think...

IOWait isn't a good measure in my opinion! If you use --async-read=true (the default) you are going to get IOWait, but you will get faster performance, provided you don't see those "failed to wait for in-sequence read" messages. Those are what really kill performance. If you don't want to see IOWait then set --async-read=false and it will all disappear, along with some performance.

Is it possible to find a setting with --async-read=true that avoids the "failed to wait for in-sequence read" messages? Should I raise --vfs-read-wait higher than 50ms? Any ideas?

Yes, keep raising it until you don't get those messages. It will help a bit with the IOWait but not a lot for the reasons above.

I raised it all the way to --vfs-read-wait 1000ms and still got the "failed to wait for in-sequence read" message in the debug log (40 times in 1 minute under high load).
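(Counted roughly with something like this against the mount log from the unit above; the exact message wording may vary between versions:)

grep -c "failed to wait for in-sequence read" /home/scripts/logs/mount-x.log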

:frowning:

I think it is time for plan B.

Using --async-read=false will fix the problems at the cost of some performance. Meanwhile I'll warm up the proper fix I did and post a beta here.


Hello everyone.
Running 1.51 on an Ubuntu server and have the exact same issue.
A few stupid questions:

  • Why do I see IO Wait in Netdata but nothing in iotop -o?
  • Is rclone constantly writing to the disk with this bug? I have an expensive NVMe drive, so should I downgrade, or is it safe to keep running this version until there is a patch? I don't want this bug causing endless writes to my NVMe until the patched version arrives.

Thanks :slight_smile:

pass!

No, it is not writing to the disk; it is waiting for the network.

iotop is showing active disk utilization and what is consuming active disk IO.

netdata is showing IO Wait, which is a separate measure: a process waiting for IO to complete. Anything waiting on IO would eventually show up in iotop as doing IO, if that makes sense.
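In other words the two tools measure different things, so seeing one without the other is expected. A quick way to look at both on the box (assuming standard procps and iotop are installed):

top -bn1 | grep -i '%cpu'    # the "wa" value here is the IO Wait figure Netdata graphs
sudo iotop -o -b -n1         # only lists processes doing real disk reads/writes right now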

Thanks for the clarification!
I will wait for the next version then, nothing to worry about on my side :slight_smile:
Have a nice day


I have posted the latest beta with fixes for this:

https://beta.rclone.org/v1.51.0-336-g951099db-beta/

I've raised the read timeout to 20ms, and along with another fix I think this should be much better.
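If you want to try it on an existing systemd mount, something along these lines should work (the archive name is assumed from the usual beta directory layout, so check the listing at the link above first; replace the unit name with your own):

cd /tmp
curl -LO https://beta.rclone.org/v1.51.0-336-g951099db-beta/rclone-v1.51.0-336-g951099db-beta-linux-amd64.zip
unzip rclone-v1.51.0-336-g951099db-beta-linux-amd64.zip
sudo systemctl stop your-mount.service
sudo cp rclone-v1.51.0-336-g951099db-beta-linux-amd64/rclone /usr/bin/rclone
rclone version    # should report the beta version
sudo systemctl start your-mount.service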

Comments appreciated!

If I'm using non-beta 1.51, do I have to use --async-read=false in the rclone mount options to avoid this? And also in mergerfs?

Yes that is correct

Probably wouldn't hurt but not 100% sure.
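For reference, mergerfs has its own async_read option, so the equivalent there would look something like the fstab line below (the option name is from the mergerfs docs rather than this thread, the branch paths are placeholders, and I haven't verified it makes a difference for this issue):

/mnt/local:/mnt/google/x-gd  /mnt/merged  fuse.mergerfs  allow_other,async_read=false  0 0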

I thought that option was not in 1.51 and you had to use a beta?

You are quite correct! You'll need the beta for --async-read=false.

Why don't I have issues without messing with this async stuff? Sometimes I have 200 or more files open on the mount, and I haven't noticed any issues at all.

And I'm using rclone betas, and kernel 5.6.4

The issue was with stock 1.51, as that defaulted to turning on async reads, and there were fixes that went into the beta to 'smooth' it out. It could also be that with your settings reads happen in such small increments that you are not seeing the issue.

You'd probably want to compare IOWAIT on 1.50.2 and 1.51 and see if you have any changes in it.
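A rough way to do that comparison (assumes a procps vmstat that prints the "st" column last; run it once on each rclone version under similar load):

vmstat 1 60 | awk 'NR > 2 { wa += $(NF-1); n++ } END { printf "avg iowait: %.1f%%\n", wa/n }'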

Oh I see. So if I'm on rclone 1.51, besides moving to the beta, are there any other flags I can add to the rclone and mergerfs mounts to mitigate this issue, or is rolling back to 1.50 the only way?