Rclone generating kernel panics

Today multiple servers starting having kernel panics again. This used to happen a long time ago and then just as it came it went away.

rclone v1.64.0
- os/version: ubuntu 18.04 (64 bit)
- os/kernel: 5.4.0-149-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.1
- go/linking: static
- go/tags: none

ExecStart=/usr/bin/rclone mount
--config=/home/user/.config/rclone/rclone.conf
--allow-other
--cache-dir=/mnt/cache
--vfs-cache-max-age=8766h
--vfs-cache-max-size=MAGICWORD
--vfs-cache-mode=full
--vfs-cache-min-free-space=2G
--vfs-read-chunk-size=0
--buffer-size=0
--vfs-read-ahead=256M
--vfs-fast-fingerprint
--no-checksum
--no-modtime
--dir-cache-time=8766h
--attr-timeout=1h
--timeout=10m
--umask=022
--low-level-retries=2
--retries=0
--log-level=INFO
--log-file=/opt/rclone.log
--async-read=true
--rc
--rc-addr=localhost:5572
cf: /mnt/remote

I'm using cloudflare r2 as my backend. MAGICWORD is replaced by my bash script to be the disk space of the drives - 100 gb

Weird because issue started to affect multiple servers, seemly randomly, so I discard any nvme disk issues.

kernel output that i'm not smart enough to parse

Oct 16 20:47:19 V09 kernel: Kernel panic - not syncing: hung_task: blocked tasks
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 31 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 30 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 29 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 28 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 27 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 26 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 25 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 24 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 23 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 22 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 21 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: FS:  00007f09d7fff700 GS:  0000000000000000
Oct 16 20:47:19 V09 kernel: R13: 000055e3acf2d938 R14: 000055e3ad14c26c R15: 0000000000000000
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000055e3adde0770
Oct 16 20:47:19 V09 kernel: RBP: 000055e3adde0960 R08: 00007f09d7ffea30 R09: 0000000000000000
Oct 16 20:47:19 V09 kernel: RDX: 00007f09d000df30 RSI: 00007f09d000df30 RDI: 00007f09d000df20
Oct 16 20:47:19 V09 kernel: RAX: 0000000000000000 RBX: 00007f09d000df30 RCX: 0000000000000010
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007f09d7ffe9a8 EFLAGS: 00000246
Oct 16 20:47:19 V09 kernel: Code: e0 c5 fd 74 0f c5 fd d7 c1 d3 f8 85 c0 74 1b f3 0f bc c0 48 01 f8 48 01 c8 48 29 d0 c5 f8 77 c3 66 2e 0f 1f 84 00 00 00 00 00 <48> 83 c7 20 c5 fd 74 0f c5 fd d7 c1 85 c0 0f 85 8c 00 00 00 c5 fd
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7f09ddd084b0
Oct 16 20:47:19 V09 kernel: Hardware name: Supermicro Super Server/X10DRL-i, BIOS 3.2 11/19/2019
Oct 16 20:47:19 V09 kernel: CPU: 20 PID: 1982 Comm: rs:main Q:Reg Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 20
Oct 16 20:47:19 V09 kernel:  ret_from_fork+0x35/0x40
Oct 16 20:47:19 V09 kernel:  ? kthread_park+0x90/0x90
Oct 16 20:47:19 V09 kernel:  ? process_one_work+0x400/0x400
Oct 16 20:47:19 V09 kernel:  kthread+0x121/0x140
Oct 16 20:47:19 V09 kernel:  worker_thread+0x34/0x410
Oct 16 20:47:19 V09 kernel:  process_one_work+0x20f/0x400
Oct 16 20:47:19 V09 kernel:  ? drm_fb_helper_dirty_work+0xfb/0x1a0 [drm_kms_helper]
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 16 20:47:19 V09 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 16 20:47:19 V09 kernel: CR2: 00007fcd14622c14 CR3: 00000017c860a003 CR4: 00000000003606e0
Oct 16 20:47:19 V09 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 16 20:47:19 V09 kernel: FS:  0000000000000000(0000) GS:ffff8a68ffac0000(0000) knlGS:0000000000000000
Oct 16 20:47:19 V09 kernel: R13: 000000000000005f R14: ffff8a68f364ac00 R15: ffff9d2d081e0000
Oct 16 20:47:19 V09 kernel: R10: ffff9d2e48701000 R11: ffff8a6535f54cc0 R12: ffff9d2e4845f000
Oct 16 20:47:19 V09 kernel: RBP: ffff9d2d06803e80 R08: ffff8a5eee7ec300 R09: ffff9d2d06803c20
Oct 16 20:47:19 V09 kernel: RDX: 0000000000001000 RSI: ffff9d2d081e0040 RDI: ffff9d2e4845f040
Oct 16 20:47:19 V09 kernel: RAX: ffff9d2e4845f000 RBX: ffff8a78f3bfead0 RCX: 0000000000000fc0
Oct 16 20:47:19 V09 kernel: RSP: 0018:ffff9d2d06803e38 EFLAGS: 00010246
Oct 16 20:47:19 V09 kernel: Code: ff ff ff 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
Oct 16 20:47:19 V09 kernel: RIP: 0010:memcpy_erms+0x6/0x10

Oct 16 20:47:19 V09 kernel: RIP: 0010:memcpy_erms+0x6/0x10
Oct 16 20:47:19 V09 kernel: Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
Oct 16 20:47:19 V09 kernel: Hardware name: Supermicro Super Server/X10DRL-i, BIOS 3.2 11/19/2019
Oct 16 20:47:19 V09 kernel: CPU: 19 PID: 129 Comm: kworker/19:0 Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 19
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 18 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 17 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 16 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 15 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 14 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 13 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 12 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 11 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 10 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 9 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 8 skipped: idling at poll_idle+0x8c/0xbd
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 7 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 6 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 5 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 4 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 3 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 2 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 0 skipped: idling at intel_idle+0x87/0x130
Oct 16 20:47:19 V09 kernel: Sending NMI from CPU 1 to CPUs 0,2-31:
Oct 16 20:47:19 V09 kernel:  ret_from_fork+0x35/0x40
Oct 16 20:47:19 V09 kernel:  ? kthread_park+0x90/0x90
Oct 16 20:47:19 V09 kernel:  ? hungtask_pm_notify+0x50/0x50
Oct 16 20:47:19 V09 kernel:  kthread+0x121/0x140
Oct 16 20:47:19 V09 kernel:  watchdog+0x2c6/0x500
Oct 16 20:47:19 V09 kernel:  arch_trigger_cpumask_backtrace+0x19/0x20
Oct 16 20:47:19 V09 kernel:  nmi_trigger_cpumask_backtrace+0xe7/0x130
Oct 16 20:47:19 V09 kernel:  ? lapic_can_unplug_cpu+0xb0/0xb0
Oct 16 20:47:19 V09 kernel:  nmi_cpu_backtrace+0x94/0xa0
Oct 16 20:47:19 V09 kernel:  dump_stack+0x6d/0x8b
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: Hardware name: Supermicro Super Server/X10DRL-i, BIOS 3.2 11/19/2019
Oct 16 20:47:19 V09 kernel: CPU: 1 PID: 210 Comm: khungtaskd Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: NMI backtrace for cpu 1
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c666e60 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? __do_page_fault+0x2be/0x4d0
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0
Oct 16 20:47:19 V09 kernel:  fuse_open+0x10/0x20
Oct 16 20:47:19 V09 kernel:  ? fuse_open_common+0x110/0x110
Oct 16 20:47:19 V09 kernel:  fuse_open_common+0x107/0x110
Oct 16 20:47:19 V09 kernel:  fuse_finish_open+0x7c/0x100
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2+0x17/0x20
Oct 16 20:47:19 V09 kernel:  ? finish_wait+0x56/0x70
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2_range+0x2e4/0x580
Oct 16 20:47:19 V09 kernel:  ? file_fdatawait_range+0x30/0x30
Oct 16 20:47:19 V09 kernel:  __lock_page+0x121/0x210
Oct 16 20:47:19 V09 kernel:  io_schedule+0x16/0x40
Oct 16 20:47:19 V09 kernel:  schedule+0x33/0xa0
Oct 16 20:47:19 V09 kernel:  __schedule+0x293/0x740
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: php-fpm         D    0 2767342   7151 0x00004000
Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu

Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767342 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c65f210 R09: 0000000000000000
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? __do_page_fault+0x2be/0x4d0
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0
Oct 16 20:47:19 V09 kernel:  fuse_open+0x10/0x20
Oct 16 20:47:19 V09 kernel:  ? fuse_open_common+0x110/0x110
Oct 16 20:47:19 V09 kernel:  fuse_open_common+0x107/0x110
Oct 16 20:47:19 V09 kernel:  fuse_finish_open+0x7c/0x100
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2+0x17/0x20
Oct 16 20:47:19 V09 kernel:  ? finish_wait+0x56/0x70
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2_range+0x2e4/0x580
Oct 16 20:47:19 V09 kernel:  ? file_fdatawait_range+0x30/0x30
Oct 16 20:47:19 V09 kernel:  __lock_page+0x121/0x210
Oct 16 20:47:19 V09 kernel:  io_schedule+0x16/0x40
Oct 16 20:47:19 V09 kernel:  schedule+0x33/0xa0
Oct 16 20:47:19 V09 kernel:  __schedule+0x293/0x740
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: php-fpm         D    0 2767333   7151 0x00004000
Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767333 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c685100 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  ? filename_lookup+0xf2/0x180
Oct 16 20:47:19 V09 kernel:  ? putname+0x4c/0x60
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0
Oct 16 20:47:19 V09 kernel:  fuse_open+0x10/0x20
Oct 16 20:47:19 V09 kernel:  ? fuse_open_common+0x110/0x110
Oct 16 20:47:19 V09 kernel:  fuse_open_common+0x107/0x110
Oct 16 20:47:19 V09 kernel:  fuse_finish_open+0x7c/0x100
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2+0x17/0x20
Oct 16 20:47:19 V09 kernel:  ? finish_wait+0x56/0x70
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2_range+0x2e4/0x580
Oct 16 20:47:19 V09 kernel:  ? file_fdatawait_range+0x30/0x30
Oct 16 20:47:19 V09 kernel:  __lock_page+0x121/0x210
Oct 16 20:47:19 V09 kernel:  io_schedule+0x16/0x40
Oct 16 20:47:19 V09 kernel:  schedule+0x33/0xa0

Oct 16 20:47:19 V09 kernel:  __schedule+0x293/0x740
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: php-fpm         D    0 2767327   7151 0x00004000
Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767327 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c666e60 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? __do_page_fault+0x2be/0x4d0
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0
Oct 16 20:47:19 V09 kernel:  fuse_open+0x10/0x20
Oct 16 20:47:19 V09 kernel:  ? fuse_open_common+0x110/0x110
Oct 16 20:47:19 V09 kernel:  fuse_open_common+0x107/0x110
Oct 16 20:47:19 V09 kernel:  fuse_finish_open+0x7c/0x100
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2+0x17/0x20
Oct 16 20:47:19 V09 kernel:  ? finish_wait+0x56/0x70
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2_range+0x2e4/0x580
Oct 16 20:47:19 V09 kernel:  ? file_fdatawait_range+0x30/0x30
Oct 16 20:47:19 V09 kernel:  __lock_page+0x121/0x210
Oct 16 20:47:19 V09 kernel:  io_schedule+0x16/0x40
Oct 16 20:47:19 V09 kernel:  schedule+0x33/0xa0
Oct 16 20:47:19 V09 kernel:  __schedule+0x293/0x740
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: php-fpm         D    0 2767324   7151 0x00004000
Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767324 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c666e60 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e

Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767324 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c666e60 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? __do_page_fault+0x2be/0x4d0
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0
Oct 16 20:47:19 V09 kernel:  fuse_open+0x10/0x20
Oct 16 20:47:19 V09 kernel:  ? fuse_open_common+0x110/0x110
Oct 16 20:47:19 V09 kernel:  fuse_open_common+0x107/0x110
Oct 16 20:47:19 V09 kernel:  fuse_finish_open+0x7c/0x100
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2+0x17/0x20
Oct 16 20:47:19 V09 kernel:  ? finish_wait+0x56/0x70
Oct 16 20:47:19 V09 kernel:  invalidate_inode_pages2_range+0x2e4/0x580
Oct 16 20:47:19 V09 kernel:  ? file_fdatawait_range+0x30/0x30
Oct 16 20:47:19 V09 kernel:  __lock_page+0x121/0x210
Oct 16 20:47:19 V09 kernel:  io_schedule+0x16/0x40
Oct 16 20:47:19 V09 kernel:  schedule+0x33/0xa0
Oct 16 20:47:19 V09 kernel:  __schedule+0x293/0x740
Oct 16 20:47:19 V09 kernel: Call Trace:
Oct 16 20:47:19 V09 kernel: php-fpm         D    0 2767319   7151 0x00004000
Oct 16 20:47:19 V09 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 20:47:19 V09 kernel:       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
Oct 16 20:47:19 V09 kernel: INFO: task php-fpm:2767319 blocked for more than 120 seconds.
Oct 16 20:47:19 V09 kernel: R13: 0000000000000000 R14: 00007fcd0c603a18 R15: 00007fcd0c6027a0
Oct 16 20:47:19 V09 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe916bfbe0
Oct 16 20:47:19 V09 kernel: RBP: 00007fcceced15e0 R08: 00007fcd0c666ed8 R09: 0000000000000001
Oct 16 20:47:19 V09 kernel: RDX: 0000000000000000 RSI: 00007ffe916bfbe0 RDI: 00000000ffffff9c
Oct 16 20:47:19 V09 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcd14622c3e
Oct 16 20:47:19 V09 kernel: RSP: 002b:00007ffe916bfb50 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Oct 16 20:47:19 V09 kernel: Code: Bad RIP value.
Oct 16 20:47:19 V09 kernel: RIP: 0033:0x7fcd14622c3e
Oct 16 20:47:19 V09 kernel:  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Oct 16 20:47:19 V09 kernel:  do_syscall_64+0x57/0x190
Oct 16 20:47:19 V09 kernel:  __x64_sys_openat+0x20/0x30
Oct 16 20:47:19 V09 kernel:  ? __do_page_fault+0x2be/0x4d0
Oct 16 20:47:19 V09 kernel:  ? do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  do_sys_open+0x1ba/0x2e0
Oct 16 20:47:19 V09 kernel:  ? __alloc_fd+0x46/0x170
Oct 16 20:47:19 V09 kernel:  ? __check_object_size+0xdb/0x1b0
Oct 16 20:47:19 V09 kernel:  do_filp_open+0x9b/0x110
Oct 16 20:47:19 V09 kernel:  path_openat+0x2f9/0x16a0
Oct 16 20:47:19 V09 kernel:  vfs_open+0x2f/0x40
Oct 16 20:47:19 V09 kernel:  do_dentry_open+0x1df/0x3a0

Any ideas why this is happening ? I see nothing at rclone logs at INFO level. Debug produces GBs of spam in minutes

Last line of the logs before a server restart:

2023/10/16 20:44:33 INFO  : vfs cache: cleaned: objects 21632 (was 21632) in use 839, to upload 0, uploading 0, total size 17.924Ti (was 17.924Ti)
2023/10/16 20:49:39 NOTICE: Serving remote control on http://127.0.0.1:5572/
2023/10/16 20:49:39 INFO  : Encrypted drive 'cf:': poll-interval is not supported by this remote

As you can see it was running just fine, then my php-pool hangs trying to read from the mount ( my guess ), restart the server, and rclone is mounted again, and nothing is logged

The kernel trace says your fpm tasks are blocked doing stuff to fuse for more than 120s.

So some rclone operation has taken more than 2 minutes.

I suggest you reduce the timeout above to less than 2 minutes or increase the kernel timeout in proc like it says in the dmesg.

I'd probably reduce the rclone timeout as stuff waiting for 2 mins is bad.

I'm not sure why the kernel rebooted though - that's probably a setting you can change. Google the first line of the dmesg and see what comes up.

Is it possible to know if this was related to network or disk at all ? Just to know what to check.

The disks should be fast enough to handle the traffic, I would to know where the bottleneck happened. Like 15% of the servers started being affected so that's why I didn't immediately think about just bad hardware.

--timeout= default is 5m. I have set it to 1 min

I also have tried this settings:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

But didn't seem to help much so I tried increasing the cache

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80

This is what I had before... what do you recommend for a dedicated rclone server where most files are in cache already? And zfs will be using 80-95% of the ram by design so I guess i want vm.dirty.ratio=98 to avoid making the kernel flush everything to disk?

What do you recommend?

And you are correct the kernel was rebooting because of a setting. I disabled it so servers won't reboot anymore, but I believe the root of the issue is still here. Would like to discover the bottleneck to fix this once and for all :frowning:

You'd have to find out what rclone was doing at the time of the crash. I would guess network, but it could be something that caused locking in rclone.

The dirty parameters will be only used for writing cache the VFS cache to disk I think. I would have thought getting them on disk sooner rather than later would be the best thing so the kernel can decided what to cache.

Is the machine under memory pressure? Or is it disk IO that is the problem?

There should be no pressure anywhere... it evens hungs out at times where there isn't a lot of usage:

[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1292436 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1292436   3996 0x80004006
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_flush+0x186/0x1b0
[Wed Oct 18 10:14:58 2023]  filp_close+0x2f/0x80
[Wed Oct 18 10:14:58 2023]  put_files_struct+0x78/0xd0
[Wed Oct 18 10:14:58 2023]  exit_files+0x49/0x50
[Wed Oct 18 10:14:58 2023]  do_exit+0x36b/0xb90
[Wed Oct 18 10:14:58 2023]  do_group_exit+0x43/0xa0
[Wed Oct 18 10:14:58 2023]  get_signal+0x14f/0x860
[Wed Oct 18 10:14:58 2023]  ? new_sync_read+0x122/0x1b0
[Wed Oct 18 10:14:58 2023]  do_signal+0x37/0x700
[Wed Oct 18 10:14:58 2023]  ? __vfs_read+0x29/0x40
[Wed Oct 18 10:14:58 2023]  ? vfs_read+0x8e/0x130
[Wed Oct 18 10:14:58 2023]  exit_to_usermode_loop+0x90/0x130
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x170/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f297031
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc9831c648 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Wed Oct 18 10:14:58 2023] RAX: fffffffffffffffc RBX: 00007f9d172810e0 RCX: 00007f9d1f297031
[Wed Oct 18 10:14:58 2023] RDX: 0000000000002000 RSI: 00007f9d17289000 RDI: 0000000000000005
[Wed Oct 18 10:14:58 2023] RBP: 00007f9d17289000 R08: 00007f9d17289000 R09: 0000000000000089
[Wed Oct 18 10:14:58 2023] R10: 0000000000089000 R11: 0000000000000246 R12: 00007f9d17284180
[Wed Oct 18 10:14:58 2023] R13: 0000000000002000 R14: 0000000000000000 R15: 00007f9d172810e0
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299912 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299912   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97d R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000026
[Wed Oct 18 10:14:58 2023] R13: 0000000000000026 R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299930 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299930   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97d R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000002a
[Wed Oct 18 10:14:58 2023] R13: 000000000000002a R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299931 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299931   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97d R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000003d
[Wed Oct 18 10:14:58 2023] R13: 000000000000003d R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299948 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299948   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97d R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000026
[Wed Oct 18 10:14:58 2023] R13: 0000000000000026 R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299951 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299951   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97d R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000002a
[Wed Oct 18 10:14:58 2023] R13: 000000000000002a R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299969 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299969   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97e R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000044
[Wed Oct 18 10:14:58 2023] R13: 0000000000000044 R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299973 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299973   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97e R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000034
[Wed Oct 18 10:14:58 2023] R13: 0000000000000034 R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1299974 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1299974   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97e R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000038
[Wed Oct 18 10:14:58 2023] R13: 0000000000000038 R14: 0000000000000000 R15: 0000000000000001
[Wed Oct 18 10:14:58 2023] INFO: task php-fpm:1300017 blocked for more than 120 seconds.
[Wed Oct 18 10:14:58 2023]       Tainted: P           OE     5.4.0-149-generic #166~18.04.1-Ubuntu
[Wed Oct 18 10:14:58 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 18 10:14:58 2023] php-fpm         D    0 1300017   3996 0x00004004
[Wed Oct 18 10:14:58 2023] Call Trace:
[Wed Oct 18 10:14:58 2023]  __schedule+0x293/0x740
[Wed Oct 18 10:14:58 2023]  schedule+0x33/0xa0
[Wed Oct 18 10:14:58 2023]  request_wait_answer+0x12e/0x210
[Wed Oct 18 10:14:58 2023]  ? __wake_up_pollfree+0x40/0x40
[Wed Oct 18 10:14:58 2023]  fuse_simple_request+0x198/0x2c0
[Wed Oct 18 10:14:58 2023]  fuse_do_getattr+0xdc/0x320
[Wed Oct 18 10:14:58 2023]  fuse_getattr+0xcf/0xf0
[Wed Oct 18 10:14:58 2023]  vfs_getattr_nosec+0x98/0xb0
[Wed Oct 18 10:14:58 2023]  vfs_getattr+0x36/0x40
[Wed Oct 18 10:14:58 2023]  vfs_statx+0x8d/0xe0
[Wed Oct 18 10:14:58 2023]  __do_sys_newlstat+0x3d/0x70
[Wed Oct 18 10:14:58 2023]  __x64_sys_newlstat+0x16/0x20
[Wed Oct 18 10:14:58 2023]  do_syscall_64+0x57/0x190
[Wed Oct 18 10:14:58 2023]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[Wed Oct 18 10:14:58 2023] RIP: 0033:0x7f9d1f2967c5
[Wed Oct 18 10:14:58 2023] Code: Bad RIP value.
[Wed Oct 18 10:14:58 2023] RSP: 002b:00007ffc98319208 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[Wed Oct 18 10:14:58 2023] RAX: ffffffffffffffda RBX: 00007ffc98319510 RCX: 00007f9d1f2967c5
[Wed Oct 18 10:14:58 2023] RDX: 00007ffc98319270 RSI: 00007ffc98319270 RDI: 00007ffc98319510
[Wed Oct 18 10:14:58 2023] RBP: 00007ffc98319340 R08: 00000000652fd97e R09: 0000000000000001
[Wed Oct 18 10:14:58 2023] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000044
[Wed Oct 18 10:14:58 2023] R13: 0000000000000044 R14: 0000000000000000 R15: 0000000000000001

Right now, at peak hours

There is almost no disk writing only reads:

Total DISK READ : 1779.91 M/s | Total DISK WRITE : 40.87 K/s
Actual DISK READ: 794.41 M/s | Actual DISK WRITE: 0.00 B/s

free -m

              total        used        free      shared  buff/cache   available
Mem:         128809      114287         729         214       13792       13265
Swap:             0           0           0
top - 22:49:50 up 17:32,  7 users,  load average: 18.65, 16.10, 15.94
Tasks: 2347 total,  11 running, 2151 sleeping,   0 stopped,   0 zombie
%Cpu(s): 15.9 us, 14.2 sy,  0.0 ni, 57.3 id,  2.3 wa,  0.0 hi, 10.3 si,  0.0 st
KiB Mem : 13190057+total,   567680 free, 11685841+used, 14474480 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 13755928 avail Mem

nvme smart-logs:

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 46 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,266,124,590
data_units_written                  : 11,287,846
host_read_commands                  : 9,061,548,215
host_write_commands                 : 146,343,905
controller_busy_time                : 131,823
power_cycles                        : 2
power_on_hours                      : 2,305
unsafe_shutdowns                    : 1
media_errors                        : 0
num_err_log_entries                 : 19
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 2                : 54 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
Smart Log for NVME device:nvme1n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 40 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,205,243,903
data_units_written                  : 11,502,018
host_read_commands                  : 8,830,011,888
host_write_commands                 : 158,616,582
controller_busy_time                : 129,619
power_cycles                        : 2
power_on_hours                      : 2,316
unsafe_shutdowns                    : 2
media_errors                        : 0
num_err_log_entries                 : 19
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 2                : 49 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
Smart Log for NVME device:nvme2n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 42 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,207,518,420
data_units_written                  : 11,437,183
host_read_commands                  : 8,837,970,459
host_write_commands                 : 153,929,319
controller_busy_time                : 129,720
power_cycles                        : 2
power_on_hours                      : 2,306
unsafe_shutdowns                    : 1
media_errors                        : 0
num_err_log_entries                 : 19
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 2                : 49 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
Smart Log for NVME device:nvme4n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 44 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,205,131,738
data_units_written                  : 11,429,551
host_read_commands                  : 8,829,132,131
host_write_commands                 : 154,366,255
controller_busy_time                : 129,014
power_cycles                        : 2
power_on_hours                      : 2,299
unsafe_shutdowns                    : 1
media_errors                        : 0
num_err_log_entries                 : 19
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 2                : 51 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
Smart Log for NVME device:nvme3n1 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 45 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,203,325,243
data_units_written                  : 11,430,497
host_read_commands                  : 8,820,836,024
host_write_commands                 : 154,799,583
controller_busy_time                : 128,959
power_cycles                        : 2
power_on_hours                      : 2,296
unsafe_shutdowns                    : 1
media_errors                        : 0
num_err_log_entries                 : 19
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 2                : 54 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0

While this logs does not reflect the time at which the tasks hung, now the server is at much higher load than at the time of the logs...

High ram usage is expected because zfs caches reads in ram. I usually give ~120gb. That was my assumption for making dirty ratio 90% so the high zfs ram usage wouldn't make the kernel make i/o synchronous (correct me if wrong)

I see nothing but a server that at best is operating with too small free space, because of that rclone issue counting stuff in the cache... but there is no i/o wait in top, so I don't know.

This is just one server. I have 5 servers since last reboot with same hung tasks. And know what is even more weird? They appear to be all around 10:14 10:15.

But my backend is cloudflare r2, and I barely download anything now with the cache.... so this is really a mistery above beyond what I can do :confused:

That probably means the hang in rclone is likely caused by a networking problem if it affected all your severs at the same time. Make sure these two flags have relatively low timeouts. You are using R2 which shoud be high performance so try 30s for each say. This should mean rclone detects a broken network connection earlier, breaks it and retries all within the 120s that the kernel will wait for.

  --contimeout Duration   Connect timeout (default 1m0s)
  --timeout Duration                   IO idle timeout (default 5m0s)

Just rebooted all servers to apply the new settings! There was no more hung tasks in dmesg logs.

But I did have complaints about poor download speeds but as of now I don't know if it's my DC fault or possibly the same thing that started the hung tasks again.

So I'll watch out thanks!

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.