Rclone keeps restarting

What is the problem you are having with rclone?

I am running rclone on my home server (Debian 11 / OpenMediaVault 6) to serve my cloud storage via SFTP. It works fine (except for the issue described here), but this morning my home server suddenly became unreachable. After restarting, I found this in /var/log/messages:

Feb 25 11:30:21 server kernel: [51615.748213] ------------[ cut here ]------------
Feb 25 11:30:21 server kernel: [51615.748216] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Feb 25 11:30:21 server kernel: [51615.748230] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280
Feb 25 11:30:21 server kernel: [51615.748236] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel tcp_diag udp_diag inet_diag xt_nat veth 8021q garp mrp xt_conntrack nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype vhost_net vhost vhost_iotlb tap xt_CHECKSUM nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_tcpudp nft_compat nf_tables nfnetlink overlay ch341 intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul ghash_clmulni_intel ppdev mei_hdcp snd_hda_intel snd_intel_dspcfg aesni_intel snd_intel_sdw_acpi snd_hda_codec crypto_simd cryptd snd_hda_core i915 snd_hwdep rapl snd_pcm intel_cstate ttm drm_kms_helper serio_raw snd_timer pcspkr snd soundcore ee1004 cec rc_core i2c_algo_bit fb_sys_fops syscopyarea cp210x
Feb 25 11:30:21 server kernel: [51615.748292]  sysfillrect usbserial sysimgblt parport_pc parport mei_me mei intel_pch_thermal mac_hid acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) nfsd spl(O) auth_rpcgss macvlan nfs_acl lockd grace drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear xhci_pci xhci_pci_renesas crc32_pclmul i2c_i801 ahci e1000e i2c_smbus libahci xhci_hcd psmouse wmi video
Feb 25 11:30:21 server kernel: [51615.748335] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O      5.15.85-1-pve #1
Feb 25 11:30:21 server kernel: [51615.748338] Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.14.0 for D3417-B2x                    10/24/2017
Feb 25 11:30:21 server kernel: [51615.748339] RIP: 0010:dev_watchdog+0x277/0x280
Feb 25 11:30:21 server kernel: [51615.748343] Code: eb 97 48 8b 5d d0 c6 05 26 82 4d 01 01 48 89 df e8 5e 53 f9 ff 44 89 e1 48 89 de 48 c7 c7 f0 a9 8a b7 48 89 c2 e8 cd 99 1c 00 <0f> 0b eb 80 e9 45 ca 25 00 0f 1f 44 00 00 55 49 89 ca 48 89 e5 41
Feb 25 11:30:21 server kernel: [51615.748345] RSP: 0018:ffffc28740114e70 EFLAGS: 00010282
Feb 25 11:30:21 server kernel: [51615.748347] RAX: 0000000000000000 RBX: ffffa0f20d4e0000 RCX: ffffa0f746520588
Feb 25 11:30:21 server kernel: [51615.748349] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa0f746520580
Feb 25 11:30:21 server kernel: [51615.748350] RBP: ffffc28740114ea8 R08: 0000000000000003 R09: 0000000000000001
Feb 25 11:30:21 server kernel: [51615.748352] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
Feb 25 11:30:21 server kernel: [51615.748353] R13: ffffa0f20174b680 R14: 0000000000000001 R15: ffffa0f20d4e04c0
Feb 25 11:30:21 server kernel: [51615.748355] FS:  0000000000000000(0000) GS:ffffa0f746500000(0000) knlGS:0000000000000000
Feb 25 11:30:21 server kernel: [51615.748357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 11:30:21 server kernel: [51615.748358] CR2: 00007f575572f024 CR3: 00000004ed410004 CR4: 00000000003726e0
Feb 25 11:30:21 server kernel: [51615.748360] Call Trace:
Feb 25 11:30:21 server kernel: [51615.748361]  <IRQ>
Feb 25 11:30:21 server kernel: [51615.748364]  ? pfifo_fast_enqueue+0x160/0x160
Feb 25 11:30:21 server kernel: [51615.748367]  call_timer_fn+0x29/0x120
Feb 25 11:30:21 server kernel: [51615.748372]  __run_timers.part.0+0x1e1/0x270
Feb 25 11:30:21 server kernel: [51615.748374]  ? ktime_get+0x43/0xc0
Feb 25 11:30:21 server kernel: [51615.748376]  ? lapic_next_deadline+0x2c/0x40
Feb 25 11:30:21 server kernel: [51615.748379]  ? clockevents_program_event+0xa8/0x130
Feb 25 11:30:21 server kernel: [51615.748382]  run_timer_softirq+0x2a/0x60
Feb 25 11:30:21 server kernel: [51615.748384]  __do_softirq+0xd6/0x2ea
Feb 25 11:30:21 server kernel: [51615.748388]  irq_exit_rcu+0x94/0xc0
Feb 25 11:30:21 server kernel: [51615.748390]  sysvec_apic_timer_interrupt+0x80/0x90
Feb 25 11:30:21 server kernel: [51615.748393]  </IRQ>
Feb 25 11:30:21 server kernel: [51615.748394]  <TASK>
Feb 25 11:30:21 server kernel: [51615.748395]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Feb 25 11:30:21 server kernel: [51615.748397] RIP: 0010:cpuidle_enter_state+0xd9/0x620
Feb 25 11:30:21 server kernel: [51615.748401] Code: 3d 44 3f 3f 49 e8 77 ff 6d ff 49 89 c7 0f 1f 44 00 00 31 ff e8 b8 0c 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Feb 25 11:30:21 server kernel: [51615.748403] RSP: 0018:ffffc287400d7e38 EFLAGS: 00000246
Feb 25 11:30:21 server kernel: [51615.748404] RAX: ffffa0f746530bc0 RBX: ffffe2873fd00000 RCX: 0000000000000000
Feb 25 11:30:21 server kernel: [51615.748406] RDX: 0000000000001782 RSI: 000000002c13dcf6 RDI: 0000000000000000
Feb 25 11:30:21 server kernel: [51615.748407] RBP: ffffc287400d7e88 R08: 00002ef1ba561902 R09: 00000000000c3500
Feb 25 11:30:21 server kernel: [51615.748408] R10: 0000000000000007 R11: 071c71c71c71c71c R12: ffffffffb80d4420
Feb 25 11:30:21 server kernel: [51615.748410] R13: 0000000000000006 R14: 0000000000000006 R15: 00002ef1ba561902
Feb 25 11:30:21 server kernel: [51615.748413]  ? cpuidle_enter_state+0xc8/0x620
Feb 25 11:30:21 server kernel: [51615.748417]  cpuidle_enter+0x2e/0x50
Feb 25 11:30:21 server kernel: [51615.748420]  do_idle+0x20d/0x2b0
Feb 25 11:30:21 server kernel: [51615.748423]  cpu_startup_entry+0x20/0x30
Feb 25 11:30:21 server kernel: [51615.748425]  start_secondary+0x12a/0x180
Feb 25 11:30:21 server kernel: [51615.748429]  secondary_startup_64_no_verify+0xc2/0xcb
Feb 25 11:30:21 server kernel: [51615.748433]  </TASK>
Feb 25 11:30:21 server kernel: [51615.748434] ---[ end trace 92789cc296e4fa0f ]---

I have very limited knowledge of system administration, but I understand that the network connection suddenly failed. To find hints about what might have caused this, I looked further up in the logs and found this:

```
Feb 25 11:19:01 server monit[1249]: 'server' mem usage of 94.2% matches resource limit [mem usage > 90.0%]
Feb 25 11:19:01 server monit[1249]: 'server' loadavg (5min) of 3.0 matches resource limit [loadavg (5min) > 2.0]
Feb 25 11:19:03 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1294.
Feb 25 11:19:03 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:03 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:04 server systemd[1]: run-docker-runtime\x2drunc-moby-ce318676bc118a66cebbb83289e5e479b80735aa1db8acf8654e0136c407671c-runc.Mhl26s.mount: Succeeded.
Feb 25 11:19:04 server systemd[800618]: run-docker-runtime\x2drunc-moby-ce318676bc118a66cebbb83289e5e479b80735aa1db8acf8654e0136c407671c-runc.Mhl26s.mount: Succeeded.
Feb 25 11:19:04 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:04 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:04 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
Feb 25 11:19:04 server systemd[1]: run-docker-runtime\x2drunc-moby-2a28cd8fbbd81602f6975500642fde4bbfd64480cfece63ad953fab7a67a78e6-runc.sRMieZ.mount: Succeeded.
Feb 25 11:19:04 server systemd[800618]: run-docker-runtime\x2drunc-moby-2a28cd8fbbd81602f6975500642fde4bbfd64480cfece63ad953fab7a67a78e6-runc.sRMieZ.mount: Succeeded.
Feb 25 11:19:09 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1295.
Feb 25 11:19:09 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:09 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:10 server smbd[7502]: [2023/02/25 11:19:10.026875,  2] ../../source3/smbd/dosmode.c:137(unix_mode)
Feb 25 11:19:10 server smbd[7502]:   unix_mode(.) inherit mode 42770
Feb 25 11:19:10 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:10 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:10 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
Feb 25 11:19:13 server systemd[1]: run-docker-runtime\x2drunc-moby-9a02009fff7f21c23e32120b7c0120c63906005c6c1162a6f90aa5013f41a687-runc.XZVZxK.mount: Succeeded.
Feb 25 11:19:13 server systemd[800618]: run-docker-runtime\x2drunc-moby-9a02009fff7f21c23e32120b7c0120c63906005c6c1162a6f90aa5013f41a687-runc.XZVZxK.mount: Succeeded.
Feb 25 11:19:15 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1296.
Feb 25 11:19:15 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:15 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:16 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:16 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:16 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
Feb 25 11:19:19 server systemd[1]: run-docker-runtime\x2drunc-moby-2a28cd8fbbd81602f6975500642fde4bbfd64480cfece63ad953fab7a67a78e6-runc.m8nvVQ.mount: Succeeded.
Feb 25 11:19:19 server systemd[800618]: run-docker-runtime\x2drunc-moby-2a28cd8fbbd81602f6975500642fde4bbfd64480cfece63ad953fab7a67a78e6-runc.m8nvVQ.mount: Succeeded.
Feb 25 11:19:20 server smbd[7502]: [2023/02/25 11:19:20.019552,  2] ../../source3/smbd/dosmode.c:137(unix_mode)
Feb 25 11:19:20 server smbd[7502]:   unix_mode(.) inherit mode 42770
Feb 25 11:19:21 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1297.
Feb 25 11:19:21 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:21 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:21 server systemd[800618]: run-docker-runtime\x2drunc-moby-cc0a392d21a988a0f489b288b064e09f1c431f1800e47383db6a80b1f1a9ff31-runc.8PDaji.mount: Succeeded.
Feb 25 11:19:21 server systemd[1]: run-docker-runtime\x2drunc-moby-cc0a392d21a988a0f489b288b064e09f1c431f1800e47383db6a80b1f1a9ff31-runc.8PDaji.mount: Succeeded.
Feb 25 11:19:22 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:22 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:22 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
Feb 25 11:19:27 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1298.
Feb 25 11:19:27 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:27 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:28 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:28 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:28 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
Feb 25 11:19:30 server smbd[7502]: [2023/02/25 11:19:30.020170,  2] ../../source3/smbd/dosmode.c:137(unix_mode)
Feb 25 11:19:30 server smbd[7502]:   unix_mode(.) inherit mode 42770
Feb 25 11:19:31 server monit[1249]: 'server' mem usage of 94.5% matches resource limit [mem usage > 90.0%]
Feb 25 11:19:31 server monit[1249]: 'server' loadavg (5min) of 2.9 matches resource limit [loadavg (5min) > 2.0]
Feb 25 11:19:33 server systemd[800618]: rclone@pcloud.service: Scheduled restart job, restart counter is at 1299.
Feb 25 11:19:33 server systemd[800618]: Stopped rclone: make sure pcloud is served via sftp.
Feb 25 11:19:33 server systemd[800618]: Starting rclone: make sure pcloud is served via sftp...
Feb 25 11:19:33 server systemd[800618]: rclone@pcloud.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 11:19:33 server systemd[800618]: rclone@pcloud.service: Failed with result 'exit-code'.
Feb 25 11:19:33 server systemd[800618]: Failed to start rclone: make sure pcloud is served via sftp.
...
```

[similar messages continue until the network fails]

I don't know why rclone keeps restarting in the first place, though.

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1

  • os/version: debian 11.6 (64 bit)
  • os/kernel: 5.15.85-1-pve (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

pCloud

The command you were trying to run (eg rclone copy /tmp remote:tmp)

export GOGC=50 && rclone serve sftp pcloud:Backup/ --addr :2022 --user *********  --pass ********* --log-file=/zfs/NAS/config/rclone/rclone.log --vfs-cache-mode writes -vv &

The rclone config contents with secrets removed.

[pcloud]
type = pcloud
token = {"access_token":"*********","token_type":"bearer","expiry":"0001-01-01T00:00:00Z"}

A log from the command with the -vv flag

Here are the logs from around the same time as the kernel logs

2023/02/25 11:18:53 DEBUG : : Re-reading directory (5m1.407479799s old)
2023/02/25 11:18:53 DEBUG : Duplicacy: Re-reading directory (5m1.40898938s old)
2023/02/25 11:18:54 DEBUG : Duplicacy/chunks: Re-reading directory (5m1.408330709s old)
2023/02/25 11:18:57 INFO  : Duplicacy/chunks/6b/06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97: Moved (server-side) to: Duplicacy/chunks/6b/06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97.fsl
2023/02/25 11:18:57 DEBUG : Duplicacy/chunks/6b/06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97.fsl: Updating file with Duplicacy/chunks/6b/06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97.fsl 0xc056e9efc0
2023/02/25 11:18:57 DEBUG : Duplicacy/chunks/6b: Added virtual directory entry vDel: "06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97"
2023/02/25 11:18:57 DEBUG : Duplicacy/chunks/6b: Added virtual directory entry vAddFile: "06e4244e07d89d7446e098e5d4d2bea3d3a1af7dc48832f31578a455ab8f97.fsl"
2023/02/25 11:19:00 INFO  : Duplicacy/chunks/68/76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77: Moved (server-side) to: Duplicacy/chunks/68/76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77.fsl
2023/02/25 11:19:00 DEBUG : Duplicacy/chunks/68/76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77.fsl: Updating file with Duplicacy/chunks/68/76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77.fsl 0xc00dda3440
2023/02/25 11:19:00 DEBUG : Duplicacy/chunks/68: Added virtual directory entry vDel: "76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77"
2023/02/25 11:19:00 DEBUG : Duplicacy/chunks/68: Added virtual directory entry vAddFile: "76c2964d81e2dc0d3f25bd3653674954360bfaed6e847cf1a67fa362b90b77.fsl"
2023/02/25 11:19:04 INFO  : Duplicacy/chunks/66/64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718: Moved (server-side) to: Duplicacy/chunks/66/64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718.fsl
2023/02/25 11:19:04 DEBUG : Duplicacy/chunks/66/64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718.fsl: Updating file with Duplicacy/chunks/66/64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718.fsl 0xc0470fb440
2023/02/25 11:19:04 DEBUG : Duplicacy/chunks/66: Added virtual directory entry vDel: "64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718"
2023/02/25 11:19:04 DEBUG : Duplicacy/chunks/66: Added virtual directory entry vAddFile: "64568416541c4139aa91b7c7f26caee82804482044ff49f8626327d302a718.fsl"
2023/02/25 11:19:04 DEBUG : Duplicacy/chunks/57: Re-reading directory (42m20.595207857s old)
2023/02/25 11:19:07 DEBUG : Duplicacy/chunks/57: Removed virtual directory entry vAddFile: "fbf03a439c5ab14985860dc8cc81b4ec592ae0b14b56efc94ae4e9e48fd7cc.fsl"
2023/02/25 11:19:07 DEBUG : Duplicacy/chunks/57: Removed virtual directory entry vDel: "fbf03a439c5ab14985860dc8cc81b4ec592ae0b14b56efc94ae4e9e48fd7cc"
2023/02/25 11:19:07 DEBUG : Duplicacy/chunks/81: Re-reading directory (1h52m45.594520323s old)
2023/02/25 11:19:11 DEBUG : Duplicacy/chunks/81: Removed virtual directory entry vDel: "f6b244429e7cd83fd435a893593e9a903835b7b76600a1357a3f64a58a1977"
2023/02/25 11:19:11 DEBUG : Duplicacy/chunks/81: Removed virtual directory entry vAddFile: "f6b244429e7cd83fd435a893593e9a903835b7b76600a1357a3f64a58a1977.fsl"
2023/02/25 11:19:15 INFO  : Duplicacy/chunks/f5/1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650: Moved (server-side) to: Duplicacy/chunks/f5/1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650.fsl
2023/02/25 11:19:15 DEBUG : Duplicacy/chunks/f5/1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650.fsl: Updating file with Duplicacy/chunks/f5/1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650.fsl 0xc0a7f06d80
2023/02/25 11:19:15 DEBUG : Duplicacy/chunks/f5: Added virtual directory entry vDel: "1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650"
2023/02/25 11:19:15 DEBUG : Duplicacy/chunks/f5: Added virtual directory entry vAddFile: "1e75cd5d9ea0c8a761879da84f656190bd6b98811120cced12da88dd6c1650.fsl"
2023/02/25 11:19:19 INFO  : Duplicacy/chunks/d2/bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83: Moved (server-side) to: Duplicacy/chunks/d2/bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83.fsl
2023/02/25 11:19:19 DEBUG : Duplicacy/chunks/d2/bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83.fsl: Updating file with Duplicacy/chunks/d2/bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83.fsl 0xc02cfdf680
2023/02/25 11:19:19 DEBUG : Duplicacy/chunks/d2: Added virtual directory entry vDel: "bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83"
2023/02/25 11:19:19 DEBUG : Duplicacy/chunks/d2: Added virtual directory entry vAddFile: "bfae6b96cb9e9f4aad90bb149ad99b8beb013f9a805f356ac44b95d36d0a83.fsl"
2023/02/25 11:19:22 INFO  : Duplicacy/chunks/81/7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4: Moved (server-side) to: Duplicacy/chunks/81/7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4.fsl
2023/02/25 11:19:22 DEBUG : Duplicacy/chunks/81/7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4.fsl: Updating file with Duplicacy/chunks/81/7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4.fsl 0xc063a68cc0
2023/02/25 11:19:22 DEBUG : Duplicacy/chunks/81: Added virtual directory entry vDel: "7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4"
2023/02/25 11:19:22 DEBUG : Duplicacy/chunks/81: Added virtual directory entry vAddFile: "7919e7df45d071bdc5eb4f9c66571223713b2ae6ae287824b901d36eadc8c4.fsl"
2023/02/25 11:19:22 DEBUG : Duplicacy/chunks/ee: Re-reading directory (17m55.524942486s old)
2023/02/25 11:19:25 DEBUG : Duplicacy/chunks/ee: Removed virtual directory entry vDel: "ed41e65d2e2e46122e360bfa78d69006792ecbcad1472c0cde6c1d32bba0bb.sptfwzju.tmp"
2023/02/25 11:19:25 DEBUG : Duplicacy/chunks/ee: Removed virtual directory entry vAddFile: "ed41e65d2e2e46122e360bfa78d69006792ecbcad1472c0cde6c1d32bba0bb"
2023/02/25 11:19:29 INFO  : Duplicacy/chunks/57/70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f: Moved (server-side) to: Duplicacy/chunks/57/70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f.fsl
2023/02/25 11:19:29 DEBUG : Duplicacy/chunks/57/70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f.fsl: Updating file with Duplicacy/chunks/57/70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f.fsl 0xc01b148300
2023/02/25 11:19:29 DEBUG : Duplicacy/chunks/57: Added virtual directory entry vDel: "70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f"
2023/02/25 11:19:29 DEBUG : Duplicacy/chunks/57: Added virtual directory entry vAddFile: "70b28e59df3ece390abb22c5157b94f00b469e1667ed7ce9c924e009559d7f.fsl"
2023/02/25 11:19:29 DEBUG : Duplicacy/chunks/6c: Re-reading directory (52m11.77067156s old)
2023/02/25 11:19:31 INFO  : 
Transferred:   	    7.413 GiB / 7.413 GiB, 100%, 3.141 KiB/s, ETA 0s
Checks:              2277 / 2277, 100%
Renamed:             2277
Transferred:         9719 / 9719, 100%
Elapsed time:    14h9m6.7s

2023/02/25 11:19:32 DEBUG : Duplicacy/chunks/6c: Removed virtual directory entry vDel: "7e609a03cc78078a706ab19ce588163cae75e339746b0ce90d34fdd1e809f4"
2023/02/25 11:19:32 DEBUG : Duplicacy/chunks/6c: Removed virtual directory entry vAddFile: "7e609a03cc78078a706ab19ce588163cae75e339746b0ce90d34fdd1e809f4.fsl"
2023/02/25 11:19:32 DEBUG : Duplicacy/chunks/6c: Removed virtual directory entry vDel: "bca22ff5a308084a8860f2d1ba8ec8fbc8aed0e1ff6afbdf833930dc3fcc8d"
2023/02/25 11:19:32 DEBUG : Duplicacy/chunks/6c: Removed virtual directory entry vAddFile: "bca22ff5a308084a8860f2d1ba8ec8fbc8aed0e1ff6afbdf833930dc3fcc8d.fsl"
2023/02/25 11:19:32 DEBUG : Duplicacy/chunks/31: Re-reading directory (21m11.973404829s old)
2023/02/25 11:19:36 DEBUG : Duplicacy/chunks/31: Removed virtual directory entry vAddFile: "b55065e58631fc86b3164cc8b934afd18820dd926a8f17e9efc09221f34164.fsl"
2023/02/25 11:19:36 DEBUG : Duplicacy/chunks/31: Removed virtual directory entry vDel: "4aef0fcdb3bcbebba696f0b5bbb7fe2510fcab8ef396b0426cbb0ea832157b"
2023/02/25 11:19:36 DEBUG : Duplicacy/chunks/31: Removed virtual directory entry vAddFile: "4aef0fcdb3bcbebba696f0b5bbb7fe2510fcab8ef396b0426cbb0ea832157b.fsl"
2023/02/25 11:19:36 DEBUG : Duplicacy/chunks/31: Removed virtual directory entry vDel: "b55065e58631fc86b3164cc8b934afd18820dd926a8f17e9efc09221f34164"
2023/02/25 11:19:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/1a/638aabc495aff8116a9ae7647b6ba1c13a69328d6319de724b427e3dc6218c not removed, freed 0 bytes
2023/02/25 11:19:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/e7/6912ec6b4abd49e9997f093ed750d9807180546e1a6ba19798655e9a898b19 not removed, freed 0 bytes
2023/02/25 11:19:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/a2/c5068e9f8b4be4da485783480778e479a5221611a03a5efb38645f1fb1f34b not removed, freed 0 bytes
2023/02/25 11:19:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/f2/48590260dd1882cf2f6ecb295c483ead46a6cebf58f2ed5ec68d3b5edeea14 not removed, freed 0 bytes
2023/02/25 11:19:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/9f/29eb0a6d965699cdcb9bffe4c683dfb1811e17f4441664c65a34a6cd07cd76 not removed, freed 0 bytes

And here is the other excerpt, from around the time of the kernel trace:

2023/02/25 11:30:06 DEBUG : Duplicacy/chunks/a2/a322299b702037b1e8409bc156d482a73634c3ff6cab2e723741ab1561dc0e.fsl: Updating file with Duplicacy/chunks/a2/a322299b702037b1e8409bc156d482a73634c3ff6cab2e723741ab1561dc0e.fsl 0xc048d5d800
2023/02/25 11:30:06 DEBUG : Duplicacy/chunks/a2: Added virtual directory entry vDel: "a322299b702037b1e8409bc156d482a73634c3ff6cab2e723741ab1561dc0e"
2023/02/25 11:30:06 DEBUG : Duplicacy/chunks/a2: Added virtual directory entry vAddFile: "a322299b702037b1e8409bc156d482a73634c3ff6cab2e723741ab1561dc0e.fsl"
2023/02/25 11:30:10 INFO  : Duplicacy/chunks/6f/076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6: Moved (server-side) to: Duplicacy/chunks/6f/076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6.fsl
2023/02/25 11:30:10 DEBUG : Duplicacy/chunks/6f/076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6.fsl: Updating file with Duplicacy/chunks/6f/076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6.fsl 0xc0048209c0
2023/02/25 11:30:10 DEBUG : Duplicacy/chunks/6f: Added virtual directory entry vDel: "076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6"
2023/02/25 11:30:10 DEBUG : Duplicacy/chunks/6f: Added virtual directory entry vAddFile: "076f521c5438716964546bcb4f8368e0bf493f3a1d053c41f22285feacd1a6.fsl"
2023/02/25 11:30:10 DEBUG : Duplicacy/chunks/86: Re-reading directory (25m0.503117966s old)
2023/02/25 11:30:13 DEBUG : Duplicacy/chunks/86: Removed virtual directory entry vDel: "9a4d259ce76ee8ba01ba400b8c967e1ec9606bc333fe50aa881c6d11fffd14"
2023/02/25 11:30:13 DEBUG : Duplicacy/chunks/86: Removed virtual directory entry vAddFile: "9a4d259ce76ee8ba01ba400b8c967e1ec9606bc333fe50aa881c6d11fffd14.fsl"
2023/02/25 11:30:31 INFO  : 
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/51/bba729d0360f0f97c8151010bab59def4ae375eff85c012aef6bc1ef46c2c3 not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/28/0a4688b7d0dddfc9a2f5fcd6d3ef3cb94808e1e7c49de506244e4c21c645c1 not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/e7/e3c4b2b4d25c7b0570d5da9b25b874effef285f3285432238fcabe9f7e1f3c not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/68/532989d7cd511f9b16ceeb3605e0deb0dcab64038f6e8d05840e1c07f97b3a not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/e5/7d888b9996fe85dedd69d4fb34baa92e8ca25ffcc7eae131b7383832754f8b not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/a4/4f13bf40f6e47e543031b23ac1b8daec008d8e1e021c906fbd0dd4f7fdc3ca not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/40/ccfbb7dbbde2fc766760a6430837516733b8d4403b9d4ab11d5dab8fd6e5d9 not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG : vfs cache RemoveNotInUse (maxAge=3600000000000, emptyOnly=false): item Duplicacy/chunks/22/cd4e6d893a6ea16d074de84fd4c42570751cd7a31863a2d5393b4ec6412d7e not removed, freed 0 bytes
2023/02/25 11:30:37 DEBUG 

I can't really see any connection between the logs, so I'm rather puzzled as to what is going on...

This looks like monit detected rclone was using too much memory and systemd restarted it.

This is likely a consequence of your server getting low on memory. If your server gets very low on memory then networking can stop working (I've seen this in the past). This is really a kernel bug, but low-memory corner cases in the kernel are common.

So I think the root cause is rclone using too much memory.

I don't think we got to the bottom of that in the other thread.

It would be useful if you could make a memory debug trace when rclone is using lots of memory, as outlined in the rc docs - you would need to run rclone with --rc, install Go (latest version recommended) and run the commands linked there.
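For reference, the key commands look something like this (a sketch, assuming the default rc address of localhost:5572):

```
# grab a text heap profile from the running rclone
go tool pprof -text http://localhost:5572/debug/pprof/heap

# or an interactive call graph in the browser (this one needs graphviz installed)
go tool pprof -web http://localhost:5572/debug/pprof/heap
```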

Then we can know for certain what is causing the excessive memory usage.

So the first step is just to add --rc to my command (it would then be export GOGC=50 && rclone serve sftp pcloud:Backup/ --addr :2022 --user ******* --pass ******* --log-file=/zfs/NAS/config/rclone/rclone.log --vfs-cache-mode writes -vv --rc &), then wait until stuff happens - and only then do I need to install Go, or does Go need to be installed from the start?

Can I turn off -vv, though? The log files are becoming huge (>2GB)...

Yes, that's right

You can install Go later, that is fine!

Yes - good idea.

Okay, I figured out why the rclone pcloud service keeps restarting: I was running it as a user service :laughing:

I had checked many times that it was not running as a service, but I had forgotten about user services, and apparently those don't show up when doing the normal checks for services - I guess because those checks only cover system services. Duh.
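For anyone else tripping over this, the difference is just the --user flag (a sketch, using the rclone@pcloud.service unit name from the logs above):

```
# system-wide units only - a user service will not show up here
systemctl list-units --type=service | grep rclone

# units belonging to the current user
systemctl --user list-units --type=service
systemctl --user status rclone@pcloud.service
```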

So below is my rclone@pcloud.service file. I wonder whether there is anything wrong with it that might explain why pcloud failed (as described in the other topic). That is: we have an explanation for the restarting, but I guess the question of what causes the memory problem remains, right?

# User service for Rclone mounting
 #
 # Place in ~/.config/systemd/user/
 # File must include the '@' (ex rclone@.service)
 # As your normal user, run
 #   systemctl --user daemon-reload
 # You can now start/enable each remote by using rclone@<remote>
 #   systemctl --user enable rclone@dropbox
 #   systemctl --user start rclone@dropbox

 [Unit]
 Description=rclone: make sure pcloud is served via sftp
 Documentation=man:rclone(1)
 After=network-online.target
 Wants=network-online.target
# AssertPathIsDirectory=%h/mnt/%i

 [Service]
 Restart=on-failure
 RestartSec=5s
 Type=notify
 ExecStart= \
   /usr/bin/rclone serve sftp pcloud:Backup/ \
     --config=%h/.config/rclone/rclone.conf \
     --addr :2022 \
     --vfs-cache-mode minimal \
  #   --vfs-cache-max-size 100M \
     --log-level INFO \
     --log-file /zfs/NAS/config/rclone/rclone-%i.log \
     --user christoph
     --pass Font6-Antibody-Widget
  #   --umask 022 \
  #   --allow-other \
     %i: %h/mnt/%i
 #ExecStop=/bin/fusermount -u %h/mnt/%i

[Install]
WantedBy=default.target

How many files get accessed in the mount? Rclone will use memory proportional to the number of files, maybe 1k per file. So if you have 1 Million files, that is 1 GB of ram. If the files never get accessed (by that I mean listed rather than opened) then it doesn't use RAM.

I have now installed Go (and graphviz, for some reason), but I misunderstood what I'm supposed to do. I thought I would run those commands after rclone has failed; I now realize that rclone needs to be running to do that. So I just restarted rclone (with the --rc flag), but I'm not sure when the right time is to create the memory debug trace. If I understand things correctly, we ideally want it just before rclone fails/gets killed by systemd, but I have no idea how to catch that moment.

What I can say, though, is that now that I have disabled the rclone service, it takes just around a day (instead of previously two weeks) for rclone to fail. (It probably failed at a similar rate previously, but I didn't notice because the service immediately restarted it - I have no idea why even the service eventually failed to restart it.)

In any case, I can probably create the desired memory trace later today, but how do I determine the right moment?

And is it just this command I should execute, or also other ones: go tool pprof -web http://localhost:5572/debug/pprof/heap?

For what it's worth, here is the current state of things about 10 minutes after starting rclone:

go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
Saved profile in /zfs/NAS/config/homedirs/christoph/pprof/pprof.rclone.alloc_objects.alloc_space.inuse_objects.inuse_space.005.pb.gz
File: rclone
Type: inuse_space
Time: Mar 4, 2023 at 11:29am (CET)
Showing nodes accounting for 907.63MB, 98.78% of 918.85MB total
Dropped 68 nodes (cum <= 4.59MB)
      flat  flat%   sum%        cum   cum%
  250.55MB 27.27% 27.27%   250.55MB 27.27%  github.com/rclone/rclone/vfs.newFile
  232.52MB 25.31% 52.57%   232.52MB 25.31%  time.FixedZone (inline)
  191.03MB 20.79% 73.36%   191.03MB 20.79%  github.com/rclone/rclone/backend/pcloud.(*Fs).newObjectWithInfo
  130.01MB 14.15% 87.51%   130.01MB 14.15%  path.Join
   68.87MB  7.50% 95.01%   319.41MB 34.76%  github.com/rclone/rclone/vfs.(*Dir)._readDirFromEntries
   16.50MB  1.80% 96.80%   250.02MB 27.21%  encoding/json.(*decodeState).literalStore
    9.15MB     1% 97.80%     9.15MB     1%  reflect.unsafe_NewArray
       8MB  0.87% 98.67%        8MB  0.87%  encoding/json.(*Decoder).refill
       1MB  0.11% 98.78%   233.52MB 25.41%  github.com/rclone/rclone/backend/pcloud/api.(*Time).UnmarshalJSON
         0     0% 98.78%   267.17MB 29.08%  encoding/json.(*Decoder).Decode
         0     0% 98.78%        8MB  0.87%  encoding/json.(*Decoder).readValue
         0     0% 98.78%   259.17MB 28.21%  encoding/json.(*decodeState).array
         0     0% 98.78%   259.17MB 28.21%  encoding/json.(*decodeState).object
         0     0% 98.78%   259.17MB 28.21%  encoding/json.(*decodeState).unmarshal
         0     0% 98.78%   259.17MB 28.21%  encoding/json.(*decodeState).value
         0     0% 98.78%   673.61MB 73.31%  github.com/pkg/sftp.(*Request).call
         0     0% 98.78%   674.61MB 73.42%  github.com/pkg/sftp.(*RequestServer).Serve.func2.1
         0     0% 98.78%   675.61MB 73.53%  github.com/pkg/sftp.(*RequestServer).packetWorker
         0     0% 98.78%   673.61MB 73.31%  github.com/pkg/sftp.filestat
         0     0% 98.78%   588.71MB 64.07%  github.com/rclone/rclone/backend/pcloud.(*Fs).List
         0     0% 98.78%   588.71MB 64.07%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll.func1
         0     0% 98.78%   321.54MB 34.99%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll.func2
         0     0% 98.78%   588.71MB 64.07%  github.com/rclone/rclone/backend/pcloud.(*Fs).listHelper
         0     0% 98.78%   321.04MB 34.94%  github.com/rclone/rclone/backend/pcloud.(*Fs).listHelper.func1
         0     0% 98.78%   908.13MB 98.83%  github.com/rclone/rclone/cmd/serve/sftp.vfsHandler.Filelist
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/fs.pacerInvoker
         0     0% 98.78%   588.71MB 64.07%  github.com/rclone/rclone/fs/list.DirSorted
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/lib/pacer.(*Pacer).Call
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/lib/pacer.(*Pacer).call
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/lib/rest.(*Client).CallJSON (inline)
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/lib/rest.(*Client).callCodec
         0     0% 98.78%   267.17MB 29.08%  github.com/rclone/rclone/lib/rest.DecodeJSON
         0     0% 98.78%   906.13MB 98.62%  github.com/rclone/rclone/vfs.(*Dir).Stat
         0     0% 98.78%   908.13MB 98.83%  github.com/rclone/rclone/vfs.(*Dir)._readDir
         0     0% 98.78%   906.13MB 98.62%  github.com/rclone/rclone/vfs.(*Dir).stat
         0     0% 98.78%   906.13MB 98.62%  github.com/rclone/rclone/vfs.(*VFS).Stat
         0     0% 98.78%     9.15MB     1%  reflect.MakeSlice
         0     0% 98.78%     8.71MB  0.95%  runtime.doInit
         0     0% 98.78%     9.72MB  1.06%  runtime.main
         0     0% 98.78%   232.52MB 25.31%  time.Parse (inline)
         0     0% 98.78%   232.52MB 25.31%  time.parse

Edit2: This time it went very fast. Less than two hours later rclone was already killed. The latest trace I caught is below, but it was still rather early in the process (the failure occurred around 12:20):

go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
Saved profile in /zfs/NAS/config/homedirs/christoph/pprof/pprof.rclone.alloc_objects.alloc_space.inuse_objects.inuse_space.007.pb.gz
File: rclone
Type: inuse_space
Time: Mar 4, 2023 at 11:44am (CET)
Showing nodes accounting for 2127.47MB, 98.11% of 2168.50MB total
Dropped 89 nodes (cum <= 10.84MB)
      flat  flat%   sum%        cum   cum%
  558.05MB 25.73% 25.73%   558.05MB 25.73%  path.Join
  523.60MB 24.15% 49.88%   523.60MB 24.15%  github.com/rclone/rclone/vfs.newFile
  447.54MB 20.64% 70.52%   447.54MB 20.64%  time.FixedZone (inline)
  388.55MB 17.92% 88.44%   388.55MB 17.92%  github.com/rclone/rclone/backend/pcloud.(*Fs).newObjectWithInfo
  135.40MB  6.24% 94.68%   659.58MB 30.42%  github.com/rclone/rclone/vfs.(*Dir)._readDirFromEntries
      54MB  2.49% 97.17%   502.04MB 23.15%  encoding/json.(*decodeState).literalStore
   11.16MB  0.51% 97.69%    23.67MB  1.09%  github.com/pkg/sftp.(*sshFxpNamePacket).marshalPacket
    4.59MB  0.21% 97.90%    28.26MB  1.30%  github.com/pkg/sftp.(*sshFxpNamePacket).MarshalBinary
    2.89MB  0.13% 98.03%   401.66MB 18.52%  github.com/rclone/rclone/vfs.(*DirHandle).Readdir
    1.68MB 0.077% 98.11%   398.76MB 18.39%  github.com/rclone/rclone/vfs.(*Dir).ReadDirAll
         0     0% 98.11%   519.19MB 23.94%  encoding/json.(*Decoder).Decode
         0     0% 98.11%   511.19MB 23.57%  encoding/json.(*decodeState).array
         0     0% 98.11%   511.19MB 23.57%  encoding/json.(*decodeState).object
         0     0% 98.11%   511.19MB 23.57%  encoding/json.(*decodeState).unmarshal
         0     0% 98.11%   511.19MB 23.57%  encoding/json.(*decodeState).value
         0     0% 98.11%  1358.83MB 62.66%  github.com/pkg/sftp.(*Request).call
         0     0% 98.11%   401.66MB 18.52%  github.com/pkg/sftp.(*Request).opendir
         0     0% 98.11%  1681.48MB 77.54%  github.com/pkg/sftp.(*RequestServer).Serve.func2.1
         0     0% 98.11%  1681.98MB 77.56%  github.com/pkg/sftp.(*RequestServer).packetWorker
         0     0% 98.11%    28.26MB  1.30%  github.com/pkg/sftp.(*conn).sendPacket
         0     0% 98.11%    28.26MB  1.30%  github.com/pkg/sftp.(*packetManager).controller
         0     0% 98.11%    28.26MB  1.30%  github.com/pkg/sftp.(*packetManager).maybeSendPackets
         0     0% 98.11%    12.50MB  0.58%  github.com/pkg/sftp.(*sshFxpNameAttr).MarshalBinary
         0     0% 98.11%  1352.83MB 62.39%  github.com/pkg/sftp.filestat
         0     0% 98.11%    28.26MB  1.30%  github.com/pkg/sftp.marshalPacket
         0     0% 98.11%    28.26MB  1.30%  github.com/pkg/sftp.sendPacket
         0     0% 98.11%  1306.35MB 60.24%  github.com/rclone/rclone/backend/pcloud.(*Fs).List
         0     0% 98.11%  1306.35MB 60.24%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll.func1
         0     0% 98.11%   787.16MB 36.30%  github.com/rclone/rclone/backend/pcloud.(*Fs).listAll.func2
         0     0% 98.11%  1306.35MB 60.24%  github.com/rclone/rclone/backend/pcloud.(*Fs).listHelper
         0     0% 98.11%   787.16MB 36.30%  github.com/rclone/rclone/backend/pcloud.(*Fs).listHelper.func1
         0     0% 98.11%   447.54MB 20.64%  github.com/rclone/rclone/backend/pcloud/api.(*Time).UnmarshalJSON
         0     0% 98.11%  2123.51MB 97.93%  github.com/rclone/rclone/cmd/serve/sftp.vfsHandler.Filelist
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/fs.pacerInvoker
         0     0% 98.11%  1306.35MB 60.24%  github.com/rclone/rclone/fs/list.DirSorted
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/lib/pacer.(*Pacer).Call
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/lib/pacer.(*Pacer).call
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/lib/rest.(*Client).CallJSON (inline)
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/lib/rest.(*Client).callCodec
         0     0% 98.11%   519.19MB 23.94%  github.com/rclone/rclone/lib/rest.DecodeJSON
         0     0% 98.11%  1721.86MB 79.40%  github.com/rclone/rclone/vfs.(*Dir).Stat
         0     0% 98.11%  1965.93MB 90.66%  github.com/rclone/rclone/vfs.(*Dir)._readDir
         0     0% 98.11%  1721.86MB 79.40%  github.com/rclone/rclone/vfs.(*Dir).stat
         0     0% 98.11%   153.01MB  7.06%  github.com/rclone/rclone/vfs.(*File).Path
         0     0% 98.11%  1721.86MB 79.40%  github.com/rclone/rclone/vfs.(*VFS).Stat
         0     0% 98.11%   153.01MB  7.06%  github.com/rclone/rclone/vfs.Nodes.Less
         0     0% 98.11%   153.01MB  7.06%  sort.Sort
         0     0% 98.11%       26MB  1.20%  sort.insertionSort
         0     0% 98.11%   121.51MB  5.60%  sort.partition
         0     0% 98.11%   153.01MB  7.06%  sort.pdqsort
         0     0% 98.11%   447.54MB 20.64%  time.Parse (inline)
         0     0% 98.11%   447.54MB 20.64%  time.parse

Edit3: I asked ChatGPT for a script to run the command every 5 minutes as long as rclone is running, which is what I'm using now:

#!/bin/bash

while true
do
  # Check if the rclone process is running
  if ! pgrep -x "rclone" > /dev/null; then
    echo "rclone process has been killed. Exiting script."
    break
  fi
  
  # Execute the go tool pprof command
  go tool pprof -text http://localhost:5572/debug/pprof/heap
  
  # Sleep for five minutes
  sleep 300
done

It's not that important. Wait until rclone has doubled its initial size, or wait a few hours.

If you could make available the .gz file the pprof makes, eg /zfs/NAS/config/homedirs/christoph/pprof/pprof.rclone.alloc_objects.alloc_space.inuse_objects.inuse_space.005.pb.gz that has all the info in and I can run go tool pprof on that myself.
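(The same data can also be grabbed straight from the rc endpoint if that is easier - a sketch, again assuming the default port:)

```
# the heap endpoint returns the gzipped profile directly
curl -s http://localhost:5572/debug/pprof/heap -o heap-$(date +%Y%m%d-%H%M%S).pb.gz
```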

This looks very much like the memory use is associated with file objects. How many file objects are in use in the mount (roughly)?

Can you run

rclone test memory pcloud:Backup/

This will show how much memory loading all the pcloud objects takes.

This is looking like memory use caused by lots of objects rather than a leak at the moment, but we'll see!

Did you see my PM?

Probably around 3.2 million (see here)

Will do asap.

Got the trace, thanks.

Looking at it I can see there are exactly 3.2M VFS objects in use as you said.

I usually tell people that they need 1GB of RAM per 1,000,000 objects. Your pcloud objects are taking a bit less space than this so that looks fine.
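(As a back-of-the-envelope check: 3.2 million objects at ~1 kB each would be roughly 3.2 GB, while the heap profile above shows about 2.1 GB in use, so the pcloud objects do indeed come in a bit under that rule of thumb.)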

So I think the memory use here is totally normal - if you want 3.2M objects in memory with rclone v1.61 then you will need more RAM on the server.

How much memory does the server have?

The problem is the objects stored in the directory cache, which is something that is on my radar to reduce. I have a half-done fix which stores the directory cache on disk instead; that will improve things enormously.

The server has 24 GB of memory.

I tried this twice now, but every time I get Failed to memory: couldn't list files: pcloud error: Internal error. Try again later. (5000) after some time...

Trying again right now, but I'm expecting it to fail again...

According to the profile, Go was using 2.3GB of RAM. This can often translate into more actual RAM use through memory fragmentation. How much memory do you see rclone using in ps/task manager? Note that GOGC will help with this.
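(For example - you are already running with GOGC=50; lower values make the Go garbage collector run more often and keep the heap smaller, at the cost of some CPU:)

```
# more aggressive garbage collection than the default GOGC=100
export GOGC=20
rclone serve sftp pcloud:Backup/ --addr :2022 --user ***** --pass ***** --vfs-cache-mode writes --rc
```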

Using top | grep rclone I can see that it (RES) goes up to 2.9G. So I guess that's not a problem. Then again, it hasn't been killed yet, so I'll let my top command run for a while to see how much memory it consumes just before it gets killed.

Whoops: now it just went up to 3.5G...

I use this little script

#!/bin/sh
# Monitor memory usage of the given command

COMMAND="$1"

if [ "$COMMAND" = "" ]; then
   echo "Syntax: $0 command"
   exit 1
fi

headers=""
while [ 1 ]; do
    ps $headers -C "${COMMAND}" -o pid,rss,vsz,cmd
    sleep 1
    headers="--no-headers"
done

You call it as "./memusage rclone" and it will print a line every second with the memory usage of the command you passed. It's good for tracking changes.

How much memory is free on the server? Would it be a problem if rclone used 4G? 8G?

That sounds pretty much like what my layman's one-liner did. In any case, rclone has meanwhile been killed, and here are the last couple of lines from top | grep rclone:

2796391 christo+  20   0 4642648   3.7g   3088 S  12.0  15.7   3:04.11 rclone
2796391 christo+  20   0 4642648   3.7g   3172 S  12.3  15.7   3:04.48 rclone
2796391 christo+  20   0 4642648   3.6g   2816 S   3.0  15.7   3:04.57 rclone
2796391 christo+  20   0 4642648   3.6g   2816 S  11.3  15.7   3:04.91 rclone
2796391 christo+  20   0 4642648   3.6g   3044 S  12.6  15.7   3:05.29 rclone
2796391 christo+  20   0 4642648   3.6g   3044 S   2.3  15.7   3:05.36 rclone
2796391 christo+  20   0 4642648   3.7g   3768 S  11.6  15.7   3:05.71 rclone
2796391 christo+  20   0 4642648   3.7g   3500 S   2.6  15.8   3:05.79 rclone
2796391 christo+  20   0 4711580   3.8g   2660 S  15.3  16.3   3:06.25 rclone
2796391 christo+  20   0 4711580   3.8g      0 S   4.1  16.3   3:06.38 rclone
2796391 christo+  20   0 4848420   3.9g      0 S  14.1  16.8   3:06.87 rclone
2796391 christo+  20   0 4848420   3.9g      0 S   2.7  16.8   3:07.00 rclone
2796391 christo+  20   0 4848420   3.9g      0 S   5.6  16.8   3:07.39 rclone

So, it went up to 3.9GB and perhaps it tried to go to 4GB which may have been a red line to cross...

Not sure what to make of the sudden drop of shared memory to 0...

I don't think so. I have no knowledge of these things, but there is nothing on that server that needs so much RAM. The only reason those GBs are in there is because I had them lying around. After a restart of the server, it takes quite a while (half an hour?) for the memory usage to reach the levels shown in the graph above. My interpretation is that the kernel gradually starts handing out that free memory to processes that don't really need it, but that might as well use it when no one is asking for it.

Your server has loads of free RAM, and page cache.
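(free -h is a quick way to see that split - buff/cache is memory the kernel will give back on demand, and the "available" column is the realistic headroom for applications:)

```
# "available" is what applications can still get; buff/cache is reclaimed on demand
free -h
```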

I think something must be killing rclone when it gets to 4GB

I suspect monit might be killing rclone.

Can you post your monit config file - it should be at /etc/monit.conf or /etc/monit/monitrc I think.

$ sudo cat /etc/monit/monitrc
# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.

set daemon 30 with start delay 30
set logfile syslog facility log_daemon
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state

set httpd unixsocket /run/monit.sock
    allow localhost

set eventqueue
    basedir /var/lib/monit/events
    slots 100

include /etc/monit/conf.d/*

$ sudo cat /etc/monit/conf.d/*
check process collectd with matching collectd
    start program = "/bin/systemctl start collectd"
    stop program = "/bin/systemctl stop collectd"
    mode active
check process omv-engined with pidfile /run/omv-engined.pid
    start program = "/bin/systemctl start openmediavault-engined"
    stop program = "/bin/systemctl stop openmediavault-engined"
    mode active
# Alert if disk space of root filesystem gets low
check filesystem rootfs with path /
    if space usage > 85% for 5 times within 15 cycles
        then alert else if succeeded for 10 cycles then alert


check process nginx with pidfile /run/nginx.pid
    start program = "/bin/systemctl start nginx"
    stop program = "/bin/systemctl stop nginx"
    mode active
    if cpu is greater than 40% for 2 cycles then alert
    if cpu is greater than 80% for 5 cycles then restart
    # https://mmonit.com/monit/documentation/monit.html#CONNECTION-TESTS
    # https://mmonit.com/monit/documentation/monit.html#FAULT-TOLERANCE
    if failed host 127.0.0.1 port 88 protocol http timeout 15 seconds for 2 times within 3 cycles then restart

check process php-fpm with pidfile /run/php/php7.4-fpm.pid
    start program = "/bin/systemctl start php7.4-fpm"
    stop program = "/bin/systemctl stop php7.4-fpm"
    mode active
check process rrdcached with pidfile /run/rrdcached.pid
    start program = "/bin/systemctl start rrdcached"
    stop program = "/bin/systemctl stop rrdcached"
    mode active
check system $HOST
    if loadavg (1min) > 4.0 for 3 cycles then alert
    if loadavg (5min) > 2.0 for 3 cycles then alert
    if memory usage > 90% then alert
    if cpu usage (user) > 95% for 2 cycles then alert
    if cpu usage (system) > 95% for 2 cycles then alert
    if cpu usage (wait) > 95% for 2 cycles then alert

I don't see anything obviously wrong with that.

Might there be a ulimit set on rclone?

Hm, we might be getting somewhere:

$ ps aux | grep rclone
christo+  139918  2.6  0.2 761996 68960 pts/3    Sl   18:14   0:00 rclone serve sftp pcloud:Backup/ --addr :2022 --user ********* --pass ********** --log-file=/zfs/NAS/config/rclone/rclone.log --vfs-cache-mode writes --rc
christo+  140271  0.0  0.0   6216   636 pts/3    S+   18:14   0:00 grep rclone

$ cat /proc/139918/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             95063                95063                processes
Max open files            1048576              1048576              files
Max locked memory         3122961920           3122961920           bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       95063                95063                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

3122961920 bytes is actually just 2.9085 GiB...

But where does that limit come from? Could it be some system default?

Type

ulimit -a

as the user running rclone.