Rclone and CentOS 8

I have noticed a weird issue with rclone and CentOS 8.

I have a server with rclone mounted to a B2 remote. Occasionally throughout the day, we copy files into that directory.

On my CentOS 6/7 boxes, it works rock solid. On the CentOS 8 box, it will randomly freeze. My log file, set to INFO, did not show anything; I am waiting for it to happen again with DEBUG.

When I do an ls -l /directory, it freezes; when I do a df -h, it freezes. I have a third directory with a symlink, and that also freezes. I cannot unmount the drive and can only reboot.

Clearly I need to provide a log, but here is my mount command:

rclone mount remote:connection /directory --allow-other --max-read-ahead 200M --dir-cache-time 5m --acd-templink-threshold 0 --bwlimit 0 --checkers 32 --low-level-retries 1 --stats 0 --timeout 30s --vfs-cache-mode writes --log-file=/var/log/rclone.txt --log-level DEBUG --b2-disable-checksum

Any suggestions on where to look?

hello and welcome to the forum,

yes, you need to provide the log.

what version of rclone are you using?
make sure you are using the latest version, v1.51.0

rclone v1.51.0

  • os/arch: linux/amd64
  • go version: go1.13.7

This is the end of the log file. I am waiting for it to happen again with DEBUG enabled.

2020/02/21 17:11:28 INFO : 202002/01_S_00019249_01_T_0001.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019249_01_T_0002.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019252_01_G_0001.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019249_01_G_0001.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019252_01_G_0003.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019252_01_G_0002.jpg: Removed from cache
2020/02/21 17:11:28 INFO : 202002/01_S_00019249_01_T_0003.jpg: Removed from cache
2020/02/21 17:11:28 INFO : Cleaned the cache: objects 8 (was 15), total size 6.532M (was 9.500M)
2020/02/21 17:12:28 INFO : Cleaned the cache: objects 8 (was 8), total size 2.901M (was 6.532M)
2020/02/21 17:13:28 INFO : Cleaned the cache: objects 8 (was 8), total size 2.901M (was 2.901M)
2020/02/21 17:14:28 INFO : Cleaned the cache: objects 8 (was 8), total size 2.901M (was 2.901M)
2020/02/21 17:15:28 INFO : Cleaned the cache: objects 8 (was 8), total size 2.901M (was 2.901M)

good, now that we have the basics out of the way.

let's wait and see what happens...

So it happened again this AM

Debug log just before the reboot:
2020/02/22 09:30:24 INFO : 202002/01_S_00019261_05_T_0001.jpg: Removed from cache
2020/02/22 09:30:24 INFO : 202002/01_S_00019264_01_G_0003.jpg: Removed from cache
2020/02/22 09:30:24 INFO : 202002/01_S_00019261_05_T_0002.jpg: Removed from cache
2020/02/22 09:30:24 INFO : 202002/01_S_00019264_01_G_0002.jpg: Removed from cache
2020/02/22 09:30:24 INFO : 202002/01_S_00019263_01_G_0001.jpg: Removed from cache
2020/02/22 09:30:24 INFO : Cleaned the cache: objects 33 (was 39), total size 19.622M (was 21.304M)
2020/02/22 09:31:01 DEBUG : /: Attr:
2020/02/22 09:31:01 DEBUG : /: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=
2020/02/22 09:31:01 DEBUG : : Statfs:
2020/02/22 09:31:01 DEBUG : : >Statfs: stat={Blocks:274877906944 Bfree:274877906944 Bavail:274877906944 Files:1000000000 Ffree:1000000000 Bsize:4096 Namelen:255 Frsize:4096}, err=
2020/02/22 09:31:24 INFO : Cleaned the cache: objects 33 (was 33), total size 16.284M (was 19.622M)
2020/02/22 09:31:43 DEBUG : /: Attr:
2020/02/22 09:31:43 DEBUG : /: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=
2020/02/22 09:31:43 DEBUG : : Re-reading directory (9m7.2917874s old)
2020/02/22 09:32:24 INFO : 202002/01_S_00019261_01_G_0001.jpg: Removed from cache
2020/02/22 09:32:24 INFO : Cleaned the cache: objects 32 (was 33), total size 16.284M (was 16.284M)
2020/02/22 09:33:24 INFO : 202002/01_S_00019263_01_T_0003.jpg: Removed from cache
2020/02/22 09:33:24 INFO : 202002/01_S_00019263_01_T_0001.jpg: Removed from cache
2020/02/22 09:33:24 INFO : 202002/01_S_00019263_01_T_0002.jpg: Removed from cache
2020/02/22 09:33:24 INFO : Cleaned the cache: objects 29 (was 32), total size 16.021M (was 16.284M)
2020/02/22 09:34:24 INFO : Cleaned the cache: objects 29 (was 29), total size 14.306M (was 16.021M)
2020/02/22 09:35:24 INFO : Cleaned the cache: objects 29 (was 29), total size 14.306M (was 14.306M)
2020/02/22 09:36:24 INFO : Cleaned the cache: objects 29 (was 29), total size 14.306M (was 14.306M)
2020/02/22 09:37:24 INFO : Cleaned the cache: objects 29 (was 29), total size 14.306M (was 14.306M)
2020/02/22 09:38:24 INFO : Cleaned the cache: objects 29 (was 29), total size 14.306M (was 14.306M)

Had to reboot the server to recover.

how much RAM is in the CentOS 6/7 boxes?
how much RAM is in the CentOS 8 box?

why not just kill the rclone process?
why do you have to reboot the server?

i would remove as many flags as possible and test again.

why are you using these?
--allow-other
and
--checkers 32
and
--acd-templink-threshold - i think that is for amazon drive

these are the valid flags for b2
https://rclone.org/b2/

All the 6/7/8 boxes have 2GB of RAM.

Honestly I don't have a good answer on that. Customer was down and it was the quickest thing I could do to get them back up again.

so you have only 2GB.
perhaps --max-read-ahead 200M is not a good idea?

this flag seems to do nothing, as the default value is already 0:
--bwlimit 0

i would remove as many flags as possible and test again.
make sure you need each flag

I will try that and report back on Monday.

It never seems to crash when it is idle, only when it is active.

I appreciate your help and quick response

There isn't anything suspicious I can see in that log :frowning:

If you can catch it when it is locked up, do kill -QUIT pid-of-rclone and rclone will create a backtrace showing exactly what it is up to.

It would be worth trying fusermount -z -u /path/to/mountpoint and restarting rclone next time it happens.
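For what it's worth, those recovery steps could be scripted roughly like this (a sketch; the mountpoint /directory and the process name rclone are taken from this thread, so adjust them to your setup). A Go binary such as rclone prints a full goroutine backtrace to its stderr when it receives SIGQUIT, which shows exactly where it is stuck:

```shell
# Sketch of the recovery steps; /directory and the process name "rclone"
# are taken from this thread -- adjust to your setup.

# SIGQUIT makes a Go program print a goroutine backtrace to its stderr
# and exit, showing exactly what it was doing when it hung.
dump_backtrace() {
    pkill -QUIT -x "$1"    # exits non-zero if no matching process exists
}

# Usage when the mount locks up:
#   dump_backtrace rclone          # capture the backtrace (from rclone's stderr)
#   fusermount -z -u /directory    # lazy-unmount so rclone can be restarted
```

Note the backtrace goes to rclone's stderr, not to --log-file, so redirect stderr somewhere when you start the mount if you want to keep it.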

I will try to do a kill -QUIT next time around. It is a production machine, so time is of the essence. Also, just to note, I have the exact same config in the exact same VPS environment; the only difference is CentOS 8 vs 7.

Right now, the only thing I've been able to do is a pkill -9 rclone and then re-mount via a script. It runs every 10 minutes. Very dirty, I know.
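That kind of watchdog could be made a little less dirty by only restarting when the mount has actually stopped responding. This is a sketch with placeholder names (MOUNTPOINT and the mount flags are assumptions based on this thread): since a hung FUSE mount blocks any stat() call indefinitely, probing it under `timeout` turns the hang into a detectable non-zero exit status.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; mountpoint and mount flags are placeholders.
MOUNTPOINT=/directory

# A hung FUSE mount blocks any stat() call forever, so a plain ls would
# freeze this script too; running it under timeout turns the hang into a
# detectable non-zero exit status.
mount_is_responsive() {
    timeout 5 ls "$1" >/dev/null 2>&1
}

# Only act if the path really is a mountpoint AND it has stopped responding.
if mountpoint -q "$MOUNTPOINT" && ! mount_is_responsive "$MOUNTPOINT"; then
    pkill -9 -x rclone                  # force-kill the stuck process
    fusermount -z -u "$MOUNTPOINT"      # lazy-unmount the dead mountpoint
    rclone mount remote:connection "$MOUNTPOINT" \
        --allow-other --vfs-cache-mode writes --log-file=/var/log/rclone.txt &
fi
```

Run from cron every few minutes, this only kicks in when the mount is genuinely hung; it is still a workaround, not a fix, but it avoids the full reboot.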

Great

That kind of implies it is a kernel problem, but I can't think what it would be.