High CPU load with rclone mount

What is the problem you are having with rclone?

I'm unsing rclone mount with full cache mode with onedrive or box.
In both case when I launch the transfers (lots of small files) with the app, I'm getting a very high CPU load.

Here is the result of the top command :

top - 18:17:43 up 6 days,  6:08,  1 user,  load average: 2296,54, 2293,33, 2221,54
Tâches: 163 total,   1 en cours, 114 en veille,   0 arrêté,   0 zombie
%Cpu(s):  0,2 ut,  0,2 sy,  0,0 ni, 99,5 id,  0,1 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem :  3918428 total,   129396 libr,  3411104 util,   377928 tamp/cache
KiB Éch:  1402264 total,  1394328 libr,     7936 util.  2292700 dispo Mem

Here is the list of the threads :

root      1105     1  1105  0   17 juin26 ?       00:01:36 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1147  0   17 juin26 ?       00:03:41 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1149  0   17 juin26 ?       00:00:00 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1150  0   17 juin26 ?       00:01:37 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1151  0   17 juin26 ?       00:02:18 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1160  0   17 juin26 ?       00:02:25 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1174  0   17 juin26 ?       00:00:00 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1177  0   17 juin26 ?       00:02:53 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1180  0   17 juin26 ?       00:02:45 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1182  0   17 juin26 ?       00:02:49 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1253  0   17 juin26 ?       00:00:00 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1255  0   17 juin26 ?       00:02:42 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1256  0   17 juin26 ?       00:03:03 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1389  0   17 juin26 ?       00:02:33 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1  1509  0   17 juin26 ?       00:02:49 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1 17677  0   17 14:27 ?        00:02:00 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/
root      1105     1 28286  1   17 14:56 ?        00:02:04 /usr/bin/rclone mount --config=/home/bob/.config/rclone/rclone.conf --tpslimit 4 -v --syslog --vfs-cache-mode full --vfs-cache-max-size 10000M --vfs-cache-max-age 96h0m0s --cache-dir /media/onedrive/cache onedrive:/ /media/onedrive/data/

What is your rclone version (output from rclone version)

rclone v1.51.0
- os/arch: linux/amd64
- go version: go1.13.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18.04

Which cloud storage system are you using? (eg Google Drive)

box and/or onedrive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

I'm using an application that is copying a lot of small files

The rclone config contents with secrets removed.

[onedrive]
type = onedrive
token = {"access_token":"zzzzzzz","token_type":"Bearer","refresh_token":"zzzzzzzzz","expiry":"2020-07-02T18:44:53.015919184+02:00"}
drive_id = yyyyyyyyyyyyy
drive_type = business

A log from the command with the -vv flag

Jul  2 18:30:37 bob-bluelagoon rclone[1105]: files/7b/3agdyuypcv2e4ev55clwqmnyqiwlkv6g5j5tz3e62pvkq6k7ia.ext: Copied (new)
Jul  2 18:30:37 bob-bluelagoon rclone[1105]: temp/file-132304095: Renamed in cache
Jul  2 18:30:38 bob-bluelagoon rclone[1105]: temp/file-132304095: Moved (server side)
Jul  2 18:30:39 bob-bluelagoon rclone[1105]: temp/file-436523506: Renamed in cache
Jul  2 18:30:39 bob-bluelagoon rclone[1105]: temp/file-436523506: Moved (server side)
Jul  2 18:30:41 bob-bluelagoon rclone[1105]: temp/file-829565000: Copied (new)
Jul  2 18:30:42 bob-bluelagoon rclone[1105]: temp/file-770228537: Renamed in cache
Jul  2 18:30:42 bob-bluelagoon rclone[1105]: temp/file-918399751: Copied (new)
Jul  2 18:30:43 bob-bluelagoon rclone[1105]: temp/file-770228537: Moved (server side)
Jul  2 18:30:43 bob-bluelagoon rclone[1105]: temp/file-590410254: Renamed in cache
Jul  2 18:30:44 bob-bluelagoon rclone[1105]: temp/file-590410254: Moved (server side)
Jul  2 18:30:45 bob-bluelagoon rclone[1105]: temp/file-151927062: Renamed in cache
Jul  2 18:30:46 bob-bluelagoon rclone[1105]: temp/file-151927062: Moved (server side)
Jul  2 18:30:46 bob-bluelagoon rclone[1105]: temp/file-957711392: Renamed in cache
Jul  2 18:30:46 bob-bluelagoon rclone[1105]: temp/file-957711392: Moved (server side)
Jul  2 18:30:47 bob-bluelagoon rclone[1105]: temp/file-575320739: Renamed in cache
Jul  2 18:30:48 bob-bluelagoon rclone[1105]: temp/file-575320739: Moved (server side)
Jul  2 18:30:49 bob-bluelagoon rclone[1105]: temp/file-127283350: Renamed in cache
Jul  2 18:30:50 bob-bluelagoon rclone[1105]: temp/file-127283350: Moved (server side)
Jul  2 18:30:50 bob-bluelagoon rclone[1105]: temp/file-759373890: Renamed in cache
Jul  2 18:30:51 bob-bluelagoon rclone[1105]: temp/file-759373890: Moved (server side)
Jul  2 18:30:52 bob-bluelagoon rclone[1105]: temp/file-074207418: Copied (new)
Jul  2 18:30:53 bob-bluelagoon rclone[1105]: temp/file-983550810: Renamed in cache
Jul  2 18:30:53 bob-bluelagoon rclone[1105]: temp/file-983550810: Moved (server side)
Jul  2 18:30:53 bob-bluelagoon rclone[1105]: temp/file-122816326: Renamed in cache
Jul  2 18:30:54 bob-bluelagoon rclone[1105]: Cleaned the cache: objects 5029 (was 5009), total size 2.260G (was 2.268G)
Jul  2 18:30:54 bob-bluelagoon rclone[1105]: temp/file-122816326: Moved (server side)
Jul  2 18:30:55 bob-bluelagoon rclone[1105]: temp/file-932213863: Renamed in cache
Jul  2 18:30:56 bob-bluelagoon rclone[1105]: temp/file-932213863: Moved (server side)
Jul  2 18:30:56 bob-bluelagoon rclone[1105]: temp/file-360419537: Copied (new)
Jul  2 18:30:57 bob-bluelagoon rclone[1105]: temp/file-929846077: Renamed in cache
Jul  2 18:30:58 bob-bluelagoon rclone[1105]: temp/file-929846077: Moved (server side)
Jul  2 18:30:58 bob-bluelagoon rclone[1105]: temp/file-225391100: Copied (new)
Jul  2 18:30:59 bob-bluelagoon rclone[1105]: temp/file-513863861: Renamed in cache
Jul  2 18:31:00 bob-bluelagoon rclone[1105]: temp/file-513863861: Moved (server side)
Jul  2 18:31:01 bob-bluelagoon rclone[1105]: temp/file-553733316: Renamed in cache
Jul  2 18:31:01 bob-bluelagoon rclone[1105]: temp/file-553733316: Moved (server side)
Jul  2 18:31:01 bob-bluelagoon rclone[1105]: temp/file-681166906: Renamed in cache
Jul  2 18:31:02 bob-bluelagoon rclone[1105]: temp/file-681166906: Moved (server side)
Jul  2 18:31:03 bob-bluelagoon rclone[1105]: temp/file-439083404: Renamed in cache
Jul  2 18:31:04 bob-bluelagoon rclone[1105]: temp/file-439083404: Moved (server side)
Jul  2 18:31:05 bob-bluelagoon rclone[1105]: temp/file-298171584: Renamed in cache
Jul  2 18:31:06 bob-bluelagoon rclone[1105]: temp/file-298171584: Moved (server side)
Jul  2 18:31:06 bob-bluelagoon rclone[1105]: temp/file-695390357: Copied (new)
Jul  2 18:31:07 bob-bluelagoon rclone[1105]: temp/file-273541688: Renamed in cache
Jul  2 18:31:08 bob-bluelagoon rclone[1105]: temp/file-273541688: Moved (server side)
Jul  2 18:31:08 bob-bluelagoon rclone[1105]: temp/file-489393963: Copied (new)
Jul  2 18:31:09 bob-bluelagoon rclone[1105]: temp/file-821081255: Renamed in cache

I'm not sure what you mean as your system is idle.

Wow!

Can you copy the files using rclone copy straight to the destination? You'll get much better performance than copying them through the mount.

Note also that in the latest beta we've re-done the --vfs-cache full mode to make it work much better.

Those numbers are not right @ncw for load average as the CPU usage is nil.

You can get a load average with no CPU usage in certain pathological cases.

Here is what the uptime man page has to say:

  System load averages is the average number of processes that are either
 in a runnable or uninterruptable state.  A process in a runnable  state
  is  either using the CPU or waiting to use the CPU.  A process in unin‐
  terruptable state is waiting for some I/O access, eg waiting for  disk.

So the processes could be in uninterruptable state. They might be stuck in the kernel with state D in ps output.

I've seen processes stuck in state D when rclone mount has gone wrong.

However you get an increase of 1 per process stuck in D state so to get 2296 processes means you need that many processes running.

There are only 163 running processes in the top result so this can't be the explanation.

So I think I'll have to go with your explanation that something has gone wrong with this system. Probably a reboot will fix it!

Hello @Animosity022 and @ncw,

Yes you are right, the CPU usage if very low but I have hundreds of threads trying to copy data that makes the CPU load average very high.
So I tried the beta version of rclone and this is not fixing the error. Of course I tried to reboot, this solves the problem for few minutes only. As soon as the application is sending the firsts files to the remote, the CPU load starts again to increase until the computer totally crash.

I am not able to use rclone copy, I need to mount the remote to be able to use it has a local file system.

Thank you for your help.

Why do you have to use a mount? You are overloading your system with too much copy commands or something then.

Because this is not data that I have to copy oneshot from one place to another one. Our application needs to get the data in live, we have to mount the remote.

Could you run your mount with --rc and post the result of this section?

https://rclone.org/rc/#debugging-go-routine-leaks

That will show us what the rclone threads are doing.

Hello @ncw and @Animosity022,

FYI I updated few days ago to the version v1.52.2-196-g4f7f5404-beta.
This seems to fix the issue of the high CPU load on my 30 servers.

But I installed new server with the last beta version (204) and the high CPU load came back again.
So I moved those servers to the version 196 and it is fine.

I know this is a beta version not tested yet, but for your info I have some issues with the 204.

Thanks

Can you have a go with the profiling tools here: https://rclone.org/rc/#debugging-rclone-with-pprof

I'd like to see a dump of go-routines when there is high CPU so this one curl http://localhost:5572/debug/pprof/goroutine?debug=1

And this one might be useful too: 30-second CPU profile: go tool pprof http://localhost:5572/debug/pprof/profile

Here is the results of the commands from a vps server with more than 1000 at the cpu load :

https://pastebin.com/jspTGwrQ

Thanks for the log. I can see what the problem is there! Try this

v1.52.2-207-g1fdcc195-vfs-beta on branch vfs (uploaded in 15-30 mins)

This is not solving the issue but there is a difference, now the CPU utilization is 100%, I mean CPU idle is 0%

Tasks: 196 total,   2 running, 194 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us, 64.2 sy,  0.0 ni,  0.0 id, 29.8 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :    450.1 total,      4.7 free,    382.6 used,     62.9 buff/cache
MiB Swap:   1952.0 total,    804.9 free,   1147.1 used.     56.2 avail Mem

And here is json extract if this can help :

https://pastebin.com/NAz6k9Ek

This is the same problem just moved along a bit!

The ultimate problem is that rclone is listing a directory on onedrive which is taking a really long time.

While that directory is being listed everything using that directory is being held up.

Does the directory you are using have 1000s (or more) of files in?

Is rclone having a problem listing the directory? Is the first listing of the directory very slow?

I think you'll probably be able to work around the problem by increasing --dir-cache-time as I suspect it only happens when the directory cache expires.

I had another go at fixing the issue can you have a go with this?

v1.52.2-209-gca3fbdef-vfs-beta on branch vfs (uploaded in 15-30 mins)

Hello @ncw,

This issue seems to be fixed. I'm going to monitor it for few hours.
I added like you suggested the --dir-cache-time option with a 1h value.

Thank you very much for all your work. This is very useful.

Hello Nick,

How exactly works --dir-cache-time ?
If I put 200 hours, what happen ? Is it make a refresh only each 200h ?

In more, does this solution work also for box mount, because on some of my old server I encounter the same problem. I don't have them for the moment to make debug with you.

And when I discuss with sun14 he tell me the cpu load explode on small VPS with your new version, any explanation ?

Thanks for all.

Let me know how it goes

Yes that is right. You can use the RC call vfs/refresh to force an update.

Yes it should do.

I need to see a goroutine trace to say.