Rclone Mount Memory Leaks & GDrive Enumeration

Oh, updatedb - that is an old chestnut!

I don't think rclone should be increasing in memory size forever, so if it is then there is a memory leak. I think updatedb just looks at every file on the disk and doesn't read it. Or maybe it does - it is certainly very hard on the disks.

Can you reproduce this with, let's say, find - if you run that, does it increase in memory each time?
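
For example (mount path purely illustrative), something like this walks every file without reading any data, so you can watch rclone's memory as it goes:

# walk the whole tree without reading file contents
find /mnt/gdrive -type f > /dev/null

# in another terminal, check rclone's resident memory between runs
ps -o rss,vsz,comm -C rclone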


I have servers with 30 days or more of uptime, and they use almost no RAM at all. If rclone had a memory leak, I'd have hit it eventually.

I do heavy reads 24/7 on the mounts

Nick, I am busy running this search but it's already been going for longer than 24 hours due to the size. I will confirm the final results once it's done, but here is what I can report back on so far:

  • Memory growth is 100% in line with what was seen under the updatedb enumeration. It's currently standing at 2.3GB allocated. For at least the first 18 hours after starting the search there was very little growth in RAM allocated, and then it seems to hit a threshold where it starts increasing quickly from there on.
  • The longer the search continues the more processes rclone spawns - it's currently up to 23
  • Go is cleaning up memory after itself. Its own memory usage hasn't increased beyond 670MB

I want to see:

  • If the RAM usage will top out somewhere
  • Once the search completes, whether the allocated memory blocks are released back to the system (and how long that takes). From what I have seen in previous tests, rclone will hold onto the memory blocks even after completing its tasks until the system is rebooted.

Hi random404. Do you care to share your rclone mount command parameters so I can compare what it looks like and where it differs? I have some mounts on Ubuntu servers as well and those tend to grow up to about 3.3GB and then stay there. They're not traversing any directories, but they are uploading large files (2-20GB) to the same directory the whole time, so I am assuming that it's the file size that's causing the growth. I have not really done any extensive troubleshooting on those as there's 196GB RAM on the servers, so the 3.3GB is not really impacting me too much. If I can get it down, though, that would be great.

If you are uploading files with the mount then it's normal that it will use more RAM, as the chunks will be in RAM...

I don't write to my mounts, so I just need to tweak the buffer size and chunk size to control RAM usage...

Maybe try https://rclone.org/drive/#drive-chunk-size
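
For the upload-heavy mounts, maybe something like this is worth a try (the values are only a guess, not a recommendation):

# smaller --drive-chunk-size and --buffer-size mean less RAM held per transfer
rclone mount gdrive: /mnt/upload --vfs-cache-mode writes --drive-chunk-size 16M --buffer-size 16M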

Rclone shouldn't be spawning any new processes. It may spawn new threads though - is that what you are seeing? If it has got to 23 that is probably a sign that there is a thread leak.

If you are running with --rc you can just run the commands here: https://rclone.org/rc/#debugging-go-routine-leaks and it will show you the goroutines. Can you paste them for me to see somewhere?
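
For reference, the commands on that page boil down to roughly this (assuming the default rc address):

curl "http://localhost:5572/debug/pprof/goroutine?debug=1"

go tool pprof -text http://localhost:5572/debug/pprof/heap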

Nick, you are correct - 23 threads and not processes. I just looked at unique PIDs and assumed them to be new processes. After checking, though, I can confirm they are threads:
pstree -pau -l -G -s 1500
systemd,1 splash
└─rclone,1500,xxx mount --config=/xxx/rclone.conf --rc --allow-other --checksum --buffer-size 30M --use-mmap --fast-list --cache-dir /tmp --vfs-cache-mode writes --drive-chunk-size 16M --attr-timeout 30s --drive-export-formats link.html --drive-use-trash=true --drive-alternate-export=true --drive-acknowledge-abuse=true --log-level DEBUG --syslog gdrive: /xxx
├─{rclone},1520
├─{rclone},1521
├─{rclone},1522
├─{rclone},1523
├─{rclone},1524
├─{rclone},1569
├─{rclone},1570
├─{rclone},1681
├─{rclone},1682
├─{rclone},1683
├─{rclone},1729
├─{rclone},1733
├─{rclone},1765
├─{rclone},2957
├─{rclone},2958
├─{rclone},2959
├─{rclone},2960
├─{rclone},2998
├─{rclone},6005
├─{rclone},7822
├─{rclone},8017
└─{rclone},9304

I executed the commands you asked for. I couldn't attach the file but here is a link to grab the .zip.

Great - thanks for confirming.

Got it.

There don't appear to be any goroutine leaks - you've just got a busy mount.

The memory trace is interesting

File: rclone
Type: inuse_space
Time: May 26, 2020 at 2:25am (SAST)
Showing nodes accounting for 2614.89MB, 99.75% of 2621.34MB total
Dropped 45 nodes (cum <= 13.11MB)
      flat  flat%   sum%        cum   cum%
  766.70MB 29.25% 29.25%   766.70MB 29.25%  strings.(*Builder).grow
  514.09MB 19.61% 48.86%   514.09MB 19.61%  github.com/rclone/rclone/vfs.newFile
  392.51MB 14.97% 63.83%   392.51MB 14.97%  encoding/json.(*decodeState).literalStore
  325.54MB 12.42% 76.25%   586.56MB 22.38%  github.com/rclone/rclone/backend/drive.(*Fs).newRegularObject
  261.02MB  9.96% 86.21%   261.02MB  9.96%  fmt.Sprintf
  218.98MB  8.35% 94.56%   793.58MB 30.27%  github.com/rclone/rclone/vfs.(*Dir)._readDirFromEntries
   60.51MB  2.31% 96.87%    60.51MB  2.31%  github.com/rclone/rclone/vfs.newDir

What it looks like is that you've got a lot of VFS objects in memory.

How many files do you have in your mount? (rclone size remote:)

I guess updatedb has pulled the metadata for all of them into memory - that is why it is using so much memory. You can reduce

  --dir-cache-time duration                Time to cache directory entries for. (default 5m0s)

To make rclone get rid of those directory entries quicker. Though I think (looking at your command line) that you have it at the default 5 minutes already - is that correct?

So maybe the VFS layer isn't pruning its directory cache properly...

I'm not quite sure exactly where all the memory comes from, but some of that usage doesn't look very efficient!

Can you do

go tool pprof -svg http://localhost:5572/debug/pprof/heap

And post the generated svg file - that should show the trace of where the memory got used. That will also generate a .gz file - if you could stick that in the archive too then I can run my own analyses - thanks.

Busy doing the size scan but it will take some time - it's already over a million. I will report back as soon as I have a figure.

updatedb has been completely disabled as I prefer alternative methods to locate for file searches. So whatever memory has been consumed in this test is exclusively due to running a manual find against the mount.

Yes, I am just using the default 5 minutes for --dir-cache-time as I am not explicitly setting another value. That being said, what I am explicitly setting is --cache-dir to /tmp, which of course means that every time the system is rebooted it cleans out all the cache files. I specifically did it this way to keep the system clean and force a full fresh read of the directory structure after reboots, but on second thought it may be part of the issue, as rclone has to rebuild that cache every time for a rather large mount point. If the mount point were smaller it would not have been an issue, but I will make a change to this and see if there is any noticeable improvement over time.

I had to restart the machine as the find was running for 3 days already and I had to make changes to the hardware. Will start another find over the weekend and let it run and then do the dump to give more accurate results given the problem.

> 1 million is something to go on - thanks!

Let's say that you have 1 million files; if each file in memory used 2k of RAM, that would be about the 2GB of memory that you are seeing...

I did a quick test myself - I made 1 million files locally and mounted them. vfs/refresh used 880M of memory and doing a find used 1.8GB of memory.
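
If anyone wants to repeat that test, here is a rough sketch (paths and counts are illustrative, with a local directory standing in for a real remote):

# create ~1 million empty files spread over 1000 directories
mkdir -p /tmp/million && cd /tmp/million
for d in $(seq 1 1000); do mkdir -p "$d" && (cd "$d" && touch $(seq 1 1000)); done

# mount it with the remote control enabled, then fill the directory cache
rclone mount /tmp/million /mnt/test --rc --daemon
rclone rc vfs/refresh recursive=true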

So maybe what rclone needs here is to limit the total size of the in-memory vfs cache...

Though I'm not sure why you've got so many directory entries with a 5 minute cache timeout, so that is a mystery which needs solving.

OK. Running rclone rc vfs/refresh recursive=true will have much the same effect (filling up the vfs cache) and will be much quicker than running find.

I don't think you are using the cache backend - the vfs layer doesn't store any metadata in the cache directory (yet) so I don't think this will be the problem.

:+1:

Total objects: 3467498

Honestly the search function in the GDrive webui is the fastest way of finding anything, so I won't use the mount to find stuff.
The mount is a convenience: it gives local applications direct access to individual files and folders, so I can complete work without having to first download something through the webui, work on it and then re-upload it.
The other use is to navigate the Drive folders in Dolphin, whose features/functions and tree structure are much easier to use than the webui (especially if you have a deeply nested structure, the webui sucks).
Everything else can be done directly in the webui itself.

So I am trying to get the rclone parameters tuned optimally for the above use case :slight_smile: and without the huge memory impact. Turning updatedb off made all the difference to the RAM usage, as rclone is now only fetching what it needs and no longer the entire tree structure. I do agree with you that using rclone with a large data store does seem to have some unexpected quirks, which can hopefully be eliminated with fine tuning, so I will keep on testing different configurations accordingly.

A second use case is for rclone to use a mount for real-time syncing of large/many files to Drive, instead of requiring a lot of local storage space first and then doing an rclone sync to Drive. This is a compelling option for VMs as you can keep the VM storage requirements down but still have seamless access to petabytes of connected storage. The RAM usage is again an issue here as you don't want to have to give each VM a huge chunk of RAM just so that rclone can do its thing. So if there is a way to cut the RAM footprint whilst still maintaining the use case requirements then it will be ideal.

So does using rclone without updatedb bring the memory usage back down to acceptable?

How many files will the VMs be accessing at any one time do you think? I guess that is the limiting factor provided the directory cache expiry is working....

Actually I just realised where the problem is! The VFS directory cache marks directories as expired when --dir-cache-time expires, but it doesn't remove them from memory as it re-uses the objects in them when the directory is refreshed - that is why your memory usage was never going down.

So what I could do is every now and again, run through the directory tree and prune directories which have expired (or maybe at some multiple of the expiry time). Maybe I should only do that when the tree has more than 100,000 entries say.

Hmm, what do you think?

Absolutely! It hovers around the 130,000K mapped memory mark and stays there most of the time with normal work file usage. As long as no enumeration of multiple levels of directories takes place then it's great.

Difficult to say - anything from 10 to 1000 on average, I'd say. File sizes will vary between 200MB and 18GB. A system that has been running in excess of 1 year with this setup is currently up to 3.4GB mapped memory for rclone. It seems to have topped out there and now just permanently uses that RAM.

This would explain something I definitely noticed. A freshly booted system will use about 137000K and will then roughly double during standard use. The problem is it never frees the memory again, even after having been idle for hours. I would have assumed it would be returned to the system after a fair timeout. This is true both for a scenario where several directories were accessed to work with files, and for one where only a single directory was accessed and multiple large files were added to it once a day.
So it's not just the expired directories holding on, but also expired file locks it seems.

How expensive is this operation for the system (CPU, I/O access)? If the hit is low then do it more frequently, as it will keep the tree nice and fresh and obviously clear its memory footprint regularly. If the hit is moderately high then rather wait for a higher count as per your suggestion. Either way it will definitely make an improvement to the current situation :slight_smile:

Great!

Go isn't brilliant at returning memory to the system. Under Linux it marks it as no longer in use, then the kernel can reclaim it if it is under memory pressure.

You can see if this is the case by running the memory profiling - go tells you how much memory it is using and you can compare that to the system usage.
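
For example, with --rc enabled you can compare the two like this (default rc port assumed):

# what the Go runtime thinks it is using
rclone rc core/memstats

# what the kernel actually has mapped for the process
grep VmRSS /proc/$(pidof rclone)/status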

Filling up the directory cache is quite expensive in terms of round trips to the cloud provider

However there is a flag for the expiry time, so I think rclone really should be ditching the memory, and users can increase that flag if they want.

This would be really helpful. My mount just got killed by the kernel due to the system being out of memory. It was using 8 GB of memory with ~ 2 million objects, most of which were not being accessed frequently.

Do you think setting GOGC to a lower value would help with this in the meantime?

Oops! Can you please make a new issue on github about this so we don't forget!

It might help a little bit but I wouldn't expect wonders!
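
If you do want to experiment, GOGC is just an environment variable the Go runtime reads at startup, so something like this is all it takes (20 is an arbitrary value; the default is 100 and lower means more frequent garbage collection):

GOGC=20 rclone mount gdrive: /mnt/gdrive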

Done. https://github.com/rclone/rclone/issues/4319


Is it possible to run rclone mount with mmap?

You can, but it will only affect the buffers used for transfers.
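
In other words, something like this (flags purely illustrative) only changes how the transfer buffers are allocated - it won't shrink the directory cache:

rclone mount gdrive: /mnt/gdrive --use-mmap --buffer-size 16M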
