Suddenly seeing very high memory usage

What is the problem you are having with rclone?

Suddenly high memory usage.

Historically sync memory usage was 40-50% of the system memory. Now it is hitting 100% and rclone will exit without finishing.

The system running rclone is dedicated to this function. It is Ubuntu 20.04 Server, no GUI, with 32GiB RAM.

The source has 30M files, which is quite large, but the sync has been working well for us for many months. We are not sure why we are seeing such high memory usage now.

We normally use 32 checkers, 32 transfers, and an Azure Blob upload concurrency of 32, but we scaled those down today to see if it would help. (It did not.)

Run the command 'rclone version' and share the full output of the command.

rclone v1.66.0

  • os/version: ubuntu 20.04 (64 bit)
  • os/kernel: 5.4.0-182-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.22.1
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Source - SMB
Destination - Azure Blob

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --verbose --stats-log-level NOTICE --stats=1h --bwlimit "60M" --retries 1 \
  --track-renames --track-renames-strategy modtime \
  --checkers 32 --transfers 32 --azureblob-upload-concurrency 32 \
  {source-SMB} {destination-AzureBlob}

We have tried reducing the checkers/transfers/concurrency values multiple times and rerunning the command, but we still hit an out-of-memory condition. We went as far as:

--checkers 4 --transfers 4 --azureblob-upload-concurrency 2

But the problem remained.

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[local-server-name]
type = smb
host = XXX
user = XXX
pass = XXX
domain = XXX

[blob-storage-account-name]
type = azureblob
account = XXX
key = XXX

A log from the command that you were trying to run with the -vv flag

The log seems OK, but I think the OS eventually kills the process. Here is the output from our script that runs rclone:

rclone-script.sh: line 93: 88237 Killed                  rclone sync --verbose --stats-log-level NOTICE --stats=1h --bwlimit "60M" --retries 1 --track-renames --track-renames-strategy modtime --checkers 32 --transfers 32 --azureblob-upload-concurrency 32 {source-SMB} {destination-AzureBlob}

rclone exit code: 137

Did you recently upgrade rclone?

No, we upgraded to 1.66 a day or two after its release, so more than two months ago.

Even with 1 checker it runs out of memory.

Guessing it's the track-renames options that are doing us in. But I don't really understand, since we've been using it for months. The number of objects has slowly been increasing, but nothing like 2x in the last 24 hours. So it's odd that memory usage has more than doubled in that timeframe.

Can you post an rclone debug log or some more details?

The full debug log is too enormous to post (and most of it is successful processing), but I updated the original post with what I saw when rclone was run by our script. The last line says "Killed" and rclone exits with error code 137. I'm guessing it was killed by the kernel/OS due to memory exhaustion.

Edit to add:

Now that I think about it, it was probably rclone that ended itself, since an exit code was set. The "Killed" message is probably because it was run as a background process (& at the end) and the script was waiting for it to finish. Also, if the OS had killed the process, I don't know what the exit code would be.
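
For reference: an exit status of 137 conventionally means 128 + 9, i.e. the process received SIGKILL, which is what the kernel OOM killer sends (the shell, not rclone, sets the exit code in that case). A quick way to check is to look for OOM messages in the kernel log, e.g.:

# Look for kernel OOM-killer messages around the time of the crash:
dmesg -T | grep -i -E 'out of memory|oom|killed process'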

Have you tried using the --use-mmap flag? It helps with memory usage and I've found it to be pretty stable.
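
For example, added to the scaled-down command from earlier (a sketch, flags abbreviated; placeholders as in the original command):

rclone sync --use-mmap --track-renames --track-renames-strategy modtime \
  --checkers 4 --transfers 4 --azureblob-upload-concurrency 2 \
  {source-SMB} {destination-AzureBlob}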

Related topics:

  • Rclone OOM killed during sync files check error 137
  • Exit code = 137?

No, I wasn't aware of that option. I'll give it a shot!

Unfortunately, this didn't seem to change anything. After a few minutes, the machine's memory is exhausted.

You'd probably want to grab this before it blows up to see what's going on:

Remote Control / API (rclone.org): https://rclone.org/rc/

If it worked before, I wonder if you've crossed some threshold of objects over time that is causing the issue.
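
A sketch of how that could look (the address shown is the default; depending on your setup you may also need --rc-user/--rc-pass or --rc-no-auth):

# Run the sync with the remote control server enabled (other flags omitted):
rclone sync --rc --rc-addr localhost:5572 {source-SMB} {destination-AzureBlob}

# While it runs, snapshot Go memory stats from another shell:
rclone rc core/memstats

# Or pull a heap profile from the built-in pprof endpoint:
go tool pprof http://localhost:5572/debug/pprof/heap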

Does that work if I'm not running rclone in "rc" mode?

And yeah, crossing some threshold seems logical on the surface, but we have only been increasing the file count incrementally over time. Suddenly RAM usage more than doubled.

Edit to add: never mind, I looked and it does require "rc" mode. I'll have to rework some things to test in that mode. But thanks for the tip.

That's generally how we can see if it's a leak or just unfortunate growth/size requirements.

It's a bit painful to run through, but it's the only real way to see what's going on in the guts.

Does the memory required for the track-renames option increase based on the total file count, or on the number of file differences discovered during scanning?

I'm just wondering if this problem "suddenly appeared" because too many files got renamed on the source side.

Is anyone able to answer the above questions? I'm curious whether there is a way to project the memory requirements for the track-renames option.

I do think this was the root issue - a high-level folder containing millions of files was possibly renamed, which ballooned the memory needed for tracking renames.

FWIW, you can do some simple testing using --dry-run, and then you should have your answer.
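
For example, a sketch that samples rclone's resident memory while a --dry-run is in flight, so you can watch the growth before it gets OOM-killed (placeholders as in the original command):

rclone sync --dry-run --track-renames --track-renames-strategy modtime \
  {source-SMB} {destination-AzureBlob} &
PID=$!
# Sample the resident set size (in KiB) every 30s while it runs:
while kill -0 "$PID" 2>/dev/null; do
  ps -o rss= -p "$PID"
  sleep 30
done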

Not sure I follow. I would guess --dry-run will consume memory similarly to a production run with regard to the track-renames feature, so I would expect it to crash at some point as system memory is exhausted.

What I'm hoping to find out is some statistic like for every 1000 files renamed we need xx MB of RAM (or whatever).
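
As a purely illustrative sketch (the ~1 KiB-per-object figure is an assumption for round numbers, not an official rclone statistic), the raw object count alone gets uncomfortably close to this machine's 32 GiB:

# Hypothetical: ~1 KiB of in-memory bookkeeping per object that
# --track-renames has to hold; 30M objects would then need roughly:
echo "$((30000000 / 1024 / 1024)) GiB"   # 30M KiB -> ~28 GiB (integer math)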
