1.64.0 and latest beta 1.65.0 both crash on rclone crypt/sync B2 on large files

What is the problem you are having with rclone?

rclone crashes during a crypt/sync to B2 when transferring large files (> 2GB). No error message is shown in the logs, and the return code is 0.

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.0-beta.7389.9e80d48b0

  • os/version: debian 11.7 (64 bit)
  • os/kernel: 6.1.0-0.deb11.11-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.1
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Backblaze B2 (in EU)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --password-command="pass B2/backup-bc" --fast-list --transfers 10 --b2-hard-delete -vv --stats-file-name-length 0 -x sync /srv/dev-disk-by-label-fs1/backup/ crypt:

The rclone config contents with secrets removed.

b2:
- type: b2
  • account: @@@@
- key: @@@@
- hard_delete: true

crypt:
- type: crypt
- remote: b2:backup-bc
- password: *** ENCRYPTED ***
- password2: *** ENCRYPTED ***

A log from the command with the -vv flag

That's weird! Is that log complete?

Something must have killed rclone. Can you check the kernel logs (dmesg) and see if it talks about rclone?

Did rclone run out of memory? I noticed today that rclone is using more memory than it should, so try setting --b2-upload-concurrency 4

Indeed, good catch:

Sep 24 16:51:07 nas-omv kernel: [35457.604549] Out of memory: Killed process 13018 (rclone) total-vm:17094888kB, anon-rss:15642476kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:31156kB oom_score_adj:0

So yes, OOM. The machine has 16GB of RAM, and at rest uses only 230MB:

MiB Mem : 15993.6 total, 15048.6 free, 231.9 used, 713.0 buff/cache

So 16GB used only by rclone looks like a runaway process to me.

Now, setting

--b2-upload-concurrency 4

isn't that going to run contrary to the speed increase this whole rewrite is meant to provide?

The default in S3 is 4.

I need to adjust the defaults, as the default concurrency is 16 and the chunk size is 100M, which makes for a lot of memory.

Previous versions of rclone limited the total number of blocks in the B2 backend, but we can't do that any more.

Can you tell me how the overall memory needs are calculated?
Suppose I have a large number of very large files (say 1TB or more) to be transferred. Would memory need = transfers * b2-upload-concurrency * b2-chunk-size?
Or would it be twice that (in view of crypt)? Or more?

This formula is exactly right: transfers * b2-upload-concurrency * b2-chunk-size. The encryption adds very little overhead - perhaps 64k per transfer.

Previously the b2 backend (and no other backend) used a formula of just transfers * b2-chunk-size, which explains why b2 is using so much more memory in v1.64.
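
To put rough numbers on that (my arithmetic, not a measurement): with --transfers 10 as in the command above, and the current defaults of --b2-upload-concurrency 16 and --b2-chunk-size 100M, the formula gives 10 * 16 * 100 MiB = 16000 MiB, i.e. roughly 15.6 GiB, which lines up with the ~15 GiB resident size in the OOM kernel log above.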

I'm currently doing a test to work out what the new default settings should be. While I was doing that I found a deadlock in the code (the perils of coding with lots of threads).

And here is a fix for that - important if you set --b2-upload-concurrency above 10 (the default is 16!). Since I added this flag for v1.64, I feel that changing the default in v1.64.1 is probably a good idea!

v1.65.0-beta.7391.55c3c221b.fix-b2-upload-url-lock on branch fix-b2-upload-url-lock (uploaded in 15-30 mins)

Would it be possible to add some protection in the software that calculates the RAM needed and compares that to the available RAM? Then, if there is a risk of OOM, it could warn, quit, or adapt the settings. That shouldn't be too hard to do, and it would avoid upgrade pains.

Yes, this is a nice idea to adaptively adjust the memory used by the multipart transfers.

Perhaps I could use this package: GitHub - pbnjay/memory (a Go function to report total system memory).
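
For illustration, here is a minimal sketch of how that package could be used to cap the concurrency from the total RAM. The suggestedConcurrency helper is hypothetical and this is not rclone code, just the shape of the idea, assuming the package's TotalMemory() function:

```go
package main

import (
	"fmt"

	"github.com/pbnjay/memory" // reports total system memory
)

// suggestedConcurrency is a hypothetical helper: given the per-chunk size and
// the number of transfers, cap the upload concurrency so that
// transfers * concurrency * chunkSize stays under half of total RAM.
func suggestedConcurrency(transfers int, chunkSize uint64, defaultConcurrency int) int {
	budget := memory.TotalMemory() / 2 // leave half the RAM for everything else
	maxChunks := budget / chunkSize / uint64(transfers)
	if maxChunks < 1 {
		maxChunks = 1
	}
	if int(maxChunks) < defaultConcurrency {
		return int(maxChunks)
	}
	return defaultConcurrency
}

func main() {
	const chunkSize = 100 * 1024 * 1024 // 100 MiB, the default --b2-chunk-size
	fmt.Println("suggested --b2-upload-concurrency:", suggestedConcurrency(10, chunkSize, 16))
}
```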

Do you want to open a new issue on Github about this? If you do please put a link to this forum post.

Sorry to butt in, but wanted to share a couple of thoughts on this.

Adaptive memory management might not help since RAM usage checks are very unreliable.

I would recommend 2 alternatives:

  1. An "auto-tune" feature that checks total installed RAM, and suggests config parameters for various backends. This could be done by anyone who knows rclone well, and doesn't need to be part of rclone. It could be a page on the website really.
  2. (Much better imho): avoid chunks in RAM. There probably isn't a very good reason to preload them into RAM before uploading. They could be streamed directly from disk (e.g. using the pread function, depending on how low-level we want to get), and I doubt this would slow down uploads in most cases (see the sketch after this list).
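
To make that concrete, a minimal sketch in Go (the file path and chunk geometry are made up): an *os.File already implements io.ReaderAt, so a chunk can be handed to an uploader as an io.SectionReader and read straight from disk on demand instead of being preloaded into RAM.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Hypothetical example file and chunk geometry.
	f, err := os.Open("/srv/backup/big.img")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	const chunkSize = 100 * 1024 * 1024 // 100 MiB
	var chunkIndex int64 = 3            // upload the 4th chunk

	// The SectionReader reads the chunk straight from disk on demand; nothing
	// is preloaded into memory, and it can be re-read (Seek to 0) on a retry.
	chunk := io.NewSectionReader(f, chunkIndex*chunkSize, chunkSize)

	n, err := io.Copy(io.Discard, chunk) // stand-in for the HTTP upload body
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("streamed %d bytes without buffering the chunk\n", n)
}
```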

Auto tune is kind of what I was suggesting, but getting rclone to auto tune the memory buffer parameters.

Unfortunately buffering the chunks in RAM is quite necessary!

  • Some backends like to make checksums of the chunk before uploading it. So the data is read twice or sometimes 3 times (s3!).
  • All backends keep the chunk around in case of network unreliability so it can be retried on error.

These uses could be made to work without memory buffers but they would involve reading the source data multiple times. This might be acceptable when reading from a local disk, but probably isn't acceptable when reading over the network.

It might be worth having a --low-memory flag to enable this though with lots of caveats in the docs.

Some thoughts, though I don't mean to waste your time; just in case you're interested in having this discussion:

To me it seems more reasonable that the default is direct disk access. For cases where disks are expensive/slow (old HDD, tape drive, etc) there could be a flag like --slow-disk.

IMHO, it's okay to read from disk 2-3 times during big uploads, if the payoff is never loading chunks into RAM (everything is done via streams). It seems like even Go hashers (for calculating checksums and such) implement io.Writer, which, as far as I understand, allows stream-calculating the hash rather than passing in the whole chunk.
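
For reference, a tiny self-contained example of that streaming-hash point, using only the Go standard library (the file path is hypothetical):

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Hypothetical file; hash.Hash implements io.Writer, so the SHA-1 is
	// computed as the data streams through io.Copy, without ever holding a
	// whole chunk in memory.
	f, err := os.Open("/srv/backup/big.img")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := sha1.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("SHA-1: %x\n", h.Sum(nil))
}
```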

For retries, and avoiding reading network source multiple times, chunks could be stored in temp dir instead of RAM.

For example, all file manipulations could be done via an interface that has 2 implementations: disk-based, and RAM-based. Then the --slow-disk flag would choose appropriate implementation. Just some thoughts.

We spent some time re-organizing the multithread uploads so it would be relatively easy to do this now.

The interface that rclone uses internally for multithread uploads is an io.ReadSeeker. This is currently implemented with a memory buffer, but it could fairly easily be implemented with a raw rclone file handle. So when rclone was in a low memory mode, it would just use the file handles directly. This would be fine for reading from disk, I think, but it's rather unfriendly to read from the network multiple times, so that case might need to be stored to disk, which is a pain!
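
To illustrate the two implementations being discussed (a sketch of the idea, not rclone's internal code): both a memory buffer and an open file satisfy io.ReadSeeker, so an uploader written against that interface doesn't care which it gets.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
)

// uploadChunk stands in for a multipart upload of one chunk. It only needs
// io.ReadSeeker: Read to send the data, Seek(0, io.SeekStart) to retry or
// to re-read for a checksum.
func uploadChunk(r io.ReadSeeker) error {
	if _, err := io.Copy(io.Discard, r); err != nil { // pretend upload
		return err
	}
	_, err := r.Seek(0, io.SeekStart) // rewind, e.g. for a retry
	return err
}

func main() {
	// Memory-backed chunk: today's approach, costs chunkSize bytes of RAM.
	if err := uploadChunk(bytes.NewReader([]byte("chunk held in RAM"))); err != nil {
		log.Fatal(err)
	}

	// Disk-backed chunk: a low-memory mode could pass the file handle instead.
	f, err := os.Open("/etc/hostname") // hypothetical source file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := uploadChunk(f); err != nil {
		log.Fatal(err)
	}
}
```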

I'd be reluctant to make this the default without some performance testing though!

The first thing that would need to be done is to make operations.Open return a handle that could seek.

After that it would be very straightforward.

Do you want to make an issue about this?


If I understand correctly, to avoid multiple network reads, rclone caches chunks in RAM. Would it be any extra pain to do the exact same caching, but to the filesystem? As we grab chunks from the network, we'd stream them directly into a temp dir and work on them there.

Perhaps you're saying that unlike io.ReadSeeker, there's no common interface for writes?

Let me know if this is adequate.

That is 100% correct.

Most of the time we don't actually need to read it twice for the b2 backend, so it could just read it once unless a retry is needed.

Other backends like s3 do read it at least twice so caching them to disk seems like a good idea.

io.ReadSeeker is the important interface as far as rclone is concerned.
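
As a footnote, a minimal sketch of the temp-dir idea from earlier (my own illustration, not something rclone does today): a network chunk that is only an io.Reader can be spooled into a temporary file, and the resulting *os.File is the io.ReadSeeker the uploader needs, re-readable on retry without another network read.

```go
package main

import (
	"io"
	"log"
	"os"
	"strings"
)

// spoolToDisk copies a one-shot network stream into a temp file and returns
// it positioned at the start, so it can be read (and re-read) like a memory
// buffer would be, but without holding the chunk in RAM.
func spoolToDisk(chunk io.Reader) (*os.File, error) {
	tmp, err := os.CreateTemp("", "rclone-chunk-*")
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(tmp, chunk); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return nil, err
	}
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return nil, err
	}
	return tmp, nil
}

func main() {
	// Stand-in for a chunk arriving over the network.
	f, err := spoolToDisk(strings.NewReader("chunk fetched from the network"))
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// f now satisfies io.ReadSeeker and can be retried without another
	// network read.
	if _, err := io.Copy(io.Discard, f); err != nil {
		log.Fatal(err)
	}
}
```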

Perfect - thank you!
