Can you tell me how the overall memory needs are calculated?
Suppose I have a large number of very large files (say 1TB or more) to transfer. Would the memory needed be transfers * b2-upload-concurrency * b2-chunk-size?
Or would it be twice that (in view of crypt)? Or more?
This formula is exactly right: transfers * b2-upload-concurrency * b2-chunk-size. The encryption adds very little overhead, perhaps 64k per transfer.
Previously, in the b2 backend (and no other backend), the formula was just transfers * b2-chunk-size, which explains why b2 uses so much more memory in v1.64.
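To make the arithmetic concrete, here is a small Go sketch of both formulas. The inputs are assumptions for illustration: 4 transfers and a 96 MiB b2-chunk-size (what I understand the defaults to be; check your version), plus the upload concurrency of 16 mentioned in this thread. The 64 KiB crypt allowance is only the rough estimate quoted above, not a measurement, and the function names are hypothetical.

```go
package main

import "fmt"

const (
	KiB = int64(1) << 10
	MiB = int64(1) << 20
)

// newMemory estimates chunk buffer memory for b2 in v1.64:
// transfers * upload concurrency * chunk size.
// The crypt allowance is the rough ~64 KiB per transfer quoted in the thread.
func newMemory(transfers, concurrency, chunkSize int64, crypt bool) int64 {
	total := transfers * concurrency * chunkSize
	if crypt {
		total += transfers * 64 * KiB
	}
	return total
}

// oldMemory is the pre-v1.64 b2 formula: transfers * chunk size.
func oldMemory(transfers, chunkSize int64) int64 {
	return transfers * chunkSize
}

func main() {
	// Assumed example values: 4 transfers, 96 MiB chunks, concurrency 16.
	transfers, concurrency, chunk := int64(4), int64(16), 96*MiB
	plain := newMemory(transfers, concurrency, chunk, false)
	withCrypt := newMemory(transfers, concurrency, chunk, true)
	fmt.Printf("old formula: %d MiB\n", oldMemory(transfers, chunk)/MiB)
	fmt.Printf("new formula: %d MiB\n", plain/MiB)
	fmt.Printf("crypt adds:  %d KiB\n", (withCrypt-plain)/KiB)
}
```

With those assumed values the estimate goes from 384 MiB under the old formula to 6144 MiB (6 GiB) under the new one, while the crypt allowance adds only 256 KiB in total.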
I'm currently running a test to work out what the new default settings should be. While doing that I found a deadlock in the code (the perils of coding with lots of threads).
And here is a fix for that, which is important if you set b2-upload-concurrency above 10 (the default is 16!). Since I added this flag in v1.64, I feel that changing the default in v1.64.1 is probably a good idea!
Would it be possible to add a safeguard that calculates the RAM needed, compares it to the available RAM and, if there is a risk of OOM, warns, quits or adapts the settings? It shouldn't be too hard to do, and it would avoid upgrade pains.
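A minimal sketch of the proposed check, assuming the available RAM is already known (getting a reliable figure is OS-specific and not trivial). The name `fitConcurrency` and the halving strategy are hypothetical, not rclone code: it lowers the upload concurrency until the estimate fits, and returns an error (the "warn or quit" case) if even a concurrency of 1 is too much.

```go
package main

import (
	"errors"
	"fmt"
)

const MiB = int64(1) << 20

// errNoFit is returned when even concurrency 1 exceeds the budget.
var errNoFit = errors.New("estimated memory exceeds available RAM even at concurrency 1")

// fitConcurrency (hypothetical) halves the upload concurrency until
// transfers * concurrency * chunkSize fits within availableRAM.
func fitConcurrency(transfers, concurrency, chunkSize, availableRAM int64) (int64, error) {
	for concurrency >= 1 {
		if transfers*concurrency*chunkSize <= availableRAM {
			return concurrency, nil
		}
		concurrency /= 2
	}
	return 0, errNoFit
}

func main() {
	// Assumed example: 4 transfers, 96 MiB chunks, concurrency 16 requested,
	// and 2 GiB of RAM to spare.
	c, err := fitConcurrency(4, 16, 96*MiB, 2048*MiB)
	if err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("using upload concurrency", c)
}
```

Halving keeps the number of attempts small; a real implementation might instead shrink the chunk size, or refuse to adapt and only warn.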
Sorry to butt in, but wanted to share a couple of thoughts on this.
Adaptive memory management might not help since RAM usage checks are very unreliable.
I would recommend 2 alternatives:
An "auto-tune" feature that checks total installed RAM and suggests config parameters for various backends. This could be done by anyone who knows rclone well and doesn't need to be part of rclone; really, it could just be a page on the website.
(Much better imho): avoid chunks in RAM. There probably isn't a very good reason to preload them into RAM before uploading. They could be streamed directly from disk (e.g. using the pread function, depending on how low level we want to get), and I doubt this would slow down uploads in most cases.
Auto-tuning is roughly what I was suggesting, except with rclone itself auto-tuning the memory buffer parameters.
Unfortunately buffering the chunks in RAM is quite necessary!
Some backends like to compute checksums of the chunk before uploading it, so the data is read twice, or sometimes three times (s3!).
All backends keep the chunk around in case of network unreliability so it can be retried on error.
These uses could be made to work without memory buffers but they would involve reading the source data multiple times. This might be acceptable when reading from a local disk, but probably isn't acceptable when reading over the network.
It might be worth having a --low-memory flag to enable this, though with lots of caveats in the docs.
Some thoughts, in case you're interested in having this discussion (I don't mean to waste your time):
To me it seems more reasonable for the default to be direct disk access. For cases where disk reads are expensive or slow (old HDD, tape drive, etc.), there could be a flag like --slow-disk.
IMHO, it's okay to read from disk 2-3 times during big uploads if the payoff is never loading chunks into RAM (everything is done via streams). Even Go hashers (for calculating checksums and such) implement io.Writer, which, as far as I understand, allows the hash to be calculated from a stream rather than requiring the whole chunk at once.
For retries, and to avoid reading a network source multiple times, chunks could be stored in a temp dir instead of RAM.
For example, all file manipulations could be done via an interface that has two implementations: disk-based and RAM-based. Then the --slow-disk flag would choose the appropriate implementation. Just some thoughts.
We spent some time reorganizing the multithread uploads, so it would be relatively easy to do this now.
The interface that rclone uses internally for multithread uploads is an io.ReadSeeker. This is implemented with a memory buffer, but it could fairly easily be implemented with a raw rclone file handle. So when rclone was in a low-memory mode, it would just use the file handles directly. This would be fine for reading from disk, I think, but it's rather unfriendly when reading from the network multiple times, so the data might need to be stored on disk, which is a pain!
I'd be reluctant to make this the default without some performance testing though!
The first thing that would need to be done is to make operations.Open return a handle that could seek.
If I understand correctly, rclone caches chunks in RAM to avoid multiple network reads. Would it be any extra pain to do exactly the same caching, but to the filesystem? As we grab chunks from the network, we'd stream them directly into a temp dir and work on them there.
Perhaps you're saying that unlike io.ReadSeeker, there's no common interface for writes?