Each rclone transfer does its own disk reads. I'm wondering if it could be more efficient to have a single thread do all the reads and fill a buffer for each transfer.
E.g. imagine I'm doing 50 parallel transfers and I want a 50 MB buffer for each transfer, so I'm dedicating 2.5 GB of memory to this.
So I set up 50 channels.
Each channel takes a message containing up to block_size bytes of data, so we make the channel capacity n = buffer_size / block_size (e.g. with 1 MB blocks, a 50 MB buffer gives n = 50 slots).
Each transfer thread just reads from its channel.
The single buffer thread iterates over all the transfers it's servicing: it reads the current length l of each channel, sees that n - l slots are free, fills them, and proceeds to the next transfer. When it finishes reading a file, it closes the channel, so the reader will see that. Something like the sketch below.
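Roughly this, in Go. To be clear, this is just an illustration of the scheme, not rclone's actual code; all the names here (`feed`, `transfer`, `blockSize`) are made up:

```go
package main

import (
	"io"
	"os"
	"time"
)

const (
	blockSize  = 1 << 20                // 1 MiB per message (assumed)
	bufferSize = 50 << 20               // 50 MiB buffer per transfer
	slots      = bufferSize / blockSize // n = buffer_size / block_size
)

// feed is the single buffer goroutine: it round-robins over the open
// files, tops each transfer's channel up to capacity, and closes the
// channel at EOF so the reader knows the file is done.
func feed(files []*os.File, chans []chan []byte) {
	open := len(files)
	for open > 0 {
		progressed := false
		for i, f := range files {
			if f == nil {
				continue // this transfer already finished
			}
			// len(ch) is l, so cap(ch)-len(ch) = n-l slots are free.
			for len(chans[i]) < cap(chans[i]) {
				buf := make([]byte, blockSize)
				n, err := f.Read(buf)
				if n > 0 {
					chans[i] <- buf[:n] // can't block: we're the only sender
					progressed = true
				}
				if err != nil { // io.EOF or a real error: stop this file
					close(chans[i])
					f.Close()
					files[i] = nil
					open--
					break
				}
			}
		}
		if !progressed {
			time.Sleep(time.Millisecond) // all buffers full; don't spin
		}
	}
}

// Each transfer goroutine just ranges over its channel and uploads.
func transfer(ch <-chan []byte, dst io.Writer) {
	for block := range ch {
		dst.Write(block) // send the block to the remote
	}
}

func main() {
	// Wiring left out: open the 50 files, make each channel with
	// make(chan []byte, slots), start the transfer goroutines, run feed.
}
```

The feed loop effectively becomes the IO scheduler: it decides the read order, and each file gets read in big sequential bursts instead of the OS interleaving 50 small reads.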
The idea is to reduce disk thrashing with a large number of parallel transfers, where we have lots of bandwidth but for whatever reason need many parallel threads to make use of it.
Though thinking about it, I wonder whether this would have a positive or negative impact when rclone is used as a file system mount. Threads are obviously easy; I'm just wondering if there's a smarter way to minimize thrashing, i.e. having a single IO thread (or at least a controllable number of them).
I don't think we can avoid lots of disk IO if you do 50 transfers at once. Rclone will need to read from 50 files at once, so you are going to get IO thrashing. Setting --buffer-size should help a bit, but I'm not sure how much.
I'd try lowering --transfers a bit - if it is disk IO limited then that will make things more efficient but shouldn't affect network speed.
Which backend are you transferring to? Rclone may be precalculating hashes which is quite disk intensive.
I agree we can't avoid doing lots of IO, but perhaps we can do it in a more intelligent manner than the OS's IO scheduler does. Or perhaps there's a way to instruct the OS's IO scheduler to be more intelligent, i.e. to expect large streaming reads?
Perhaps there's a way to use mmap and madvise with MADV_SEQUENTIAL (and perhaps MADV_DONTNEED after the data has been read?) to optimize this within the context of the Linux IO scheduler?
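On Linux that could look something like the sketch below, using golang.org/x/sys/unix. `readSequential` and the 4 MiB chunk size are my own assumptions for illustration, not anything rclone does today:

```go
package main

import (
	"io"
	"os"

	"golang.org/x/sys/unix"
)

// readSequential maps a file, tells the kernel we'll read it
// front-to-back (MADV_SEQUENTIAL enables aggressive readahead),
// and drops each chunk from the page cache once consumed
// (MADV_DONTNEED). Assumes a 64-bit address space for the mapping.
func readSequential(path string, w io.Writer) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return err
	}
	size := int(fi.Size())
	if size == 0 {
		return nil
	}

	data, err := unix.Mmap(int(f.Fd()), 0, size, unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		return err
	}
	defer unix.Munmap(data)

	// Advise sequential access for the whole mapping.
	if err := unix.Madvise(data, unix.MADV_SEQUENTIAL); err != nil {
		return err
	}

	const chunk = 4 << 20 // 4 MiB; a tuning value, picked arbitrarily
	for off := 0; off < size; off += chunk {
		end := off + chunk
		if end > size {
			end = size
		}
		if _, err := w.Write(data[off:end]); err != nil {
			return err
		}
		// Done with these pages; let the kernel reclaim them.
		unix.Madvise(data[off:end], unix.MADV_DONTNEED)
	}
	return nil
}

func main() {
	readSequential("/tmp/example", os.Stdout) // hypothetical usage
}
```

The MADV_DONTNEED after each chunk is the interesting bit: with 50 of these running, it should stop one transfer's pages from evicting another's.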