I'm not quite sure it will help, since the issue is not only serialising writes but also limiting their concurrency, AFAICS.
--buffer helps on huge files in the scenario above, but for better network utilisation on small files I have to raise transfer concurrency with
--transfers=64 (the actual value I'm using for 128M files with macOS sparsebundle images, btw). Since a transfer is a synchronous operation, that can cause up to 64 simultaneous writes to the local disk. While on SSDs it works fine (I've already checked), on rotational drives it's dramatically slow.
Btw, I have a kind of workaround for that which helps a bit: set
--buffer-size=<higher value than the file size> --multi-thread-cutoff=<lower value than the file size> --multi-thread-streams=<somewhere around 4-8> to increase RAM usage so a file is assembled in memory before being written out.
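For concreteness, a full command line combining these flags might look like the sketch below. The remote name, paths, and exact values are illustrative placeholders, not a tested recommendation; the flags themselves (--transfers, --buffer-size, --multi-thread-cutoff, --multi-thread-streams) are real rclone options:

```shell
# Hypothetical invocation for 128M sparsebundle bands: buffer each file
# fully in RAM (buffer-size > band size) and use multi-thread download
# (cutoff < band size) so writes land on the rotational disk in fewer,
# larger bursts. Remote and destination paths are placeholders.
rclone copy remote:backups/image.sparsebundle /mnt/hdd/image.sparsebundle \
  --transfers=64 \
  --buffer-size=256M \
  --multi-thread-cutoff=64M \
  --multi-thread-streams=8
```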
I could play a bit with the
local backend, but I'm almost sure it's not only about the implementation; it's mainly about the architecture and about separating writes from reads.
I don't quite understand how a mutex could help here, but please give me some time to look at rclone's code, since I haven't yet checked how it's implemented; so far I've only been working from an architecture proposal based on observable rclone behaviour.