The current rclone mount cache implementation suffers heavily from fragmentation. Writing files out sparsely does a good job of not wasting space (or bandwidth) on unused portions of a file, but it means the OS cannot lay the files out on disk in an optimal manner. This becomes a bigger problem when one is caching large files and also deleting things from the cache: at least on Linux, deleting heavily fragmented files can take a significant amount of time. My observation is that when this happens, the whole rclone mount freezes (blocked in unlink).
I honestly think a wiser course of action would be to store the individual cached blocks as independent files, with a suffix that identifies which block each one is. If one is using 16MB cache blocks, then for instance file-0 would be the first 16MB, file-1 the second, and so on. One could throw in an extra magic string to try and avoid name conflicts, or perhaps even make it definable on the command line. One would also need to store metadata in the cache dir making clear what the block size is: it cannot always be inferred (if all one has is the last block, for example), and the mechanism must not be used if the user could change the block size on the command line, as the stored offsets would no longer be correct.
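The naming scheme above amounts to a simple offset-to-block mapping. Here is a minimal sketch in Go; `blockPath` and the `file-N` prefix convention are illustrative names for this proposal, not anything rclone currently implements:

```go
package main

import "fmt"

const blockSize = 16 << 20 // 16MB blocks, as in the example above

// blockPath maps a byte offset within a cached file to the name of the
// block file that holds it, plus the offset within that block.
func blockPath(cachePrefix string, offset int64) (path string, within int64) {
	return fmt.Sprintf("%s-%d", cachePrefix, offset/blockSize), offset % blockSize
}

func main() {
	p, off := blockPath("file", 20<<20) // 20MB into the file
	fmt.Println(p, off)                 // lands 4MB into the second block: file-1
}
```

This is also why the block size has to be fixed per cache dir: the reverse mapping from `file-N` back to a byte range only works if the divisor never changes.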
This provides two advantages:

- Fragmentation will still exist (between the different blocks), but files should no longer be fragmented within a block, making deletions block for significantly less time.
- The ability to evict unused blocks without evicting the whole file.
On the flip side, it will possibly cause significantly more opens/closes, as one can't just keep a single file open: for every read() the filesystem will have to determine the offset, open (or create) the proper cached block file, and read from (or write to) it as appropriate.
It might also make the write cache harder to implement, as it's no longer a single file that has to be synced to the remote.
I'm wondering whether most people use it for read caching, write caching, or both?