I'm trying to figure out how rclone manages to avoid fragmentation when doing multiple write streams to a filesystem which does not support sparse files/fallocate.
I'm copying large files, ranging from 1 to 20 GB each, from a Dropbox remote to a local ZFS dataset. ZFS is copy-on-write and does not properly support sparse files; compression is also enabled, so preallocating with zeros won't work either.
When doing rclone sync with 8 write streams, I can see multiple files being written to the ZFS dataset. When the transfer is complete, checking those files shows each one is stored as a single contiguous segment.
And just to clarify: this is not an actual problem. I just find it surprising that it works this way, and I would like to understand why fragmentation does not happen.
If I understand correctly, rclone uses fallocate with FALLOC_FL_KEEP_SIZE to preallocate space. ZFS has pseudo-support for this:
Since ZFS does COW and snapshows, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Instead, make a best-effort attempt to check that at least enough
space is currently available in the pool (12% margin), then create
a sparse file of the requested size and continue on with life.
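For reference, on Linux this kind of preallocation boils down to a fallocate(2) call with FALLOC_FL_KEEP_SIZE. A minimal sketch of what that looks like in Go (not rclone's actual code, and assuming the golang.org/x/sys/unix package):

```go
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

// preallocate asks the filesystem to reserve size bytes for f without
// changing the apparent file size (FALLOC_FL_KEEP_SIZE). On ZFS this is
// only the best-effort space check plus sparse file described in the
// quoted comment above.
func preallocate(f *os.File, size int64) error {
	return unix.Fallocate(int(f.Fd()), unix.FALLOC_FL_KEEP_SIZE, 0, size)
}

func main() {
	f, err := os.Create("bigfile.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Try to reserve 20 GB up front; just log if the filesystem
	// doesn't support it.
	if err := preallocate(f, 20<<30); err != nil {
		log.Printf("fallocate failed or unsupported: %v", err)
	}
}
```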
I guess my question is more ZFS-specific than rclone-specific.
When doing a multi-threaded copy/sync/move, rclone does a fallocate on the local FS, then writes to the allocated space in multiple threads, making sure every thread writes to the correct position within the allocated space? This should never work with ZFS: because of CoW's nature, all writes should go to the beginning of free space rather than overwrite the preallocated space, yet that sure does not look like what is happening. I wonder what I'm missing here.
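To make the write pattern concrete, here is a rough sketch (hypothetical, not rclone's code) of what a multi-stream download looks like from the filesystem's point of view: several goroutines writing their own chunk of the same file at fixed offsets with WriteAt.

```go
package main

import (
	"log"
	"os"
	"sync"
)

func main() {
	const chunkSize = 8 << 20 // 8 MiB per stream, made-up value
	const streams = 8

	f, err := os.Create("multistream.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var wg sync.WaitGroup
	for i := 0; i < streams; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Stand-in for data pulled from the remote.
			buf := make([]byte, chunkSize)
			// Each stream writes at its own logical offset; where the
			// blocks land physically is entirely up to the filesystem.
			if _, err := f.WriteAt(buf, int64(n)*chunkSize); err != nil {
				log.Printf("stream %d: %v", n, err)
			}
		}(i)
	}
	wg.Wait()
}
```

The offsets here are purely logical; on a CoW filesystem they say nothing about physical placement, which is the heart of the question.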
It is an interesting question, but I think it requires some deep ZFS expertise to answer properly. I do not think rclone does anything special here, as it is filesystem agnostic.
I would speculate that, since ZFS batches up chunks of data to be written to disk, even multiple streams can be nicely combined into a single, bigger transaction group - especially with sync=disabled, which is most likely the default.
sync=standard is the default, so applications can control sync/async writes themselves. ZFS aggregates async writes in memory and flushes them to disk in transaction groups every 5 seconds by default (the ZIL only comes into play for synchronous writes). In my case I'm pulling data from Dropbox at ~110 MB/s in 8 threads. I can see multiple files growing in size on the ZFS side; during the transfer they seem to be fragmented, but once the transfer is done every file contains a single segment.
I don't know anything about zfs, but I do know that if you don't use sparse files with rclone, then the OS will zero fill them when rclone seeks beyond the end of the written data.
This isn't very efficient, as the files get written twice, but the files won't be fragmented.
It may be that zfs is very efficient at the zero writing, though.
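A quick way to see the difference between a zero-filled gap and a sparse hole is to compare a file's apparent size with the blocks actually allocated on disk. A small sketch of that check (assuming Linux and golang.org/x/sys/unix):

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Create("gap.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write a few bytes 1 MiB past the start: the gap before the offset
	// reads back as zeros either way, but whether it occupies real blocks
	// (zero fill) or a hole (sparse) depends on the filesystem.
	if _, err := f.WriteAt([]byte("end"), 1<<20); err != nil {
		log.Fatal(err)
	}

	var st unix.Stat_t
	if err := unix.Stat("gap.bin", &st); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("apparent size: %d bytes, allocated on disk: %d bytes\n",
		st.Size, st.Blocks*512)
}
```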