Multiple write streams not causing fragmentation on ZFS?

What is the problem you are having with rclone?

I'm trying to figure out how rclone avoids fragmentation when writing with multiple streams to a filesystem that does not support sparse files or fallocate.

I'm copying large files, ranging from 1 to 20GB each, from a Dropbox remote to a local ZFS dataset. ZFS is CoW and does not properly support sparse files. Compression is also enabled, so preallocating with zeros won't work.

When doing rclone sync with 8 write streams, I can see multiple files being written to the ZFS dataset. When the transfer is complete, checking those files with

zdb -ddddd $(df --output=source --type=zfs "file" | tail -n +2) $(stat -c %i "file")

shows a single segment for each file. I would expect fragmented files to be written in multiple segments.

How does this work?

My results look fairly different than:

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.0

  • os/version: debian 12.4 (64 bit)
  • os/kernel: 6.5.0-0.deb12.4-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.1
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Dropbox

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync -P -c --multi-thread-streams 8 --no-update-modtime --transfers 8 --inplace dbcrypt:test/ /mnt/test/ -vvv

rclone sync -P -c --multi-thread-streams 0 --no-update-modtime --transfers 1 --inplace dbcrypt:test/ /mnt/test/ -vvv

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[dropbox]
type = dropbox
client_id = XXX
client_secret = XXX
token = XXX
chunk_size = 149M
batch_mode = sync

[dbcrypt]
type = crypt
remote = dropbox:backup
filename_encryption = standard
directory_name_encryption = true
password = XXX
password2 = XXX

Just to clarify: this is not an actual problem. I just find it surprising that it works this way, and would like to understand why fragmentation does not happen.

If I understand correctly, rclone uses FALLOC_FL_KEEP_SIZE to preallocate files. ZFS has pseudo-support for this:

Since ZFS does COW and snapshots, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Instead, make a best-effort attempt to check that at least enough
space is currently available in the pool (12% margin), then create
a sparse file of the requested size and continue on with life.
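
To make the call under discussion concrete, here is a minimal Python sketch of a fallocate request with FALLOC_FL_KEEP_SIZE. It is not rclone's code (rclone is written in Go). The helper name preallocate_keep_size is hypothetical, and the sketch assumes 64-bit Linux with glibc. On a filesystem without support it simply reports False.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01  # value from <linux/falloc.h>


def preallocate_keep_size(fd: int, size: int) -> bool:
    """Hypothetical helper: ask the filesystem to reserve `size` bytes
    for `fd` without changing the reported file length.

    Returns True if the request succeeded and False if fallocate is not
    available or not supported here. Assumes 64-bit Linux (64-bit off_t).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:  # libc without fallocate, e.g. not Linux
        return False
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    if fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, size) != 0:
        err = ctypes.get_errno()
        if err in (errno.EOPNOTSUPP, errno.ENOSYS):
            return False  # filesystem or kernel does not support it
        raise OSError(err, os.strerror(err))
    return True
```

With KEEP_SIZE the file length reported by stat stays the same whether or not the request succeeds, so the caller still has to write the data at the right offsets afterwards.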

I guess my question is more ZFS-specific than rclone.

When doing a multi-threaded copy/sync/move, does rclone fallocate on the local FS and then write into the allocated space from multiple threads, making sure every thread writes to the correct position? This should never work with ZFS: because of the nature of CoW, all writes should go to the beginning of free space rather than overwrite the allocated space, but that does not appear to be the case. I wonder what I'm missing here.
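
The write pattern described above (set the final size first, then let each thread write its own byte range) can be sketched roughly as follows. This is an illustration in Python, not rclone's actual implementation. The function name write_in_parallel and its parameters are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_in_parallel(path: str, data: bytes, streams: int = 4) -> None:
    """Hypothetical sketch: write `data` to `path` from `streams` threads,
    each thread writing its own contiguous byte range at a fixed offset."""
    chunk = -(-len(data) // streams)  # ceiling division
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Set the final length up front; the unwritten range reads as zeros.
        os.ftruncate(fd, len(data))

        def write_chunk(i: int) -> None:
            start = i * chunk
            view = data[start:start + chunk]
            written = 0
            # pwrite takes an explicit offset, so threads do not share
            # a file position and cannot clobber each other's ranges.
            while written < len(view):
                written += os.pwrite(fd, view[written:], start + written)

        with ThreadPoolExecutor(max_workers=streams) as pool:
            list(pool.map(write_chunk, range(streams)))
    finally:
        os.close(fd)
```

From the application's point of view only logical offsets matter. Where the blocks physically land on disk is decided by the filesystem, which is the part this question is really about.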

It is an interesting question, but I think it requires some deep ZFS expertise to answer properly. I do not think rclone does anything special here, as it is filesystem agnostic.

My speculation is that, because ZFS batches up chunks of data before writing them to disk, even multiple streams can be combined nicely into a single, larger transaction group. That would apply especially with sync=disabled, which is most likely the default.

sync=standard is the default, so applications can control sync/async writes themselves. ZFS aggregates asynchronous writes in memory in transaction groups (txgs), which are flushed to disk every 5 seconds by default; the ZIL (ZFS Intent Log) is only used for synchronous writes. In my case I'm pulling data from Dropbox at ~110MB/s in 8 threads. I can see multiple files growing in size on the ZFS side. During the transfer they seem to be fragmented, but once the transfer is done every file consists of a single segment.

I don't know anything about ZFS, but I do know that if you don't use sparse files with rclone, the OS will zero-fill them when rclone seeks beyond the end of the written data.

This isn't very efficient as the files get written twice but the files won't be fragmented.

It may be that ZFS is very efficient at writing zeros, though.

That is my conjecture!
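
The seek-beyond-the-end behaviour mentioned above is easy to observe from user space. The following is a small Python sketch with a hypothetical helper name, write_past_eof. It only shows that the gap reads back as zeros. Whether those zeros are physically written or stored as a hole depends on the OS and filesystem, which this sketch does not try to determine.

```python
def write_past_eof(path: str, offset: int, payload: bytes) -> bytes:
    """Hypothetical helper: create `path`, seek to `offset` (past the end
    of the empty file), write `payload`, and return the bytes that now
    precede it."""
    with open(path, "wb") as f:
        f.seek(offset)    # move beyond the current end of file
        f.write(payload)  # the file length becomes offset + len(payload)
    with open(path, "rb") as f:
        return f.read(offset)  # the gap that was never written explicitly
```

Comparing st_blocks from os.stat with the file size afterwards gives a rough hint of whether the gap was actually allocated on a given filesystem.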