blandy
July 12, 2025, 9:55am
1
The issue relates to how rclone writes files when multithreading from, say, FTP to a local HDD. I transferred 753,000 files (1.5 TB) using 4 threads on the initial sync.
The result was 78% file fragmentation on the HDD. It seems rclone does not pre-allocate file space before writing and instead writes each "chunk" of data as it arrives; when multithreading, this causes fragmentation on NTFS drives.
One issue with pre-allocating is that an incomplete transfer can leave a file of the correct size; only a checksum would confirm the data. But you can always write to (unknown).partial until the download is complete, then rename. You can also purge rogue .partial files if rclone exited unexpectedly. I've done something similar in my own projects.
rclone v1.69.3
os/version: Microsoft Windows 11 Pro 24H2 (64 bit)
os/kernel: 10.0.26100.4652 (x86_64)
os/type: windows
os/arch: amd64
go/version: go1.24.3
go/linking: static
go/tags: cmount
Example of command used:
rclone sync ftpserver:/Websites/the-website "D:/Backups/Websites/the-website" --progress --fast-list --use-server-modtime
blandy:
rclone v1.69.3
This is an old rclone (4 releases behind). If you suspect a bug, please test with the latest rclone version.
Nobody is interested in going through several versions' release notes to check whether something has already been fixed.
In other words, reporting bugs in old rclone versions is a waste of time, both yours and that of the people reading it.
blandy
July 12, 2025, 10:05am
3
Ah crap! I thought I was. I'll update and test again.
Have a look at this old issue:
opened 11:34AM - 18 May 20 UTC
enhancement
OS: Windows
LocalFS
> I was assuming that if you are writing at 4 points in the file then you'll be … making 4 fragments, but it seems I was wrong about that!
Yeah, not on Windows, unless you're dealing with sparse files. For normal files, Windows just "extends" the initialized portion of the file to that point, which can take forever if you're writing at offset 25GB and none of the file has been initialized (regardless of whether it has been allocated).
I thought I'd share a couple of experiments to illustrate what happens in reality with sparse files. Here's what happens on Windows. Note that you can see the fragmentation via `fsutil file queryExtents`, but there's also [`contig -a`](https://docs.microsoft.com/en-us/sysinternals/downloads/contig) which is similar.
```
C:\>fsutil file createNew temp 0 && fsutil sparse setFlag temp 1 && fsutil file setEOF temp 134217728 && (echo.>>temp) && fsutil file setEOF temp 268435456 && (echo.>>temp) && fsutil file queryExtents temp && del temp
File C:\temp is created
File C:\temp eof set
File C:\temp eof set
VCN: 0x0 Clusters: 0x8000 LCN: 0xffffffffffffffff
VCN: 0x8000 Clusters: 0x10 LCN: 0xa4b91e
VCN: 0x8010 Clusters: 0x7ff0 LCN: 0xffffffffffffffff
VCN: 0x10000 Clusters: 0x10 LCN: 0xa4cb52
```
LCN is the logical cluster number (i.e. the block number relative to the beginning of the volume), and VCN is the virtual cluster number (the block number relative to the beginning of the file).
Notice that there are 0x7ff0 clusters between the two allocated extents virtually, but only 0xa4cb52 - (0xa4b91e + 0x10) = 0x1224 clusters allocated to the file on the volume. This means the file will never end up contiguous. Extents can even end up out-of-order as a result.
Here's what the equivalent would look like on Linux:
```
$ fallocate -l 134217728 temp && fallocate -p -l 134217728 temp && (echo "">>temp) && fallocate -o 134217729 -l 134217728 temp && fallocate -p -o 134217729 -l 134217728 temp && (echo "">>temp) && sudo hdparm --fibmap temp && rm -f temp
temp:
filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
134217728 195430400 195430407 8
268435456 196298752 196298759 8
```
It's a similar story here. There are (196298752 - 195430407) * 4096 = 3556741120 bytes between the FS blocks, but only 268435456 - (134217728 + 1) = 134217727 in the file. Again, the file cannot be contiguous.
So we conclude sparse files have fragmentation problems.
Now, the nice thing about `fallocate` (at least on ext4) is that it seems to support blocks that are *allocated yet uninitialized*. And the system will return zeros if you try to read such blocks. I didn't use this feature, because I used "hole punching" to mimic the NTFS behavior. But you _are_ using this feature, so you should be fine on Linux.
NTFS doesn't quite support this feature though. What it *does* support is something Windows calls a "valid data length", which is the length of the file (starting from offset 0) which is assumed to contain initialized data on the disk. On Windows, the "fallocate" method of aria2c *directly sets the valid data length* ([`SetFileValidData`](https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilevaliddata)), which is like setting the file length while taking whatever is on the disk as the file contents.
What this means for you is that you have the following options:
- To minimize fragmentation *and* initialization time, **you want to call `SetFileValidData`**, and I think this needs to be _after_ setting `FileAllocationInformation` rather than before (but I'm not sure). However, this requires `SeManageVolumePrivilege`, since it can leak underlying disk contents. I think you need to be an administrator and **also** request this privilege explicitly. The easiest way is to call [`RtlAdjustPrivilege(28 /*SeManageVolumePrivilege*/, TRUE, FALSE, &wasEnabled)`](https://source.winehq.org/WineAPI/RtlAdjustPrivilege.html) from `ntdll.dll` at program startup time. If the call succeeds (returns zero), then you know you can call `SetFileValidData` and don't need to do anything else. If it fails, and the user has passed `--file-allocation=falloc`, then I suggest aborting: they've probably forgotten to use "Run As Administrator", and you want to let them know. If they haven't specified anything, however, then you probably want to be "smart" and try the other options below.
- You can just keep going and make the user wait for initialization during download. It's what download programs typically do. It avoids fragmentation as much as possible and it works on all file systems. But the waiting time can be prohibitive for a huge file. If you do this, you can also consider temporarily [lowering the I/O priority](https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_file_io_priority_hint_information) to `IoPriorityLow`, and restoring it to `IoPriorityNormal` before actually downloading the file. Or you can provide a flag to the user to do this manually if they're interested. This hopefully ensures other activity doesn't grind to a halt while the file is getting initialized. But I have not tested this, so I'm not sure.
- You can use the buffering technique I mentioned earlier (with, say, 64MB or 256MB or even 1GB chunks). This allows for a bit of fragmentation, but with very large fragments, so it's unlikely to be a problem for anybody. I think this is the smartest solution if `SetFileValidData` fails, but I can't speak to the charge/ban issue. You'll need to figure out (possibly on a case-by-case basis) if this solution makes sense for the given remote. I suspect if your chunks are large (say, ≥ 256MB?) it shouldn't be a problem, but I don't really know.
Hope this clarified everything you were asking about!
_Originally posted by @mehrdadn in https://github.com/rclone/rclone/issues/2469#issuecomment-630029285_
It seems this is still waiting to be fixed.
blandy
July 12, 2025, 2:39pm
5
Yes, seems so, thanks. Can you remove my post please? No point in it being on here. Thanks for the info, appreciated.
system
(system)
Closed
August 11, 2025, 2:40pm
6
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.