Does `--size-only` account for local sparse files properly?

What is the problem you are having with rclone?

It's a question: Does --size-only account for (rclone created) sparse-files?

If you sync from, say, S3 to local, rclone creates sparse files for multi-part downloads. Depending on how sparse files are stat'ed, they may report the size of the data actually written or their final size. If it is the final size, then sync --size-only will get false negatives when deciding whether a file needs to be synced.

Run the command 'rclone version' and share the full output of the command.

rclone v1.58.1
- os/version: darwin 10.15.7 (64 bit)
- os/kernel: 19.6.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.17.9
- go/linking: dynamic
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

S3 to local but really anything-but-local to local

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy --size-only s3:<bucket> /local/path

The rclone config contents with secrets removed.

N/A

A log from the command with the -vv flag

N/A

In an attempt to answer my own question, I made a large (969932800 byte) file on a remote server and did a download to my local machine and cut it off.

I pressed CTRL+C when the transfer showed

Transferred:   	   55.562 MiB / 925 MiB, 6%, 10.700 MiB/s, ETA 1m21s
Transferred:            0 / 1, 0%
Elapsed time:         8.0s

and then looked at the local file to see sizes

command         size (bytes)
ls -l           893943808
rclone size     893943808
rclone lsjson   893943808
du -k           76768 * 1024 = 78610432

(The last one uses -k to report 1024-byte blocks, as opposed to the default 512, which I then converted to bytes.)

So the file occupies only about 74.97 MiB on disk according to the file system, even though its apparent size is 893943808 bytes.
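For anyone who wants to reproduce the comparison without rclone, here is a minimal Python sketch of the two sizes involved. It assumes a Unix-like system where `st_blocks` counts 512-byte units (as on Linux and macOS); the helper name `apparent_and_allocated` is made up for this example, and whether the demo file actually ends up sparse depends on the file system.

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is what `ls -l` reports.
    # st_blocks is assumed to count 512-byte units (Unix only).
    return st.st_size, st.st_blocks * 512

# Build a (probably) sparse file: seek far past the start and write one byte.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(10 * 1024 * 1024 - 1)
    f.write(b"\0")
    demo_path = f.name

apparent, allocated = apparent_and_allocated(demo_path)
print(f"apparent={apparent} allocated={allocated}")
os.remove(demo_path)
```

On a file system that supports sparse files, the allocated figure should come out much smaller than the apparent one, which is the same gap as between `ls -l` and `du -k` above.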

Rerunning

$ rclone -vv copy --progress --size-only --dry-run myremote:tmp/ .

shows

2022-05-29 15:42:10 DEBUG : big.dat: Sizes differ (src 969932800 vs dst 893943808)

This means rclone itself sees the apparent (sparse) size of the file rather than the amount of data actually written.

Does this mean that --size-only with a local destination can be tricked if an interrupted file is still sparse but already extends to the full size? Imagine local to crypt(webdav), where size is all you can compare! It is very possible.

Is this a bug?

The size used in --size-only is the size of the file as you might see in ls -l. This will always be the same regardless of whether the file contains sparse blocks or not.

When rclone is using multi-thread streaming it will download to different parts of the file at once, and this will make a sparse file on most OSes. The size as seen by ls -l is then determined by the last stream: if rclone is using 4 streams, it will be 3/4 of the file size plus however far the 4th stream has got.
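To make that concrete, here is a small Python simulation. It is a simplified model and not rclone's actual code: it assumes the file is split into equal chunks, that stream i writes starting at offset i * chunk, and the function name `interrupted_size` is hypothetical.

```python
import os
import tempfile

def interrupted_size(total_size, streams, progress):
    """Apparent file size after each stream wrote progress[i] bytes of its chunk."""
    chunk = total_size // streams
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "big.dat")
        with open(path, "wb") as f:
            for i, done in enumerate(progress):
                f.seek(i * chunk)      # each stream starts at its own offset
                f.write(b"x" * done)   # ...and got this far before CTRL+C
        return os.path.getsize(path)

# Every stream is 10% done: size is 3/4 of the file plus the 4th stream's progress.
print(interrupted_size(4000, 4, [100, 100, 100, 100]))   # 3100

# The 4th stream happened to finish first: the size already matches the source.
print(interrupted_size(4000, 4, [100, 100, 100, 1000]))  # 4000
```

The second case is the one that matters for --size-only: the file is only partly written, yet its apparent size equals the source size.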

Yes.

In general --size-only is a pretty unreliable check and should only be used if you haven't got anything better. Rclone's default sync is size+modtime which is fast and reliable, or you can use --checksum which gives you size+hash which is slower but very accurate.

That is unfortunate, since it makes undetected data loss a very real possibility.

It may not be worth the effort but one option would be to always write with one extra byte and then truncate when complete. That would make sure that it can't be fooled.

I get that but using something with crypt and webdav (or even just webdav) you don't have much of an option. Also, it's been my experience that nearly all file changes affect the size. Obviously it is trivial to create a counter-case but in general, realistic usage, this is what I'd expect. And an alarming number of remotes support neither hashes nor ModTime.

That is a nice idea. Easier might be to withhold the last byte until all the other writes have completed in the multipart stream.

Fancy making an issue about that?

I wasn't sure since then you need to book-keep the last byte as opposed to a truncate call. But each would work.
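As a rough illustration of the extra-byte-then-truncate idea, here is a Python sketch. It is one possible reading of the proposal, not something rclone does: it assumes the file is extended to one byte more than the real size before any data is written and is only truncated back once every stream has finished. The name `guarded_size` and the equal-chunk layout are assumptions for the example.

```python
import os
import tempfile

def guarded_size(total_size, streams, progress):
    """Apparent size when the file carries a guard byte until the transfer completes."""
    chunk = total_size // streams
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "big.dat")
        with open(path, "wb") as f:
            # Reserve one byte more than the real size up front.
            f.truncate(total_size + 1)
            for i, done in enumerate(progress):
                f.seek(i * chunk)
                f.write(b"x" * done)
            # Only drop the guard byte once every stream has written its whole chunk.
            if all(done == chunk for done in progress):
                f.truncate(total_size)
        return os.path.getsize(path)

# Interrupted, even with the last stream finished: size never equals the source size.
print(guarded_size(4000, 4, [100, 100, 100, 1000]))    # 4001

# Completed transfer: the guard byte is removed.
print(guarded_size(4000, 4, [1000, 1000, 1000, 1000]))  # 4000
```

Withholding the last byte instead would give the same guarantee from the other direction (the interrupted file would always be at least one byte short), at the cost of having to remember that byte until the end.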

Done: Make rclone more robust to false-negatives with local copies and --size-only · Issue #6206 · rclone/rclone · GitHub

Thanks for the help.
