Binary diff support for SFTP?

SFTP seems like kind of a “special case” storage backend, and I was wondering:

i) Would it be possible to support binary diff transfers for SFTP while preserving the 1-to-1 object mapping? If so, would this also be possible while using crypt?

ii) Are there any plans to add support for this? I realise it may be a niche feature, given that cloud storage backends seem to be the most common use case, but it’s a feature I would personally love. I realise one answer may be “just use rsync”, but rclone’s interface and the crypt feature are brilliant and, in my opinion, superior to rsync.

I’ll post the FAQ on binary diffs below for the sake of clarity:

Why doesn’t rclone support partial transfers / binary diffs like rsync?

Rclone stores each file you transfer as a native object on the remote cloud storage system. This means that you can see the files you upload as expected using alternative access methods (e.g. using the Google Drive web interface). There is a 1:1 mapping between files on your hard disk and objects created in the cloud storage system.

Cloud storage systems (at least none I’ve come across yet) don’t support partially uploading an object. You can’t take an existing object, and change some bytes in the middle of it.

It would be possible to make a sync system which stored binary diffs instead of whole objects like rclone does, but that would break the 1:1 mapping of files on your hard disk to objects in the remote cloud storage system.

All the cloud storage systems support partial downloads of content, so it would be possible to make partial downloads work. However, making this work efficiently would require storing a significant amount of metadata, which breaks the desired 1:1 mapping of files to objects.
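To make the point about metadata concrete, here is a simplified sketch of the kind of per-block signature a delta transfer relies on. It is not rclone’s or rsync’s code: rsync pairs a rolling weak checksum with a strong hash so it can find matching data at any offset, whereas this sketch only compares blocks at fixed offsets. The 64 KiB block size and the function names `block_signature` and `changed_blocks` are hypothetical, chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen for illustration only


def block_signature(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_sig: list[str], new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new_data` whose digest differs from `old_sig`.

    Only in-place edits are handled: inserting or deleting bytes shifts every
    later block, which is the case rsync's rolling checksum exists to cope with.
    Blocks beyond the end of the old file count as changed.
    """
    new_sig = block_signature(new_data, block_size)
    return [i for i, digest in enumerate(new_sig)
            if i >= len(old_sig) or old_sig[i] != digest]


if __name__ == "__main__":
    old = bytes(1024 * 1024)   # 1 MiB of zeros, i.e. 16 blocks of 64 KiB
    new = bytearray(old)
    new[200_000] = 1           # byte 200,000 falls in block index 3
    sig = block_signature(old)
    print(len(sig), changed_blocks(sig, bytes(new)))
```

Running it prints `16 [3]`: only one block would need to be sent. The catch is that the `sig` list (or something like it) has to be stored for every file, or recomputed by reading the whole remote copy, and that extra state is exactly what does not fit the 1:1 file-to-object mapping.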

After some experimenting with running crypt on large files and looking for diffs with rsync, it seems that when using crypt, the encrypted file changes completely every time there is any change in the plaintext file, no matter how small (i.e. the file isn’t chunked before encryption, if I understand correctly).

I guess that makes the whole idea of binary diffs out of the question when using crypt, even if the underlying storage mechanism supports partial uploads / changing arbitrary bytes in a stored file?

^^^ Edit: I’ve actually realised none of this is accurate, because my testing with rsync was mostly local, and rsync doesn’t use delta transfers for local transfers even if you tell it to.

^^^ Edit 2: After some more testing against a remote machine, my original suspicion holds: when using crypt, rsync copies the entire file on any change rather than sending a delta. So I guess binary diff support while using crypt is out of the question. I still think it would be possible for rclone to support binary diffs for SFTP when not using crypt, but it’s such a niche feature that I don’t imagine it is a high priority.

I’m going to bump this one time in the hope of learning more about how crypt works.

Specifically, say that we set up a crypt remote, then pass it a very large file - let’s say 10GiB.

Then let’s say we make a tiny change to that file - perhaps we alter one byte.

Then we pass the changed file to the crypt remote with a new name, so that the remote has two encrypted copies of the file, one before the change and one after the change.

Will those two encrypted files be similar in any way? Would we expect most chunks in those files to be identical, with only one chunk being altered, or would we expect the files to have no similarity at all?

Thanks!

rclone encrypts in blocks of 64k, so in theory only one 64k block would need to change.

However, rclone must change the nonce (an encryption parameter) for each upload, because it is insecure not to. So in practice you’ll find the whole file is completely different.
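The two points above (64k blocks, but a fresh nonce per upload) can be illustrated with a toy model. This is a sketch under loud assumptions: the SHA-256 counter-mode keystream below is a stand-in for the XSalsa20-Poly1305 (NaCl secretbox) cipher rclone crypt actually uses, it is not secure, and it leaves out the file header and per-chunk authentication tags. What it does mirror is the structure: plaintext split into 64 KiB chunks, each encrypted under a nonce derived from one per-file nonce by incrementing it per chunk. The names `toy_encrypt` and `differing_chunks` are hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # 64 KiB plaintext chunks, as described in the reply above
NONCE_SIZE = 24         # assumed: the 24-byte nonce size of NaCl secretbox


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT a real or secure cipher;
    it only stands in for XSalsa20 so the example needs no third-party library."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, base_nonce: bytes, plaintext: bytes) -> list[bytes]:
    """Encrypt `plaintext` chunk by chunk, incrementing the nonce for each chunk."""
    base = int.from_bytes(base_nonce, "little")
    chunks = []
    for index, start in enumerate(range(0, len(plaintext), CHUNK_SIZE)):
        piece = plaintext[start:start + CHUNK_SIZE]
        chunk_nonce = ((base + index) % (1 << (8 * NONCE_SIZE))).to_bytes(
            NONCE_SIZE, "little")
        keystream = _keystream(key, chunk_nonce, len(piece))
        chunks.append(bytes(p ^ k for p, k in zip(piece, keystream)))
    return chunks


def differing_chunks(a: list[bytes], b: list[bytes]) -> int:
    """Count chunk positions where two ciphertexts differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    key = os.urandom(32)
    original = bytes(4 * CHUNK_SIZE)   # four chunks of zeros
    edited = bytearray(original)
    edited[100_000] ^= 0xFF            # flip one byte in chunk index 1
    edited = bytes(edited)

    nonce = os.urandom(NONCE_SIZE)
    first_upload = toy_encrypt(key, nonce, original)
    same_nonce = differing_chunks(first_upload, toy_encrypt(key, nonce, edited))
    fresh_nonce = differing_chunks(
        first_upload, toy_encrypt(key, os.urandom(NONCE_SIZE), edited))
    print("same nonce: ", same_nonce, "of 4 chunks differ")
    print("fresh nonce:", fresh_nonce, "of 4 chunks differ")
```

Reusing the nonce leaves exactly one of the four chunks different, which is the “in theory” case; a fresh random nonce makes all four differ (with overwhelming probability), which is what you see in practice. Reusing a nonce with the same key is what makes a stream cipher insecure, hence the need for a new one on every upload.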

Ah! That clears it up. Thanks very much for helping me understand.

(edit: also, thank you for all your work on rclone, I find it extremely useful!)
