Does (or can) rclone run on both client AND server?

I have two Linux machines on a LAN. One (the "client") has an array I want to clone to an array on the other machine (the "server" and actually my local mirror). Do I run rclone as a daemon on the server?

It seems like that may not be a thing, as otherwise there would be options to connect to a listener by address:port or something.

With an rclone server daemon for the client to talk to directly, both sides could do the hashing/comparison at the same time, rather than the client having to do a full read of (in this case) 7 TB of server data to check hashes. (If that's how it works.) The closest back-end options I see are SFTP and local. (And "local" could presumably be a Samba mount. But either way would still seem to imply pulling all 7 TB of server data over to the client for hashing... again, if that's the way rclone copy works...)

As an example of the client/server model in this context: rsync. The client talks directly to a server daemon over ssh to compare hashes, when it's working in that mode.


What is your rclone version (output from rclone version)


Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18.04 x64

Which cloud storage system are you using? (eg Google Drive)

Local Linux server, via ssh/sftp

You can use rclone serve, and there are a few backends you can serve out, listed here:

rclone can work exactly like that using the sftp backend.

You can also run an rclone server (rclone serve sftp/http/webdav/ftp), which you can configure to talk to the appropriate backend.

I'd start by using native ssh and the sftp backend and see how you get on with that.
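For reference, a minimal sketch of what that setup might look like. This is a config fragment, not a drop-in recipe: the remote name (`server`), hostname, user, key path, and array paths are all invented for illustration.

```ini
# ~/.config/rclone/rclone.conf -- hypothetical sftp remote
[server]
type = sftp
host = server.lan
user = backup
key_file = ~/.ssh/id_rsa
```

Then a sync would look something like `rclone sync /mnt/array server:/mnt/array --checksum` (paths again hypothetical).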

Thanks! That's interesting about rclone as a server.

Follow-up question: When rclone talks to an sftp (or http|webdav|ftp) backend, is that backend doing the hashing of file contents (in order to catch the case of renamed folders without having to copy the entire folder contents over again), or does the client still have to pull the entire target content over (either all up front, or as it goes per-file) in order to calculate hashes? I'm not aware of any way that, say, the gnu sftp server has any knowledge of file hashes, or any way to scan for and generate them server-side.

Or is rclone even designed to do that? ("Don't transfer the full contents of a folder that has only changed names on the source side; instead, intelligently compare checksums generated in parallel on both client and server.") It's unclear what the --track-renames flag does. The docs say it handles "renames", but renames of what? Only files? Or also folders high in the tree, with potentially many TB of data underneath? Does it also handle file and folder moves? (Which isn't necessarily the same logic as handling renames.)

Of course, I could set up multiple test scenarios to answer it, but that could easily be way more time-consuming than reading the entire rclone website! :slight_smile:

And if not (server doesn't perform checksumming in parallel with client), then the same question for the case where rclone is the backend... Is the server presumably smarter in that regard, and able to calculate checksums independently of the client?

Rsync does exactly that, for example, with the following options:

rsync -a --checksum --delete-after /CLIENT/ hostname:/SERVER/

That creates an exact, bit-for-bit, checksum-verified mirror of CLIENT on SERVER. It's very efficient because the client and server both calculate checksums independently, so only new & changed bits need to be transferred. But that completely falls apart if, for example, a CLIENT folder that's high in the tree gets renamed. That could easily (and in my case frequently would) result in multiple TBs having to be copied over as "new", when there may actually be zero bits of changed file content, other than a single renamed folder. (Rsync has some options that make it a tiny bit smarter about recognizing renames, but only in very narrow conditions that don't cover basic things like a high-level folder rename/move. There are also some clever scripting solutions that involve creating a full mirror of hardlinks in a hidden folder on both client and server, which is also not appropriate for my use case.)

What I'm trying to accomplish is exactly what the rsync "mirror" command above does, but without copying over potentially many TBs of "new" data just because a folder gets renamed or moved. (That would turn what should be an operation spanning seconds into days or weeks, and would also run out of server space before finishing, for example if many TBs of "new" [but actually redundant] files are copied over to a server that has a target of roughly constant 75% capacity. And if the ZFS dedup property is turned on to mitigate the intermediate space problem, the copy may now take literally months, at single-digit MB/s write speed.)
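If I'm reading the docs right, the rclone equivalent of that rsync invocation would be something along these lines. This is a command fragment with a placeholder remote name and placeholder paths, assuming an sftp remote called `server` is already configured:

```
rclone sync /CLIENT/ server:/SERVER/ --checksum --track-renames
```

Note that rclone sync deletes extraneous destination files by default (comparable to rsync's --delete), and --track-renames requires a hash type common to both sides, which the sftp backend provides.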


Yes, the hashing is controlled by the backend. In the case of the sftp backend, rclone will call md5sum over ssh to calculate the hash.
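So, conceptually, the server-side hashing amounts to something like the following command fragment (hostname and path invented for illustration), with the server doing the reading and rclone only receiving the resulting hash:

```
ssh server.lan md5sum /mnt/array/path/to/file
```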

I hope I answered your questions about --track-renames in the other thread so I won't repeat myself here.

rclone handles renames of files and moving of files into new directories with --track-renames. Directories are created as necessary in the process.
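To make that concrete, here's a toy sketch of how hash-based rename tracking can work in principle. This is only an illustration of the idea, not rclone's actual implementation; the function and variable names are made up.

```python
def plan_sync(src, dst):
    """Toy rename detection. src and dst map path -> content hash.

    A file whose hash already exists on the destination under a
    different path is planned as a server-side move (no data
    transfer); everything else is copied.
    Returns (moves, copies, deletes).
    """
    dst_by_hash = {h: p for p, h in dst.items()}
    moves, copies = [], []
    for path, h in src.items():
        if dst.get(path) == h:
            continue  # unchanged: same path, same hash
        if h in dst_by_hash and dst_by_hash[h] not in src:
            moves.append((dst_by_hash[h], path))  # rename/move detected
        else:
            copies.append(path)  # genuinely new or changed content
    moved_from = {old for old, _ in moves}
    deletes = [p for p in dst if p not in src and p not in moved_from]
    return moves, copies, deletes
```

With this model, renaming a high-level folder, e.g. `plan_sync({"new/big.bin": "h1"}, {"old/big.bin": "h1"})`, plans one move and zero copies, which is exactly the behavior wanted here: no data re-transfer for a pure rename.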


Bingo - the magic sauce I was missing! Thank you.

I'm guessing you take advantage of SSH multiplexing/controlmaster to accomplish that? (Not that that really matters. I'm just happy to know it's being done on the server, somehow some way!)
