Thanks for the reply.
I've read literally every line of the documentation, every page of the website and GitHub, and hours of forum posts. I did see the reference to the --track-renames
flag, but it's very unclear from the documentation (and other references):
- ...whether it also "tracks" moves (within the same tree being rclone'd)
- ...how it "tracks" renames (and/or moves); e.g. does it use some kind of local watcher (seems unlikely)? Or does it compare filesize + [content checksum] on both client and server?
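For concreteness, the invocation I'd expect to test looks something like the following. I'm assuming (please correct me if wrong) that --track-renames composes with --checksum; the --dry-run/-v flags are just so I can see whether rclone plans server-side moves or full re-transfers before committing to anything, and the remote name is a placeholder:

rclone sync --checksum --track-renames --dry-run -v /CLIENT/ server:/SERVER/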
What I'm trying to accomplish is an exact mirror of the local "client" to a remote (LAN) "server", exactly as would be accomplished by:
rsync -a --checksum --delete-after /CLIENT/ hostname:/SERVER/
What that does:
- My primary objective:
- Makes a bit-for-bit mirror of source, on dest.
- Nice bonuses, but not the primary goal:
- Does so in an incremental, differential way, only transferring bits that are new.
- Huge limitation of rsync (though not really its purpose anyway, in spite of such a feature being requested constantly), and the reason I'm investigating a few options, most promisingly rclone:
- Do not transfer "new" bits just because a file within the source and target tree changed names or moved to a different folder. Otherwise a routine "mirror", which normally takes seconds to minutes, can suddenly consume several days, just because the user was unfortunate enough to rename a folder high in the tree with many TBs of data underneath. (A sketch of the hoped-for behavior follows this list.)
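Here is a minimal sketch of that hoped-for behavior, using flags that do exist in rclone (whether --track-renames actually behaves this way is exactly my question above; the remote name is a placeholder):

# initial mirror
rclone sync --checksum /CLIENT/ server:/SERVER/

# then: locally rename /CLIENT/projects/ to /CLIENT/archive/ (many TBs underneath)

# hoped-for result: rclone issues server-side moves, near-zero bytes over the wire
rclone sync --checksum --track-renames /CLIENT/ server:/SERVER/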
Rsync accomplishes this by connecting a local client invocation of rsync directly to an rsync process on the specified server, via SSH. With those options in place, the client side scans its file contents and generates hashes, while the server side does the same. If filesizes are the same, rsync looks at the hashes to see whether the files are bit-for-bit identical, or what [potentially small] parts [of a potentially huge file] it needs to send. The entire contents of the target never have to be pulled over the network just so the client can generate hashes from them. In that way (assuming no file/folder moves/renames), it's extremely efficient. (It could be made more efficient with, say, caching of checksums, sizes, and mtimes to local DBs on both sides, and/or caching via xattrs as some other Linux file utilities do [e.g. rmlint]; but still, it's a far cry better than blindly re-copying everything each time, scanning server files from the client first, or only comparing timestamp and filesize.)
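Incidentally, a cheap way to watch rsync make those decisions without moving any data is a dry-run itemization (all standard rsync flags):

rsync -a --checksum --delete-after --dry-run --itemize-changes /CLIENT/ hostname:/SERVER/

Each output line flags why a path would be transferred (checksum, size, timestamp, etc.), which makes it easy to confirm that an untouched tree transfers nothing.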
Rsync can actually do what I'm looking for, with appropriate flags, but only under extremely narrow conditions that rarely happen in real life. (At least in my [real?] life.) It can't handle, for example, a simple rename of a folder high in the tree (see the sketch below).
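For reference, the closest I've gotten with stock rsync is --fuzzy, which only hunts for a basis file within the same destination directory (and, per the man page, wants --delete-after so potential fuzzy-match files aren't deleted before they can be used). So it can survive a file renamed in place, but not a directory renamed above it:

rsync -a --checksum --fuzzy --delete-after /CLIENT/ hostname:/SERVER/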
Most backup products (e.g. Duplicacy, Borg, CrashPlan, Carbonite, etc.) handle that last point (hash-based comparison), but they are "chunk-based"/checksumming solutions that usually don't produce directly browseable targets. There are also some other "sync"-oriented products that satisfy that last bullet point (tolerant of file & folder renames & moves, based on size & content hashes) but are low on my list for other reasons: Syncthing and Resilio Connect (primarily two-way, torrent-style sync), FreeFileSync (GUI), etc.
Btrfs and ZFS can also both accomplish this, at least as an end result. But they are not appropriate solutions in this case (in spite of the server being ZFS) because:
- Either filesystem on the server would require the entire "new" contents (of a simple folder rename) to be transferred before deduplicating server-side. (Again, potentially consuming days for what might be zero bytes of actual changed data besides a folder rename.) This alone is reason enough to immediately discount them from the running; the next two points are just ancillary. (If both sides were ZFS, a zfs send command could do the trick...but ironically, without any checksum verification between sender and receiver, a hiccup in the middle kills the whole thing; and anyway, my sources aren't ZFS. See the sketch after this list.)
- Btrfs requires a third-party offline deduper (such as rmlint) to be run after the fact. But if the renamed folder was high in the tree, you may already have run out of disk space during the rsync. (And when syncing multiple TBs on a home IT budget, you very likely have.)
- ZFS requires the native inline dedup property to be turned on, which drops write speeds to single-digit MB/s once the target has many TBs' worth of block hashes to grind through for every new write, and is well understood to require more RAM for caching hashes than all but enterprise users with unlimited budgets can afford. (64GB of RAM for 7TB of data isn't remotely enough. The oft-documented "1GB of RAM for each TB of data" is well known to be grossly miscalculated, and it's well understood why.)
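For completeness, this is the ZFS-to-ZFS path I'm ruling out: an incremental snapshot stream, which would make a high-in-the-tree rename nearly free, if only my sources were ZFS. (Standard zfs commands; pool/dataset/snapshot names are placeholders.)

# one-time: snapshot and full send
zfs snapshot tank/data@sync1
zfs send tank/data@sync1 | ssh hostname zfs receive -F backup/data

# thereafter: send only the blocks that changed between snapshots
zfs snapshot tank/data@sync2
zfs send -i tank/data@sync1 tank/data@sync2 | ssh hostname zfs receive backup/data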
My server, as mentioned, is another local Linux box on the LAN. The connection method is whatever would work best for doing this: presumably SFTP, or, as ncw mentioned in another post, rclone itself via rclone serve sftp/http/webdav/ftp. I imagine I may have to try multiple options; a sketch of the rclone serve variant is below.
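If it helps to make that last option concrete, here's roughly what I'd try first (real rclone commands, but the port, user, credentials, and remote name are placeholder assumptions on my part):

# on the server: expose /SERVER/ over SFTP using rclone itself
rclone serve sftp /SERVER/ --addr :2022 --user syncuser --pass somepass

# on the client: after configuring an sftp remote (here named serversftp:) pointed at hostname:2022
rclone sync --checksum --track-renames /CLIENT/ serversftp:/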