Thanks for the reply.
I've read literally every line of the documentation, every page of the website and GitHub, and hours of forum posts. I did see the reference to the --track-renames
flag, but it's very unclear from the documentation (and other references):
- ...whether it also "tracks" moves (within the same tree being rclone'd)
- ...how it "tracks" renames (and/or moves); e.g. does it use some kind of local watcher (seems unlikely)? Or does it compare filesize + [content checksum] on both client and server?
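For concreteness, the invocation I'd expect to test looks something like the following. I'm assuming (please correct me if wrong) that --track-renames composes with --checksum; the --dry-run/-v flags are just so I can see whether rclone plans server-side moves or full re-transfers before committing to anything, and the remote name is a placeholder:

rclone sync --checksum --track-renames --dry-run -v /CLIENT/ server:/SERVER/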
What I'm trying to accomplish is an exact mirror of the local "client" to a remote (LAN) "server", exactly as would be accomplished by:
rsync -a --checksum --delete-after /CLIENT/ hostname:/SERVER/
What that does:
- My primary objective:
- Makes a bit-for-bit mirror of source, on dest.
- Nice bonuses, but not the primary goal:
- Does so in an incremental, differential way, only transferring bits that are new.
- Huge limitation of rsync (though not really its purpose anyway, in spite of such a feature being requested constantly), and the reason I'm investigating a few options, most promisingly rclone:
- Do not transfer "new" bits just because a file within the source and target tree changed names or moved to a different folder. Otherwise a routine "mirror", which normally takes seconds to minutes, can suddenly consume several days, just because the user was unfortunate enough to rename a folder high in the tree with many TBs of data underneath. (A sketch of the hoped-for behavior follows this list.)
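Here is a minimal sketch of that hoped-for behavior, using flags that do exist in rclone (whether --track-renames actually behaves this way is exactly my question above; the remote name is a placeholder):

# initial mirror
rclone sync --checksum /CLIENT/ server:/SERVER/

# then: locally rename /CLIENT/projects/ to /CLIENT/archive/ (many TBs underneath)

# hoped-for result: rclone issues server-side moves, near-zero bytes over the wire
rclone sync --checksum --track-renames /CLIENT/ server:/SERVER/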
Rsync accomplishes this by connecting a local client invocation of rsync directly to an rsync process on the specified server, via SSH. With those options in place, the client side scans its file contents and generates hashes, while the server side does the same. If filesizes are the same, rsync looks at the hashes to see whether the files are bit-for-bit identical, or what [potentially small] parts [of a potentially huge file] it needs to send. The entire contents of the target never have to be pulled over the network just so the client can generate hashes from them. In that way (assuming no file/folder moves/renames), it's extremely efficient. (It could be made more efficient with, say, caching of checksums, sizes, and mtimes to local DBs on both sides, and/or caching via xattrs as some other Linux file utilities do [e.g. rmlint]; but still, it's a far cry better than blindly re-copying everything each time, scanning server files from the client first, or only comparing timestamp and filesize.)
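Incidentally, a cheap way to watch rsync make those decisions without moving any data is a dry-run itemization (all standard rsync flags):

rsync -a --checksum --delete-after --dry-run --itemize-changes /CLIENT/ hostname:/SERVER/

Each output line flags why a path would be transferred (checksum, size, timestamp, etc.), which makes it easy to confirm that an untouched tree transfers nothing.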
Rsync can actually do what I'm looking for, with appropriate flags, but only under extremely narrow conditions that rarely happen in real life. (At least in my [real?] life.) It can't handle, for example, a simple rename of a folder high in the tree (see the sketch below).
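For reference, the closest I've gotten with stock rsync is --fuzzy, which only hunts for a basis file within the same destination directory (and, per the man page, wants --delete-after so potential fuzzy-match files aren't deleted before they can be used). So it can survive a file renamed in place, but not a directory renamed above it:

rsync -a --checksum --fuzzy --delete-after /CLIENT/ hostname:/SERVER/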
Most backup products (e.g. Duplicacy, Borg, CrashPlan, Carbonite, etc.) handle that last point (hash-based comparison), but they are "chunk-based"/checksumming solutions that usually don't produce directly browseable targets. There are also some other "sync"-oriented products that satisfy that last bullet point (tolerant of file & folder renames & moves, based on size & content hashes) but are low on my list for other reasons: Syncthing and Resilio Connect (primarily two-way, torrent-style sync), FreeFileSync (GUI), etc.
Btrfs and ZFS can also both accomplish this, at least as an end result. But they are not appropriate solutions in this case (in spite of the server being ZFS) because:
- Either filesystem on the server would require the entire "new" contents (of a simple folder rename) to be transferred before deduplicating server-side. (Again, potentially consuming days for what might be zero bytes of actual changed data besides a folder rename.) This alone is reason enough to immediately discount them from the running; the next two points are just ancillary. (If both sides were ZFS, a zfs send command could do the trick...but ironically, without any checksum verification between sender and receiver, a hiccup in the middle kills the whole thing; and anyway, my sources aren't ZFS. See the sketch after this list.)
- Btrfs requires a third-party offline deduper (such as rmlint) to be run after the fact. But if the renamed folder was high in the tree, you may already have run out of disk space during the rsync. (And when syncing multiple TBs on a home IT budget, you very likely have.)
- ZFS requires the native inline dedup property to be turned on, which drops write speeds to single-digit MB/s once the target has many TBs' worth of block hashes to grind through for every new write, and is well understood to require more RAM for caching hashes than all but enterprise users with unlimited budgets can afford. (64GB of RAM for 7TB of data isn't remotely enough. The oft-documented "1GB of RAM for each TB of data" is well known to be grossly miscalculated, and it's well understood why.)
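For completeness, this is the ZFS-to-ZFS path I'm ruling out: an incremental snapshot stream, which would make a high-in-the-tree rename nearly free, if only my sources were ZFS. (Standard zfs commands; pool/dataset/snapshot names are placeholders.)

# one-time: snapshot and full send
zfs snapshot tank/data@sync1
zfs send tank/data@sync1 | ssh hostname zfs receive -F backup/data

# thereafter: send only the blocks that changed between snapshots
zfs snapshot tank/data@sync2
zfs send -i tank/data@sync1 tank/data@sync2 | ssh hostname zfs receive backup/data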
My server, as mentioned, is another local Linux box on the LAN. The connection method is whatever would work best for doing this: presumably SFTP, or, as ncw mentioned in another post, rclone itself via rclone serve sftp/http/webdav/ftp. I imagine I may have to try multiple options; a sketch of the rclone serve variant is below.
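If it helps to make that last option concrete, here's roughly what I'd try first (real rclone commands, but the port, user, credentials, and remote name are placeholder assumptions on my part):

# on the server: expose /SERVER/ over SFTP using rclone itself
rclone serve sftp /SERVER/ --addr :2022 --user syncuser --pass somepass

# on the client: after configuring an sftp remote (here named serversftp:) pointed at hostname:2022
rclone sync --checksum --track-renames /CLIENT/ serversftp:/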