I'd like to request that rclone check's --download flag be made available to the checksum validation process for rclone copy and rclone sync.
The basic problem I'm trying to solve is a multi-stage synchronization with an SFTP server in the middle that does not permit shell logins (so no checksum support). Something like source --> server --> many remote nodes. The remote nodes use date + size to identify changed files.
Unfortunately, the source in my case is a CI system that creates fresh copies of the various files, consequently with different created/last-modified times even when the contents are identical to the previous ones. File size alone isn't enough to ensure that the contents haven't changed. We don't want to blindly re-upload, though, since that would change the modification time on the server and cause all of the many remote nodes to perform unnecessary copies.
Fortunately for me, the source --> server link is on a fast network, so we can tolerate the cost of running rclone check --download to identify changed files without server-side hashing support. Unfortunately, I end up having to run a multi-stage process: rclone check, followed by separate rclone copy and rclone delete commands with explicit file lists. It would be very convenient if the comparison operations performed by rclone sync could use all the features of rclone check.
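For reference, the multi-stage workaround looks roughly like this (remote and path names are placeholders for my setup, not anything rclone defines):

```shell
#!/bin/sh
set -eu

# 1. Compare by downloading, since the SFTP server can't hash.
#    --differ / --missing-on-dst / --missing-on-src write out file lists.
rclone check ./build-output/ sftp-server:deploy/ --download \
    --differ differ.txt \
    --missing-on-dst missing-dst.txt \
    --missing-on-src missing-src.txt || true  # check exits non-zero on differences

# 2. Upload only files that actually changed or are absent on the server.
cat differ.txt missing-dst.txt > to-copy.txt
rclone copy ./build-output/ sftp-server:deploy/ --files-from to-copy.txt

# 3. Remove files that no longer exist in the source.
rclone delete sftp-server:deploy/ --files-from missing-src.txt
```

The feature request is essentially to collapse these three steps into a single rclone sync that can fall back to --download-style comparison.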
Thanks for the suggestion! I really appreciate that folks from the community step up with deeper knowledge of the tool.
For my use case, I think the hasher overlay wouldn't help much. rclone check can skip the hash check when sizes differ, whereas synchronizing the hasher overlay with rclone hashsum would end up downloading and hashing everything anyway. From my read of the documentation, it doesn't look like the hasher overlay will transparently download to check hashes during copy or sync if they aren't available in the database -- but maybe I'm wrong about that? Alas, I cannot rely on a persistent local cache, as my use case is for CI scripts that could be picked up and run on any random host in a fleet, and I also can't necessarily trust that the remote end hasn't been inadvertently mucked up by someone else (at least not yet).
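For anyone following along, this is the kind of hasher overlay configuration I was evaluating (remote names are placeholders; the type, remote, hashes, and max_age options come from the hasher backend docs):

```ini
# rclone.conf sketch: hasher wrapped around the shell-less SFTP server
[sftp-server]
type = sftp
host = example.internal
user = deploy

[hashed-server]
type = hasher
remote = sftp-server:
hashes = md5
# Cached sums expire after this interval; the cache itself lives in a
# local database, which is exactly the part that doesn't survive on
# ephemeral CI hosts.
max_age = 24h
```

The local hash database behind [hashed-server] is what makes this a poor fit for a fleet of interchangeable CI runners.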
Thank you! Maybe the hasher documentation could be updated to clarify this? As I read it, the "Other operations" section implied that the hash database would only be updated if a full transfer was explicitly requested, particularly since the hashsum command description explicitly documents how it uses auto_size, but auto_size is not discussed for the other operations. Was the "Other operations" section meant to be taken as a superset of the hashsum behaviour?