I'd like to request that rclone check's --download flag be made available to the checksum validation process for rclone copy and rclone sync.
The basic problem I'm trying to solve is a multi-stage synchronization with an SFTP server in the middle that does not permit shell logins (so no checksum support). Something like source --> server --> many remote nodes. The remote nodes use date+size to identify changed files.
Unfortunately, the source in my case is a CI system that creates fresh copies of the various files, consequently with different created/last-modified times even when the contents are identical to the previous ones. File size alone isn't enough to ensure that the contents haven't changed. We don't want to just blindly re-upload, though, since that would change the modification time on the server and cause all of the many remote nodes to perform unnecessary copies.
Fortunately for me, the source-->server link is on a fast network, so we can tolerate the cost of running rclone check --download to identify changed files without server-side hashing support. Unfortunately, I end up having to run a multi-stage process of rclone check followed by separate rclone delete commands with explicit file lists. It would be very convenient if the comparison operations performed by rclone sync could use all the features of rclone check.
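For illustration, the multi-stage workaround looks roughly like this; the paths and list filenames are placeholders, but the output-list flags are standard rclone check options:

# Compare by downloading, since the SFTP server can't hash server-side,
# and write out lists of differing, missing, and extraneous files
rclone check /build/output SFTP_remote:deploy --download \
  --differ /tmp/differ.txt --missing-on-dst /tmp/missing.txt \
  --missing-on-src /tmp/extra.txt

# Re-upload only what actually changed or is absent
rclone copy /build/output SFTP_remote:deploy --files-from /tmp/differ.txt
rclone copy /build/output SFTP_remote:deploy --files-from /tmp/missing.txt

# Remove files that no longer exist in the source
rclone delete SFTP_remote:deploy --files-from /tmp/extra.txt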
Until such functionality exists, you could make your life much easier by utilising the hasher overlay.
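Roughly, that overlay would look something like this in rclone.conf; the name hasher_sftp is just a placeholder wrapping your existing SFTP remote:

[hasher_sftp]
type = hasher
remote = SFTP_remote:
# md5 is an assumption here; use whichever hash type you standardise on
hashes = md5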
Thanks for the suggestion! I really appreciate that folks from the community step up with deeper knowledge of the tool.
For my use case, I think the hasher overlay wouldn't help much.
rclone check can skip the hash check when sizes differ, whereas synchronizing the hasher overlay with rclone hashsum would end up downloading and hashing everything anyway. From my read of the documentation, it doesn't look like the hasher overlay will transparently download to check hashes during sync if they aren't available in the database -- but maybe I'm wrong about that? Alas, I cannot rely on a persistent local cache, as my use case is for CI scripts which could be picked up and run on any random host in a fleet, and I also can't necessarily trust that the remote end hasn't been inadvertently mucked up by someone else (at least not yet).
You could also use the chunker overlay in a somewhat creative way to store files' hashes together with the files on your remote:
type = chunker
remote = SFTP_remote:
and interact with your SFTP server only using the chunker remote.
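To make that concrete, the section also needs a name and a hash_type; chunker_remote and md5all below are my assumptions (md5all makes chunker store an MD5 for every file, even single-chunk ones):

[chunker_remote]
type = chunker
remote = SFTP_remote:
# md5all records an MD5 alongside every file, chunked or not
hash_type = md5all

# Then point transfers at the chunker remote instead of SFTP_remote: directly
rclone sync /build/output chunker_remote:deploy --checksum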
It can. The trick is to set --hasher-auto-size to a very large value -- larger than your largest file.
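In config form that's the auto_size option on the hasher remote; 10T here is an arbitrary value chosen to exceed anything we'd ship:

[hasher_sftp]
type = hasher
remote = SFTP_remote:
hashes = md5
# With auto_size set, hasher downloads and hashes any file up to this
# size on the fly when its database has no entry for it
auto_size = 10T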
Hasher can do this too, when using --checksum. You don't need to run hashsum first if you're using the auto_size trick.
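With that in place, the whole multi-stage pipeline collapses to a single command, something like this (paths are placeholders):

# Hashes are computed on demand (downloading when necessary), compared,
# and only mismatched or missing files are transferred
rclone sync /build/output hasher_sftp:deploy --checksum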
This is a very clever trick! Thanks for sharing.
Also, it may not be a fit for your use case, but the latest beta of bisync has this feature.
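If I'm reading the beta changelog right, the flag in question is --download-hash; a minimal sketch, with placeholder paths:

# Assumption: --download-hash makes bisync download and hash files when
# the remote cannot supply checksums itself
rclone bisync /build/output SFTP_remote:deploy --download-hash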
Thank you! Maybe the hasher documentation could be updated to clarify this? As I read it, the "Other operations" section implied that the hash database will only be updated if a full transfer was explicitly requested, particularly as the hashsum command description explicitly documents how it uses auto_size but this is not discussed for the other operations. Was the "Other operations" section meant to be taken as a superset of the hashsum behavior?
Documentation in open source projects... a never-ending story, and always open for improvements :) Feel free to contribute :)
I agree -- the documentation is misleading on this point. IMO, it is actually clearer in the code than the docs: