Creating a hashfile while copying

Hi everyone,

I'm wondering whether it's possible to output a hashfile during rclone copy that I can use to check file integrity later on.
My research so far suggests that I can only do this with a separate rclone hashsum run after the initial copy, but maybe I missed something.

Why do I want this?
I need to verify the copy process to be sure nothing got corrupted along the way. Other secure copy tools behave like this: copy -> dump the hashes into a hashfile during the copy -> do a second read of the destination files and compare them against the hashfile to make sure no bit has flipped on the way.
Doing it with a second sync run would be possible, but it requires a second read pass (rclone sync --checksum), which I'd like to avoid because time is crucial for my use case.
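The copy -> hashfile -> re-read workflow described above can be sketched in Python to show that hashing during the copy needs no extra read of the source. This is a minimal illustration under stated assumptions, not how rclone or any particular tool does it: `copy_with_md5` and `copy_tree_with_hashfile` are hypothetical names, MD5 is an arbitrary choice, and the hashfile uses the md5sum-style "hash, two spaces, path" layout.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def copy_with_md5(src: Path, dst: Path) -> str:
    """Copy src to dst and return the MD5 of the bytes read from src.

    Hashing happens in the same loop as the copy, so the source is read once.
    """
    h = hashlib.md5()
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK_SIZE):
            h.update(chunk)
            fout.write(chunk)
    return h.hexdigest()


def copy_tree_with_hashfile(src_root: Path, dst_root: Path, hashfile: Path) -> dict[str, str]:
    """Copy every regular file under src_root to dst_root and write an
    md5sum-style hashfile ("<hash>  <relative path>" per line)."""
    sums: dict[str, str] = {}
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root).as_posix()
        sums[rel] = copy_with_md5(src, dst_root / rel)
    with hashfile.open("w", encoding="utf-8") as f:
        for rel, digest in sums.items():
            f.write(f"{digest}  {rel}\n")
    return sums
```

The hashfile records what was read from the source, not what landed on the destination, so the later re-read of the destination compared against it is the step that actually catches a bad write.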

Hello and welcome to the forum,

--- rclone does that during the copy process.

--- there is also rclone check, rclone checksum, rclone lsf, etc.

--- there is a post in the forum about generating an md5sum file during rclone copy;
it uses a simple bash script.

You need to enable some logging to see the hashes.

felix@gemini:~$ rclone copy /etc/hosts GD: -vv
2022/05/27 12:07:36 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2022/05/27 12:07:36 DEBUG : rclone: Version "v1.58.1" starting with parameters ["rclone" "copy" "/etc/hosts" "GD:" "-vv"]
2022/05/27 12:07:36 DEBUG : Creating backend with remote "/etc/hosts"
2022/05/27 12:07:36 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2022/05/27 12:07:36 DEBUG : fs cache: adding new entry for parent of "/etc/hosts", "/etc"
2022/05/27 12:07:36 DEBUG : Creating backend with remote "GD:"
2022/05/27 12:07:36 DEBUG : GD: Loaded invalid token from config file - ignoring
2022/05/27 12:07:36 DEBUG : Saving config "token" in section "GD" of the config file
2022/05/27 12:07:36 DEBUG : Failed to keep previous owner of config file: chown /opt/rclone/rclone.conf500790128: operation not permitted
2022/05/27 12:07:36 DEBUG : GD: Saved new token in config file
2022/05/27 12:07:36 DEBUG : Google drive root '': 'root_folder_id = 0AGoj85v3xeadUk9PVA' - save this in the config to speed up startup
2022/05/27 12:07:37 DEBUG : hosts: Sizes differ (src 278 vs dst 236)
2022/05/27 12:07:37 DEBUG : hosts: md5 = 44f74a6bbe47bcbe3a18ef0893ee27dc OK
2022/05/27 12:07:37 INFO  : hosts: Copied (replaced existing)
2022/05/27 12:07:37 INFO  :
Transferred:   	        278 B / 278 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         1.3s
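The DEBUG line "hosts: md5 = ... OK" in this log is what a log-based approach can harvest. As a rough illustration (not the bash script mentioned earlier, which isn't reproduced here), this Python sketch turns such lines into md5sum-style entries. `hashfile_from_log` is a hypothetical name, and the regex is an assumption built from this one v1.58.1 log; other rclone versions or backends may word the line differently, and files for which no hash check was logged simply won't appear.

```python
import re

# Matches rclone -vv lines such as
#   2022/05/27 12:07:37 DEBUG : hosts: md5 = 44f74a6bbe47bcbe3a18ef0893ee27dc OK
# The wording is taken from the log above (rclone v1.58.1) and is an
# assumption for any other version.
MD5_OK = re.compile(r"DEBUG : (?P<name>.+): md5 = (?P<md5>[0-9a-f]{32}) OK$")


def hashfile_from_log(lines):
    """Return md5sum-style lines ("<hash>  <name>") for every file whose
    md5 check was logged as OK. A later entry for the same name wins."""
    sums = {}
    for line in lines:
        m = MD5_OK.search(line.rstrip("\n"))
        if m:
            sums[m.group("name")] = m.group("md5")
    return [f"{md5}  {name}" for name, md5 in sums.items()]
```

One way to use it would be to run the copy with -vv --log-file copy.log and then pass the lines of copy.log to `hashfile_from_log`, writing the result to a file.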

And you can always pull the md5sum:

felix@gemini:~$ rclone md5sum GD:hosts
44f74a6bbe47bcbe3a18ef0893ee27dc  hosts

I think you can use the hasher backend for this.

Thanks for your feedback. I will definitely have a look at hasher. The hashes in the log output have been on my mind, too. I guess the bash script @asdffdsa mentioned uses those. I'll try to find it and see if it fits my needs.

I have to admit I don't have code-level knowledge of rclone, so the following is purely speculative.
To my understanding, rclone uses checksums only to compare files that have the same mtime and size, to decide whether to overwrite them. Also, I'm using rclone for NFS/SMB/USB to NFS/SMB/USB/TB copies, so as far as rclone's mechanisms are concerned these are all local copies. Maybe cloud services have integrated hash-check mechanisms to deal with lossy WAN links and rclone uses them, but in my case they wouldn't apply.
I'd guess that for a fresh local copy, rclone triggers the write and trusts the filesystem/OS to do it properly. It would actually be great if there were a re-check of the bits actually written. If so, I'd be keen to learn more about how it works.

In my experience, drives, controllers, networks, filesystems, or just bad karma, and especially combinations of them, tend to fuck things up from time to time. By now I've seen quite a few bit flips with different copy mechanisms, some of them without any sign in any logging.
OK, maybe I'm a bit paranoid about it by now. Having a back-checking mechanism helps me sleep better =)

Checking is good 🙂

What most rclone users do is run rclone sync, then rclone check, or, if they're really paranoid, rclone check --download.

You can then use rclone md5sum to save a hash file, and later feed that file back into rclone md5sum -C to check the hashes, both locally and remotely.
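Conceptually, checking against a saved hashfile means re-reading each listed file and comparing digests. The Python sketch below illustrates that idea for local files; it is not rclone's implementation, `check_hashfile` and `md5_of` are hypothetical names, and it assumes the "hash, two spaces, path" format that rclone md5sum prints in this thread, with paths relative to `root`.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 without loading it all into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def check_hashfile(hashfile: Path, root: Path) -> dict[str, str]:
    """Return {relative path: problem} for entries that are missing or differ.

    An empty dict means every listed file re-read with the expected hash.
    """
    problems: dict[str, str] = {}
    for line in hashfile.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif md5_of(target) != expected.lower():
            problems[rel] = "differs"
    return problems
```

Running it against the destination is the "second read" step. Note that a re-read straight after the copy may be served from the OS page cache rather than from the disk itself.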

