Hash on the fly with rclone?

Every 5 years I copy ~15TB of data from one set of HDDs to a fresh, diversified set of newer HDDs. This is a source of extreme stress and anxiety, but I have developed a set of scripts to reduce that.

Now in 2021, it's time to do this again for my 20 TB of data on HDDs I purchased in 2015. I would like to use rclone for this.

What I will do is copy the files from the old HDDs (OLD) to these new HDDs (NEW). My question is:

  1. While rclone copies from OLD to NEW, is there any way I can also have it generate MD5 or SHA hashes and store those in a file (since it has already read the data from OLD)?
  2. The objective is to have a hash list generated on the fly by rclone, so that I don't have to do a second pass after the copy step just to generate a hash list.
  3. If this feature is not yet available, what would be involved in adding it for local-FS (local HDD) copies only? Would I have to write a plugin in Go, or would I have to modify rclone itself (i.e., does rclone not support external plugins)?

hi,

there might be another way, but this will work.

use a debug log, and for each file copied the output will be:
2021/10/19 15:27:40 DEBUG : file.txt: md5 = c4ca4238a0b923820dcc509a6f75849b OK
then iterate through the log file and regex out the needed info.
in python, `.*DEBUG : (.*): md5 = (.*) OK$` would match

file.txt
c4ca4238a0b923820dcc509a6f75849b
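A minimal Python sketch of that log-scraping approach (the debug-log format and regex are taken from the post above; the sample line is the one shown there):

```python
import re

# Matches rclone debug lines of the form:
# "2021/10/19 15:27:40 DEBUG : file.txt: md5 = c4ca4238a0b923820dcc509a6f75849b OK"
LINE_RE = re.compile(r".*DEBUG : (.*): md5 = (.*) OK$")

def extract_hashes(log_text):
    """Return a list of (filename, md5) pairs found in an rclone debug log."""
    pairs = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

if __name__ == "__main__":
    sample = "2021/10/19 15:27:40 DEBUG : file.txt: md5 = c4ca4238a0b923820dcc509a6f75849b OK"
    for name, digest in extract_hashes(sample):
        # md5sum-style "hash  filename" output, ready to save as a hash list
        print(f"{digest}  {name}")
```

Feed it the output of `rclone copy -vv ... 2> copy.log` and redirect the result to a file to get your hash list in one pass.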

@asdffdsa, really appreciate this! This is one way of getting the MD5 hash.

2 questions:

  1. Is there a way to obtain SHA instead (or in addition to the MD5 hash)?
  2. That MD5 hash is of the file read from the source, not of the file written out, right? That would make sense over any other way, but just confirming.

Again, this response is very useful.

  1. not that i know of. rclone copy from local to local uses md5.
    for sha1, use rclone sha1sum, which, as i understand it, just calls rclone hashsum sha1
  2. the hash is from the source.

I think the not-quite-merged-yet hasher backend would help with this. Its job is to cache hashes so they don't get re-computed unnecessarily.

@ivandeex will be along in a moment to explain more I'm sure :slight_smile:


After the refactoring, its main job is to bring free bolt to her majesty vfs :slight_smile:

Hash caching is just a coincidental byproduct.


I really appreciate this response. By the way, this is not really my request; this idea was proposed years ago:

@thestigma mentioned this at When does rclone compute a file hash? - #2 by thestigma

Any time you already have to read the entire file anyway though - calculating the hash can be done almost for free (such as for the upload of a file). The only extra cost is a fairly trivial amount of CPU, so I think rclone generally tends to do this to verify a successful transfer.

… but I didn't see any follow-ups on this matter.

I have also asked about this, for reference: https://www.reddit.com/r/rclone/comments/qbgd9x/hash_on_the_fly_with_rclone/
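To illustrate the "almost for free" point from the quote above, here is a minimal Python sketch (not rclone's actual code; function and path names are my own) that computes MD5 and SHA-1 during the same read pass as the copy itself:

```python
import hashlib

def copy_with_hashes(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file in chunks, hashing each chunk as it passes through.

    The data is read from disk exactly once; MD5 and SHA-1 are updated
    from the in-memory chunk, so the only extra cost is CPU.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

This is the same single-pass idea the feature request asks rclone to expose: the hash comes from the source bytes as they are read, so no second pass over the old disks is needed.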


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.