Write hash sums in metadata to new files which rclone can read

Hi,

What is the best way to calculate hash sums and write them into file metadata that rclone can read, so that it doesn't have to create temporary files when running, for example, rclone copy to a crypt volume on a provider such as Jottacloud, which requires the MD5 value?

My goal is to let rclone copy skip copying the file to a TMPDIR folder to calculate the MD5 before upload, which would reduce the CPU load. So I'm looking for a way to save the calculated hashes (both MD5 and SHA1) in the metadata of a new local copy of the file (the old file will be deleted), where rclone can easily read them for comparison.

The crypt remotes don't support a hash sum as it always changes once it's uploaded because of the encryption.

In short, you can't check the hash of an unencrypted file against a crypt remote.

@Animosity022, I'm thinking about the step before I upload to the encrypted drive. From https://rclone.org/jottacloud/#modified-time-and-hashes

Note that Jottacloud requires the MD5 hash before upload so if the source does not have an MD5 checksum then the file will be cached temporarily on disk (wherever the TMPDIR environment variable points to) before it is uploaded.

So I read this as: if rclone copy can read the MD5 hash before upload, it can skip caching the file temporarily on disk to calculate it. That's why I'm asking for a way to save the hash sum of every file I have in the file's metadata, where rclone can read it.

The only simple solution I know of is to use Extended File Attributes (xattr), with a simple command such as for file in *; do xattr -w filehash "$(md5 -q "$file")" "$file"; done or something similar.
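
A minimal Python sketch of the same idea, assuming a Linux file system with user extended attributes enabled. The attribute names user.hash.md5 and user.hash.sha1 are made up for illustration, and nothing in this thread indicates that rclone reads hashes from extended attributes, so this only records them for later use. Both digests are computed in a single pass, so the file is read once.

```python
import hashlib
import os


def file_hashes(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha1_hex) for path, reading the file only once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Feed each chunk to both digests so large files are streamed, not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def store_hashes_in_xattr(path):
    """Write both digests to hypothetical user.* attributes (os.setxattr is Linux only)."""
    md5_hex, sha1_hex = file_hashes(path)
    # Attribute names are illustrative assumptions, not something rclone defines.
    os.setxattr(path, "user.hash.md5", md5_hex.encode("ascii"))
    os.setxattr(path, "user.hash.sha1", sha1_hex.encode("ascii"))
    return md5_hex, sha1_hex
```

Calling store_hashes_in_xattr on each file would replace the shell loop; the values can be read back with os.getxattr.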

I can't quite follow why you'd want to save it as it provides no value on an encrypted remote since the file hash on the remote will always be different.

What's your flow that makes this valuable to keep somewhere else?

The usual use of the md5sum is to compare two copies of a file, as it's stored by the provider on the other side.

So if you have file A locally and try to compare it to file A on an encrypted Jottacloud remote, they will not match.

There are several reasons. The first is to prevent rclone from both copying and calculating MD5 values while uploading to the crypt. Since I'm using a low-powered Arm server, speeds are much lower when both tasks have to be done at the same time. If Jottacloud didn't require MD5 values prior to upload, I wouldn't have looked for a solution like this. Secondly, if I need to upload the file again later to either Jottacloud or another cloud service, I still have the original hash sum of the local files. So I would have the local MD5 hash value if any future implementation adds MD5 hash sums for encrypted data.

If you are using a crypt, you'd turn this off as it provides no value.

You'd probably want to file a feature request for an option to turn this off when you are encrypting.

When you are starting the upload, how do you know it's the same file unless you md5sum it now? You have to run an md5sum to validate that nothing has changed if you want to ensure it's the same file. If you aren't concerned about data consistency, don't use the md5sum at all and just use size/modtime.

So you are asking to add in data to the file or a metadata file to keep along with the file that somehow pairs up with the file? That is possible but definitely some overhead in setting it all up. Best bet is if you can think through conceptually how you want it to work, make a feature request on github and if someone has time or other folks find it valuable, someone would pick it up and work on it.
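
As a rough illustration of the sidecar idea and of the overhead involved (this is not something rclone does, as far as this thread establishes): a stored hash can only be trusted while the file's size and modification time still match what was recorded with it. The sidecar name (file name plus .md5.json) and the JSON layout below are hypothetical.

```python
import hashlib
import json
import os


def md5_of(path):
    """Stream the file through MD5 and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_md5(path):
    """Return (md5_hex, cache_hit), reusing the sidecar while size and mtime match."""
    sidecar = path + ".md5.json"  # hypothetical naming convention
    st = os.stat(path)
    stamp = {"size": st.st_size, "mtime_ns": st.st_mtime_ns}
    try:
        with open(sidecar) as f:
            rec = json.load(f)
        if rec.get("size") == stamp["size"] and rec.get("mtime_ns") == stamp["mtime_ns"]:
            return rec["md5"], True  # file looks unchanged, trust the stored hash
    except (OSError, ValueError, KeyError, AttributeError):
        pass  # missing or unreadable sidecar: fall through and rehash
    digest = md5_of(path)
    with open(sidecar, "w") as f:
        json.dump({**stamp, "md5": digest}, f)
    return digest, False
```

Note that this still relies on size/modtime to decide whether the stored hash is valid, so it gives no stronger guarantee than a size/modtime comparison unless the file is rehashed.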

Ideally I would do that, but I can't, since Jottacloud requires the MD5 value. I've been thinking about switching to another cloud provider that offers unlimited storage at a fair price. I highly doubt Jotta will change their practice if I make a request about this.

The first step is that I want to understand the earlier quote, which says that if the source does not have an MD5 checksum, it has to be calculated. How does rclone check whether the file has an MD5 value, so that I can add one? Is it only read from the metadata, or could I simply add a separate txt file with the sum, etc.?

It's not quite that simple as it depends on which cloud provider as not all providers use the same checksums:

https://rclone.org/overview/

If the source is a remote that doesn't store hashes (like a local file system), rclone calculates the hash and, depending on the remote, compares it with the destination's hash for validation. If the two sides have no hash type in common, it falls back to size/modtime/etc.
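
A simplified model of that decision, written as a Python sketch rather than taken from rclone's code. The function name choose_check and the hash-type sets are assumptions for illustration; the overview page linked above lists what each remote really supports.

```python
def choose_check(src_hashes, dst_hashes):
    """Pick a hash type both sides support, else fall back to size/modtime."""
    common = sorted(set(src_hashes) & set(dst_hashes))
    if common:
        # Any shared hash type can be used for validation; take the first by name.
        return "hash:" + common[0]
    return "size+modtime"


# Illustrative sets only: a local source can compute several hash types,
# a crypt remote exposes none.
local = {"md5", "sha1"}
md5_only_remote = {"md5"}
crypt_remote = set()
```

With these sets, choose_check(local, md5_only_remote) selects MD5, while choose_check(local, crypt_remote) falls back to size/modtime.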

The trick here is rclone is made to work on many different remotes and what they provide.

I wasn't suggesting you ask Jottacloud to change their policies. If you are using an encrypted remote anyway, the hash is useless, so rclone could possibly submit something else (garbage) for the md5sum instead of calculating it, as a different way to tackle the issue.

It would be odd if that worked, considering it's been two years since the issue was brought up, according to the (still) open issues. I can try to dig some more into this.


I replied in the other thread - I think this beta will fix the problem for you - it shouldn't require copying the file to TMPDIR any more

https://beta.rclone.org/branch/v1.51.0-154-ga283c655-fix-crypt-src-hash-beta/

Wonderful! :slight_smile:

One more thing. Does the MD5 hash have any value for, say, Jottacloud when the MD5 hash of the unencrypted file is sent and one uses crypt? The data received at the destination is encrypted and therefore has a different MD5 hash. Otherwise one could simply send a random MD5 value to skip calculating the MD5, unless --checksum is being used with crypt (which doesn't make sense with the current implementation).

Jottacloud checks the MD5 hash you send at the start against the MD5 of the data they receive to ensure data integrity.
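
A toy Python model of why a random value, or the MD5 of the unencrypted file, would not pass that check when crypt is in use. The XOR function is only a stand-in for encryption and has nothing to do with rclone's real crypt format; server_accepts is a hypothetical name for the integrity check described above.

```python
import hashlib


def toy_encrypt(data, key=0x5A):
    """Stand-in for crypt: XOR every byte with a key. NOT rclone's real encryption."""
    return bytes(b ^ key for b in data)


def server_accepts(declared_md5, received):
    """Model of the check: the declared hash must match the bytes actually received."""
    return hashlib.md5(received).hexdigest() == declared_md5


plaintext = b"example file contents"
ciphertext = toy_encrypt(plaintext)  # this is what the provider would receive

md5_plain = hashlib.md5(plaintext).hexdigest()
md5_cipher = hashlib.md5(ciphertext).hexdigest()
```

Here server_accepts(md5_cipher, ciphertext) is true and server_accepts(md5_plain, ciphertext) is false, so under this model the hash sent up front has to be the one of the encrypted data.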
