What is the best way to calculate hash sums and write them to file metadata that rclone can read, so that rclone copy does not create new (temporary) files when copying to a crypt volume on, for example, Jottacloud, which requires the MD5 value?
My goal is to make rclone copy skip copying the file to the TMPDIR folder to calculate the MD5 before upload, in order to reduce CPU load when using rclone. So I'm looking for a way to save the calculated hashes (both MD5 and SHA1) in the metadata of a new local copy of the file (the old file will be deleted), in a form rclone can read. That way rclone can simply read both the MD5 and SHA1 hashes from the file's metadata for comparison.
Note that Jottacloud requires the MD5 hash before upload so if the source does not have an MD5 checksum then the file will be cached temporarily on disk (wherever the TMPDIR environment variable points to) before it is uploaded.
So I interpret this to mean that if rclone copy can read the MD5 hash before upload, it can skip caching the file temporarily on disk to calculate the MD5 hash. That is why I'm asking for a way to save the hash sum of each of my files in the file's metadata, where rclone can read it.
The only simple solution I know of is to use Extended File Attributes (xattr) with a simple command such as for file in *; do xattr -w filehash "$(md5 -q "$file")" "$file"; done or something similar.
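As a rough illustration of the xattr idea on a Linux box, here is a minimal sketch. It assumes md5sum/sha1sum from coreutils and setfattr from the attr package (on macOS the equivalents would be md5 -q and xattr -w as in the one-liner above). The attribute names user.md5 and user.sha1 and the helper names are made up for this example, and nothing in this thread indicates that rclone reads such attributes, so this only records the hashes for your own use.

```shell
# Sketch: store MD5 and SHA1 of a file in extended attributes (Linux).
# Assumes md5sum/sha1sum (coreutils) and setfattr (attr package).
# user.md5 / user.sha1 are hypothetical names, not something rclone reads.

# Print the hex digest of a file using the given tool (md5sum or sha1sum).
# Reading from stdin keeps the file name out of the tool's output.
hex_digest() {
    local tool=$1 file=$2
    "$tool" < "$file" | cut -d ' ' -f 1
}

# Write both digests to the file's extended attributes.
tag_file() {
    local file=$1
    setfattr -n user.md5  -v "$(hex_digest md5sum  "$file")" -- "$file" &&
    setfattr -n user.sha1 -v "$(hex_digest sha1sum "$file")" -- "$file"
}

# Usage (tags every regular file under a directory):
# find /path/to/files -type f -print0 |
#     while IFS= read -r -d '' f; do tag_file "$f"; done
```

The file system has to support user extended attributes for tag_file to work, and the attributes are lost if the file is copied with a tool that does not preserve them.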
There are several reasons. The first is to prevent rclone from both copying and calculating MD5 values while uploading to the crypt. Since I'm using a low-powered Arm server, the speed is much lower when both tasks have to be done at the same time. Ideally, if Jottacloud did not require MD5 values prior to upload, I wouldn't have looked for a solution like this. Secondly, if I need to upload the file again later, to either Jottacloud or another cloud service, I still have the original hash sums of the local files. So I would have the local MD5 hash value if any future implementation includes the MD5 hash sum in the encrypted data.
If you are using a crypt, you'd turn this off as it provides no value.
You'd probably want to file a feature request for an option to turn this off if you are encrypting.
When you are starting the upload, how do you know it's the same file unless you md5sum it now? You have to run md5sum to validate that nothing has changed if you want to ensure it's the same file. If you aren't concerned about data consistency, don't use the md5sum at all and just use size/modtime.
So you are asking to add data to the file, or a metadata file kept alongside it, that somehow pairs up with the file? That is possible, but there is definitely some overhead in setting it all up. Your best bet is to think through conceptually how you want it to work and make a feature request on GitHub; if someone has time or other folks find it valuable, someone may pick it up and work on it.
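To make the "metadata file kept with the file" idea and its overhead a bit more concrete, here is a minimal sketch of a sidecar file that is only trusted while the file's size and modification time are unchanged. It assumes GNU coreutils (stat -c, md5sum, sha1sum); the .hashes suffix, the one-line format, and the function names are invented for this example and are not something rclone reads.

```shell
# Sketch: keep hashes in a sidecar file and only trust them while the
# file's size and modification time are unchanged.
# Assumes GNU coreutils; the ".hashes" suffix and format are hypothetical.

# Print "<size> <mtime>" for a file.
file_stamp() {
    stat -c '%s %Y' -- "$1"
}

# Write "<size> <mtime> <md5> <sha1>" to "<file>.hashes".
write_sidecar() {
    local file=$1 md5 sha1
    md5=$(md5sum < "$file" | cut -d ' ' -f 1)
    sha1=$(sha1sum < "$file" | cut -d ' ' -f 1)
    printf '%s %s %s\n' "$(file_stamp "$file")" "$md5" "$sha1" > "$file.hashes"
}

# Print the stored MD5 if size and mtime still match; fail otherwise.
stored_md5() {
    local file=$1 size mtime md5 sha1
    [ -f "$file.hashes" ] || return 1
    read -r size mtime md5 sha1 < "$file.hashes"
    [ "$size $mtime" = "$(file_stamp "$file")" ] || return 1
    printf '%s\n' "$md5"
}
```

Note that this is the size/modtime trade-off mentioned above: a matching stamp does not prove the content is unchanged, it only avoids re-reading the file. Full assurance still means running md5sum again.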
Ideally I would do that, but I can't since Jottacloud requires the MD5 value. I've been thinking about switching to a cloud provider that offers unlimited storage at a fair price. I highly doubt Jotta will change their practice if I make a request about this.
The first step is that I want to understand the earlier quote, which says that if the source does not have an MD5 checksum, it has to be calculated. How does rclone check whether the file has an MD5 value, so that I can add one? Is it only in the metadata, or could I simply add a separate txt file with the sum, etc.?
If the source is a remote without a hash on it (like a local file system), rclone calculates the hash and, depending on the remote, compares the hashes and uses that for validation. If no hashes are in common, it falls back to size/modtime/etc.
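The fallback described above can be pictured roughly like this. It is a simplified sketch of the decision only, not rclone's actual code, and the function name choose_check is made up for the illustration.

```shell
# Sketch of the decision described above (a simplification, not rclone's code).
# Each argument is a space-separated list of hash types one side can provide.
# Prints the first hash type both sides share, or "size/modtime" if none.
choose_check() {
    local src_hashes=$1 dst_hashes=$2 s d
    for s in $src_hashes; do
        for d in $dst_hashes; do
            if [ "$s" = "$d" ]; then
                printf '%s\n' "$s"
                return 0
            fi
        done
    done
    printf 'size/modtime\n'
}

choose_check "md5 sha1" "md5"   # both sides can provide MD5, prints "md5"
choose_check "" "md5"           # source provides no hashes, prints "size/modtime"
```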
The trick here is that rclone is made to work with many different remotes and whatever they provide.
I wasn't suggesting you ask Jottacloud to change their policies. If you are using an encrypted remote anyway, the hash is useless, so as a different way to tackle the issue, rclone could possibly submit something else (garbage) for the md5sum and not calculate it.
One more thing. Does the MD5 hash have any value for, say, Jottacloud when the MD5 hash of the unencrypted file is sent and one uses crypt? The data received at the destination is encrypted and therefore has a different MD5 hash. Otherwise one could simply send a random MD5 value to skip calculating the MD5, unless --checksum is being used with crypt (which doesn't make sense with the current implementation).