How to identify files that can't be cryptchecked

I’m trying to run cryptcheck to compare my local and remote for any issues (e.g. bit rot). If any of my local files have changed I can download them from the remote.

I’m using rclone 1.40, and the command I’m running is:

rclone cryptcheck --fast-list --verbose /path/to/files encryptedremote:path/to/files

When it completes I see the following message. The number varies based on the path that I’m checking.

X hashes could not be checked

I tried using rclone md5sum to identify the files, but that’s not working. I’ve created a separate topic for that issue: Md5sum is not working with S3 on 1.40.

I would guess these are large files that were uploaded with pre-1.40 rclone.

S3 doesn’t support adding md5sums to large files natively, so rclone now adds them for large files.

Does that make sense?

I’m afraid there isn’t an easy way of adding the md5sum metadata…

My guess is that those files can be checked. Did you try running the check again, only for those files?
I’ve found that, for example, Google Drive will return roughly 2-5 errors in 200,000 files when everything is fine. The error will be a Google server request error, something like 500 (server didn’t respond) or 503 (server responded but didn’t give us the nonce), I think.
Just run rclone cryptcheck --fast-list --verbose /path/to/files/ encryptedremote:path again.

To get the filenames, just include a log file and then search that log file for errors. You may or may not prefer -vv over -v.
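As a rough sketch of the log-file idea: instead of searching for a specific error string, you can drop the per-file "OK" lines and read whatever is left. The helper name non_ok_lines and the log file name are hypothetical, and I'm assuming the lines for matching files end in ": OK".

```shell
#!/bin/sh
# Print every line of an rclone log that is not a per-file "OK" report.
# Assumption: lines for files that matched end in ": OK".
non_ok_lines() {
    grep -v ': OK$' "$1"
}

# Hypothetical usage (cryptcheck.log is an assumed file name):
#   rclone cryptcheck --fast-list -vv --log-file cryptcheck.log /path/to/files encryptedremote:path
#   non_ok_lines cryptcheck.log
```

Whatever remains (errors, notices, the final summary) is what you'd want to look at by hand.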

That said, it seems you may have already tried this and are sure it isn’t the issue. If so, sorry.

Any suggestions on how to identify the files that are missing md5sums? I’ve tried adding -v, -vv, -vvv, -vvvv and -vvvvv but it only lists the files that are “OK”.

If you run rclone --fast-list md5sum s3remote:bucket it will show files with and without hashes. The ones missing hashes will be blank. Run that on the underlying remote, not on the crypt remote.
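If the listing is long, something like the following could pick out the blank-hash entries for you. This is only a sketch: missing_md5 is a made-up helper name, and it assumes a present MD5 is printed as 32 lowercase hex digits followed by two spaces, while a missing one is printed as spaces. File names that start with spaces would lose them.

```shell
#!/bin/sh
# Read `rclone md5sum` output on stdin and print the paths whose hash column is blank.
# Assumption: a present MD5 is 32 lowercase hex digits followed by two spaces,
# and a missing one is rendered as spaces.
missing_md5() {
    # Drop lines that start with a full hash, then strip the padding and empty lines.
    grep -Ev '^[0-9a-f]{32}  ' | sed -e 's/^ *//' -e '/^$/d'
}

# Hypothetical usage, run against the underlying (non-crypt) remote:
#   rclone --fast-list md5sum s3remote:bucket | missing_md5
```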

Thank you! I was able to narrow it down to the following process.

  1. Optional: run cryptcheck to see whether any files are missing hashes (look for ‘X hashes could not be checked’).

rclone cryptcheck --fast-list --verbose /path/to/files encryptedremote:path

  2. Run rclone lsd with --crypt-show-mapping to show how the names encrypt.

rclone lsd --crypt-show-mapping encryptedremote:path

  3. Run md5sum to get a list of the files that are missing an md5sum hash (this must be run against the non-crypt remote). Note the names of the files that are missing an md5sum.

rclone md5sum --fast-list remote:path

  4. Run rclone lsd with --crypt-show-mapping to identify the files that are missing the md5sum hash. I tried piping this to grep and searching for the filename, but rclone does not seem to send the result of --crypt-show-mapping to stdout/stderr?

rclone lsd --crypt-show-mapping encryptedremote:path

It’s a bit of extra work as you first have to use lsd with --crypt-show-mapping to determine the path. Is there any way to cut this step out?

rclone lsd --crypt-show-mapping encryptedremote:path

What’s interesting is that I’m finding small files that were uploaded with rclone that are also missing a md5sum hash. I’m planning to try deleting them from the remote and running a sync again to upload them. Is there a better way to do this?

The threshold for multipart uploading for s3 is quite small - 5MB maybe? So you will see quite small files without hash.

You could add the md5 metadata without re-uploading, but that would require a bit of custom coding… So if you haven’t got too many files, I’d just re-upload them.

@ncw One of the directories I’m working with is around 100 MB and consists of 400 files with the majority ranging between 4 KB - 1 MB.

When I run cryptcheck it reports “1 hashes could not be checked”.

rclone cryptcheck --fast-list --verbose /path/to/files encryptedremote:path

The file missing the hash is actually the largest file at 5.9 MB so it seems that the md5sum threshold is quite low.

How would you re-upload the file? Do I need to delete it manually from the destination and then run a sync or can rclone delete it?

Yes it is 5MB.

You can delete it with rclone

rclone delete remote:bucket/path/to/file.xxx

Try it with --dry-run first.

Then running the sync will re-upload it.
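If there are more than a handful of files, a small helper can turn a list of paths into delete commands that you review before running anything. This is a sketch only: print_delete_cmds is a hypothetical name, the paths are taken relative to the remote root you pass in, and the simple double quoting will not cope with paths containing quotes, backslashes, or dollar signs.

```shell
#!/bin/sh
# Read object paths (relative to the remote root given as $1) on stdin and
# print one `rclone delete --dry-run` command per path, for review.
# Nothing is deleted here; the function only prints text.
print_delete_cmds() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip blank lines
        printf 'rclone delete --dry-run "%s/%s"\n' "$1" "$path"
    done
}

# Hypothetical usage (missing.txt is an assumed list of paths, one per line):
#   print_delete_cmds remote:bucket < missing.txt
```

Once the dry run looks right, drop --dry-run from the printed commands and then run the sync again.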

I think I misunderstood. Is the limit on hashes 5 MB and any files larger than 5 MB will NOT have a hash?

I uploaded 15 files to S3 yesterday and “rclone cryptcheck” is reporting “13 hashes could not be checked”. Below is the output of “rclone ls /path/to/files” with the filenames removed. Only the first two files of size 426 and 43783 have hashes per “rclone md5sum --fast-list remote:path”.

426
43783
341655552
370722816
367448064
4821843968
4453072896
2990735360
3913940992
617754624
373622784
681867016
2434566144
2400239616
972087296

That was correct for rclone 1.39. For rclone 1.40 all files should have a hash if you uploaded them with rclone sync/copy/move. If you uploaded them by copying them via a mount then only the files smaller than 5MB will have a hash - is that what you are doing?

The upload of the 15 files was done with rclone 1.40, but 13 of them are missing hashes. The command I ran was:

rclone sync --bwlimit 1.0M --checkers 32 --delete-after --exclude --fast-list --log-file rclone.log --stats-unit bits --transfers 1 --verbose /path/to/files encryptedremote:path/to/files

Oh, I see… You are uploading them to a crypted remote, in which case you won’t get a hash unless it is below 5MB.
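To predict which files will be affected, you could filter an rclone ls listing by size. A minimal sketch, assuming the threshold is 5 MiB (5242880 bytes) and that files strictly larger than it are the ones that end up without a hash; over_threshold is a hypothetical helper name.

```shell
#!/bin/sh
# Read `rclone ls` output (size in bytes, then path) on stdin and print the
# entries larger than the multipart threshold.
# Assumption: "5M" means 5 MiB = 5242880 bytes.
over_threshold() {
    awk -v limit=5242880 '$1 + 0 > limit'
}

# Hypothetical usage:
#   rclone ls /path/to/files | over_threshold
```

In the listing above, that would leave out the files of size 426 and 43783 and keep the other 13, which matches what cryptcheck reported.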

This is fixable - see the issue I linked above.

I’m not seeing an issue. Are you referring to Option to calculate checksum if missing #854?

So as of now running “rclone cryptcheck --fast-list --verbose /path/to/files encryptedremote:path/to/files” is expected to return “X hashes could not be checked” for any files over 5 MB?

Is there anything else that I can do to verify the integrity of my local and remote files. My biggest concern is bit rot.

I meant this issue: https://github.com/ncw/rclone/issues/2213

Yes. You can move that threshold with the latest beta using

  --s3-chunk-size int                   Chunk size to use for uploading (default 5M)

Note that the chunks are buffered in memory though.

You can use rclone check --download if you don’t mind downloading the data to check it.

There are some issues which would help

I think the best thing of all would be to make sure files > 5MB uploaded through crypt get a hash.

Thank you for clarifying. I’ve been reading over Option “--s3-disable-checksum” #2213 and it’s great to see that you are actively discussing a resolution. Let me know if I can do any testing.

If I were to use the latest beta, would it create issues to do “--s3-chunk-size 5 GB”? I try to avoid syncing any files to S3 that are 5 GB or larger to avoid multipart uploads, but the reality is that I do have some. My goal is for all files in my crypt remote to have hashes and for cryptcheck to succeed.

If you do that then you’ll likely run out of memory.

Maybe we should bring back the --s3-upload-cutoff which can use the old single part upload which is good to 5GB?

I’ll write that on the issue now!

That would be great! Again my goal here is to run cryptcheck on a regular basis to verify my local and remote files are still identical.


@ncw with the release of 1.41 I see that “--s3-disable-checksum” was added. I wanted to get your insight on how to use the new flag to address my issue. Do I need to delete all my files that are missing a hash and re-upload them using “--s3-disable-checksum”?