One thing I'm kind of surprised to find that Rclone can't do yet is append hashes to an output file created with the hashsum command.
I'd like to keep a list of hashes that I can periodically use to check for data corruption, but overwriting the file each time the hashsum command is run would defeat the purpose. If data corruption occurred, I'd just be replacing the good hash with the new bad one.
Has anyone in the same situation figured out a method of doing this? I thought about using Awk combined with the --exclude-from option in Rclone, but that leaves a lot of room for error.
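Roughly what I had in mind was something like the following (paths are just examples; the awk step strips the hash column from the sum file, and the leading slash anchors each name as a literal path). The error-prone part is that any filename containing rclone filter characters like * or [ would be treated as a pattern rather than a literal name:

# first run: hash everything into the base file
rclone hashsum md5 /mnt/data --output-file ~/checkfile
# build an exclude list from the filenames already in the sum file ("hash  name" format)
awk '{ sub(/^[^ ]+  /, ""); print "/" $0 }' ~/checkfile > /tmp/already-hashed.txt
# hash only the files not in that list and append them to the base file
rclone hashsum md5 /mnt/data --exclude-from /tmp/already-hashed.txt >> ~/checkfile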
If you want to check for corruption, I'm assuming you mean a local disk and not a cloud remote.
If you have corruption, you want to compare an old log file to a new log file and check for differences, I'd imagine.
So you'd get a baseline in a log file; that becomes your check file, and you check against it. Anything new you'd append/add to the checksum file as your 'gold' status.
Unless I don't get your use case / flow:
felix@gemini:~/test$ rclone hashsum md5 /home/felix/test --output-file ~/checkfile
felix@gemini:~/test$ rclone hashsum md5 -C /home/felix/checkfile /home/felix/test
= four
= jellyfish-30-mbps-hd-h264.mkv
= three
= two
2022/01/03 20:13:27 NOTICE: Local file system at /home/felix/test: 0 differences found
2022/01/03 20:13:27 NOTICE: Local file system at /home/felix/test: 4 matching files
felix@gemini:~/test$ echo blah >>four
felix@gemini:~/test$ rclone hashsum md5 -C /home/felix/checkfile /home/felix/test
2022/01/03 20:13:36 ERROR : four: files differ
* four
= jellyfish-30-mbps-hd-h264.mkv
= three
= two
2022/01/03 20:13:37 NOTICE: Local file system at /home/felix/test: 1 differences found
2022/01/03 20:13:37 NOTICE: Local file system at /home/felix/test: 1 errors while checking
2022/01/03 20:13:37 NOTICE: Local file system at /home/felix/test: 3 matching files
2022/01/03 20:13:37 Failed to hashsum: 1 differences found
My original goal was to keep a file that contains hashes that I could frequently append new items to, and only verify those hashes every few months. I have a lot of data, and a Raspberry Pi is doing the hashing, so re-hashing all of the data and then comparing the resulting files is a slow process.
I'm finding it difficult to explain, but my end goal is to devise a way to make Rclone hash only the data that has been added since the last run. I could then add those new hashes to the base file, and every few months or so use that file to verify that the data hasn't changed.
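One way I can imagine scripting that (the timestamp file and paths are just examples) is to record when the last run happened and use rclone's --max-age filter so only files modified since then get hashed:

# time of the previous run; on the very first run fall back to hashing everything
last=$(cat ~/.lasthash 2>/dev/null || echo 1970-01-01)
# record the new timestamp before hashing, so a file that changes mid-run is hashed again next time rather than missed
date -u +%Y-%m-%dT%H:%M:%SZ > ~/.lasthash
# append hashes for files modified since the previous run to the base file
rclone hashsum md5 /mnt/data --max-age "$last" >> ~/checkfile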
It sounds like your data is important; maybe putting it in some cheap cloud storage and having it sit there as a backup would be a better solution.
I think @asdffdsa likes Wasabi, and it seems to be not that expensive if your goal is to guard against bitrot / corruption or something along those lines.
I dunno. I've had drives for years and just replace them and never noticed anything, but my data on local storage is throwaway / backed up elsewhere so data loss for me is not an issue.
If you can think of how you'd want something like that to work, it's an edge case, but you can always submit a feature request on GitHub. There's a huge backlog though, so realistically I'd look for a scripted solution along the lines above, or flesh out your use case a bit more and I'm sure some folks can pitch in ideas as well.
i have many Pis, all types, that i use on a daily basis.
imho, would not trust it for anything other than a cheap media server.
i tend to recycle old desktop computers.
yes, in any location i support, i have found nothing better than this combo:
--- wasabi, an s3 clone known for hot storage, US$6.00/TB/month, for recent backups
--- aws s3 deep glacier, for cold storage, US$1.00/TB/month, for older backups
that is the way to do it.
in my case, no need to do that.
in addition to the wasabi/aws combo, i always have a very cheap server dedicated to backups.
a used desktop computer, new RAM, and some hard drives.
i use the free windows server 2019 hyper-v edition; the server uses the ReFS filesystem, the windows version of ZFS.
so no worries about bit-rot and/or manually computing hashes.
let's say you implement your approach and find a corrupted file, then what?
I think @asdffdsa is right and Hasher can be helpful for this workflow.
Say you keep a large archive under /mnt/archive on your box.
Add a section to your ~/.config/rclone/rclone.conf on the same box:
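(a sketch of what that section might look like; the remote name "archive" is just an illustrative choice, with md5 and the one-year max_age mentioned below)

[archive]
type = hasher
remote = /mnt/archive
hashes = md5
max_age = 1y

A run such as

rclone hashsum md5 archive: --output-file ~/checkfile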
will produce a full sum file in standard format every time you run it... BUT it will actually rehash only new/changed files, taking the rest from the internal cache.
Files that keep the same name/modtime since the last run will be rehashed just once a year.
You can even set max_age = off to prevent rehashing unchanged files completely (but beware of bitrot).
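When it's time for the actual verification every few months, it's probably safest to hash straight off the underlying path rather than through the hasher remote, so a cached sum can't mask bitrot; reusing the paths from above:

rclone hashsum md5 -C ~/checkfile /mnt/archive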