How can you verify the integrity of a crypt remote?

What is the problem you are having with rclone?

Rclone is not checking the contents of the files on my "crypt" remote. It is bypassing the content integrity checks altogether, using only the size and modification time. I've tried everything I can think of to get rclone to perform content checks, but it simply is not happening. What can I do to get rclone to actually check my data?

I need to know whether my files were perfectly uploaded to cloud (with no bit flips), and I'd like to occasionally check my uploaded copy for bit rot. Right now, I cannot see any way to do either of these things. Am I missing something? This seems like a fundamental issue that would affect nearly everyone.

Run the command 'rclone version' and share the full output of the command.

rclone version

rclone v1.67.0
- os/version: linuxmint 21.3 (64 bit)
- os/kernel: 6.5.0-1023-oem (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.4
- go/linking: static
- go/tags: none

This is the current version of rclone.

Which cloud storage system are you using? (eg Google Drive)

I'm using "pcloud" as my storage system. Pcloud provides both a regular unencrypted filesystem and a custom encrypted "Crypto" folder. I'm using the regular filesystem and NOT the special "Crypto" folder.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

I created a "pcloud" remote in the EU region. According to the rclone documentation, "sha1" is supported by this cloud provider.

Next, I created an encrypted "crypt" remote on top of the plain-text "pcloud" remote.

Then I attempted to upload my files to the cloud. For testing purposes, I created a folder on my local filesystem like this:

a/
	a.txt # "a"

I also created a second corrupted copy, where one of the files has a single-byte error:

b/
	a.txt # "b"

...and then I set the timestamps to be identical for both the original and corrupted versions:

touch -r 'a/a.txt' 'b/a.txt'
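
For reference, the whole test setup can be reproduced with something like the following (assuming each file holds just the single character shown in the comments above):

mkdir -p a b
printf 'a' > a/a.txt          # original copy
printf 'b' > b/a.txt          # corrupted copy: same size, different content
touch -r 'a/a.txt' 'b/a.txt'  # identical timestamps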

At this point, I mounted the "crypt" filesystem, in order to provide some visibility into the cloud filesystem:

rclone mount 'crypt:' 'crypt'

And then I ran the command:

rclone sync 'a/' 'crypt:a/'

Using the "crypt" mount point, I manually verified that the directory was uploaded successfully.

And now I attempted to synchronize from the corrupted directory (simulating a bit flip in either one of the copies):

rclone sync 'b/' 'crypt:a/'

Now the source and remote files do not match, so the remote file ("crypt:a/a.txt") should be updated to match the new source file ("b/a.txt"). This did not happen. There were no corrections to the remote file, and no errors that would reveal any discrepancy.

In hopes of adding support for the deeper data checks, I created a "hasher" remote on top of my "crypt" remote, and I repeated the experiment with the "hasher" remote:

rclone sync 'b/' 'hasher:a/' \
	--hasher-max-age 0 \
	--hasher-auto-size 1P

The results were the same: rclone did not synchronize the files or reveal any discrepancy.

Single-bit flips do happen during network transfers, and on a somewhat regular basis. How can I ever know if my files were uploaded correctly? I don't see any way. Am I missing something?

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

rclone config redacted

[pcloud]
type = pcloud
hostname = eapi.pcloud.com
token = XXX

[crypt]
type = crypt
remote = pcloud:
filename_encryption = standard
directory_name_encryption = true
password = XXX

[hasher]
type = hasher
remote = crypt:
hashes = sha1
max_age = 0s

A log from the command that you were trying to run with the -vv flag

I'm not sure what would be helpful here. If you need any additional information, you can ask me for anything.

Have a look at cryptcheck, even though I am not sure it is exactly what you need here.

And what, in your opinion, could rclone do here, in an ideal world?

As I understand your problem, a perfect fix would require pCloud to be able to calculate file hashes on demand, and such functionality does not exist.

Have a look at cryptcheck, even though I am not sure it is exactly what you need here.

Thank you. I have tried cryptcheck, but without any luck. Cryptcheck also skipped the data integrity check and defaulted to the simple size / time comparison. I tried a different hash command as well, but I believe it generated blank hashes that could not be used by the other commands. I should write up a section about this in my question, so people can see the full extent of it: I literally tried everything before I gave up after hours of frustration. But... perhaps there's something I missed?

And what, in your opinion, could rclone do here, in an ideal world?

The only way to know if the remote data is intact is to download it and check it. So, I imagine that rclone would have to download and decrypt the cloud data, ideally as a buffered stream to avoid overflowing the client's RAM, and gradually compare the decrypted stream against the raw data stream from the original source file. This would be a very expensive operation, so it would have to be disabled by default, with a flag to enable the deep content comparison.
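
Looking at the documentation again, something close to this may already exist: if I'm reading it correctly, the check command has a --download flag that downloads the data from both sides and compares the actual contents instead of relying on sizes or hashes. I have not yet confirmed how it behaves against my "crypt" remote, but it would presumably look something like this:

rclone check 'a/' 'crypt:a/' --download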

If the cloud provider stores the encrypted file's hash as a simple file attribute, this deep comparison would be the only way to verify the integrity of the data: bit rot could silently change the contents of the file in the cloud without changing the hash attribute reported by the cloud.

But, if you do trust the cloud provider to retain a perfect copy of your data, then you could simplify things by comparing a locally-generated hash against the hash attribute reported by the cloud. I believe rclone can already do this for you, but it did NOT work for me. I don't know why. If I remember correctly, the hashes were all blank when I tried to generate them. I didn't see anything further that I could try. After hours of effort, I gave up.

Comparing the file hashes would allow you to detect bit flips introduced during the file upload, and it would also catch any changes that appear in the source file (e.g. from a failing hard drive). So a simple hash comparison would be of some use, and would be much faster than the full content comparison.

On the other hand, the full content comparison is the only way to know if your data is intact on the remote server. You cannot verify that the server has a clean copy of your data if you never check the file contents on the cloud.

I don't see any way to edit my original post, so I'm adding the results of several other commands here:

rclone sha1sum 'crypt:'

2024/07/13 15:19:00 ERROR : a/a.txt: hash unsupported: hash type not supported
...

The rclone documentation says that SHA1 is supported by my cloud provider. Perhaps the checksums are not available for "crypt" remotes in particular?
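
If I remember the command correctly, rclone's generic backend features command can show which hashes a remote actually advertises; I would expect the list to be empty for the "crypt" remote and to include sha1 for the plain "pcloud" remote:

rclone backend features 'crypt:'
rclone backend features 'pcloud:'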

I tried adding hashes by wrapping the "crypt" remote within a "hasher" remote:

rclone sha1sum 'hasher:'

                                          a/a.txt
...

This produced blank hashes--which, of course, won't help with any of the other commands.
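
One thing I have not tried yet is forcing the hasher to actually read the file contents by adding the --download flag; if I understand the hasher docs correctly, that should compute real sums from the downloaded (and decrypted) data, and may also populate the hasher's cache for later commands:

rclone sha1sum 'hasher:' --download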

For completeness, I'll also show what happens when you use cryptcheck:

rclone cryptcheck 'a/' 'crypt:a/'

2024/07/10 20:35:53 NOTICE: Encrypted drive 'crypt:a/': 1 hashes could not be checked
2024/07/10 20:35:53 NOTICE: Encrypted drive 'crypt:a/': 1 matching files

I want to use rclone, but I cannot verify my files! Is there anything I can do? I could use some help.

I just thought I'd mention that duplicity does notice bit flips, and updates the cloud copy automatically.

Duplicity has a "verify" command to check the integrity of your data.

This will check all of your remote files against the hashes that duplicity recorded when they were uploaded. It downloads a copy of each file, hashes it, and checks whether the current hash matches the original:

duplicity verify 'rclone://pcloud:' 'a/'

If you'd also like to check whether your local files match the cloud versions, you can add the "--compare-data" option. This will hash each local file and compare that local hash against the cloud hash:

duplicity verify --compare-data 'rclone://pcloud:' 'a/'

I figured I'd mention it, as an idea for rclone.

I also worry about bit flips and other corruption on my encrypted backup storage. Although it uses a lot of bandwidth, I use the --download flag to produce a checksum file from my encrypted storage. I also produce a hash file from my source directory, and then I use a simple script to diff the two hash files.

# unencrypted source
rclone hashsum quickxor source:top > source.qx
# encrypted backup
rclone hashsum quickxor --download enc_backup:top > enc_backup.qx
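
The comparison script itself can be as simple as sorting both hash files by path and diffing them (a rough sketch, assuming the default "HASH  path" output format of hashsum):

sort -k2 source.qx > source.sorted
sort -k2 enc_backup.qx > enc_backup.sorted
diff source.sorted enc_backup.sorted && echo 'all hashes match'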

Thank you, that might have worked. I was using an older version of rclone packaged by Ubuntu which didn't support the "--download" flag. But the rclone Downloads page has an up-to-date *.deb file which does work with Ubuntu--in case anyone else needs it.

I ended up going with Borg, because it does such a fantastic job of replicating your data. Borg notices and fixes silent bit flips, keeps all of your metadata, deduplicates data to reduce the size (and cost) of your offsite backups, and also supports snapshots in case you want to use them.
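
For anyone curious, the integrity checking I rely on is Borg's check command; the --verify-data option makes it read and cryptographically verify every data chunk rather than just the repository metadata:

borg check --verify-data /path/to/repo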

For my cloud provider, I chose rsync.net, because they give you direct access to your files (which means Borg and rsync work very well), they're reliable, ethical, and they have discount rates that make them cheaper than most other cloud storage providers. You can email "info@rsync.net" to see if you qualify for a discounted rate. Tech-savvy users, students, businesses, and a few other categories definitely qualify for a discounted rate.


yes, good choice, i have been using them since 2019.
in the past, i was able to get good pricing for a yearly plan.
i might be grandfathered in, need to look into that...

anyhoo, i am glad to find that this link, for rclone users, is still live and oh, so very clickable...
https://www.rsync.net/products/rclone.html

these days, i can get that and more at hetzner for $2.63USD/TiB/month.
also, rclone can mount that as local storage when using hetzner cloud machines in the same datacenter.
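
for example, something along these lines, where the remote name is just a placeholder for however the storage box was configured (e.g. as an sftp remote):

rclone mount 'storagebox:' /mnt/storagebox --vfs-cache-mode writes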
