`rclone md5sum` on a large Encrypted GDrive dirtree is generating multiple errors

Thanks! I just downloaded versions for both machine architectures I use, and I've already tested the arm64 one here; it seems to be working.

My first run, on the smallest of those files, doesn't look very promising: I got "crypt: ignoring: failed to authenticate decrypted block - bad password?" 49 times in a row -- which, IIUC, means 49 * 64K = ~3.1MB of the file is lost... and that on a ~9.5GB file :frowning: It doesn't look like the "small number of blocks" hypothesis is going to hold :frowning:

Will do more tests when able (treading carefully because I'm still waiting for that month-long rclone md5sum to finish running on this same remote) and will post the results here.

Thanks Again!

Ouch, 3 MB of corruption... That could be as little as 1 corrupted bit in every 64k bytes, but it is still a fair burst of errors.

At least it didn't have corruptions all the way to the end of the file.

:frowning: My thoughts exactly. This doesn't bode well for Google Drive's data integrity... :frowning:

As I have local copies of all the original files (at least so far, fingers crossed), I can do a comparison after all that's over with; my plan is, for each of those files:

  1. Rename the corrupt PATH/FILE.EXT to PATH/FILE.EXT_-_CORRUPTED_YYYYMMDD;

  2. Use rclone (...) copyto to copy my known-good local file to PATH/FILE.EXT;

  3. Use rclone cryptdecode --reverse to find the encrypted names of both files;

  4. Download both files from the base (unencrypted) remote, and then compare them bit-by-bit to see exactly what the corruption is (see the comparison sketch right after this list).
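For step 4, here is a rough sketch of how that bit-by-bit comparison could be done -- a hypothetical standalone helper (the file name and usage are mine, nothing here is part of rclone), assuming both encrypted files have already been downloaded to local disk:

// comparebytes.go -- hypothetical helper, not part of rclone.
// Compares two local files byte by byte and prints the offset of every
// differing byte, so the shape of the corruption can be seen.
// (For megabytes of corruption you'd probably want to aggregate the
// output into ranges instead of printing every byte.)
// Usage: go run comparebytes.go CORRUPTED_FILE KNOWN_GOOD_FILE
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: comparebytes FILE1 FILE2")
		os.Exit(1)
	}
	f1, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f1.Close()
	f2, err := os.Open(os.Args[2])
	if err != nil {
		panic(err)
	}
	defer f2.Close()

	r1, r2 := bufio.NewReader(f1), bufio.NewReader(f2)
	var offset, diffs int64
	for {
		b1, err1 := r1.ReadByte()
		b2, err2 := r2.ReadByte()
		if err1 == io.EOF && err2 == io.EOF {
			break // both files ended together
		}
		if err1 != nil || err2 != nil {
			fmt.Printf("files diverge in length (or a read failed) at offset %d\n", offset)
			break
		}
		if b1 != b2 {
			diffs++
			fmt.Printf("offset %d: %02x != %02x\n", offset, b1, b2)
		}
		offset++
	}
	fmt.Printf("%d differing bytes in %d bytes compared\n", diffs, offset)
}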

@ncw, what do you think? Would that work? The above depends on the same file being uploaded again to the same encrypted remote and generating the same encrypted content on the base (unencrypted) remote. Is that the case?

This isn't normally the case. Rclone will generate a random nonce (as it is known in cryptography) for each encryption, so each time a file is encrypted it is different. In fact it weakens the crypto if you re-use the nonce.

It is, however, possible to do exactly what you want; it would require a modified rclone which you could tell: read the nonce from this file before encrypting this other file with it. That could be a backend command quite easily, though.
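To illustrate the point (rclone's crypt format is based on NaCl SecretBox, as the Crypt docs describe), here is a small standalone sketch -- not rclone code -- showing that encrypting the same data twice with the same key but a fresh random nonce each time produces completely different ciphertexts:

// noncedemo.go -- standalone illustration, not rclone code.
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/nacl/secretbox"
)

func main() {
	var key [32]byte
	if _, err := rand.Read(key[:]); err != nil {
		panic(err)
	}
	plaintext := []byte("the same block of data, encrypted twice")

	seal := func() []byte {
		// A fresh random nonce for every encryption, as rclone does.
		var nonce [24]byte
		if _, err := rand.Read(nonce[:]); err != nil {
			panic(err)
		}
		// Prepend the nonce to the sealed box, as is conventional.
		return secretbox.Seal(nonce[:], plaintext, &nonce, &key)
	}

	c1, c2 := seal(), seal()
	fmt.Println("ciphertexts identical:", bytes.Equal(c1, c2)) // prints: false
}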


Thanks for the explanation. Of course :man_facepalming:, the right thing to do is to use a random nonce.

That could be a backend command quite easily though

Or perhaps some option like --crypt-use-nonce-from CRYPT_REMOTE:PATH/FILE.

But it's a very specific case, and I think I've already abused my privilege as an rclone user too much to ask that of any developer, much less of you, who have helped me so greatly in all of this.

But if the itch to dig deeper into this is bad enough, and I have the leisure for it, I might try implementing it and submitting a PR. Not sure that will be the case, tho -- I have my plate quite full at the moment, and my knowledge of Golang is still extremely superficial.

:slight_smile:

I am supposed to be getting the v1.62 release ready, so I'm trying not to get sidetracked with interesting crypto and data corruption problems!

It would certainly be easy to bodge it into rclone.

You'd alter this bit of code to enter a fixed nonce.

You can read the nonce from the header of the file as detailed here: Crypt - it is bytes 8-31 inclusive in the file.
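For what it's worth, here is a minimal standalone sketch (again, not rclone code; the file name is mine) that pulls those 24 nonce bytes out of an already-downloaded encrypted file and prints them both as hex and as a Go literal that could be pasted into a bodge like the one further down:

// readnonce.go -- standalone sketch, not rclone code.
// Reads the 32-byte crypt file header (8 magic bytes + 24 nonce bytes)
// and prints the nonce, i.e. bytes 8-31 inclusive.
// Usage: go run readnonce.go ENCRYPTED_FILE
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: readnonce ENCRYPTED_FILE")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()

	header := make([]byte, 32)
	if _, err := io.ReadFull(f, header); err != nil {
		panic(err)
	}
	nonce := header[8:32] // bytes 8-31 inclusive

	parts := make([]string, len(nonce))
	for i, b := range nonce {
		parts[i] = fmt.Sprintf("0x%02x", b)
	}
	fmt.Printf("nonce as hex:        %x\n", nonce)
	fmt.Printf("nonce as Go literal: nonce{%s}\n", strings.Join(parts, ", "))
}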


PS: We recently found a data corruption bug in OneDrive which you may enjoy reading about here: Files are corrupted sometimes when uploaded with multipart uploads · Issue #1577 · OneDrive/onedrive-api-docs · GitHub


Many thanks again @ncw for the additional tips! :+1::+1::+1:

These will make things much, much easier when I go down that path (which, as of now, I'm much more certain that I will).

Let me know if you need help!

I'd use a hex dump tool to read the nonce and just write the hex into the rclone code for a first attempt.

Like this

diff --git a/backend/crypt/cipher.go b/backend/crypt/cipher.go
index ae1e62393..d10aa0872 100644
--- a/backend/crypt/cipher.go
+++ b/backend/crypt/cipher.go
@@ -671,6 +671,8 @@ type encrypter struct {
 	err      error
 }
 
+var myNonce = nonce{0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01}
+
 // newEncrypter creates a new file handle encrypting on the fly
 func (c *Cipher) newEncrypter(in io.Reader, nonce *nonce) (*encrypter, error) {
 	fh := &encrypter{
@@ -684,10 +686,11 @@ func (c *Cipher) newEncrypter(in io.Reader, nonce *nonce) (*encrypter, error) {
 	if nonce != nil {
 		fh.nonce = *nonce
 	} else {
-		err := fh.nonce.fromReader(c.cryptoRand)
-		if err != nil {
-			return nil, err
-		}
+		// err := fh.nonce.fromReader(c.cryptoRand)
+		// if err != nil {
+		// 	return nil, err
+		// }
+		fh.nonce = myNonce // use the fixed nonce instead of a random one
 	}
 	// Copy magic into buffer
 	copy(fh.buf, fileMagicBytes)

Good luck!


Now you've practically done everything for me! :grinning:
Thanks again and I will keep you posted.


FYI, it seems that Google is now limiting the maximum number of files on Google Drive to 5 million: New Limit Unlocked on Google Drive - #43 by root0r


Thanks for the heads-up. A few thoughts have crossed my mind after reading this:

  1. The Big G moves one step further in the enshittification of its services; this will end up making Drive completely unusable for many of us.

  2. Case in point, I now have WELL OVER that (more than 13M files). So this new limit completely effs it up for my use case.

  3. Drive was one of the few services where Google was actually competitive (besides Search). And now Google is about to become non-competitive in Drive too, at least for us 'heavy users' who are more likely to pay for the service instead of just enjoying the free plan. I don't think Google will sell many more Workspace plans, and will probably end up actually losing more than just a few of their current customers.

  4. With ChatGPTized Bing moving in on Google Search (the only really competitive Google product so far), I fear that Google is not long for this world... therefore I would be wary of remaining with Google even if my use case fit within this new limit.

  5. As per 2) and 3), I think we will end up seeing a wave of users migrating from Drive to the alternatives, so the ones that don't have a solid business plan to withstand it will either have to follow Google and enshittify their services accordingly, close up shop, or raise their prices significantly.

  6. As a consequence of all the above, the idea of spending the money to buy the big HDDs needed to start 'self-hosting' my own cloud backups at a friend's house looks more and more compelling...

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.