`rclone md5sum` on a large Encrypted GDrive dirtree is generating multiple errors

There are still some of us who don't have any issues :wink:

I'm glad to hear it, and I hope it continues to work well for you. Just be aware that the ox in the slaughterhouse line, when another ox two or three places in front of it gets felled by the hammer, could also be thinking just about the same thing, ie "nothing has happened to me so far, so no reason to worry!" :slight_smile:

Hmm, that is very bad.

How big are these files (roughly)?

The crypt format is broken up into 64k chunks and I have a version of rclone which will write zeroes for the chunks with errors, but otherwise carry on, so we could investigate how much of the file is corrupted if you want.

If it is something like a video file then a 64k chunk lost will cause a minor visual artifact. If it is something less resilient then a 64k chunk missing might make the whole file useless.
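
To make the 64k chunking concrete, here is a rough sketch of how a plaintext offset maps to a chunk and to a position in the encrypted file. It assumes the layout described in the rclone crypt documentation (a 32-byte file header, then chunks of up to 64 KiB of data, each preceded by a 16-byte authenticator). The function names are made up for illustration and are not rclone's API.

```go
package main

import "fmt"

const (
	fileHeaderSize  = 32        // assumed: 8 bytes magic + 24 bytes nonce
	blockHeaderSize = 16        // assumed: per-chunk authenticator
	blockDataSize   = 64 * 1024 // plaintext bytes per chunk
)

// chunkIndex returns the 0-based chunk holding the given plaintext offset.
func chunkIndex(plainOffset int64) int64 {
	return plainOffset / blockDataSize
}

// encryptedChunkStart returns the offset in the encrypted file at which
// the given chunk (authenticator included) begins.
func encryptedChunkStart(chunk int64) int64 {
	return fileHeaderSize + chunk*(blockHeaderSize+blockDataSize)
}

func main() {
	off := int64(200000)
	c := chunkIndex(off)
	fmt.Printf("plaintext offset %d is in chunk %d, which starts at encrypted offset %d\n",
		off, c, encryptedChunkStart(c))
}
```

A single flipped bit anywhere inside one of those encrypted chunks makes the whole 64 KiB chunk fail authentication, which is why the damage is counted in chunks rather than bytes.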

Good news on the retries.

I look forward to the final score!

Here's a sorted list of the sizes for these 'bad password' files so far (21 files now, up from 19 as 2 more have cropped up since my last message): http://durval.com/xfer-only/20230306_rclone_bad_password_file_sizes.txt

So they are sized from ~9.5MB all the way to ~10GB, with about half of them over 1GB.

Yeah, it would be nice to have a look at these files and try to understand what happened. For all of them so far I have local copies so I can cmp -b the F'ed version against the good one and have an idea as to the extension/location of the damage.
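
As a complement to cmp -b, a small sketch like the one below could summarise the damage per 64 KiB block rather than per byte. It is only an illustration: diffBlocks and diffBytes are hypothetical helpers, and the demo in main runs on in-memory data instead of real files.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const blockSize = 64 * 1024

// diffBlocks reads both inputs in 64 KiB blocks and returns the indexes of
// the blocks that differ. If one input is shorter, comparison stops at the
// block where it ends.
func diffBlocks(good, bad io.Reader) ([]int64, error) {
	var differing []int64
	bufGood := make([]byte, blockSize)
	bufBad := make([]byte, blockSize)
	for i := int64(0); ; i++ {
		nGood, errGood := io.ReadFull(good, bufGood)
		nBad, errBad := io.ReadFull(bad, bufBad)
		endGood := errGood == io.EOF || errGood == io.ErrUnexpectedEOF
		endBad := errBad == io.EOF || errBad == io.ErrUnexpectedEOF
		if errGood != nil && !endGood {
			return nil, errGood
		}
		if errBad != nil && !endBad {
			return nil, errBad
		}
		if !bytes.Equal(bufGood[:nGood], bufBad[:nBad]) {
			differing = append(differing, i)
		}
		if endGood || endBad {
			return differing, nil
		}
	}
}

// diffBytes is a convenience wrapper for in-memory data.
func diffBytes(good, bad []byte) ([]int64, error) {
	return diffBlocks(bytes.NewReader(good), bytes.NewReader(bad))
}

func main() {
	good := bytes.Repeat([]byte{0xAB}, 3*blockSize+100)
	bad := append([]byte(nil), good...)
	for i := blockSize; i < 2*blockSize; i++ {
		bad[i] = 0 // simulate one block replaced by zeroes
	}
	blocks, err := diffBytes(good, bad)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("differing blocks:", blocks)
}
```

For real files, the two readers would come from os.Open on the known-good local copy and on the downloaded copy.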

If it is something like a video file then a 64k chunk lost will cause a minor visual artifact. If it is something less resilient then a 64k chunk missing might make the whole file useless.

Here's a | sort | uniq -c list of their extensions:

  1 avi
  2 gz
  1 JPG
  3 mkv
  5 MOV
  1 mp4
  1 pdf
  6 slob
  1 vmdk

So, 10 of them are video files which should be mostly viewable (as long as the error areas don't include any critical headers or markers within the files), and the others are less resilient formats which would be completely unusable if I didn't have local copies for them.

I will be sure to keep this topic posted!

And BTW, thanks again for all your great help, and for making rclone available in the first place.

I just ^C'ed stressapptest after a little over 67h running, and the result was:

^CLog: User exiting early (1409823927 seconds remaining)
Stats: Found 0 hardware incidents
Stats: Completed: 5766200320.00M in 241479.67s 23878.62MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 5766200320.00M at 23878.65MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: PASS - please verify no corrected errors

So, I think it's reasonably safe to assume the current machine isn't experiencing any sort of memory corruption.

This has a new flag --crypt-pass-bad-blocks which will output blocks which couldn't be authenticated as 64k of 0s.

It's possible that, if bytes have been added to or removed from the file, it will output an endless stream of errors, but hopefully it will just be a small number of blocks.

v1.62.0-beta.6770.65ab5b70c.fix-crypt-badblocks on branch fix-crypt-badblocks (uploaded in 15-30 mins)

Looks good to me :slight_smile:

Thanks, just downloaded versions for both machine architectures I use, and already tested the arm64 here and it seems to be working.

My 1st run, on the smallest of those files, doesn't look very promising: I got the "crypt: ignoring: failed to authenticate decrypted block - bad password?" message 49 times in a row -- which IIUC means 49 * 64K = ~3.1MB of the file is lost... and that on a ~9.5MB file :frowning: It doesn't look like the "small number of blocks" hypothesis is going to hold :frowning:
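
For reference, the arithmetic behind that estimate can be sketched as below. It gives an upper bound, assuming every failed block is a full 64 KiB block (the last block of a file can be shorter). lostBytes is a hypothetical helper name.

```go
package main

import "fmt"

const blockDataSize = 64 * 1024 // plaintext bytes per crypt block

// lostBytes returns an upper bound on the plaintext bytes lost when the
// given number of blocks fail authentication.
func lostBytes(badBlocks int64) int64 {
	return badBlocks * blockDataSize
}

func main() {
	n := lostBytes(49)
	fmt.Printf("%d bad blocks -> at most %d bytes (%.4f MiB)\n", 49, n, float64(n)/(1<<20))
}
```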

Will do more tests when able (treading carefully because I'm still waiting for that month-long rclone md5sum to finish running on this same remote) and will post the results here.

Thanks Again!

Ouch, 3 MB of corruption... This could be as little as 1 corrupted bit in each 64k block, but that is still a fair burst of errors.

At least it didn't have corruptions all the way to the end of the file.

:frowning: My thoughts exactly. This doesn't bode well for Google Drive's data integrity... :frowning:

As I have local copies of all the original files (at least so far, fingers crossed), I can do a comparison after all that's over with; my plan is, for each of those files:


  2. use rclone (...) copyto to copy my known-good local file to PATH/FILE.EXT

  3. use rclone cryptdecode --reverse to find the encrypted name for both files;

  4. download both files from the base (unencrypted) remote, and then compare them bit-by-bit to see exactly what the corruption is.

@ncw, what do you think? Would that work? The above depends on the same file being uploaded again to the same encrypted remote and generating the same encrypted content on the base (unencrypted) remote. Is that the case?

This isn't the case normally. Rclone will generate a random nonce (as it is known in cryptography) for each encryption, so each time a file is encrypted it is different. In fact it weakens the crypto if you re-use the nonce.

It is however possible to do exactly what you want, it would require a modified rclone though which you could say - read the nonce from this file before encrypting this other file with it. That could be a backend command quite easily though.
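
To illustrate the effect of the nonce, here is a minimal sketch. Note that it uses AES-256-GCM from the Go standard library as a stand-in, not the cipher rclone crypt actually uses, and the all-zero key is for demonstration only. The seal and randomNonce helpers are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts plaintext with AES-256-GCM under the given key and nonce.
func seal(key, nonce, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, fmt.Errorf("nonce must be %d bytes", gcm.NonceSize())
	}
	return gcm.Seal(nil, nonce, plaintext, nil), nil
}

// randomNonce returns n cryptographically random bytes.
func randomNonce(n int) ([]byte, error) {
	b := make([]byte, n)
	_, err := rand.Read(b)
	return b, err
}

func main() {
	key := make([]byte, 32) // demo key only, never use a fixed key for real data
	msg := []byte("same file contents")
	n1, err := randomNonce(12)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	n2, err := randomNonce(12)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	c1, _ := seal(key, n1, msg) // errors ignored: key and nonce sizes are valid here
	c2, _ := seal(key, n2, msg)
	c3, _ := seal(key, n1, msg)
	fmt.Println("fresh nonce, identical ciphertext: ", bytes.Equal(c1, c2))
	fmt.Println("reused nonce, identical ciphertext:", bytes.Equal(c1, c3))
}
```

With a fresh nonce, the same plaintext encrypts to different bytes each time, so two uploads of the same file cannot be compared on the base remote. Only when the nonce is reused does the output repeat, which is the property a fixed-nonce re-upload would rely on.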

Thanks for the explanation. Of course :man_facepalming:, the right thing to do is to use a random nonce.

That could be a backend command quite easily though

Or perhaps some option like --crypt-use-nonce-from CRYPT_REMOTE:PATH/FILE.

But I think it's a very specific case and I think I already abused my privilege as an rclone user too much to ask that of any developer, much less from you who has helped me so greatly in all of this.

But if the itch to dig deeper into this is bad enough, and I have the leisure for it, I might try implementing it and submitting a PR. Not sure this will be the case, tho -- I have my plate quite full at the moment, and my knowledge of Golang is still extremely superficial.


I am supposed to be getting the v1.62 release ready so trying not to get sidetracked with interesting crypto and data corruption problems!

It would certainly be easy to bodge it into rclone

You'd alter this bit of code

To enter a fixed nonce. You can read the nonce from the header of the file as detailed here: Crypt - it is bytes 8-31 inclusive in the file.
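
A minimal sketch of pulling the nonce out of that header, assuming the file starts with the 8-byte magic string RCLONE\x00\x00 as described in the crypt documentation. parseNonce and goLiteral are hypothetical helpers, and the demo header in main is synthetic.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	fileMagic     = "RCLONE\x00\x00" // bytes 0-7 of the encrypted file
	fileNonceSize = 24               // bytes 8-31 of the encrypted file
)

// parseNonce extracts the nonce from the first 32 bytes of an encrypted file.
func parseNonce(header []byte) ([fileNonceSize]byte, error) {
	var nonce [fileNonceSize]byte
	if len(header) < len(fileMagic)+fileNonceSize {
		return nonce, errors.New("header too short: need at least 32 bytes")
	}
	if string(header[:len(fileMagic)]) != fileMagic {
		return nonce, errors.New("bad magic: not an rclone crypt file")
	}
	copy(nonce[:], header[len(fileMagic):len(fileMagic)+fileNonceSize])
	return nonce, nil
}

// goLiteral formats the nonce so it can be pasted into Go source.
func goLiteral(nonce [fileNonceSize]byte) string {
	parts := make([]string, len(nonce))
	for i, b := range nonce {
		parts[i] = fmt.Sprintf("0x%02x", b)
	}
	return "nonce{" + strings.Join(parts, ", ") + "}"
}

func main() {
	// Synthetic header for the demo: magic followed by bytes 1..24.
	header := []byte(fileMagic)
	for i := 1; i <= fileNonceSize; i++ {
		header = append(header, byte(i))
	}
	nonce, err := parseNonce(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(goLiteral(nonce))
}
```

For a real file, the header would be the first 32 bytes of the encrypted file as downloaded from the base (unencrypted) remote, for example read with os.Open and io.ReadFull.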

PS We recently found a data corruption bug in onedrive which you may enjoy reading about here: Files are corrupted sometimes when uploaded with multipart uploads · Issue #1577 · OneDrive/onedrive-api-docs · GitHub

Many thanks again @ncw for the additional tips! :+1::+1::+1:

These will make things much much easier when I go down that path (which as of now I'm much more certain that I will).

Let me know if you need help!

I'd use a hex dump tool to read the nonce and just write the hex into the rclone code for first attempt.

Like this

diff --git a/backend/crypt/cipher.go b/backend/crypt/cipher.go
index ae1e62393..d10aa0872 100644
--- a/backend/crypt/cipher.go
+++ b/backend/crypt/cipher.go
@@ -671,6 +671,8 @@ type encrypter struct {
 	err      error
 }
 
+var myNonce = nonce{0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01}
+
 // newEncrypter creates a new file handle encrypting on the fly
 func (c *Cipher) newEncrypter(in io.Reader, nonce *nonce) (*encrypter, error) {
 	fh := &encrypter{
@@ -684,10 +686,11 @@ func (c *Cipher) newEncrypter(in io.Reader, nonce *nonce) (*encrypter, error) {
 	if nonce != nil {
 		fh.nonce = *nonce
 	} else {
-		err := fh.nonce.fromReader(c.cryptoRand)
-		if err != nil {
-			return nil, err
-		}
+		// err := fh.nonce.fromReader(c.cryptoRand)
+		// if err != nil {
+		// 	return nil, err
+		// }
+		fh.nonce = myNonce // use the fixed nonce instead of a random one
 	}
 	// Copy magic into buffer
 	copy(fh.buf, fileMagicBytes)

Good luck!

Now you practically did everything for me!:grinning:
Thanks again and I will keep you posted.

FYI It seems that Google is now limiting the maximum number of files on Google drive to 5 million. New Limit Unlocked on Google Drive - #43 by root0r

Thanks for the heads-up. A few thoughts have crossed my mind after reading this:

  1. The Big G moves one step further in the enshittification of its services; this will end up making Drive completely unusable for many of us.

  2. Case in point, I now have WELL OVER that (more than 13M files). So this new limit completely effs it up for my use case.

  3. Drive was one of the few services where Google was actually competitive (besides Search). And now Google is about to become non-competitive in Drive too, at least for us 'heavy users' who are more likely to pay for the service instead of just enjoying the free plan. I don't think Google will sell many more Workspace plans, and will probably end up actually losing more than just a few of their current customers.

  4. With ChatGPTized Bing moving onto Google Search (the only really competitive Google product so far), I fear that Google is not long for this world... therefore I would be wary of remaining with Google even if my use case fit within this new limit.

  5. As per 2) and 3), I think we will end up seeing a wave of users migrating from Drive to the alternatives, so those that don't have a solid business plan to withstand it will have to follow Google and enshittify their services accordingly, close up shop, or raise their prices significantly.

  6. as a consequence of all the above, the idea of spending the necessary money to buy the big HDDs necessary to start 'self-hosting' my own cloud backups at a friend's house looks more and more compelling....
