Does google drive support md5 checksums?


#1

Okay so I ran rclone check:

2018/02/10 07:14:21 NOTICE: Encrypted drive ‘gd:phone samsung s7’: 0 differences found
2018/02/10 07:14:21 NOTICE: Encrypted drive ‘gd:phone samsung s7’: 8 hashes could not be checked

Then I ran rclone move:

2018/02/10 07:17:44 DEBUG : download/AzRecorderFree sd/2017_10_17_21_18_55.mp4: Size of src and dst objects identical
2018/02/10 07:17:44 DEBUG : download/AzRecorderFree sd/2017_10_17_21_18_55.mp4: Unchanged skipping

I even used the --checksum flag for rclone move, yet it's telling me it only compared the sizes.
does this mean google drive doesn't support md5 anymore?

edit: my linux install does absolutely have “md5sum” so… what’s going wrong here?
edit2: nevermind found https://rclone.org/commands/rclone_cryptcheck/
is there a cryptmove?
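(For reference, cryptcheck compares a plaintext source against a crypt remote, roughly like this; the paths and remote names here are just placeholders for illustration:)

```shell
# Verify the unencrypted local files against the encrypted remote.
# 'gd:' stands in for the crypt remote wrapping Google Drive.
rclone cryptcheck /path/to/local/files gd:phone-backup -v
```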


#2

Yes, that is the important bit!

No, but I don't think one is needed - just use rclone move.


#3

rclone move doesn’t checksum if you’re moving to an encrypted remote though.
So you’d have to rclone cryptcheck then rclone move.
It's a very minor issue, but the fact that rclone cryptmove doesn't exist, and that rclone move doesn't verify checksums with encrypted remotes, confused me personally. It seems from my edits like I figured it out fast, but it was a few days (and TBs moved) without checksums before I even started to question the issue. (Although odds are date+size was sufficient, so it's not a big deal.)
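The two-step workflow I ended up with looks roughly like this (paths and remote names are placeholders):

```shell
# Step 1: verify the plaintext local files against the crypt remote
rclone cryptcheck /data/photos cryptremote:photos

# Step 2: move; size+date comparison is fine here since
# cryptcheck already verified the content
rclone move /data/photos cryptremote:photos -v
```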


#4

I see what you mean… Yes, cryptcheck is the right solution for the moment.


#5

In theory rclone move could gain a --cryptcheck flag, or this could just remain a fringe unaddressed issue, because it's pretty easy to cryptcheck and then move. However, that results in checking size and checksum, then checking date and size - so size gets checked twice. This is really bad for googledrive specifically, because googledrive HATES giving out lots of api requests.

Maybe move could gain a --date-only flag? That way rclone cryptcheck followed by rclone move --date-only would check size and checksum, then date, so at least each attribute is only checked once, not twice?


#6

Yes, it would be possible to do an integrity check the way cryptcheck does it for all local->remote rclone operations. It is quite expensive though, as the file has to be encrypted locally and the nonce has to be fetched from the remote.

I think it is complicated enough that I’m not going to implement it though - we have cryptcheck for that.

When you read a directory listing from google drive you get size, md5sum and date all at once, so I don't think it makes much difference to the number of API calls - you are still doing the listing twice either way.

--date-only is effectively the default if you don't pass --checksum or --size-only. So without those two flags, rclone will check (size, date) only.
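To spell out the comparison modes (paths and remote names are placeholders):

```shell
# Default: compare size and modification time
rclone move /data remote:backup

# Compare size only (cheapest)
rclone move /data remote:backup --size-only

# Compare size and checksum instead of size and modtime
# (falls back to size-only when a hash isn't available,
# e.g. plaintext hashes on a crypt remote)
rclone move /data remote:backup --checksum
```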


#7

I've found directory listings to be super laggy on googledrive, and I'm hitting the api-calls-per-minute limit so often that I was just hopeful for any suggestion to reduce it (sorry for the bad suggestion). It looks like you've already thought through those repercussions though, so that's good.


#8

So, cryptcheck is insanely slow on googledrive, not sure exactly why. But I realized: if I have the harddrive space available, then instead of rclone copy localfiles: cryptedremote:
I could rclone copy localfiles: localcrypt:
and then rclone move the undecrypted parent folder of localcrypt: to gdremote:crypt.
This would save massively on api requests to googledrive.

The files would end up being encrypted once, moved to googledrive once, and the hashes could be compared in their encrypted (undecrypted) form, where they're more readily available. I just figured I'd write back about this idea, since it took me this long to think of it.
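Sketched out, the idea looks something like this ('localcrypt:' is a crypt remote backed by a local directory; all paths and remote names are placeholders):

```shell
# 1. Encrypt locally: 'localcrypt:' writes encrypted files
#    into its backing directory, e.g. /mnt/staging
rclone copy /data/photos localcrypt:photos

# 2. Verify the encryption step while everything is still local
rclone cryptcheck /data/photos localcrypt:photos

# 3. Move the already-encrypted backing directory to the
#    plain (non-crypt) Google Drive remote, comparing md5s
#    of the encrypted data
rclone move /mnt/staging gd:crypt --checksum
```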

I'm not sure though why cryptcheck is so slow; it seems like cryptcheck is actually slower than a command like
rclone copy localdata: cryptedremote:
maybe as much as ten times slower. That doesn't really make sense to me - comparing the hashes should be about as fast as creating the encryption in the first place, shouldn't it?
Even if my estimate of ten times slower is off, even two or three times slower seems odd to me. Maybe I don't fully understand how cryptcheck works. Doesn't it just encrypt the local file in memory and then compare that hash to the remote hash? Is google slow to return their hashes? (No, that can't be it, because rclone move --checksum to googledrive is reasonably quick)…

edit1: This is probably normal, but I have noticed that cryptcheck gives -vv feedback like this:
filename OK
filename OK
filename2 OK
filename2 OK

I'm assuming it's supposed to list each filename twice, to confirm both the local and remote file have an OK hash? Although if it was just running each check twice over, that would explain why it's twice as slow (or slower) as creating the encrypted remote was in the first place.

edit3: nevermind, it really wasn't that slow, it was just slow to get started - maybe it did the files in a different order?


#9

cryptcheck is slow for two reasons:

  1. it has to download the nonce from the start of each file. (This will get a speed improvement quite soon once the proper range handling is in.)
  2. It has to encrypt the entire local file to create the hash to check.

Yes, that would work. You could then use rclone check localcrypt: gdremote:crypt, which will be nice and fast.
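For reference, a localcrypt remote like that might be configured along these lines in rclone.conf. The crypt password/salt must match the gdremote crypt's so the encrypted files are interchangeable; all values here are placeholders:

```ini
[localcrypt]
type = crypt
remote = /mnt/staging
# must match the gdremote crypt's settings exactly
# (rclone config stores these obscured, not plaintext)
password = OBSCURED_PASSWORD
password2 = OBSCURED_SALT
filename_encryption = standard
```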