Microsoft is switching OneDrive Personal to quickXorHash from SHA1

I have heard from Microsoft that they are soon to remove all hashes other than quickXorHash from onedrive personal and business.

This means that the SHA1 hash will be removed from OneDrive personal.

I don't think that this will affect rclone overly - syncing to and from local will still work just fine with hashes.

The only case this could affect users is if someone is relying on having SHA1 hashes on onedrive personal.

This might be

  • syncing from another backend which supports SHA1 to onedrive, e.g. B2 or Box <-> onedrive
  • storing the SHA1 hashes somewhere for comparison
  • perhaps using the hasher backend with SHA1

I don't think this will affect many users so I'd like to switch the default hash function to quickXorHash for onedrive personal for the next rclone release (v1.62). quickXorHash is already the hash function for onedrive business and sharepoint. In my testing all files on onedrive personal already have a quickXorHash.

If we can get the change into rclone before Microsoft implement their change then it will be easier on rclone users.

Any comments much appreciated.

@Cnly and @Ole you are both good at onedrive - any thoughts?

Interesting, I didn't find it mentioned anywhere else, and rclone's current behavior is correct according to these:
https://learn.microsoft.com/en-us/graph/api/resources/hashes
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/resources/hashes

The OneDrive engineer in this two-year-old reddit post, however, says OneDrive hashes may change over time and one should be prepared to handle whatever is available on a file-by-file basis:
https://www.reddit.com/r/onedrive/comments/j16357/ensure_that_files_uploaded_are_an_exact_copy_md5/

My knowledge of rclone's implementation/usage of hashes is, however, insufficient to say how best to handle a situation where the available hashes may vary (over time) between regions, accounts, and files within an account.

Can you disclose the exact source and wording of the information you received from Microsoft?

Interesting. Rclone isn't particularly good at changing hashes on the fly. Rclone will try to find a common hash between the source and destination if there are multiple possibilities. However once the transfer has started then the hash used is fixed.
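To make that selection rule concrete, here is a minimal Go sketch of picking one common hash up front. The HashType type, the preference list and commonHash are hypothetical names for illustration only; this is not rclone's actual code, and the ordering (SHA1 ahead of QuickXor) is an assumption.

```go
package main

import "fmt"

// HashType is a hypothetical stand-in for a hash type; not rclone's real API.
type HashType string

const (
	SHA1     HashType = "sha1"
	QuickXor HashType = "quickxor"
	None     HashType = "none"
)

// preference lists hash types from most to least preferred.
// Placing SHA1 first is an assumption made for this illustration.
var preference = []HashType{SHA1, QuickXor}

// commonHash returns the most preferred hash type advertised by both
// sides, or None if they have nothing in common. The result is chosen
// once, before the transfer, and is not revisited per file.
func commonHash(src, dst []HashType) HashType {
	has := func(set []HashType, t HashType) bool {
		for _, x := range set {
			if x == t {
				return true
			}
		}
		return false
	}
	for _, t := range preference {
		if has(src, t) && has(dst, t) {
			return t
		}
	}
	return None
}

func main() {
	local := []HashType{SHA1, QuickXor} // a side that can compute either hash
	sha1Only := []HashType{SHA1}        // e.g. a backend offering only SHA1
	quickXorOnly := []HashType{QuickXor}

	fmt.Println(commonHash(local, sha1Only))        // sha1
	fmt.Println(commonHash(local, quickXorOnly))    // quickxor
	fmt.Println(commonHash(sha1Only, quickXorOnly)) // none
}
```

The last case is the one that matters here: two remotes with no hash in common leave nothing to compare.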

The Microsoft product manager said that they are going to remove the SHA1 hash for content that lives in OneDrive Personal sometime around July 2023.

This means effectively that quickXorHash will be the only hash available. I can see why they want to do that - that will let them unify personal and business onedrive infrastructure and I suspect quickXorHash can be calculated for chunks and the hashes then combined to make the overall quickXorHash which is a property SHA1 does not have.
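As a rough illustration of why an XOR-based hash can be built up from chunks, here is a small Go sketch. It is an assumption-laden toy: the 160-bit width, the 11-bit shift per input byte and the length being XORed into the last 8 bytes reflect my understanding of the published QuickXorHash description, and the code has not been checked against rclone's implementation. All names (State, xorInto, combine, finish) are hypothetical.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

const (
	widthBits = 160           // size of the hash state in bits (assumed)
	shiftBits = 11            // each input byte lands 11 bits after the previous one (assumed)
	size      = widthBits / 8 // size of the hash state in bytes
)

// State is the XOR accumulator, stored as 20 bytes with bit 0 in byte 0.
type State [size]byte

// xorInto XORs data into s as if data began at byte offset off of the file.
func xorInto(s *State, data []byte, off int64) {
	for i, b := range data {
		pos := int(((off + int64(i)) * shiftBits) % widthBits)
		v := uint16(b) << uint(pos%8) // the byte may straddle two state bytes
		s[pos/8] ^= byte(v)
		s[(pos/8+1)%size] ^= byte(v >> 8) // wraps from the last byte to the first
	}
}

// combine merges the states of two chunks that were each hashed at
// their own file offset. XOR is order independent, so this just works.
func combine(a, b State) State {
	for i := range a {
		a[i] ^= b[i]
	}
	return a
}

// finish XORs the total length (little endian) into the last 8 bytes
// and returns the result as lowercase hex.
func finish(s State, length int64) string {
	var n [8]byte
	binary.LittleEndian.PutUint64(n[:], uint64(length))
	for i, b := range n {
		s[size-8+i] ^= b
	}
	return hex.EncodeToString(s[:])
}

func main() {
	data := []byte("hello onedrive, this is chunked hashing")

	var whole, first, second State
	xorInto(&whole, data, 0)
	xorInto(&first, data[:16], 0)    // first chunk at offset 0
	xorInto(&second, data[16:], 16)  // second chunk at offset 16

	// Both lines print the same hash.
	fmt.Println(finish(whole, int64(len(data))))
	fmt.Println(finish(combine(first, second), int64(len(data))))
}
```

The point is only that each chunk's contribution depends on its offset and nothing else, so chunk states can be XORed together in any order. SHA1 has no equivalent because each block depends on the state left by the previous one.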

We could make onedrive personal advertise multiple hashes: quickXorHash and SHA1. Rclone prefers SHA1 over quickXorHash, so if it has a choice it will choose SHA1. This means that at some point, if the objects no longer have SHA1 hashes, rclone will still choose SHA1 for the transfer, but the hashes themselves will be empty, so rclone will skip the hash check.

Another possibility might be to make it configurable which hash onedrive personal uses. The default could be quickXorHash but it could be changeable to SHA1 (or CRC-32 or SHA256 which are all currently supported, but I think won't be after the change).

I'm inclined to think this is the best option - make a --onedrive-hash-type option which defaults to quickXorHash but can be changed to SHA1 if the user needs SHA1 hashes in the few months while they are still available.

What do you think?

PS Note that quickXorHash is much faster than SHA1 in the beta since we had a contribution recently to speed it up

$ time rclone hashsum SHA1 /tmp/10G
a0b6e2ca4e28360a929943e8eb966f703a69dc44  10G

real	0m23.403s
user	0m19.180s
sys	0m6.197s

$ time rclone hashsum QuickXorHash /tmp/10G
0000000000000000000000000000008002000000  10G

real	0m4.384s
user	0m1.260s
sys	0m5.279s

I suspected so, and that makes good sense to keep things simple.

Agree, and your test furthermore indicates a reduction of more than a factor of 10 in the CPU time needed to calculate the hash (assuming the sys time is mainly OS calls to read data). So it sounds like an easy business decision.

I am pleased to see Microsoft's proactive change management and suggest Microsoft also add a small note/warning on the above two web pages to increase transparency and ease the transition. Can you relay that to the Microsoft Product Manager?

I don't understand the first option, but don't think that is important.

I think we should try to unify our OneDrive code just like Microsoft, so your second option sounds best to me and I suggest we make it apply for all types of OneDrive (Personal, Business, Sharepoint) to simplify code and documentation.

Perhaps we should allow --onedrive-hash-type to be set to any rclone hash type including "None" with functionality like --sftp-disable-hashcheck. It may come in handy if somebody has an account with a mixture of hashes where this is the only viable option. (We should probably omit irrelevant hashes like MD5 in the OneDrive docs.)

I can test (sometime next week) if you make a branch or beta.

PS: I am a little curious to see/test what happens for older versions of rclone (e.g. 1.61.1) when OneDrive Personal no longer supports sha1. Can that be simulated by changing this and/or this line?

I've had a go at that here (check the commit for the docs)

v1.62.0-beta.6741.d757edb83.fix-onedrive-hash on branch fix-onedrive-hash (uploaded in 15-30 mins)
This defaults all the onedrives to QuickXorHash and adds a new flag

--onedrive-hash-type

Specify the hash in use for the backend.

This specifies the hash type in use. The default hash is QuickXorHash.
Other hash types are being phased out in favour of QuickXorHash but
this option might be useful in the transition period.

Before rclone 1.62 an SHA1 hash was used by default for Onedrive
Personal. For 1.62 and later the default is to use a QuickXorHash for
all onedrive types. If an SHA1 hash is desired then set this option
accordingly.

This can be set to none to not use any hashes.

If the hash requested does not exist on the object, it will be
returned as an empty string which is treated as a missing hash by
rclone.

Properties:

  • Config: hash_type
  • Env Var: RCLONE_ONEDRIVE_HASH_TYPE
  • Type: string
  • Default: "quickxor"
  • Examples:
    • "quickxor"
      • QuickXor
    • "sha1"
      • SHA1
    • "sha256"
      • SHA256
    • "crc32"
      • CRC32
    • "none"
      • None - don't use any hashes

Great idea - I've done that too. You can actually set the hash type to any rclone supported hash but they will return "" if not found.
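To spell out what "return an empty string if not found" means for a comparison, here is a minimal Go sketch. The Verdict type and checkHashes function are hypothetical names for illustration and are not rclone's API; the only behaviour taken from the docs above is that an empty hash counts as missing.

```go
package main

import "fmt"

// Verdict is a hypothetical result type for a hash comparison.
type Verdict string

const (
	Skipped  Verdict = "skipped"  // at least one side has no hash
	Match    Verdict = "match"    // both hashes present and equal
	Mismatch Verdict = "mismatch" // both hashes present and different
)

// checkHashes compares two hex hashes, treating "" as a missing hash
// that cannot be checked rather than as a mismatch.
func checkHashes(src, dst string) Verdict {
	if src == "" || dst == "" {
		return Skipped
	}
	if src == dst {
		return Match
	}
	return Mismatch
}

func main() {
	sha1 := "a0b6e2ca4e28360a929943e8eb966f703a69dc44"
	fmt.Println(checkHashes(sha1, sha1)) // match
	fmt.Println(checkHashes(sha1, ""))   // skipped: destination has no SHA1
	fmt.Println(checkHashes(sha1, "0000000000000000000000000000000000000000")) // mismatch
}
```

With this rule a file that has lost its SHA1 is not reported as corrupt, it simply is not verified by hash.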

I would do this

diff --git a/backend/onedrive/onedrive.go b/backend/onedrive/onedrive.go
index e7674c019..e6a80469a 100644
--- a/backend/onedrive/onedrive.go
+++ b/backend/onedrive/onedrive.go
@@ -1768,6 +1768,7 @@ func (o *Object) rootPath() string {
 
 // Hash returns the SHA-1 of an object returning a lowercase hex string
 func (o *Object) Hash(ctx context.Context, t hash.Type) (string, error) {
+	return "", nil
 	if o.fs.driveType == driveTypePersonal {
 		if t == hash.SHA1 {
 			return o.sha1, nil

The branch/beta looks good and passed my tests without issues :tada:

My tests included a basic mount and some simple copy/syncs with sha1, quickxor, and none. I also tested the expected new ERRORS when using:

rclone sha1sum OneDrive:

I didn't test more advanced features like hasher etc.

During the test I unfortunately discovered many files in my OneDrive without a QuickXorHash :astonished:

I found them using this command:

rclone hashsum quickxor OneDrive: --onedrive-hash-type=quickxor > hashsums.txt

and then picked out all the lines starting with a blank (those are the files without a hash).
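That filtering step could be sketched in Go roughly as follows. It assumes, as described above, that a file without a hash shows up as a line starting with a blank; missingHashes is a hypothetical helper and the sample input is made up for illustration. File names that themselves begin or end with spaces would need more careful parsing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingHashes returns the file names from hashsum-style output whose
// hash column is blank, i.e. whose line starts with a space.
func missingHashes(output string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, " ") {
			names = append(names, strings.TrimSpace(line))
		}
	}
	return names
}

func main() {
	// Made-up sample: one file with a hash, one without.
	sample := "0000000000000000000000000000008002000000  10G\n" +
		strings.Repeat(" ", 40) + "  helloWorld.docx\n"

	for _, name := range missingHashes(sample) {
		fmt.Println(name) // helloWorld.docx
	}
}
```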

It seems like the QuickXorHash is missing on all/most of the files that I have viewed/edited in Microsoft Office (Word, Excel,...) from my (native) OneDrive folder in the past 2-3 years - older files that I haven't viewed/touched do have a QuickXorHash. Other files (e.g. photos) also have a QuickXorHash.

I suspect this happens because Office some time ago (apparently) started opening files directly from the web when I open them from the local OneDrive folder, probably to allow for file collaboration - and in this situation the Office suite apparently doesn't trigger calculation of the QuickXorHash.

This can easily be reproduced by creating/modifying an Office file in the OneDrive web site (using the Word Online web app).

I am not sure how to best handle this situation. Perhaps we should postpone the change until Microsoft has implemented QuickXorHash for files edited by the Office apps and calculated QuickXorHash for all the Office files already in OneDrive Personal.

Does your contact know the timeline for that?

Is a QuickXorHash calculated if you create a small helloWorld.docx from a OneDrive Business web site?
(I select New/WordDocument in the OneDrive web site, then select the file, then right click and select Open/OpenInWordOnline)

Good news!

Well spotted.

I looked through my onedrive and I found one file without a QuickXorHash - also a word document.

That has never been edited by the Office suite though (I don't use Windows) - I think I made it online though so probably the same root cause.

I have asked.

I created a simple word document on OneDrive Business and it did have a QuickXor hash. I did exactly the same on OneDrive Personal and it did not have a QuickXorHash.

Perhaps Microsoft want to move away from having hashes on docs (Google don't have hashes on their online docs) - this would give them more freedom to change the architecture.

I'd like to find out which way the wind is blowing for QuickXorHash on documents.

I'd like to merge at minimum the --onedrive-hash-type flag, with the default kept as SHA1 for onedrive personal. That at least gives users the ability to cope with missing SHA1.

Would it make sense to add quickXorHash option to chunker remote to accommodate these changes for onedrive backend?

Hmm, chunker - hadn't thought about that.

If you are using chunker with SHA1 on a onedrive personal backend then this will affect you.

We could add quickxor support to chunker or you could follow this advice from the docs

If your storage backend does not support MD5 or SHA1 but you need consistent file hashing, configure chunker with md5all or sha1all. These two modes guarantee given hash for all files. If wrapped remote doesn't support it, chunker will then add metadata to all files, even small. However, this can double the amount of small files in storage and incur additional service charges.

Yes, you are right - that is exactly why I think quickXorHash support in chunker would make sense when OneDrive moves to it.

Agree, this is a good plan with the current knowledge. I doubt Microsoft can give a clear answer before 1.62 is due.

I split that commit into two - one to implement --onedrive-hash-type with the current defaults and one to change the defaults.

Here are both of them together

v1.62.0-beta.6754.d748b8ea7.fix-onedrive-hash on branch fix-onedrive-hash (uploaded in 15-30 mins)

If you want just the --onedrive-hash-type change and not the default change, you'll need to check out 312d440491c898c1725ff659871b1e18cfc206ee and build that.

I'll wait until I hear back from Microsoft before making the final decision on whether to merge one or both of the patches.

Looks like Microsoft has updated their docs:

In Remarks:
" Note starting from July 2023 quickXorHash will be the only available property for both OneDrive for Business and OneDriver Personal. Everything else mentioned below is still valid until that date."

So it is kind of confirmed.

I use chunker on onedrive with the sha1all option, but since onedrive currently supports SHA1, no metadata files are created for files smaller than the chunk size. I know that in theory I could force it with a massive chunk setup, but I actually need chunking - it is a union of multiple onedrive accounts where I also store files bigger than the onedrive limit.

Now I am wondering what will happen in July - only my chunked files will have SHA1 hashes... the rest will have QuickXorHash. Which hash will sync use?

@kapitainsky well spotted. I think that is a solid argument for making the default QuickXor for 1.62

I've merged this change to master now which means it will be in the latest beta in 15-30 minutes and released in v1.62

Now that Microsoft have publicly committed to a date for the changeover I think setting rclone to use QuickXor as the default for all OneDrive types is the right thing to do.

We have about 1 week until the 1.62 release so we can change our minds up until then :slight_smile:

Why not just use the existing --hash-type?

There isn't a global flag --hash-type?

You're right. (of course!). I've been spending a lot of time in lsjson for a project so I had that on my mind.

With that said, would it make sense to have one? There are a few backends that support multiple (notably local and sftp).