OneDrive sync never completes with SHA-1 corrupted transfer with latest rclone

You can make this function return early to test that.

I keep wondering whether rclone should be doing this at all though. Maybe it should be a flag?

@ncw I would (naively) expect the file to be visible in the OneDrive recycle bin (or in the version history).

Is it deleted before being "committed" - or am I missing something?

It is deleted after it is committed so should be in the recycle bin or version history if that is how you've got things configured.

@sweh Based on the above, it seems you will be able to find the "corrupted" files in your OneDrive recycle bin using the web interface - unless you have a business account with a hard delete setting that I am unaware of.

You can probably narrow the search significantly because you know the exact time of deletion from your rclone log. I guess your filenames are encrypted, so you may need rclone cryptdecode to pinpoint the right file(s).

I'm not encrypting filenames for this remote 'cos I wanted it to be easy to restore files in case of an issue. So I can find the deleted object easily enough. But I need to be careful 'cos a restore could overwrite the (now correctly sync'd) good copy of the file.

OK, I now have a mess in my backup area :slight_smile:

$ rclone ls OneDriveBackups:penfold | grep 20220403._Fast.*ag
5368709120 20220403._FastData.0.gz_ag
5368709120 20220403._FastData.0.gz_ag (1)
5368709120 20220403._FastData.0.gz_ag (2)
5368709120 20220403._FastData.0.gz_ag 1

Let's copy a couple of them... Start with the known good one

$ rclone copy OneDriveBackups:penfold/20220403._FastData.0.gz_ag . 
$ sha1sum 20220403._FastData.0.gz_ag /BACKUP/penfold/20220403._FastData.0.gz_ag
3b6d0ccff3720afab3cf95a31109ab7617e1e79b  20220403._FastData.0.gz_ag
3b6d0ccff3720afab3cf95a31109ab7617e1e79b  /BACKUP/penfold/20220403._FastData.0.gz_ag

Good.

Let's try a "bad" one...

Well, that's interesting!

$ rclone copy OneDriveBackups:penfold/20220403._FastData.0.gz_ag\ 1 .
2022/04/04 15:04:54 ERROR : 20220403._FastData.0.gz_ag 1: Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:04:54 ERROR : Attempt 1/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:07:13 ERROR : 20220403._FastData.0.gz_ag 1: Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:07:13 ERROR : Attempt 2/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:08:26 ERROR : 20220403._FastData.0.gz_ag 1: Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:08:26 ERROR : Attempt 3/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:08:26 Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?

Let's try another of the bad files!

$ rclone copy OneDriveBackups:penfold/20220403._FastData.0.gz_ag\ \(1\) .
2022/04/04 15:12:14 ERROR : 20220403._FastData.0.gz_ag (1): Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:14 ERROR : Attempt 1/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:27 ERROR : 20220403._FastData.0.gz_ag (1): Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:27 ERROR : Attempt 2/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:44 ERROR : 20220403._FastData.0.gz_ag (1): Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:44 ERROR : Attempt 3/3 failed with 1 errors and: multipart copy: read failed: failed to authenticate decrypted block - bad password?
2022/04/04 15:12:44 Failed to copy: multipart copy: read failed: failed to authenticate decrypted block - bad password?

And the files are definitely not complete:

$ ls -l
total 8210492
-rw-r--r-- 1 sweh sweh 5368709120 Apr  3 03:09 20220403._FastData.0.gz_ag
-rw-r--r-- 1 sweh sweh 4223336448 Apr  4 15:12 20220403._FastData.0.gz_ag (1)
-rw-r--r-- 1 sweh sweh 4084137984 Apr  4 15:08 20220403._FastData.0.gz_ag 1

So the files are definitely corrupt in OneDrive!

Great testing - thank you.

So it looks like there is a corruption in the file such that crypt found a bad block. (The authenticator is very strong in crypt blocks.)

Can you download the encrypted file which should download OK (I conjecture its SHA1 will match the data), and then see if you can see what the corruption looks like? This will probably be quite difficult, but you have the offset of where it starts as it is the size of the file before it failed. Hopefully it will be ASCII text saying "Onedrive destroyed the data here", but any data which doesn't look completely random will be the corruption.

I guess there could just be data missing which will be impossible to spot.
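
As a rough aid for that search, here is a minimal sketch that maps a plaintext offset (such as the size a partial download reached) to the crypt block containing it in the encrypted file. It assumes the layout figures quoted later in this thread (a 0x20 byte header followed by blocks of 64KiB of data plus 16 bytes); the function names are made up for illustration. Treat the result only as a starting point: the download errors mention a multipart copy, so the partial file size may not be the exact place where decryption failed.

```shell
#!/bin/sh
# Assumed layout (figures quoted in this thread, not read from rclone itself):
HEADER=32              # 0x20 bytes before the first block
DATA=65536             # 64KiB of file data per block
BLOCK=$((DATA + 16))   # plus 16 bytes of authenticator = 0x10010

# crypt_block_index PLAINTEXT_OFFSET -> number of the block holding that byte
crypt_block_index() {
    echo $(( $1 / DATA ))
}

# crypt_block_start PLAINTEXT_OFFSET -> offset in the encrypted file at which
# that block begins
crypt_block_start() {
    echo $(( HEADER + ($1 / DATA) * BLOCK ))
}

# Example: the partial copy of "... 1" stopped at 4084137984 bytes.
# crypt_block_index 4084137984
# crypt_block_start 4084137984
```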

So the question is, is there any way rclone could have caused this corruption, or are we sure it is a OneDrive bug?

Do we know whether these files were uploaded straight through with no low level retries?

$ rclone copy OneDriveSweharris2:Backups/penfold/20220403._FastData.0.gz_ag\ 1.bin .
$ rclone copy OneDriveSweharris2:Backups/penfold/20220403._FastData.0.gz_ag\ \(1\).bin .
$ rclone copy OneDriveSweharris2:Backups/penfold/20220403._FastData.0.gz_ag\ \(2\).bin .
$ sha1sum *
5008bfd43fa526566d221897d1d6bf7cbf2096f2  20220403._FastData.0.gz_ag (1).bin
c33bffe167ee08f8d85d621dce5d776387a41745  20220403._FastData.0.gz_ag (2).bin
1741774b18a839226697804c6b7e4186ac6ccbdb  20220403._FastData.0.gz_ag 1.bin

We can see these match the second hash in each of the errors:

2022/04/03 13:24:55 ERROR : penfold/20220403._FastData.0.gz_ag: Failed to copy: corrupted on transfer: sha1 crypted hash differ "fe9c8b9ebaf512cac3a8ef74bcf5b5a5e28981ab" vs "5008bfd43fa526566d221897d1d6bf7cbf2096f2"

2022/04/03 12:45:19 ERROR : penfold/20220403._FastData.0.gz_ag: Failed to copy: corrupted on transfer: sha1 crypted hash differ "e43bee9ac8c35a64b7cc3c1ad62ac3014b12841c" vs "c33bffe167ee08f8d85d621dce5d776387a41745"

2022/04/03 13:35:59 ERROR : penfold/20220403._FastData.0.gz_ag: Failed to copy: corrupted on transfer: sha1 crypted hash differ "d976cd211051bbb71961716aa6786f98c6bfcd71" vs "1741774b18a839226697804c6b7e4186ac6ccbdb"

So Microsoft's hashes appear to be correct.
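
For anyone wanting to repeat that comparison over a longer log, a small sketch follows. It assumes the error lines look exactly like the ones quoted above, and it pulls out the second quoted hash, which in this thread is the one that matched the downloaded .bin files. The function name is hypothetical.

```shell
#!/bin/sh
# remote_hash_from_log: read rclone log lines on stdin and print the second
# quoted hash of every "hash differ" error (one per line).
remote_hash_from_log() {
    sed -n 's/.*hash differ "[0-9a-f]*" vs "\([0-9a-f]*\)".*/\1/p'
}

# Possible usage (rclone.log is a placeholder name):
#   remote_hash_from_log < rclone.log | sort -u
#   sha1sum *.bin
# and then compare the two lists by eye.
```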

The files are all of the right size (including the "good" version for comparison).

-rw-r--r-- 1 sweh sweh 5370019872 Apr  3 03:09 20220403._FastData.0.gz_ag (1).bin
-rw-r--r-- 1 sweh sweh 5370019872 Apr  3 03:09 20220403._FastData.0.gz_ag (2).bin
-rw-r--r-- 1 sweh sweh 5370019872 Apr  3 03:09 20220403._FastData.0.gz_ag 1.bin
-rw-r--r-- 1 sweh sweh 5370019872 Apr  3 03:09 20220403._FastData.0.gz_ag.bin

So I ran strings on 20220403._FastData.0.gz_ag 1.bin and looked for any string more than 30 characters long (grep .....). And... well...

C6jzLl8yC0Necw7InUZ9k9vBtR/nwXJd3BPxqpN1mBZ1vY88z2XEnyGn9kbcDxPdLXUIr9OwICiNcf3dWPiPZ1/klN/AvP8n28
....
....
....
ps7zhrsWCdht11NRF2gLHqunFhPL5Fwhm6kbKGwg39e24TZQ++q2xjVIf5pRdeX8FNt7KQceFNZ/79K+G  

That block is around 6934 bytes long all in one long line. It was at offset 248199391.

Similarly 20220403._FastData.0.gz_ag (1).bin had, at offset 16753810, a chunk of around 9961 characters, and 20220403._FastData.0.gz_ag (2).bin had, at offset 252537251, a chunk of 5860 bytes.

I have no idea if there were low level retries. But this looks like it might either have been doubly-encoded base64 by rclone, or else not decoded by OneDrive.
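
A possible way to find such runs together with their byte offsets in the raw file is sketched below. It is not the command used above; it assumes GNU grep (for -b combined with -o) and treats any run of base64-alphabet characters as suspicious. The function name and the default minimum length of 30 are illustrative choices.

```shell
#!/bin/sh
# find_base64_runs FILE [MIN_LEN]
# Print "byte_offset:length" for every run of at least MIN_LEN characters
# from the base64 alphabet. Offsets are 0-based positions in FILE itself.
find_base64_runs() {
    LC_ALL=C grep -aboE "[A-Za-z0-9+/]{${2:-30},}" "$1" |
        awk -F: '{ print $1 ":" length($2) }'
}

# Possible usage:
#   find_base64_runs "20220403._FastData.0.gz_ag 1.bin" 30
```

Note that grep works line by line, so a run that happened to contain a newline byte would be reported as two shorter runs.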

That was very smart :slight_smile:

There is approximately 0 chance of that occurring in crypted data (well, about 1 in 10^4174) so that is definitely a corruption.
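
That figure can be sanity checked with a quick calculation. The sketch below assumes the crypted bytes are uniformly random and independent, that the run is 6934 bytes long (the length reported above), and that the alphabet has 64 characters, so each byte has a 64 in 256 chance of landing in it. The function name is made up.

```shell
#!/bin/sh
# base64_run_odds_exponent LENGTH
# Print x such that the chance of LENGTH consecutive random bytes all being
# base64-alphabet characters is about 1 in 10^x, i.e. x = LENGTH * log10(256/64).
base64_run_odds_exponent() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.1f\n", n * log(256 / 64) / log(10) }'
}

base64_run_odds_exponent 6934
```

Under those assumptions this prints 4174.7, which agrees with the order of magnitude quoted.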

It would be worth looking with a hex editor around that point to see if you can see any other clues - maybe there are some shorter strings like HTTP headers or something like that.

I think we ascertained above that we don't need low level retries to trigger this problem.

Rclone doesn't encode the data when sending to OneDrive - it just sends plain binary data, so I'm struggling to understand where that base64 string comes from.

OneDrive uses big base64 blobs for tokens, and it also uses them for path names when uploading, hence my request to look around the corruption and see what you can see.

The good news is that OneDrive's SHA-1s are correct, which means that rclone successfully detected a corruption in transfer and retried it.

Correct, based on the logs in the post above:

rclone_encrypted.log contains 2 corruptions prior to any low level retries. (encryption, --transfers=1, corrupted file is 134 MB)

rclone_unencrytped.log contains several corruptions, and some low level retries in between. The low level retries do not affect the corrupted files. (no encryption, --transfers=1, corrupted files are 134 MB)

@Malz may be able to recover the corrupted files from the OneDrive recycle bin, if needed.

(Doh; ignore the "offset" values I mentioned last time... they're offsets into the strings output, not offsets into the raw file. Doh! But the rest of the report is accurate :-))

To make it manageable I split the 5GB file into 1MB chunks and found the chunk containing the corruption.

Unfortunately it doesn't look like there's anything interesting.

Here's a hex-dump of that area for the "1.bin" file (chunk 3419, so it starts 3419MB into the main file).

000d3fd0  FF 98 6E 21 12 2A 50 56 CD 1A DC C8 04 A0 49 1C   ..n!.*PV......I.                      
000d3fe0  F6 CC F6 09 28 52 2B 1C 50 81 B5 B5 18 2B B5 EC   ....(R+.P....+..                      
000d3ff0  49 78 C8 E0 F3 DF BC 43 36 6A 7A 4C 6C 38 79 43   Ix.....C6jzLl8yC                      
000d4000  30 4E 65 63 77 37 49 6E 55 5A 39 6B 39 76 42 74   0Necw7InUZ9k9vBt                      
000d4010  52 2F 6E 77 58 4A 64 33 42 50 78 71 70 4E 31 6D   R/nwXJd3BPxqpN1m  
....
000d5af0  78 6A 56 49 66 35 70 52 64 65 58 38 46 4E 74 37   xjVIf5pRdeX8FNt7
000d5b00  4B 51 63 65 46 4E 5A 2F 37 39 4B 2B 47 B2 64 F0   KQceFNZ/79K+G.d.
000d5b10  A0 02 42 25 B4 7E 4D 45 6C A9 34 FF B5 AC CA 21   ..B%.~MEl.4....!
000d5b20  33 B8 51 C0 6B 99 C0 32 3E 9A 3D 49 82 CB B7 ED   3.Q.k..2>.=I....
000d5b30  3B 63 64 55 5B DF 1B 68 BD D2 5B EA 12 91 56 E7   ;cdU[..h..[...V.

There's no other readable string in that whole chunk (nor chunks 3418 or 3420).
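
For anyone who wants to repeat the chunking step, it could be done roughly as follows. This is a sketch and not necessarily the commands used above; it assumes GNU split (for -d) and GNU grep, and the function names and the chunk. prefix are invented for illustration. With -d the numbering starts at 0, so a piece numbered 3419 begins 3419 chunks into the file.

```shell
#!/bin/sh
# split_into_chunks FILE OUTDIR
# Write FILE as 1048576-byte pieces named chunk.0000, chunk.0001, ... in OUTDIR.
split_into_chunks() {
    mkdir -p "$2" && split -b 1048576 -d -a 4 "$1" "$2/chunk."
}

# chunks_with_base64 OUTDIR [MIN_LEN]
# List the pieces containing a run of at least MIN_LEN base64-alphabet
# characters (default 30).
chunks_with_base64() {
    LC_ALL=C grep -alE "[A-Za-z0-9+/]{${2:-30},}" "$1"/chunk.*
}

# Possible usage:
#   split_into_chunks "20220403._FastData.0.gz_ag 1.bin" chunks
#   chunks_with_base64 chunks 30
```

A run that straddles a chunk boundary would be cut in two, so the pieces on either side of a hit are worth checking as well.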

I'm assuming you broke the file into 1 MiB chunks ie each 1048576 bytes long... so chunk 3419 will start at 0x356c00

The corruption appears to start at 0xd3ff7 and end at 0xd5b0c inclusive so it is 6934 bytes long or 0x1b16. In absolute terms it starts at 0x42abf7 and ends at 0x42c70c

If this was an rclone corruption I'd expect it to be one of

  1. lined up with the crypt blocks which are 64KiB +16 = 0x10010 bytes long, starting at 0x20. That corruption is in the middle of a block, neither at the start nor at the end.

  2. lined up with the upload block size which is 320KiB for onedrive. The corruption is about 1/3 of the way through a 320KiB block.

So nothing which points any fingers at rclone internals unless I got my maths wrong! I can't think where a pile of base64 data came from either.

I guess the next thing to do would be to either find a bug which seems relevant or report a bug here: Issues · OneDrive/onedrive-api-docs · GitHub - I've had reasonable success reporting bugs and Microsoft fixing them here. I didn't find a relevant one but there may be one somewhere!

If we are going to report a bug then we need a reasonably reliable way of reproducing, preferably as simple as possible.

What is the smallest file we've seen the corruptions on? 134,217,728 seems to be it? I've made a script to upload a file continuously of that size until it breaks - let's see if I can reproduce!

Here is the script if anyone else wants to have a go

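#!/bin/bash
# Upload a fresh random file over and over until rclone reports an error.

size=134217728                     # smallest size seen to corrupt so far
destination=TestOneDrive:thrashfiles/

for round in $(seq 100); do
    name="test-${round}-${RANDOM}${RANDOM}${RANDOM}.bin"
    echo
    echo "--------------- $(date -Is) - round $round - $name ------------------"
    echo

    # Create exactly $size bytes of random data and record its SHA-1.
    dd if=/dev/urandom of="$name" bs=1M count=$((size / 1048576 + 1))
    truncate -s "$size" "$name"
    sha1sum "$name"

    # No retries, so the first corrupted transfer shows up as a failure.
    rclone --low-level-retries 1 --retries 1 -vv --dump responses copy "${name}" "${destination}"
    error=$?

    if [ "$error" -ne 0 ]; then
        # Keep the local file so the failing data can be examined.
        echo "ERROR $error on $name"
    else
        rm "$name"
        rclone -v deletefile "${destination}${name}"
    fi

    sleep 5
done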

I managed to reproduce the problem quite easily with that script - it only took 14 tries!

I shall compose a bug report and I'll post the URL here when I'm done.

Your maths were wrong :slight_smile:

dd if=20220403._FastData.0.gz_ag\ 1.bin bs=1 skip=3585949687 | hdump | head -10
00000000  43 36 6A 7A 4C 6C 38 79 43 30 4E 65 63 77 37 49   C6jzLl8yC0Necw7I
00000010  6E 55 5A 39 6B 39 76 42 74 52 2F 6E 77 58 4A 64   nUZ9k9vBtR/nwXJd
00000020  33 42 50 78 71 70 4E 31 6D 42 5A 31 76 59 38 38   3BPxqpN1mBZ1vY88
00000030  7A 32 58 45 6E 79 47 6E 39 6B 62 63 44 78 50 64   z2XEnyGn9kbcDxPd
00000040  4C 58 55 49 72 39 4F 77 49 43 69 4E 63 66 33 64   LXUIr9OwICiNcf3d
00000050  57 50 69 50 5A 31 2F 6B 6C 4E 2F 41 76 50 38 6E   WPiPZ1/klN/AvP8n
00000060  32 38 65 4A 4F 44 43 6E 48 45 69 4C 4F 4B 46 36   28eJODCnHEiLOKF6
00000070  5A 59 38 32 31 73 74 64 50 71 55 4C 4F 43 39 76   ZY821stdPqULOC9v
00000080  32 36 56 75 31 42 55 76 72 72 63 71 6F 4F 38 58   26Vu1BUvrrcqoO8X
00000090  31 48 52 4B 50 51 43 6A 6D 66 75 4A 51 76 52 74   1HRKPQCjmfuJQvRt

So the absolute offset was 3585949687 or 0xd5bd3ff7.

But I think the rest of the analysis was correct, anyway :slight_smile:
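
The corrected arithmetic can be re-checked with a few lines of shell. This is a sketch using only figures from this thread: 1 MiB chunks, chunk 3419, the corruption running from 0xd3ff7 to 0xd5b0c inside that chunk, crypt blocks of 0x10010 bytes starting at 0x20, and 320KiB upload blocks. The function names are made up.

```shell
#!/bin/sh
MIB=1048576
CRYPT_HEADER=32                # 0x20
CRYPT_BLOCK=$((65536 + 16))    # 0x10010
UPLOAD_BLOCK=$((320 * 1024))   # 320KiB

# abs_offset CHUNK_NUMBER OFFSET_IN_CHUNK -> offset in the whole file
abs_offset() {
    echo $(( $1 * MIB + $2 ))
}

# pos_in_crypt_block OFFSET -> position of OFFSET inside its crypt block
pos_in_crypt_block() {
    echo $(( ($1 - CRYPT_HEADER) % CRYPT_BLOCK ))
}

# pos_in_upload_block OFFSET -> position of OFFSET inside its upload block
pos_in_upload_block() {
    echo $(( $1 % UPLOAD_BLOCK ))
}

start=$(abs_offset 3419 0xd3ff7)
end=$(abs_offset 3419 0xd5b0c)
printf 'start=%d (0x%x) length=%d\n' "$start" "$start" $((end - start + 1))
printf 'position in crypt block:  %d of %d\n' "$(pos_in_crypt_block "$start")" "$CRYPT_BLOCK"
printf 'position in upload block: %d of %d\n' "$(pos_in_upload_block "$start")" "$UPLOAD_BLOCK"
```

With these inputs the start comes out as 3585949687 (0xd5bd3ff7) and the length as 6934. Neither position is 0, so on these assumptions the corruption does not begin on a crypt block or a 320KiB upload block boundary.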

Here is the issue I've made: Files are corrupted sometimes when uploaded with multipart uploads · Issue #1577 · OneDrive/onedrive-api-docs · GitHub

Just wanted to say thank you to everyone looking into this issue.

It's well beyond my expertise at this point so I have nothing to add beyond the fact that I appreciate all the effort being put in here to diagnose a (likely) Microsoft bug.

(Firstly - apologies for the deleted posts above!)

I've also been seeing these types of errors when syncing about 15 x MP4 files from my Linux machine to OneDrive. Each file is between 2.5GB and 3.7GB in size.

I decided to try a few different ways of copying one of the files that had been failing with rsync.

The first attempt was by uploading the file via the OneDrive website (via a Windows 11 machine - where the sha1sum was checked and correct). This seemed to work, but when checking the sha1sum via rclone the files were different. I then downloaded the file back to my Windows 11 machine and the sha1sum was again incorrect (but the same as the rclone sha1sum).

I checked the suspect downloaded MP4 file with ffmpeg, which did indeed find errors. Comparing it with the original file using 'cmp' also found differences. Doing similar checks with hexdump as in a previous post showed a definite difference in the 'look' of the data at the point indicated by cmp.
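
For reference, a comparison along those lines could look like the sketch below. It is not the exact commands used in this post; it relies on cmp -l, which lists every differing byte with a 1-based position, and the function names are invented for illustration.

```shell
#!/bin/sh
# first_diff_offset FILE_A FILE_B
# Print the 0-based offset of the first differing byte. Prints nothing if the
# files agree over their common length.
first_diff_offset() {
    byte=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{ print $1 }')
    if [ -n "$byte" ]; then
        echo $((byte - 1))
    fi
}

# count_diff_bytes FILE_A FILE_B -> number of byte positions that differ
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Possible usage (file names are placeholders):
#   first_diff_offset original.mp4 downloaded.mp4
#   count_diff_bytes original.mp4 downloaded.mp4
```

The offset from first_diff_offset is where a hexdump of both files is worth comparing, and the count gives a rough idea of how large the damaged region is.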

I am now attempting to let the native Windows OneDrive program sync a copy of the file. I'm not seeing any errors, but so far it has not been successful, and it seems to be on its third attempt now.

So to me this definitely looks like a OneDrive issue, and somewhat concerning that I managed to upload a file to OneDrive that seemed to succeed but was in fact corrupt.

Hopefully Microsoft will come back to you on the bug that was raised with them.

If I'm understanding you correctly then you managed to upload a file via the OneDrive website (and not via rclone) which turned out to be corrupted.

If that is the case that is important confirmation that this isn't an rclone bug.

I will try this myself. Unfortunately my upload bandwidth is very poor so this will take some time!

If you have a specific command, I can run it for you.

Yes - that is correct. Uploaded via the OneDrive website - not via rclone.

OK here is something that you could try

First run this to create some test files. Feel free to create more or larger files.

rclone test makefiles --files 100 --min-file-size 128M --max-file-size 256M /path/to/testdir

Then upload the /path/to/testdir folder to OneDrive using the web interface.

Then use rclone check /path/to/testdir onedrive:testdir to see if you got any corruptions. If not, repeat!