OneDrive sync never completes with SHA-1 corrupted transfer with latest rclone

Thanks for the information.

The backup that ran last night completed with zero errors. That's the first time it's done that in weeks.

2 Likes

Perfect, I am sure Brad/Microsoft (ificator at GitHub) would be happy if you also shared this on the GitHub issue.

1 Like

Hi there, I am still experiencing similar problems. Could you let me know how to fix the problems? Thanks.

rclone copy /data/WtyCuWAw/ onedrive:/backups/ -P

rclone v1.58.0

  • os/version: debian 9.13
  • os/kernel: 4.14.150-odroidxu4 (armv7l)
  • os/type: linux
  • os/arch: arm
  • go/version: go1.17.8
  • go/linking: static
  • go/tags: none

Hi klemperer, welcome to the forum!

The issue discussed in this thread was solved by Microsoft on April 7th, so I doubt you are seeing the same issue, but let’s check to be sure. To do so we need the additional information mentioned in this guide:

Thanks for your reply but I still have two more questions:

  1. As you said, the issue was solved several days ago, but I was seeing the same error today, and not all files were corrupted during transfer. So, does this issue happen randomly across accounts and files?

  2. Following your instructions, how can I provide the information to you? The original file might be huge.

Thanks again.

As the corrupted file was removed by rclone, the second step ("sha1sum") outputs the following: "Failed to sha1sum with 2 errors: last error was: directory not found". Is that the expected information?

Yes, it happened randomly across accounts and files.

Just post the requested information in this thread. There is no need to share the original file, we just need to know that you have access to both the original file (on your local drive) and failed file (in your OneDrive recycle bin), then we can instruct you to do the necessary compares.

No, I suspect you tried something like this:

rclone sha1sum onedrive:backups/somefolder/somefile --download

where I was trying to get you to do something like this:

rclone sha1sum /data/WtyCuWAw/somefolder/somefile --download

where somefolder/somefile should be replaced with the information seen in your rclone output/log.

If this fails too, then please post the requested extract from the log and the exact command you tried to execute.
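If the log contains many of these errors, a quick way to list just the affected paths is to grep for the corruption message. A small sketch (rclone.log and the paths below are stand-ins, not your real data):

```shell
# Fake log lines in rclone's format, standing in for your real log file.
cat > rclone.log <<'EOF'
2022/04/14 12:55:28 ERROR : somefolder/somefile: corrupted on transfer: sha1 hash differ "aaa" vs "bbb"
2022/04/14 12:55:28 INFO : somefolder/somefile: Removing failed copy
EOF

# Keep only the corruption errors and strip everything but the file path.
grep 'corrupted on transfer' rclone.log \
  | sed 's/.* ERROR : \(.*\): corrupted on transfer.*/\1/'
# -> somefolder/somefile
```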

Step 1:

2022/04/14 12:55:28 ERROR : bWYwkKxuiAgQTublEY8oUA/yRy59yFoJ2i70ix9GDO1nfahCR4Bs4CZTte6pCs0k0E: corrupted on transfer: sha1 hash differ "7685fbdb6a7162762aacef16e3d37f17898711a9" vs "7e15d26345df72a529cbb6921f962742b47148b6"
2022/04/14 12:55:28 INFO : bWYwkKxuiAgQTublEY8oUA/yRy59yFoJ2i70ix9GDO1nfahCR4Bs4CZTte6pCs0k0E: Removing failed copy

Step 2 (the sha1 digest of the original file, right?):

rclone sha1sum /data/WtyCuWAw//bWYwkKxuiAgQTublEY8oUA/yRy59yFoJ2i70ix9GDO1nfahCR4Bs4CZTte6pCs0k0E --download
7685fbdb6a7162762aacef16e3d37f17898711a9 yRy59yFoJ2i70ix9GDO1nfahCR4Bs4CZTte6pCs0k0E

Step 3: The corrupted file in the OneDrive recycle bin was confirmed.

Thanks.

Perfect, I agree it does look like the issue where OneDrive randomly corrupted uploaded files. To be sure we need to check if the corrupted file contains the characteristic injection of base64 data.

To do so you need to recover the corrupted file from the recycle bin, move/rename it to something easily recognizable/safe, and then download it to your local disk. I guess you know the best steps in your situation.

@ncw I have limited experience with binary compares on linux. Can you help us with some easy commands to find/document the base64 injection in the corrupted file?

@klemperer How often do you see the issue? How many files/MB do you typically need to upload before seeing the issue? (I am asking to understand how easy/difficult it is for you to reproduce; I just tried 200 files/40GB myself without seeing it)

@ncw A related question to improve my understanding: Is the source checksum in the rclone log calculated before, during, or after upload? (My guess is after, based on a quick peek at the code.)

The easiest way to see the base64 data (assuming the file is a binary file) is to do

strings -n 64 /path/to/file

This prints all runs of at least 64 printable ASCII characters. Compare the output to that of the same command on the original file and you should see a big blob of base64 data (which looks like a long run of digits and lower- and upper-case letters).

If you want to binary diff the two files then do

diff -u <(hd </path/to/ok-file) <(hd </path/to/corrupted-file)

This will take a long time!
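If the full hex diff is too slow, cmp can locate the differing bytes directly. A minimal sketch with stand-in files (substitute your original file and the one recovered from the recycle bin):

```shell
# Stand-ins for the original/corrupted pair, differing at byte 5.
printf 'AAAABBBB' > ok-file
printf 'AAAAXBBB' > corrupted-file

# cmp -l lists every differing byte as: 1-based offset, then the two
# byte values in octal. Here it reports byte 5 (octal 102 vs 130).
cmp -l ok-file corrupted-file
```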

It's normally calculated as the file is being read from the disk in the local backend. Rclone works out which hash will be needed here:

And passes it into the Open call to hint to the local backend that we'd like that hash.

However, if the object isn't read sequentially all the way through for some reason, the hash won't get cached and rclone will just read the object off disk again.

Some backends (not onedrive) need the hash in advance of sending the data so rclone has to read the file twice in that case.
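The single-pass "hash while reading" idea described above can be illustrated in plain shell: tee duplicates the stream so the checksum is computed during the same read that produces the "uploaded" bytes. This is just an illustration of the technique, not how rclone itself is implemented:

```shell
# A small stand-in for the source file being uploaded.
printf 'hello world' > sample.bin

# tee splits the stream: one copy becomes the "uploaded" bytes
# (uploaded.bin), the other is piped into sha1sum, so the hash is
# computed in the same single pass over the source file.
tee uploaded.bin < sample.bin | sha1sum | awk '{print $1}'
# -> 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
```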

1 Like

This issue happened for almost one in four files transferred, maybe more.

I just recovered and downloaded one of the corrupted files, compared it to the original one, and found some interesting results, as follows:


So the file was definitely corrupted during the transfer. At the end of the differing part there is some text like "subVersion". Subversion is NOT installed on my computer. I hope this helps to solve the issue.

Thanks.

OK, that is very frequent!

It would be very helpful to Microsoft if you could collect the responses received from OneDrive with a command something like this:

rclone copy /data/WtyCuWAw/ onedrive:/backups/ -P --log-file yourlogfile.txt -vv --dump responses

The log file quickly becomes very large, so you may want to try reproducing on just a few files in a subfolder.

This very detailed logging may (very well) contain sensitive data (e.g. a refreshed oauth2 token), so don’t just post it; I suggest we ask Microsoft exactly what they are looking for or Nick what to filter out.

Very nice comparisons and interesting observation, we will certainly look into that.

That looks like part of a JSON blob.

...
Bl5m+0w=="}],
"subVersion":3,
"trigger":"timer"}

I don't recognise the JSON - it doesn't look like part of the OneDrive API, and the string subVersion appears nowhere on the API site - so I suspect it is an internal part of OneDrive's workings.

In fact it looks like the base64 encoded string is part of that JSON blob and that makes sense as base64 is often used to encode binary data in JSON.
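A tiny illustration of why binary data embedded in JSON shows up as a base64 run (blob.bin is a stand-in for an arbitrary binary payload):

```shell
# JSON cannot hold raw bytes, so services base64-encode binary payloads;
# that is why an injected blob appears as a long [A-Za-z0-9+/=] run.
printf '\000\001\002\377' > blob.bin
b64=$(base64 < blob.bin)
echo "{\"data\":\"$b64\"}"
# -> {"data":"AAEC/w=="}
```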

So it looks like for you the problem isn't fixed. I haven't been able to reproduce it recently.

I think probably using --dump headers rather than --dump responses will be enough for Microsoft and that won't have any sensitive data in and will also make the log a lot shorter.

Fully agree. Improved to perform only one retry/attempt, the proposed command becomes:

rclone copy /data/WtyCuWAw/ onedrive:/backups/ -P --log-file yourlogfile.txt -vv --dump headers --retries 1

@klemperer The log file could still become very large, so you may want to try reproducing on just a few files in a subfolder.

That looks perfect @klemperer - I checked the log and I don't think it has anything confidential in it. Can you add a comment to #1577 and attach the log with a note of the line number of the problem (you should be able to attach a .log file)? I think Microsoft will be very interested that you still see the problem.

I tried adding one more option to the rclone copy command, "--onedrive-chunk-size 1280k", and no errors happened; hundreds of files have been successfully transferred so far.

I then created 60 random files with an average size of 200 MB and uploaded them to OneDrive both through the web (Microsoft Edge) and through rclone (without the onedrive-chunk-size option). The uploaded files are identical to the local ones (verified with rclone check).
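For anyone wanting to repeat this kind of test, random files can be generated from /dev/urandom. A scaled-down sketch (bump the count and byte size to match the 60 files of ~200 MB described above):

```shell
mkdir -p testfiles
# Three 2 KiB files of random bytes; scale the loop and size up as needed.
for i in 1 2 3; do
  head -c 2048 /dev/urandom > "testfiles/random-$i.bin"
done
ls testfiles
```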

So, maybe the issue was fixed (silently) by Microsoft?

2 Likes

I suspect the fix has fully rolled out now. OneDrive is complicated - it has lots of regions and different people providing different parts of the service - so I'm not surprised it takes a while to roll out a fix. Let's hope that it is all fixed now.

I suspect so too, thanks a lot for all your tests and updates!

1 Like