Multipart upload to B2 sometimes failing with corrupted on transfer: sizes differ

What is the problem you are having with rclone?

Copying a tree of 35 files with rclone copy from local disk to Backblaze B2, I see frequent corrupted on transfer: sizes differ errors with multipart file uploads. In each case, rclone reports that the destination file size is zero, e.g.:

2023/09/18 17:20:13 ERROR : catalog_sales/20221122_161347_00795_djagr_3a042953-d0a2-4b8d-8c4e-6a88df245253: corrupted on transfer: sizes differ 244695498 vs 0

Six files in the tree are big enough for multipart uploads; three transfer correctly, and three fail with corrupted on transfer: sizes differ. The same three files succeed, and the same three fail, on each of the two subsequent retries.

Digging deeper: helpfully, rclone hides the failed files in B2 rather than deleting them, so I can download them, and they all match the source files. Further, if I use --ignore-size, the copy succeeds with no errors, and a subsequent rclone check ... --download shows that all of the files in the destination are identical to those in the source.

So, is the b2_finish_large_file API erroneously reporting a zero file size? I ran with --dump headers,responses, and the b2_finish_large_file response contains the expected contentLength:

2023/09/18 17:20:13 DEBUG : catalog_sales/20221122_161347_00795_djagr_3a042953-d0a2-4b8d-8c4e-6a88df245253: Finishing large file upload with 3 parts
2023/09/18 17:20:13 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2023/09/18 17:20:13 DEBUG : HTTP REQUEST (req 0xc000c74c00)
2023/09/18 17:20:13 DEBUG : POST /b2api/v1/b2_finish_large_file HTTP/1.1
Host: api004.backblazeb2.com
User-Agent: rclone/v1.64.0
Content-Length: 259
Authorization: XXXX
Content-Type: application/json
Accept-Encoding: gzip

2023/09/18 17:20:13 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2023/09/18 17:20:13 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2023/09/18 17:20:13 DEBUG : HTTP RESPONSE (req 0xc000c74c00)
2023/09/18 17:20:13 DEBUG : HTTP/1.1 200 
Content-Length: 854
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=UTF-8
Date: Mon, 18 Sep 2023 17:20:13 GMT

{
  "accountId": "15f935cf4dcb",
  "action": "upload",
  "bucketId": "91a5cf5973f5dc4f848d0c1b",
  "contentLength": 244695498,
  "contentMd5": null,
  "contentSha1": "none",
  "contentType": "application/octet-stream",
  "fileId": "4_z91a5cf5973f5dc4f848d0c1b_f221926aaeb9f362c_d20230918_m172005_c004_v0402008_t0028_u01695057605512",
  "fileInfo": {
    "src_last_modified_millis": "1669152496000",
    "large_file_sha1": "af2db424fe5092a7933e1ca128c0613a1953342f"
  },
  "fileName": "tpcds-benchmark/catalog_sales/20221122_161347_00795_djagr_3a042953-d0a2-4b8d-8c4e-6a88df245253",
  "fileRetention": {
    "isClientAuthorizedToRead": false,
    "value": null
  },
  "legalHold": {
    "isClientAuthorizedToRead": false,
    "value": null
  },
  "serverSideEncryption": {
    "algorithm": null,
    "mode": null
  },
  "uploadTimestamp": 1695057605512
}
2023/09/18 17:20:13 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2023/09/18 17:20:13 ERROR : catalog_sales/20221122_161347_00795_djagr_3a042953-d0a2-4b8d-8c4e-6a88df245253: corrupted on transfer: sizes differ 244695498 vs 0

It seems that, somehow, the size is sometimes being zeroed out.

Run the command 'rclone version' and share the full output of the command.

# rclone version
rclone v1.64.0
- os/version: debian 11.7 (64 bit)
- os/kernel: 5.10.0-23-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.1
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Backblaze B2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --no-check-dest copy tpcds-benchmark b2remote:metadaddy-rclone/tpcds-benchmark

I added --dump headers,responses -vv during my investigation.

The rclone config contents with secrets removed.

{
    "b2crypt": {
        "password": "*****",
        "remote": "b2remote:metadaddy-rclone",
        "type": "crypt"
    },
    "b2remote": {
        "account": "*****",
        "key": "*****",
        "type": "b2"
    },
    "s3remote": {
        "access_key_id": "*****",
        "location_constraint": "us-west-1",
        "provider": "AWS",
        "region": "us-west-1",
        "secret_access_key": "*****",
        "type": "s3"
    }
}

A log from the command with the -vv flag

This code did get rewritten for v1.64.0.

I stared at the code and I see the problem. In fact I'm surprised it ever worked! I suspect it worked because sometimes the b2 listings were eventually consistent and sometimes they weren't.
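To make that concrete, here is a hypothetical sketch (not rclone's actual code; every type and function name below is invented) of how an eventually consistent listing could produce the zero size seen above even though b2_finish_large_file returns the correct contentLength:

// Hypothetical sketch of the suspected failure mode: trusting an eventually
// consistent bucket listing for the finished object's size instead of the
// authoritative b2_finish_large_file response. All names are invented.
package main

import "fmt"

type finishResponse struct {
	FileName      string
	ContentLength int64 // authoritative size returned by b2_finish_large_file
}

// listSize stands in for a fresh bucket listing; with eventual consistency
// it may not have caught up yet and so reports 0 for a just-finished file.
func listSize(name string, listingCaughtUp bool) int64 {
	if !listingCaughtUp {
		return 0
	}
	return 244695498
}

func main() {
	resp := finishResponse{FileName: "catalog_sales/20221122_161347_...", ContentLength: 244695498}

	// Fragile: ask the listing, which may lag behind the finish call.
	if got := listSize(resp.FileName, false); got != resp.ContentLength {
		fmt.Printf("corrupted on transfer: sizes differ %d vs %d\n", resp.ContentLength, got)
	}

	// Robust: take the destination object's size from the finish response itself.
	fmt.Printf("size from finish response: %d\n", resp.ContentLength)
}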

Anyway please give this a go.

v1.65.0-beta.7378.6072d314e.fix-b2-multipart on branch fix-b2-multipart (uploaded in 15-30 mins)

If that fixes the problem then I'll put it in a v1.64.1

PS I think this should work in v1.63.1 also.

In fact I'm surprised it ever worked!

Yep - I've been there!

If that fixes the problem

It does - my copy succeeds with this version with no need for --ignore-size.

then I'll put it in a v1.64.1

Cool - I'm writing an article for the Backblaze blog right now about the awesome work on multithreaded transfers in 1.64.0. I'll include a note that I had to use --ignore-size to successfully copy a file tree to B2, and that it should be fixed in 1.64.1. Thanks!

While I'm here, @ncw, is this an accurate summary of the changes to multithreaded file transfers?

Something of a Swiss Army Knife for cloud storage, Rclone can copy files, synchronize directories, and even mount remote storage as a local filesystem. Previous versions of Rclone were able to take advantage of multithreading to accelerate the transfer of ‘large’, by default at least 256 MB, files, but the benefits were limited. When transferring files from, say, Amazon S3 to Backblaze B2, Rclone would start multiple threads to simultaneously read chunks of the file from Amazon S3 into memory, and start another set of threads to simultaneously write that data to Backblaze B2. Data transfer speed was limited by the need to sync up ‘producer’ threads downloading data into memory with ‘consumer’ threads uploading the data.

Rclone v1.64.0 completely refactors multithreaded transfers. Now Rclone starts a single set of threads, each of which both reads a chunk of data from the source service into memory, and then writes that chunk to the destination service, iterating through a subset of chunks until the transfer is complete. The threads transfer their chunks of data in parallel, and each transfer is independent of the others. This architecture is both simpler and much, much faster.
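As a rough illustration of that per-chunk model, here is a minimal sketch only, assuming hypothetical readChunk and writeChunk stand-ins rather than rclone's real internals:

// A minimal sketch of the v1.64-style model: each worker owns a chunk
// end-to-end, reading it from the source and writing it straight to the
// destination, with no hand-off between a reader pool and a writer pool.
// readChunk and writeChunk are hypothetical stand-ins, not rclone's API.
package main

import (
	"fmt"
	"sync"
)

const (
	fileSize  = int64(1) << 30  // 1 GiB file, for illustration
	chunkSize = int64(96) << 20 // 96 MiB chunks, as in the logs below
	streams   = 4               // number of parallel workers
)

func readChunk(offset, length int64) []byte { return make([]byte, length) } // stand-in: read from source
func writeChunk(offset int64, data []byte)  {}                              // stand-in: upload to destination

func main() {
	offsets := make(chan int64)
	var wg sync.WaitGroup

	for i := 0; i < streams; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for off := range offsets {
				length := chunkSize
				if off+length > fileSize {
					length = fileSize - off
				}
				data := readChunk(off, length) // this worker reads its chunk...
				writeChunk(off, data)          // ...and writes it, independently of the others
				fmt.Printf("chunk at offset %d done\n", off)
			}
		}()
	}

	for off := int64(0); off < fileSize; off += chunkSize {
		offsets <- off
	}
	close(offsets)
	wg.Wait()
}

Each worker drains offsets from the channel independently, so a slow chunk only delays its own worker rather than the whole pipeline.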

Thanks for testing :slight_smile:

I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.64.1.

Previously what would happen is that the read of data from the source (S3 in this example) was done in a single-threaded, sequential way. Each chunk read from the source was then passed off to another thread to be written in parallel. So if the source was about the same speed as the destination (as is likely when reading/writing over the network), the destination transfer never got the benefit of being parallel, as only one thread was running at once. When reading from disk (the common case), which is much faster than the network, this did give a nice benefit of parallel transfers.

That is 100% accurate :slight_smile:
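For contrast, a similar sketch of the pre-v1.64 pattern described above: a single goroutine reads chunks sequentially and hands them to parallel writers. Again, readChunk and writeChunk are hypothetical helpers, not rclone's real code.

// A sketch of the pre-v1.64 pattern: one goroutine reads chunks from the
// source sequentially and hands them over a channel to parallel writers,
// so the source is never read in parallel.
package main

import "sync"

type chunk struct {
	offset int64
	data   []byte
}

func readChunk(offset, length int64) []byte { return make([]byte, length) } // stand-in: sequential source read
func writeChunk(c chunk)                    {}                              // stand-in: parallel destination write

func main() {
	const fileSize, chunkSize, writers = int64(1) << 30, int64(96) << 20, 4

	work := make(chan chunk)
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range work {
				writeChunk(c) // writes happen in parallel...
			}
		}()
	}

	// ...but each read must finish before the next one starts, so when the
	// source is about as slow as the destination the writers sit mostly idle.
	for off := int64(0); off < fileSize; off += chunkSize {
		length := chunkSize
		if off+length > fileSize {
			length = fileSize - off
		}
		work <- chunk{offset: off, data: readChunk(off, length)}
	}
	close(work)
	wg.Wait()
}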

This commit, 6072d314e1e7893ac8ace57477924bc0ee48c2fb, seems to cause hangs.
1.63.1 works.

2023/09/20 13:39:26 INFO  : 
Transferred:      106.381 MiB / 1.577 GiB, 7%, 0 B/s, ETA -
Errors:                 1 (retrying may help)
Checks:             34702 / 34702, 100%
Transferred:            3 / 4, 75%
Elapsed time:     26m30.6s
Transferring:
 * UUOEZZeDzV13DuJJdQZTcw…1OiTwE-ZsWgHzBF9r7IG9n:  5% /1.566Gi, 0/s, 2562047h47m16s

Please do a run with -vv and post the log - thanks.

Previously what would happen...

Thanks, @ncw - I corrected the blog entry. It should go out tomorrow. I'll post a link here :+1:


Well, I don't know how to reproduce it anymore. I'll open a new bug report if this happens again... :man_shrugging:


We published the blog post on the v1.64.0 performance improvements - I'll update it after v1.64.1 comes out :+1:


Just a heads up: both 1.64.0 and the (now) latest beta make rclone (crypt + sync) exit with return code 0 mid-transfer of large files (> 4 GB) to B2-EU. Not a single message is logged (at log level INFO) about the cause of the exit. As I've not been able to finish a large-file sync, even after multiple retries and multiple changes in settings, I reverted to 1.63.1. I will post a report with the beta and -vv (where? GitHub or here?) when I can free up some time to do so.

@hb020 if you could make a log with -vv and open a new support topic here, filling in the details, that would be a good start! You'll probably need to put the log somewhere else (eg pastebin) and post a link.

I don't know if this is a problem with b2eu, though, rather than rclone - I was trying to transfer files myself there this morning and I was getting continuous

no tomes available (503 service_unavailable)

errors (you'll only see these with -vv).

See 1.64.0 and latest beta 1.65.0 both crash on rclone crypt/sync B2 on large files

Upload of chunk 17/17 never starts.

2023/09/25 10:26:49 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Need to transfer - File not found at Destination
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: using backend concurrency of 16 instead of --multi-thread-streams 4
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Starting multi-thread copy with 17 chunks of size 96Mi with 16 parallel streams
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 16/17 (1509949440-1610612736) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 1/17 (0-100663296) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 8/17 (704643072-805306368) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 6/17 (503316480-603979776) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 11/17 (1006632960-1107296256) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 7/17 (603979776-704643072) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 5/17 (402653184-503316480) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 13/17 (1207959552-1308622848) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 14/17 (1308622848-1409286144) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 4/17 (301989888-402653184) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 3/17 (201326592-301989888) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 2/17 (100663296-201326592) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 12/17 (1107296256-1207959552) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 15/17 (1409286144-1509949440) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 10/17 (905969664-1006632960) size 96Mi starting
2023/09/25 10:26:53 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: multi-thread copy: chunk 9/17 (805306368-905969664) size 96Mi starting
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 11 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 14 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 12 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 15 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 10 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 2 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 13 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 0 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 9 length 100663296
2023/09/25 10:26:54 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 8 length 100663296
2023/09/25 10:26:55 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 5 length 100663296
2023/09/25 10:26:56 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 6 length 100663296
2023/09/25 10:26:56 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 3 length 100663296
2023/09/25 10:26:56 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 7 length 100663296
2023/09/25 10:27:01 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 1 length 100663296
2023/09/25 10:27:08 DEBUG : UUOEZZeDzV13DuJJdQZTcw/woZ4rO58ymNAAF_lQ9DVMmHbvJXuNFQVkg583h1kxlLVlQDf4BfBL16opluAIlW9: Sending chunk 4 length 100663296
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 3.580Mi/s, 7m1s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 528.894Ki/s, 48m40s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 76.297Ki/s, 5h37m24s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 11.006Ki/s, 38h58m56s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 1.587Ki/s, 270h13m40s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 234/s, 1873h13m57s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 33/s, 12985h20m33s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 4/s, 90015h2m53s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 623988h50m47s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s
 * UUOEZZeDzV13DuJJdQZTcw…LVlQDf4BfBL16opluAIlW9:  5% /1.567Gi, 0/s, 2562047h47m16s

I found a deadlock in the code (perils of coding with lots of threads).
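For illustration only (this is not rclone's actual code; the uploader type, the pool, and the getURL/putURL names are invented), the classic shape of such a deadlock with a shared pool of upload URLs looks like this:

// Illustrative deadlock pattern: a goroutine holds a mutex while blocking on
// a bounded pool, while the goroutine that would refill the pool needs the
// same mutex first. Neither can proceed.
package main

import "sync"

type uploader struct {
	mu   sync.Mutex
	pool chan string // bounded pool of upload URLs
}

// Deadlock-prone: holds mu while blocking on the pool. If the pool is empty,
// putURL can never acquire mu to refill it, so nothing moves.
func (u *uploader) getURLBad() string {
	u.mu.Lock()
	defer u.mu.Unlock()
	return <-u.pool // blocks while still holding mu
}

// Returning a URL to the pool also needs mu.
func (u *uploader) putURL(url string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.pool <- url
}

// The usual fix: release the lock before any operation that can block.
func (u *uploader) getURLGood() string {
	u.mu.Lock()
	// ...update any shared bookkeeping here...
	u.mu.Unlock()
	return <-u.pool // block only after mu has been released
}

func main() {
	u := &uploader{pool: make(chan string, 1)}
	u.pool <- "https://pod-000-placeholder.example/upload" // made-up URL
	url := u.getURLGood()
	go u.putURL(url)   // another goroutine hands the URL back
	_ = u.getURLGood() // succeeds because the lock is not held while waiting
}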

Try this (this includes the earlier fix for b2)

v1.65.0-beta.7391.55c3c221b.fix-b2-upload-url-lock on branch fix-b2-upload-url-lock (uploaded in 15-30 mins)

Thanks, it fixed this particular hang!

I've merged all the b2 fixes to master now, which means they will be in the latest beta in 15-30 minutes and released in v1.64.1.

Thanks for testing!
