Errors and failure with big file upload to B2

What is the problem you are having with rclone?

Cannot upload a big file to B2 when on a fast 2.5 Gbit connection.

The upload process is very strange: the file is 66 GB, but the transfer counter first reaches 66 GB (with some errors along the way), then climbs to 69 GB, then 134 GB, then 199 GB, and finally the transfer terminates with an error. There are errors in between as well.

The server is a beefy 32-CPU machine with 128 GB RAM and a 2.5 Gbit connection; nothing else is running on it.

I've never seen anything like this happening with rclone before.

Run the command 'rclone version' and share the full output of the command.

rclone v1.64.2

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 5.15.0-87-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.3
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

B2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone -vv --multi-thread-streams=32 --stats=10s \
    copyto a.gz b2:b/c.gz --log-file=rclone.log

The rclone config contents with secrets removed.

[b2]
type = b2
account = x
key = x

A log from the command with the -vv flag

I've contacted B2 support about this as well. I believe the errors:

  • 500 internal_error
  • 503 service_unavailable
  • connection reset by peer

are definitely not rclone's fault. But rclone should still handle this situation reliably (in my opinion, since rclone is otherwise such a bullet-proof tool).

There was a bug/fix around retries in 1.64 that is probably still relevant, I'd imagine, as the chunks were retried but something still went wrong.

I'd imagine @ncw has to take a peek.

I've found the file uploaded with the wrong filename in the B2 web interface. It's been uploaded 4(!) times. The SHA1 is correct, so the data itself did make it.

About the wrong filename, isn't the following command supposed to upload it as planetiler_2023_oct.mbtiles.gz?

["rclone" "-vv" "--multi-thread-streams=32" "--stats=10s" "copyto" "output.mbtiles.gz" "b2:osm2vectortiles/planetiler_2023_oct.mbtiles.gz" "--log-file=rclone.log"]

Web interface:

To make life even more interesting, I cannot replicate it on a normal 1 Gbit server; it needs a fast connection.

The following also reproduces the multiple-upload bug on a 1 Gbit server. The server is a fresh Hetzner "cloud" Ubuntu 22 host.

sudo -v ; curl https://rclone.org/install.sh | sudo bash

apt-get update && apt-get install nload

dd if=/dev/urandom of=test1.bin bs=1G count=1
# dd if=/dev/urandom of=test60.bin bs=1G count=60

export RCLONE_CONFIG_X_TYPE=b2
export RCLONE_CONFIG_X_ACCOUNT=aaa
export RCLONE_CONFIG_X_KEY=bbb

rclone \
    -vv \
    --multi-thread-streams=32 \
    --stats=10s \
    copyto test1.bin x:bucket/dest1.bin

Things appeared to go quite badly wrong at B2 here:

2023/10/25 10:33:05 DEBUG : output.mbtiles.gz: Error sending chunk 12 (retry=true): incident id 6bb4db9e3aa4-562f63bd000c85d5 (500 internal_error): &api.Error{Status:500, Code:"internal_error", Message:"incident id 6bb4db9e3aa4-562f63bd000c85d5"}
2023/10/25 10:33:05 DEBUG : output.mbtiles.gz: Clearing part upload URL because of error: incident id 6bb4db9e3aa4-562f63bd000c85d5 (500 internal_error)

"internal error" with "incident id" doesn't sound good!

Have you tried asking B2 about this?

Ultimately this failed because we got this back from B2 when we attempted to finalise the upload

2023/10/25 10:40:25 DEBUG : output.mbtiles.gz: Finishing large file upload with 698 parts
2023/10/25 10:40:27 DEBUG : Couldn't decode error response: EOF
2023/10/25 10:40:27 DEBUG : output.mbtiles.gz: multi-thread copy: cancelling transfer on exit

That did not appear to retry though...
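
For reference, the "finalise" step is the b2_finish_large_file call in the B2 native API. A rough curl sketch of that request (the $API_URL, $ACCOUNT_AUTH_TOKEN and $LARGE_FILE_ID placeholders and the two-entry SHA1 list are illustrative, not values from this run):

# illustrative only: finish an in-progress large file by sending the per-part SHA1s
curl -s \
    -H "Authorization: $ACCOUNT_AUTH_TOKEN" \
    -d '{"fileId": "'"$LARGE_FILE_ID"'", "partSha1Array": ["sha1_of_part_1", "sha1_of_part_2"]}' \
    "$API_URL/b2api/v1/b2_finish_large_file"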

Any chance you could do that again with --retries 1 and -vv --dump responses? That should clear up what is going on.

Yes, waiting for their reply.

Ok, here is the first step in debugging. This is on a normal VPS with a 1 Gbit connection. The snippet is the one from the earlier post (Errors and failure with big file upload to B2 - #6 by hyperknot).

This doesn't produce any B2 incident id errors, but it fails 100% of the time.

dd if=/dev/urandom of=test4.bin bs=1G count=4

rclone \
    -vv \
    --retries 1 \
    --dump responses \
    --multi-thread-streams=32 \
    --stats=10s \
    copyto test4.bin x:osm2vectortiles/dest4.bin \
    --log-file=rclone.log

The file is actually uploaded, but it's still called test4.bin, not dest4.bin!
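
A quick way to confirm what name actually landed in the bucket is a plain listing of the remote (same x: remote and bucket as in the command above):

rclone lsf x:osm2vectortiles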

Log:

Backblaze responded to me quickly and professionally.

Taking a look at the log file provided, it appears along with the 500 and 503 error messages, there was a no tomes available error. The message here "no tomes available" means that the current server the files are being uploaded into has become full. We do not have a way around this, it just depends on the load of the current server. Though, rClone should request a new upload url on its own and retry uploading the backup again, and it should complete the second time as it will get a new tome to upload files into.
(error no tomes available (503 service_unavailable))

Later they confirmed that it is normal for this to happen on a 2.5 Gbit connection but not on a 1 Gbit one. They believe rclone needs to handle this situation, as it is normal behaviour on their servers. The incident ids were all normal as well.
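
For context, "request a new upload url" corresponds to the b2_get_upload_part_url call in the B2 native API, which hands out a fresh upload URL and token for an in-progress large file. A rough curl sketch (placeholder $API_URL, $ACCOUNT_AUTH_TOKEN and $LARGE_FILE_ID values, not taken from this run):

# illustrative only: ask B2 for a new per-part upload URL after a 500/503
curl -s \
    -H "Authorization: $ACCOUNT_AUTH_TOKEN" \
    -d '{"fileId": "'"$LARGE_FILE_ID"'"}' \
    "$API_URL/b2api/v1/b2_get_upload_part_url"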

Thank you for the log.

I can see very clearly what is going on now.

What is happening is that the file gets uploaded fine and B2 acknowledges that.

2023/10/25 18:30:02 DEBUG : HTTP REQUEST (req 0xc0f7774100)
2023/10/25 18:30:02 DEBUG : POST /b2api/v1/b2_finish_large_file HTTP/1.1
Host: api000.backblazeb2.com
User-Agent: rclone/v1.64.2
Content-Length: 1979
Authorization: XXXX
Content-Type: application/json
Accept-Encoding: gzip

Response is all good

2023/10/25 18:30:02 DEBUG : HTTP RESPONSE (req 0xc0f7774100)
2023/10/25 18:30:02 DEBUG : HTTP/1.1 200 
Content-Length: 770
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=UTF-8
Date: Wed, 25 Oct 2023 18:30:01 GMT

...

However, rclone then HEADs the file to double-check it is OK, and at this point it doesn't exist.

2023/10/25 18:30:02 DEBUG : HTTP REQUEST (req 0xc00022ab00)
2023/10/25 18:30:02 DEBUG : HEAD /file/osm2vectortiles/dest4.bin HTTP/1.1
Host: f000.backblazeb2.com
User-Agent: rclone/v1.64.2
Authorization: XXXX

File does not exist

2023/10/25 18:30:02 DEBUG : HTTP RESPONSE (req 0xc00022ab00)
2023/10/25 18:30:02 DEBUG : HTTP/1.1 404 
Content-Length: 94
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=utf-8
Date: Wed, 25 Oct 2023 18:30:02 GMT

Hmm, it looks like rclone did upload the file as test4.bin but is expecting to find a file called dest4.bin, which is why it didn't work.
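
That HEAD check is easy to reproduce by hand against the download host from the log (with your own account auth token as a placeholder): after this run, dest4.bin comes back 404 while test4.bin should come back 200.

curl -sI -H "Authorization: $ACCOUNT_AUTH_TOKEN" \
    https://f000.backblazeb2.com/file/osm2vectortiles/dest4.bin

curl -sI -H "Authorization: $ACCOUNT_AUTH_TOKEN" \
    https://f000.backblazeb2.com/file/osm2vectortiles/test4.bin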

I can replicate this with a 6M file (--b2-upload-cutoff 0 forces the large-file upload path and --multi-thread-cutoff 1M forces the multi-thread copier even for a small file) and

rclone copyto -vv /tmp/6M b2:test-isos/6M.copy --b2-upload-cutoff 0 --b2-chunk-size 5M --multi-thread-cutoff 1M --retries 1 --dump responses 2>&1 | tee rclone-b2.log

I have had a go at fixing this - please try

v1.65.0-beta.7459.1c8e72e8d.fix-multithread-copyto on branch fix-multithread-copyto (uploaded in 15-30 mins)

Rclone deals with getting a new upload URL - that code is quite well battle-tested and I think it works OK.

Can you try the new code on your 1 Gbit / 2.5 Gbit servers? I can't see how it could fix a problem like that unless maybe you used copy on one and copyto on the other.

Thanks for the build. I tested it on the 1 Gbit server first; here is the log.

The only error I see is the "DEBUG : Couldn't decode error response: EOF" at the very beginning. Is that OK?

I'll test it on the big server if this is OK.

Yes, that error is fine. I think I need to suppress it, as it is expected and only makes the log confusing!

Sorry, the last one was with the official build; I'll redo it.


OK, here is the run with the dev build on the fast server.

Here is one where the file already existed. Seems to work fine. Does it look OK?

Yes this all looks good now - thank you for testing.

I just need to write some integration tests for this to make sure it doesn't happen again then I'll merge the fix.

Thanks a lot for the quick fix.


I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.65. I'll put it in v1.64.3 if we make one of those.
