Copying Files Within a B2 Bucket

Hello Nick,

I am re-running the Copy command with the --tps-limit 10 switch on the large 34TB folder, and it will take some time to complete. It transferred about 36TB the first time I ran the Sync command, and it has transferred another 2TB so far in the current copy run. It has done more checks than transfers, but it is still transferring quite a bit of data. When I check the size of the Backblaze bucket that these folders are in, the total size is now nearing 90TB. The original data set was 34TB, and if it were copied I could see it becoming about 70TB, so where is the extra 20TB of data coming from?

Just like this one, the small 3TB folder that I copied seems to be in the same boat. That one finished and I have everything working well; I went in and deleted the original 3TB folder, but the new folder that everything was copied into is now 13TB in size. An extra 10TB seems like a pretty big difference! Is there anything I can do to bring the size down while ensuring I don't lose necessary data?

All best...

Jason

They are almost certainly either old versions or, perhaps more likely, unfinished large file uploads. You can cancel these on b2's website, though I think they automatically disappear after a while.

I should probably put this into rclone cleanup for b2.
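
In the meantime, a quick way to check whether old versions account for the extra space is to compare sizes with and without --b2-versions (the bucket and folder names below are just placeholders):

    # size of the current files only
    rclone size b2:bucket-name/folder2
    # size including old versions of files
    rclone size --b2-versions b2:bucket-name/folder2

If the second number is much bigger, old versions are the culprit; if the two match, the space is more likely unfinished large file uploads, which you can cancel in the B2 web UI as above.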

Hi @ncw, apologies for following up on this topic again. I also ran into this problem when trying to move a large file (30GB) within a bucket on B2 using rclone. I believe your fix will be released in the next rclone version - is that correct?
Also, just to see what would happen, I ran the same command but this time letting rclone use B2's S3 interface. I get a similar error:

2020/07/21 15:18:00 ERROR : Attempt 1/3 failed with 1 errors and: EntityTooLarge: Copy source too big: 5368709120

Is your fix for the normal b2 API as well as the s3 interface?

That is correct - the fix for that is in the latest beta.

Hmm...

The S3 interface does implement server-side copy and has since before 1.51.

Rclone defines this:

    maxSizeForCopy      = 5 * 1024 * 1024 * 1024 // The maximum size of object we can COPY

which is 5,368,709,120 bytes. According to the error message your file is 5,368,709,120 bytes too, so exactly 5 GiB. Is that the size of your object?

Here are Amazon's docs:

You can create a copy of your object up to 5 GB in size in a single atomic operation using this API. However, to copy an object greater than 5 GB, you must use the multipart upload Upload Part - Copy API.

I read that as meaning a 5 GiB object can be copied.

Maybe this is an off-by-one error in b2 or in rclone?
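
Just to illustrate what an off-by-one here could look like (a hypothetical sketch, not rclone's or b2's actual code): if the client only switches to multipart copy for objects strictly larger than 5 GiB, while the server rejects single-request copies at or above 5 GiB, an object of exactly 5 GiB falls through the gap.

    // Hypothetical sketch of a boundary disagreement between client and
    // server; not rclone's or b2's actual implementation.
    package main

    import "fmt"

    const maxSizeForCopy = 5 * 1024 * 1024 * 1024 // 5 GiB

    // client: only use multipart copy when strictly larger than the limit
    func clientUsesMultipart(size int64) bool { return size > maxSizeForCopy }

    // server: reject single-request copies at or above the limit
    func serverRejectsCopy(size int64) bool { return size >= maxSizeForCopy }

    func main() {
        size := int64(maxSizeForCopy)          // an object of exactly 5 GiB
        fmt.Println(clientUsesMultipart(size)) // false: sent as one COPY request
        fmt.Println(serverRejectsCopy(size))   // true: rejected as EntityTooLarge
    }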

What happens if you try setting --s3-copy-cutoff 1G? Does that work?

Ah! What version of rclone are you using?

@ncw
rclone v1.51.0

  • os/arch: windows/amd64
  • go version: go1.13.7

hello,
you should update to the latest version, v1.52.2

@asdffdsa that will solve the file size error when doing a server side copy using B2's S3 interface?

sorry, you need to use the beta version

@asdffdsa Unfortunately it does not solve the file-too-large error when using B2's S3 interface.

about this, i could be wrong, but the beta seems to have fixed the problem

can you post

  • rclone version
  • the rclone command
  • debug log

Ok, first with the B2 S3 interface. The source folder contains one 30GB file.

rclone version:
rclone v1.52.2-232-gff843516-beta

  • os/arch: linux/amd64
  • go version: go1.14.6

rclone command:
rclone copy :s3:bucket-name/folder1 :s3:bucket-name/folder2 -vv --stats 2s --stats-one-line --transfers 14 --fast-list --s3-access-key-id xyz --s3-secret-access-key xyz --s3-endpoint https://s3.eu-central-xyz.backblazeb2.com --s3-region eu-central-xyz --s3-provider Other

rclone output:
2020/07/22 16:45:54 DEBUG : Using config file from "/xyz/.config/rclone/rclone.conf"
2020/07/22 16:45:56 DEBUG : S3 bucket bucket-name path folder2: Waiting for checks to finish
2020/07/22 16:45:56 DEBUG : S3 bucket bucket-name path folder2: Waiting for transfers to finish
2020/07/22 16:45:57 DEBUG : Cancelling multipart copy
2020/07/22 16:45:57 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -

2020/07/22 16:45:57 ERROR : 30GBFile.bin: Failed to copy: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:45:57 INFO : There was nothing to transfer
2020/07/22 16:45:57 ERROR : Attempt 1/3 failed with 1 errors and: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:45:58 DEBUG : S3 bucket bucket-name path folder2: Waiting for checks to finish
2020/07/22 16:45:58 DEBUG : S3 bucket bucket-name path folder2: Waiting for transfers to finish
2020/07/22 16:45:59 DEBUG : Cancelling multipart copy
2020/07/22 16:45:59 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -

2020/07/22 16:45:59 ERROR : 30GBFile.bin: Failed to copy: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:45:59 INFO : There was nothing to transfer
2020/07/22 16:45:59 ERROR : Attempt 2/3 failed with 1 errors and: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:46:00 DEBUG : S3 bucket bucket-name path folder2: Waiting for checks to finish
2020/07/22 16:46:00 DEBUG : S3 bucket bucket-name path folder2: Waiting for transfers to finish
2020/07/22 16:46:01 DEBUG : Cancelling multipart copy
2020/07/22 16:46:01 ERROR : 30GBFile.bin: Failed to copy: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:46:01 INFO : There was nothing to transfer
2020/07/22 16:46:01 ERROR : Attempt 3/3 failed with 1 errors and: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz
2020/07/22 16:46:01 INFO : 0 / 0 Bytes, -, 0 Bytes/s, ETA -

2020/07/22 16:46:01 DEBUG : 6 go routines active
2020/07/22 16:46:01 Failed to copy: EntityTooLarge: Copy source too big: 5368709120
status code: 400, request id: xyz, host id: xyz

that beta is not the one mentioned in the post.
perhaps try that one instead.

also, there is a new flag mentioned:
This adds a new flag --b2-copy-cutoff
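
for example, it could be passed like this (paths are placeholders); it sets the size above which rclone splits a b2 server-side copy into chunks:

    rclone copy :b2:bucket-name/folder1 :b2:bucket-name/folder2 --b2-copy-cutoff 1G -vv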

That is very interesting. Looks like there is a problem there. Can you do a log with -vv --dump bodies please?

Can you try the b2 interface too?

I also tried the same with the normal B2 API interface. This actually works, but note the 0% progress updates.

rclone version:
rclone v1.52.2-232-gff843516-beta

  • os/arch: linux/amd64
  • go version: go1.14.6

rclone command:
rclone copy :b2:bucket-name/folder1 :b2:bucket-name/folder2 -vv --stats 2s --stats-one-line --transfers 14 --fast-list --b2-account xyz --b2-key xyz

rclone output:
2020/07/22 16:51:28 DEBUG : Using config file from "/xyz/.config/rclone/rclone.conf"
2020/07/22 16:51:34 DEBUG : B2 bucket bucket-name path folder2: Waiting for checks to finish
2020/07/22 16:51:34 DEBUG : B2 bucket bucket-name path folder2: Waiting for transfers to finish
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Starting copy of large file in 8 chunks (id "4_z4dec9c3fa207000e7436001c_f20202287984e8c7d_d20200722_m145134_c003_v0312001_t0006")
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 2 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 8 length 2147483648
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 6 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 7 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 1 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 4 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 5 length 4294967296
2020/07/22 16:51:35 DEBUG : 30GBFile.bin: Copying chunk 3 length 4294967296
2020/07/22 16:51:35 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:52:57 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:52:59 DEBUG : 30GBFile.bin: Done copying chunk 3
2020/07/22 16:52:59 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:53:11 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:53:12 DEBUG : 30GBFile.bin: Done copying chunk 4
2020/07/22 16:53:13 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:53:19 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:53:20 DEBUG : 30GBFile.bin: Done copying chunk 5
2020/07/22 16:53:21 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:53:29 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:53:30 DEBUG : 30GBFile.bin: Done copying chunk 2
2020/07/22 16:53:30 DEBUG : 30GBFile.bin: Done copying chunk 8
2020/07/22 16:53:31 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:53:51 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:53:53 DEBUG : 30GBFile.bin: Done copying chunk 1
2020/07/22 16:53:53 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
... a bunch more of these...
2020/07/22 16:54:43 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:54:44 DEBUG : 30GBFile.bin: Done copying chunk 6
2020/07/22 16:54:45 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:55:59 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:56:01 DEBUG : 30GBFile.bin: Done copying chunk 7
2020/07/22 16:56:01 DEBUG : 30GBFile.bin: Finishing large file copy with 8 parts
2020/07/22 16:56:01 INFO : 0 / 30 GBytes, 0%, 0 Bytes/s, ETA -
2020/07/22 16:56:02 DEBUG : 30GBFile.bin: SHA-1 = e1f0e0d5c3d25026f141f72473191a8d9c2435e1 OK
2020/07/22 16:56:02 INFO : 30GBFile.bin: Copied (server side copy)
2020/07/22 16:56:02 INFO : 30G / 30 GBytes, 100%, 114.506 MBytes/s, ETA 0s
2020/07/22 16:56:02 DEBUG : 6 go routines active
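
As a sanity check, the chunk sizes in the log add up to the whole file:

    7 × 4,294,967,296 bytes + 1 × 2,147,483,648 bytes = 32,212,254,720 bytes = 30 GiB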

Great - glad it works!

I think the 0% progress is a known issue - we haven't worked out a way to get feedback on the server side copy back to the caller yet.

@ncw The output from running with --dump bodies is too large to post here - is there some other way I can get it to you?

I managed to replicate this myself now :slight_smile:

I uploaded a 6G file to b2 and I can see the problem.

I think it might be a bug in b2's s3 interface.

I'll investigate further...

Great, thank you so much, really appreciated!

I found a work-around - it looks like using --s3-copy-cutoff 1G will work.

Rclone uses 5G normally, which is slightly bigger than b2 wants, I think.
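
Applied to the S3 command from earlier in the thread, that would look something like this (keys, endpoint and paths are placeholders as before):

    rclone copy :s3:bucket-name/folder1 :s3:bucket-name/folder2 -vv --s3-copy-cutoff 1G --s3-access-key-id xyz --s3-secret-access-key xyz --s3-endpoint https://s3.eu-central-xyz.backblazeb2.com --s3-region eu-central-xyz --s3-provider Other

With that cutoff, rclone should do the server-side copy in 1 GiB chunks instead of attempting a single copy that b2's S3 layer rejects.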