Backup script fails after upgrade to 1.59 - multipart upload failed

Problem Summary

I have two issues, but the main one I'd like assistance with is that, since upgrading to 1.59.0, part of the sync has been failing with the error:

multipart upload failed to upload part: InvalidArgument: Part number must be an integer between 1 and 10000, inclusive

I'll provide the full logs below, but nothing else has changed config-wise or on the S3 side. The previous version was 1.58.0.

rclone version

rclone v1.59.0
- os/version: ubuntu 20.04 (64 bit)
- os/kernel: 5.4.0-122-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.3
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --check-first --fast-list --ignore-existing --transfers 40 /mnt/backup/my/backups/ my_s3:my-archive/backups/

This is part of a bash script; here is the full section of the script that runs the sync.

logger -t "RCLONE" "rclone START for /mnt/backup/my/backups/"
if ! rclone sync --check-first --fast-list --ignore-existing --transfers 40 /mnt/backup/my/backups/ my_s3:my-archive/backups/ 2>&1 | tee >(logger -t "RCLONE");
then
    # rclone failed
    logger -t "RCLONE" "rclone FAILED for /mnt/backup/my/backups/"
    backup_status=1
fi
logger -t "RCLONE" "rclone FINISH for /mnt/backup/my/backups/"

The rclone config contents with secrets removed.

[my_s3]
type = s3
provider = AWS
env_auth = false
access_key_id = <redacted>
secret_access_key = <redacted>
region = eu-west-2
location_constraint = eu-west-2
acl = private
upload_concurrency = 8

A log from the command with the -vv flag

Unfortunately, as this is a live system (and this error has pushed us over our data transfer quota), I cannot run this at the moment. Below are the logs that my script wrote to syslog.

Aug 02 02:00:02 tooting RCLONE[2166904]: rclone START for /mnt/backup/my/backups/
Aug 02 03:29:09 tooting RCLONE[2166908]: 2022/08/02 03:29:09 ERROR : CBB_DBSVR/CBB_Database/DBSVR/large.database/20220730200006/large.database_full_20220730200006.bak: Failed to copy: multipart upload failed to>
Aug 02 03:29:09 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 03:29:09 tooting RCLONE[2166908]: 2022/08/02 03:29:09 ERROR : S3 bucket my-archive path backups: not deleting files as there were IO errors
Aug 02 03:29:09 tooting RCLONE[2166908]: 2022/08/02 03:29:09 ERROR : S3 bucket my-archive path backups: not deleting directories as there were IO errors
Aug 02 03:29:09 tooting RCLONE[2166908]: 2022/08/02 03:29:09 ERROR : Attempt 1/3 failed with 1 errors and: multipart upload failed to upload part: InvalidArgument: Part number must be an integer between 1 and 10000, inclusive
Aug 02 03:29:09 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 04:57:30 tooting RCLONE[2166908]: 2022/08/02 04:57:30 ERROR : CBB_DBSVR/CBB_Database/DBSVR/large.database/20220730200006/large.database_full_20220730200006.bak: Failed to copy: multipart upload failed to>
Aug 02 04:57:30 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 04:57:30 tooting RCLONE[2166908]: 2022/08/02 04:57:30 ERROR : S3 bucket my-archive path backups: not deleting files as there were IO errors
Aug 02 04:57:30 tooting RCLONE[2166908]: 2022/08/02 04:57:30 ERROR : S3 bucket my-archive path backups: not deleting directories as there were IO errors
Aug 02 04:57:30 tooting RCLONE[2166908]: 2022/08/02 04:57:30 ERROR : Attempt 2/3 failed with 1 errors and: multipart upload failed to upload part: InvalidArgument: Part number must be an integer between 1 and 10000, inclusive
Aug 02 04:57:30 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 06:25:58 tooting RCLONE[2166908]: 2022/08/02 06:25:58 ERROR : CBB_DBSVR/CBB_Database/DBSVR/large.database/20220730200006/large.database_full_20220730200006.bak: Failed to copy: multipart upload failed to>
Aug 02 06:25:58 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 06:25:58 tooting RCLONE[2166908]: 2022/08/02 06:25:58 ERROR : S3 bucket my-archive path backups: not deleting files as there were IO errors
Aug 02 06:25:58 tooting RCLONE[2166908]: 2022/08/02 06:25:58 ERROR : S3 bucket my-archive path backups: not deleting directories as there were IO errors
Aug 02 06:25:58 tooting RCLONE[2166908]: 2022/08/02 06:25:58 ERROR : Attempt 3/3 failed with 1 errors and: multipart upload failed to upload part: InvalidArgument: Part number must be an integer between 1 and 10000, inclusive
Aug 02 06:25:58 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 06:25:58 tooting RCLONE[2166908]: 2022/08/02 06:25:58 Failed to sync: multipart upload failed to upload part: InvalidArgument: Part number must be an integer between 1 and 10000, inclusive
Aug 02 06:25:58 tooting RCLONE[2166908]:         status code: 400, <redacted>, host id: <redacted>
Aug 02 06:25:58 tooting RCLONE[2192745]: rclone FINISH for /mnt/backup/my/backups/

Try adding:

--s3-max-upload-parts 10000

Guessing something is slightly off when dynamically figuring out the chunk size.
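For example, something like this (the rest of your flags left as they are):

rclone sync --check-first --fast-list --ignore-existing --transfers 40 --s3-max-upload-parts 10000 /mnt/backup/my/backups/ my_s3:my-archive/backups/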

Thanks. I forgot to mention that I had seen that as a possible solution, along with setting --s3-chunk-size 16M. But if that's now required, surely it should at least have been marked as a breaking change, or it's possibly a bug.
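For reference, the chunk-size alternative would look something like this (I haven't been able to test either yet):

rclone sync --check-first --fast-list --ignore-existing --transfers 40 --s3-chunk-size 16M /mnt/backup/my/backups/ my_s3:my-archive/backups/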

Unknown, as I don't yet know whether that fixed it for you, so let's step through it :slight_smile:

Did you try that fix and did it work or not work?

That's a good point. Unfortunately, as this error has caused us to blow our transfer quota, and it's a live system, I can't test it. I'll have to find a different way to test this. Hopefully in the meantime someone else with the same issue can test it further.

I did notice this in the changelog, and it does point to a change that would affect the chunking. After all, 10,000 parts is the default, so why should I have to set it to the same number again?

  • Use chunksize lib to determine chunksize dynamically (Derek Battams)

I read it the same way; that's why I asked you to test it out.

Without knowing whether we're fixed, we'd be guessing a bit, so I usually like to go step by step before jumping around.

This looks like a bug...

Can you tell us exactly how big the file that failed was?

I'm guessing this is the culprit

If we know the exact size of the file then it should be easy to work out what went wrong.

Yes. The file is 113 GB (120864818840 bytes, to be exact).

I ran that number through the chunk size calculator and it came up with 12 MiB chunks, giving a total of 9,606 parts, which is less than 10,000.
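Roughly, the arithmetic is this (a back-of-the-envelope reconstruction, not the exact code):

# smallest whole-MiB chunk size that fits 120864818840 bytes into 10000 parts
echo $(( ( (120864818840 + 9999) / 10000 + 1048575 ) / 1048576 ))    # -> 12 (MiB)
# number of 12 MiB parts the file then needs
echo $(( (120864818840 + 12*1048576 - 1) / (12*1048576) ))           # -> 9606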

Is it possible this file changed in size during the upload?

I think to debug this further I'm going to need a debug log with -vv of it going wrong.
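If you do manage to run it again, adding --log-file will send the debug output to a file rather than the terminal, e.g. something like:

rclone sync --check-first --fast-list --ignore-existing --transfers 40 -vv --log-file /tmp/rclone-debug.log /mnt/backup/my/backups/ my_s3:my-archive/backups/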

Thanks for checking and getting back to me. I'm going to set up a test VM in EC2 today and try it there first to see if I can replicate it. If not, I'll try running the command with -vv, but directly against the DB file directory instead of our whole backup directory. Hopefully I'll be able to get enough info before it starts an actual transfer.

EDIT: Forgot to mention that the file wouldn't have changed, because it gets written by different backup software. The backup had already finished the day before the sync tried to run. Additionally, the sync tried to run on 5 different days.

@ncw do you think it would be enough to run with the --dry-run flag?

EDIT:

I've tried it, and --dry-run doesn't reproduce the error or show what chunk size it calculated.

:~# rclone sync --check-first --fast-list --ignore-existing --transfers 40 -vv --dry-run /mnt/backup/my/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/ my_s3:my-archive/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/
2022/08/09 09:24:17 DEBUG : rclone: Version "v1.59.0" starting with parameters ["rclone" "sync" "--check-first" "--fast-list" "--ignore-existing" "--transfers" "40" "-vv" "--dry-run" "/mnt/backup/my/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/" "my_s3:my-archive/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/"]
2022/08/09 09:24:17 DEBUG : Creating backend with remote "/mnt/backup/my/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/"
2022/08/09 09:24:17 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/08/09 09:24:17 DEBUG : Creating backend with remote "my_s3:my-archive/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/"
2022/08/09 09:24:17 DEBUG : fs cache: renaming cache item "my_s3:my-archive/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005/" to be canonical "my_s3:my-archive/backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005"
2022/08/09 09:24:17 INFO  : S3 bucket my-archive path backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005: Running all checks before starting transfers
2022/08/09 09:24:17 DEBUG : large.database_diff_20220807200004.bak: Destination exists, skipping
2022/08/09 09:24:17 DEBUG : S3 bucket my-archive path backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005: Waiting for checks to finish
2022/08/09 09:24:17 INFO  : S3 bucket my-archive path backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005: Checks finished, now starting transfers
2022/08/09 09:24:17 NOTICE: large.database_diff_20220808200003.bak: Skipped copy as --dry-run is set (size 11.688Mi)
2022/08/09 09:24:17 NOTICE: large.database_full_20220806200005.bak: Skipped copy as --dry-run is set (size 112.564Gi)
2022/08/09 09:24:17 DEBUG : S3 bucket my-archive path backups/CBB_Database/CBB_Database/DBSVR/large.database/20220806200005: Waiting for transfers to finish
2022/08/09 09:24:17 DEBUG : Waiting for deletions to finish
2022/08/09 09:24:17 NOTICE:
Transferred:      112.576 GiB / 112.576 GiB, 100%, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Transferred:            2 / 2, 100%
Elapsed time:         0.0s

2022/08/09 09:24:17 DEBUG : 5 go routines active

I read the code very carefully again, and I see the problem now.

It is using the old object size to calculate the number of chunks, not the new size.
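Back of the envelope: if the calculation runs off the wrong size and falls back to the 5 MiB default chunk size, the 120864818840 byte file needs roughly 23,054 parts, well over the 10,000 limit, which matches the error you're seeing:

echo $(( (120864818840 + 5*1048576 - 1) / (5*1048576) ))    # -> 23054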

I think this will fix the problem

v1.60.0-beta.6413.3a612bc84.fix-chunksize-calculator on branch fix-chunksize-calculator (uploaded in 15-30 mins)

Thank you, Nick. Looks like you got there first :slightly_smiling_face: I've just managed to run the command, and it is indeed using the 5 MiB default chunk size.

:/rctest# rclone sync --check-first --fast-list --ignore-existing --transfers 40 -vv /rctest/ my_s3:my-archive/rctest/
2022/08/09 10:01:48 DEBUG : rclone: Version "v1.59.1" starting with parameters ["rclone" "sync" "--check-first" "--fast-list" "--ignore-existing" "--transfers" "40" "-vv" "/rctest/" "my_s3:my-archive/rctest/"]
2022/08/09 10:01:48 DEBUG : Creating backend with remote "/rctest/"
2022/08/09 10:01:48 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/08/09 10:01:48 DEBUG : Creating backend with remote "my_s3:my-archive/rctest/"
2022/08/09 10:01:48 DEBUG : fs cache: renaming cache item "my_s3:my-archive/rctest/" to be canonical "my_s3:my-archive/rctest"
2022/08/09 10:01:48 INFO  : S3 bucket my-archive path rctest: Running all checks before starting transfers
2022/08/09 10:01:48 DEBUG : S3 bucket my-archive path rctest: Waiting for checks to finish
2022/08/09 10:01:48 INFO  : S3 bucket my-archive path rctest: Checks finished, now starting transfers
2022/08/09 10:01:48 DEBUG : S3 bucket my-archive path rctest: Waiting for transfers to finish
2022/08/09 10:02:48 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       1m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:03:48 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       2m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:04:48 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       3m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:05:48 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       4m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 1 size 5Mi offset 0/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 2 size 5Mi offset 5Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 3 size 5Mi offset 10Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 4 size 5Mi offset 15Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 5 size 5Mi offset 20Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 6 size 5Mi offset 25Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 7 size 5Mi offset 30Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 8 size 5Mi offset 35Mi/112.075Gi
2022/08/09 10:06:05 DEBUG : big-file.img: multipart upload starting chunk 9 size 5Mi offset 40Mi/112.075Gi
<snip>

I'll test the beta shortly and get back to you.

Thanks @ncw. The beta version is using the right chunk size now. When will the next release with this fix be available (so we can schedule updates, etc.)?

:~# rclone sync --check-first --fast-list --ignore-existing --transfers 40 -vv /rctest/ my_s3:my-archive/rctest/
2022/08/09 10:50:23 DEBUG : rclone: Version "v1.60.0-beta.6413.3a612bc84.fix-chunksize-calculator" starting with parameters ["rclone" "sync" "--check-first" "--fast-list" "--ignore-existing" "--transfers" "40" "-vv" "/rctest/" "my_s3:my-archive/rctest/"]
2022/08/09 10:50:23 DEBUG : Creating backend with remote "/rctest/"
2022/08/09 10:50:23 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/08/09 10:50:23 DEBUG : Creating backend with remote "my_s3:my-archive/rctest/"
2022/08/09 10:50:23 DEBUG : fs cache: renaming cache item "my_s3:my-archive/rctest/" to be canonical "my_s3:my-archive/rctest"
2022/08/09 10:50:23 INFO  : S3 bucket my-archive path rctest: Running all checks before starting transfers
2022/08/09 10:50:23 DEBUG : S3 bucket my-archive path rctest: Waiting for checks to finish
2022/08/09 10:50:23 INFO  : S3 bucket my-archive path rctest: Checks finished, now starting transfers
2022/08/09 10:50:23 DEBUG : S3 bucket my-archive path rctest: Waiting for transfers to finish
2022/08/09 10:51:23 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       1m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:52:23 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       2m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:53:23 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       3m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:54:23 INFO  :
Transferred:              0 B / 112.075 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:       4m0.0s
Transferring:
 *                                  big-file.img:  0% /112.075Gi, 0/s, -

2022/08/09 10:54:40 DEBUG : big-file.img: size: 112.075Gi, parts: 10000, default: 5Mi, new: 12Mi; default chunk size insufficient, returned new chunk size
2022/08/09 10:54:40 DEBUG : big-file.img: multipart upload starting chunk 1 size 12Mi offset 0/112.075Gi
2022/08/09 10:54:40 DEBUG : big-file.img: multipart upload starting chunk 2 size 12Mi offset 12Mi/112.075Gi
2022/08/09 10:54:40 DEBUG : big-file.img: multipart upload starting chunk 3 size 12Mi offset 24Mi/112.075Gi
2022/08/09 10:54:40 DEBUG : big-file.img: multipart upload starting chunk 4 size 12Mi offset 36Mi/112.075Gi
2022/08/09 10:54:40 DEBUG : big-file.img: multipart upload starting chunk 5 size 12Mi offset 48Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 6 size 12Mi offset 60Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 7 size 12Mi offset 72Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 8 size 12Mi offset 84Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 9 size 12Mi offset 96Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 10 size 12Mi offset 108Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 11 size 12Mi offset 120Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 12 size 12Mi offset 132Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 13 size 12Mi offset 144Mi/112.075Gi
2022/08/09 10:54:41 DEBUG : big-file.img: multipart upload starting chunk 14 size 12Mi offset 156Mi/112.075Gi

I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.60, which is scheduled for early September.

I may release this in v1.59.2 also - haven't decided if we will need the point release yet.

That's great. Thanks again.
