Retry for failed "copyto"

What is the problem you are having with rclone?

Our data source is very dynamic: some objects (access tokens) are created and deleted at a high rate. When we try to back up this data, we get errors: "Failed to copy: failed to open source object". We are retrying the command with the parameters
--max-backlog 1 --transfers 1 --retries 10 --retries-sleep 500ms
But on every retry we see the same error for the same object:

ERROR : sys/token/id/h4ea39e9da44c402427a94c58a1f75afd2fbc2c42603f287649e5a0c092f2ab8f: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: b60f278d-01b7-43a3-93b2-e48bc687c874, host id: 
ERROR : Attempt 1/10 failed with 1 errors and: failed to open source object: NoSuchKey: The specified key does not exist. status code: 404, request id: b60f278d-01b7-43a3-93b2-e48bc687c874, host id: 

ERROR : sys/token/id/h4ea39e9da44c402427a94c58a1f75afd2fbc2c42603f287649e5a0c092f2ab8f: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: 61722858-d96c-4b68-8fe3-8eaf62920ae6, host id: 
ERROR : Attempt 2/10 failed with 1 errors and: failed to open source object: NoSuchKey: The specified key does not exist.status code: 404, request id: 61722858-d96c-4b68-8fe3-8eaf62920ae6, host id: 

ERROR : sys/token/id/h4ea39e9da44c402427a94c58a1f75afd2fbc2c42603f287649e5a0c092f2ab8f: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: f8a4183f-275b-434a-907e-33e8b932f0cb, host id: 
ERROR : Attempt 3/10 failed with 2 errors and: failed to open source object: NoSuchKey: The specified key does not exist. status code: 404, request id: f8a4183f-275b-434a-907e-33e8b932f0cb, host id: 

etc
It keeps trying to copy the same non-existent object!
Why doesn't it skip it and move on to the next object?
How can we change this command to handle our situation?

What is your rclone version (output from rclone version)

latest

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Docker image debian:stretch-slim

Which cloud storage system are you using? (eg Google Drive)

IBM Cloud Object Storage

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --config=rclone.config copyto --no-check-dest --max-backlog 1 --transfers 1 --retries 10 --retries-sleep 500ms COS_SOURCE:${bucket} COS_DESTINATION:${bucket}${BACKUP_BUCKET_SUFFIX}/${dateString} --exclude=core/lock

Perhaps do it in two steps:

  1. rclone lsf to get a list of files
  2. rclone copy that list of files
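A minimal sketch of that two-step approach, using a hypothetical bucket name and the remote names from the original command (adjust paths and flags to your setup):

```shell
# Step 1: list all files in the source bucket recursively into a text file.
rclone --config=rclone.config lsf -R --files-only COS_SOURCE:mybucket > files.txt

# Step 2: copy only the files named in that list.
rclone --config=rclone.config copy --files-from files.txt \
    COS_SOURCE:mybucket COS_DESTINATION:mybucket-backup/2021_04_15
```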

Thanks for the quick reply.
If we split it into two steps, the issue will become even more likely, because the time between listing and copying will increase further.
I would expect --max-backlog 1 --transfers 1 to work like this: "take one object and copy it, take the next, copy it".

These are low level retries.

Rclone will retry the listing when it does the high level retry (configured by --retries)

We use the command
rclone --config=rclone.config copyto --no-check-dest --max-backlog 1 --transfers 1 --retries 10 --retries-sleep 500ms
Do we need to change something to enable the high level retry?

You've configured 10 high level retries there.

Can you put a complete log with -vv up where I can see it?

Here are the logs of 2 retries:



Error: Command failed: rclone --config=rclone.config copyto --no-check-dest --max-backlog 1 -vv --transfers 1 --retries 5 --retries-sleep 500ms COS_SOURCE:655d2402-a867-4f8c-ae06-dfea76720eb5 COS_DESTINATION:655d2402-a867-4f8c-ae06-dfea76720eb5-backup/2021_04_15_20_51 --exclude=core/lock
2021/04/15 20:53:13 DEBUG : rclone: Version "v1.53.3-DEV" starting with parameters ["rclone" "--config=rclone.config" "copyto" "--no-check-dest" "--max-backlog" "1" "-vv" "--transfers" "1" "--retries" "5" "--retries-sleep" "500ms" "COS_SOURCE:655d2402-a867-4f8c-ae06-dfea76720eb5" "COS_DESTINATION:655d2402-a867-4f8c-ae06-dfea76720eb5-backup/2021_04_15_20_51" "--exclude=core/lock"]
2021/04/15 20:53:13 DEBUG : Using config file from "../local/rclone.config"
2021/04/15 20:53:13 DEBUG : Creating backend with remote "COS_SOURCE:655d2402-a867-4f8c-ae06-dfea76720eb5"
2021/04/15 20:53:13 DEBUG : Creating backend with remote "COS_DESTINATION:655d2402-a867-4f8c-ae06-dfea76720eb5-backup/2021_04_15_20_51"
2021/04/15 20:53:15 DEBUG : core/lock: Excluded
2021/04/15 20:53:16 DEBUG : ROOT-TOKEN-655d2402-a867-4f8c-ae06-dfea76720eb5: MD5 = e467f83f8ff160cdbbd62d7e86971d6b OK
2021/04/15 20:53:16 INFO  : ROOT-TOKEN-655d2402-a867-4f8c-ae06-dfea76720eb5: Copied (new)
.....
2021/04/15 20:53:49 DEBUG : sys/expire/id/auth/token/create/usage/h19a70a2362e7053574f484fcdbc1c156123e26f369d152111f9410edd6f10513: MD5 = 54c9ca9fbe807db21ba2db787ee00cac OK
2021/04/15 20:53:49 INFO  : sys/expire/id/auth/token/create/usage/h19a70a2362e7053574f484fcdbc1c156123e26f369d152111f9410edd6f10513: Copied (new)
2021/04/15 20:53:50 DEBUG : sys/token/accessor/77a17bec7bee7ec480bc9e2b30dce2042732c107: MD5 = bce70ff8e2de6b3aaa03957ce0e682b5 OK
2021/04/15 20:53:50 INFO  : sys/token/accessor/77a17bec7bee7ec480bc9e2b30dce2042732c107: Copied (new)
2021/04/15 20:53:50 ERROR : sys/token/id/h4ea39e9da44c402427a94c58a1f75afd2fbc2c42603f287649e5a0c092f2ab8f: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: 5243c491-e8be-4e3d-9e75-22a4d4d2371b, host id: 
2021/04/15 20:53:51 DEBUG : sys/policy/instance-secrets-reader: MD5 = 3780e6afad23643b8846502c5fcc4bc9 OK
2021/04/15 20:53:51 INFO  : sys/policy/instance-secrets-reader: Copied (new)
.....

2021/04/15 20:54:24 DEBUG : core/plugin-catalog/secret/username_password: MD5 = 8c50cb8022941d808b53751cf0dc608b OK
2021/04/15 20:54:24 INFO  : core/plugin-catalog/secret/username_password: Copied (new)
2021/04/15 20:54:24 ERROR : Attempt 1/5 failed with 1 errors and: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: 5243c491-e8be-4e3d-9e75-22a4d4d2371b, host id: 


2021/04/15 20:54:25 DEBUG : core/lock: Excluded
2021/04/15 20:54:26 DEBUG : ROOT-TOKEN-655d2402-a867-4f8c-ae06-dfea76720eb5: MD5 = e467f83f8ff160cdbbd62d7e86971d6b OK
2021/04/15 20:54:26 INFO  : ROOT-TOKEN-655d2402-a867-4f8c-ae06-dfea76720eb5: Copied (new)
.....
2021/04/15 20:55:01 DEBUG : sys/token/accessor/77a17bec7bee7ec480bc9e2b30dce2042732c107: MD5 = bce70ff8e2de6b3aaa03957ce0e682b5 OK
2021/04/15 20:55:01 INFO  : sys/token/accessor/77a17bec7bee7ec480bc9e2b30dce2042732c107: Copied (new)
2021/04/15 20:55:02 DEBUG : core/master: MD5 = 6df7c9c14835df657d2d975ee821ddb2 OK
2021/04/15 20:55:02 INFO  : core/master: Copied (new)
2021/04/15 20:55:02 ERROR : sys/token/id/h4ea39e9da44c402427a94c58a1f75afd2fbc2c42603f287649e5a0c092f2ab8f: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: da2d9c61-6c40-4e6d-80a9-db6b92b2571e, host id: 
2021/04/15 20:55:03 DEBUG : sys/expire/id/auth/token/create/usage/ha436e4f12b3c0a23431c03eb82ead116157b77c8807a0a0baac3df749ebefc50: MD5 = 7fd300cf0b4630f4bfe09e77b602d431 OK
2021/04/15 20:55:03 INFO  : sys/expire/id/auth/token/create/usage/ha436e4f12b3c0a23431c03eb82ead116157b77c8807a0a0baac3df749ebefc50: Copied (new)
.....

2021/04/15 20:55:32 DEBUG : core/plugin-catalog/secret/username_password: MD5 = 8c50cb8022941d808b53751cf0dc608b OK
2021/04/15 20:55:32 INFO  : core/plugin-catalog/secret/username_password: Copied (new)
2021/04/15 20:55:32 ERROR : Attempt 2/5 failed with 1 errors and: failed to open source object: NoSuchKey: The specified key does not exist.
        status code: 404, request id: da2d9c61-6c40-4e6d-80a9-db6b92b2571e, host id: 

Could it be that the object is broken somehow? Can you try downloading it individually? How about via the web interface of the cloud storage?

No, I checked - the object doesn't exist.
I see similar logs for different buckets and objects: most retries fail on the same object (with --max-backlog 32 --transfers 32 too!).
Please advise.

Rclone clearly saw the object in the listing, but when it came to download it, it wasn't there.

What is surprising is that when rclone did the retry, the object was still there in the listing.

This could be to do with the eventual consistency of the cloud storage - the listing is allowed to get out of sync with the actual state of the objects for a short while.

Thanks for your reply.
Checking with the Object Storage team.

Thank you for your help.
The issue was caused by an Object Storage bug and has been solved.

But it still happens that the backup fails with this command
rclone --config=rclone.config copyto --no-check-dest --max-backlog 1 --transfers 1 --retries 10 --retries-sleep 500ms
because an object was deleted - even with 10 retries (each time a different object).
How can "take 1 object, send it in 1 thread" fail?
The copy happens immediately after the listing, doesn't it?

I don't know. It will be to do with eventual consistency on the Object Storage.

Please help me understand this phenomenon:
We run copyto with the flags --max-backlog 32 --transfers 32 for some bucket and get errors in all 3 retries:

Attempt 1/3 failed with 2 errors...
Attempt 2/3 failed with 37 errors...
Attempt 3/3 failed with 50 errors..

All the errors are about non-existent objects (objects that were deleted during copying).
Then we run copyto with the flags --max-backlog 1 --transfers 1 for the same bucket in order to minimize the chance of failure. We get errors again:

Attempt 1/10 failed with 228 errors...
Attempt 2/10 failed with 326 errors...
etc 

Why do we get more errors in this case? We thought this configuration would be more likely to succeed.
Do you think it's worth running with --max-backlog 1 --transfers 1, or should we just increase the retries with the first configuration (--max-backlog 32 --transfers 32)?

Thanks in advance

You want the sync to complete as quickly as possible - setting --transfers 1 will slow down the sync.

You may find this helpful

  --local-no-check-updated   Don't check to see if the files change during upload

But if the errors are only about files being deleted, then it won't help.

I suspect --max-backlog 1 --transfers 32 might be optimal.
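For illustration, that suggestion applied to the original command would look something like this (same remotes and shell variables as the command you posted):

```shell
# Sketch: small backlog so the listing stays fresh, but parallel transfers
# so the sync finishes as quickly as possible.
rclone --config=rclone.config copyto --no-check-dest \
    --max-backlog 1 --transfers 32 --retries 10 --retries-sleep 500ms \
    COS_SOURCE:${bucket} COS_DESTINATION:${bucket}${BACKUP_BUCKET_SUFFIX}/${dateString} \
    --exclude=core/lock
```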

I'll try it, thank you

Thank you for your help!

Another question connected to the "object not found" error during copying:
If the rclone copyto command failed because some objects were removed from the source during the process, does this mean that all the other objects were copied?
Can we say that the destination bucket contains all the objects that still existed in the source bucket at the end of the process?

@ncw Can you please advise?
Can we consider a backup with errors of type "object not found" to be successful if we don't need the removed objects?
Does it contain everything except the removed objects?

Thanks in advance

That's really up to you - if you feel that's a good backup, then I suppose that's your use case. Normally, with anything else, a file disappearing before being copied counts as a failure.

I personally wouldn't assume anything worked and would run a check afterwards if you are concerned about data integrity - but if the source is changing, I'm not sure how that would work. Most backups take a snapshot of the source to make it a point-in-time copy.
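A sketch of such a check, assuming the same hypothetical bucket names as earlier; `--one-way` only reports files that are on the source but missing from the destination:

```shell
# Verify the backup after the copy completes.
rclone --config=rclone.config check --one-way \
    COS_SOURCE:mybucket COS_DESTINATION:mybucket-backup/2021_04_15
```

On a changing source this will still report objects created or deleted since the copy, so treat its output as a diff rather than a pass/fail result.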

I see. Thank you