Our data source is very dynamic: some objects (access tokens) are created and deleted at a high rate. When we try to back up this data, we get errors: "Failed to copy: failed to open source object". We are OK with this. How can we continue to back up the objects that are still there (the majority)? Our backup process fails every time because of such errors.
What is your rclone version (output from rclone version)
latest
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Docker image debian:stretch-slim
Which cloud storage system are you using? (eg Google Drive)
IBM Cloud Object Storage
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Am I understanding correctly that the issue is basically this: you have a file, the file is about to be backed up, but by then it is gone from the source, so you get an error because rclone can't find it there?
Yes. In our source we have many temporary files (tokens).
We don't need to back them up, but we can't distinguish short-lived tokens from long-lived ones.
If some file is not found during copying because it was deleted, that's OK; it simply should not be copied.
We would like to ignore this type of error.
Is it possible?
That is a great idea. Reducing --max-backlog will shorten the time between listing a file and transferring it. If you make it too small it will hurt performance, and it will also affect the ETA times in the stats.
I think what you probably want is to set --retries 1 so rclone only attempts the backup once; otherwise it will keep retrying and keep getting a different set of errors.
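Something like this, for example (the remote and bucket names here are just placeholders):
rclone copy source:bucket backup:bucket --retries 1 --max-backlog 100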
We will try this option, thanks. We never have more than 1000 objects; we will try to break the work into chunks of 100.
But that means the whole operation will fail after that single attempt. With retries there is a small chance that one attempt will get a listing that doesn't change during copying, so we don't change retries; it's 3 by default.
Should there be any correlation between --max-backlog and --transfers?
Is it preferable for the backlog value to be a multiple of transfers?
If I set them to the same value, as in the sketch below, will this work optimally?
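For example (placeholder remote and bucket names), is this the kind of pairing that would be optimal:
rclone copy source:bucket backup:bucket --transfers 4 --max-backlog 4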
thanks
I found that when I decrease the value of --max-backlog, backups of small buckets (with a few files) take much more time than before, while backups of big buckets (with many files) take much less time.
With --max-backlog=1, a backup of a bucket with 18,000 files takes a few seconds (!!??), but a backup of a bucket with 90 files takes 5 minutes (!!??).
Can you explain this phenomenon?
I tried running a backup of only the small bucket and it finished in a few seconds.
But when I run 20 small buckets in parallel, the last one finishes in 5 minutes, as I wrote.
The big buckets run in parallel too, but in our specific case there are 11 of them, not 20, and the last one finished in a few seconds.
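To illustrate what I mean by "in parallel" (a hypothetical sketch, not our exact script; the remote and bucket names are placeholders):
# one rclone process per bucket, all started at once
for bucket in bucket-01 bucket-02 bucket-03; do
  rclone copy "source:$bucket" "backup:$bucket" --max-backlog 1 &
done
wait  # the overall time is set by the slowest bucket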
We got the error even with --max-backlog=1 --transfers=1:
ERROR : Attempt 1/3 failed with 13 errors and: failed to open source object: NoSuchKey: The specified key does not exist.
ERROR : Attempt 2/3 failed with 85 errors and: failed to open source object: NoSuchKey: The specified key does not exist.
ERROR : Attempt 3/3 failed with 7 errors and: failed to open source object: NoSuchKey: The specified key does not exist.
ERROR : sys/token/id/h0c6320540822dc11ded9481aa1eb5357f83fb5c92ea0bdd312b10f9f52379646: Failed to copy: failed to open source object: NoSuchKey: The specified key does not exist.
status code: 404, request id: 59e00cc2-3cfb-4bfc-b640-d5a78ff0be9f, host id:
I posted only the final line of each retry.
I want it to retry because there is a chance that it will succeed (using the default --retries 3).
My questions:
Why does the error still occur with --max-backlog=1?
Can I add some other flag to avoid this type of error?
It might be to do with eventual consistency of the listings. Most object storage systems are "eventually consistent" which means that the listings and the objects are allowed to get out of sync for "a while" as long as they get back in sync eventually.
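To illustrate with a hypothetical timeline (the key name and timings are made up):
t=0s  the application deletes sys/token/id/sometoken
t=1s  rclone lists the bucket; the stale listing still contains the deleted key
t=2s  rclone issues a GET for the key -> 404 NoSuchKey -> "failed to open source object"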
I can't think of any. If you have files deleted during the transfer, then you will always get errors and cause a retry.
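If the only problem is that the whole run is reported as failed, one blunt workaround is to tolerate rclone's non-zero exit status in the calling script (a sketch; note this masks every kind of error, not just vanished source objects):
# treat rclone errors as a warning instead of failing the backup pipeline
rclone copy source:bucket backup:bucket || echo "backup completed with errors; continuing"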