Copy low level retry does not respect pacer.retryAfterError

Ankur_Gupta · February 9, 2021, 3:00pm

What is the problem you are having with rclone?

For the copy operation, when a backend's Put() or Update() function is called, the backend uses CallNoRetry for the actual upload. This is because the retry is supposed to happen in the low level retry loop rather than in the pacer, so that the file can be re-opened. If the backend returns a pacer.retryAfterError, the low level retry loop does not retry it. Currently, only fserrors.retryError is handled.

What is your rclone version (output from `rclone version`)

v1.54.0

The command you were trying to run (eg `rclone copy /tmp remote:tmp`)

rclone copy <local_path> <remote_name>:<remote_path>

Is the above understanding correct? If so, what would be the best way to make the Copy() operation respect pacer.retryAfterError? Should it just be an explicit sleep, or can we make it more sophisticated and somehow use the pacer for the low level retries?

ncw · February 10, 2021, 5:15pm

What needs to be done is obey the retry after error in the Copy retry loop - probably about here

https://github.com/rclone/rclone/blob/bf8542c67004428ad25b8eab1fa7e34608ccb908/fs/operations/operations.go#L485-L490


		// Retry if err returned a retry error
		if fserrors.IsRetryError(err) || fserrors.ShouldRetry(err) {
			fs.Debugf(src, "Received error: %v - low level retry %d/%d", err, tries, maxTries)
			tr.Reset(ctx) // skip incomplete accounting - will be overwritten by retry
			continue
		}

So see if the error is a retry error and sleep on it like is done in the pacer code.

Confusingly there seem to be two types of retry after error in the rclone code base!

How did you come across this error?

Ankur_Gupta · February 10, 2021, 6:06pm

That was my thought as well after creating this post. Please see if these changes look fine:

There are not two retry-after errors. One is RetryError, the other is RetryAfterError. The first indicates that an error should be retried; the second indicates that it should be retried and specifies the delay after which to retry.
Of course, both can, and should, be condensed into a single error. Do you want me to make that change within this MR? This is a bugfix, whereas merging the two errors is more of a code improvement that would also require changes across multiple backends.

I was looking at the best way to handle the 429 status code and the Retry-After header for a custom backend.
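As an illustration of the 429 handling mentioned here, the following is a minimal sketch of turning a Retry-After header into a delay. The names `parseRetryAfter`, `retryDelay` and `exampleNow` are hypothetical and not part of rclone; only the Go standard library is used. A backend would then presumably wrap its error together with this delay in the pacer's retry-after error, which is not shown.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// exampleNow is a fixed reference time so the example is deterministic.
var exampleNow = time.Date(2021, time.February, 10, 18, 0, 0, 0, time.UTC)

// parseRetryAfter (hypothetical helper) converts a Retry-After header value
// into a delay relative to now. The header holds either a whole number of
// seconds or an HTTP date.
func parseRetryAfter(value string, now time.Time) (time.Duration, bool) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 0, false
	}
	// Form 1: delay in seconds.
	if secs, err := strconv.Atoi(value); err == nil {
		if secs < 0 {
			return 0, false
		}
		return time.Duration(secs) * time.Second, true
	}
	// Form 2: HTTP date. A date in the past means "retry immediately".
	if when, err := http.ParseTime(value); err == nil {
		if d := when.Sub(now); d > 0 {
			return d, true
		}
		return 0, true
	}
	return 0, false
}

// retryDelay (hypothetical helper) reports the delay requested by a
// 429 Too Many Requests response, if it carries a usable Retry-After header.
func retryDelay(resp *http.Response, now time.Time) (time.Duration, bool) {
	if resp.StatusCode != http.StatusTooManyRequests {
		return 0, false
	}
	return parseRetryAfter(resp.Header.Get("Retry-After"), now)
}

func main() {
	resp := &http.Response{
		StatusCode: http.StatusTooManyRequests,
		Header:     http.Header{},
	}
	resp.Header.Set("Retry-After", "120")
	d, ok := retryDelay(resp, exampleNow)
	fmt.Println(d, ok) // prints "2m0s true"
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the helper, keeps the date form of the header easy to test.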

ncw · February 11, 2021, 4:48pm

See github for review

Hmm, I suggest we leave as-is for the moment as the pacer RetryAfter should be low-level retried, but the other error shouldn't - it should be high level retried in the cmd/cmd.go loop I think.

Ankur_Gupta · February 11, 2021, 5:13pm

Both are low-level retried. Below is how the code looks after my changes:

		if fserrors.IsRetryError(err) || fserrors.ShouldRetry(err) {
			retry = true
		} else if t, ok := pacer.IsRetryAfter(err); ok {
			fs.Debugf(src, "Sleeping for %v (as indicated by the server) to obey Retry-After error: %v", t, err)
			time.Sleep(t)
			retry = true
		}

fserrors.IsRetryError already exists in rclone and checks for fserrors.retryError. The second check (pacer.IsRetryAfter) is the one I added, to retry pacer.retryAfterError.
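To show the pattern in isolation, here is a self-contained sketch of a low level retry loop that sleeps for the server-requested delay before retrying. It does not use rclone's packages: `retryAfterError`, `isRetryAfter`, `copyWithRetries`, `recordingSleep` and `errThrottled` are hypothetical stand-ins, and the real loop also retries on `fserrors.IsRetryError` and `fserrors.ShouldRetry`, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryAfterError is a hypothetical stand-in for the pacer's retry-after
// error: it wraps an error together with the delay the server asked for.
type retryAfterError struct {
	err   error
	delay time.Duration
}

func (e *retryAfterError) Error() string {
	return fmt.Sprintf("%v (retry after %v)", e.err, e.delay)
}

func (e *retryAfterError) Unwrap() error { return e.err }

// isRetryAfter mimics the shape of the pacer.IsRetryAfter call in the thread.
func isRetryAfter(err error) (time.Duration, bool) {
	var ra *retryAfterError
	if errors.As(err, &ra) {
		return ra.delay, true
	}
	return 0, false
}

// errThrottled is an example error a backend might see.
var errThrottled = errors.New("429 Too Many Requests")

// totalSlept and recordingSleep replace time.Sleep so the example runs
// instantly while still showing how long the loop would have waited.
var totalSlept time.Duration

func recordingSleep(d time.Duration) { totalSlept += d }

// copyWithRetries calls upload up to maxTries times. A retry-after error
// makes it wait for the requested delay and try again; any other error is
// returned at once. It returns the number of attempts and the last error.
func copyWithRetries(maxTries int, upload func(try int) error, sleep func(time.Duration)) (int, error) {
	var err error
	for try := 1; try <= maxTries; try++ {
		err = upload(try)
		if err == nil {
			return try, nil
		}
		delay, ok := isRetryAfter(err)
		if !ok {
			return try, err // not a retry-after error: give up in this sketch
		}
		if try < maxTries {
			sleep(delay) // obey the server before the next attempt
		}
	}
	return maxTries, err
}

func main() {
	// The upload is throttled twice, then succeeds on the third attempt.
	upload := func(try int) error {
		if try < 3 {
			return &retryAfterError{err: errThrottled, delay: 50 * time.Millisecond}
		}
		return nil
	}
	tries, err := copyWithRetries(5, upload, recordingSleep)
	fmt.Println(tries, err, totalSlept) // prints "3 <nil> 100ms"
}
```

In real code the `sleep` argument would be `time.Sleep`, or preferably a wait that also watches the context so a cancelled transfer does not block for the full delay.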

ncw · February 11, 2021, 5:53pm

I'm talking about this error

https://github.com/rclone/rclone/blob/47b69d63009fb5808af3ebb325fb0783876beee5/fs/fserrors/error.go#L228-L236


// RetryAfter is an optional interface for error as to whether the
// operation should be retried after a given delay
//
// This should be returned from Update or Put methods as required and
// will cause the entire sync to be retried after a delay.
type RetryAfter interface {
	error
	RetryAfter() time.Time
}

Which isn't retried by IsRetryError or ShouldRetry

It is checked here

https://github.com/rclone/rclone/blob/47b69d63009fb5808af3ebb325fb0783876beee5/cmd/cmd.go#L270-L276


		if retryAfter := accounting.GlobalStats().RetryAfter(); !retryAfter.IsZero() {
			d := retryAfter.Sub(time.Now())
			if d > 0 {
				fs.Logf(nil, "Received retry after error - sleeping until %s (%v)", retryAfter.Format(time.RFC3339Nano), d)
				time.Sleep(d)
			}
		}

It confused me anyway

Ankur_Gupta · February 11, 2021, 6:08pm

I missed this RetryAfter error. I am pretty sure this is an unintended duplication. It is used only in the Swift backend, and the intention seems to be the same as pacer.RetryAfterError.

ncw · February 12, 2021, 10:26am

In the swift backend it is used to say: attempt all the copies, but if any failed with this error then wait this long before doing a complete retry. It exists so that users can get things out of cold storage, which takes time. All the requests are made at once, and the wait for all of them happens once, at the high level retry.

So yes, these are similar, but one should be low level retried and one should be high level retried...
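The high level variant can be sketched as follows: every transfer is attempted, each retry-after time is recorded, and the outer loop waits once before the next full attempt. `retryTracker`, `note`, `waitFor` and `exampleNow` are hypothetical names, and the sketch assumes that the latest recorded time is the one to wait for; it is not rclone's accounting code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// exampleNow is a fixed reference time so the example is deterministic.
var exampleNow = time.Date(2021, time.February, 12, 10, 0, 0, 0, time.UTC)

// retryTracker is a hypothetical stand-in for the bookkeeping that the
// cmd/cmd.go snippet in the thread reads before starting a new attempt.
type retryTracker struct {
	mu         sync.Mutex
	retryAfter time.Time
}

// note records a retry-after time reported by one transfer, keeping the
// latest one seen (an assumption of this sketch).
func (t *retryTracker) note(when time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if when.After(t.retryAfter) {
		t.retryAfter = when
	}
}

// waitFor returns how long the high level loop should sleep before the
// next complete attempt, or zero if no wait is needed.
func (t *retryTracker) waitFor(now time.Time) time.Duration {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.retryAfter.IsZero() {
		return 0
	}
	if d := t.retryAfter.Sub(now); d > 0 {
		return d
	}
	return 0
}

func main() {
	tracker := &retryTracker{}
	// Three objects in cold storage report different ready times.
	for _, mins := range []int{5, 30, 10} {
		tracker.note(exampleNow.Add(time.Duration(mins) * time.Minute))
	}
	// One wait covers all of them before the complete retry.
	fmt.Println(tracker.waitFor(exampleNow)) // prints "30m0s"
}
```

The contrast with the low level case is that nothing sleeps per file here: the single wait happens between complete attempts.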

system · April 14, 2021, 6:26am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
