[SOLVED] Rclone `lsl` on a large Encrypted GDrive dirtree is aborting at random points with "oauth2: cannot fetch token: 401 Unauthorized" instead of retrying as configured

What is the problem you are having with rclone?

Using rclone lsl to recursively list a very large Encrypted Google Drive dirtree never finishes; it runs for some time and then aborts with oauth2: cannot fetch token: 401 Unauthorized after just 2 x 5 retries, despite having both --retries and --low-level-retries set to 1000.

The problem is not tied to any specific directory: running the same command again either aborts with the same error on an 'earlier' directory (which showed no error on the previous run) or sails right past it (also with no errors) only to abort on a directory 'further on'.

BTW, this also happens when using rclone copy (with these same parameters) to copy this same Encrypted Google Drive remote to another one (on another Google Workspace account), with the difference that rclone marks the error, stops traversing the dirtree under that directory, and retries it later on. That is also a problem: as the dirtree is very, very large (tens of millions of files), rclone traverses those same tens of millions of files again and again (always hitting errors at different directories), and therefore the command never finishes.

Also, running rclone config reconnect GOOGLE_DRIVE{84KcY}: (as suggested by the final error message) between one run of rclone lsl and the next doesn't change anything.

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1

  • os/version: debian 11.1 (64 bit)
  • os/kernel: 6.0.0-4-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Encrypted remote on top of a Google Drive remote.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone -vv --rc --rc-addr --checkers=32 --low-level-retries=1000 --retries=1000 --drive-chunk-size=64m --fast-list lsl ENCRYPTED_GOOGLE_DRIVE:

The rclone config contents with secrets removed.

type = crypt
filename_encryption = standard
password = REDACTED00
password2 = REDACTED01

type = drive
client_id = REDACTED06
client_secret = REDACTED07
token = {"access_token":"REDACTED08","token_type":"Bearer","refresh_token":"REDACTED09","expiry":"2023-02-16T12:18:56.541006812-03:00"}
root_folder_id = REDACTED10

A log from the command with the -vv flag

Please see https://durval.com/xfer-only/rclone_lsl_aborting_with_error_401_DEBUG_output.txt

Are those new client ID/secrets?

I'm not quite sure how the size would tie back.

retries/low-level tries isn't going to do much for a 401 unauthorized as that needs a reconnect.

No, they're about 8 years old, been in use at least once a day the whole time.

Not sure what you mean, but presuming you mean how the size of the dirtree being lsl'ed would affect the error: if I run the same command on a small(er) subdir of ENCRYPTED_GOOGLE_DRIVE: the error doesn't happen and the listing runs to completion.

Well, if it needs to reconnect in order to retry, shouldn't it? BTW, rclone copy on the same dirtree does.

Thanks for your response and for trying to help! Please let me know if you need any more details.

Yeah, that's what I figured as well, but wanted to ask.

The error in my understanding is related to being unable to refresh a token and that's the error when you get your creds revoked or they expire.

Are you dead in the water with that remote until you reconnect it back again?

There isn't a number of times retrying would fix that error, as it's not a retry-type error. The low-level retries and retries are generally for a 'hiccup' in the network; once you get a response back of unauthorized, I can't imagine a scenario where that works on the next try (my key point there being that Google returns a response, not a timeout, DNS error, etc.).

I just looked at the code - the number of retries is hardcoded at 5.

The question that needs answering is why it ever got a 401 Unauthorized error? This isn't the kind of error retries are meant to fix. Retrying a 401 Unauthorized is usually pointless.

The only thing I can think of is if that bit of code is caching an old refresh token.

Can you look through the log and see if the token refresh succeeded before it failed? If so, how much time was there between the last success and the first failure?
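
One quick way to scan a -vv debug log for token activity is a simple grep. The sample log lines below are fabricated for illustration (the wording of real rclone -vv output differs); only the grep pattern is the point:

```shell
# Fabricated sample debug log; real rclone -vv lines are worded differently.
cat > /tmp/rclone_debug.log <<'EOF'
2023/02/16 11:18:56 DEBUG : Saving config "token" in section "GOOGLE_DRIVE" of the config file
2023/02/16 12:19:01 Failed to lsl: couldn't list directory: oauth2: cannot fetch token: 401 Unauthorized
EOF
# List every token-related line with its line number, so you can eyeball
# the gap between the last successful refresh and the first failure:
grep -in 'token' /tmp/rclone_debug.log
```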

Another possibility - is there another process running using that config file? Perhaps that other process fetched a new refresh token and the listing process didn't update its copy of the refresh token.


TL;DR for those that may later come a-googling:

  • if you get 401 errors, make damn sure your actual rclone.conf file has the client_id and client_secret lines, and that they are not commented out.
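
As a concrete (hypothetical) check, a one-line grep can fail loudly when client_id is commented out or missing; the config path and contents below are made up to reproduce the symptom, so substitute your own:

```shell
# Hypothetical config reproducing the commented-out lines; use your real path instead.
cat > /tmp/rclone_check.conf <<'EOF'
[GOOGLE_DRIVE]
type = drive
#client_id = REDACTED06
#client_secret = REDACTED07
EOF
# An active (uncommented) client_id line starts with optional whitespace,
# then "client_id", then "=". The commented line above does not match:
grep -Eq '^[[:space:]]*client_id[[:space:]]*=' /tmp/rclone_check.conf \
  && echo "client_id OK" \
  || echo "client_id MISSING or commented out"
# → prints "client_id MISSING or commented out"
```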

More details:

@ncw @Animosity022 thanks for your responses.

In the process of investigating the things @ncw pointed to, I think I found the issue: the rclone.conf file being used by this command was not the one I thought it was (due to a local issue with misdirected symlinks, not rclone's fault).
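
For anyone wanting to rule this out on their own setup: rclone config file prints the config path rclone will actually use, and readlink -f resolves any symlinks to the real target. A minimal sketch with made-up paths:

```shell
# rclone itself reports the config path it will use:
#   rclone config file
# If that path is a symlink, resolve it to see where it really points:
printf '[GOOGLE_DRIVE]\ntype = drive\n' > /tmp/real_rclone.conf
ln -sf /tmp/real_rclone.conf /tmp/rclone.conf
readlink -f /tmp/rclone.conf   # → /tmp/real_rclone.conf
```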

Here's the relevant part of the rclone.conf file that was actually being used:

type = drive
#client_id = REDACTED06
#client_secret = REDACTED07
token = {"access_token":"REDACTED08","token_type":"Bearer","refresh_token":"REDACTED09","expiry":"2023-02-16T12:18:56.541006812-03:00"}
root_folder_id = REDACTED10

Please notice the "#" at the start of both the client_id and client_secret lines... :man_facepalming: AFAIK, that makes rclone use its own 'internal/default/free-for-all' client ID and secret... which IMHO would make things slower, but should not make them fail, right?

Anyway, I removed the '#' and ran the exact same rclone lsl command again, and this time it finished without any issues, in just short of 6h of elapsed time.

I'm now running the aforementioned rclone copy again (with the '#' removed) and so far no errors, I also expect it to finish without any issues.

Presuming it was really my inadvertent use of rclone's default client ID that caused the issue, I have two suggestions to make:

  1. at least on DEBUG level, but probably INFO level, I think rclone should print a clear message to indicate it's using its default/built-in client ID;

  2. I always supposed that using rclone's default client ID would make things slower but not break anything. Perhaps this is something that should be looked into, and if it's Google effing things up, with no fault of rclone's nor anything rclone could work around, include a warning in the documentation (and in the message mentioned above) strongly discouraging use of the default client ID?

Thanks again to everyone who helped; I will come back and post a final update when my currently running rclone copy finishes.

Rclone wouldn't know that, as the client ID only comes into play when you authorize; any transaction after that already has a token, and rclone wouldn't know whose token was used to auth, so I'm not sure that's possible.

The internet is a great place and also a very sucky place. Rclone's default ID has been used by many other nefarious projects, breaking things along the way, hence the ask to use your own. It's unfortunately not a winning battle, as anything changing it would break other legit rclone users that don't have their own keys. It's a bummer for sure, but that's the human race unfortunately. We've reported many sites using the key, but it is what it is.

I'm sorry if I was not clear. What I meant is, if a GoogleDrive remote doesn't have the client_id parameter set (which would make rclone use its own, built-in client ID), then rclone should print the message I mentioned.

If that is still not clear, please let me know and I will again try to rephrase.

Right now, rclone googledrive documentation states:

Google Application Client Id Setting your own is recommended. See Google drive for how to create your own. If you leave this blank, it will use an internal key which is low performance.

I think this needs at least to be rephrased to include a warning that things may break in a not-just-performance-wise way.

The same applies to the rclone config command for Google Drive:

Google Application Client Id Setting your own is recommended. See https://rclone.org/drive/#making-your-own-client-id for how to create your own. If you leave this blank, it will use an internal key which is low performance.

Given the above, I would also like to rephrase my suggestion: if using rclone's own client ID can actually break things, this message should be printed at least at WARNING level.

That's clear. Feels like overkill to me, as it's noted in a number of spots, so it just needs someone that wants to code it and/or submit a PR.

Feel free to submit a PR to change it to anything you like or would want to see worded better.

Using the default ID doesn't break things per se. People abusing is what breaks it.

If the default ID was removed, rclone would not work at all, so I'm not sure. The docs make it clear to create your own and, if you don't, to expect slow performance, etc.

Yep, but none of these "number of spots" include the program messages themselves nor the documentation, which is where they should be (it's not reasonable to expect someone to google everything; the documentation, or as a last resort the program's messages, should clearly indicate everything that is needed for things not to break).

BTW, when I googled for this error, I found no reference that it could be caused by using rclone's built-in client ID, nor generally that bad things could happen in that case. Can you please indicate one of these "spots", so we can record it here?

First step for that is opening an issue on github; I just did, here it is: GoogleDrive: Warn user about using built-in client_id · Issue #6776 · rclone/rclone · GitHub

IMHO, that's a distinction without a difference: as (per your own admission) we can't control people abusing the default client ID, we should do what we can to avoid the program mysteriously breaking, and that means warning the user not to use it and instead create their own.

The "etc" part, as already demonstrated, is not clear at all. What I propose (and have already opened an issue for it) is exactly to make it clear.

I'm not a Golang programmer, nor do I understand the minutiae of modifying rclone's documentation, but as you already made it clear you don't consider this to be a problem, I just might take a shot at it.

HMB... :slight_smile:

That's where it's strange: I can't prove it was the ID, but if the ID ran out of auths, I'm assuming that 'could' be it, based on an assumption, as I can't see any of the default rclone ID's data to check.

I mean, it says setting your own is recommended in the config steps and leaving it blank will be low performance. The potential auth thing is a new one based on the abuse (I believe).

I've learned that even with bold text, flashing, etc., lots of folks will delete it and ignore it anyway, as you can't help everyone.

I'm definitely not debating it can be improved as that's always the case so was not trying to sound that way if I was.

Me either as I don't program at all. I don't use Google Drive at all so if folks that use it, don't want to update it either, it isn't important for me (that's my attempt at humor so hopefully it translates via text).

Give it a shot and if you have questions, lots of folks can guide as teaching you to fish means more fish for all of us :slight_smile:

It just finished, with no errors and therefore no retries. I think we can all consider the problem, if not solved, at least worked around.

Thanks again to everyone who helped!

I don't think using the rclone auth should actually break anything.

However I conjecture what happened with you is that you got the initial token and refresh token with your custom auth, but then rclone tried to refresh it without the custom auth. That fits the symptoms of the 401 errors exactly.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.