Box token refresh timing


#1

Box access tokens are automatically invalidated as soon as the refresh token that created them are used to get a newer access timing. For this reason, if I have a process uploading a long list of files and another process starts up and decides it is time to refresh the access token, the access token being used by the first process will suddenly stop working int he middle.

Does rclone have a mechanism to coordinate this so that the 2 processes will synchronize the switch from the older token to the newer one? If not, since I already have a cron job that uses 2 Box client IDs to alternately refresh 2 sets of tokens, is there any option like “–box-access-token=XXX” that I can pass to rclone to have it use the (freshest) token I already have and prevent it from asynchronously refreshing a token?


#2

Yes, that is unfortunate.

No it doesn’t :frowning:

I’d suggest making a new remote for each task. Each should have its own independent refresh token then - that should work hopefully.


#3

In my case, I have hundreds of tasks (Jenkins jobs) so that isn’t feasible. I may have to see if I can hack in an external-token or don’t renew option.


#4

It is a bit annoying the way box does this… None of the other cloud providers do this. It also means that copying the config file causes problems.

Do you have someone you could ask at box about this?

Actually we’ve been talking about making a --PROVIDER-token flag/config option…


#5

Hi Nick,

We’ve asked Box about this and they haven’t budged (and we are a major account). The workaround I use is to have a separate job whose sole task is to alternately refresh two tokens and have one available with at least half an hour life left on it. That works fine for my Perl-based box upload scripts, but they are not nearly as sophisticated as rclone so I’m hoping to abandon mine in favor of rclone.

-Joel


#6

Hi Nick,

Actually, I may have a better “path of least resistance” for this.

Currently, if I start a long recursive operation 5 minutes before the token expires, and the token expires during the operation, it will get renewed as soon as it is needed. However, if the token begins to fail during the operation, rclone will abort.

If, instead of aborting, rclone were to check the rclone.conf file for an updated token, then it would only need to retry one operation and could continue right through this.

Would that be a proper solution?

Thanks,

-Joel


#7

Interesting idea…

I had a go at implementing it here. I’m not 100% sure

  • it is working
  • it will work 100% of the time - reading things from the config file without file locking is racy!

https://beta.rclone.org/branch/v1.45-057-g71051d5e-fix-oauth-refresh-beta/ (uploaded in 15-30 mins)

Watch out for some extra debug logs (see with -vv)

fs.Debugf(ts.name, "Failed to read token out of config file: %v", err)
fs.Debugf(ts.name, "Failed to parse token out of config file: %v", err)
fs.Debugf(ts.name, "Loaded fresh token from config file")
fs.Debugf(ts.name, "Loaded invalid token from config file - ignoring")

#8

Unfortunately, without some kind of a lock, both instances tried to refresh.

I edited ~/.config/rclone/rclone.conf to make the current token expire about 4 minutes in the future (so I wouldnt have to wait an hour to test)
I started 2 instances of “ls remote:” which would take a long time to run.

the first renewed the token…

2019/01/04 13:20:16 DEBUG : pacer: Reducing sleep to 22.806972ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 17.105229ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 12.828921ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 10ms
2019/01/04 13:20:17 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/04 13:20:18 DEBUG : boxtest123: Saved new token in config file
2019/01/04 13:20:19 DEBUG : box root ‘’: Ignoring “FAE Slides and Sales Slides” - unknown type “web_link”

The second looks like it attempted to also refresh, but the first had already done so and that kept the second from succeeding.

2019/01/04 13:20:14 DEBUG : pacer: low level retry 2/10 (error Error “rate_limit_exceeded” (429): Request rate limit exceeded, please try again later)
2019/01/04 13:20:14 DEBUG : pacer: low level retry 2/10 (error Error “rate_limit_exceeded” (429): Request rate limit exceeded, please try again later)
2019/01/04 13:20:15 DEBUG : pacer: Reducing sleep to 1.5s
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 1.125s
2019/01/04 13:20:18 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/04 13:20:19 DEBUG : pacer: Reducing sleep to 843.75ms
2019/01/04 13:20:19 DEBUG : boxtest123: Loaded fresh token from config file
2019/01/04 13:20:19 DEBUG : boxtest123: Saved new token in config file
2019/01/04 13:20:19 DEBUG : pacer: Reducing sleep to 632.8125ms
2019/01/04 13:20:20 DEBUG : pacer: Reducing sleep to 474.609375ms
2019/01/04 13:20:21 DEBUG : pacer: Reducing sleep to 355.957031ms
2019/01/04 13:20:21 DEBUG : pacer: Reducing sleep to 266.967773ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 200.225829ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 150.169371ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 112.627028ms
-bash-4.1$ tail -4 typescript
2019/01/04 13:20:22 Failed to ls: couldn’t list files: Get https://api.box.com/2.0/folders/46000087070/items?fields=type%2Cid%2Csequence_id%2Cetag%2Csha1%2Cname%2Csize%2Ccreated_at%2Cmodified_at%2Ccontent_created_at%2Ccontent_modified_at%2Citem_status%2Cshared_link&limit=1000&offset=0: oauth2: cannot fetch token: 400 Bad Request
Response: {“error”:“invalid_grant”,“error_description”:“Refresh token has expired”}


#9

:frowning:

Hmm, something unexpected happened there. It looks like two threads tried to refresh the token. “2a” failed but “2b” succeeded. Unfortunately “2a” failing stopped the listing.

1>  2019/01/04 13:20:17 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2a> 2019/01/04 13:20:18 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
1>  2019/01/04 13:20:18 DEBUG : boxtest123: Saved new token in config file
2b> 2019/01/04 13:20:19 DEBUG : boxtest123: Loaded fresh token from config file
2b> 2019/01/04 13:20:19 DEBUG : boxtest123: Saved new token in config file

I can think of several ideas for solving this

  • file locking
  • making the oauth2: cannot fetch token: 400 Bad Request a retriable error
  • adding a random offset to the time to see if the token needs renewing or not for each thread

Any other ideas occur to you?


#10

That’s where the real fun with Box’s oauth implementation comes in. If A refreshes the token, B can’t use the same refresh token either.

I think the thing to do is, if the current auth token fails or expires, attempt to renew. But, if the renew attempt fails, before retrying the renew attempt (which will either fail again or succedd and cause A to fail a moment later), read the token from the config file and see if that auth token now works.

So…

A and B are using a token
A reaches expire time and renews (gets new auth token and new refresh token)
B still using old token sees token fail (or sees time expire … race between these 2 situations)
B attempts to renew (fails because the refresh token was already replaced by A)
B rereads the token from the config file (after a small delay) and sees it is valid
B continues using the new token without attempting to renew as long as token stays valid

If both A and B attempt to renew at the same moment, Box will let one succeed (and that one will write to the config file) and the other will fail (and should not modify the config file) so box itself is acting as a lock.

The only problems I see with this are

  • B needs to wait long enough before giving up on the config file to be sure that A has finished writing it.
  • If the refresh fails for other reasons, B may want to re-try
    so the algorithm may be to try the refresh/reload cycle a few times before giving up

while (expired and retries < max) {
Attempt refresh
If refresh fails, reload from config
retries++
}


#11

Hmm, retries… Rclone is built on retries so that seems like a plausible solution!

Here is a second attempt which does pretty much the same as the first attempt, except it will retry 5 times with a 1 second sleep between the tries.

Can you give that a go?

https://beta.rclone.org/branch/v1.45-057-g32a4c45b-fix-oauth-refresh-beta/ (uploaded in 15-30
mins)


#12

That one worked better. In my test where I fudge the renew time to make it renew early…

2019/01/06 07:35:43 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/06 07:35:43 DEBUG : boxtest123: Token refresh failed try 1/5: oauth2: cannot fetch token: 400 Bad Request
2019/01/06 07:35:44 DEBUG : boxtest123: Loaded fresh token from config file
2019/01/06 07:35:44 DEBUG : boxtest123: Saved new token in config file

and

2019/01/06 07:35:42 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/06 07:35:43 DEBUG : boxtest123: Saved new token in config file


#13

Also tried an experiment where I lied and made the token expiration in the conf file last longer than Box actually thinks it should. (This simulates a situation where the clock at Box isn’t precisely synced to the clock on the client) I would expect that Box would suddenly return “unauthorized” responses and that would trigger a renewal. However, it doesn’t seem to…

2019/01/06 08:40:23 DEBUG : Couldn’t decode error response: EOF
2019/01/06 08:40:23 DEBUG : pacer: Reducing sleep to 266.967773ms
2019/01/06 08:40:23 ERROR : Training xxourced Training Materials/CUSTOMER: error listing: couldn’t list files: Error “401 Unauthorized” (401)
2019/01/06 08:40:23 DEBUG : Couldn’t decode error response: EOF
2019/01/06 08:40:23 DEBUG : pacer: Reducing sleep to 200.225829ms
2019/01/06 08:40:23 ERROR : xxxxxuary-04: error listing: couldn’t list files: Error “401 Unauthorized” (401)
2019/01/06 08:40:23 DEBUG : Couldn’t decode error response: EOF
2019/01/06 08:40:23 DEBUG : pacer: Reducing sleep to 150.169371ms
2019/01/06 08:40:23 ERROR : xxx2017/201701-January/201701-January-11: error listing: couldn’t list files: Error “401 Unauthorized” (401)
2019/01/06 08:40:24 Failed to ls: couldn’t list files: Error “401 Unauthorized” (401)

and then dies. Shouldn’t it make a final attempt to reload (and failing that, renew) the token before giving up?


#14

That is good - that looks like it was working as intended.

Do you want me to merge this now?

I just traced through the oauth library code and it looks like it trusts the time on the token. Nowhere in the code do I see it responding to 401 errors either.

So given that, what you see is what I’d expect…

That may or may not be correct behaviour according to the RFC I don’t know!

Note that the oauth library adds a 10 second grace period to the token refresh…

Note also that rclone will return errors in the log if it detects the server time is a long way from the local computer time too!


#15

That makes sense. The 10 second grace should do the trick.

The one thing I haven’t tried (yet) is to have a series of large chunks uploading slowly (bandwidth limited, so each chunk takes at least 30 seconds) while (a) another process renews the token or (b) the original token expires. I’m hoping that Box will continue to accept the token that was valid at the beginning of a transfer even if that transfer completes after the token has expired.

The change you have so far seems to help the original issue, so it would be great if you are ready to merge it and I’ll continue my integration and testing.


#16

I’m pretty sure box will check the token right at the start - it doesn’t make sense to accept the data, then reject it.

I’ve merged that into the latest beta it will be there in 15-30 mins and be released in v1.46 on 7th Feb.


#17

Hi I wanted to chime in with a question, am I reading this correctly that one config file can be used for multiple transfers to box? only because I started working with the box cli, if I can specify the token on the cli as per original post? thanks


#18

Yes it can now.

No there isn’t a --box-token command line flag which there probably ought to be. You’ll have to write a config file if you want to use that