Box access tokens are automatically invalidated as soon as the refresh token that created them are used to get a newer access timing. For this reason, if I have a process uploading a long list of files and another process starts up and decides it is time to refresh the access token, the access token being used by the first process will suddenly stop working int he middle.
Does rclone have a mechanism to coordinate this so that the 2 processes will synchronize the switch from the older token to the newer one? If not, since I already have a cron job that uses 2 Box client IDs to alternately refresh 2 sets of tokens, is there any option like ââbox-access-token=XXXâ that I can pass to rclone to have it use the (freshest) token I already have and prevent it from asynchronously refreshing a token?
In my case, I have hundreds of tasks (Jenkins jobs) so that isnât feasible. I may have to see if I can hack in an external-token or donât renew option.
Weâve asked Box about this and they havenât budged (and we are a major account). The workaround I use is to have a separate job whose sole task is to alternately refresh two tokens and have one available with at least half an hour life left on it. That works fine for my Perl-based box upload scripts, but they are not nearly as sophisticated as rclone so Iâm hoping to abandon mine in favor of rclone.
Actually, I may have a better âpath of least resistanceâ for this.
Currently, if I start a long recursive operation 5 minutes before the token expires, and the token expires during the operation, it will get renewed as soon as it is needed. However, if the token begins to fail during the operation, rclone will abort.
If, instead of aborting, rclone were to check the rclone.conf file for an updated token, then it would only need to retry one operation and could continue right through this.
Watch out for some extra debug logs (see with -vv)
fs.Debugf(ts.name, "Failed to read token out of config file: %v", err)
fs.Debugf(ts.name, "Failed to parse token out of config file: %v", err)
fs.Debugf(ts.name, "Loaded fresh token from config file")
fs.Debugf(ts.name, "Loaded invalid token from config file - ignoring")
Unfortunately, without some kind of a lock, both instances tried to refresh.
I edited ~/.config/rclone/rclone.conf to make the current token expire about 4 minutes in the future (so I wouldnt have to wait an hour to test)
I started 2 instances of âls remote:â which would take a long time to run.
the first renewed the tokenâŚ
2019/01/04 13:20:16 DEBUG : pacer: Reducing sleep to 22.806972ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 17.105229ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 12.828921ms
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 10ms
2019/01/04 13:20:17 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/04 13:20:18 DEBUG : boxtest123: Saved new token in config file
2019/01/04 13:20:19 DEBUG : box root ââ: Ignoring âFAE Slides and Sales Slidesâ - unknown type âweb_linkâ
The second looks like it attempted to also refresh, but the first had already done so and that kept the second from succeeding.
2019/01/04 13:20:14 DEBUG : pacer: low level retry 2/10 (error Error ârate_limit_exceededâ (429): Request rate limit exceeded, please try again later)
2019/01/04 13:20:14 DEBUG : pacer: low level retry 2/10 (error Error ârate_limit_exceededâ (429): Request rate limit exceeded, please try again later)
2019/01/04 13:20:15 DEBUG : pacer: Reducing sleep to 1.5s
2019/01/04 13:20:17 DEBUG : pacer: Reducing sleep to 1.125s
2019/01/04 13:20:18 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2019/01/04 13:20:19 DEBUG : pacer: Reducing sleep to 843.75ms
2019/01/04 13:20:19 DEBUG : boxtest123: Loaded fresh token from config file
2019/01/04 13:20:19 DEBUG : boxtest123: Saved new token in config file
2019/01/04 13:20:19 DEBUG : pacer: Reducing sleep to 632.8125ms
2019/01/04 13:20:20 DEBUG : pacer: Reducing sleep to 474.609375ms
2019/01/04 13:20:21 DEBUG : pacer: Reducing sleep to 355.957031ms
2019/01/04 13:20:21 DEBUG : pacer: Reducing sleep to 266.967773ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 200.225829ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 150.169371ms
2019/01/04 13:20:22 DEBUG : pacer: Reducing sleep to 112.627028ms
-bash-4.1$ tail -4 typescript
2019/01/04 13:20:22 Failed to ls: couldnât list files: Get https://api.box.com/2.0/folders/46000087070/items?fields=type%2Cid%2Csequence_id%2Cetag%2Csha1%2Cname%2Csize%2Ccreated_at%2Cmodified_at%2Ccontent_created_at%2Ccontent_modified_at%2Citem_status%2Cshared_link&limit=1000&offset=0: oauth2: cannot fetch token: 400 Bad Request
Response: {âerrorâ:âinvalid_grantâ,âerror_descriptionâ:âRefresh token has expiredâ}
Hmm, something unexpected happened there. It looks like two threads tried to refresh the token. â2aâ failed but â2bâ succeeded. Unfortunately â2aâ failing stopped the listing.
1> 2019/01/04 13:20:17 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
2a> 2019/01/04 13:20:18 DEBUG : boxtest123: Loaded invalid token from config file - ignoring
1> 2019/01/04 13:20:18 DEBUG : boxtest123: Saved new token in config file
2b> 2019/01/04 13:20:19 DEBUG : boxtest123: Loaded fresh token from config file
2b> 2019/01/04 13:20:19 DEBUG : boxtest123: Saved new token in config file
I can think of several ideas for solving this
file locking
making the oauth2: cannot fetch token: 400 Bad Request a retriable error
adding a random offset to the time to see if the token needs renewing or not for each thread
Thatâs where the real fun with Boxâs oauth implementation comes in. If A refreshes the token, B canât use the same refresh token either.
I think the thing to do is, if the current auth token fails or expires, attempt to renew. But, if the renew attempt fails, before retrying the renew attempt (which will either fail again or succedd and cause A to fail a moment later), read the token from the config file and see if that auth token now works.
SoâŚ
A and B are using a token
A reaches expire time and renews (gets new auth token and new refresh token)
B still using old token sees token fail (or sees time expire ⌠race between these 2 situations)
B attempts to renew (fails because the refresh token was already replaced by A)
B rereads the token from the config file (after a small delay) and sees it is valid
B continues using the new token without attempting to renew as long as token stays valid
If both A and B attempt to renew at the same moment, Box will let one succeed (and that one will write to the config file) and the other will fail (and should not modify the config file) so box itself is acting as a lock.
The only problems I see with this are
B needs to wait long enough before giving up on the config file to be sure that A has finished writing it.
If the refresh fails for other reasons, B may want to re-try
so the algorithm may be to try the refresh/reload cycle a few times before giving up
while (expired and retries < max) {
Attempt refresh
If refresh fails, reload from config
retries++
}
Also tried an experiment where I lied and made the token expiration in the conf file last longer than Box actually thinks it should. (This simulates a situation where the clock at Box isnât precisely synced to the clock on the client) I would expect that Box would suddenly return âunauthorizedâ responses and that would trigger a renewal. However, it doesnât seem toâŚ
That is good - that looks like it was working as intended.
Do you want me to merge this now?
I just traced through the oauth library code and it looks like it trusts the time on the token. Nowhere in the code do I see it responding to 401 errors either.
So given that, what you see is what Iâd expectâŚ
That may or may not be correct behaviour according to the RFC I donât know!
Note that the oauth library adds a 10 second grace period to the token refreshâŚ
Note also that rclone will return errors in the log if it detects the server time is a long way from the local computer time too!
That makes sense. The 10 second grace should do the trick.
The one thing I havenât tried (yet) is to have a series of large chunks uploading slowly (bandwidth limited, so each chunk takes at least 30 seconds) while (a) another process renews the token or (b) the original token expires. Iâm hoping that Box will continue to accept the token that was valid at the beginning of a transfer even if that transfer completes after the token has expired.
The change you have so far seems to help the original issue, so it would be great if you are ready to merge it and Iâll continue my integration and testing.
Hi I wanted to chime in with a question, am I reading this correctly that one config file can be used for multiple transfers to box? only because I started working with the box cli, if I can specify the token on the cli as per original post? thanks