OneDrive: rclone getting throttled and pacer sleep time skyrockets to >5m

Hi Francesco,

The sleep times you see are due to rclone respecting the Retry-After time returned in the HTTP 429 (Too Many Requests) responses from OneDrive. There is no sure way to avoid them; they depend on many parameters, including the current load on the OneDrive servers hosting your data.

You can see it being received and handled in this line of your log:

2021-10-02 18:56:48 DEBUG : Too many requests. Trying again in 322 seconds.
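Conceptually, the handling is something like this simplified Go sketch (my illustration, not the actual rclone code; the URL is just a placeholder):

package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter extracts the server-requested wait time from a 429
// response, falling back to a default when the header is absent.
func retryAfter(resp *http.Response, fallback time.Duration) time.Duration {
	if s := resp.Header.Get("Retry-After"); s != "" {
		if secs, err := strconv.Atoi(s); err == nil {
			return time.Duration(secs) * time.Second
		}
	}
	return fallback
}

func main() {
	resp, err := http.Get("https://graph.microsoft.com/v1.0/") // placeholder endpoint
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusTooManyRequests {
		wait := retryAfter(resp, 10*time.Second)
		fmt.Printf("throttled, sleeping %v before retrying\n", wait)
		time.Sleep(wait)
	}
}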

You can however find some tips and experiences to reduce the throttling in this thread:

And read more about OneDrive throttling in this article:

On a side note, to illustrate the sporadic nature of the OneDrive throttling: I am looking at ways to improve the pacer, and my current problem is that I am unable to provoke the HTTP 429 responses, despite very heavy load (ignoring all of the recommendations in the above thread).


Thinking about it, perhaps I can use your situation to provoke the HTTP 429s.

What are the characteristics of your data? Is it something that I can recreate - e.g. a program or git repository I can download?

What other activities did you perform on your OneDrive account in the 24 hour period before performing the test?

hi,

i have no issues triggering the HTTP/2.0 429 Too Many Requests, even on a sunday morning.

  • use personal not business
  • no client id
  • rclone sync --transfers=100 --checkers=100 of 1,000,000 local files, each of size 1 byte (a quick sketch for generating these files is below).
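fyi, something like this quick go sketch will generate the test corpus (the paths and directory layout are just my example, not a requirement):

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	root := "./throttle-test" // any scratch directory will do
	for i := 0; i < 1000000; i++ {
		// spread the files over subdirectories so no single folder gets huge
		dir := filepath.Join(root, fmt.Sprintf("%03d", i/10000))
		if err := os.MkdirAll(dir, 0o755); err != nil {
			log.Fatal(err)
		}
		name := filepath.Join(dir, fmt.Sprintf("f%06d.txt", i))
		if err := os.WriteFile(name, []byte("x"), 0o644); err != nil {
			log.Fatal(err)
		}
	}
}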

i also get HTTP/2.0 503 Service Unavailable with a decode error.

2021/10/03 11:33:47 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/10/03 11:33:47 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/10/03 11:33:47 DEBUG : HTTP RESPONSE (req 0xc01c2c6300)
2021/10/03 11:33:47 DEBUG : HTTP/2.0 503 Service Unavailable
Content-Length: 974
Cache-Control: no-store
Content-Type: text/html
Date: Sun, 03 Oct 2021 15:33:46 GMT
X-Azure-Externalerror: 0x80072efe,OriginConnectionAborted
X-Msedge-Ref: Ref A: DE518F3B63BD40199EFA7A26C6632BC8 Ref B: EWR30EDGE0917 Ref C: 2021-10-03T15:33:28Z

2021/10/03 11:33:47 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/10/03 11:33:47 DEBUG : Couldn't decode error response: invalid character '<' looking for beginning of value
2021/10/03 11:33:47 DEBUG : pacer: low level retry 1/10 (error 503 Service Unavailable: )
2021/10/03 11:33:47 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2021/10/03 11:33:47 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Thanks @asdffdsa! Good to know, I will try that.

I gave up after 280,000 files totaling 13 GB without seeing a single 429; seems like I should have had the persistence to quadruple that.

I also see the occasional 503s - they happen somewhat randomly and cause no real issues in my tests.

The decode error is due to OneDrive returning an HTML error page instead of a JSON value (note the Content-Type: text/html and the leading '<'). It doesn't seem to be a problem, because the JSON part of the response is ignored later based on the 503 response code.

Thank you for the prompt reply. I first saw it happen while doing an upload test with an entire old copy of the vtiger CRM PHP site, which you can download from the following link:
https://sourceforge.net/projects/vtigercrm/files/vtiger%20CRM%206.1.0/Core%20Product/vtigercrm6.1.0.tar.gz/download

Afterwards I noticed it when doing backups of GitLab installation data, which includes a lot of git repositories. Basically, bare working-set data or application assets composed of many small files seem to trigger the throttling very easily in my scenario.

In the 24 hours before I began testing rclone uploads I was doing nothing special: I was testing an alternative OneDrive client[1], downloading/uploading a small amount of data from the same account.

I will attempt to retry the test tomorrow following your suggestions and see if it changes anything.

[1] GitHub - abraunegg/onedrive: #1 Free OneDrive Client for Linux

Thanks, good info!

I am curious to see the location of your OneDrive datacentre. Can you perform this command:

 rclone lsd onedriveitr: -vv --dump headers

And then just post the first line from the log looking like this:

X-Ms-Ags-Diagnostic: {"ServerInfo":{"DataCenter":"West Europe","Slice":"E","Ring":"5","ScaleUnit":"002","RoleInstance":"AM2PEPF00007A48"}}

A quick peek in the code of the abraunegg onedrive client indicates that it handles 429 (and 503) somewhat similarly to rclone - and it seems to report them in the log.

If you see the 429 (Too many requests) responses in rclone again tomorrow, then it would be interesting to know if they are also reported by the abraunegg client - and result in similar slowdowns.

Sure, I have two, I will post both in order of appearance:

X-Ms-Ags-Diagnostic: {"ServerInfo":{"DataCenter":"North Europe","Slice":"E","Ring":"4","ScaleUnit":"003","RoleInstance":"DB1PEPF00005962"}}
X-Ms-Ags-Diagnostic: {"ServerInfo":{"DataCenter":"North Europe","Slice":"E","Ring":"4","ScaleUnit":"002","RoleInstance":"DU2PEPF000070BF"}}

Still related to the topic: I'm trying to resync a location which has not changed, both locally and remotely, and I have a question. Obviously in this scenario no real data transfer should occur, and only metadata with checksums/timestamps/sizes should be exchanged. Still, in my tests, I'm easily throttled as well. Observing the verbose log, it seems rclone is querying the remote state of each file one by one. Is this interpretation correct, or does rclone attempt to group the queries of several files and reduce API calls? Sorry if I'm making wild guesses/wrong interpretations.

I don't know the abraunegg client, but it may have scanned your entire OneDrive and repeated that at fixed intervals. If you have many files in your OneDrive then that may have triggered the throttling (or brought it near before you tested rclone). I therefore suggest you unmount it during the cool down (the next 24 hours) before trying rclone.

I am pretty sure it only lists the folders, but that may also be very demanding if you have many folders. Each folder will require a request. It can sometimes be optimized by --fast-list. But what you are seeing right now can't be explained by any of this. I would like to get the abraunegg client out of the equation and your OneDrive cooled down (min. 24 hours) to get a clear picture.
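If you want to try --fast-list, it is simply added to the usual sync command, e.g. (paths are illustrative, and the flag only has an effect on backends that support recursive listing):

 rclone sync /local/tree onedriveitr:tree --fast-list -vv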

Doesn’t RClone do the equivalent of MD5 for each file in question somehow? With S3, which I know you aren’t using, I think it gets this from metadata.

Forgetting RClone, I have gotten bitten multiple times with throttling by Google so agree on removing as many differences as possible.

This is probably a real bad idea, but should we give RClone users the ability to specify the longest sleep time they want when getting throttled? Obviously, with strong warnings that setting this too low could cause an apparent DoS attack and might get the user’s Google account locked and/or terminated. Just don’t allow a time less than some predefined number you all use.

I'd probably keep the thread to OneDrive rather than trying to mix things up as OneDrive's throttling is a bit nastier.

You can already do that with Google Drive; there is an exponential back-off algorithm (I'm sure that exists for other providers as well, but I only use Google myself). It's best to just leave the defaults as is, and all that you've stated is already taken care of.
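Conceptually, that back off behaves something like this toy Go sketch (not rclone's actual pacer code; the numbers are made up):

package main

import (
	"fmt"
	"time"
)

// pacer doubles its sleep after each throttled reply and halves it
// again after each success, within fixed bounds.
type pacer struct {
	sleep, minSleep, maxSleep time.Duration
}

func (p *pacer) onFailure() { // got a rate-limit reply, back off
	p.sleep *= 2
	if p.sleep > p.maxSleep {
		p.sleep = p.maxSleep
	}
}

func (p *pacer) onSuccess() { // request went through, speed up again
	p.sleep /= 2
	if p.sleep < p.minSleep {
		p.sleep = p.minSleep
	}
}

func main() {
	p := &pacer{sleep: 10 * time.Millisecond, minSleep: 10 * time.Millisecond, maxSleep: 5 * time.Minute}
	for _, throttled := range []bool{true, true, true, false, false} {
		if throttled {
			p.onFailure()
		} else {
			p.onSuccess()
		}
		fmt.Println("next sleep:", p.sleep)
	}
}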

You don't get your account removed for hitting the API too hard as you just get 403s and you'll get rate limited / asked to back off.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

First, sorry for the very late reply. I had to request re-opening of the post to be able to reply here, since it was automatically closed, but I really wanted to post my final feedback and keep it close to the original report. To summarize: I had OneDrive sync operations that triggered throttling very easily, and the amount of throttling was unsustainable (sleep times >5 minutes). To add one detail that I perhaps did not specify clearly before: I'm using a OneDrive for Business account, and the company uses some SharePoint features, such as a SharePoint drive/link added in the root of "My Files" as a shortcut, though I'm not currently writing/reading from this folder. Still, looking at other feedback, it seems throttling is often linked to the use of SharePoint features. Following the suggestions of commenters and the rclone guides I:

  • Correctly set up a client ID and key as explained in the guide;
  • Attempted different user agent combinations (ISV vs NONISV, AppName(s), etc.) as explained by Microsoft, though Microsoft doesn't explain very clearly what these attributes should be.

After all these attempts, and upgrading to the latest rclone version (1.57), I was still experiencing severe throttling, confirming that the issue is reproducible in my scenario and not related to the use of other unofficial clients. Fortunately, I stumbled upon this post, which basically suggests reducing the number of concurrent transactions and their overall rate. In particular, I tried various combinations of --checkers: I reduced it to 4, then to 2, and I was always able to reproduce the throttling. Finally I set it to 1, and the throttling stopped arising. The bandwidth performance is greatly reduced, but at least the connection doesn't unpredictably hang anymore.
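For reference, the final invocation looked something like this (the paths are placeholders, not my real ones):

 rclone sync /local/path onedriveitr:backup --checkers=1 -v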

In conclusion, in my scenario (which may be a combination of the type of account, the exact infrastructure and kind of cloud service being used by the company, the type of data being synced, the use of a particular European regional server, and other variables), any use of concurrent/parallel transactions via the --checkers option in rclone is strictly forbidden, no matter what else I do to better decorate the connection. I still have the feeling something is wrong in rclone with regard to the number of transactions generated for OneDrive (which seems huge and suboptimal when synchronizing a large number of files and folders), but I'm satisfied enough with the result achieved. I would love to hear if there are updates with regard to better handling of throttling/reduction of the number of transactions in rclone with OneDrive, even if I can't promise I will be able to test them promptly, similarly to the delay in posting this feedback on the synchronization task I was implementing.


Thank you, your observations are very much appreciated and very helpful :blush:

I have since been able to provoke a similar behavior on my OneDrive Personal Premium and can confirm that I also didn’t see any effect of a personal Client ID or user agent.

It is my observation that the message “429/Too many requests” really means “too many requests waiting in queue at the server” and the solution is to gradually reduce the number of parallel connections/requests used by rclone until it fits the current wishes of the server (think a dynamic reduction of --checkers and --transfers).

I typically see a stabilization when connections have been reduced to 12, which corresponds to the current defaults (--checkers=8 --transfers=4). Your observations clearly show that there isn’t just a single optimum, and sometimes the connections/checkers need to be reduced to the minimum (and stay there). This is very good to know!

I have an rclone prototype that can do most of this and some other pacing optimizations, but it still needs some work to be ready for beta test. Timeframe: I am struggling to find time, just like you.

I know that it sometimes feels slow, and is painfully slow in your situation, but are you able to pinpoint why you have a feeling that rclone performs too many requests/transactions?

Have you
• tried --fast-list?
• traced the rclone communication with --dump bodies?
• experienced a sync tool that did the same, but faster?

The important thing to understand about OneDrive, Google Drive and similar is that they are priced per size only, and therefore come with limitations on the speed and number of requests you can make within an hour/day (that is, fair usage limitations enforced by some kind of rate limiting or throttling).

If you want unlimited speed and up/download capacity, then you typically get to pay per request (one way or the other) - think Microsoft Azure or similar.

did a quick test with wasabi.
rclone sync d:\files\10000x20MB wasabi01:backupdirtest/10000x20MB --checkers=256 --transfers=256

Transferred:   	  200.005 MiB / 200.005 MiB, 100%, 40.002 MiB/s, ETA 0s
Transferred:        10000 / 10000, 100%
Elapsed time:         5.9s

This is just a feeling, because you explained to me that rclone has (really?) no better way to sync the whole tree than querying recursively (and concurrently) for folder contents. Is there really no single API to download the whole index starting from a location, with the server optimizing the transfer as much as possible (file names are highly compressible) and deciding a paging strategy, making it free of throttling by definition? Sure, such a single API call can't be made parallel, but it seems a better strategy to me for this kind of unidirectional synchronization. When the index is fully downloaded, then you can do any amount of needed transfers concurrently. This is just speculation of mine: you (rclone devs) may already have decided this is not a good strategy because of how modern clouds work. But as a matter of fact, I notice throttling also when syncing unchanged trees, where the amount of actual file data transferred is (and will be) 0 bytes.

Have you
• tried --fast-list?

This was one of the first things I tried and it didn't seem to help me. Sorry, I haven't tried it more recently.

• traced the rclone communication with --dump bodies?

Sorry, no. Should I?

• experienced a sync tool that did the same, but faster?

Syncing some large whole-tree accounts from scratch with the official OneDrive client in the company doesn't seem to hang the way rclone did for me, but I can't inspect it the same way, so I can't say for sure.

This may help: dynamically (and gradually) reducing the concurrent transfers to as low as 1 (no concurrency) in response to being throttled may be a key requirement for surviving some aggressive OneDrive throttling policies.

hi, not sure of your exact use case, but here is a possible workaround.

reduce the number of api calls by using filtering; tho there can be caveats.
--max-age 24h - would only sync files where the source file modtime is within the last 24 hours.
if the source file modtime is older than 24 hours, rclone will not check onedrive and thus not use api calls.
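for example, with made-up paths:

 rclone copy /home/user/files onedriveitr:backup --max-age 24h -vv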

Correct, there is no better way to do it - and it can be lightning fast as you see in the above example posted by Jojo.

The native Windows OneDrive Client does the same as rclone when it syncs for the first time, and at approximately the same speed – and it may get throttled too. Pay attention to the number next to “Processing changes” in your OneDrive Client extended view – the throttling is quite easy to spot once you know the pattern to look for. The client and File Explorer will remain responsive, but content will be out of sync, thus a higher risk of sync conflicts.

I got myself into trouble when I did a series of aggressive rclone throttling tests, because my Windows OneDrive and iOS OneDrive got throttled too – almost blocking me from cross-device file synchronization for a couple of days…

The limit isn’t due to the technical design of OneDrive or rclone, both can support speeds like the above example from Wasabi.

I see the rate limit/throttling as a purely commercial decision, to keep the price of OneDrive (the server park) down and push the most demanding (professional/power) users over to other more expensive (and better suited) alternatives. It is the same concept for Dropbox, Google Drive, etc.

If you do a first-time sync with a lot of small files, then you hit a (somewhat dynamic) limit on file creations per second. There is a similar limit on deletions.

If you do a first-time sync with large files, then you hit another (somewhat dynamic) limit on bytes uploaded per second. There is a similar limit on downloads.

If you do a sync without changes, then you hit yet another (somewhat dynamic) limit on directory content lookups per second.

OneDrive enforces the limits by slowing down the responses to you, which sometimes results in a queue of requests building up at their servers. If this queue is too long for too long a time, then OneDrive tells all the clients connected to that account (or tenant) to stop sending further requests for a while. This is the “429/Too Many Requests” response you have seen.

It is worth a try, but the outcome heavily depends on the characteristics of your data and the balance between checking and transferring.

It has no positive effect for my daily usage.

No, unless you understand the OneDrive REST API in depth and really want to check if rclone is sending optimal requests.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.