Suddenly seeing rateLimitExceeded constantly


#1

I have two machines making use of a Google Drive backend.

Machine A lives in a data center and prepares files locally and then uses rclone to copy them to Drive.
Machine B mounts the drive locally at home.

These two machines have been running cron jobs happily for at least 18 months with no hiccups.

Over the weekend I noticed that an rclone job that syncs the Drive to another Drive (a backup mirror) had been hung since November.

I killed that job and tried restarting the sync, but I get 403 Rate Limit Exceeded errors constantly and the transfer rate sits at 0. I stopped the cron job to give it a rest (e.g., in case there is some kind of 24-hour ban going on).

But now Machine A simply cannot upload anything to Drive.
The instant the transfer starts, the log fills up with:

2019/02/11 11:16:13 DEBUG : rclone: Version "v1.46.0-006-gef5e1909-beta" starting with parameters ["rclone" "copy" "-c" "--stats" "3m" "--tpslimit" "3" "--transfers" "3" "--files-from=/tmp/files" "-vv" "/mnt/media/Drive" "GoogleDrive:"]
2019/02/11 11:16:13 DEBUG : Using config file from "/home/chezmojo/.config/rclone/rclone.conf"
2019/02/11 11:16:13 INFO  : Starting HTTP transaction limiter: max 3 transactions/s with burst 1
2019/02/11 11:16:13 DEBUG : pacer: Rate limited, sleeping for 1.499745568s (1 consecutive low level retries)
2019/02/11 11:16:13 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:14 DEBUG : pacer: Rate limited, sleeping for 2.234574878s (2 consecutive low level retries)
2019/02/11 11:16:14 DEBUG : pacer: low level retry 2/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:15 DEBUG : pacer: Rate limited, sleeping for 4.7689464s (3 consecutive low level retries)
2019/02/11 11:16:15 DEBUG : pacer: low level retry 3/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:17 DEBUG : pacer: Resetting sleep to minimum 100ms on success
2019/02/11 11:16:22 DEBUG : pacer: Rate limited, sleeping for 1.230433586s (1 consecutive low level retries)
2019/02/11 11:16:22 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:22 DEBUG : pacer: Rate limited, sleeping for 2.932044098s (2 consecutive low level retries)
2019/02/11 11:16:22 DEBUG : pacer: low level retry 2/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:23 DEBUG : pacer: Rate limited, sleeping for 4.806679986s (3 consecutive low level retries)
2019/02/11 11:16:23 DEBUG : pacer: low level retry 3/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2019/02/11 11:16:26 DEBUG : pacer: Resetting sleep to minimum 100ms on success

The log just goes on and on like this for hours with a transfer rate of 0:

Transferred:             0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 0 / 0, -
Transferred:            0 / 0, -
Elapsed time:      3h6m1s

If I run the same command against a different Drive the log fills up with the same errors, but then actually manages to transfer files after a while.

I created a new API key and refreshed the tokens, to no effect.
I updated to the latest rclone beta, also to no effect.

Machine B is doing fine — no errors in the log (for the mount).
And the sync job is now running fine on another machine.

Just to be clear, I changed absolutely nothing about my setup. These 403 errors cropped up suddenly and without any obvious proximate cause.

Is anyone else running into anything similar?


Using --files-from with Drive hammers the API
#2

It seems like adding --no-traverse makes a difference. Very slow transfer rates, but at least files are actually uploading now.
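
For reference, this is the copy command from the debug log in the first post with --no-traverse added (paths and remote name are taken from that log):

```shell
# Same copy as in the debug log above, with --no-traverse added.
# --no-traverse stops rclone from listing the whole destination
# before uploading, which is what hammers the API with --files-from.
rclone copy -c --stats 3m --tpslimit 3 --transfers 3 \
    --files-from=/tmp/files --no-traverse -vv \
    /mnt/media/Drive GoogleDrive:
```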


#3

What’s your actual sync command?


#4

Sorry, it was buried in the debug log.

The exact same command with --no-traverse results in files uploading, but at terrible (though non-zero) speeds.


#5

If you check your developer console, are you seeing any huge jumps in the queries per 100 seconds if you look back over the last day or so?


#6

I regenerated the API keys last night so I do not have a lot of statistics, but the peak is > 600.


#7

I’m not sure why it would creep up all of a sudden, but I try to limit my transfers and checkers, since the goal is to stay under 10 transactions per second. I do something like:

/usr/bin/rclone move /data/local/ gcrypt: -P --checkers 3 --log-file /home/felix/logs/upload.log -v --tpslimit 3 --transfers 3 --drive-chunk-size 32M --exclude-from /home/felix/scripts/excludes --delete-empty-src-dirs

Depending on how many things you run at the same time, you might want to reduce those values a little and see how it works out.

Did the number of files start to ramp up, perhaps? It sounds like you crossed some limit by generating more hits.


#8

There are certainly a ton more files in the queue, since it’s been about three days since I’ve been able to upload anything.

I haven’t tried limiting the checkers… will give that a shot.
Right now there should be only one instance of rclone anywhere trying to write to Drive.
Seeing the numbers from your developer console, there is clearly something wrong with mine!
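
One way to double-check that nothing else is hitting the API (a sketch, assuming all the jobs run a binary named rclone):

```shell
# List any running rclone processes with their full command lines;
# prints a message instead if none are found.
pgrep -a rclone || echo "no rclone processes running"
```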


#9

I don’t sync, and just copy a few files each night at most, so I’m probably not the best use case for checking the console; definitely don’t compare against mine. I was just showing an example.

I’m just a Plex user running a few streams each night and moving usually 5-10 files overnight.


#10

Same here, except that I mirror the Drive so that I can keep a mirror mounted on the server and the “real” one mounted at home.

That is what is so confusing… everything just kind of went sideways for no reason.


#11

Weirder still, it suddenly started throwing “User Rate Limit Exceeded” errors, suggesting that I rethink my API quotas, even though there is no evidence of anything crossing the per-user limit in the developer console.

I killed rclone and tried again and it said “unauthorized”… I had to regenerate the token.

Whatever is going wrong, I think it is happening on Google’s side and not mine.


#12

Hmm. Trying to think of other things.

Is the time right on the server? Clock skew can keep the tokens from refreshing properly.

Also check write access on rclone.conf, in case rclone can’t update the stored token.

I had the time off on my test VM and it wouldn’t regenerate the token.
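
A minimal sketch of both checks, assuming the default rclone.conf location on Linux:

```shell
# 1. Clock skew can break OAuth token refresh: eyeball the system
#    clock (UTC) against a known-good time source.
date -u

# 2. rclone rewrites its config file to store refreshed tokens, so the
#    file must be writable by the user running rclone.
CONF="${HOME}/.config/rclone/rclone.conf"
if [ -w "$CONF" ]; then
    echo "rclone.conf is writable"
else
    echo "rclone.conf is missing or not writable - token refresh will fail"
fi
```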


#13

Time is ok.
It’s actually a dedicated server (uptime just hit 400 days) so settings tend to stay put.

I went through all of my servers and killed all instances of rclone and then started the upload again.
It is currently working with --no-traverse, but it choked on all the little .nfo files, so I just deleted them.
At least this way I can clear the backlog… I hope.
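
Instead of deleting the .nfo files, a filter would also keep them out of the transfer. A sketch using a plain copy rather than --files-from (mixing --files-from with other filter flags may not behave as expected); --exclude is a standard rclone flag, and the pattern here is just an example:

```shell
# Skip .nfo sidecar files instead of deleting them from the source.
rclone copy -c --stats 3m --tpslimit 3 --transfers 3 \
    --exclude "*.nfo" --no-traverse -vv \
    /mnt/media/Drive GoogleDrive:
```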

The developer console is still showing spikes of more than 400 queries per 100 seconds, though.