Failed to dedupe - Rate Limit Exceeded


#1

Hi there,

We’ve been having great success using rclone to copy an archive of data to Google Drive, but we have run into a problem. When we run rclone dedupe on a massive upload we have just finished, we get the following error:

Failed to dedupe: find duplicate dirs: couldn’t list directory: googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded

When we rerun the command it usually stops on a different directory, but it still stops. When copying, the rate limit error doesn’t seem to affect things, but with dedupe it just seems to stop the process. Is there any way to make this continue? If we keep running the dedupe command, will it eventually get through all the directories?


#2

If you run the command with -vv you should see that it runs through --low-level-retries attempts to read that directory.

You can try raising --low-level-retries but I suspect that won’t help… What will help is to experiment with --tpslimit, try --tpslimit 1 to start with. This will slow things down a lot but should stop drive giving the errors.

Dedupe is mostly deterministic, so rerunning it will probably hit the rate limit error again before it has done all the directories.

You can also start it at a subdirectory to get it to scan a limited part of the archive.
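
As a rough sketch of the two suggestions above: the remote name gdrive:archive and the subdirectory are placeholders I made up, so substitute your own. The RCLONE variable and the dedupe_slow helper are also just for illustration.

```shell
#!/bin/sh
# Sketch only: "gdrive:archive" and the subdirectory below are hypothetical
# placeholders, not names from this thread.
RCLONE="${RCLONE:-rclone}"

# Run dedupe at one transaction per second, with -vv so the
# low-level retry attempts show up in the log.
dedupe_slow() {
    "$RCLONE" dedupe -vv --tpslimit 1 "$1"
}

# Usage (not run here):
#   dedupe_slow gdrive:archive          # whole archive, slowly
#   dedupe_slow gdrive:archive/2018     # only one subdirectory
```

If --tpslimit 1 turns out to be slower than it needs to be, the value can be raised step by step until the errors come back.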


#3

Thanks, we’ll try that. At first glance we’re still getting rate limit errors, but the low-level retries don’t seem to get past 2/10, so it looks like it should get all the way through. Thanks again!


#4

Just a quick follow up - using rclone with

--tpslimit 1

fixed our rate limit issues when running rclone dedupe.

Thanks so much for the instruction.


#5

I was having this same problem. What is the normal TPS for Google Drive?

Also, I looked at the developer console for my API project, but the error rate there did not tell me which API method was returning the error, and rclone dedupe -vv never shows what kind of HTTP call is being made to the Google Drive API.

Is there any way to make rclone even more verbose than -vv, so that it explicitly outputs the HTTP calls to the API in the console?

If that is not possible, is there a recommended way to capture the traffic and decrypt the TLS, for example with a third-party certificate or a man-in-the-middle proxy? Otherwise, how would you troubleshoot this further?

Thanks!


#6

It’s in the console:

I personally reduce my transactions per second down to 2-3 to prevent any API issues.
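
On seeing the HTTP calls from rclone itself: rclone has a --dump headers flag that prints the HTTP request and response headers, which should show which API call is returning the 403. A minimal sketch, combined with a limit of 2 transactions per second; the remote name gdrive:archive and the dedupe_debug helper are placeholders for illustration only.

```shell
#!/bin/sh
# Sketch only: "gdrive:archive" is a hypothetical remote, not one from
# this thread.
RCLONE="${RCLONE:-rclone}"

# Dedupe at 2 transactions per second and dump HTTP headers so the
# request line of each Drive API call appears in the debug output.
# The dump can still contain sensitive information, so check it
# before sharing.
dedupe_debug() {
    "$RCLONE" dedupe -vv --tpslimit 2 --dump headers "$1"
}

# Usage (not run here):
#   dedupe_debug gdrive:archive 2> dedupe-debug.log
```

The output is large, so redirecting it to a file and searching for 403 afterwards is usually easier than reading it live.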