Google Drive quota reached seemingly linked to folder scanning

Every time I have hit a download quota, it has been hard to pinpoint the cause, because each time I had in fact been downloading a lot of content. But I was also scanning tons of folders with Sonarr and refreshing my large Plex library as new content was added, all through an rclone mount.

I have had a few days where I downloaded as much content as, or significantly more than, on previous problematic days and had no issue. The link I see is that on those occasions I had not been refreshing Plex much. One day, I exclusively tried to analyze a large folder of media in Plex (this generates the metadata), and I reached my quota before it was halfway done. When the quota was lifted, I attempted the same analysis of all the media using node-gdrive-fuse instead. This attempt was successful and completed without any issues.

node-gdrive-fuse builds a database containing the entire file structure of your Google Drive; with my 20,000+ items, this completes in under 10 seconds. All of the directories are stored in the database, and node-gdrive-fuse never re-checks folders to see whether new content is available. Instead, it periodically reads the changelog on Google Drive and adds new items to the database.
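For anyone curious what that changelog approach looks like, below is a minimal sketch of the general technique against the Drive v3 Changes API, written with the official Go client. This is an illustration, not node-gdrive-fuse’s actual code (that project is Node.js), and it assumes Application Default Credentials with a Drive scope; the 30-second poll interval is also just an assumption.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/api/drive/v3"
)

// pollChanges keeps a local view of the Drive up to date by reading the
// changelog, instead of re-listing folders to look for new content.
func pollChanges(srv *drive.Service) error {
	// Ask Drive for a token marking "now"; only changes made after this
	// point will be returned by Changes.List.
	start, err := srv.Changes.GetStartPageToken().Do()
	if err != nil {
		return err
	}
	token := start.StartPageToken

	for {
		time.Sleep(30 * time.Second) // poll interval is an assumption

		pageToken := token
		for pageToken != "" {
			list, err := srv.Changes.List(pageToken).
				Fields("nextPageToken, newStartPageToken, changes(fileId, removed, file(name, parents))").
				Do()
			if err != nil {
				return err
			}
			for _, c := range list.Changes {
				// A real mount would update its database here.
				if c.Removed {
					log.Printf("removed: %s", c.FileId)
				} else if c.File != nil {
					log.Printf("changed: %s (%s)", c.File.Name, c.FileId)
				}
			}
			if list.NewStartPageToken != "" {
				// Resume point for the next poll.
				token = list.NewStartPageToken
			}
			pageToken = list.NextPageToken
		}
	}
}

func main() {
	ctx := context.Background()
	// Assumes Application Default Credentials with a Drive scope.
	srv, err := drive.NewService(ctx)
	if err != nil {
		log.Fatalf("creating drive client: %v", err)
	}
	if err := pollChanges(srv); err != nil {
		log.Fatalf("polling changes: %v", err)
	}
}
```

The key property is that each poll costs a query or two regardless of how many folders the drive contains.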

node-gdrive-fuse has limitations, which is why I ultimately switched to rclone: first, if you move an already existing directory on Google Drive, you must rebuild the database to reflect the change; second, cached data cleanup is extremely poor; and third, writing to Google Drive through a node-gdrive-fuse mount is a mess (honestly the biggest issue).

Today, I have once again reached a quota on my Google Drive account; however, this time I have not downloaded more than ~40 GB. I have done a bit of folder scanning to pick up data that’s been uploaded, and I reached my quota in the middle of Plex refreshing a library.

Looking at Google’s API manager, I can see that there were ~5,000 total queries. That does not come close to the maximum quota, and the query rate seems to be safely under 1,000 per 100 seconds, so I am not sure whether I am drawing the wrong connection or there is some undisclosed quota on requests for the contents of a folder.

Are you using the default application ID that is built into rclone?

It might be that when you swapped to node-gdrive-fuse you swapped application IDs and got some more quota with that app.

My opinion is that there are lots of secret quotas in Drive. For instance, there is a total queries-per-second limit for every app which isn’t visible in the developer console.

At first I was using the default application ID; then I set up my own to monitor API usage and try to determine what the issue might be. For node-gdrive-fuse I am still using its default application ID, so one could speculate that it perhaps has some largely untapped quota.

I can see in Google Cloud’s API manager that the quota for the API I set up is 1,000 queries per 100 seconds.

Assuming a query means a search for a file, I don’t think node-gdrive-fuse uses much of the query quota at all. I can’t say exactly how node-gdrive-fuse fetches the list of all files/directories, but I assume it may be only one query.
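For what it’s worth, a full enumeration is not literally one query, but it is close: in the v3 API, files.list returns up to 1,000 items per page, so 20,000+ items come back in roughly 20 paginated calls, which is still negligible against a 1,000-per-100-seconds limit. A minimal sketch of that enumeration (again in Go, assuming the official Drive client and Application Default Credentials; illustrative, not node-gdrive-fuse’s actual code):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/api/drive/v3"
)

func main() {
	ctx := context.Background()
	// Assumes Application Default Credentials with a Drive scope.
	srv, err := drive.NewService(ctx)
	if err != nil {
		log.Fatalf("creating drive client: %v", err)
	}

	queries, items := 0, 0
	pageToken := ""
	for {
		call := srv.Files.List().
			PageSize(1000). // 1,000 items per page is the v3 maximum
			Fields("nextPageToken, files(id, name, mimeType, parents)")
		if pageToken != "" {
			call = call.PageToken(pageToken)
		}
		list, err := call.Do()
		if err != nil {
			log.Fatalf("files.list: %v", err)
		}
		queries++
		items += len(list.Files)
		// A real mount would insert each entry into its directory database.
		if list.NextPageToken == "" {
			break
		}
		pageToken = list.NextPageToken
	}
	// ~20,000 items should land around 20 queries.
	log.Printf("listed %d items in %d queries", items, queries)
}
```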

I can’t say for sure what quota is being reached when using rclone, but it’s certainly not the “download quota” indicated by Google Drive: I just finished re-analyzing (generating metadata for the files in) my entire library, twice, using node-gdrive-fuse without a problem.

Is it possible that rclone could, in the future, get an option to cache the entire folder structure upon mounting and update the available files in a similar fashion to node-gdrive-fuse?

Thanks!

It is certainly possible, yes. However, I’m not sure it would help you if you are effectively scanning every file.

I only needed to scan every file in this instance, and it’s a noteworthy example of why the quota issue likely relates to the number of queries used (checking which files are available in a given folder) rather than the number of files accessed or the total data downloaded.

If rclone used a low number of queries to cache the entire drive structure, similar to node-gdrive-fuse, I think it would dramatically reduce the query count, limiting the total to the initial startup of the mount plus the periodic checks of the Google Drive changelog, if those even count as queries.
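Rough arithmetic under the assumptions in the sketches above: at 1,000 items per files.list page, my 20,000+ items would cost about 20 queries at mount startup, and a changelog poll every 30 seconds adds roughly 3 queries per 100 seconds, comfortably under the 1,000-per-100-seconds limit. By contrast, if a Plex refresh triggers even one listing per directory, a library with thousands of folders could plausibly account for the ~5,000 queries I saw in the API manager.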

Assuming that the queries applications such as Plex use to check for updated content are what causes the issue I am plagued with, I think this solution would help.

Building a complete file system map on startup could be an option, yes. rclone would store it in memory though - would that be a problem?

I could live with that, as long as it doesn’t exceed several GB (4+).

It would be relatively easy to implement that. Do you fancy making an issue on GitHub about it?

I have created an issue on GitHub here:

Please let me know if I am missing anything. In any case, thank you for considering my request.

I also had an issue with the Google Drive quota while scanning with Plex, and it took a while to figure out why I could not access my files anymore.
Is there a way we can avoid the issue with Plex until ncw gets a chance to add the feature?
Thanks.

I am currently using node-gdrive-fuse, but I can’t recommend it unless you’re prepared to deal with a slew of issues. One issue was that none of the file timestamps were valid; when I refreshed a single section, my Plex database was corrupted, breaking the dashboard. I would say the critical issues with node-gdrive-fuse are manageable, but it takes a bit of work.

Using an extremely large --dir-cache-time with rclone (e.g. mounting with --dir-cache-time 1h) would probably help, but you’re not going to see new files until the cache expires, which I suppose is exactly the reason you would refresh the Plex library anyway.

So, basically no :S…

Thanks for the reply. I will stay with rclone, as I am using crypt with it. I will try a larger dir cache time of maybe 1 hour and see if that avoids the issue. I guess I can wait an hour or two for files to appear in Plex.

Should rclone have a configurable client ID? (It’s hardcoded at the moment.) Other than monitoring the developer console, would there be any benefit to maintaining your own client ID? Do rclone users share quota limits by using the same client ID?

rclone does have a configurable client ID - see: http://rclone.org/drive/#making-your-own-client-id for info on how to make one and how to use it.

rclone users share a global queries per second limit. I regularly get the google drive team to increase it so you may or may not get better performance with your own client ID.

Wow, I totally missed that in the documentation. My bad. That method seems much easier. I was doing it the hard way by patching rcloneClientID and rcloneEncryptedClientSecret in drive/drive.go to point to my own clientID. Thanks for the clarification.

Performance hasn’t been too much of a problem. For me, I actually get better speeds out of Google than Amazon. However, I did manage to exceed some sort of quota on Google Drive (possibly from Plex folder scanning). In this state, trying to read any large file would immediately result in a 403 Forbidden error with no data sent back. I tried to reset this quota by changing my client ID, but that had no effect. Ultimately, I just had to wait it out; it seemed to go back to normal after about 24 hours.

Interesting! I’ve found Amazon to be extremely fast and Google Drive less so. Might be because I’m in the UK though.

That is a common experience among Drive users.