Performance degradation with more than 100k files on Google Drive?

I've heard rumors, but nothing concrete, that a Google Drive (enterprise shared drive) degrades in performance once you have more than 100,000 files in it.

Has anyone experienced this? If so, have you been able to measure the performance consequence in any way?

Is the 100,000 number a cliff? Or does performance just begin to slowly degrade at this file count in a smooth manner, up to the hard limit of 400,000 files? E.g., would the performance degradation at 100,000 be virtually unnoticeable until you get to higher file counts?

P.S. I do know about the hard limit of 400,000 files on a drive. I've been told that if you try to exceed 400,000 files, you will end up very, very unhappy.

Thanks!

Many users have millions of files in their Google Drive; it will work.

Remember, Google themselves tried imposing a 5 million file limit per drive at one point (https://twitter.com/googledrive/status/1643049029251776515) to preserve stability and performance, with limited success, before deciding to solve the problem in an entirely different way (ending unlimited storage).

Some operations, for instance moving files from a shared drive to the main Google Drive, will grow slower as the number of files increases (see ra13's comment below).

Meaning, it can take hours for the files that were moved from the shared drive to become accessible in the main drive, or vice versa.

I guess users learn to implement some "cool down" / "staging" behaviours, just to prevent shooting themselves in the foot too badly when working blindly with operations like these.


https://www.reddit.com/r/google/comments/123fjx8/google_has_applied_a_5_million_items_limit_for/

it’s great for small businesses, but any medium-size business gets screwed out of basic features that are 100% necessary from an IT perspective. We have also seen instances of random file deletion; Google support recovered the files but had zero explanation as to why the logs were incomplete for that action.

Couldn't agree with this more.

I CONSISTENTLY face issues when moving files from "My Drive" to a Shared Drive. E.g., most recently, a set of 20 folders (25k files) was moved over, and it took HOURS before they all showed up in the Shared Drive. And worst of all, there's no progress indicator or anything else to show that the move isn't complete yet.

This happens EVERY TIME.

I've also faced situations where refreshing the folder link keeps showing different content in it each time. Got this on video too.

I'm not familiar with all the different types of Google Drive services. Our Google Drives are under the "Enterprise Plus" plan. Though I believe we have a rider such that there is no limit whatsoever on the amount of disk storage we can use, rather than 5TB times the number of employees here. (As documented here.)

We do however definitely have a 400,000 file limit per shared drive. This has been verified by other people at my organization. Also, the 400,000 file limit is documented by Google here.

Re the problems you've been seeing, I believe you, but I haven't seen these problems myself. But I stick to using rclone, which may be resilient in the face of these weirdnesses? (I.e., I never use drag and drop to move large numbers of files.)

Ha ha, I know a lot of people will be very interested if you found a way :smiley: - please verify: https://admin.google.com/ac/storage

Yes, no way to raise or lower that 400,000 file limit for shared drives.

Using rclone to move a folder between a shared drive and the main drive makes no difference.

You will still see the delay; it simply comes down to the way files are propagated in Google's distributed storage.

Well, we're an extremely prestigious non-profit, so Google cuts us all sorts of special deals.

I don't have the permissions to look at the link you are pointing me at, but we use up incredible amounts of space on our shared drives, and the IT department tells us that there is no limit at the moment. (Though they don't promise that this arrangement will remain in perpetuity.)

Unfortunately, removing the 400,000 file limit per shared drive is not one of the perks we get, though.

I suppose another reason that I would never notice this is that I don't use my personal work drive for anything at all. Everything I do for work is put on a shared drive.

Excellent, to me it sounds like none of your use cases should cause any problems at all :slight_smile:

If part of the 400,000 files needs to be accessible but never written to, one could consider wrapping them into a zip/tar.gz file and putting something like tarfs on top for easy access.

Another possibility would be to just make a big .iso file of them, as .iso files are pretty straightforward to mount on all OSes these days.
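
To make that concrete, here is a rough sketch of both approaches (the folder and file names are made up, and mkisofs may be called genisoimage on your distro):

```
# Bundle a read-only folder so it counts as one object instead of thousands
tar czf reports-2022.tar.gz reports-2022/

# Or build an ISO image of the same folder (Rock Ridge + Joliet for sane filenames)
mkisofs -R -J -o reports-2022.iso reports-2022/

# Mount the ISO read-only on Linux
sudo mount -o loop,ro reports-2022.iso /mnt/reports-2022
```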

NEVER use --fast-list with Google Drive.
ALWAYS use --drive-chunk-size 128M.
Make sure to use your own API client ID/token, not the public one (quick example below).
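
For example, a typical copy following those rules could look something like this; the remote name and paths are just placeholders, and the assumption is that you entered your own client ID/secret when creating the gdrive: remote with rclone config:

```
# No --fast-list, per the first rule above
rclone copy /local/data gdrive:backup \
    --drive-chunk-size 128M \
    --transfers 4 \
    --progress
```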

I have never used mount, so no experience with that, but copy and lsl and lsd and all that work very well on Google Drive with millions of files.

Google throttles API calls harder than bandwidth.
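
If you do start seeing 403 rate-limit errors, one knob I'd reach for (the value is something to tune, not an official Google number) is rclone's transaction limiter:

```
# Cap Drive API calls at roughly 10 transactions per second
rclone lsl gdrive: --tpslimit 10 --tpslimit-burst 10
```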

TL;DR: Also, never use Google's own tools; they stink. Just use rclone.

Well, I'm still not so sure. There are people who insist that there's performance degradation after 100,000 files on a shared drive, and if that's true, it's going to mean a bunch of work for us, since we have shared drives with more than 100,000 files that we'd then need to "fix".

But I can't find any actual evidence for this claim. For all I know, it's just urban legend.

As for using .zip files, etc., yes we've already taken to doing this in some situations in order to keep the file count down. But this involves changing software we've developed so that it can deal with this situation.

There's also a 750GB per day per service account limit for uploading data to a shared drive, and we've actually hit this limit at times. We've addressed this by using a throttling option on rclone that causes it to never upload more than that much data in a 24-hour period.
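
(For anyone reading along: the usual rclone knobs for staying under that cap look something like the following. The remote name and exact values are illustrative, not our production settings.)

```
# Steady bandwidth cap: 8 MiB/s works out to roughly 700 GB per 24 hours
rclone copy /local/archive gdrive-shared:archive --bwlimit 8M

# Or stop cleanly once roughly 750 GB has gone up and Drive reports the daily limit
rclone copy /local/archive gdrive-shared:archive --max-transfer 750G --drive-stop-on-upload-limit
```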

If you really want to know, I guess nothing beats a good old test :slight_smile: ... utilities like fsutil or dd / split can create the needed file set in a second.
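
E.g. on Linux, something like this gives you roughly 100k tiny files to play with (sizes and names are arbitrary):

```
# One 100 MiB blob split into ~102,000 files of 1 KiB each
mkdir -p testset && cd testset
dd if=/dev/urandom of=blob bs=1M count=100 status=none
split -b 1K -a 6 blob part_ && rm blob
```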

Yeah, people routinely use 10-20 service accounts in parallel to be able to max out their upload bandwidth. Still works fine, and doesn't add any extra cost :slight_smile:

Yes, well, we can test, but good benchmarking is a lot of work. Especially since we don't know whether this putative 100k limit is a threshold or rather the beginning of a gradual decline, etc. (Good benchmarking can be non-trivial for all sorts of reasons, even with tools specialized for the purpose.)
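
If we do end up testing, it will probably start as something crude like timing a recursive listing against shared drives of different sizes (the remote names below are hypothetical), repeated a few times to average out the backend's good and bad days:

```
for remote in shared-50k: shared-100k: shared-200k:; do
    echo "== $remote =="
    time rclone lsl "$remote" > /dev/null
done
```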

So I've heard. We're fine with one service account per drive. At least for our current needs. The throttling works well enough, and if it takes a couple of days for the archiving to happen, it's not a huge deal. We have enough on-prem disk space to hold the files for a while.

True, and in this case the findings will also include the performance of the Google backend, which can have good and bad days. But at least these fluctuations seem to have flattened out somewhat by now; the wait time feels a lot more predictable when moving files between the main drive and a shared drive.

When you talk about a decline, what kind of operations / use cases are you imagining that would or could get slower?

I have no idea. I was told by a couple of people in our IT department to avoid shared drives with more than 100k files, since they claimed that performance degrades at that threshold. But that's all the information that they gave me, and they didn't cite any sources or point me at any evidence, or really say anything more than that.