Locally index files for easy/faster sync

Hi,

I'm having an issue syncing to Google Drive. What I'm trying to sync up is hundreds of thousands of files, and when it starts syncing it's fairly quick, but once it reaches a certain amount it slows down, and keeps slowing down further and further until it's really unusable.

I did contact Google about it, and they mentioned that there are limits on how many requests you can send to / receive from your Google Drive within a specific period of time (I don't know the exact figure, but I think their support said 12-24 hrs).

Is there a way for rclone to start syncing to Google Drive and create a locally saved index of everything that's been uploaded, so that the next time we run the sync command it checks the locally saved index file first and then only uploads whatever is required? That way we'd be using the minimum number of requests sent to / received from Google Drive.

Of course, this is when syncing one way: local PC > Google Drive.

Your help/suggestions are greatly appreciated.

It would be worth trying --fast-list on your sync. That does a much more efficient directory list (at the cost of more memory).
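As a rough sketch (the remote name gdrive: and the paths here are placeholders, substitute your own):

```
# One-way sync, local PC -> Google Drive, using recursive listings
# (--fast-list trades extra memory for far fewer listing API calls)
rclone sync /path/to/local/files gdrive:backup --fast-list --progress
```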

Rclone has the cache backend, which is designed for that scenario. However, I'd try --fast-list first.
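If you do end up wanting the cache backend, the setup is roughly a wrapper remote around your existing drive remote, something along these lines (remote names and values are only illustrative, and it assumes a gdrive: remote already exists):

```
# created via "rclone config" and stored in rclone.conf;
# you then point your sync/copy at gcache: instead of gdrive:
[gcache]
type = cache
remote = gdrive:backup
info_age = 2d
chunk_total_size = 10G
```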

As NCW says, the easiest fix is to try --fast-list first, as it can be up to 15x faster on a big folder structure.

You might also try using --tpslimit 10. If the problem is related to flooding the API, this will keep it under control and let it run at a consistent speed 24/7. (@ncw I suspect the drive-pacer may need some tweaking, but we can talk about that later.)
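For example, combined with the flag above (again, remote name and paths are just placeholders):

```
# Cap rclone at ~10 API transactions per second so the Drive quota
# is never flooded, while still listing efficiently with --fast-list
rclone sync /path/to/local/files gdrive:backup --fast-list --tpslimit 10
```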

That said, there is a way to do something along the lines of what you are asking for.

We could use VFS precaching to index your whole drive on startup - and then any further listings on that mounted drive (for as long as it stays mounted) will be close to instant, or take mere seconds. Assuming you don't need --checksum for your sync (as checksum attributes are not exposed via a mount), that will be amazingly fast at scanning through the whole drive and will effectively only talk to the server when it actually decides it needs to transfer something - which is likely just a few percent of all the files. The VFS cache is kept up to date via polling (default 1m).
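Very roughly, the setup looks something like this (mount point and paths are placeholders, and this is only a sketch of the idea rather than a ready-made script):

```
# 1) Mount the remote with the remote control API enabled and a long-lived
#    directory cache; polling keeps the cache in sync with remote changes
rclone mount gdrive:backup /mnt/gdrive --rc --dir-cache-time 1000h --poll-interval 1m &

# 2) Pre-cache the whole directory tree once at startup
rclone rc vfs/refresh recursive=true

# 3) Sync against the mount point; directory listings now come from the
#    local VFS cache, so rclone only talks to the server for actual transfers
rclone sync /path/to/local/files /mnt/gdrive
```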

This requires a bit more setup and a script though. I suspect that if you use the first two suggestions instead, they will most likely be more than sufficient for you and much easier - so try those first :slight_smile:

Thank you for the tip, I will definitely test that out. It will take a few days for me to retest this, as I'm currently recovering one of the computers that was causing this problem, so it will take a couple of days to re-upload whatever is required, and then I'll test your suggestion. I will keep you posted on how it goes.

Thanks again

Hi, I tried this, and this is what I get when it scans through the files that have already been copied.

2019/11/18 01:00:01 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=123456789012, userRateLimitExceeded)

FYI, I'm doing rclone copy, not sync.

Those DEBUGs are normal - if it gets to 10/10 it will throw an ERROR; otherwise it retried successfully.

You're 100% correct, it didn't even get to 4/10 in the whole log.

Thank you @ncw, you're the best :slightly_smiling_face:
