The local disk it's operating on is indeed an HDD. The files vary in size but are probably between a few hundred kilobytes for the smallest ones and 50 megabytes for the largest ones.
I'm generally not expecting files to change, only to be added or deleted. Is there a command-line switch to instruct rclone to only look for files that have been added or deleted?
I just read your post again: you are using rclone sync, not rclone check, so the bottleneck won't be md5sums. It might be local disk I/O, but that's unlikely.
This is saying that it took 7 hours to list 437,754 files in your drive. That is a lot of files, and it works out to about 17 files per second, which doesn't sound entirely unreasonable. You'll ultimately be limited by Google's API rate limit: you can't do more than 10 API requests per second as a long-term average.
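As a quick sanity check on that rate, using the 7 hours and 437,754 files quoted above:

```shell
# 437,754 files listed over 7 hours
files=437754
seconds=$((7 * 3600))                     # 25200 seconds
echo "$((files / seconds)) files/second"  # roughly 17 files/second
```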
What is the directory structure of these files? Is it lots and lots of directories with a few files in or a small number of directories with lots of files in?
It would be worth running with -vv also, as you may see some messages about rate limiting. I suspect they will look like this:
2020/09/18 17:54:15 DEBUG : pacer: low level retry 3/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=XXX, userRateLimitExceeded)
2020/09/18 17:54:15 DEBUG : pacer: Rate limited, increasing sleep to 16.237384353s
If you have enough memory, using --fast-list makes far fewer API calls. You'll need approximately 1 kB of memory per object.
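A rough back-of-the-envelope sketch of what that means for the listing mentioned earlier in the thread (437,754 objects at ~1 kB each):

```shell
# rough --fast-list memory estimate at ~1 kB per object
objects=437754
echo "$((objects * 1024 / 1048576)) MiB"  # about 427 MiB
```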
Rclone will appear to do nothing until the listing has finished. It shouldn't time out though.
Rclone will have to do pretty much the same amount of work... If you are syncing from local to network then you can use --max-age to only sync new items, but in this case, syncing from network to local, it isn't going to help - rclone is still going to list everything.
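For the local-to-network direction, a sketch of what that would look like - the local path and remote path here are placeholders, not the ones from this thread:

```shell
# only consider files modified in the last 24 hours (helps local -> remote only)
rclone sync /local/path drive:mirror/path --max-age 24h -vv
```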
Lots of directories, all with a few subdirectories each with a few files in.
I'll try --fast-list again with -vv.
EDIT: even after nearly 3 hours, and with -vv, rclone appears to have done nothing:
2020/09/18 20:48:16 INFO :
Transferred: 0 / 0 Bytes, -, 0 Bytes/s, ETA -
Elapsed time: 2h48m1.1s
The only other information that even appeared so far was:
2020/09/18 20:33:23 DEBUG : drive: Loaded invalid token from config file - ignoring
2020/09/18 20:33:24 DEBUG : Keeping previous permissions for config file: -rw-rw-rw-
2020/09/18 20:33:24 DEBUG : drive: Saved new token in config file
and earlier
2020/09/18 18:02:02 DEBUG : Google drive root 'mirror/[redacted]': Disabling ListR to work around bug in drive as multi listing (50) returned no entries
2020/09/18 18:02:02 DEBUG : Google drive root 'mirror/[redacted]': Recycled 50 entries
Yes. Using -vv, --fast-list and --checkers 1 it was actually slower:
2020/09/19 04:23:44 INFO :
Transferred: 0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors: 2 (retrying may help)
Checks: 291836 / 291836, 100%
Elapsed time: 10h23m29.8s
Unfortunately overnight the output overflowed the terminal buffer, so I can't scroll back far enough to see if there were any pacer errors like the ones you described earlier.
EDIT: tried running again, this time as rclone -vv --size-only sync drive:mirror/[redacted] . --drive-shared-with-me --checkers 10 and now I can see lots of messages like:
2020/09/19 06:12:14 DEBUG : pacer: low level retry 3/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=[redacted], userRateLimitExceeded)
2020/09/19 06:12:14 DEBUG : pacer: Rate limited, increasing sleep to 16.784295782s
2020/09/19 06:12:16 INFO :
Transferred: 0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks: 427 / 427, 100%
Elapsed time: 2m1.3s
2020/09/19 06:12:31 DEBUG : pacer: Reducing sleep to 0s
My understanding is that because of the message Google drive root 'mirror/[redacted]': Disabling ListR to work around bug in drive as multi listing (50) returned no entries, --fast-list is not actually helping?
The --tpslimit 10 has certainly reduced the number of pacer errors though. I'll try playing with that number to see if I can eliminate them completely.
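For reference, a sketch of the full invocation with the rate cap applied (the path is a placeholder, matching the redacted one above):

```shell
# cap rclone at 10 transactions/second to stay under Google's quota
rclone sync drive:mirror/path . --drive-shared-with-me --tpslimit 10 --checkers 10 --size-only -vv
```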
D:\Utilities\rclone\rclone version
rclone v1.53.1
- os/arch: windows/amd64
- go version: go1.15
I updated just the other day to see if that would help with my issue. My understanding from @ncw above is that it's not a bug on rclone's side but on Google's side, so I wouldn't expect that moving to a beta version would help.
Well, I meant more that since I'm not the owner of the drive, I'd need to check with the owner before changing or deleting things, rather than actual logical file and folder permissions. I'm not aware of any empty folders, though there are lots of folders that contain only subfolders and no files.
This is probably what is causing rclone to think --fast-list isn't working.
If you are running sync, then you are already "allowed" to delete/move stuff. I think deleting empty directories and running rclone dedupe would probably help you.
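If you do get the owner's OK, something along these lines would do that cleanup (the path is a placeholder; --dry-run previews what would be removed before committing):

```shell
# preview, then remove empty directories on the remote
rclone rmdirs drive:mirror/path --dry-run
rclone rmdirs drive:mirror/path

# interactively resolve duplicate names (Drive allows duplicates, which confuses listing)
rclone dedupe drive:mirror/path
```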
So then, should I be reporting this as a bug in the Github issues area? I don't think I should have to rearrange how files are stored just for rclone to be able to deal with them efficiently. The directory structure is the way it is for a reason.