I am using the latest rclone version (linux/amd64) and I want to sync a directory on my NAS, which contains over 2.5 million files, to GDrive. I was wondering what the fastest way is to check all of those files for changes.
one thing to think about is whether or not to use https://rclone.org/docs/#fast-list
as gdrive has lots of throttling, making a lot of api calls will be slow, and --fast-list reduces the number of listing calls.
but --fast-list holds the listings in memory, so with so many files you might run out of memory.
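for example, a quick way to compare would be something like this (remote name and paths are placeholders, not from your setup):
rclone sync /mnt/nas/files gdrive:backup --fast-list --dry-run --log-level=INFO --log-file=log.fastlist.txt
then run the same command without --fast-list and compare the elapsed time and memory use.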
Remove --tpslimit=10 (unless a test has shown that it is needed, if so then ignore the other ideas too)
Add --drive-pacer-min-sleep=10ms (to use the latest limits from Google)
Change --checkers=16 (to increase concurrency)
I don’t know how they will play with all the other parameters. I would perform some tests to find the 2-3 parameters that make the most difference and then leave the rest at defaults (remember --dry-run); a combined example is sketched below.
I have assumed default settings for Google Drive in your config.
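Putting those together, a starting point might look like this (source path and remote name are placeholders, and the values are only suggestions to verify with --dry-run first):
rclone sync /mnt/nas/files gdrive:backup --drive-pacer-min-sleep=10ms --checkers=16 --dry-run --progress --log-level=INFO --log-file=log.tuned.txt
Note there is no --tpslimit here, per the first point above.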
PS: It may also be faster to execute rclone directly on the NAS (if possible). It depends on your LAN speed, the specs of your NAS and your current client, network protocols etc.
Rclone checks the latest modified time of the file, right?
So if an old file, say 2 months old, was modified, --max-age would pick it up as a file that is not older than x days?
Also, is there something better to use than --size-only for my command?
i do not use gdrive and it has many quirks, so no idea how to optimize for it, perhaps @Ole knows...
is this a one-time sync or to be run on a schedule?
not sure of your use-case, how critical the data is, or if this sync is the only backup.
yes, --max-age uses mod-time.
you can run a daily sync using --max-age=24h and once a week a full sync without --max-age
this can work great if new files are added to the source.
though rclone might not notice some files until a full sync is performed.
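as a sketch of what that schedule could look like in cron (paths, remote name and times are made up for illustration):
# daily pass at 02:00 Mon-Sat, only looking at files modified in the last 24 hours
0 2 * * 1-6 rclone sync /mnt/nas/files gdrive:backup --max-age=24h --log-file=/var/log/rclone-daily.log
# full pass on Sunday without --max-age to catch anything the daily runs missed
0 2 * * 0 rclone sync /mnt/nas/files gdrive:backup --log-file=/var/log/rclone-weekly.log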
Best bet is to start with some big numbers, check the logs and go from there. You want to push the API hard, but not create too many pacer issues. It's a balance.
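One way to gauge that balance is to do a --dry-run with DEBUG logging and count the pacer retries in the log afterwards (remote, paths and the checker count are just placeholders to experiment with):
rclone sync /mnt/nas/files gdrive:backup --checkers=32 --dry-run --log-level=DEBUG --log-file=log.debug.txt
grep -c "pacer: low level retry" log.debug.txt
A few retries are fine; if the count keeps climbing, back the concurrency off.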
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 1.246962361s
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 2.951614579s
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 4.732838921s
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 8.396977648s
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 16.613072419s
2021/08/15 14:55:55 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:55 DEBUG : pacer: Rate limited, increasing sleep to 16.694013281s
2021/08/15 14:55:56 DEBUG : pacer: low level retry 2/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=5061111557159, userRateLimitExceeded)
2021/08/15 14:55:56 DEBUG : pacer: Rate limited, increasing sleep to 16.550344014s
These are the limits Google shows:
Queries per day - 1,000,000,000
Queries per 100 seconds per user - 20,000
Queries per 100 seconds - 20,000
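(for reference, those numbers work out to 20,000 queries / 100 s = 200 queries per second per user, while --drive-pacer-min-sleep=10ms allows at most 1 query / 0.01 s = 100 queries per second from rclone, so a single instance should in theory stay under the per-user quota)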
Just a thought out of left field, would it be better to run rclone on the NAS? I know when I am using rclone on an SMB mount on my Mac, it is just miserable. I don't think SMB is as fast as rclone. You would need to be more careful of memory, but if SMB is the bottleneck, it could help.
this is the command, just ran it again now, took 26 seconds:
rclone sync D:\files\source wasabi01:rclonelotsoffiles --size-only --transfers=128 --checkers=128 --dry-run --progress --stats-one-line --log-level=INFO --log-file=log.fast.nolist.txt
as for --size-only, on s3 it does not require extra api calls, so that does speed things up.
as for --checksum, as per the docs: https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time
"If the source and destination are both S3 this is the recommended flag to use for maximum efficiency."