Understanding Pacer behavior with Google Drive

rclone plays it safe: if there are errors during the sync, it does not delete files.

imho, fwiw, at this point it's worth taking a few minutes to scan the commands and flags, so you know what is possible and can perhaps answer some of your own questions.

You might try --ignore-errors or tweak https://rclone.org/docs/#delete-before-during-after
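For example, something like this (a sketch only: the path and remote name are placeholders; --ignore-errors and the --delete-before/--delete-during/--delete-after flags are documented options):

rclone sync /path/to/source remote:dest --ignore-errors --delete-during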

If rclone can't read a file or directory in the source, then without this check it would delete them on the destination, leading to data loss.

Rsync does the same thing (and that's where I stole the idea!)


The transfer happens way faster now, so I can feasibly test on my sample.

Despite the errors, files that have been updated or added do change on the remote, so I believe everything is fine for me now. As long as an error doesn't stop transfers from happening altogether (which I was having trouble understanding), it works.

Thanks for taking the time to help, all.

rclone sync "PATH" REMOTE: --checkers=16 --drive-pacer-min-sleep=10ms --retries=1 

Is perfect for follow-up synchronization to a read-only share on Google Drive for me. I currently run into no rate-limit errors.
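One way to check whether the pacer is kicking in (just a sketch: it greps the debug output for pacer messages such as low-level retries):

rclone sync "PATH" REMOTE: --checkers=16 --drive-pacer-min-sleep=10ms --retries=1 -vv 2>&1 | grep -i pacer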

My comment is that it would be nice if the type of error rclone detects could be accessed; being able to tell rclone to ignore a particular error type when deciding whether to retry would be really valuable!

Rclone itself does its best to classify errors into errors it needs to retry and errors which are fatal. This isn't a perfect system though, especially with Google Drive where the same error code can mean both sorts of thing!

In general, classifying errors is a hard problem. It would be possible to give the user some control over that, but probably only by matching the error strings themselves, which would not be 100% accurate.

Rclone never lets errors pass silently, it either retries them or passes them to the user.

In the particular case of the local backend, I think you are after an "ignore errors while listing" flag or something like that?

My understanding is that rclone attempted a full synchronization (with default settings) 2 additional times because of a permission error. A permission error would never resolve itself without intervention, so rclone retries in vain. The permission issue was on the source Windows Server that I am synchronizing from.

For me it works to simply say --retries=1, because I have a tolerance for files not existing or updating on the remote I'm synchronizing to. I think that ends up being similar to simply saying "ignore all errors while listing", so it could be a helpful alias.

I'm not sure what other errors trigger a full sync, but in lieu of being able to account for every possible variation of error, maybe a retry only if more than X errors are found would be useful? I'm still a little iffy on why rclone, which is aware of which files/dirs errored, doesn't simply retry syncing (re-listing?) the specific file/folder branch it's having issues with. Sorry if there's an obvious reason I'm missing.

Rclone will always do a full retry if there were errors in the sync it couldn't correct with a low-level retry. It doesn't target the particular directory it had a problem with; it just does the whole sync again.

It sounds like --retries=1 is a good solution for you. It's one I've used in the past to back up fast-changing file systems, where you get a "best effort" backup which may not be perfect, but could never be perfect without using a snapshot of the file system.


Is it possible to get just the reported errors as JSON Objects? Just looking for an easy way to parse errors.

Try --use-json-log
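For example, something along these lines (just a sketch: it assumes jq is installed and that the JSON log lines carry a "level" field, which rclone's logrus-style output does):

rclone sync "PATH" REMOTE: --retries=1 --use-json-log --log-level INFO --log-file=sync.json.log
# keep only the error-level entries
jq -c 'select(.level == "error")' sync.json.log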


That'll have to do, thanks!

Also, for posterity: as a Google Workspace Enterprise customer, I am using the following for mirroring a Windows file server directory, where I have a tolerance for unsynchronized items.

rclone sync "PATH" REMOTE: --checkers=120 --drive-pacer-min-sleep=0ms --retries=1 

Down to 6-9 minutes for an average sync, from 5.5 hours with out-of-the-box settings.

Example:

Transferred:      118.920 MiB / 118.920 MiB, 100%, 950 B/s, ETA 0s
Checks:            241423 / 241423, 100%
Deleted:                3 (files), 0 (dirs)
Transferred:          163 / 163, 100%
Elapsed time:       9m9.4s

Nearly 200 read calls per second according to Google, with an error rate of 1-4.15% per GCP metrics.

The documentation suggests the harder limitations are on write calls; it states that no more than 3 write calls per second are allowed. So I don't know if this would be appropriate for an initial sync, but I am told that for large-scale migrations you can ask for a rate increase.

This was fun to fiddle with. The biggest help probably came from increasing the checkers to 32 initially; from there, significant diminishing returns begin. Basically, each doubling of the checkers resulted in a further 50% increase in performance, until about 120, where I start to hit my rate limit of 20,000 requests per 100 seconds (which works out to the roughly 200 requests per second mentioned above).

It was also better to just remove the minimum pacer sleep in my circumstance, since it takes so much to hit my rate limit. That's probably not smart for someone who would hit the limit frequently and slow down from the resulting error responses.

Interestingly, the rate limits seem to be per project and not per account, so I'm wondering if I can get creative and run two rclone instances associated with different projects to synchronize faster.
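Something like this might work (purely a sketch: "remote-a:" and "remote-b:" would be two remotes configured with OAuth client IDs from different GCP projects, and the folder split is made up), so each half of the tree counts against its own project's quota:

# run the two halves in parallel, one per remote/project
rclone sync "PATH/branchA" remote-a:branchA --checkers=120 --drive-pacer-min-sleep=0ms --retries=1 &
rclone sync "PATH/branchB" remote-b:branchB --checkers=120 --drive-pacer-min-sleep=0ms --retries=1 &
wait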
