How do I solve this issue? Google just does not have enough API quota to spare me.

rclone 1.55

rclone cryptcheck --one-way -vv --fast-list --drive-chunk-size 128M "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017" --checkers 2

The first time I ran this command today I got an error I can't quote exactly, but it said I had made too many API calls and that rclone was going to stop trying and return no results at all.

The second time I ran this command today it worked fine. All checks were made. It was VERY slow to start, though, and you'll see why if you open this text file.

exampleofproblemm.txt (170.8 KB)

I cut the file off at the point where it actually started making the checks successfully. The rest of the file would've been just the tool working pretty much fine (still some API throttling, but not enough to break the run).

Here are the final results of the above file:

2021/05/20 13:39:43 NOTICE: Encrypted drive 'cleancrypt:D:/rtmpdump newfolder 8-29-2017': 0 differences found
2021/05/20 13:39:43 NOTICE: Encrypted drive 'cleancrypt:D:/rtmpdump newfolder 8-29-2017': 62 matching files
2021/05/20 13:39:43 INFO :
Transferred: 0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks: 62 / 62, 100%
Elapsed time: 53m8.7s

2021/05/20 13:39:43 DEBUG : 5 go routines active

So how can I cut out this time? I don't think --no-traverse is functional on a cryptcheck, is it? The problem here, as far as I can tell, is the listing rclone must make before it begins its task, right?

A month or two ago this never resulted in total failure, and honestly speed isn't that big an issue to me. The total failure is. Maybe rclone could be patched so that instead of failing completely and closing the tool after 50 or so minutes, the pacer just keeps waiting? This folder has grown a LOT since 2017. That's why this takes so long, and I can understand that. It's gotten slower and slower month by month and year by year. The real issue is that it's gotten so big that rclone is giving up rather than continuing to pause (sometimes... but pretty soon probably always). In case you need/want details on what I call "large".

rclone size of the folder lists:

Total objects: 200785
Total size: 67.573 TBytes

So, uh: A) increase the time limit before rclone gives up, or B) tell me what I'm doing wrong so this wasted 30-40 minutes a day is saved and goes away entirely :slight_smile:

Are you running your own client ID/secret?

I am not. I guess I should be. I guess I never figured out how to obtain it.

Hmm, I already have the Google API set up on my Google account... I wonder why the secret is blank in my config... is it just hiding my secret for me?

edit: Okay, now I am using my own client ID and secret... I cannot comprehend why I wasn't before; it was literally already configured on my Google account and was even named "forrclone". Apparently I set it up in 2019 and then just never used it?!? That seems odd. Maybe in 2020 I restored my rclone config from an out-of-date backup when I got a new computer, and the only change to my rclone config between 2017 and 2021 was adding my own client ID in 2019...

Blank would mean you aren't using it.

You should see something filled out for client_id and client_secret.
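For reference, a filled-out Drive remote in rclone.conf looks roughly like this (the remote name and all the values here are made-up placeholders, not yours):

[clean]
type = drive
client_id = 123456789-abc123def456.apps.googleusercontent.com
client_secret = ABCdef-example-secret
token = {"access_token":"...","expiry":"..."}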

Thing is, we have to wait until tomorrow to see if this helped :slight_smile: Because I'm not sure what "test" would be fair other than my daily backup.

I'd make the changes and you can test with anything.

A few rclone ls commands or whatnot and validate that you are seeing hits on it on your API console.
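Something small like this would do, using the remote name from your first post - the point is just to run a couple of cheap listings and then watch the request graphs for the Google Drive API in the Google Cloud console to confirm the hits land on your own project:

rclone lsd cleancrypt: -vv
rclone about cleancrypt: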

The default API key is rclone's and it's a bit oversubscribed, so you see more issues in general; it's best to use your own.

I have also noted that Google Drive is getting slower these days – especially the folder traversal. I have found that my checks run faster without --fast-list, as this allows better concurrency and spreads the directory lookups over a longer time (i.e. less throttling). More info here. Please note that I am doing full downloads, so things may be different for you.

You may also be able to slice your data and do the check in smaller chunks, thereby avoiding full retries in case something times out. Here is an illustration of the pattern I use for my (paranoid) checks:

rclone check Orig: Crypt: --download                      --min-age 2019-01-01
rclone check Orig: Crypt: --download --max-age 2019-01-01 --min-age 2020-01-01
rclone check Orig: Crypt: --download --max-age 2020-01-01 --min-age 2021-01-01
rclone check Orig: Crypt: --download --max-age 2021-01-01

Oh yeah, that would test if I set rclone up correctly, but it wouldn't test how well making 20,000 checks or whatever works. So, like I said, I'll know tomorrow.

I'm having trouble comprehending how --fast-list makes things slower... or what would be a good test to prove your theory. If you can suggest a scenario I can run that will prove beyond a shadow of a doubt whether --fast-list is better than no --fast-list or vice versa... well, I'd like to know...

I could run an rclone lsl "cleancachecrypt:" and measure whether it takes 1 day or 2 days :-p but that's very boring.
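A less boring sketch of that test, assuming PowerShell since you're on Windows: time a recursive file listing of the same folder both ways and compare the wall-clock times.

Measure-Command { rclone lsf -R --files-only "cleancrypt:D:\rtmpdump newfolder 8-29-2017" | Out-Null }
Measure-Command { rclone lsf -R --files-only --fast-list "cleancrypt:D:\rtmpdump newfolder 8-29-2017" | Out-Null }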

The directory with 20,000 files and 60TB of data is already very much reduced... the rest of the remote is like... uh... several million files and only 20TB more data. I sort of tried to solve this problem already: sometimes, when I felt like it, I'd manually do things like take a program I want to back up and zip it before putting it on the cloud, so it's one file instead of 10,000.

edit: Actually, looking at your example commands, I don't comprehend how they function at all... the check command doesn't work on Google Drive encrypted remotes that are stored as crypt remotes in rclone... right? That's why cryptcheck was invented? I thought? Right? (Maybe you just simplified your example to make it more readable, though?)

That would validate you have the API key setup and it's all working properly before you try something larger.

I managed to repeat an identical scenario; like any good scientist, I changed two variables at the same time for no reason. I'm using the new personal client ID AND I removed --fast-list... It went at warp speed. I saved 40 or 50 minutes.

Problem solved, in either one way or both ways.

I carefully read the documentation and noted that --fast-list was designed for bucket-based remotes (S3, B2, …) and didn't have Google Drive on the list. I also noted that my scenario could fall into the category where --fast-list is discouraged.

I then collected some statistics on my execution times with and without --fast-list, and they showed that --fast-list slowed my sync by 0-100%. I observed that the longer execution times were due to transfers starting later and continuing after the directory traversal was complete. Based on this I guessed that the better performance without --fast-list was due to better concurrency/parallelization making the sync less prone to delays caused by Google throttling the directory traversal.

Also, if --fast-list were indeed the fastest option in most scenarios (for some backends), then it would have been the default (for those backends) and we would have had an exception flag like --no-listR instead. I therefore think the current name gives a false impression; something more neutral like --list-recursive would have been better imho. Easily said in hindsight.

My commands do a full byte-by-byte comparison due to the --download flag, and that is possible for crypts too. I understand that it will take quite some time with 20-60TB of data, which is why I slice my data and do it over some days/weeks – taking a chunk a day.

I guess you can use the same slicing technique with cryptcheck too, but it is probably irrelevant now.
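For reference, applied to your original command the slices would look something like this (same --min-age/--max-age filters as in the check example above, with your paths filled in):

rclone cryptcheck --one-way "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017" --min-age 2019-01-01
rclone cryptcheck --one-way "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017" --max-age 2019-01-01 --min-age 2020-01-01
rclone cryptcheck --one-way "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017" --max-age 2020-01-01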

Great, you may get even better speed by increasing --checkers, now that --fast-list isn't limiting concurrency anymore.

No

cryptcheck is quite expensive because it has to read the start of every file off the cloud storage.

You have control over this with --low-level-retries
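e.g. something like this - the default is 10 retries, so doubling it makes the pacer hang on roughly twice as long before declaring failure:

rclone cryptcheck --one-way --low-level-retries 20 "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017"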

:slight_smile:

I see you've set --checkers 2 - that is a good start. You could try 1. You could try setting --tpslimit 1, say, to do one transaction per second and see if that helps. (Adjust that number accordingly.)
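For example, your original command with those throttles applied would be:

rclone cryptcheck --one-way --checkers 1 --tpslimit 1 "D:\rtmpdump newfolder 8-29-2017" "cleancrypt:D:\rtmpdump newfolder 8-29-2017"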

You could also do rclone check and just occasionally do rclone cryptcheck.

You could use rclone md5sum to make a list of md5sums on the remote and check to see if they change, and do the same with the local disk. That is effectively what any rclone cryptcheck run will be doing after the first check.
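A sketch of that approach - the remote name clean: for the underlying non-crypt Drive remote, the encrypted folder path, and the listing file names are all assumptions on my part. Drive serves md5s but the crypt layer doesn't, so you list the underlying remote; each listing is compared against its own previous run, not against the other side:

rclone md5sum "clean:path-to-encrypted-folder" > remote-today.md5
rclone md5sum "D:\rtmpdump newfolder 8-29-2017" > local-today.md5
fc remote-today.md5 remote-yesterday.md5
fc local-today.md5 local-yesterday.md5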

To be clear, using my own API key and taking --fast-list off have combined to make a command that took like 50 minutes now take... like, I dunno, 5 SECONDS.

The insane thing is that in 2019 I generated my own API key, but in 2021 I wasn't using it. Which just blows my mind. No idea how that happened.

It's definitely like magic, slash sorry for hammering the public api key so hard for the past however long!

