Rclone copy hangs after a couple of hours overnight

Separate from the issue where I'm getting a large number of ERROR : ... Failed to copy: file already closed messages, I've also been having to restart rclone every morning after checking in on it. It always gets stuck at some point overnight, forcing me to restart it in the morning. The command that I use is:

rclone copy . GDriveCrypt: --bwlimit 8650k --progress --fast-list --verbose --transfers=10 --checkers=20 --tpslimit=10 --tpslimit-burst=10 --drive-chunk-size=256M --log-file=./log.log

In the mornings I see no network activity and 0% CPU usage for rclone (it sits at around 40% when it is doing active transfers), even though the transfer rate still shows a non-zero number. I have to do a keyboard interrupt:

Transferred:      472.334G / 1.461 TBytes, 32%, 8.341 MBytes/s, ETA 1d10h55m25s
Errors:                92 (retrying may help)
Checks:            449886 / 449886, 100%
Transferred:        12763 / 22791, 56%
Elapsed time:  16h6m29.7s
Transferring:
 * <file>.imgc:  5% /912.067G, 2.231M/s, 109h29m4s
 * <file>.JPG: transferring
 * <file>.mkv: 58% /13.054G, 2.269M/s, 40m25s
 * <file>.mp4: 83% /1.640G, 2.288M/s, 2m4s
 * <file>.JPG:  0% /754.299k, 0/s, -
 * <file>.JPG:  0% /1.146M, 0/s, -
 * <file>.jar:  0% /985.255k, 0/s, -
 * <file>.png:  0% /133, 0/s, -
 * <file>.JPG:  0% /2.228M, 0/s, -^C

The last item in my log file from yesterday is at 2:24:36 AM with a status of INFO : ... : Copied (new), and the last one from the day before is at 7:54:44 AM with the same status.

I'm not sure what to do here. I'm happy to post the logs from yesterday and the day before if someone can guide me on how to censor the personal information in them. (I'm happy to write a quick parser to do this if no one has a better solution and the logs are necessary to proceed.)

You can only copy up to 750GB per day.

If you run the command with -vv, you should see the errors in the log.

I can start tonight's run with -vv. However, I'm transferring nowhere near 750GB per day. As you can see, my 8650 kB/s rate limit shouldn't let it transfer more than about 712GB (8650*2^10*3600*24/(2^30)) per day.
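
Spelled out, that calculation is:

8650 kB/s * 1024 B/kB = 8,857,600 bytes/s
8,857,600 bytes/s * 86,400 s/day = 765,296,640,000 bytes/day
765,296,640,000 bytes / 2^30 ≈ 712.7 GiB/day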

If you can share the logs with -vv, that should have the relevant information.

The same thing happened. Here's a censored version of the end of the log (nothing interesting in the log before these lines):

2019/09/05 14:58:34 DEBUG : 096.jpg: Unchanged skipping
2019/09/05 14:58:34 DEBUG : 097.jpg: Unchanged skipping
2019/09/05 14:58:34 DEBUG : 098.jpg: Unchanged skipping
2019/09/05 14:58:34 DEBUG : 099.jpg: Unchanged skipping
2019/09/05 14:58:34 DEBUG : 114.jpg: Size and modification time the same (differ by 0s, within tolerance 1ms)
2019/09/05 14:58:34 DEBUG : 114.jpg: Unchanged skipping
2019/09/05 14:58:34 DEBUG : 122.jpg: Size and modification time the same (differ by 0s, within tolerance 1ms)
2019/09/05 14:58:34 DEBUG : 122.jpg: Unchanged skipping
2019/09/05 14:58:38 INFO  : file.class: Copied (new)
2019/09/05 14:58:38 INFO  : file.sol: Copied (new)
2019/09/05 14:58:38 ERROR : file.png: Failed to copy: file already closed
2019/09/05 14:58:38 DEBUG : pacer: low level retry 1/1 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas, userRateLimitExceeded)
2019/09/05 14:58:38 DEBUG : pacer: Rate limited, increasing sleep to 1.566995453s
2019/09/05 14:58:38 DEBUG : file.class: Received error: googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas, userRateLimitExceeded - low level retry 1/10
2019/09/05 14:58:38 DEBUG : pacer: Reducing sleep to 0s
2019/09/05 14:58:38 DEBUG : pacer: low level retry 1/1 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas, userRateLimitExceeded)
2019/09/05 14:58:38 DEBUG : pacer: Rate limited, increasing sleep to 1.079814704s
2019/09/05 14:58:38 DEBUG : file.class: Received error: googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas, userRateLimitExceeded - low level retry 1/10
2019/09/05 14:58:38 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas, userRateLimitExceeded)
2019/09/05 14:58:38 DEBUG : pacer: Rate limited, increasing sleep to 2.566992179s
2019/09/05 14:58:39 DEBUG : pacer: Reducing sleep to 0s
2019/09/05 14:58:39 INFO  : file.jpg: Copied (new)
2019/09/05 14:58:39 INFO  : file.m4a: Copied (new)

I just keyboard-interrupted the command again as it was stuck on this for an hour:

Transferred:       42.207G / 125.084 GBytes, 34%, 5.845 MBytes/s, ETA 4h1m58s
Errors:               244 (retrying may help)
Checks:            528713 / 528713, 100%
Transferred:        20467 / 30497, 67%
Elapsed time:   2h3m13.8s
Transferring:
 * <file>:  0% /1.661k, 0/s, -
 * <file>: transferring
 * <file>:  0% /1.247k, 0/s, -
 * <file>:  0% /984, 0/s, -
 * <file>:  0% /2.345k, 0/s, -
 * <file>:  0% /1.928M, 0/s, -
 * <file>:184% /57, 0/s, -
 * <file>: transferring
 * <file>: transferring^C

I've also never seen a bare "transferring" status like the one shown for three of these files. The log is over 200 MB in size since I ran it with -vv.

Is it a problem that I'm writing the log into the same folder that I'm copying with rclone? That is, rclone copy is copying ./ and the log file is also being saved to ./.

If the log file is in the same folder you are copying, rclone will just error out on that file because it is open, and move on.
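
If you'd rather not see that error at all, you could exclude the log file from the copy, something like this (using the log.log name from your command; --exclude is a standard rclone filter flag, and a leading / anchors the pattern to the root of the source):

rclone copy . GDriveCrypt: --exclude "/log.log" --log-file=./log.log [rest of your flags as before]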

Unfortunately, that log snippet doesn't have any information to help debug this, as the only thing in it is a couple of 403 rate limits. Based on the 10 transfers / 20 checkers, I'd expect those, as you are going over the quota.

Can you post the whole log somewhere? That would be the most help.

The same thing has happened again since I last posted the log snippet. Now I have two logs from this afternoon. One is ~192MB and the other is ~195MB.

I'll post the two logs shortly.

Do the 10 transfers and 20 checkers run concurrently? Doesn't my --tpslimit=10 and --tpslimit-burst=10 protect against 30 concurrent requests?

(FYI -- I'm using my own tokens with Google Drive, not the rclone ones.)

Those are the rate-limit errors I was referring to.

I'm not sure at a code level exactly how that would reduce the checkers and transfers, as it does seem very counter-intuitive to set them high and then rate-limit them.

If it were working as you expected, the 403s are probably from trying to create too many files at once.

I'd just make the numbers more in line with what I wanted rather than setting them high and trying to limit them.

A -vv log of the part where it actually stalls would be the most crucial information (or just a full log). If you zip it, it should be much more manageable in size, as it will compress really well. The log you have shown so far doesn't seem to have stalled yet. Some 403 errors are normal as long as they're just intermittent. 403s can be either quota or intermittent API limiting, but if it were quota, you'd see large sections with basically only 403s and very high retry counts (10 being the default max before it just fails).

On Gdrive you have the drive pacer, which will handle your request bursting for you, so at least in theory it shouldn't be necessary to manually limit it yourself - at least not unless something else is putting significant load on the same authorization at the same time. That said, your transfers and checkers are too high to expect Gdrive to handle anyway (this varies depending on the cloud service). 6-8 transfers and 8 checkers should be more than enough, and it will probably be a little faster if anything, as you won't be bumping into rate limits as much.
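
For example, your original command with just the concurrency changed and the manual TPS limiting dropped (the exact numbers are only the suggestion above, not a hard rule):

rclone copy . GDriveCrypt: --bwlimit 8650k --progress --fast-list --verbose --transfers=8 --checkers=8 --drive-chunk-size=256M --log-file=./log.log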

You don't need to worry about backing up from an active drive. Rclone just skips any files that are currently open so it won't cause any sort of recursion problem or fatal error.

See if you can get those -vv logs from the time when it is actually stalled - that should hopefully tell us a lot.

Just FYI -- I've always been under the 1000 queries per 100 seconds per user limit as can be seen from the chart.

This is with the 10 transfers and 20 checkers. I have now reduced these to 5 each, but I might try the 6-8 transfers and 8 checkers recommended by @thestigma.

I will post a link to the logs here soon.

Yeah, it's unfortunately a bit more complicated than just that, and there are more limitations and quotas working together than might be apparent.

For example, on Gdrive you also have a burst limit on file operations per second (in this case, how many files you can start uploading per second) of about 2-3/sec. I'm not 100% sure whether that specific limit gives you a 403 in return, but that's likely. With 10 transfers and small files, that's probably going to happen a fair amount. It really depends on what your files look like. On huge files, 30 transfers probably wouldn't be a problem, as an example, since they would just run concurrently after starting to upload (but there would also be little benefit in bandwidth utilization to doing that).

More transfers are usually most helpful precisely for moving small files a little more efficiently - but how far you can push this before it becomes counter-productive is very cloud-dependent.
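
As a rough illustration (the file size and per-transfer speed here are purely hypothetical):

10 transfers * 2 MB/s each = ~20 MB/s of small files
20 MB/s / 1 MB per file ≈ 20 new uploads started per second

which is far above the ~2-3 creates per second mentioned above, so many of those starts would come back as 403s and get retried.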

Side-note: some sort of system imaging software might fill your requirements a lot better than doing file-by-file syncs of the entire system. Worth considering.

But anyway - this is unlikely to be the root of your problem. Being throttled should not result in a stall even if you used way too optimistic transfer values - so let's try not to get side-tracked from the main problem at hand for now.

It depends more on what your use case is and what you are trying to accomplish.

For me, I copy new, large files up each night, so I can run a lot of transfers and I need virtually no checkers.

If your use case is copying/syncing small files with little change, you can take into account that you can create roughly 2-3 files per second without getting rate-limited, so you can use fewer transfers; and if checks are most of your workload, use more checkers.

The default of 4 transfers / 8 checkers is pretty good in that regard, and for my setup I leave it as-is and use a max transfer:

/usr/bin/rclone move /local/ gcrypt: --log-file /opt/rclone/logs/upload.log -v --exclude-from /opt/rclone/scripts/excludes --delete-empty-src-dirs --user-agent animosityapp --fast-list --max-transfer 700G

I think I've fixed that problem already in: https://github.com/rclone/rclone/issues/3505

Assuming it is the same problem, it was caused by a deadlock when using the --progress flag.

Try the latest beta in about 30 minutes, and if that doesn't fix it, please kill the process with kill -QUIT xxx and post the backtrace (it will be big) in a new issue.
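
In case it's useful, a minimal sketch of that (the process-name match is an assumption - adjust for how you run it):

pgrep -a rclone          # find the rclone process ID
kill -QUIT <pid>         # the Go runtime dumps all goroutine stacks before exiting

I believe the trace is written to rclone's stderr rather than to --log-file, so capture the terminal output (or redirect stderr to a file).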

$ rclone version
rclone v1.49.1
- os/arch: linux/amd64
- go version: go1.12.9

I'm getting 413 Request Too Large nginx errors from pastebin for both my log files. Any ideas on other ways to share?

Unfortunately my whole task finished (after tons of retries over the last week) so I can't really test this again. I'll have some more data to copy in a week or so, so maybe I can test then.

On a completely separate note, I was able to transfer over 912GB (one file) to Google Drive in under 8.5 hours without GDrive cutting me off. I thought their limit was 750GB/day?

Transferred:      912.289G / 912.289 GBytes, 100%, 31.178 MBytes/s, ETA 0s
Errors:                 0
Checks:                 1 / 1, 100%
Transferred:            1 / 1, 100%
Elapsed time:  8h19m23.3s

Uploading a single file larger than 750GB is documented here.

https://support.google.com/a/answer/172541?hl=en

Any file that starts uploading will be allowed to finish, even if it surpasses the limit. In-progress transfers are not cut off - although I have heard some claim they may get severely throttled after passing 750GB. I've never tried uploading a single file that large; it seems quite impractical.

If you zip them, you can email them to me at nick@craig-wood.com (please include a link to this forum page!)
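
For reference, plain-text logs compress very well, so something like this should get them down to a much more manageable size (the file names are just placeholders for your two logs):

zip rclone-logs.zip log1.log log2.log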

Got it on the 750GB limit overage.

@ncw I've emailed you the logs.