Number of checks on Google Drive crypt is much higher than number of files in source directory


#1

I had a search around with both the forum search and Google but couldn’t find any similar topics.

I’m running rclone 1.45-DEV on Linux, uploading from several local directories to an encrypted (crypt) Google Drive remote. I’m using “rclone sync”. Pretty much all my data has already been uploaded to Google Drive, so the sync operations consist mostly of a few small uploads/changes and a lot of checks. I’ve noticed that the number of checks is much, much higher than the number of files in the local directories.

Here’s the full command I’m running:

rclone sync --transfers 50 --stats-log-level NOTICE --log-file=/home/gus/rclone-cron.log --delete-during -L /home/gus/ gdrive-crypt:gus

Here’s the latest stats output from rclone:

2019/01/16 10:32:21 NOTICE:
Transferred: 5.546G / 5.546 GBytes, 100%, 92.925 kBytes/s, ETA 0s
Errors: 71 (retrying may help)
Checks: 878402 / 878402, 100%
Transferred: 622 / 622, 100%
Elapsed time: 17h23m2.9s

Just for reference, this is a local rsync’s interpretation of the file count for this directory:
Number of files: 156,721 (reg: 149,060, dir: 7,513, link: 144, special: 4)
Total file size: 13,247,976,672 bytes

Based on previous experience, I think that for this particular directory the sync does eventually finish, but I have several other directories to sync with higher file counts than this and the number of checks is similarly much, much higher than the number of files. On one directory (with around 500,000 files) I left rclone running for over a week and it just never finished. When I finally killed the rclone process, the number of checks was up past 1.5 million and hadn’t changed in many hours.

I started adding --delete-during to my commands because I noticed that with the standard behaviour of “rclone sync”, files weren’t deleted from the remote unless the transfer was completely free of errors. This is something that seems very difficult to achieve with large file counts as invariably at least one file will change during the time rclone runs and so the run always finishes with errors. I presumed originally that the high check count was because there were many old files present on the remote that were being checked. After adding --delete-during, a huge number of old files were deleted during sync operations and I thought this would bring the count of both remote files and checks down, but this is seemingly not the case.

Any idea why the number of checks is so high and what I could do to reduce it? Similarly, are there any recommended settings for the number of checkers and transfers I should use for relatively high file counts with Google Drive?

I’d ideally like to run one full sync with rclone every week but at the moment, I can’t have even one full sync finish within that timeframe due to the massive check counts.


#2

I note you have -L, so rclone will be following symlinks - that might have something to do with it.

Try rclone size /home/gus/ vs rclone size -L /home/gus/ to see the difference and what rclone thinks of the number of files and their size.
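
For a quick cross-check that doesn't depend on rclone, the same comparison can be sketched with find. This is only an approximation: it assumes find -L dereferences symlinks in roughly the way rclone's -L does, and count_files is a hypothetical helper name, not part of rclone.

```shell
# Count regular files under a directory.
# $1: directory to scan
# $2: optional, pass "follow" to dereference symlinks (roughly like rclone -L)
count_files() {
    if [ "${2:-}" = "follow" ]; then
        # -L makes find follow symlinks, so linked files and directories are counted
        find -L "$1" -type f | wc -l | tr -d ' '
    else
        # Default: symlinks are not followed, so only real regular files are counted
        find "$1" -type f | wc -l | tr -d ' '
    fi
}
```

Running count_files /home/gus/ and count_files /home/gus/ follow side by side shows how much of the file count comes from symlinked content. File names containing newlines would be miscounted by this sketch.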

Drive supports --fast-list now, so if you have enough memory, using that in your sync will speed things up enormously.

Note that you can do a “top up” sync copying (let’s say) all files created in the last hour like this

rclone copy --max-age 1h --no-traverse /home/gus/ gdrive-crypt:gus

Note the --no-traverse is important to make it run quickly and you’ll need the latest beta for it to work.
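
As a minimal sketch, the top-up copy could be wrapped in a small function so it can be called from a script or cron job. The function name topup and the RCLONE override variable are hypothetical conveniences; the rclone flags are only the ones used in the command above.

```shell
# Run a "top up" copy of recently changed files.
# $1: maximum file age to copy (e.g. 1h)
# $2: local source directory
# $3: destination remote path
# RCLONE may be set to point at a different rclone binary (hypothetical override).
topup() {
    "${RCLONE:-rclone}" copy --max-age "$1" --no-traverse "$2" "$3"
}

# Example call, matching the command above:
# topup 1h /home/gus/ gdrive-crypt:gus
```

If this were run on a fixed schedule, picking a --max-age somewhat longer than the interval between runs would probably be safer, so that files changed while a previous run was in progress are not skipped.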

Checks are accumulated over retries, so if you are doing multiple retries, then you’ll get multiple checks per file.


#3

If he’s got 50 transfers, that’s going to be a lot of retries.


#4

Yes, though these will be low-level-retries which don’t get counted as a check (confusingly!).


#5

Thanks for the reply - I’ll try these options out! So far, re-running the sync with --fast-list doesn’t seem to be doing much, but I guess it probably works by retrieving a fully recursive listing up front and then caching it in RAM to save time/lookups later? I certainly have a chunk of spare RAM I can donate to the cause.

My use of --transfers 50 was just because it seemed faster for lots of small files. I understand that Google Drive limits you to 2-3 files created per second anyway so it doesn’t always translate into better performance but back when I was originally uploading lots of small files, it seemed to work better than the defaults.

The top-up sync sounds like it could be very useful once I can get to running via cron on a fixed schedule.


#6

The idea that more is better doesn’t hold here, though.

Drive only lets you create 2-3 files per second and make about 10 transactions per second. Setting transfers to 50 just produces a ton of retries/errors and adds a great deal of overhead, making it a lot slower.


#7

That’s interesting. I’ll try leaving the number of transfers at the default and see whether that helps.


#8

You should see 403 rate limiting errors in the logs if the number of transfers is too high. There is always a balance to strike, and the sweet spot depends on your setup/use case.
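
One rough way to check this is to count matching lines in the log file. The sketch below assumes the rate-limit messages contain the text "Error 403"; that wording is an assumption about the log format and may need adjusting to whatever your log actually shows. count_403 is a hypothetical helper name.

```shell
# Count log lines that mention a 403 error.
# $1: path to the rclone log file (e.g. the one given to --log-file)
# The pattern "Error 403" is an assumed wording; adapt it to your own log.
count_403() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function from failing in that case
    grep -c 'Error 403' "$1" || true
}

# Example call:
# count_403 /home/gus/rclone-cron.log
```

Comparing the count between a run with --transfers 50 and a run with the default would give a crude indication of which setting triggers more rate limiting.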


#9

I did see a lot of 403s when I was first uploading (using the default number of transfers) but I’m fairly sure that’s because I was hitting the overall 750GB/day limit. I haven’t really seen any since I’ve just been trying to sync a few changes up with what’s already there.

Interestingly, the number of files without symlinks is ~150,000 and with symlinks it’s ~450,000. I had no idea there were so many files symlinked into that source directory. Makes me realise I need to update my regular rsync to pick those up as well!
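
Putting these counts next to the stats output from the first post gives a rough feel for how many times each file was checked. This is only a back-of-the-envelope estimate: it assumes each check corresponds to one file comparison, and checks_per_file is a hypothetical helper name.

```shell
# Average number of checks per source file, to two decimal places.
# $1: total checks reported by rclone
# $2: number of source files
checks_per_file() {
    awk -v checks="$1" -v files="$2" 'BEGIN { printf "%.2f\n", checks / files }'
}

checks_per_file 878402 150000   # prints 5.86 (file count without following symlinks)
checks_per_file 878402 450000   # prints 1.95 (file count when following symlinks with -L)
```

With -L in effect, roughly two checks per file is consistent with the earlier point that checks accumulate over retries: the sync appears to have gone through about two passes over ~450,000 files rather than five or six passes over ~150,000.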


#10

Thanks a lot - this really seems to have been the magic flag that I needed. I made the modifications to my sync script and started rerunning it across all of my source directories yesterday evening - it finished successfully this morning. I should now be able to run this every couple of days and just have it upload all changes which is fantastic.

I also created a new client ID/secret of my own for rclone as I saw in another post that the shared one is at capacity.


#11

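That is correct: --fast-list retrieves the full recursive listing up front, so it will scan the entire remote first before doing any transfers.

It would be worth investigating what the symlinked files are - you might just be backing up duplicates…

Great that the sync is finishing now!

Using your own client ID will make things quicker too.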