In the past it always did a complete check. After the checking finished it would start transferring.
These days Rclone starts checking and as soon as it finds files it can transfer it will start transferring and performing the checking and transferring in parallel.
Is it possible to force it to finish checking before it starts transferring?
The reason for this is that Rclone seems to use only one full core of the CPU (Windows x64 version of Rclone). The work is spread out across the cores but the total usage is still only one core (12.5% for me)
This results in the transfer (using crypt) and the checking slowing down over time. Once checking finishes the transfer will speed up again. Crypt does not seem to ask much CPU time so the most of it goes to checking.
If there are other ways to optimize this please let me know.
I dont think you can tell it to wait to transfer till checking is complete. But what you can do is modify it so the checking either waits or does it all (memory permitting)
With --dry-run it takes around 2 minutes 13 seconds
Without --dry-run transfers start at some point and checking takes around 9-25 minutes (depending on how soon transfers were started) and getting slower and slower the longer it takes and affecting the transfer speeds as they drop far below 3 MB. Though the transfer speed drop only happens when only one transfer is active. If 4 transfers are busy, transfer speed does not drop
As you can see my bandwidth is limited to 3MB so disk access for transfers should be limited
CPU usage is only 0.5-1% for the Rclone process while transferring, not checking. If Rclone is still checking CPU usage for the Rclone process is (as earlier mentioned) 12.5%-13%
Disks are Hardware raid 5 using 4 * 8TB enterprise grade HDD's
22TB of used data spread over ~183000 files
I did some testing. The disks are 100% active in the beginning until the transfers start. While 3 transfers were active disk activity had dropped to 20% while CPU usage stayed the same at 12.5%-13% while checking. CPU usage dropped after checking was finished
Does Rclone slow down disk activity for checking while transferring?
Does --bwlimit affect the speed of disk reading?
with --dry-run red rectangle shows the runtime of the rclone process. Around 2 minutes 13 seconds in this case
without --dry-run red rectangle shows the runtime of the rclone process. Green rectangle shows where Disk active time was 100% according to Windows task manager
First rclone does checking until it fills up the transfer queue. At this point the checking will slow down to the speed of the transfer as it will need one file to be transferred so it can add another to the checking queue.
You can see this in the --progress display - the checking will max out at 10,000 if this theory is correct.
So if you add --max-backlog 200000 rclone will do the checking as quickly as it can. It will still start transfers as soon as it can but the speed of the transfers won't delay the checking.
Rclone doesn't. your OS may though depending on IO scheduling. Rclone is doing more IO when it is transferring AND checking.
Yes it does - the disk will be read at that speed on average.
I did have a request for rclone to do the checking phase before the transfer phase - it would be relatively easy to do...
Does this mean that it will slow down at 4 active transfers or when 10000 (backlog) transfers are scheduled?
In my example the total scheduled for the full task was 184005 Checks, 7 transferred, 6 deleted
The weird thing is that while Rclone slows down at some point, as you said. In my example the CPU usage was not affected. You would assume that slower checking would mean lower CPU usage
Of course, it reads only as much as it can send but does it actually limit it or should it be able to read faster (above bwlimit) if that was necessary?
Transfers take very little CPU time. After Checking has finished then CPU usage drops to 0.5-1%
What do these numbers mean?
The left number is google drives check and the right one the local drive check? It's hammering against the 10000 backlog limit and the HDD's are barely doing anything.
Tried adding --max-backlog 200000. It made the right number shoot up like a rocket. The left stayed the same.
Removed --bwlimit. made no change in the speed of the left number. The moment the transfers start checking starts slowing down
Left number goes really fast (2 seconds) from 0 to ~35000. Then it encounters 2 transfers (out of 4 available transfer slots)
The transfers start and the left checking number increases at ~2000 p/s. But getting slower and slower.
After a minute it only increases by ~200 p/s
After a couple minutes of transferring the transfers start getting slower.
After 5 minutes it only increases by ~100 p/s
After 8 minutes it only increases by ~80 p/s and transfers are at 2/3 of available bandwidth (still only 2 transfers)
After 14 minutes it only increases by ~70 p/s and transfers are at 1/2 of available bandwidth (still only 2 transfers)
The CPU usage stays at 12.5% (one total core) all through this. Disk usage never went higher then 10%
sidenote: I doubled all the left checking numbers from what i saw in --progress. As --progress shows at half second intervals
How long the checking takes really depends on when it encounters anything to transfer. Is it quite soon like the above or more to the end which were the screenshots in earlier posts.
Well i guess you get the point with this behavior.
You can slow all this down with the tps-limit flag. That's the number of transactions per second.
This means the number of checks complete over the number of checks it knows needs to happen. It will attempt to keep the number you've specified in backlog based on the number you've specified (10000 be default). The Right number will increase over time and the left as it actually checks them.
For your issue, I think you'd likely benefit from tps-limit and a lower backlog.
Yeah i know that much. Thats logical. Rclone needs to find out how many checks it needs todo and how many its done. But what is its basis. Is the right number based on files it found locally and the left number how many files it has checked on GDrive?.
The idea is that if i run into the backlog limit, the left number is not increasing fast enough to keep up with the right. That would mean that Rclone and GDrive are not communicating fast enough. Which would mean increasing tps-limit and various other pacer limits might be necessary. On the other hand because CPU-usage stays the same while performance overal drops, it might just be the CPU equivalent of a memory leak... if that makes any sense
Disk utilization is really low (below 10%) according to Task manager.
Besides thanks to running it so often most of the data is cached anyways so disk access should be almost non-existent.
Finally the latest changes were synced. Now i could finally test "dry-run" against "non-dry-run without changes". So i have three situations
--dry-run: 2-3 minutes
Normal run where transfers are encountered: depending on how soon the transfers are found checking can take anywhere from 9 to 25 minutes
Normal run where no transfers are found: 2-3 minutes
So there is no difference between the dry-run and a normal run where nothing needs to be transferred.
But now the kicker. Tested your new version. It fixes all issues. I don't even have to use the --check-first flag. So copied it to my Rclone production directory and my issue was back.
So WTF... So i went on a testing binge
Cleared Compatibility assistant. That automatic thing can really $%^$% you over
Checked priorities of the process and see if that made a difference. Windows Task Scheduler runs tasks at Below Normal priority while running a batch file manually runs it at normal priority
Neither fixed the difference between the way production and test work.
So checked the differences between the test environment and production environment.
They were testing the same local production data to the same production destination
The versions were now the same so no difference either.
Rclone configuration file used is also the same
The OS and hardware are the same. Well its the same system not 2 different systems so yeah this is obvious
Compatibility assistant was cleared so no possible side effects there.
The only difference was the loglevel parameter and the filename where the log was written to. DEBUG for test environment and INFO for production environment
DEBUG log level works perfectly and fast on 1.51 and 1.51.0-326-g791f879a-fix-check-first-beta.
INFO log level has the issue we are discussing here
NOTICE shares INFO's issue
So my question is. Why are INFO and NOTICE log levels not working at full speed while DEBUG is?
I would have thought DEBUG would have affected performance more then NOTICE and INFO. So really surprised about this result.
I also tested the --check-first flag. It fixes the problem as well
The debug flag affects the logging and the logging only. So if you set the logging flag then rclone will do more log calculating the logs and more IO writing them out to the log file.
Does that work with DEBUG and without debug? If so then I'll merge the fix!
So did a quick test with DEBUG and
--check-first does not work as intended with DEBUG. Transfers start before checking is finished
It does work as intended with INFO. Transfers do not start unless checking is fully finished
This is based on what --progress shows me.
I'm surprised here as well since if log level only affects the logging. Why does it affect --check-first then?
Both DEBUG and INFO showed in the logs
2020/05/16 18:55:30 INFO : Encrypted drive 'cryptGDrive:D': Running all checks before starting transfers
I can see I made a mistake with the --check-first flag - it was starting the transfers as soon as the file system traversal was complete, not when the checks were complete.