Rclone sync doing transfer and checking in paralel

Hi

I'm using Rclone (1.51.0) sync.
C:\rclone\rclone.exe sync --bwlimit 3M --fast-list --delete-after --delete-excluded --exclude-from "C:/rclone/Configs/Exclusions.txt" --progress --log-file "C:\Users\name\Desktop\MirrorD Log\Rclone Mirror_D.log" --log-level INFO local:D:/ cryptGDrive:D

In the past it always did a complete check. After the checking finished it would start transferring.
These days Rclone starts checking and as soon as it finds files it can transfer it will start transferring and performing the checking and transferring in parallel.

Is it possible to force it to finish checking before it starts transferring?

The reason for this is that Rclone seems to use only one full core of the CPU (Windows x64 version of Rclone). The work is spread out across the cores but the total usage is still only one core (12.5% for me)
This results in the transfer (using crypt) and the checking slowing down over time. Once checking finishes the transfer will speed up again. Crypt does not seem to ask much CPU time so the most of it goes to checking.
If there are other ways to optimize this please let me know.

hi,
these two flags could help you

https://rclone.org/docs/#transfers-n
"The number of file transfers to run in parallel"

https://rclone.org/docs/#checkers-n
"The number of checkers to run in parallel"

I dont think you can tell it to wait to transfer till checking is complete. But what you can do is modify it so the checking either waits or does it all (memory permitting)

https://rclone.org/docs/#max-backlog-n

This might be IO limited - checking can be quite intensive if you have a lot of files. Do you have an HDD rather than an SSD?

Does the windows task manager show it waiting on IO?

With --dry-run it takes around 2 minutes 13 seconds
Without --dry-run transfers start at some point and checking takes around 9-25 minutes (depending on how soon transfers were started) and getting slower and slower the longer it takes and affecting the transfer speeds as they drop far below 3 MB. Though the transfer speed drop only happens when only one transfer is active. If 4 transfers are busy, transfer speed does not drop

As you can see my bandwidth is limited to 3MB so disk access for transfers should be limited
CPU usage is only 0.5-1% for the Rclone process while transferring, not checking. If Rclone is still checking CPU usage for the Rclone process is (as earlier mentioned) 12.5%-13%

Disks are Hardware raid 5 using 4 * 8TB enterprise grade HDD's
22TB of used data spread over ~183000 files

I did some testing. The disks are 100% active in the beginning until the transfers start. While 3 transfers were active disk activity had dropped to 20% while CPU usage stayed the same at 12.5%-13% while checking. CPU usage dropped after checking was finished

Does Rclone slow down disk activity for checking while transferring?
Does --bwlimit affect the speed of disk reading?

with --dry-run red rectangle shows the runtime of the rclone process. Around 2 minutes 13 seconds in this case

without --dry-run red rectangle shows the runtime of the rclone process. Green rectangle shows where Disk active time was 100% according to Windows task manager

Useful info thank you!

Here is what I think is happening.

First rclone does checking until it fills up the transfer queue. At this point the checking will slow down to the speed of the transfer as it will need one file to be transferred so it can add another to the checking queue.

You can see this in the --progress display - the checking will max out at 10,000 if this theory is correct.

So if you add --max-backlog 200000 rclone will do the checking as quickly as it can. It will still start transfers as soon as it can but the speed of the transfers won't delay the checking.

Rclone doesn't. your OS may though depending on IO scheduling. Rclone is doing more IO when it is transferring AND checking.

Yes it does - the disk will be read at that speed on average.

I did have a request for rclone to do the checking phase before the transfer phase - it would be relatively easy to do...

Does this mean that it will slow down at 4 active transfers or when 10000 (backlog) transfers are scheduled?
In my example the total scheduled for the full task was 184005 Checks, 7 transferred, 6 deleted
The weird thing is that while Rclone slows down at some point, as you said. In my example the CPU usage was not affected. You would assume that slower checking would mean lower CPU usage

Of course, it reads only as much as it can send but does it actually limit it or should it be able to read faster (above bwlimit) if that was necessary?

When the 10,000 transfers are scheduled I think.

I think (as you said) it is the transfers that are taking the CPU - the checking probably doesn't take a great deal.

Sure it can read faster to fill up buffers etc.

Transfers take very little CPU time. After Checking has finished then CPU usage drops to 0.5-1%

What do these numbers mean?
image
The left number is google drives check and the right one the local drive check? It's hammering against the 10000 backlog limit and the HDD's are barely doing anything.
Tried adding --max-backlog 200000. It made the right number shoot up like a rocket. The left stayed the same.
Removed --bwlimit. made no change in the speed of the left number. The moment the transfers start checking starts slowing down

  1. Left number goes really fast (2 seconds) from 0 to ~35000. Then it encounters 2 transfers (out of 4 available transfer slots)
  2. The transfers start and the left checking number increases at ~2000 p/s. But getting slower and slower.
  3. After a minute it only increases by ~200 p/s
  4. After a couple minutes of transferring the transfers start getting slower.
  5. After 5 minutes it only increases by ~100 p/s
  6. After 8 minutes it only increases by ~80 p/s and transfers are at 2/3 of available bandwidth (still only 2 transfers)
  7. After 14 minutes it only increases by ~70 p/s and transfers are at 1/2 of available bandwidth (still only 2 transfers)
  8. The CPU usage stays at 12.5% (one total core) all through this. Disk usage never went higher then 10%

sidenote: I doubled all the left checking numbers from what i saw in --progress. As --progress shows at half second intervals

How long the checking takes really depends on when it encounters anything to transfer. Is it quite soon like the above or more to the end which were the screenshots in earlier posts.

Well i guess you get the point with this behavior.

You can slow all this down with the tps-limit flag. That's the number of transactions per second.

This means the number of checks complete over the number of checks it knows needs to happen. It will attempt to keep the number you've specified in backlog based on the number you've specified (10000 be default). The Right number will increase over time and the left as it actually checks them.

For your issue, I think you'd likely benefit from tps-limit and a lower backlog.

Yeah i know that much. Thats logical. Rclone needs to find out how many checks it needs todo and how many its done. But what is its basis. Is the right number based on files it found locally and the left number how many files it has checked on GDrive?.

The idea is that if i run into the backlog limit, the left number is not increasing fast enough to keep up with the right. That would mean that Rclone and GDrive are not communicating fast enough. Which would mean increasing tps-limit and various other pacer limits might be necessary. On the other hand because CPU-usage stays the same while performance overal drops, it might just be the CPU equivalent of a memory leak... if that makes any sense

The left number is checks yet to be done in the queue, the right number total number of checks to be done (that rclone has found so far).

I think you must be out of Disk IO if the transfers are impacting the checkers so much.

Try this with the new --check-first flag which will run all the checkers first before starting the transfers and see if that helps!

https://beta.rclone.org/branch/v1.51.0-326-g791f879a-fix-check-first-beta/ (uploaded in 15-30 mins)

Below is all tested on this version https://beta.rclone.org/branch/v1.51.0-326-g791f879a-fix-check-first-beta/

Disk utilization is really low (below 10%) according to Task manager.
Besides thanks to running it so often most of the data is cached anyways so disk access should be almost non-existent.

Finally the latest changes were synced. Now i could finally test "dry-run" against "non-dry-run without changes". So i have three situations

  • --dry-run: 2-3 minutes
  • Normal run where transfers are encountered: depending on how soon the transfers are found checking can take anywhere from 9 to 25 minutes
  • Normal run where no transfers are found: 2-3 minutes

So there is no difference between the dry-run and a normal run where nothing needs to be transferred.

But now the kicker. Tested your new version. It fixes all issues. I don't even have to use the --check-first flag. So copied it to my Rclone production directory and my issue was back.

So WTF... So i went on a testing binge

  • Cleared Compatibility assistant. That automatic thing can really $%^$% you over
  • Checked priorities of the process and see if that made a difference. Windows Task Scheduler runs tasks at Below Normal priority while running a batch file manually runs it at normal priority

Neither fixed the difference between the way production and test work.
So checked the differences between the test environment and production environment.

  • They were testing the same local production data to the same production destination
  • The versions were now the same so no difference either.
  • Rclone configuration file used is also the same
  • The OS and hardware are the same. Well its the same system not 2 different systems so yeah this is obvious
  • Compatibility assistant was cleared so no possible side effects there.
  • The only difference was the loglevel parameter and the filename where the log was written to. DEBUG for test environment and INFO for production environment

DEBUG log level works perfectly and fast on 1.51 and 1.51.0-326-g791f879a-fix-check-first-beta.
INFO log level has the issue we are discussing here
NOTICE shares INFO's issue

So my question is. Why are INFO and NOTICE log levels not working at full speed while DEBUG is?
I would have thought DEBUG would have affected performance more then NOTICE and INFO. So really surprised about this result.

I also tested the --check-first flag. It fixes the problem as well

Me too...

The debug flag affects the logging and the logging only. So if you set the logging flag then rclone will do more log calculating the logs and more IO writing them out to the log file.

Does that work with DEBUG and without debug? If so then I'll merge the fix!

Did not test with DEBUG
I tested it with INFO

So did a quick test with DEBUG and
--check-first does not work as intended with DEBUG. Transfers start before checking is finished
It does work as intended with INFO. Transfers do not start unless checking is fully finished

This is based on what --progress shows me.
I'm surprised here as well since if log level only affects the logging. Why does it affect --check-first then?

Both DEBUG and INFO showed in the logs

2020/05/16 18:55:30 INFO  : Encrypted drive 'cryptGDrive:D': Running all checks before starting transfers

That is suprising! Can you send me the log with -vv of this happening?

Might be the same reason why there is also a performance difference

Thanks for the log.

I can see I made a mistake with the --check-first flag - it was starting the transfers as soon as the file system traversal was complete, not when the checks were complete.

This should fix it - can you have a try?

https://beta.rclone.org/branch/v1.51.0-351-g9f07cf85-fix-check-first-beta/ (uploaded in 15-30 mins)

Works perfectly both with INFO and DEBUG

2020/05/20 22:22:13 INFO : Encrypted drive 'cryptGDrive:D': Checks finished, now starting transfers
was shown before sending first chunks

Thank you

Thanks for testing :slight_smile:

I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.52