Hi vicmarto,
I will look into the logs, can you meanwhile test these flags (on the latest version):
@vicmarto In your earlier post you wrote:
However, the command you are executing:
2023/01/23 20:03:41 DEBUG : SOME_FILE_IN_SOURCE :Version "v1.61.1" starting with parameters ["/usr/local/bin/rclone" "sync" "--copy-links" "--create-empty-src-dirs" "--track-renames" "--local-no-check-updated" "--progress" "-vv" "--log-file=/Users/vicmarto/rclone_1.61.1.log" "--exclude" "SOME_PATH" "/Volumes/Users/vicmarto" "nas:/mnt/zbackup/Users/vicmarto"]
has this target: "nas:/mnt/zbackup/Users/vicmarto", which seems to be an SFTP backend. Could you please post the redacted output from
rclone config show nas:
to make it clear exactly what your target is?
Is /mnt/zbackup/Users/vicmarto a local disk on the SFTP server or something mounted? What? How?
EDIT: I now see you posted this:
So you are referring to local network transfers, not local machine/disk transfers like the original post:
@Ole:
rclone.conf is:
[nas]
type = sftp
host = 10.0.1.150
user = vicmarto
key_file = /Users/vicmarto/.ssh/id_rsa_nas
key_file_pass = EDITED
md5sum_command = md5 -r
sha1sum_command = sha1 -r
shell_type = unix
That's true: transfer is to a SFTP backend.
Thanks @vicmarto !
I am looking at the log from rclone 1.61.1, which starts at 20:03:41 and ends abruptly after 67 minutes at 21:10:35. At what time do the 70 seconds start where CPU rises to 200%?
Were there any ongoing transfers at that point or just checks (with checksum calculations)? (I cannot see in the log)
Yes, I had to cancel the 11 transfers because they were extremely slow.
The issues seem to appear when the rclone checkers perform a series of md5 calculations to see if files have changed. Perhaps (some) Macs can be overloaded by doing many concurrent md5 calculations.
Are you able to observe something similar when executing this command:
rclone check "/Volumes/Users/vicmarto" "/Volumes/Users/vicmarto" "--exclude" "SOME_PATH" "--progress" "-vv" "--log-file=/Users/vicmarto/rclone_check.log"
It will stress the Mac by calculating and comparing checksums for all the files in /Volumes/Users/vicmarto. It is intentional that I compare the files with themselves: that eliminates the network and the SFTP server, and increases the load.
I ran the same test again with v.1.61.1 after upgrading to macOS 13.2 Ventura. Here is the log (modified).
In summary, the sequence would be:
I have the feeling that the problems start in the md5 calculation phase (but this is not certain).
Thanks @Ole for your help.
That "rclone check" test seems to be fine. I ran it for about an hour with a constant 200% CPU usage, as expected.
Thanks @vicmarto both tests are very useful.
I am not sure you see the same issue as the others. What you see could also be related to a networking issue, so @wdp and @Marcelloh please chime in if you have supporting or different observations.
Now, let's try narrowing down the possibilities again:
What happens if you start your command with --check-first which will make rclone complete all the checks before starting the transfers?
If --check-first makes it
--check-first --checkers=12 --transfers=12
to increase concurrency.
--check-first --checkers=1
to reduce concurrency during checking.
--check-first --transfers=1
to reduce concurrency during transfers.
Unfortunately, I don't see any major change using '--check-first': at some point the CPU usage of rclone drops to 1% and the transfer process takes forever...
Right now we are not trying to solve the issue, we are trying to locate the cause of the issue to be able to (hopefully) solve it.
Does it happen while checking or transferring?
Hi @Ole, I'll try it another time and report back.
Something that happens often, and that I would like to report, is what seems to be difficulty completing files once they reach 100%:
rclone calculates and compares the checksums after transfer, which can give this picture if one (or both) remotes takes some time to calculate it.
You could try adding --ignore-checksum to prove/disprove this. Please be aware of this in the docs:
You should only use it if ... you are sure you might want to transfer potentially corrupted data.
You could also try executing 4 concurrent "md5 -r" commands on your SFTP server on the above files, to see how fast your SFTP server can do it. And similarly on your Mac.
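A minimal sketch of that timing test, assuming a POSIX shell. The function names `hash_one` and `concurrent_md5` are made up for illustration, and the fallback to `md5sum` is an assumption for servers that do not ship the BSD-style `md5 -r`. Run it once on the Mac and once on the SFTP server with the same four files and compare the elapsed times.

```shell
#!/bin/sh
# Sketch: hash several files concurrently and report the wall-clock time.
# Assumption: `md5 -r` exists on macOS/BSD; fall back to `md5sum` elsewhere.

hash_one() {
  if command -v md5 >/dev/null 2>&1; then
    md5 -r "$1"
  else
    md5sum "$1"
  fi
}

# Usage: concurrent_md5 FILE...
concurrent_md5() {
  start=$(date +%s)
  for f in "$@"; do
    hash_one "$f" &        # one background job per file
  done
  wait                     # block until every hash has finished
  end=$(date +%s)
  # The timing goes to stderr so stdout carries only the checksums.
  echo "elapsed: $((end - start))s for $# files" >&2
}
```

Called as `concurrent_md5 file1 file2 file3 file4`, it prints one checksum line per file and the total elapsed time in whole seconds.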
Maybe the problem is in the SFTP server and rclone is working fine?!
The Mac can calculate the four md5 in 70 seconds. And the server needs 4 minutes and 20 seconds...
Yes, that is one out of several open possibilities. We need more tests/observations to say for sure.
Can you tell if the manual md5 calculations on the server are limited by CPU, disk or something else? Which is (almost) at 100% usage?
Perfect, some things to pay attention to during the test:
First of all, is it the checking or the transfers that are slow? Does the speed vary during each of the phases? If so, please note the times of good and bad speed, so we can compare them to the activities we see in the debug log.
Secondly, can you identify the limiting factor during each of the two phases? Is it Mac CPU, Mac disk, network speed, server CPU, server disk, or something else?
Thanks @Ole for your help.
Yes, after some performance tests, it seems that the limiting factor is in the SFTP server. Specifically, in its hard disk. While the Mac can read at about 1200 MB/s, the SFTP server can only read at 350 MB/s.
In terms of MD5 "read + compute", the Mac slightly drops its performance to 1060 MB/s and the SFTP server to 295 MB/s.
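For anyone wanting to repeat the "read + compute" measurement, here is a rough sketch assuming a POSIX shell. The function name `md5_throughput` is hypothetical, the `md5sum` fallback is an assumption for systems without `md5 -r`, and the result is only meaningful for files large enough to take several seconds. A file that is already in the operating system's cache will report an inflated figure.

```shell
#!/bin/sh
# Sketch: estimate MD5 "read + compute" throughput for one file.
# Timing uses whole seconds, so use a file of several GB for a useful number.

# Usage: md5_throughput FILE
md5_throughput() {
  file=$1
  bytes=$(wc -c < "$file" | tr -d ' ')
  start=$(date +%s)
  if command -v md5 >/dev/null 2>&1; then
    md5 -r "$file" >/dev/null
  else
    md5sum "$file" >/dev/null      # assumption: GNU coreutils fallback
  fi
  end=$(date +%s)
  elapsed=$((end - start))
  [ "$elapsed" -gt 0 ] || elapsed=1  # avoid dividing by zero on fast runs
  LC_ALL=C awk -v b="$bytes" -v s="$elapsed" \
    'BEGIN { printf "%.1f MB/s\n", b / 1000000 / s }'
}
```

As a sanity check on the numbers above: 1060 MB/s against 295 MB/s is a ratio of about 3.6, close to the ratio of about 3.7 between the 4 minutes 20 seconds the server needed and the 70 seconds the Mac needed for the four md5 calculations.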
With this, I think we can conclude that, in the picture shared before (with transfers completed over several minutes and CPU usage on the Mac dropping to 1%):
It looks like that's not an rclone freeze, but the Mac waiting for the SFTP server to finish the md5 calculations before it can continue. I monitored the server with "top" at those moments, and that's what seems to be happening. Do you agree?
On another note, and back on topic: yesterday, after a long time without it happening, I again experienced a "Finder" freeze during an rclone transfer. At first glance it's just the Finder freezing, but it's hard to tell what's really happening on the Mac. It is definitely not a system freeze or a crash, though, as it is possible to restart the computer normally.
This is more like, but not the same as, what @wdp experienced.
This is the log during the Finder freeze (macOS 13.2 Ventura).
Wow, this became far more active than I expected. Thank you all.
As much fun as it was crashing my laptop on purpose, I had to get some work done and haven't had a chance to get back to replicating the issue yet. I am really impressed with the rclone community though. This is amazing to see. Normally people just say I'm using the tool wrong and to read the manual. I was a few days in on troubleshooting before I even made it to the forum.
Hopefully I will have some time this weekend to go back through the suggestions and get more specific information to you. I do know I can replicate it frequently though. Sometimes, very rarely, the transfer makes it.
Yes I do, great testing!
I suggested trying the hasher in this post, I guess it can take much of the load from the SFTP server - especially while checking.
An ordinary rclone sync like yours should not be able to make the Finder freeze unless there was a major memory leak in rclone, but we already know the rclone memory usage is within expectations and low.
My immediate thought is therefore that the sync is pushing some (unstable) part of macOS over the edge. It could be anything, but my immediate suspicion in your setup would be something related to networking, e.g. your network driver or high packet loss on your network. This is because you already tested that high concurrent CPU and disk usage didn't trigger any issues while trying rclone check.
I therefore suggest you start
ping IP.OF.YOUR.NAS
in a separate terminal during your sync and then monitor it up to and during a freeze.
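To make the ping output easier to line up with the rclone debug log afterwards, each line can be prefixed with the wall-clock time. This is only a sketch, assuming a POSIX shell. The helper name `timestamp_lines` and the log file name `ping.log` are made up for illustration.

```shell
#!/bin/sh
# Sketch: prefix every line read from stdin with the current time (HH:MM:SS),
# so ping replies can be matched against timestamps in the rclone log.
timestamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
  done
}

# Example (IP.OF.YOUR.NAS is the placeholder used in the post above):
#   ping IP.OF.YOUR.NAS | timestamp_lines | tee ping.log
```

Gaps in the timestamps, rising round-trip times or timeout messages around the moment of a freeze would point towards the network rather than rclone itself.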
Does your Finder show networked drives e.g. from the NAS?
Good to hear from you.
I also suspect your issue is related to the SMB driver, network driver or network being pushed hard by rclone.
Here is a little test that will probably run without issues:
rclone check ~/Desktop/temp_photo/ ~/Desktop/temp_photo/
and here is another that may provoke the issue:
rclone check /Volumes/Illmatic/ /Volumes/Illmatic/
Like rapid arm movements/seeks? Could there be a bad spot on your disk?