#### What is the problem you are having with rclone?
When validating checksums for files that have been copied to a remote (Microsoft OneDrive), the command runs for well over 3 hours on a relatively small set of files and directories.
The file sizes range between 200MB and 400MB, and there are approximately 2,566 files and 96 directories. The process also appears to cause high CPU & memory utilization.
Because it renders the device unusable, the process has to be stopped before the validation completes.
#### What is your rclone version (output from rclone version)
rclone v1.53.1
#### Which OS you are using and how many bits (eg Windows 7, 64 bit)
Microsoft Windows 10 Professional version 1909 Build 18363.1110
#### Which cloud storage system are you using? (eg Google Drive)
Microsoft OneDrive
#### The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone.exe check --stats 10s --progress --log-file=rclone_log.txt --log-level DEBUG /path/to/files/on/local/hard/drive remote:/path
#### The rclone config contents with secrets removed.
The configuration is standard, i.e. it follows the Microsoft OneDrive rclone guide (https://rclone.org/onedrive/). There is no encryption configured.
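For reference, a remote set up per that guide ends up looking roughly like this in rclone.conf (the remote name here is illustrative and the secrets are redacted):

```
[remote]
type = onedrive
token = {"access_token":"REDACTED","token_type":"Bearer","refresh_token":"REDACTED","expiry":"REDACTED"}
drive_id = REDACTED
drive_type = personal
```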
#### A log from the command with the -vv flag
There is no additional information in the log files apart from successful hash matches, e.g.
2020/10/13 14:22:20 DEBUG : [REDACTED]: SHA-1 = 251939df1fb2ef1ae0372abbe480e8be51308780 OK
or a missing file, such as:
2020/10/13 17:10:29 ERROR : [REDACTED]: File not in One drive root '[REDACTED]'
If you are on a slow HDD then reducing --checkers, e.g. --checkers 1, might help so that rclone reads each file sequentially.
While rclone is running, open Task Manager and take a look at CPU and disk usage. Doing checksums is both CPU and disk intensive, but one might be worse than the other.
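For example, applied to the original command, that would be:

```
rclone.exe check --checkers 1 --stats 10s --progress --log-file=rclone_log.txt --log-level DEBUG /path/to/files/on/local/hard/drive remote:/path
```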
@ncw, thanks for the suggestion. Is there an option to include CPU & memory utilization metrics in the logs for each operation, to help identify the bottleneck?
@Animosity022, the file sizes range between 200MB and 400MB and there are approximately 2,000 files, which would equate to approximately 35 minutes. In this instance the validation hadn't completed after 3 hours, and since the device was unusable, the process had to be stopped.
The math is not quite that simple: if you are running multiple checkers, as @ncw noted, you can be over-utilizing your hard drive and making it take longer.
Depending on what the system is doing, you'd have to make adjustments and you may want to go lower.
I would imagine that if CPU & memory I/O are affected by checksum validation, --use-mmap would assist in constraining the available memory allocation, for instance, as would --buffer-size.
Checksums are primarily CPU, as rclone is calculating a value over the whole file.
--use-mmap changes how rclone allocates and cleans up memory, and it does well on low-memory systems.
--buffer-size is how much is kept in memory and read ahead when a file is read sequentially, before it's closed.
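As a sketch, if memory pressure were the concern, the two could be combined with the check like this (the values are illustrative, not recommendations):

```
rclone.exe check --checkers 1 --buffer-size 16M --use-mmap /path/to/files/on/local/hard/drive remote:/path
```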
@ncw, the suggestion to reduce the checkers to 1 helped reduce the CPU and memory utilization. It still took a considerable time to validate the checksums though: approximately 3 hours for 2,500 files, with 54 errors (which I am unsure how best to address; does running the command again limit it to only the errors, or does it attempt to process all of them?).
I'll continue tweaking the option to find a suitable value.
Silly question: how do I know whether I need both --buffer-size and --use-mmap? For example, in what scenarios should they be used, if they don't affect checksum validation?
That is better! You can try increasing --checkers until the performance drops off.
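For instance, re-run the same check while stepping the value up and compare the elapsed times (the values below are just a starting point):

```
rclone.exe check --checkers 2 /path/to/files/on/local/hard/drive remote:/path
rclone.exe check --checkers 4 /path/to/files/on/local/hard/drive remote:/path
```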
Running the sync again will address the errors. However, it would be worthwhile working out why you got the errors first, as checksum errors are quite unusual.
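That is, something along the lines of the following, using the paths from the original command (copy or sync, matching however the files were transferred in the first place):

```
rclone.exe copy --progress /path/to/files/on/local/hard/drive remote:/path
```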
What kind of files are they? Are they office type files, or images? Onedrive has had problems with both of those in the past.
If they are not, and I had this on my computer, I would first run memtest86 to check that the RAM was OK.
Thanks @ncw. I'll give increasing --checkers a go.
Having had a look at the logs, it seems that if a file is missing it is counted as an ERROR, e.g. 2020/10/14 17:09:33 ERROR : Overview.mp4: File not in One drive root 'Media'
The files are media files, e.g. MP4.
2020/10/14 17:22:56 NOTICE: One drive root 'Media': 53 files missing
2020/10/14 17:22:56 NOTICE: One drive root 'Media': 53 differences found
2020/10/14 17:22:56 NOTICE: One drive root 'Media': 53 errors while checking
Although it isn't clear why it's reporting 54 errors, i.e. 2020/10/14 17:22:56 Failed to check with 54 errors: last error was: 53 differences found and Errors: 54 (retrying may help). How do I find the 54th error, given that filtering the log only returns 53?
Is it possible to generate a list of files so that when commands such as copy, sync, checksum, etc. are run, they skip files that have already been successfully processed?