As an rclone beginner, I am not sure whether this is a bug or normal behavior for my setup.
I am doing a “sync” of borgbackup repos from my Linux PC to Backblaze B2. My internet connection speed is 50 Mbps (upload and download). The source directory is a networked drive at my home, mounted via CIFS. The connection speed in my home network is 100 Mbps (i.e. 2x faster than my outside connection).
When I start uploading, a few small files are always uploaded first and then rclone starts uploading big files. See the attached screenshot:
The 4 upload threads stall at zero progress, while the outbound internet connection stays constant at about 0.05 Mbps (i.e. almost stopped) and the inbound internet connection is at 100 Mbps (i.e. the maximum from my local networked storage). This situation persists for about 4-5 minutes, while all upload threads show clean zeroes. After 4-5 minutes, the upload starts at full speed (50 Mbps) and finishes correctly. The delay is ALWAYS 4-5 minutes, regardless of the size of the uploaded repository (e.g. 1 GB vs 25 GB, the initial delay stays the same).
Is this normal and caused by my source drive being on the network? Can I speed things up using some configuration options?
Yes, that would be it! The files are 500 MB each, and those 4-5 minutes correspond exactly to the time it takes to read through 4 x 500 MB (for 4 upload threads) so the checksums can be computed.
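A back-of-the-envelope sketch (assuming rclone reads each file once over the 100 Mbps CIFS link to compute its SHA-1 before the upload begins):

```python
# Rough lower bound on the pre-upload hashing delay: rclone must read
# 4 x 500 MB from the CIFS share over a shared 100 Mbps link to hash them.
files = 4          # concurrent upload threads, one large file each
size_mb = 500      # size of each file in megabytes
link_mbps = 100    # local network speed in megabits per second

total_megabits = files * size_mb * 8
seconds = total_megabits / link_mbps
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 160 s (~2.7 min)
```

That ~2.7 minutes is a lower bound; with CIFS overhead and hashing cost on top, it is in the right ballpark for the observed 4-5 minute stall.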
If I disable the checksumming, what negative implications would it have? Would the file equality check be less reliable? It’s not much of a problem leaving it as is because it only happens when there are hundreds of megabytes of new data in the repo, which is rarely the case.
It would mean that B2 no longer knows the sha1sum of your file, so when you come to download it, rclone can’t check its integrity, and when you run rclone check, rclone can only check existence, not the sha1sums.
By default rclone will use size and modified time to check file equality. However, if you are using the --checksum flag then it will make a difference: rclone will effectively just be checking the size of the files.
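As a rough illustration of the two comparison modes (a sketch only, not rclone’s actual code): the cheap check compares size plus modification time and never reads file contents, while a checksum comparison forces a full read of every file — which is exactly what was saturating the CIFS link above.

```python
import hashlib
import os

def same_by_size_and_mtime(a: str, b: str) -> bool:
    """Cheap check: equal size and (second-resolution) modification time.

    Only stats the files; no file contents are read.
    """
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)

def sha1_of(path: str, chunk: int = 1 << 20) -> str:
    """Expensive check: read the whole file and hash it."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```

The size/mtime check costs one stat call per file; the SHA-1 check costs a full read over the network, which is why disabling checksums removes the delay but also removes the stored integrity hash.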
I think if it was me, I’d leave the checksums on as I like data integrity checks, but your use case may be different.
If you’d like us to have a go at the --b2-disable-checksum flag then please make a new issue on GitHub about it.
I will stay with the current config, thanks. My configuration is rather non-standard and now when I know what’s happening I can flexibly work around the “problem” without losing the benefit of checksums.
By the way, is it OK that “rclone cleanup” does not remove partially uploaded big files from B2? Is there a way to use rclone to remove these? Should I make a separate thread about this?
You can do it manually via the website, but you have to do it one file at a time. At least I didn’t find any “remove all partial uploads” button. I don’t know about the expiration; they certainly didn’t expire 24 hours after the aborted upload.
I have sent the docs PR with the addition. If by “work on that” you mean actual programming, I am not fluent in Go, unfortunately. I cannot even promise to test beta versions, because I only use rclone on my main server, where I cannot easily experiment with different versions.
Could you please make a new issue on GitHub about getting rclone cleanup to clear up the half-uploaded files? Then hopefully we can find someone to work on it.