Is this normal? When uploading from a networked drive, the upload always stalls for about 5 minutes before beginning

As an rclone beginner, I am not sure whether this is a bug or normal behavior for my setup.

I am doing a “sync” of borgbackup repos from my Linux PC to Backblaze B2. My internet connection speed is 50 Mbps (upload and download). The source directory is on a networked drive at my home, mounted as CIFS. My home network runs at 100 Mbps (i.e. 2x faster than my outside connection).

When I start uploading, a few small files are always uploaded first, and then rclone starts uploading the big files. See the attached screenshot.

The 4 upload threads stall at zero progress, while outbound internet traffic stays constant at about 0.05 Mbps (i.e. almost stopped) and inbound traffic is at 100 Mbps (i.e. the maximum from my local networked storage). This lasts for about 4-5 minutes, with all upload threads showing clean zeroes. After 4-5 minutes, the upload starts at full speed (50 Mbps) and finishes correctly. The delay is ALWAYS 4-5 minutes, regardless of the size of the uploaded repository (e.g. 1 GB vs 25 GB, the initial delay stays the same).

Is this normal and caused by my source drive being on the network? Can I speed things up with some configuration options?

Thanks

How big are the files? rclone needs to calculate the sha1sum of each file before uploading it to B2, which can take a while for GB-sized files.

rclone uses that sha1sum for integrity checking.
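To illustrate why the upload can’t start until the file has been read once, here is a minimal Python sketch that streams a SHA-1 over a file in fixed-size chunks. It is not rclone’s code (rclone is written in Go); it only shows that a whole-file hash needs every byte of the file, so for a big file on a CIFS mount the whole file has to come across the home network before the hash is known. The function name `sha1_of_file` and the 1 MiB chunk size are illustrative choices.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 of a file, reading it in chunk_size pieces.

    The digest is only known after the last byte has been read, which is
    why computing a whole-file hash delays the start of an upload.
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    import sys

    # Print "<sha1>  <path>" for each path given on the command line.
    for p in sys.argv[1:]:
        print(sha1_of_file(p), p)
```

Reading in chunks keeps memory use flat regardless of file size, but it does not avoid the full read: for a 500 MB file, 500 MB crosses the network before the upload of that file can begin.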

It would be possible to disable it, which would mean you’d lose hashes on multipart-uploaded files. S3 has a similar problem, and it has the flag

  --s3-disable-checksum                      Don't store MD5 checksum with object metadata

Yes, that would be it! The files are 500 MB each, and those 4-5 minutes correspond exactly to the time it takes to read through 4 x 500 MB (one file for each of the 4 upload threads).
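A back-of-the-envelope check of that arithmetic, as a small Python sketch. It assumes the 4 transfers (rclone’s default `--transfers` value is 4) read their files in parallel over the shared 100 Mbps home link, and treats 1 MB as 8 megabits; the function name `min_read_seconds` is hypothetical.

```python
def min_read_seconds(file_mb: float, files: int, link_mbps: float) -> float:
    """Lower bound on the time to read `files` files of `file_mb` megabytes
    each over a link of `link_mbps` megabits per second.

    Assumes the files are read in parallel and share the link, so the total
    is simply total megabits divided by link speed. Protocol overhead
    (SMB/CIFS, TCP) is ignored, so real times will be longer.
    """
    total_megabits = file_mb * files * 8
    return total_megabits / link_mbps


if __name__ == "__main__":
    t = min_read_seconds(file_mb=500, files=4, link_mbps=100)
    print(f"{t:.0f} s (~{t / 60:.1f} min)")  # prints "160 s (~2.7 min)"
```

At the nominal 100 Mbps this gives a lower bound of about 160 seconds; SMB/TCP overhead and any gap between the nominal and the effective link rate push the real figure higher.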

If I disable checksumming, what negative implications would it have? Would the file equality check be less reliable? It’s not much of a problem to leave it as is, because it only happens when there are hundreds of megabytes of new data in the repo, which is rarely the case.

Thanks again for the explanation.


It would mean that B2 no longer knows the sha1sum of your file, so when you come to download it, rclone can’t check its integrity, and when you run “rclone check”, rclone can only check existence, not the sha1sums.

By default rclone uses size and modification time to check file equality. However, if you are using the --checksum flag, it will make a difference: rclone will effectively just be checking the size of the files.
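The difference between the two modes can be modelled with a hypothetical, simplified Python sketch. It is not rclone’s implementation (for example, real rclone compares modification times within a tolerance based on the remote’s time precision); `ObjectInfo` and `looks_equal` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    size: int
    modtime: float        # seconds since the epoch
    sha1: Optional[str]   # None when no hash is stored for the object


def looks_equal(src: ObjectInfo, dst: ObjectInfo, use_checksum: bool) -> bool:
    """Simplified model of the equality check described above.

    Default mode: compare size and modification time.
    Checksum mode: compare size and hash; if either side has no hash,
    only the size can be compared.
    """
    if src.size != dst.size:
        return False
    if not use_checksum:
        return src.modtime == dst.modtime
    if src.sha1 is None or dst.sha1 is None:
        return True  # sizes match and there is nothing else to compare
    return src.sha1 == dst.sha1


if __name__ == "__main__":
    local = ObjectInfo(size=500, modtime=1000.0, sha1="aa")
    remote_no_hash = ObjectInfo(size=500, modtime=1000.0, sha1=None)
    # With no stored hash, checksum mode degrades to a size comparison.
    print(looks_equal(local, remote_no_hash, use_checksum=True))  # prints "True"
```

When the remote has no stored hash, as would happen for multipart uploads with checksumming disabled, two files of equal size but different content would be treated as equal in checksum mode.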

I think if it was me, I’d leave the checksums on as I like data integrity checks, but your use case may be different.

If you’d like us to have a go at a --b2-disable-checksum flag, then please make a new issue on GitHub about it.

I will stay with the current config, thanks. My configuration is rather non-standard, and now that I know what’s happening I can flexibly work around the “problem” without losing the benefit of checksums.

By the way, is it expected that “rclone cleanup” does not remove partially uploaded big files from B2? Is there a way to use rclone to remove them? Should I make a separate thread about this?

OK

There isn’t a way to use rclone to do that at the moment. I think you can do it via the B2 website, and don’t they expire after a while?

At minimum this should be in the docs. Fancy sending a PR for the docs?

Getting rclone cleanup to do it is a good idea too - would you like to work on that?

You can do it manually via the website, but you have to do it one file at a time. At least I didn’t find any “remove all partial uploads” button. I don’t know about expiration; they certainly hadn’t expired 24 hours after the aborted upload.

I have sent the docs PR with the addition. If by “work on that” you mean actual programming, I am not fluent in Go, unfortunately. I cannot even promise to test beta versions, because I only use rclone on my main server, where I cannot easily experiment with different versions.

Thank you - I’ll look at that this evening.

Could you please make a new issue on GitHub with the idea of getting rclone cleanup to clear up the half-uploaded files? Then hopefully we can find someone to work on it :smile:

Added as issue #2617

Thank you for making the issue :smile: