How do I get the md5sum for large tar files uploaded to Azure?


I want to upload some large tar files from a local disk to Azure Blob Storage. I’ve noticed that the ‘rclone md5sum blobstorage:container’ command only seems to report an md5sum for uploaded files smaller than a few hundred megabytes (the threshold seems to be somewhere between 240 and 500 MB). Is this a limitation in the Azure API? Is there an extra parameter I need to use when uploading the files? Are there any other mechanisms in rclone to check for file corruption whilst uploading large files? I can reproduce the issue with rclone 1.46 on both a 64-bit Intel Linux machine and an ARMv7 Linux machine.


I’m pretty sure files uploaded using the multipart method don’t have an md5sum.

  --azureblob-upload-cutoff SizeSuffix   Cutoff for switching to chunked upload (<= 256MB). (default 256M)

Unfortunately you can’t set the cutoff higher than that, so any file over 256MB won’t have an md5sum.
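
If it helps to know in advance which local files will end up without an md5sum, here is a rough sketch that lists files larger than the cutoff. The function name `files_over_cutoff` is made up for illustration, and it assumes that "256M" means 256 MiB (binary units), so treat the exact boundary as approximate.

```python
import os

# Assumption: the "256M" default is 256 MiB (binary units).
UPLOAD_CUTOFF = 256 * 1024 * 1024


def files_over_cutoff(root, cutoff=UPLOAD_CUTOFF):
    """Return sorted paths (relative to root) of files larger than cutoff bytes."""
    over = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files above the cutoff would go through the chunked upload path.
            if os.path.getsize(path) > cutoff:
                over.append(os.path.relpath(path, root))
    return sorted(over)
```

Calling `files_over_cutoff("/path/to/tars")` on the upload directory gives the list of files to verify some other way.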

Each chunk is MD5-summed as it is uploaded, so rclone guarantees the integrity of the individual chunks. At the end, rclone supplies Azure with a list of the blocks, so we can be sure that Azure received the same number of blocks as rclone sent.
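
To illustrate the per-chunk idea, here is a minimal sketch (not rclone's actual code) that reads a stream in fixed-size chunks and computes an MD5 for each one. The helper name `chunk_md5s` is hypothetical, and the base64 encoding is chosen because that is the form an HTTP `Content-MD5` header takes.

```python
import base64
import hashlib
import io


def chunk_md5s(stream, chunk_size):
    """Yield (index, length, base64 MD5) for each chunk read from stream."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # One digest per chunk: this protects each chunk in transit,
        # but says nothing about the file as a whole.
        digest = base64.b64encode(hashlib.md5(chunk).digest()).decode("ascii")
        yield index, len(chunk), digest
        index += 1


# Example: a 10-byte stream split into 4-byte chunks gives three blocks.
blocks = list(chunk_md5s(io.BytesIO(b"a" * 10), 4))
```

The resulting list plays the role of the block list: one entry per chunk, in order. Note that none of these digests is the md5sum of the whole file, which is why the blob ends up without one.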

S3 has the same problem. What we did there was add an extra bit of metadata holding the checksum for large files. This is metadata only, though: it would be calculated while reading the file and stored in the blob metadata.
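
As a rough sketch of that approach, the whole-file MD5 can be computed incrementally while the file is read, then attached as a metadata entry. The function names and the `"md5"` metadata key are hypothetical, picked only for illustration; this is not what rclone actually stores.

```python
import hashlib


def file_md5_hex(path, buf_size=1024 * 1024):
    """Compute the MD5 of a file incrementally, without loading it into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size blocks until f.read() returns b"" at end of file.
        for block in iter(lambda: f.read(buf_size), b""):
            h.update(block)
    return h.hexdigest()


def checksum_metadata(path):
    """Build a metadata dict carrying the whole-file checksum."""
    # "md5" is a hypothetical key name, not rclone's real one.
    return {"md5": file_md5_hex(path)}
```

The hex digest from `file_md5_hex` should match what `md5sum` prints for the same local file, so it can also serve as the reference value when checking a large file after upload.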