Syncing directories to Azure Blob Archive Tier


#1

Hey RClone forum… Perhaps you can help me answer a question I can’t seem to find documentation on.

I have a directory on my home NAS (FreeNAS server) of raw camera footage that I currently rclone (sync) to an Azure Blob (Cool Tier). The footage in the NAS directory is:

  • Constantly being added
  • Never modified
  • Rarely deleted (deletions are singular files like unneeded b-roll)
  • RCloned (sync) to Azure Blob as a backup of my NAS raw footage directory.
  • Currently sitting at 5TB and growing

Given that I’m using Azure Blob as a backup, I’d love to move my files from Blob Cool Tier to Blob Archive Tier as I personally have no need to read the data (outside of catastrophic failure of the NAS).

What is unclear to me is how rclone and Azure behave when syncing a NAS directory to an Azure Blob container whose existing files are in the Archive Tier. Azure’s documentation makes it very clear that Archive Tier data is unreadable but Archive Tier metadata is readable. However, I can’t find documentation on exactly what metadata is available, or whether that is the metadata rclone needs to do a sync. This leaves me unsure how rclone and Azure will behave when syncing my NAS directories to Azure (my greatest fear being that I wake up to a massive bill from Azure for hot reads of Archive Tier data).

Can anybody provide guidance here?


#2

I believe that as far as rclone is concerned Archive blobs will appear like any other blobs. So the metadata rclone needs (checksum, last modified time and size) will be present.

I suggest you perform a small experiment in a new container if you are worried about the costs.
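
A minimal sketch of such an experiment, assuming a configured remote called azure and a throwaway container called archive-test (both names are made up for illustration). It relies on the --azureblob-access-tier flag; the exact spelling of the tier value may differ between rclone versions, so check the azureblob docs for yours.

```shell
# Hypothetical helper: upload a small test directory straight into the
# Archive tier, then see what rclone can still list and compare.
archive_tier_experiment() {
    src="$1"    # small local directory, e.g. a handful of test clips
    dest="$2"   # throwaway destination, e.g. azure:archive-test (assumed name)

    # 1. Upload the test files with the Archive access tier.
    rclone copy "$src" "$dest" --azureblob-access-tier Archive -v || return 1

    # 2. List size, modification time and path of the archived blobs.
    rclone lsl "$dest" || return 1

    # 3. Preview what a sync would do, without transferring anything.
    rclone sync "$src" "$dest" --dry-run -v
}
```

Run it as archive_tier_experiment /mnt/test-clips azure:archive-test, then watch the Azure usage graphs for that container over the following day before pointing it at the real 5TB.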

Maybe we should update the docs a bit more?


#3

I think where I am concerned (and I obviously need to do my own small-scale testing) is that I EXPECT rclone to use Azure’s checksum when validating the files already in the Azure Blob (which the rclone documentation supports?).

For some reason, I see a large spike in ‘Hot Block Writes’ every time rclone runs, even if I’m uploading a minimal amount of data. That makes me think rclone is doing file reads in Azure instead of using the existing file hashes to validate the directory?

Azure usage: http://prntscr.com/m2p8go


#4

Yes, rclone uses Azure’s checksum for validation. You can see the checksums with rclone md5sum bucket:path
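
As a rough sketch (the remote and path you pass in are your own; azure:footage below is just an assumed example), the following prints the metadata a sync compares against: MD5 checksums first, then a JSON listing with size, modification time and hashes.

```shell
# Hypothetical helper: show the stored metadata for a remote path.
show_remote_metadata() {
    remote_path="$1"   # e.g. azure:footage (assumed name)

    # MD5 checksum of every object under the path, as stored by Azure.
    rclone md5sum "$remote_path" || return 1

    # Recursive JSON listing including Size, ModTime and Hashes per object.
    rclone lsjson -R --hash "$remote_path"
}
```

For example: show_remote_metadata azure:footage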

I don’t think reading the checksums is it. It might be something else though!

Can you post a log made with -vv? That might shed some light.
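
Something along these lines should capture it, as a sketch; the source directory, destination and log file path are placeholders for whatever you normally use.

```shell
# Hypothetical helper: run the usual sync with debug logging to a file.
sync_with_debug_log() {
    src="$1"       # local directory, e.g. /mnt/footage (assumed)
    dest="$2"      # remote destination, e.g. azure:footage (assumed)
    log_file="$3"  # where to write the debug log

    # -vv turns on debug output; --log-file sends it to a file
    # instead of the terminal so it can be posted afterwards.
    rclone sync "$src" "$dest" -vv --log-file "$log_file"
}
```

For example: sync_with_debug_log /mnt/footage azure:footage /tmp/rclone-sync.log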