We are trying to sync the output of an application up to Azure Blob Storage. The application writes out its data file (several hundred MB), and once it has finished writing that file it writes out a marker file (a few bytes), distinguished by its extension, to indicate that the main payload was written successfully and is now available for processing.
Our consuming application looks for the marker files, and if one exists it starts processing the corresponding main file. My concern is that the sync process may transfer the marker file before it has finished syncing the main file. Is this a risk? Is there any way to force the ordering so that the marker file is definitely not transferred until the main file upload has completed? The marker file will have a later mod time than the main file, but the two will be very close.
We are currently trying this command:
rclone sync diskpath destremote:path --order-by 'modtime'
I initially tried forcing this order with scripting and copying files individually using "rclone copyto" but this was prohibitively slow and negated most of the benefits of rclone.
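For illustration, the ordering I am after could perhaps be expressed as two bulk passes instead of per-file "rclone copyto" calls: snapshot the marker files that exist now, upload everything except markers, then upload only the snapshotted markers. This is an untested sketch, not a confirmed solution. The `.done` marker extension, the function name `two_pass_upload`, and the paths are placeholders, and it assumes the standard `rclone lsf`, `--include`, `--exclude` and `--files-from` options behave as documented.

```shell
#!/bin/sh
# Sketch only: the "done" extension and all paths are hypothetical placeholders.
# Usage: two_pass_upload SRC DST [MARKER_EXT]
two_pass_upload() {
    src=$1
    dst=$2
    ext=${3:-done}
    list=$(mktemp) || return 1

    # 1. Snapshot the markers that exist right now. Their payloads were
    #    finished before this point, so pass 2 will see them complete.
    if ! rclone lsf --files-only -R --include "*.${ext}" "$src" > "$list"; then
        rm -f "$list"
        return 1
    fi

    # 2. Upload everything except markers. If any transfer fails (for
    #    example a file still locked by the writer), stop without
    #    publishing any markers; this is deliberately conservative.
    if ! rclone copy "$src" "$dst" --exclude "*.${ext}"; then
        rm -f "$list"
        return 1
    fi

    # 3. Upload only the snapshotted markers. Markers created after step 1
    #    are left for the next run. Skip the call if the list is empty.
    status=0
    if [ -s "$list" ]; then
        rclone copy "$src" "$dst" --files-from "$list"
        status=$?
    fi
    rm -f "$list"
    return "$status"
}
```

It would be called as, for example, `two_pass_upload diskpath destremote:path done`. It uses copy rather than sync, so nothing is deleted at the destination, and it does not by itself answer whether each individual upload is atomic.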
If there is no way to eliminate this risk, can I be confident that each upload will be atomic? That is, if the marker file arrives first but the main file is not readable until it has fully transferred, that is also survivable. The real problem would be if the main file is partially uploaded and visible, the marker file then uploads completely, and we process an incomplete file. The source application appears to hold a lock on the file while writing it (one reason we're moving to rclone is that azcopy would replicate the file as a 0-byte file), so the only remaining risk is that the upload itself might be partial.
Microsoft Azure Blob Storage