Running rclone on Windows to mount an Azure Data Lake Storage Gen2 enabled BLOB. BLOB is in Hot Tier. When the drive is mounted on Windows, I can delete files but cannot delete empty folders. The folder is removed from the cache but is still on Blob and comes back on the next Blob mount. Under the blob is folder "afolder"; I am trying to delete "bfolder" under it.
NOTE: I had the same problem with blobfuse on Linux, but they added a flag, --use-adls, which fixed it.
What is your rclone version (output from rclone version)
Found this on version 1.43 but also see it on v1.55.1.
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Windows 10 Azure VM 64 bit.
Which cloud storage system are you using? (eg Google Drive)
Azure Storage Account BLOB - Data Lake Gen2
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone -vv --log-file=Purgedump.txt purge user7:user7/afolder/bfolder
-Also-
rclone -vv --log-file=MNTdump.txt mount user7:user7 c:\utils\test, then used Explorer to delete the folder.
Also tried purge command with --azureblob-archive-tier-delete for spits and wiggles
The rclone config contents with secrets removed.
[user7]
type = azureblob
sas_url = https://BLOBNAME.blob.core.windows.net/user7/?sv=DATE&sr=c&sig=SASSIG&st=TIMESTAMP&se=TIMESTAMP&sp=rwdl
A log from the command with the -vv flag
2021/04/30 14:09:50 DEBUG : Using config file from "C:\\Users\\USERNAME\\.config\\rclone\\rclone.conf"
2021/04/30 14:09:51 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "--log-file=C:\\utils\\tshoot\\Purgedump.txt" "purge" "user7:user7/afolder/bfolder"]
2021/04/30 14:09:51 DEBUG : Creating backend with remote "user7:user7/afolder/bfolder"
2021/04/30 14:09:51 DEBUG : Waiting for deletions to finish
2021/04/30 14:09:51 DEBUG : Azure container user7 path afolder/bfolder: Removing directory
2021/04/30 14:09:51 DEBUG : 4 go routines active
2021/04/30 14:13:18 DEBUG : Using config file from "C:\\Users\\USERNAME\\.config\\rclone\\rclone.conf"
2021/04/30 14:13:18 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "--log-file=C:\\utils\\tshoot\\Purgedump.txt" "--azureblob-archive-tier-delete" "purge" "user7:user7/afolder/bfolder"]
2021/04/30 14:13:18 DEBUG : Creating backend with remote "user7:user7/afolder/bfolder"
2021/04/30 14:13:18 DEBUG : user7: detected overridden config - adding "{+GSw4}" suffix to name
2021/04/30 14:13:18 DEBUG : fs cache: renaming cache item "user7:user7/afolder/bfolder" to be canonical "user7{+GSw4}:user7/afolder/bfolder"
2021/04/30 14:13:18 DEBUG : Waiting for deletions to finish
2021/04/30 14:13:18 DEBUG : Azure container user7 path afolder/bfolder: Removing directory
2021/04/30 14:13:18 DEBUG : 4 go routines active
Hiya Nick,
They were created by a user in Windows File Explorer, on the Windows box, through the rclone mount.
Right click “New Folder” (afolder), then double-click into afolder and do another New Folder for bfolder. Then went into that and created a text file. Then tried to delete bfolder using File Explorer.
The file went away for good, but bfolder doesn’t get removed from BLOB. It disappears from the rclone cached version on the Windows mount, but it comes back on an rclone remount.
ADLS has real directories, and this is one such (as presumably indicated by the X-Ms-Meta-Hdi_isfolder header, which doesn't appear to be documented). When a directory is being deleted the recursive parameter needs to be specified, and it's not exposed in the azure-storage-blob-go package because it's Data Lake-specific. The most straightforward solution (beyond requesting that feature be added, which I'll do if it's not already done) may be to manually call the DELETE method using the Do method on the pipeline.
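To make the manual DELETE idea concrete, here is a minimal sketch that only builds the request URL and sends nothing. It assumes the Data Lake Gen2 "Path - Delete" shape (DELETE on the dfs endpoint with a recursive query parameter) and assumes the container SAS from the config is also accepted on that endpoint, which I have not verified. The function name adls_delete_url is made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def adls_delete_url(container_sas_url: str, dir_path: str, recursive: bool) -> str:
    """Build the URL for an HTTP DELETE of a directory on the dfs endpoint.

    Hypothetical helper: it only constructs the URL, it does not send anything.
    """
    parts = urlsplit(container_sas_url)
    # Assumption: the account is reachable at the matching dfs endpoint.
    host = parts.netloc.replace(".blob.", ".dfs.", 1)
    # Container path plus the directory path, percent-encoded per segment.
    path = parts.path.rstrip("/") + "/" + quote(dir_path.strip("/"), safe="/")
    # Keep the SAS query string and append the recursive flag.
    flag = "recursive=" + ("true" if recursive else "false")
    query = parts.query + "&" + flag if parts.query else flag
    return urlunsplit((parts.scheme, host, path, query, ""))


if __name__ == "__main__":
    print(adls_delete_url(
        "https://BLOBNAME.blob.core.windows.net/user7/?sv=DATE&sig=SASSIG",
        "afolder/bfolder",
        recursive=False,
    ))
```

The resulting URL would then be issued with the DELETE method through whatever HTTP pipeline the backend already uses.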
(I'm an MS employee and do storage-adjacent things but I have no idea how the internals of ADLS work beyond looking at basic diagnostic logs.)
Rclone already knows about the X-Ms-Meta-Hdi_isfolder header as you can see here
However what rclone does not do is delete these objects when the directory is removed.
Rclone won't be creating these directories either - I guess they are being created automatically on Azure Data Lake - is that correct?
A fix would be for rclone to delete these directory marker objects (which I guess might be real directories under the hood in ADLS). This would be straightforward to add but would add overhead to the non-ADLS case, so it might need to be controlled by a flag, or maybe not, since rmdir isn't a very common operation.
I'll just note that S3 has this problem too, with applications like Cyberduck creating directory markers which rclone doesn't delete.
I'm not sure I understand this - in azureblob terms the directory is a blob, so calling DELETE on this blob with the special recursive parameter will delete it and everything under it?
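A rough sketch of what "delete the directory markers" could look like, in Python rather than rclone's Go and not taken from rclone's code. It assumes a listing is available as a mapping of blob name to metadata, and that a marker is any blob whose hdi_isfolder metadata is "true". Both function names are hypothetical.

```python
def is_directory_marker(metadata: dict) -> bool:
    """True if the blob metadata carries hdi_isfolder=true (key case-insensitive)."""
    for key, value in metadata.items():
        if key.lower() == "hdi_isfolder":
            return str(value).lower() == "true"
    return False


def markers_under(blobs: dict, directory: str) -> list:
    """Return marker blobs at or below `directory`, deepest first.

    `blobs` maps blob name -> metadata dict (an assumed listing shape).
    """
    root = directory.strip("/")
    prefix = root + "/"
    found = [
        name
        for name, meta in blobs.items()
        if (name == root or name.startswith(prefix)) and is_directory_marker(meta)
    ]
    # Deepest first, so children would be removed before their parents.
    return sorted(found, key=lambda name: name.count("/"), reverse=True)


if __name__ == "__main__":
    listing = {
        "afolder": {"hdi_isfolder": "true"},
        "afolder/bfolder": {"Hdi_isfolder": "true"},
        "afolder/bfolder/note.txt": {},
    }
    print(markers_under(listing, "afolder"))
```

The extra cost mentioned above would come from needing this metadata for every rmdir, which is why a flag might be worth considering.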
They should only be created if the resource parameter is present and set to directory; if it's absent the PUT request should be handled as a Put Blob command, so since Rclone doesn't pass that it won't create directories.
Everything in Azure Storage is streams under the hood. If you haven't read it yet there's a paper from 2011 with the details.
recursive is a boolean parameter; true deletes everything under it, false only deletes if it's empty, and if it's not there it should (which is a load-bearing word) throw an error when the object is actually a directory regardless of contents.
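The three cases can be written down as a toy in-memory model. This mirrors the description above only; it is not verified against the service, the exception names are invented, and it assumes the directory exists.

```python
class DirectoryNotEmpty(Exception):
    """recursive=false was given but the directory has children."""


class RecursiveRequired(Exception):
    """The recursive parameter was omitted for a directory."""


def delete_directory(paths: set, directory: str, recursive=None) -> set:
    """Return the paths left after deleting `directory`, per the described rules."""
    root = directory.strip("/")
    children = {p for p in paths if p.startswith(root + "/")}
    if recursive is None:
        # Parameter absent: expected to fail regardless of contents.
        raise RecursiveRequired(root)
    if not recursive and children:
        # recursive=false only succeeds on an empty directory.
        raise DirectoryNotEmpty(root)
    # recursive=true (or an empty directory): remove it and everything below.
    return paths - children - {root}


if __name__ == "__main__":
    store = {"afolder", "afolder/bfolder", "afolder/bfolder/note.txt"}
    print(sorted(delete_directory(store, "afolder/bfolder", recursive=True)))
```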
I'm going to have to do some research to confirm all these "should"s.
I guess the question is what happens if rclone PUTs dir/subdir/file.txt where dir and dir/subdir don't exist as directories. Do they get automatically created?
I'm guessing they do (that explains the OP's problem) but it would be nice to have confirmation.
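For anyone wanting to confirm this, a small helper listing the parent paths that would have to show up as directories after such a PUT. It is only path arithmetic and says nothing about what Azure actually does; the name implied_parents is hypothetical.

```python
def implied_parents(blob_path: str) -> list:
    """List every parent directory path implied by a blob path, shallowest first."""
    segments = blob_path.strip("/").split("/")
    return ["/".join(segments[:i]) for i in range(1, len(segments))]


if __name__ == "__main__":
    # prints ['dir', 'dir/subdir']
    print(implied_parents("dir/subdir/file.txt"))
```

Checking each returned path for the hdi_isfolder metadata after the upload would answer the question one way or the other.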
Not sure if this means rclone isn't passing any request off to Azure backend or what. If I do a delete of a whole non-empty folder, only the files show up as "DeleteBlob" operations. Folder skeleton is taken out of cache copy but not touched in Blob Land.