Does rclone support Azure Data Lake Gen2?

What is the problem you are having with rclone?

I'm running rclone on Windows to mount an Azure Blob container with Data Lake Storage Gen2 enabled. The Blob is in the Hot tier. With the drive mounted on Windows, I can delete files but cannot delete empty folders. The folder disappears from the cache but is still on the Blob, and it comes back on the next mount. Under the container is a folder "afolder", and I'm trying to delete "bfolder" under it.

NOTE: I had the same problem with blobfuse on Linux, but they added a flag, --use-adls, which fixed it.

What is your rclone version (output from rclone version)

Found this on v1.43 and still see it on v1.55.1.

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows 10 Azure VM 64 bit.

Which cloud storage system are you using? (eg Google Drive)

Azure Storage Account BLOB - Data Lake Gen2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone -vv --log-file=Purgedump.txt purge user7:user7/afolder/bfolder
-Also-
rclone -vv --log-file=MNTdump.txt mount user7:user7 c:\utils\test   (then used Explorer to delete the folder)
Also tried the purge command with --azureblob-archive-tier-delete for spits and wiggles.

The rclone config contents with secrets removed.

[user7]
type = azureblob
sas_url = https://BLOBNAME.blob.core.windows.net/user7/?sv=DATE&sr=c&sig=SASSIG&st=TIMESTAMP&se=TIMESTAMP&sp=rwdl

A log from the command with the -vv flag

2021/04/30 14:09:50 DEBUG : Using config file from "C:\\Users\\USERNAME\\.config\\rclone\\rclone.conf"
2021/04/30 14:09:51 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "--log-file=C:\\utils\\tshoot\\Purgedump.txt" "purge" "user7:user7/afolder/bfolder"]
2021/04/30 14:09:51 DEBUG : Creating backend with remote "user7:user7/afolder/bfolder"
2021/04/30 14:09:51 DEBUG : Waiting for deletions to finish
2021/04/30 14:09:51 DEBUG : Azure container user7 path afolder/bfolder: Removing directory
2021/04/30 14:09:51 DEBUG : 4 go routines active
2021/04/30 14:13:18 DEBUG : Using config file from "C:\\Users\\USERNAME\\.config\\rclone\\rclone.conf"
2021/04/30 14:13:18 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "--log-file=C:\\utils\\tshoot\\Purgedump.txt" "--azureblob-archive-tier-delete" "purge" "user7:user7/afolder/bfolder"]
2021/04/30 14:13:18 DEBUG : Creating backend with remote "user7:user7/afolder/bfolder"
2021/04/30 14:13:18 DEBUG : user7: detected overridden config - adding "{+GSw4}" suffix to name
2021/04/30 14:13:18 DEBUG : fs cache: renaming cache item "user7:user7/afolder/bfolder" to be canonical "user7{+GSw4}:user7/afolder/bfolder"
2021/04/30 14:13:18 DEBUG : Waiting for deletions to finish
2021/04/30 14:13:18 DEBUG : Azure container user7 path afolder/bfolder: Removing directory
2021/04/30 14:13:18 DEBUG : 4 go routines active

Hmm, folders don't really exist on blob storage.

How did those folders get made? Did rclone make them, or some other tool?

Hiya Nick,
They were created by the user in Windows File Explorer, on the Windows box, through the rclone mount.

Right-clicked and chose “New Folder” (afolder), then double-clicked into afolder and did another New Folder for bfolder. Then went into bfolder and created a text file. Then tried to delete bfolder using File Explorer.

The file went away for good, but bfolder doesn’t get removed from the Blob. It disappears from rclone’s cached view on the Windows mount, but it comes back after an rclone remount.

Carl

Can you do

rclone -vv lsl user7:user7/afolder/bfolder -vv --dump bodies --log-file lsllog.log

And post the log file? Hopefully that will let me see what has happened!

OK, here is the log file from rclone lsl:

2021/05/03 14:37:51 DEBUG : Using config file from "C:\\Users\\carls-d-72084\\.config\\rclone\\rclone.conf"
2021/05/03 14:37:51 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "lsl" "user7:user7/afolder/bfolder" "-vv" "--dump" "bodies" "--log-file" "c:\\utils\\lsllog.log"]
2021/05/03 14:37:51 DEBUG : Creating backend with remote "user7:user7/afolder/bfolder"
2021/05/03 14:37:51 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/05/03 14:37:51 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:37:51 DEBUG : HTTP REQUEST (req 0xc000658400)
2021/05/03 14:37:51 DEBUG : HEAD /user7/afolder/bfolder?se=BLAHBLAH&sp=rwdlBLAH&timeout=31536001 HTTP/1.1
Host: uncbppersdev3.blob.core.windows.net
User-Agent: rclone/v1.55.1
X-Ms-Client-Request-Id: 00b8fc1c-1400-4d59-5a3e-b5d211d7212e
X-Ms-Version: 2019-12-12

2021/05/03 14:37:51 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:37:51 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:37:51 DEBUG : HTTP RESPONSE (req 0xc000658400)
2021/05/03 14:37:51 DEBUG : HTTP/1.1 200 OK
Accept-Ranges: bytes
Date: Mon, 03 May 2021 14:37:51 GMT
Etag: "0x8D9097B389BDE16"
Last-Modified: Tue, 27 Apr 2021 12:51:48 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Access-Tier: Hot
X-Ms-Access-Tier-Inferred: true
X-Ms-Blob-Type: BlockBlob
X-Ms-Client-Request-Id: 00b8fc1c-1400-4d59-5a3e-b5d211d7212e
X-Ms-Creation-Time: Tue, 27 Apr 2021 12:51:48 GMT
X-Ms-Lease-State: available
X-Ms-Lease-Status: unlocked
X-Ms-Meta-Hdi_isfolder: true
X-Ms-Request-Id: 45a87a79-b01e-0023-2d29-4045ac000000
X-Ms-Server-Encrypted: true
X-Ms-Version: 2019-12-12
Content-Length: 0

2021/05/03 14:37:51 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:37:51 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:37:51 DEBUG : HTTP REQUEST (req 0xc000658f00)
2021/05/03 14:37:51 DEBUG : GET /user7/?comp=list&delimiter=&include=metadata&maxresults=5000&prefix=afolder%2Fbfolder%2F&restype=container&se=BLAH&sp=rwdl&sr=c&st=2020-11-30T15%3A28%3A33Z&sv=2019-07-07&timeout=31536001 HTTP/1.1
Host: uncbppersdev3.blob.core.windows.net
User-Agent: rclone/v1.55.1
X-Ms-Client-Request-Id: c0f6a314-c4f5-4ab3-4e4c-43d11e58bb57
X-Ms-Version: 2019-12-12
Accept-Encoding: gzip

2021/05/03 14:37:51 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:37:51 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:37:51 DEBUG : HTTP RESPONSE (req 0xc000658f00)
2021/05/03 14:37:51 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Mon, 03 May 2021 14:37:51 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Client-Request-Id: c0f6a314-c4f5-4ab3-4e4c-43d11e58bb57
X-Ms-Request-Id: 45a87a89-b01e-0023-3b29-4045ac000000
X-Ms-Version: 2019-12-12

fc
<?xml version="1.0" encoding="utf-8"?><EnumerationResults ServiceEndpoint="https://uncbppersdev3.blob.core.windows.net/" ContainerName="user7"><Prefix>afolder/bfolder/</Prefix><MaxResults>5000</MaxResults><Blobs /><NextMarker /></EnumerationResults>
0

2021/05/03 14:37:51 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:37:51 DEBUG : 4 go routines active

But this command returns nothing. However, if I run
rclone -vv lsd user7:user7/afolder -vv --dump bodies --log-file lsdlog.log
the log file is:

2021/05/03 14:43:08 DEBUG : Using config file from "C:\\Users\\carls-d-72084\\.config\\rclone\\rclone.conf"
2021/05/03 14:43:08 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "-vv" "lsd" "user7:user7/afolder" "-vv" "--dump" "bodies" "--log-file" "c:\\utils\\lsdlog.log"]
2021/05/03 14:43:08 DEBUG : Creating backend with remote "user7:user7/afolder"
2021/05/03 14:43:08 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/05/03 14:43:08 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:43:08 DEBUG : HTTP REQUEST (req 0xc000636d00)
2021/05/03 14:43:08 DEBUG : HEAD /user7/afolder?se=2021-11-30T15%3A28%3A33Z&sig=BLAH&sp=rwdl&sr=c&st=2020-11-30T15%3A28%3A33Z&sv=2019-07-07&timeout=31536001 HTTP/1.1
Host: uncbppersdev3.blob.core.windows.net
User-Agent: rclone/v1.55.1
X-Ms-Client-Request-Id: 5bb53290-970b-4569-4ee5-680560372efe
X-Ms-Version: 2019-12-12

2021/05/03 14:43:08 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:43:09 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:43:09 DEBUG : HTTP RESPONSE (req 0xc000636d00)
2021/05/03 14:43:09 DEBUG : HTTP/1.1 200 OK
Accept-Ranges: bytes
Date: Mon, 03 May 2021 14:43:08 GMT
Etag: "0x8D9097B389B1D58"
Last-Modified: Tue, 27 Apr 2021 12:51:48 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Access-Tier: Hot
X-Ms-Access-Tier-Inferred: true
X-Ms-Blob-Type: BlockBlob
X-Ms-Client-Request-Id: 5bb53290-970b-4569-4ee5-680560372efe
X-Ms-Creation-Time: Tue, 27 Apr 2021 12:51:48 GMT
X-Ms-Lease-State: available
X-Ms-Lease-Status: unlocked
X-Ms-Meta-Hdi_isfolder: true
X-Ms-Request-Id: e7496e67-501e-002b-5c2a-405fa3000000
X-Ms-Server-Encrypted: true
X-Ms-Version: 2019-12-12
Content-Length: 0

2021/05/03 14:43:09 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:43:09 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:43:09 DEBUG : HTTP REQUEST (req 0xc000637800)
2021/05/03 14:43:09 DEBUG : GET /user7/?comp=list&delimiter=%2F&include=metadata&maxresults=5000&prefix=afolder%2F&restype=container&se=2021-11-30T15%3A28%3A33Z&sig=BLAH&sp=rwdl&sr=c&st=2020-11-30T15%3A28%3A33Z&sv=2019-07-07&timeout=31536001 HTTP/1.1
Host: uncbppersdev3.blob.core.windows.net
User-Agent: rclone/v1.55.1
X-Ms-Client-Request-Id: ea79c1e2-c3a9-499b-7374-f3c2096a1b58
X-Ms-Version: 2019-12-12
Accept-Encoding: gzip

2021/05/03 14:43:09 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/05/03 14:43:09 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:43:09 DEBUG : HTTP RESPONSE (req 0xc000637800)
2021/05/03 14:43:09 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Mon, 03 May 2021 14:43:08 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Client-Request-Id: ea79c1e2-c3a9-499b-7374-f3c2096a1b58
X-Ms-Request-Id: e7496e75-501e-002b-662a-405fa3000000
X-Ms-Version: 2019-12-12

148
<?xml version="1.0" encoding="utf-8"?><EnumerationResults ServiceEndpoint="https://uncbppersdev3.blob.core.windows.net/" ContainerName="user7"><Prefix>afolder/</Prefix><MaxResults>5000</MaxResults><Delimiter>/</Delimiter><Blobs><BlobPrefix><Name>afolder/bfolder/</Name></BlobPrefix></Blobs><NextMarker /></EnumerationResults>
0

2021/05/03 14:43:09 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/05/03 14:43:09 DEBUG : 4 go routines active

Carl
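The two logs differ in the listing parameters. lsl asks for everything under afolder/bfolder/ with an empty delimiter and gets back an empty <Blobs />, while lsd lists afolder/ with delimiter=/ and gets afolder/bfolder/ back as a BlobPrefix. Separately, the HEAD on afolder/bfolder returns a zero-length BlockBlob carrying X-Ms-Meta-Hdi_isfolder: true. The sketch below decodes the two response bodies (trimmed from the logs) with Go's encoding/xml; the struct is a hypothetical model of only the fields needed here, not the SDK's own type.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// nameEntry is one <Blob> or <BlobPrefix> element; only <Name> is kept.
type nameEntry struct {
	Name string `xml:"Name"`
}

// enumerationResults models just the parts of a List Blobs response used here.
type enumerationResults struct {
	Prefix string `xml:"Prefix"`
	Blobs  struct {
		Blob       []nameEntry `xml:"Blob"`
		BlobPrefix []nameEntry `xml:"BlobPrefix"`
	} `xml:"Blobs"`
}

// parseListing returns the blob names and the "directory" prefixes in body.
func parseListing(body string) (blobs, prefixes []string, err error) {
	var r enumerationResults
	if err = xml.Unmarshal([]byte(body), &r); err != nil {
		return nil, nil, err
	}
	for _, b := range r.Blobs.Blob {
		blobs = append(blobs, b.Name)
	}
	for _, p := range r.Blobs.BlobPrefix {
		prefixes = append(prefixes, p.Name)
	}
	return blobs, prefixes, nil
}

func main() {
	// Response bodies trimmed from the lsl and lsd logs above.
	lsl := `<?xml version="1.0" encoding="utf-8"?><EnumerationResults ContainerName="user7"><Prefix>afolder/bfolder/</Prefix><Blobs /><NextMarker /></EnumerationResults>`
	lsd := `<?xml version="1.0" encoding="utf-8"?><EnumerationResults ContainerName="user7"><Prefix>afolder/</Prefix><Delimiter>/</Delimiter><Blobs><BlobPrefix><Name>afolder/bfolder/</Name></BlobPrefix></Blobs><NextMarker /></EnumerationResults>`

	for _, body := range []string{lsl, lsd} {
		blobs, prefixes, err := parseListing(body)
		if err != nil {
			panic(err)
		}
		fmt.Printf("blobs=%q prefixes=%q\n", blobs, prefixes)
	}
}
```

So bfolder shows up as a "directory" in the delimited listing even though nothing is stored beneath it, which is why it reappears on remount.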

ADLS has real directories, and this is one of them (as presumably indicated by the X-Ms-Meta-Hdi_isfolder header, which doesn't appear to be documented). When a directory is deleted, the recursive parameter needs to be specified, and that parameter isn't exposed in the azure-storage-blob-go package because it's Data Lake-specific. The most straightforward solution (beyond requesting that the feature be added, which I'll do if it hasn't been already) may be to call the DELETE method manually using the Do method on the pipeline.

(I'm an MS employee and do storage-adjacent things but I have no idea how the internals of ADLS work beyond looking at basic diagnostic logs.)
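As a rough illustration of calling DELETE manually: ADLS Gen2 exposes a "Path - Delete" operation on the account's dfs endpoint that takes the recursive query parameter. The sketch below only builds such a request, using plain net/http rather than the azure-storage-blob-go pipeline's Do method. The account name, path and SAS values are placeholders, and whether a container SAS issued for the blob endpoint is accepted on the dfs endpoint is an assumption to check against the Azure documentation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// newDirDeleteRequest builds, but does not send, a Data Lake "Path - Delete"
// request for dirPath in the given filesystem (container). account and
// sasQuery are placeholders; sasQuery is the query string of a SAS token.
func newDirDeleteRequest(account, filesystem, dirPath, sasQuery string, recursive bool) (*http.Request, error) {
	q, err := url.ParseQuery(sasQuery)
	if err != nil {
		return nil, err
	}
	// Deleting a directory without recursive is expected to fail.
	q.Set("recursive", strconv.FormatBool(recursive))
	u := url.URL{
		Scheme:   "https",
		Host:     account + ".dfs.core.windows.net", // dfs, not blob, endpoint
		Path:     "/" + filesystem + "/" + dirPath,
		RawQuery: q.Encode(),
	}
	req, err := http.NewRequest(http.MethodDelete, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Same service version rclone sent in the logs above.
	req.Header.Set("x-ms-version", "2019-12-12")
	return req, nil
}

func main() {
	req, err := newDirDeleteRequest("BLOBNAME", "user7", "afolder/bfolder", "sv=DATE&sig=SASSIG", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending it would still need an http.Client and proper error handling of the response; the point here is only where recursive goes.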

Update: there is a new version of the Go API in the works, but we don't have a firm date yet. See "API Stability and Versioning", Issue #47 in Azure/azure-storage-blob-go on GitHub.

Ah, that makes sense.

Rclone already knows about the X-Ms-Meta-Hdi_isfolder header as you can see here

However, what rclone doesn't do is delete these objects when the directory is removed.

Rclone won't be creating these directories either - I guess they are being created automatically on Azure Data Lake - is that correct?

A fix would be for rclone to delete these directory marker objects (which I guess might be real directories under the hood in ADLS). This would be straightforward to add, but it would add overhead in the non-ADLS case, so it might need to be controlled by a flag. Or maybe not, since rmdir isn't a very common operation.

I'll just note that S3 has this problem too, with applications like Cyberduck creating directory markers which rclone doesn't delete.

I'm not sure I understand this. In azureblob terms the directory is a blob, so calling DELETE on this blob with the special recursive parameter will delete it and everything under it?

They should only be created if the resource parameter is present and set to directory; if it's absent, the PUT request should be handled as a Put Blob command. Since rclone doesn't pass that parameter, it won't create directories.

Everything in Azure Storage is streams under the hood. If you haven't read it yet, there's a paper from 2011 with the details.

recursive is a boolean parameter; true deletes everything under it, false only deletes if it's empty, and if it's not there it should (which is a load-bearing word) throw an error when the object is actually a directory regardless of contents.

I'm going to have to do some research to confirm all these "should"s.
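The three cases for recursive described above can be written down as a small truth table. This only models the expected behaviour as stated in that reply, "should"s included; it makes no calls to Azure, the error values are hypothetical names rather than anything an SDK returns, and treating ordinary files as unaffected by recursive is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values standing in for whatever the service returns.
var (
	errDirNotEmpty = errors.New("directory not empty")
	errIsDirectory = errors.New("path is a directory: recursive must be set")
)

// expectedDeleteOutcome models the expected DELETE semantics described above.
// recursive == nil means the parameter was not sent at all.
func expectedDeleteOutcome(isDir, isEmpty bool, recursive *bool) error {
	switch {
	case !isDir:
		return nil // assumed: an ordinary file is simply deleted
	case recursive == nil:
		return errIsDirectory // absent: error for any directory, empty or not
	case *recursive:
		return nil // true: the directory and everything under it go
	case isEmpty:
		return nil // false: allowed only when the directory is empty
	default:
		return errDirNotEmpty
	}
}

func main() {
	yes, no := true, false
	fmt.Println(expectedDeleteOutcome(true, false, &yes))
	fmt.Println(expectedDeleteOutcome(true, false, &no))
	fmt.Println(expectedDeleteOutcome(true, true, &no))
	fmt.Println(expectedDeleteOutcome(true, true, nil))
}
```

Using a *bool keeps "absent" distinct from "false", which is exactly the distinction the reply says is load-bearing.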


I guess the question is what happens if rclone PUTs dir/subdir/file.txt where dir and dir/subdir don't exist as directories. Do they get automatically created?

I'm guessing they do (that explains the OPs problem) but it would be nice to have confirmation.