Rclone listing fails on folder where a file with the same name exists

I see you did manage it, I am impressed. What I meant is that when I run mkdir with rclone, I get an error if a file with that name already exists.

rclone has some quirks about creating empty directories on bucket based remotes, so i have learnt to avoid it.
yeah, recently rclone added --s3-directory-markers but still, i avoid mkdir whenever i can.

rclone backend features azure: | grep "CanHaveEmptyDirectories"
"CanHaveEmptyDirectories": false
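
For scripts, the same check can be made on the parsed JSON instead of with grep. This is only a minimal sketch: it assumes the output of `rclone backend features azure:` is a JSON object carrying the flag either under a "Features" key or at the top level (the layout is an assumption, and `can_have_empty_dirs` is a hypothetical helper name).

```python
import json

def can_have_empty_dirs(features_json):
    """Return the CanHaveEmptyDirectories flag, or None if it is absent."""
    data = json.loads(features_json)
    # Assumption: the flag sits under "Features"; fall back to the top level.
    section = data.get("Features", data)
    return section.get("CanHaveEmptyDirectories")

# Hand-written sample, not real rclone output.
sample = '{"Name": "azure", "Features": {"CanHaveEmptyDirectories": false}}'
print(can_have_empty_dirs(sample))
```

In real use the string would come from the captured stdout of the rclone command above.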

I have tried to find a solution that works for both S3 and Azure, but so far I have only found one that works for Azure and not for S3...

$ rclone lsjson -R --include 'README.md'  's3:rnd/test/'   
[
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T16:00:25.840449000+02:00","IsDir":true}
]

With S3, the README.md file that sits beside the folder is missing from the listing.

$ rclone lsjson -R --include 'README.md'  'azure:rnd/test/'
[
{"Path":"README.md","Name":"README.md","Size":18503,"MimeType":"application/octet-stream","ModTime":"2024-06-01T12:48:31.549311060+02:00","IsDir":false,"Tier":"Hot"},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"application/octet-stream","ModTime":"2024-06-01T12:48:31.549311060+02:00","IsDir":false,"Tier":"Hot"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T16:00:33.258820000+02:00","IsDir":true}
]

With Azure all is fine: we see the README.md beside the folder, the folder itself, and the README.md inside it.

This approach works for both S3 and Azure to list the content of the hidden folder (provided you suspect there is a hidden folder):

(env) delahondes@victivallis scitq % rclone lsjson -R --include 'README.md/**' 's3:rnd/test/'
[
{"Path":"README.md","Name":"README.md","Size":0,"MimeType":"inode/directory","ModTime":"2000-01-01T00:00:00.000000000Z","IsDir":true},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/some_other_file.txt","Name":"some_other_file.txt","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T13:38:51.000000000Z","IsDir":false,"Tier":"STANDARD"}
]
(env) delahondes@victivallis scitq % rclone lsjson -R --include 'README.md/**' 'azure:rnd/test/'
[
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2000-01-01T00:00:00.000000000Z","IsDir":true},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"application/octet-stream","ModTime":"2024-06-01T12:48:31.549311060+02:00","IsDir":false,"Tier":"Hot"},
{"Path":"README.md/some_other_file.txt","Name":"some_other_file.txt","Size":5303,"MimeType":"text/plain","ModTime":"2024-10-16T14:10:29.000000000Z","IsDir":false,"Tier":"Hot"}
]

So, to conclude for now: for the masking issue, there is no general workaround unless you systematically replace a listing of target/path/to/list with a recursive listing of target/path/to that uses an --include 'list/**' option.
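
In a script, that rewrite could look roughly like the sketch below. It only builds the argument list for the commands already shown in this thread; `workaround_command` is a hypothetical helper name, and it assumes the target has the usual `remote:path/to/list` form.

```python
def workaround_command(target):
    """Turn 'remote:path/to/list' into a recursive listing of the parent
    folder filtered with --include 'list/**'."""
    remote, _, path = target.partition(":")
    path = path.rstrip("/")  # a trailing slash on the target is irrelevant here
    parent, _, name = path.rpartition("/")
    if not name:
        raise ValueError("target has no final path component")
    parent_url = f"{remote}:{parent}/" if parent else f"{remote}:"
    return ["rclone", "lsjson", "-R", "--include", f"{name}/**", parent_url]

print(workaround_command("s3:rnd/test/README.md"))
```

The returned list can be passed as-is to `subprocess.run`, which avoids any shell quoting problem with the `**` pattern.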

For the inside/aside confusion: when you list target/path/to/list and get a single entry named list, you may suspect a hidden folder and apply the previous trick on purpose.
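
When the parent listing is available, the suspicion can be checked directly. A minimal sketch, assuming the input is the JSON text of a recursive lsjson listing of the parent folder; `shadowed_names` is a hypothetical helper, and the sample is trimmed down from the listings in this thread to the fields actually used.

```python
import json

def shadowed_names(lsjson_output):
    """Return the paths that appear both as a file and as a directory."""
    entries = json.loads(lsjson_output)
    files = {e["Path"] for e in entries if not e["IsDir"]}
    dirs = {e["Path"] for e in entries if e["IsDir"]}
    return files & dirs

sample = '''[
{"Path":"README.md","Name":"README.md","Size":4520,"IsDir":false},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"IsDir":false},
{"Path":"README.md","Name":"README.md","Size":-1,"IsDir":true}
]'''
print(shadowed_names(sample))
```

Any path returned here is a name where a file and a folder coexist, which is exactly the case where a direct listing can be ambiguous.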

What is frustrating is that it all happens because rclone tries to be too user friendly:

  • adding automatically a missing slash is fine (as it is done by ls, so it is kind of intuitive)
  • removing an extraneous slash is rather bad: it is not done by any other tool that I know (and I'm an old monkey) than rclone and it reduces the expressiveness of rclone URIs.

Add to that the fact that, as said in the GitHub issue, there is no way to get an absolute path from rclone (I mean absolute with respect to the configured root; it is always relative to the listed folder).

As stated the problem also exists with Azure, so it is not specific to S3. Also you may consider that S3 is not just Amazon S3, there are a lot of different S3. So I do not think this is a minor issue.

So a good fix would be IMHO [EDITED: THIS IS WRONG, SEE BELOW. THE TRAILING SLASH SEEMS TO BE REMOVED ONLY WITH azure; THE BEHAVIOR WITH s3 IS CORRECT]:

  • not to remove a trailing slash, at least when it yields some content,
  • to make --absolute work with lsjson, so that the Path attribute is given relative to the root and not to the listed folder.

As this is mostly a script issue, it makes sense to try to restrict the fix to lsjson (that plus the fact that --absolute already exists for lsf, an option which I do not understand, how adding a leading slash would be useful? - but then I understand one would not want to break existing things).
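
Until something like that exists, a script can rebuild root-relative paths itself, but only when the listed path is known to be a folder (for example a parent folder listed with the workaround above). The sketch below rests on that assumption; `with_root_paths` and the `RootPath` key are hypothetical names, not rclone features.

```python
import json
import posixpath

def with_root_paths(listed_dir, lsjson_output):
    """Copy each entry and add 'RootPath': its path from the remote root.

    Only valid when listed_dir really is a directory; if it named a file,
    Path is relative to the file's parent and the join would be wrong.
    """
    _, _, base = listed_dir.partition(":")
    base = base.strip("/")
    rows = []
    for entry in json.loads(lsjson_output):
        row = dict(entry)
        row["RootPath"] = posixpath.join(base, entry["Path"])
        rows.append(row)
    return rows

sample = '[{"Path":"README.md/some_other_file.txt","IsDir":false}]'
for row in with_root_paths("s3:rnd/test/", sample):
    print(row["RootPath"])
```

The caveat in the docstring is the whole point of the request: from the client side there is no reliable way to tell which of the two cases a listing fell into.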

It would happen with any remote that allows duplicates, and I don't care to debate minor/major, as it's all relative to each person's needs, so I won't judge that :slight_smile:

You are the first person to report it, so it is not prevalent.

As for the inside/aside confusion issue, I am not alone, see the GitHub issue. I understand it might not be a priority as it is a bit theoretical, but I was wondering: if I tried to implement a patch, would the rclone team be willing to integrate it?

i re-did my test on azure and did not get the correct result, same as you.

backend | shows real content
s3      | yes
azure   | no

do we agree on just that limited conclusion?

Yes absolutely, I tested both tree and ls and both are correct for s3 and broken for azure.

PS I use OVH s3, so it seems to be correct for a variety of s3 (I guess you're not on OVH)
PS2: only the lsjson --include ... seems to work better on azure and is partially broken on s3

Even more so as @ncw is amazing at helping/guiding and assisting on the fix. Probably one of the best folks I've seen at doing that, and I aspire to be more like him :slight_smile:

as i understand it, that is not s3, as OVH uses openstack and s3 is just a compatibility layer on top of that.
can you post the output of rclone config redacted s3: ?

my s3 test results do not match your test results?

rclone lsjson -R --include 'README.md' s3:rnd/test/
[
{"Path":"/","Name":"/","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T12:43:06.799655587-04:00","IsDir":true},
{"Path":"/README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T12:43:06.799655587-04:00","IsDir":true}
{"Path":"/README.md/README.md","Name":"README.md","Size":1,"MimeType":"text/markdown; charset=utf-8","ModTime":"2024-10-16T12:43:06.522263318-04:00","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":1,"MimeType":"text/markdown; charset=utf-8","ModTime":"2024-10-16T09:09:52.020415057-04:00","IsDir":false,"Tier":"STANDARD"},
]

and without --include 'README.md'

rclone lsjson -R s3:rnd/test/
[
2024/10/16 12:51:20 NOTICE: /README.md/README.md: Failed to read metadata: object not found
{"Path":"/","Name":"/","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T12:51:20.549108326-04:00","IsDir":true},
{"Path":"/README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T12:51:20.549108326-04:00","IsDir":true}
{"Path":"/README.md/README.md","Name":"README.md","Size":1,"MimeType":"text/markdown; charset=utf-8","ModTime":"2024-10-16T12:51:20.424977760-04:00","IsDir":false,"Tier":"STANDARD"}2024/10/16 12:51:20 ,
{"Path":"/README.md/some_other_file.txt","Name":"some_other_file.txt","Size":1,"MimeType":"text/plain; charset=utf-8","ModTime":"2024-10-16T12:51:20.498914748-04:00","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":1,"MimeType":"text/markdown; charset=utf-8","ModTime":"2024-10-16T09:09:52.020415057-04:00","IsDir":false,"Tier":"STANDARD"},
]
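
The NOTICE lines interleaved with the JSON above are log output; rclone writes its logs to stderr by default, so a script that captures the two streams separately still gets parseable JSON. A minimal sketch, where `run_lsjson` is a hypothetical helper and a tiny Python one-liner stands in for rclone so the example runs without any remote:

```python
import json
import subprocess
import sys

def run_lsjson(argv):
    """Run a command, parse its stdout as JSON, return (entries, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout), proc.stderr

# Stand-in for rclone: writes a notice to stderr and an empty JSON list to stdout.
fake = [sys.executable, "-c",
        "import sys; sys.stderr.write('NOTICE: object not found\\n'); print('[]')"]
entries, notices = run_lsjson(fake)
print(entries, notices.strip())
```

With rclone installed, the call would be `run_lsjson(["rclone", "lsjson", "-R", "s3:rnd/test/"])`.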

Output of rclone config redacted s3: on my side:

[s3]
type = s3
provider = Other
access_key_id = XXX
secret_access_key = XXX
endpoint = https://s3.gra.io.cloud.ovh.net
region = gra

It seems that in one of my messages I ran the test with one of the README.md files missing: I am sorry. I rechecked everything this time and I am certain I have the right tree.

So yes, it is different with OVH s3, but only marginally: I just miss the '/' entry you have in both cases, and I do not get an error message. In the end, the recursive listing from the parent folder seems OK (the '/' entry is a difference of small importance, it does not introduce any confusion).

The only broken recursive listing is the one on azure:rnd/test/README.md/ with a trailing slash (see below). The listing of the parent folder, with or without --include README.md, is OK on both backends.

with --include 'README.md'

rclone lsjson -R --include 'README.md' s3:rnd/test/
[
{"Path":"README.md","Name":"README.md","Size":18503,"MimeType":"binary/octet-stream","ModTime":"2024-10-16T19:43:09.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T21:45:37.502299000+02:00","IsDir":true}
]

Without:

rclone lsjson -R s3:rnd/test/    
[
{"Path":"README.md","Name":"README.md","Size":18503,"MimeType":"binary/octet-stream","ModTime":"2024-10-16T19:43:09.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/some_other_file.txt","Name":"some_other_file.txt","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T13:38:51.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T21:43:46.539922000+02:00","IsDir":true}
]

PS: unfortunately, I have reached my post limit (too many posts for one day it seems), so it will be hard to answer you from now on...

So for the sake of it, I rechecked carefully the issue with azure:

First, the recursive listing on the parent folder, which shows that none of the files are missing (the README.md file beside the folder is now smaller, so that we can tell which one we see in each case):

rclone lsjson -R azure:rnd/test/
[
{"Path":"README.md","Name":"README.md","Size":4520,"MimeType":"text/plain","ModTime":"2024-10-16T20:10:31.000000000Z","IsDir":false,"Tier":"Hot"},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"application/octet-stream","ModTime":"2024-06-01T12:48:31.549311060+02:00","IsDir":false,"Tier":"Hot"},
{"Path":"README.md/some_other_file.txt","Name":"some_other_file.txt","Size":5303,"MimeType":"text/plain","ModTime":"2024-10-16T14:10:29.000000000Z","IsDir":false,"Tier":"Hot"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T22:10:42.527047000+02:00","IsDir":true}
]

Then the recursive listing on the folder itself without a trailing slash (we see the README.md beside the folder, no issue):

rclone lsjson -R azure:rnd/test/README.md
[
{"Path":"README.md","Name":"README.md","Size":4520,"MimeType":"text/plain","ModTime":"2024-10-16T20:10:31.000000000Z","IsDir":false,"Tier":"Hot"}
]

With a trailing slash (THIS ONE IS WRONG: we expect to see what is inside the README.md folder, that is the bigger README.md and some_other_file.txt, but we see the small README.md; the slash has no effect where it should):

rclone lsjson -R azure:rnd/test/README.md/
[
{"Path":"README.md","Name":"README.md","Size":4520,"MimeType":"text/plain","ModTime":"2024-10-16T20:10:31.000000000Z","IsDir":false,"Tier":"Hot"}
]

Same with s3, the parent:

rclone lsjson -R s3:rnd/test/             
[
{"Path":"README.md","Name":"README.md","Size":4520,"MimeType":"binary/octet-stream","ModTime":"2024-10-16T20:15:35.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md/some_other_file.txt","Name":"some_other_file.txt","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T13:38:51.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"README.md","Name":"README.md","Size":-1,"MimeType":"inode/directory","ModTime":"2024-10-16T22:15:38.903186000+02:00","IsDir":true}
]

README.md without a trailing slash (we see the small README.md file, the one beside the folder):

rclone lsjson -R s3:rnd/test/README.md
[
{"Path":"README.md","Name":"README.md","Size":4520,"MimeType":"binary/octet-stream","ModTime":"2024-10-16T20:15:35.000000000Z","IsDir":false,"Tier":"STANDARD"}
]

With a trailing slash: here, unlike with azure, we see what one would expect, the content of the README.md folder (the bigger README.md and the other file, some_other_file.txt):

rclone lsjson -R s3:rnd/test/README.md/
[
{"Path":"README.md","Name":"README.md","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T12:17:24.000000000Z","IsDir":false,"Tier":"STANDARD"},
{"Path":"some_other_file.txt","Name":"some_other_file.txt","Size":18503,"MimeType":"text/markdown","ModTime":"2024-10-16T13:38:51.000000000Z","IsDir":false,"Tier":"STANDARD"}
]
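
The comparison made by hand above can be scripted to detect a backend that ignores the trailing slash. A minimal sketch: `slash_is_ignored` is a hypothetical helper, the inputs are the parsed lsjson results of the two listings, and the sample data is trimmed from the outputs in this post. The check is only meaningful when a file and a folder share the same name; otherwise both listings are legitimately identical.

```python
def slash_is_ignored(without_slash, with_slash):
    """True when listing 'path' and 'path/' returned the same entries."""
    def key(entries):
        return sorted((e["Path"], e["Size"], e["IsDir"]) for e in entries)
    return key(without_slash) == key(with_slash)

azure_plain = [{"Path": "README.md", "Size": 4520, "IsDir": False}]
azure_slash = [{"Path": "README.md", "Size": 4520, "IsDir": False}]
s3_plain = [{"Path": "README.md", "Size": 4520, "IsDir": False}]
s3_slash = [{"Path": "README.md", "Size": 18503, "IsDir": False},
            {"Path": "some_other_file.txt", "Size": 18503, "IsDir": False}]

print(slash_is_ignored(azure_plain, azure_slash))  # True: the slash had no effect
print(slash_is_ignored(s3_plain, s3_slash))        # False: the folder content is shown
```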

Many many many thanks to you all (and particularly to @asdffdsa) for your patience and support! It seems now a lot clearer what is working and the very small bit that does not with azure.
