Clearly Identify if a remote path is a dir or file?

patakijv · March 3, 2021, 7:28pm

A post from a few years ago titled "how-to-quickly-know-if-a-remote-path-is-a-file-or-directory" (I can't post the link) asked this same question. The response it received seems to produce output meant for a human to review one path at a time.

I am curious whether:

1. any new ways to handle this have become available since then, and
2. there is a way to make this determination more succinctly (e.g. a command that returns a boolean, or a command that returns "dir" or "file"), so that a script reading a config file of paths to "rclone copy" can determine ahead of time whether a remote path is a dir or a file and do something different in each case.

patakijv · March 3, 2021, 8:40pm

Ok, I didn't catch in the previous thread that there was a return value of 1 or 0 in the proposed solution.

So I guess a bash function like this could be used to wrap that approach:

function get_type() {
  rclone rmdir -vv --dry-run "$1" > /dev/null 2>&1
  [[ $? = 1 ]] && echo "file" || echo "dir"
}

So my question is whether rclone itself can handle this yet, or whether this is still the only way.

This could probably be enhanced to first check that the path exists, so it doesn't just return "dir" when the path doesn't exist.

patakijv · March 4, 2021, 6:13am

Returning to update my previous answer with a bash function that checks if the path is a "file" or a "directory" using json, which also reports if the path is "not found" and if it is a "container/bucket dir" as well.

This requires you have jq installed (ex: apt-get install jq or brew install jq or ...)

function get_type_using_json() {
  path=$1
  base_name=$(basename "${path##*:}")
  entry_json=$(rclone lsjson "$path" | NAME=$base_name jq -c '.[] | select(.Name==env.NAME)')
  file_exists=$([ "x$(jq -r '.Name' <<< "$entry_json")" = "x$base_name" ] && echo true || echo false)
  if $file_exists; then
    echo "file"
  else
    [[ $(dirname "$path") = "." ]] && parent_dir="${path%%:*}:" || parent_dir=$(dirname "$path")
    entry_json=$(rclone lsjson "$parent_dir" | NAME=$base_name jq -c '.[] | select(.Name==env.NAME)')
    is_dir=$([ "x$(jq '.IsDir' <<< "$entry_json")" = "xtrue" ] && echo true || echo false)
    if $is_dir; then
      is_container=$([ "x$(jq '.IsBucket' <<< "$entry_json")" = "xtrue" ] && echo true || echo false)
      if $is_container; then
        echo "container dir"
      else
        echo "dir"
      fi
    else
      echo "not found"
    fi
  fi
}

The above seems to work, albeit with minimal testing. Suggested improvements are welcome. One improvement would be to make this determination with a single call to the remote instead of two.
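The single-call idea mentioned above could be sketched roughly as follows. This is only an illustration, not a tested solution: it lists the parent directory once with `rclone lsf`, which prints one name per line and appends a `/` to directories, and then looks for the last path component in that listing. The function name `get_type_using_lsf` is made up for this example. It assumes paths of the form `remote:relative/path` with at least one path component, it cannot tell a container/bucket dir from an ordinary dir, and listing a very large parent directory may be slow.

```shell
# Sketch only: classify a remote path with a single `rclone lsf` call.
# get_type_using_lsf is a hypothetical helper name, not an rclone command.
# Assumes paths of the form remote:relative/path.
get_type_using_lsf() {
  local path="${1%/}"          # drop one trailing slash, if present
  local remote="${path%%:*}"   # text before the first colon
  local rest="${path#*:}"      # text after the first colon
  local name="${rest##*/}"     # last path component
  local parent listing

  case "$rest" in
    */*) parent="$remote:${rest%/*}" ;;  # nested path: parent is everything before the last slash
    *)   parent="$remote:" ;;            # top-level entry: parent is the remote root
  esac

  # One call to the remote. A failed listing is reported as "not found".
  if ! listing=$(rclone lsf "$parent" 2>/dev/null); then
    echo "not found"
    return
  fi

  # lsf marks directories with a trailing slash, files without one.
  if printf '%s\n' "$listing" | grep -Fxq -- "$name/"; then
    echo "dir"
  elif printf '%s\n' "$listing" | grep -Fxq -- "$name"; then
    echo "file"
  else
    echo "not found"
  fi
}
```

It would be called the same way as the function above, e.g. `get_type_using_lsf remote:bucket/some/path`, and prints "file", "dir" or "not found".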

It seems that all of this would be better handled in the compiled rclone code instead of needing to do this in the script calling rclone. Then it could be a command rclone makes available to the script.

ncw · March 5, 2021, 11:49am

I did get half way through making a --stat flag for lsjson (or maybe an rclone stat command would be better)

This would produce a single lsjson entry about the directory or the file.

...

What you could do is use this

$ rclone rc --loopback operations/list fs=/tmp/file remote=
2021/03/05 11:48:27 Failed to rc: loopback call failed: is a file not a directory

$ rclone rc --loopback operations/list fs=/tmp/dir remote=
{
	"list": []
}

I think that should be 100% reliable

patakijv · March 5, 2021, 8:09pm

Hey Nick,

Interesting... I'm not really sure what that rc command and its options are supposed to be doing, but I just tried it and noticed that it returns the same result for an empty but existing dir, for a non-existent path, and for a container/bucket dir. All 3 cases return JSON with a "list" key that is an empty array.

I like your idea of an rclone stat command or a --stat flag. It would be great if, when called on any path, it reported whether the path is a file, a dir, a container/bucket dir, or non-existent.

Perhaps an rclone exists command would be useful as well. It could simply call rclone stat internally and report whether the path exists.

ncw · March 6, 2021, 11:25am

I don't think you said which file system you are using, but it sounds like s3 or swift - something bucket container based.

In that case the above won't work for non-existent directories. On bucket based systems, directories don't actually exist at all so you are quite at liberty to list a non-existing one.
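For scripting, the loopback call shown earlier in the thread could be wrapped along these lines. This is a rough sketch: `rc_path_type` is a made-up name, and the check relies on the error text "is a file not a directory" seen in the output above, which may not be stable across rclone versions. Because of the bucket behaviour just described, a "dir" result only means the path could be listed, not that it exists.

```shell
# Sketch only: wrap the operations/list loopback call from this thread.
# rc_path_type is a hypothetical helper name, not an rclone command.
rc_path_type() {
  local out status
  out=$(rclone rc --loopback operations/list "fs=$1" remote= 2>&1)
  status=$?

  case "$out" in
    *"is a file not a directory"*)
      echo "file"
      return
      ;;
  esac

  if [ "$status" -eq 0 ]; then
    # The listing succeeded. On bucket-based remotes this also happens for
    # paths that do not exist, so treat "dir" as "listable".
    echo "dir"
  else
    echo "error"
  fi
}
```

For example, `rc_path_type /tmp/file` would print "file" and `rc_path_type /tmp/dir` would print "dir" for the two cases shown above.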

I think we should make an issue about this. I've got some half finished code and I should probably attach it to the issue.

Can you please make a new feature request issue on GitHub so we've got a record of it?
