Idea: Eager caching of directory listing, in rclone mount

In my use of rclone mount, I often rescan one entire directory tree.

Given this access pattern, a find across the whole directory tree via the rclone mount takes about 1 minute 35 seconds. However, rclone --fast-list ls on the same directory takes about 5 seconds.

When a directory is listed on an rclone mount (e.g. ls), would it be an interesting idea for rclone to trigger a recursive directory listing to warm the directory cache? I know I'm going to be adding more directories and files into the structure. Curious if you've considered that and what you think of adding that as an option.

One option would be to just use a high directory cache time (--dir-cache-time). That almost does the job, but when the data actually does change, an eager listing option would still speed things up.

You can pre-warm the cache with vfs/refresh, which uses the equivalent of --fast-list under the hood.

Which backend are you using? If it is Google Drive, note that --poll-interval will refresh the directory cache when it has changed.
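
For anyone wanting to script the pre-warming, here is a minimal sketch. It assumes the mount was started with the rc server enabled and listening on 127.0.0.1:5572. The function name warm_cache, the RC_ADDR variable, and the remote name, mount point and durations in the comment are placeholders of mine, not anything rclone defines.

```shell
#!/bin/sh
# Sketch: warm an rclone mount's directory cache through the remote control API.
# Assumes the mount is already running with the rc server enabled, for example:
#   rclone mount GD: /GD --rc --rc-addr 127.0.0.1:5572 --dir-cache-time 1000h --poll-interval 60s
# (GD:, /GD and the durations are placeholders; adjust to your setup.)

# Address of the rc server; override by exporting RC_ADDR before calling.
RC_ADDR="${RC_ADDR:-127.0.0.1:5572}"

# warm_cache [DIR]
# Refresh the whole tree recursively, or only DIR (a path relative to the
# root of the mounted remote) when an argument is given.
warm_cache() {
    if [ -n "${1:-}" ]; then
        rclone rc vfs/refresh recursive=true dir="$1" --rc-addr "$RC_ADDR"
    else
        rclone rc vfs/refresh recursive=true --rc-addr "$RC_ADDR"
    fi
}
```

Calling warm_cache with no argument refreshes everything, and warm_cache some/subdir limits the refresh to one subtree. Running it right after mounting, or from cron, should keep later finds on the mount served from memory.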

I might try that! Thanks.

Huh - the 1 min 35 second find is on a Google Drive backend with a --poll-interval of 60s. (Should I retest to make sure my memory is right about that?) Seems to me find (on the rclone mount) and rclone --fast-list ls should be about the same speed if the cache is warm, and indeed the second recursive find is in the 1-2 second range.

It sounds like a good solution for me is to keep the cache warm directly with either vfs/refresh or find myself, so I'm happy with this answer, but wanted to express that confusion.

Thanks for being so responsive!

You are mixing up a few things. The poll interval doesn't have anything to do with finds. It controls how often rclone checks for changes.

A change may invalidate a lot of cache or a little of the cache depending on where the change happened in the directory structure.

A fast-list is a recursive API operation.
A find on a mount that has the structure in memory (cached) makes 0 API calls, so it will always be insanely faster.

felix@gemini:/GD$ time find . | wc -l
56327

real    0m0.653s
user    0m0.037s
sys     0m0.105s

And a refresh of the file system using fast-list:

felix@gemini:~$ time /usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
{
        "result": {
                "": "OK"
        }
}

real    0m38.970s
user    0m0.011s
sys     0m0.018s

An actual fast-list ls:

felix@gemini:~$ time rclone ls --fast-list GD: | wc -l
52410

real    0m35.334s
user    0m3.501s
sys     0m0.846s

If you were to do a find with nothing cached in memory, that creates a lot of API hits compared to the two operations I shared above, as it has to walk through your directory structure one directory at a time.
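
To get a feel for how many listings a cold walk involves, you can count the directories in the tree. This is a rough sketch that assumes an uncached walk lists each directory separately, so the count is only an approximate lower bound on the listing requests. The helper name count_dirs is made up for illustration.

```shell
#!/bin/sh
# count_dirs PATH
# Print the number of directories under PATH, including PATH itself.
# Assumption: a find on an uncached mount needs at least one listing
# request per directory, so this number approximates that lower bound.
count_dirs() {
    # tr strips the padding some wc implementations add around the count
    find "$1" -type d | wc -l | tr -d ' '
}
```

Run it against a mount whose cache is already warm (for example count_dirs /GD, with /GD standing in for your mount point), since otherwise the count itself triggers the slow walk.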

If someone runs ls on a single directory, you don't want to list the whole tree if you don't have to.