In my use of rclone mount, I often rescan one entire directory tree.
With this access pattern, a find across the whole directory tree via the rclone mount takes about 1 minute 35 seconds, whereas rclone --fast-list ls on the same directory takes about 5 seconds.
When a directory is listed on an rclone mount (e.g. ls), would it be an interesting idea for rclone to trigger a recursive directory listing to warm the directory cache? I know I'm going to be adding more directories and files into the structure. Curious if you've considered that and what you think of adding that as an option.
One option would be to just use a high directory cache time. That almost does the job, but when the data actually does change, the cache has to be rebuilt, and this eager listing option would still speed that up.
Huh - the 1 min 35 second find is on a Google Drive backend with a --poll-interval of 60s. (Should I retest to make sure my memory is right about that?) Seems to me find (on the rclone mount) and rclone --fast-list ls should be about the same speed if the cache is warm, and indeed the second recursive find is in the 1-2 second range.
It sounds like a good solution for me is to keep the cache warm directly with either vfs/refresh or find myself, so I'm happy with this answer, but wanted to express that confusion.
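For anyone else landing here, the two ways of keeping the cache warm could be sketched roughly like this. The function names are made up for illustration, and the vfs/refresh call is simply the command used elsewhere in this thread, which assumes the mount was started with the remote control enabled on that address.

```shell
#!/usr/bin/env bash
# Sketch only: two ways to warm the directory cache of an rclone mount.
# Function names are hypothetical; adapt paths and addresses to your setup.

# Walk the mounted tree once so every directory gets listed and cached.
# Prints the number of entries visited (including the root itself).
warm_by_walk() {
    local mount_path="$1"
    find "$mount_path" | wc -l | tr -d ' '
}

# Ask the running mount to refresh its directory cache recursively.
# Command copied from this thread; assumes the remote control is
# listening on 127.0.0.1:5572.
warm_by_rc() {
    rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
}
```

Running something like `warm_by_walk /path/to/mount` or `warm_by_rc` from cron or a systemd timer after expected changes would keep later finds fast; how often to run it is a judgment call.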
You are mixing a few things. The poll interval doesn't have anything to do with finds. The poll interval controls how often rclone checks the backend for changes.
A change may invalidate a lot of cache or a little of the cache depending on where the change happened in the directory structure.
A fast-list is a recursive API operation.
A find on a mount that has the structure in memory (cached) needs 0 API operations, so it will always be far faster.
felix@gemini:/GD$ time find . | wc -l
56327
real 0m0.653s
user 0m0.037s
sys 0m0.105s
And a refresh of the file system using fast-list:
felix@gemini:~$ time /usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
{
"result": {
"": "OK"
}
}
real 0m38.970s
user 0m0.011s
sys 0m0.018s
An actual fast-list ls:
felix@gemini:~$ time rclone ls --fast-list GD: | wc -l
52410
real 0m35.334s
user 0m3.501s
sys 0m0.846s
If you were to do a find with nothing cached in memory, that creates a lot of API hits in comparison to the two operations I shared above, as it has to walk through your directory structure one directory at a time.
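To get a feel for the difference, here is a rough sketch that counts directories in a tree. It assumes a cold walk costs about one listing request per directory, which is a simplification (paging, retries and backend specifics are ignored), so treat the number as an estimate and not a measurement. The function name is made up.

```shell
#!/usr/bin/env bash
# Sketch only: estimate the listing requests a cold walk would need,
# assuming roughly one request per directory (an assumption, not measured).
# Run it against an already-warm mount so the count itself is cheap.
estimate_cold_listings() {
    local root="$1"
    find "$root" -type d | wc -l | tr -d ' '
}
```

A tree with a few thousand directories would mean a few thousand listing requests for a cold find under that assumption, which is why the recursive fast-list operations above come out ahead.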
If a person does an ls on a single directory, you don't want rclone to go through the whole tree if it doesn't have to.