Rclone vfs/refresh + long dir cache time = fast lookups?

Is my assumption good?

Can I use rclone vfs/refresh to load my entire remote into the dir cache (held for the --dir-cache-time defined by my settings) to save API calls and speed up lookups?

Today I was hitting the 100-requests-per-second API limit, and I noticed the errors were about listing directories. So I thought about this solution...

Also, what happens if the remote changes? Will it invalidate the cache automatically, or will it only see changes after the time set by --dir-cache-time?

Yep.

Depends on the remote. A polling remote like Google Drive picks up changes via the poll interval; I set mine to 15 seconds, so a change shows up within that window.
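
For example, a mount along these lines (remote name and mount point are just placeholders, not my exact command) combines a long dir cache with quick change polling:

rclone mount remote: /mnt/remote --dir-cache-time 1000h --poll-interval 15s --allow-other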

OK. So putting my entire remote in the cache should save a lot of API hits then.

What exactly does rclone cache? Just the file hierarchy?

The docs don't exactly say whether rclone rc vfs/refresh is recursive by default.

Can I run just rclone rc vfs/refresh to put my entire remote in cache?

I don't think this command is doing anything:

rclone rc vfs/refresh --fast-list --rc-addr=:5574

It takes less than 3 seconds to run. There's no way it's listing my entire remote and putting it into the cache. Using localhost:5574 gives the same result.

You need to add the recursive=true parameter too.
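
For your command that would look something like this (same port as yours):

rclone rc vfs/refresh recursive=true --fast-list --rc-addr=:5574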

I use this in my systemd:

ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true

So the mount gets refreshed on startup, and _async=true puts the job in the background so it runs to completion without systemd waiting for it.
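
For context, a trimmed-down sketch of the kind of unit that line lives in (remote name, mount point and cache settings are placeholders, not my exact unit):

[Unit]
Description=rclone mount
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount remote: /mnt/remote --rc --rc-addr 127.0.0.1:5572 --dir-cache-time 1000h --allow-other
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true
ExecStop=/bin/fusermount -uz /mnt/remote
Restart=on-failure

[Install]
WantedBy=multi-user.target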

I've seen some vfs/refresh behavior that I can't quite explain. If I start an _async job like @Animosity022 suggests (rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572 _async=true), it gets kicked off and I get a job ID. But when I look at the job ID, it doesn't look like it is doing anything and the cache is never refreshed - duration should be non-zero, I would think.

rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 0,
        "endTime": "0001-01-01T00:00:00Z",
        "error": "",
        "finished": false,
        "group": "job/6",
        "id": 6,
        "output": null,
        "startTime": "2020-11-11T08:23:45.720091424-05:00",
        "success": false
}

If I do a non-async vfs/refresh - (rclone rc vfs/refresh recursive=true --fast-list -v) - it completes after a few minutes and I see the changes on the mount.

 time rclone rc vfs/refresh recursive=true --fast-list -v
{
        "result": {
                "": "OK"
        }
}

real    1m46.648s
user    0m0.029s
sys     0m0.010s

John

You have to wait since the finished status is still false. Duration is only populated once it is finished.
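
If you want to script around that, something like this (jobid and port copied from your example) polls until the job reports finished:

until rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572 | grep -q '"finished": true'; do
        sleep 30
done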

Fair enough - though the async job took 18 minutes...whereas the non-async was just under 2. Does that seem right?

rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 1097.465712393,
        "endTime": "2020-11-11T08:42:03.185803824-05:00",
        "error": "",
        "finished": true,
        "group": "job/6",
        "id": 6,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-11T08:23:45.720091424-05:00",
        "success": true

They should take the same time really. Try it more than once maybe?

I've cron'ed the vfs/refresh at 15-minute intervals (testing using DB for a streaming mount) and it has been pretty consistent at the ~2 minute run time.
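
The crontab entry is just something along these lines (log path is a guess), with date stamps around the call:

*/15 * * * * (date; /usr/bin/rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572; date) >> /var/log/vfs_refresh.log 2>&1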


Wed Nov 11 18:30:01 EST 2020
{
        "result": {
                "": "OK"
        }
}
Wed Nov 11 18:31:36 EST 2020

But the _async job still dawdles.

#rclone rc --json '{ "jobid":36 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 0,
        "endTime": "0001-01-01T00:00:00Z",
        "error": "",
        "finished": false,
        "group": "job/36",
        "id": 36,
        "output": null,
        "startTime": "2020-11-11T18:32:38.79651156-05:00",
        "success": false
}
# date
Wed 11 Nov 2020 06:34:49 PM EST

Mine are pretty much identical no matter how many times I run it:

felix@gemini:/etc/systemd/system$ time /usr/bin/rclone rc vfs/refresh recursive=true
{
        "result": {
                "": "OK"
        }
}

real    0m50.057s
user    0m0.011s
sys     0m0.025s
felix@gemini:/etc/systemd/system$ /usr/bin/rclone rc vfs/refresh recursive=true _async=true
{
        "jobid": 6
}
felix@gemini:/etc/systemd/system$ rclone rc --json '{ "jobid":6 }' job/status
{
        "duration": 47.336556332,
        "endTime": "2020-11-11T19:23:06.54014462-05:00",
        "error": "",
        "finished": true,
        "group": "job/6",
        "id": 6,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-11T19:22:19.203588304-05:00",
        "success": true
}

That is what I'd expect - it is running the same code and I don't think goroutines are scheduled any differently.

@jad3675 can you time a few runs of the _async and non _async and post them?
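
Something like this would give a few comparable numbers for the synchronous case (port as in your commands):

for i in 1 2 3; do
        time rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572
done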

What OS is the rclone mount running on?

Interesting. Ubuntu 20.04 behaves like this:

Non-Async:
Thu 12 Nov 2020 07:30:01 AM EST
{
        "result": {
                "": "OK"
        }
}
Thu 12 Nov 2020 07:31:44 AM EST

#rclone rc vfs/refresh recursive=true --fast-list -v --rc-addr 127.0.0.1:5572 _async=true
#rclone rc --json '{ "jobid":108 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 102.631065252,
        "endTime": "2020-11-12T08:07:38.489191969-05:00",
        "error": "",
        "finished": true,
        "group": "job/108",
        "id": 108,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-12T08:05:55.858126706-05:00",
        "success": true
}

Ubuntu 18.04 is like this:

Non-Async:
Thu Nov 12 08:00:01 EST 2020
{
        "result": {
                "": "OK"
        }
}
Thu Nov 12 08:01:48 EST 2020


#rclone rc vfs/refresh recursive=true --fast-list -v --rc-addr 127.0.0.1:5573 _async=true

#rclone rc --json '{ "jobid":89 }' job/status --rc-addr 127.0.0.1:5573
{
        "duration": 1604.384138229,
        "endTime": "2020-11-12T08:30:09.622358014-05:00",
        "error": "",
        "finished": true,
        "group": "job/89",
        "id": 89,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-12T08:03:25.238219791-05:00",
        "success": true
}

Both mounts use almost exactly the same mount command. --vfs-cache-max-age on the Ubuntu 18.04 machine is 36h; on the 20.04 machine it is 15h.
/usr/bin/rclone mount dcrypt: /mnt/drop --config /root/.config/rclone/rclone.conf --vfs-cache-mode full --vfs-cache-max-size 500G --vfs-cache-max-age 15h --dir-cache-time 48000h --cache-dir /home/rclone/vfs_cache --log-level INFO --log-file /tmp/rclone_drop.log --umask 002 --allow-other

Puzzling!

Can you try running with -vv --dump headers and look directly at the log - you'll see the http requests go by. In the delayed case do the http requests go by spaced throughout the period or are they all at the start or the end? I'm just wondering if the background task is blocked on something.
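
In other words, restart the mount with the extra flags and follow the log while the refresh runs - something like this, where ... stands for the rest of your existing mount options and the debug log path is just an example:

/usr/bin/rclone mount dcrypt: /mnt/drop ... -vv --dump headers --log-file /tmp/rclone_debug.log
tail -f /tmp/rclone_debug.log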

I changed the VFS cache max age on the slow machine (Ubuntu 18.04 - it was 36h) to match the working machine (Ubuntu 20.04, 15h). Once I restarted the mount, the _async job started working properly. Sorry I wasn't able to collect any detailed logs before I did that, though.

John

No worries! It can remain a mystery until next time :slight_smile:
