Rclone vfs/refresh + long dir cache time = fast lookups?

Is my assumption good?

Can I use rclone vfs/refresh to load my entire remote into the dir cache (held for the --dir-cache-time defined by my settings) to save API calls and speed up lookups?

Today I was hitting the 100-requests-per-second API limit, and I noticed the errors were about listing directories. So I thought about this solution...

Also, what happens if the remote changes? Will it invalidate the cache automatically, or will it only see changes after the time set by --dir-cache-time?

Yep.

Depends on the remote. A polling remote like Google Drive picks up changes via the poll interval; I set mine to 15 seconds, so a change shows up within that window.
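
For example, a mount along these lines (remote name and mount point are just placeholders, not my exact command) combines a long dir cache with quick change polling:

rclone mount remote: /mnt/remote --dir-cache-time 1000h --poll-interval 15s --allow-other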

OK. So putting my entire remote in the cache should save a lot of API hits then.

What exactly does rclone cache? Just the file hierarchy?

The docs don't exactly say whether rclone rc vfs/refresh is recursive by default.

Can I run just rclone rc vfs/refresh to put my entire remote in cache?

I don't think this command is doing anything:

rclone rc vfs/refresh --fast-list --rc-addr=:5574

It takes less than 3 seconds to run. There's no way it's listing my entire remote and putting it into the cache. Using localhost:5574 gives the same result.

You need to add the recursive=true parameter too.
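
For your command that would look something like this (same port as yours):

rclone rc vfs/refresh recursive=true --fast-list --rc-addr=:5574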

I use this in my systemd:

ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true

So the mount gets refreshed on startup, and _async=true puts the job in the background so it runs to completion without systemd waiting for it.
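
For context, a trimmed-down sketch of the kind of unit that line lives in (remote name, mount point and cache settings are placeholders, not my exact unit):

[Unit]
Description=rclone mount
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount remote: /mnt/remote --rc --rc-addr 127.0.0.1:5572 --dir-cache-time 1000h --allow-other
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true
ExecStop=/bin/fusermount -uz /mnt/remote
Restart=on-failure

[Install]
WantedBy=multi-user.target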

I've seen some vfs/refresh behavior that I can't quite explain. If I start an _async job like @Animosity022 suggests (rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572 _async=true), it gets kicked off and I get a job ID. But when I look at the job ID, it doesn't look like it is doing anything and the cache is never refreshed - duration should be non-zero, I would think.

rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 0,
        "endTime": "0001-01-01T00:00:00Z",
        "error": "",
        "finished": false,
        "group": "job/6",
        "id": 6,
        "output": null,
        "startTime": "2020-11-11T08:23:45.720091424-05:00",
        "success": false
}

If I do a non-async vfs/refresh - (rclone rc vfs/refresh recursive=true --fast-list -v) - it completes after a few minutes and I see the changes on the mount.

 time rclone rc vfs/refresh recursive=true --fast-list -v
{
        "result": {
                "": "OK"
        }
}

real    1m46.648s
user    0m0.029s
sys     0m0.010s

John

You have to wait since the finished status is still false. Duration is only populated once it is finished.
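
If you want to script around that, something like this (jobid and port copied from your example) polls until the job reports finished:

until rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572 | grep -q '"finished": true'; do
        sleep 30
done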

Fair enough - though the async job took 18 minutes...whereas the non-async was just under 2. Does that seem right?

rclone rc --json '{ "jobid":6 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 1097.465712393,
        "endTime": "2020-11-11T08:42:03.185803824-05:00",
        "error": "",
        "finished": true,
        "group": "job/6",
        "id": 6,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-11T08:23:45.720091424-05:00",
        "success": true

They should take the same time really. Try it more than once maybe?

I've cron'ed the vfs/refresh at 15-minute intervals (testing using DB for a streaming mount) and it has been pretty consistent at the ~2 minute run time.
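
The crontab entry is just something along these lines (log path is a guess), with date stamps around the call:

*/15 * * * * (date; /usr/bin/rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572; date) >> /var/log/vfs_refresh.log 2>&1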


Wed Nov 11 18:30:01 EST 2020
{
        "result": {
                "": "OK"
        }
}
Wed Nov 11 18:31:36 EST 2020

But the _async job still dawdles.

#rclone rc --json '{ "jobid":36 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 0,
        "endTime": "0001-01-01T00:00:00Z",
        "error": "",
        "finished": false,
        "group": "job/36",
        "id": 36,
        "output": null,
        "startTime": "2020-11-11T18:32:38.79651156-05:00",
        "success": false
}
# date
Wed 11 Nov 2020 06:34:49 PM EST

Mine are pretty much identical no matter how many times I run it:

felix@gemini:/etc/systemd/system$ time /usr/bin/rclone rc vfs/refresh recursive=true
{
        "result": {
                "": "OK"
        }
}

real    0m50.057s
user    0m0.011s
sys     0m0.025s
felix@gemini:/etc/systemd/system$ /usr/bin/rclone rc vfs/refresh recursive=true _async=true
{
        "jobid": 6
}
felix@gemini:/etc/systemd/system$ rclone rc --json '{ "jobid":6 }' job/status
{
        "duration": 47.336556332,
        "endTime": "2020-11-11T19:23:06.54014462-05:00",
        "error": "",
        "finished": true,
        "group": "job/6",
        "id": 6,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-11T19:22:19.203588304-05:00",
        "success": true
}

That is what I'd expect - it is running the same code and I don't think goroutines are scheduled any differently.

@jad3675 can you time a few runs of the _async and non _async and post them?
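
Something like this would give a few comparable numbers for the synchronous case (port as in your commands):

for i in 1 2 3; do
        time rclone rc vfs/refresh recursive=true --fast-list --rc-addr 127.0.0.1:5572
done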

What OS is the rclone mount running on?

Interesting. Ubuntu 20.04 behaves like this:

Non-Async:
Thu 12 Nov 2020 07:30:01 AM EST
{
        "result": {
                "": "OK"
        }
}
Thu 12 Nov 2020 07:31:44 AM EST

#rclone rc vfs/refresh recursive=true --fast-list -v --rc-addr 127.0.0.1:5572 _async=true
#rclone rc --json '{ "jobid":108 }' job/status --rc-addr 127.0.0.1:5572
{
        "duration": 102.631065252,
        "endTime": "2020-11-12T08:07:38.489191969-05:00",
        "error": "",
        "finished": true,
        "group": "job/108",
        "id": 108,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-12T08:05:55.858126706-05:00",
        "success": true
}

Ubuntu 18.04 is like this:

Non-Async:
Thu Nov 12 08:00:01 EST 2020
{
        "result": {
                "": "OK"
        }
}
Thu Nov 12 08:01:48 EST 2020


#rclone rc vfs/refresh recursive=true --fast-list -v --rc-addr 127.0.0.1:5573 _async=true

#rclone rc --json '{ "jobid":89 }' job/status --rc-addr 127.0.0.1:5573
{
        "duration": 1604.384138229,
        "endTime": "2020-11-12T08:30:09.622358014-05:00",
        "error": "",
        "finished": true,
        "group": "job/89",
        "id": 89,
        "output": {
                "result": {
                        "": "OK"
                }
        },
        "startTime": "2020-11-12T08:03:25.238219791-05:00",
        "success": true
}

Both mounts use almost exactly the same mount command. --vfs-cache-max-age on the Ubuntu 18.04 machine is 36h; on the 20.04 machine it is 15h.
/usr/bin/rclone mount dcrypt: /mnt/drop --config /root/.config/rclone/rclone.conf --vfs-cache-mode full --vfs-cache-max-size 500G --vfs-cache-max-age 15h --dir-cache-time 48000h --cache-dir /home/rclone/vfs_cache --log-level INFO --log-file /tmp/rclone_drop.log --umask 002 --allow-other

Puzzling!

Can you try running with -vv --dump headers and look directly at the log - you'll see the http requests go by. In the delayed case do the http requests go by spaced throughout the period or are they all at the start or the end? I'm just wondering if the background task is blocked on something.
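
In other words, restart the mount with the extra flags and follow the log while the refresh runs - something like this, where ... stands for the rest of your existing mount options and the debug log path is just an example:

/usr/bin/rclone mount dcrypt: /mnt/drop ... -vv --dump headers --log-file /tmp/rclone_debug.log
tail -f /tmp/rclone_debug.log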

I changed the VFS cache max age on the slow machine (Ubuntu 18.04 - it was 36h) to match the working machine (Ubuntu 20.04, 15h). Once I restarted the mount, the _async job started working properly. Sorry I wasn't able to collect any detailed logs before I did that, though.

John

No worries! It can remain a mystery until next time :slight_smile:
