Cache slow dirlistings / error refreshing object

Linhead · June 22, 2018, 3:05pm

Hi,

I got some odd behavior on a cache mount after a couple of hours (v1.42 but all previous versions of rclone with cache before that).

NOTE however, I do 'rclone sync' from another remote to this one on another machine.

Symptom: I get very slow directory listings with files that has since been moved away in them. I also have "error refreshing object in : in cache fs Google drive root 'data': object not found" in my debug logs.

Rclone config:

[gdrive2]
type = drive
client_id = something
client_secret = a_secret
token = myaccesstoken

[gdrive2-data-cache]
type = cache
remote = gdrive2:data
plex_url = http://127.0.0.1:32400
plex_username = aplexusername
plex_password = aplexpassword
chunk_size = 8M
info_age = 1h
chunk_total_size = 10G
plex_token = myplextoken

[gdrive2-data-cache-crypt]
type = crypt
remote = gdrive2-data-cache:
filename_encryption = standard
password = thepassword
password2 = thepassword2

Whenever my debug logs show something like this the symptom seems to start:

2018/06/22 15:38:22 ERROR : gb4o50vf6govs295me5r07o2mg/tqgevpp96gf890uik5kp610iqp/9mff12bfjeh1217shmkkavbl90/rnafd632cpekrq1u3q2nudtcgs7eaqa0b64si35ndcf6ud2n0sv596ekbl7duv4hki4f24c70jdh0: error refreshing object in : in cache fs Google drive root 'data': object not found

Getting the real filename,

myuser@mybox:~$ rclone cryptdecode gdrive2-data: gb4o50vf6govs295me5r07o2mg/tqgevpp96gf890uik5kp610iqp/9mff12bfjeh1217shmkkavbl90/rnafd632cpekrq1u3q2nudtcgs7eaqa0b64si35ndcf6ud2n0sv596ekbl7duv4hki4f24c70jdh0
gb4o50vf6govs295me5r07o2mg/tqgevpp96gf890uik5kp610iqp/9mff12bfjeh1217shmkkavbl90/rnafd632cpekrq1u3q2nudtcgs7eaqa0b64si35ndcf6ud2n0sv596ekbl7duv4hki4f24c70jdh0         download/tv/My.Show/My.Show.-.s03e24.SDTV.mkv

My rclone is run like this:

myuser@mybox:~/stuff$ rclone --config /home/myuser/.rclone.gdrive2.conf mount --read-only --allow-other --dir-cache-time=1m --cache-info-age=8h --cache-chunk-size=8M --cache-total-chunk-size=200G --cache-chunk-clean-interval=5m --cache-read-retries=5 --cache-db-wait-time=1s --cache-tmp-wait-time=10m --cache-workers=10 --buffer-size=16M --attr-timeout=1s --cache-writes --cache-tmp-upload-path=/home/myuser/.cache/rclone-gdrive2-cache-upload --cache-db-path=/home/myuser/.cache/rclone-gdrive2-cache-db --cache-chunk-path=/home/myuser/.cache/rclone-gdrive2-cache-chunks --rc --rc-addr=:5572 --log-level DEBUG --log-file=/var/log/rclone-gdrive2.log --cache-db-purge gdrive2-data-cache-crypt: /mnt/rclone-gdrive2

Show the file:

myuser@mybox:~/stuff$ time ls -al
/mnt/netdata/download/tv/My.Show/My.Show*.s03e24.SDTV.mkv
-rw-r--r-- 1 myuser myuser 0 Jun 22 05:14
"/mnt/netdata/download/tv/My.Show/My.Show.-.s03e24.SDTV.mkv"

real    0m0.470s
user    0m0.001s
sys     0m0.001s

Compare to what is online and notice the *s03e24.SDTV.mkv is not present even more than 1 minute later (dir_cache = 1min):

myuser@mybox:~/stuff$ rclone ls "gdrive2-data:download/tv/My.Show/" | grep s03e24
	38194 My.Show.-.s03e24.WEBDL-1080p-thumb.jpg
	38194 My.Show.-.s03e24.WEBDL-1080p.metathumb
4005761211 My.Show.-.s03e24.WEBDL-1080p.mkv
	 1319 My.Show.-.s03e24.WEBDL-1080p.nfo
	 2242 My.Show.-.s03e24.WEBDL-1080p.xml

5mins later:

myuser@mybox:~/stuff$ time ls -al /mnt/netdata/download/tv/My.Show | grep s03e24
-rw-r--r-- 1 myuser myuser          0 Jun 22 05:14 My.Show.-.s03e24.SDTV.metathumb
-rw-r--r-- 1 myuser myuser          0 Jun 22 05:14 My.Show.-.s03e24.SDTV.mkv
-rw-r--r-- 1 myuser myuser          0 Jun 22 05:14 My.Show.-.s03e24.SDTV.nfo
-rw-r--r-- 1 myuser myuser          0 Jun 22 05:14 My.Show.-.s03e24.SDTV-thumb.jpg
-rw-r--r-- 1 myuser myuser          0 Jun 22 05:14 My.Show.-.s03e24.SDTV.xml
-rw-r--r-- 1 myuser myuser      38194 Jun 22 00:00 My.Show.-.s03e24.WEBDL-1080p.metathumb
-rw-r--r-- 1 myuser myuser 4005761211 May 26 13:45 My.Show.-.s03e24.WEBDL-1080p.mkv
-rw-r--r-- 1 myuser myuser       1319 Jun 21 23:59 My.Show.-.s03e24.WEBDL-1080p.nfo
-rw-r--r-- 1 myuser myuser      38194 Jun 21 23:59 My.Show.-.s03e24.WEBDL-1080p-thumb.jpg
-rw-r--r-- 1 myuser myuser       2242 Jun 21 23:59 My.Show.-.s03e24.WEBDL-1080p.xml

real    2m51.659s
user    0m0.014s
sys     0m0.016s

5 mins later:

myuser@mybox:~/stuff$ time ls -al /mnt/netdata/download/tv/My.Show | grep s03e24 > /dev/null
date

real    2m29.558s
user    0m0.024s
sys     0m0.015s

This continues until I remount (where the --cache-db-purge takes effect) or SIGHUP or rc/expire the directory.

Anybody else experiencing tihs?
Am I doing something super wrong here?

Thanks

Animosity022 · June 22, 2018, 3:50pm

Ideally, you want your dir-cache a bit higher than 1 minute as it defeats the purpose of the cache if it’s so low. Any changes are supposed to be picked up via polling so I’d set it pretty high, just lower than the cache-info-age.

I run mine at 96 hours.

I’d turn off cache-writes as that’s already there with the cache-tmp-uploads.

How are you syncing between machines? I’m not quite following how files are getting from point a to point b.

Linhead · June 22, 2018, 6:06pm

Thanks @Animosity022,

Good point - possibly there is conflict of the gdrive-activity process and the 1m timeout - I've changed to 24h(dir)/96h(chunks) and restarted.

Regarding files from point a to b, this is simply (on another machine) a:

$ rclone sync gdrive-data: gdrive2-data:

ie. totally separate accounts

Thanks!

Animosity022 · June 22, 2018, 6:20pm

I’m guessing the 2 accounts there are separate as well since you can only have 1 cache working on an entry?

As the sync is going, are you seeing the updates in the logs as well? From my own testing, I’d get mixed results if I moved a lot of stuff around in the cache so I just use the --vfs-read-chunk-size along with --vfs-cache-mode writes --vfs-cache-max-age 6h and it just uploads after 6 hours with a longer dir-cache.

Linhead · June 22, 2018, 9:40pm

Thanks @Animosity022

Yep - the 2 accounts are separate where the sync is done from - there is no cache configured on the box I’m syncing from A->B at all.

Where I do write files to gdrive A (also altogether elsewhere) it is cached with a longer max-age, and never really had trouble ensuring data gets to gdrive B over time (gdrive B is a backup which is for readonly use only anyway).

I did a quick test and yes the updates appear in the log on my gdrive B that’s cache-mounted as per above quite soon after doing a sync from gdrive A->B elsewhere so that’s good.

It’s been a couple of hours and I’ve not seen similar errors in my cache mounted logs and ‘ls’ across the directory structure is still going quickly, I’ll keep a eye on it to see if the dir cache time from 1m to 24h is merely delaying the problem or solving the issue for me (possible race condition if that is the case?)

Appreciate the help!

Linhead · June 26, 2018, 9:51am

@Animosity022 I'm afraid adjusting the timings just delayed the time between occurrences.

Today had loads of the same 'error refreshing object in : in cache fs Google'.... messages in the debug logs, taking one example here,

2018/06/26 08:44:52 ERROR : tb2l50vf6gsvs7253e5r07v2tg/tqxewpp24gf190ubk8kz210kig/c938ogmgm1ejavh38v2cyd3pjunnife2vlbbmr72j81i72efj5d0/vbs25l8c15kj5l50rs2r6b2n2haufuge1apls2maje7o2tmak7vrfg32o1b52rwdgpeilv8965pb2: error refreshing object in : in cache fs Google drive root 'data': object not found

Decoding it:

$ rclone cryptdecode gdrive2-data: tb2l50vf6gsvs7253e5r07v2tg/tqxewpp24gf190ubk8kz210kig/c938ogmgm1ejavh38v2cyd3pjunnife2vlbbmr72j81i72efj5d0/vbs25l8c15kj5l50rs2r6b2n2haufuge1apls2maje7o2tmak7vrfg32o1b52rwdgpeilv8965pb2
tb2l50vf6gsvs7253e5r07v2tg/tqxewpp24gf190ubk8kz210kig/c938ogmgm1ejavh38v2cyd3pjunnife2vlbbmr72j81i72efj5d0/vbs25l8c15kj5l50rs2r6b2n2haufuge1apls2maje7o2tmak7vrfg32o1b52rwdgpeilv8965pb2 download/tv/My.Show/My.Show.-.s10e13.SDTV.metathumb

Comparing with what's actually on the drive:

$ rclone ls gdrive2-data:download/tv/My.Show/ | grep -c My.Show.-.s10e13.SDTV.metathumb
0

It has occurred a few times when traversing the directory in a couple of hours:

$ grep tb2l50vf6gsvs7253e5r07v2tg/tqxewpp24gf190ubk8kz210kig/c938ogmgm1ejavh38v2cyd3pjunnife2vlbbmr72j81i72efj5d0/vbs25l8c15kj5l50rs2r6b2n2haufuge1apls2maje7o2tmak7vrfg32o1b52rwdgpeilv8965pb2 /tmp/rclone-gdrive2.log | grep -i ' error ' -c
13

Now - I know that Sonarr has 'upgraded' this episode on another system (which moves it to a different directory instead of deleting it), and the 'rclone ls' indicates the current state, but it seems like the cache is not aware of this fact and thus is trying to refresh a object which does not exist?

Again - this issue is 'fixed' by doing a -HUP / rc/expire or simply remounting (since i cache-db-purge).

Thanks!

Animosity022 · June 26, 2018, 1:29pm

The dir-cache-time should be lower than the --cache-info-age.

Can you post your mount commands you are using? If you set the one to 24 hours, and the cache-age is still 8h, it would be out of sync.

Animosity022 · June 26, 2018, 1:32pm

Also sounds like you might be hitting:

Linhead · June 26, 2018, 1:50pm

@Animosity022 thanks for the reply - sure thing I mount with:

$ rclone --config /home/username/.rclone.gdrive2.conf mount --read-only --allow-other --dir-cache-time=24h --cache-info-age=96h --cache-chunk-size=8M --cache-total-chunk-size=200G --cache-chunk-clean-interval=5m --cache-read-retries=5 --cache-db-wait-time=1s --cache-workers=10 --buffer-size=16M --attr-timeout=1s --cache-db-path=/home/username/.cache/rclone-gdrive2-cache-db --cache-chunk-path=/home/username/.cache/rclone-gdrive2-cache-chunks --rc --rc-addr=:5572 --log-level DEBUG --log-file=/tmp/rclone-gdrive2.log --cache-db-purge gdrive2-data-cache-crypt: /mnt/rclone-gdrive2

Thanks for the bug item - whilst trying to find a reproducible-at-will method for this with a 1min (dir) 2min (chunk) cache, I ran into what is described in #2354 indeed. The only difference being that my 'ls' listings of a directory with a missing object can at times stall for 2-3mins but that might just be things being compounded over time.

For now it seems like I should purge my DB whenever I start getting 'object not found' in my logs.

Soulplayer · July 23, 2018, 5:12pm

I've been having the same problem now. Any fix or ideas?

Linhead · July 24, 2018, 12:35pm

@Soulplayer

As mentioned above, I think it is related to bug #2354, I’ve switched over to using VFS in the meantime instead and have had no problems.

Soulplayer · July 24, 2018, 1:21pm

@Linhead
Could you explain to my what I need to do to migrate from cache to VFS? I don’t want to make any mistakes while doing…

Also I run sonarr/radarr. That’s why I like the rclone cache. They put the files in the upload path and after 1 hour they migrate to gdrive.

Thanks in advance!

Linhead · July 24, 2018, 1:26pm

@Soulplayer - VFS does not do delayed-uploads I’m afraid.

FWIW - some good examples for VFS here this thread

Soulplayer · July 24, 2018, 1:42pm

@Linhead
Sorry to bother you again. I changed the mount so that it mounts not the cache but the google drive -> crypt directly. I hope that’s ok?

I also edited the rclone.service file:

[Unit]
Description=RClone Service
AssertPathIsDirectory=/home/user
After=plexdrive.target network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/rclone mount gmedia: /mnt/media
–allow-other
–dir-cache-time 48h
–vfs-read-chunk-size 64M
–vfs-read-chunk-size-limit 2G
–buffer-size 128M
–umask 002
–log-file /home/user/logs/rclone.log
–log-level INFO
ExecStop=/usr/bin/sudo /usr/bin/fusermount -uz /mnt/media
Restart=on-abort
User=user
Group=user

[Install]
WantedBy=default.target

Is it normal that the log stays empty?

Linhead · July 24, 2018, 2:51pm

Not in a position to try the systemd-type method myself - I would just make sure it works as you expect from the commandline first.

My log level is set to DEBUG and I have output and have INFO lines so yes I would expect you to see something in the logs - as mentioned, best first-step is to get it working from the commandline first in case it is something obviously wrong (such as /home/user/logs not existing).

Animosity022 · July 24, 2018, 2:56pm

What OS are you running? I’ve noticed since I’ve flipped over to debian, my logs seem broken. I’m sure the systemd is right and the log file exists but I haven’t spent much time debugging yet.

Linhead · July 24, 2018, 3:00pm

It could be Secure Tmp handling in Systemd trying to be useful and "chroot"ing the files somewhere in /tmp/private* at a guess - but IMHO, have a go at the CLI first.

Animosity022 · July 24, 2018, 3:03pm

I’ve used the same systemd file now for Ubuntu, Fedora, CentOS and Debian. Debian is the only one that I’ve come across that can’t make a log file. If I turn the log level up to DEBUG, it creates it. If I have it at INFO or NOTICE, it does not.