Mount VFS cache: cached files have no filesize until remount or the cache object expires

What is the problem you are having with rclone?

I am running various apps as docker compose services on an Ubuntu 22.04 machine. I'm running rclone on the host machine, not in a container. Sabnzbd downloads into the rclone mount, which uses full VFS caching and a write-back of 1 hour, with the intention that Sonarr and Radarr can then instantly import files by hardlinking and deleting the old link.

All this worked perfectly fine, but then I ran into an issue with Plex, where seeking would sometimes result in an "unknown error." One of the troubleshooting steps I took was checking whether rclone was up to date, and it turned out that the apt package was outdated: something like version 1.51, I can't remember exactly, while the latest version reported by rclone version --check was 1.61.1.

So I removed the apt package and installed via the install script from the docs. After this update I started running into the issues that this thread is about:

When sab has finished downloading something, and Sonarr/Radarr is about to import it, they both report "Unable to determine if file is a sample." At the same time, rclone logs that it was unable to get the file size of the file:
Jan 12 20:28:12 Uranium rclone[849]: ERROR : usenet/complete/tv/< FILENAME >.mkv(0xc002716900): Couldn't read size of file

If I ls -l the file from the host, it reports 0 bytes. Same if I ls -l inside any of the containers (Sonarr, Radarr or Sabnzbd). If I ls -l the file inside the VFS cache directory, it has the correct file size.

If I wait for the 1 hour to pass, the file has the correct size.
If I stop my docker containers and start them again, the file is still 0 bytes.
If I stop the containers, unmount and remount rclone, the file has the correct size.

I run each compose service as its own user, all of them with a shared group. They all run with umask 002, same as the rclone systemd service, which also runs as its own user and the shared group.
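
For reference, umask 002 means new files come out group-writable (664) and new directories 775, which is what the shared-group setup relies on. A quick local sanity check (throwaway temp paths, just for illustration):

```shell
# With umask 002, new files get 666 & ~002 = 664 and new
# directories get 777 & ~002 = 775, i.e. group-writable.
umask 002
tmp=$(mktemp -d)
touch "$tmp/file"
mkdir "$tmp/dir"
stat -c '%a' "$tmp/file"   # 664
stat -c '%a' "$tmp/dir"    # 775
```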

If I remove rclone and install the apt package again, the issue is resolved. I don't consider that a solution though. :sweat_smile:

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-57-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Running rclone as a systemd service with the following ExecStart:

/usr/bin/rclone mount crypt:data /mnt/dukemedia/data/ --allow-other --vfs-cache-mode full --vfs-write-back 1h --vfs-cache-max-size 750G --umask 002 --uid 13003 --gid 13000 --dir-perms 0775 --file-perms 664 --log-systemd --log-level DEBUG
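
For completeness, the unit around that ExecStart would look roughly like this. Note this is a sketch, not my actual file: the unit name, the User/Group, and the ExecStop path are assumptions.

```ini
# /etc/systemd/system/dukemedia-data-mount.service (hypothetical name)
[Unit]
Description=rclone mount for crypt:data
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=rclone
Group=dukemedia
UMask=002
ExecStart=/usr/bin/rclone mount crypt:data /mnt/dukemedia/data/ \
    --allow-other --vfs-cache-mode full --vfs-write-back 1h \
    --vfs-cache-max-size 750G --umask 002 --uid 13003 --gid 13000 \
    --dir-perms 0775 --file-perms 664 --log-systemd --log-level DEBUG
ExecStop=/bin/fusermount -uz /mnt/dukemedia/data/
Restart=on-failure

[Install]
WantedBy=multi-user.target
```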

The rclone config contents with secrets removed.

[gdrive]
type = drive
client_id = < REDACTED >
client_secret = < REDACTED >
scope = drive
token = < REDACTED >
team_drive =

[crypt]
type = crypt
remote = gdrive:rclone
password = < REDACTED >
password2 = < REDACTED >

A log from the command with the -vv flag

This is a long log, but yeah... here you go.

https://paste.c-net.org/PenaltyGently

Tomorrow I will experiment with various versions to see if I can find a specific version where this issue was introduced.

After you updated versions, did you clear out the cache?

The version you were on before shouldn't matter much.

Can you do a simple test and just ls -al the file and share the output / debug log?

I did not.

The debug log I posted in the OP is pretty much that. There's a lot of other noise, but I don't know how to avoid it. As you can see from the timestamps, the timespan is very short. I basically rebooted, started a download, waited for it to finish, and then saved the log.
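
If it helps, one way to trim that noise is to filter out the periodic directory polling before sharing. This is just a sketch: the log file name and the exact patterns to drop are my guesses, based on the polling showing up as Attr/Lookup/ReadDirAll entries.

```shell
# Drop the once-per-second directory polling (Attr/Lookup/ReadDirAll/Statfs
# entries) so lines like "Couldn't read size of file" stand out.
grep -vE '(Attr|Lookup|ReadDirAll|Statfs):' rclone.log > rclone-filtered.log
```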

Can you simplify?

Just start with a clean mount.

Copy a small file there.
Validate / share results.

  1. Stopped all my containers and rclone systemd services.
  2. Mounted from a clean root folder on the remote.
  3. Logged in as the sabnzbd user on the host and created a new file named test on the mount.
  4. Still logged in as the same user, I edited the test file on the mount.
  5. Logged in as the Sonarr user and ls -al'd the file. Output:
-rw-rw-r-- 1 rclone dukemedia 483 Jan 12 22:40 /mnt/dukemedia/data/test

Debug log

The issue isn't appearing here, so it clearly has something to do with sabnzbd, or at least with the way sab interacts with the filesystem.

I tested a range of rclone binaries by doing the following with each of them:

  1. down containers, stop systemd services
  2. replace rclone binary
  3. make sure I got the right version with rclone version
  4. start systemd services, up containers
  5. add an episode via Sonarr and wait for sab to finish the download and Sonarr to attempt the import

Here are the results:

1.61.1: bad
1.58.0: bad
1.55.0: bad
1.54.1: good
1.54.0: good
1.53.4: good
1.53.3: good

So it appears this issue was introduced in version 1.55.0, which, according to the changelog, included a significant amount of VFS changes. This might be more of an "incompatibility" than a bug, I guess? But regardless, up to and including version 1.54.1, the issue did not exist.

Around 1.55 the VFS cache modes were completely rewritten, so full in the previous versions doesn't compare to full in the newer versions.

I was trying to look through the previous log, but I never see a file copy, as something looks to already be there.

Without any other noise, can you start clean (no cache area), start the mount, have sab copy/download one file, and once the download is done, ls -al the file and share a debug log with nothing else in it?

I use dockers for everything minus rclone.

That doesn't make sense to me, as rclone doesn't support hardlinks, so it will not work.

If you try that on an rclone mount, you get an error:

ln: failed to create hard link 'test2.mkv' => 'test1.mkv': Function not implemented
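
To illustrate: on a regular local filesystem the same ln succeeds and both names end up sharing one inode, while the FUSE mount returns that "Function not implemented" (ENOSYS) error. A quick throwaway demo on a local disk:

```shell
# On a regular local filesystem a hardlink succeeds: both names share
# one inode and the link count becomes 2. On an rclone mount the same
# ln call fails with ENOSYS.
cd "$(mktemp -d)"
echo data > test1.mkv
ln test1.mkv test2.mkv
stat -c '%h %i' test1.mkv   # link count 2, some inode number
stat -c '%h %i' test2.mkv   # link count 2, same inode number
```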

Are you doing something to the cache area directly rather than on the mount?

Re: hardlinking, maybe it's me who has misunderstood how things work, but I thought that was the only way instant file moves were possible. And instant file moves are happening, when it works. The only other kind of move operation I know of is copy + delete, which would take a lot longer.

I am not touching the cache directly, I'm doing everything through the mount.

I'm gonna set up a clean environment to test as you described as soon as I can and I'll report back.

You are right, hardlinking is not possible. I suppose they just use mv.

Now, the plot thickens... I stopped all my containers and systemd services, and cleaned out the VFS cache by manually deleting everything under /home/rclone/.cache/rclone/vfs/. I then started the systemd services for the rclone mounts and only started my caddy and sabnzbd containers. There will still be quite a bit of "noise" in the log, because it seems like sab lists the contents of the complete directory every second.

Anyway, I then manually added an nzb and let it download. And... there was absolutely no issue. Rclone didn't report any errors, and I was able to see the correct size of the file that sab had moved to the completed directory with ls. Here's the log.

This is extremely odd, but it seems like it is something that Sonarr is doing, and not sabnzbd. If I first start the mounts and only up the caddy and sabnzbd containers, and then add an nzb to sab and let it download, I can afterwards see the size of the file:
-rw-rw-r-- 1 rclone dukemedia 1418619716 Jan 13 12:20 The.Mandalorian.S02E02.NORDiC.1080p.WEB-DL.H.265.DDP5.1.Atmos-TWASERiES.mkv

If I then also up the Sonarr container, it sees the completed download in the sab queue, attempts to import it and gives me the good old "Unable to determine if file is a sample."

If I then check the size of the file...
-rw-rw-r-- 1 rclone dukemedia 0 Jan 13 12:23 The.Mandalorian.S02E02.NORDiC.1080p.WEB-DL.H.265.DDP5.1.Atmos-TWASERiES.mkv

It's suddenly 0 bytes. I'm absolutely stumped.

Closer.

So let's get a debug log of when Sonarr moves something.

I have sonarr/radarr move from a non rclone area to my rclone mount all the time as that's my workflow.

I'll get that.

Does that not take a long time? I have nothing inherently against doing it like this, but it seemed like it was forced to do a copy + delete operation when importing files, which is obviously not as fast as instant.

I'd ask the question slightly differently.

Does it matter if moving a file is instant or takes say 15 seconds or even 1 minute if it's automated and hands free?

I used to have mergerfs, use hardlinking, and run an upload script every night. Lots of moving parts. I saved disk space since I hardlinked, and things uploaded each night.

I removed mergerfs, removed my script and it copies to the mount and auto uploads in an hour. Less parts. Less chance to break.

I do have an hour of time when I am using "double storage" compared to before, but I'm fine with that. My goal tends to be less hands-on, fewer parts, as more parts == more things that can break.

If something takes a minute to import at 2:03am, who cares? :slight_smile:

So for me, it depends on what problem I'm trying to solve, and I'll pick the solution based on my requirements. If speed was the top requirement, you are 100% right: use mergerfs / hardlinks.

Speed is a priority for me. I have family members and close friends as users who can add content via Overseerr, and since I know it's possible to instantly move files with an older version of rclone and this setup, which is very close to yours, I want to aim for that. In my experience, moving files into the mount from the host can take several minutes, up to 20 minutes for 4K remuxes. I'm tempted to just go back to an rclone version before 1.55.0, but I don't like the idea of it.

Anyway, I spun up sonarr, sabnzbd and caddy, switched to debug logging in sonarr, and searched for a single episode. I then let it run from search, to download, to import (which failed). This is the log.

It seems it's throwing a bunch of Mono exceptions. I'm thinking maybe I should try the Sonarr v4 beta, since that uses .NET Core, if I recall correctly?

I use a 1 TB SSD for my cache disk, so it never takes more than a minute or two.

Can you snag the rclone log for the error as that would be the helpful part?

I tried to limit how much of the log I saved with journalctl --since "1 minute ago", just after Sonarr started its failed attempts at importing. Here's the log.

That looks like it's from after the error happened, as the first line is size=0.

Jan 13 14:43:52 Uranium rclone[850]: DEBUG : usenet/complete/tv/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxrwxr-x, err=<nil>
Jan 13 14:43:52 Uranium rclone[850]: DEBUG : usenet/complete/tv/: Lookup: name="The.Mandalorian.S02.E01.The.Marshal.1080p.WEBRiP.x264.DTS-HD.MA-CREATiVE24.mkv-xpost.2"
Jan 13 14:43:52 Uranium rclone[850]: DEBUG : usenet/complete/tv/: >Lookup: node=usenet/complete/tv/The.Mandalorian.S02.E01.The.Marshal.1080p.WEBRiP.x264.DTS-HD.MA-CREATiVE24.mkv-xpost.2/, err=<nil>
Jan 13 14:43:52 Uranium rclone[850]: DEBUG : usenet/complete/tv/The.Mandalorian.S02.E01.The.Marshal.1080p.WEBRiP.x264.DTS-HD.MA-CREATiVE24.mkv-xpost.2/: Attr:
Jan 13 14:43:52 Uranium rclone[850]: DEBUG : usenet/complete/tv/The.Mandalorian.S02.E01.The.Marshal.1080p.WEBRiP.x264.DTS-HD.MA-CREATiVE24.mkv-xpost.2/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxrwxr-x, err=<nil>

I see, I'll do the whole thing again and get more of the log. I'll be out for a while though, so it won't be until later tonight or tomorrow. I'll be back.

Thanks a lot for your help by the way, I appreciate it!

To avoid cutting anything off, since I don't know what I'm looking for, I've created as clean an environment as I could and then grabbed the log for this boot only:

  1. Stop containers, stop mount
  2. Clear out the rclone cache vfs and vfsMeta directories
  3. Make sure to clear the old log (/home/ens/rclone.log)
  4. Reboot
  5. Start mount
  6. Start containers: caddy, sabnzbd, sonarr
  7. Add a single episode to sonarr
  8. Wait for sabnzbd to finish downloading
  9. Wait for sonarr to refresh status from sabnzbd and attempt importing the file
  10. As soon as the "unable to determine if file is a sample" error appears in sonarr, stop containers
  11. Get the systemd log for the current boot: journalctl -b -u dukemedia-data-mount.service >> /home/ens/rclone.log

This is the resulting log. I hope it's more useful.