That pesky "failed to reload" error message

What is the problem you are having with rclone?

I used to have two dirs: Local (a plain dir) and Remote (an rclone B2 mount). They were merged with mergerfs into a Merged dir, and I would occasionally upload from Local to B2. This was the old @Animosity022 approach.

Yesterday I migrated to the new @Animosity022 approach: no mergerfs, just the VFS cache.

I unmounted Remote and Merged and removed their old systemd mount units. Then I set up a new mount at Merged with full cache settings (based on the @Animosity022 repo, with some small tweaks for B2 and bwlimit). The dir appeared correctly, showing my remote files.

Then I rsynced all my Local files (with -a --inplace) into the Merged dir. The rclone log looked good: rclone picked up the files and queued them for upload after the 24h write-back delay. The rclone cache dir now holds the entire content of Local (1.8TB). All good.
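For reference, the copy was essentially this (the Local path is from memory; Merged is the mount point from the unit below):

rsync -a --inplace /home/htpc/Local/ /home/htpc/Merged/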

Note: I didn't touch the cache dir, just looked at it using ncdu.

Then I had to unmount the Merged dir (the cached files had not been uploaded yet). When I mounted it back, the trouble began.

Trouble

It's taking hours to mount again. The systemd status shows "activating (start)", and the log shows hundreds of record pairs like this:

Dec 05 10:03:59 htpc rclone[20243]: INFO  : Files/Name/pic1.jpg: vfs cache: queuing for upload in 24h0m0s
Dec 05 10:03:59 htpc rclone[20243]: ERROR : Files/Name/pic1.jpg: vfs cache: failed to reload item: reload: failed to add virtual dir entry: file does not exist

Hundreds of them. I think it's logging an error for every single file, yet when I look in the cache dir, all of these files are there.

Question: Why is it telling me that they are failing to reload? Is something going horribly wrong?

Run the command 'rclone version' and share the full output of the command.

> rclone version
rclone v1.60.1
- os/version: ubuntu 18.04 (64 bit)
- os/kernel: 5.4.0-135-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.3
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

B2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

This systemd service has all the settings; the rclone config only has credentials.

[Unit]
Description=B2 Mount
AssertPathIsDirectory=/home/htpc/Merged

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount B2:[Bucket]/HTPC /home/htpc/Merged \
  --allow-other \
  --b2-chunk-size 50M \
  --b2-hard-delete \
  --bwlimit "08:15,1M:off 23:59,off" \
  --cache-dir=/home/htpc/RCloneCache \
  --config /home/htpc/.config/rclone/rclone.conf \
  --dir-cache-time 1h \
  --disable-http2 \
  --fast-list \
  --log-level INFO \
  --transfers 32 \
  --poll-interval 0 \
  --vfs-cache-max-age 8760h \
  --vfs-cache-max-size 1400G \
  --vfs-cache-mode full \
  --vfs-write-back 24h \
  --vfs-read-chunk-size-limit 500M \
  --rc \
  --rc-no-auth
ExecStop=/bin/fusermount -uz /home/htpc/Merged
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true
Restart=on-failure
RestartSec=10
User=htpc
Group=htpc

[Install]
WantedBy=multi-user.target

The rclone config contents with secrets removed.

[B2]
type = b2
account = [id]
key = [key]
endpoint =

A log from the command with the -vv flag

The log messages above are at INFO level. I'm wary of stopping rclone during this "activating" stage just to restart it with DEBUG level. Should I stop it anyway?

I think that means something got removed from the cache directory.

Did you delete/remove anything from the cache-dir?

I've seen the other threads about this, but I swear, I checked individual files that are mentioned in errors, and they are absolutely present in the cache dir. I haven't deleted a single thing.

I can pair and let you inspect the cache dir yourself.

I just decided to check permissions on a file which absolutely exists inside the cache dir, but shows up as "failed to reload" in the log, and I see the following:

-rw------- htpc htpc

Is this the issue?

Depends on the user you are running as.

The cache / file permissions are user only.

drwx------    2 felix felix   4096 Dec  5 08:23
-rw-------    1 felix felix 8214326306 Mar 14  2021

as a few examples.

Yeah, I run everything as htpc, and the unit has User=htpc / Group=htpc, so rclone should be running as htpc under systemd as well. It doesn't seem like permissions should be the problem.

I noticed that while rclone is "activating", it's actually cycling through all the files repeatedly, going over the same dirs and files again and again. It's not that it takes hours to start; it just keeps trying to reload each file in a loop. I see the same files flashing by in the log, each with the accompanying error.

I just stopped the mount and started it again, this time with the DEBUG log level. For each error it shows, here are the five surrounding messages:

Dec 05 13:01:59 htpc rclone[26558]: DEBUG : file.jpg: vfs cache: truncate to size=31799
Dec 05 13:01:59 htpc rclone[26558]: DEBUG : file.jpg: vfs cache: setting modification time to 2022-01-14 10:03:19.879732461 -0500 EST
Dec 05 13:01:59 htpc rclone[26558]: INFO  : file.jpg: vfs cache: queuing for upload in 24h0m0s
Dec 05 13:01:59 htpc rclone[26558]: ERROR : file.jpg: vfs cache: failed to reload item: reload: failed to add virtual dir entry: file does not exist
Dec 05 13:01:59 htpc rclone[26558]: DEBUG : Couldn't decode error response: EOF

If I go to my cache dir and check, the "file.jpg" is there.

The cache dir has the following file tree:

~/RCloneCache/vfs/B2{[somedigest]}/[bucket]/[path-inside-bucket]/…files…
~/RCloneCache/vfsMeta/B2{[somedigest]}/[bucket]/[path-inside-bucket]/…files…

Both dirs (vfs and vfsMeta) have files that show errors in logs.

Is B2 a polling remote? Did something on the other side go away?

@Animosity022 nothing changed on the remote side. I checked the b2 backend source and I don't think it has any polling support, so I disabled polling in my config (--poll-interval 0, see the OP).


Random crazy idea. So I'm looking at the code, and the items in cache are found via c.get(). This function has a comment above it:

// name should be a remote path not an osPath
func (c *Cache) get(name string) (item *Item, found bool) {
[…]

The B2 backend has an encoding parameter with the following default:

Default: Slash,BackSlash,Del,Ctl,InvalidUtf8,Dot

Could it be that the file path it's trying to find is being passed in a remote (encoded) form, with some characters altered, even though it's looking on the local filesystem?

Hmm, that does seem to make sense to me.

I think maybe @asdffdsa could give it a test and see as well? I don't use any S3 stuff personally.

What this error message means is that rclone tried to add a virtual directory entry for the file it is about to upload, so that the rest of the system believes it is really there.

However, it gave an error because the directory the file lives in isn't in the directory cache. Normally it would be, but in this reload scenario it isn't.

I think what is happening is that the directory the file is being uploaded into doesn't exist. This is probably because b2 doesn't actually have directories, and rclone does a bit of handwaving to make sure they appear to be there.

If you were to mkdir the parent directory of the file then it would probably recover OK.

Not sure what to do about this - rclone shouldn't go creating directories but in this case it probably needs to.

@ncw Ah, it's almost the same issue the old cache backend had. You made this discovery back in 2019, and there was even a GitHub issue that was closed when the transition to VFS happened. And this comment (second part). I guess VFS and B2 are just not friends at this time. :grimacing: Will try creating some dirs or look into other options.

For now, the first thing I did was back up the entire cache dir (1.8TB of stuff I currently don't have anywhere else) onto a second SSD. Glad I just installed that new SSD.

@ncw Just so I'm clear, you're suggesting to

  1. Start the VFS mount again
  2. In this mount, mkdir the parent directory of one of the erroring files that are in the cache (sketched below)
  3. See if this file no longer shows up among logged errors as it cycles through them

Should be doable, will try.
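Concretely, step 2 is just a mkdir inside the mount, e.g. for the file from the very first error pair (mount point taken from my unit):

mkdir -p /home/htpc/Merged/Files/Name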

Yes that's correct. That will at least prove what's going on.

Mounted.
Quickly created Merged/Files/Directory (before rclone got there).

Watching the logs, this still happened:

Dec 05 17:47:35 htpc rclone[29307]: DEBUG : Files/Directory/File (description).jpg: vfs cache: truncate to size=8290
Dec 05 17:47:35 htpc rclone[29307]: DEBUG : Files/Directory/File (description).jpg: vfs cache: setting modification time to 2022-11-29 06:13:55.679387594 -0500 EST
Dec 05 17:47:35 htpc rclone[29307]: INFO  : Files/Directory/File (description).jpg: vfs cache: queuing for upload in 24h0m0s
Dec 05 17:47:35 htpc rclone[29307]: ERROR : Files/Directory/File (description).jpg: vfs cache: failed to reload item: reload: failed to add virtual dir entry: file does not exist
Dec 05 17:47:35 htpc rclone[29307]: DEBUG : Couldn't decode error response: EOF

Something new I didn't notice before: according to the logs, a few files are actually being added successfully (apparently some got uploaded the very first time I mounted this VFS and copied files into it). Here's what that looks like.

To be clear, this file is present in both the cache and B2. The files that fail exist only in the cache.

Dec 05 17:57:18 htpc rclone[29668]: DEBUG : OtherFiles/Dir/Subdir/File.jpg: vfs cache: checking remote fingerprint "1150,2022-05-13 04:55:02.897 +0000 UTC,bc0d970545fa59f9057998d1be5bcd8b33dc9698" against cached fingerprint "1150,2022-05-13 04:55:02.897 +0000 UTC,bc0d970545fa59f9057998d1be5bcd8b33dc9698"
Dec 05 17:57:18 htpc rclone[29668]: DEBUG : OtherFiles/Dir/Subdir/File.jpg: vfs cache: truncate to size=1150
Dec 05 17:57:18 htpc rclone[29668]: DEBUG : vfs cache: looking for range={Pos:0 Size:1150} in [{Pos:0 Size:1150}] - present true
Dec 05 17:57:18 htpc rclone[29668]: DEBUG : OtherFiles/Dir/Subdir/File.jpg: vfs cache: setting modification time to 2022-11-19 03:47:53.211286808 -0500 EST
Dec 05 17:57:18 htpc rclone[29668]: INFO  : OtherFiles/Dir/Subdir/File.jpg: vfs cache: queuing for upload in 24h0m0s
Dec 05 17:57:18 htpc rclone[29668]: DEBUG : OtherFiles/Dir/Subdir: Added virtual directory entry vAddFile: "File.jpg"

Weird that they're queued for upload even though they're already in B2. :thinking: It might be because my ExecStartPost calls rc vfs/refresh recursive=true, which maybe updates the modification time? Not sure if that's true.

P.S. Is this entire log happening because of the rc vfs/refresh call in ExecStartPost?
Edit: no, I removed it and it behaves exactly the same.

Ok, I'm going hardcore. Recompiling the source with my own debug statements.
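Nothing fancy on the build side, roughly this (v1.60.1 to match my installed version; the cache code appears to live in vfs/vfscache/cache.go):

git clone https://github.com/rclone/rclone
cd rclone
git checkout v1.60.1
# edit vfs/vfscache/cache.go, then:
go build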

Added this line:

func (c *Cache) reload(ctx context.Context) error {
  fs.Debugf(nil, "RELOAD DEBUG: len(c.item): %v", len(c.item)) // <-- this line
  …

Ran it on my box and found that c.item is empty. Trying more things.

2022/12/05 19:29:32 DEBUG : RELOAD DEBUG: len(c.item): 0

If you have suggestions what I should try, let me know.

So I dug super deep, and discovered the following:

When I start this VFS mount, here's what happens:

  1. cache.reload is called and starts walking the cache directory
  2. A file is found in the cache dir, and since c.item hasn't been populated yet, item.reload is called
  3. This calls vfs.AddVirtual with the remote path of the item
  4. That calls vfs.StatParent on the remote path
  5. That calls dir.Stat on the remote directory path, and dir.stat on the leaf
  6. dir.stat calls dir._readDir to populate the directory with items
  7. _readDir reads the items directly from B2 (not from the local cache) :question::question::question:
  8. B2 doesn't have these items, because rclone never uploaded them from the cache
  9. dir.stat returns ENOENT, which propagates all the way up to the original cache.reload
  10. The error is printed

And then I realized: the error is printed, but that doesn't necessarily mean anything broke. Maybe the error just means that a file found in the cache doesn't exist in the remote location yet. My hopes are up at this point.
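A quick way to see this on disk and on the remote (paths are illustrative, built from the cache tree and mount shown above; rclone lsf just lists what B2 actually has):

ls -l ~/RCloneCache/vfs/B2{[somedigest]}/[bucket]/HTPC/Files/Name/
rclone lsf B2:[Bucket]/HTPC/Files/Name/

The first command shows the cached copies; the second comes back empty, because nothing has been uploaded yet.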

So I decided to ignore the errors and double-check what exactly was happening with my systemd startup.

I started the unit again, and this time I realized that systemd was timing out, killing rclone, and restarting it over and over, never letting it finish starting. That's why the logs appear to show rclone walking the cache endlessly: systemd never let it get through the reload even once.
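The fix for that part is straightforward: raise the unit's start timeout. Something along these lines (900 seconds is my reading of "15 mins"; the directive itself is standard systemd):

[Service]
TimeoutStartSec=900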

With the start timeout raised to 15 minutes, after all the errors flew by, it finally mounted. I had high hopes now. But they were crushed.

The mounted directory only has files/folders that are present on B2, and none of the cached ones. :frowning_face:

So there's still a bug, and it might be related to the behavior I described above. Is that intended behavior? Or maybe it's still the B2 directories issue. :man_shrugging: