Testing for new vfs cache mode features

I don't think there is much magic other than running a lower dir cache time and let it expire. You could script rc command but that feels like overkill.

Good news! I'll merge the current branch into master tomorrow :slight_smile:

4 Likes

I finally want to test this but I'm a bit lost. What is the link to the real latest beta in this branch?

It might be a good idea to keep this link updated in the first post.

  --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' \
  --config=/home/user/.config/rclone/rclone.conf \
  --allow-other \
  --fast-list \
  --vfs-read-chunk-size=10M \
  --vfs-read-chunk-size-limit=0 \
  --buffer-size=0 \
  --poll-interval=1m \
  --no-modtime \
  --drive-pacer-min-sleep=10ms \
  --dir-cache-time=24h \
  --timeout=10m \
  --umask=002 \
  --union-search-policy=eprand \
  --log-level INFO \
  --log-file /opt/rclone.log \
  --async-read=false \

This is my current settings. I have read that it's not possible to use buffer-size=0 is this still true?

I want the settings that gives me the minimum bandwidth per open file as possible. So I can open as many files as possible

I've merged the new VFS to master now which means it will be in the latest beta in 15-30 mins and released in v1.53 :smile: - specifically from https://beta.rclone.org/v1.52.2-186-g96c2fdb4-beta/ onwards.

It has been quite a journey - the first commit was made on 12th Feb 2020 and I merged all 40 commits on 30th June.

 49 files changed, 7552 insertions(+), 2287 deletions(-)

I've worked very hard to make it super reliable. There will undoubtedly be more features to come to address the issues that have come up!

Thank you to you all for testing and feedback!

-- Nick Craig-Wood - rclone maintainer - please consider supporting the project - thanks!

12 Likes

I tried the very early version and found that files wouldn't upload in Windows, folders would create but nothing would ever go up. I gave the very latest version (before the master beta) on Windows a go and I couldn't get it to break, worked flawlessly.

Rclone is an excellent tool and this makes it even better which I didn't think possible.Thank you so much for your work and this new feature.

1 Like

Is there any way to handle renames of files while they are being uploaded? It seems to be renamed fine in cache but the upload to the remote fails with these errors:

2020/06/30 18:07:47 ERROR : test.bin.partial~: Failed to calculate src hash: hash: failed to open: open /home/darthshadow/rclone/vfs/TD/vfs/PersonalTD/Media/test.bin.partial~:
no such file or directory
2020/06/30 18:07:47 ERROR : test.bin.partial~: corrupted on transfer: MD5 hash differ "" vs "2dd26c4d4799ebd29fa31e48d49e8e53"
2020/06/30 18:07:47 INFO  : test.bin.partial~: Removing failed copy
2020/06/30 18:07:48 ERROR : test.bin.partial~: vfs cache: failed to upload try #1, will retry in 1m0s: vfs cache: failed to transfer file from cache to remote: corrupted on transfer: MD5 hash differ "" vs "2dd26c4d4799ebd29fa31e48d49e8e53"

In the above scenario, these were the steps performed:

  • Create a test.bin.partial~
  • Observe that it starts getting uploaded.
  • Rename to test.bin

It does succeed on the second try, but that results in it being uploaded twice. Any chance the rename could be applied post-upload and handled for the post-upload checksum too?


Also, would it be possible to convert the INFO logs for cache expiry of the metadata file to DEBUG instead? I think a single log for the cache expiry of the actual file should probably be sufficient for most users.

Thanks Nick for your work on this! It's a huge improvement over the old cache backend in every possible way!

Just to throw a few feature requests in here before this thread is wrapped up: I'd love to see RC endpoints for the VFS cache. Specifically, one that lists the contents of the cache (along with their expiry time) along with the total size and configured max size and expiry time. The other one would be an endpoint to manually remove items from the cache.

Thank you :smiley:

Both great ideas - I'll add them to the list when I do the vfs cache API which should be soon!

I managed to replicate that no trouble.

Probably my preferred approach to this would be to cancel the upload and restart it under the new name.

What do you think of that approach? This is likely to happen at the start of an upload I think.

The other approach requires pending renames which rclone does do for non vfs cached files. I took it out for the vfs full/writes cache as it is has rather a lot of edge cases!

The idea of the --vfs-write-back delay (default 5s) is that it gives apps time to close the file then rename it before rclone starts uploading it. You could also try increasing that time. Which app are you having the problem with out of interest?

I think you mean these ones

vfscache/item.go:               fs.Infof(item.name, "vfs cache: removed cache file as %s", reason)
vfscache/item.go:               fs.Infof(item.name, "vfs cache: removed metadata from cache as %s", reason)

I think they could both be debugs probably - what do you think?

It is now in the latest beta :slight_smile:

That will work fine now.

These don't affect memory use at all - they are the chunk sizes rclone requests from the upstream so you can make them bigger or use the defaults.

You could add --use-mmap which makes rclone much better at returning buffers to the OS, though with --buffer-size 0 it won't have any effect...

1 Like

I would still prefer the post-upload rename if you can make it work because the rename sometimes happens a couple of minutes into the upload which is wasted bandwidth. However, if there are edge cases with that which are difficult to resolve, then this approach is fine with me too, as long as it's not uploaded completely twice.

I would have thought so and did increase it to 30s but that didn't solve it. It did work with 2m eventually and even that seems random in the amount of time after the write that the rename occurs.

This is with Radarr which writes partial files and then renames it to the non-partial variant after the write is done.

The cache expiry log was info level earlier too, so I am not sure whether you want to make it debug now. It may be useful for someone. My only interest is to reduce the number of logs unless there is a case where the actual file and its metadata can expire at different times.


Another issue on another system that I have observed is when the modification time of the file is updated after a rename which results in the following errors:

2020/07/01 19:32:05 INFO  : storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/hk/n5lbitsg2jt3egg3m2otcmt2qrmugaoxcodn7l3l3tordogpmq.sj1: Renamed in cache to "storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/hk/n5lbitsg2jt3egg3m2otcmt2qrmugaoxcodn7l3l3tordogpmq.sj1"
2020/07/01 19:32:06 ERROR : storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/hk/qich57lbblrcv75tligqhvnwrhkdvxdvb4q35uzrbmppx5n47q.sj1: Failed to set modification time of cached file: chtimes /home/darthshadow/rclone/vfs/StorjDrive2/vfs/StorjZen-2/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeh
a4yx74n5aaaaaaaa/hk/qich57lbblrcv75tligqhvnwrhkdvxdvb4q35uzrbmppx5n47q.sj1: no such file or directory
2020/07/01 19:32:06 ERROR : storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/hk/qich57lbblrcv75tligqhvnwrhkdvxdvb4q35uzrbmppx5n47q.sj1: vfs
cache: setModTime: failed to save item info: vfs cache item: failed to write metadata: open /home/darthshadow/rclone/vfs/StorjDrive2/vfsMeta/StorjZen-2/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/hk/qich57lbblrcv75tligqhvnwrhkdvxdvb4q35uzrbmppx5n47q.sj1: no such file or directory

Let me know if you need the reproduction steps so I can enable debug logging for getting those.

OK I'll have a think

Can you work out the exact sequence of events here?

Is it this

  • Radarr opens partial
  • Radarr writes to file
  • Radarr closes file
  • 5 seconds pass
  • Rclone starts upload
  • 2 minutes pass
  • Radarr renames partial file

Or this - names radarr closes the file just before the rename. That is what I'd expect it to do.

If rclone is uploading the file when it is still open then that is a bug unless the file hasn't been modified.

  • Radarr opens partial
  • Radarr writes to file
  • 5 seconds pass
  • Rclone starts upload
  • 2 minutes pass
  • Radarr closes file
  • Radarr renames partial file

I guess this is a plausible sequence with radarr re-opening the file but not writing to it. You should be able to tell what is happening from the DEBUG logs.

  • Radarr opens partial
  • Radarr writes to file
  • Radarr closes file
  • 5 seconds pass
  • Rclone starts upload
  • Radarr opens partial
  • 2 minutes pass without radarr writing data
  • Radarr closes file
  • Radarr renames partial file

OK I made the metadata expiry a debug. It is unlikely you'll see anything but them both being removed in normal operation so I think debugging the metadata removal is a good idea.

If you can work out a sequence to repro that, that would be super helpful. I tried some obvious things but I couldn't reproduce the error.

I was speaking about network bandwidth, I don't really care about memory usage.

So I finally updated my mounts!! With this settings:

--user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
--config=/home/user/.config/rclone/rclone.conf
--allow-other
--fast-list
--vfs-read-chunk-size=10M
--vfs-read-chunk-size-limit=0
--buffer-size=0
--poll-interval=1m
--no-modtime
--drive-pacer-min-sleep=10ms
--dir-cache-time=24h
--timeout=10m
--umask=002
--union-search-policy=eprand
--log-level DEBUG
--log-file /opt/rclone.log
--async-read=false
--vfs-cache-mode full
--vfs-cache-max-size 800G
--vfs-cache-max-age 72h
--cache-dir /mnt/cache
--rc
--rc-addr=localhost:5572
union: /mnt/remote

I think it's using much more bandwidth than before. Is it doing some kind of read ahead/async reading regardless of my settings?

I plan to share the impact of my API usage as soon I have enough data

Also I'm using v1.52.2-192.g8d5bc7f2-beta hopefully it's the right one!

UPDATE: Within less than 10 minutes I had to revert to old settings.

Files weren't opening at all, lots of buffering, I started seeing 403 errors in the log about being rate limited.

I don't know what I did wrong, or if I was using the wrong rclone version...but for me this cache mode didn't work :frowning:

With --buffer-size 0 it shouldn't be doing any more readahead.

That should be a good version.

I suspect the cache mode full could use more bandwidth but it shouldn't use that much more bandwidth. It is more CPU / disk IO intensive though and you told me that your servers are very busy so I suspect it might be more that that is the problem.

We'll get there in the end I'm sure!

Does this mean that even if say you have a buffer-size set to 64M, if a program opens a file and reads something from say the first 5MB of it and then closes the file, only 5M or so will be read and then committed to cache and the rest of the buffer will not be filled unnecessarily (and also not committed to cache)?

EDIT: looking at the sizes in the cache it looks like this is indeed what happens.

It depends on whether the program is reading faster than the network. If it is then it will keep up and the buffer will never have any excess in it. If not (let's say a video playing) then the excess network speed will fill up the buffer. This will be written to disk even if the program doesn't read it, so the transfer isn't wasted.

1 Like

Sent the logs via PM. From what I can understand, it seems to be the 3rd possibility, i.e. opening partial without writing and then renaming it.

Great. Another enhancement would be to add a log for when the file has been copied to the cache. Perhaps the Copied logline can be used then, and instead, log something like "Uploaded" when the copy to the remote is done?

Sent the logs via PM. Nothing obvious jumped out to me as the cause for it.

Yes it looks like it.

I've implemented a fix for this. I'm afraid it is a cancel and retry fix rather than delayed rename. I decided the latter was too complicated as we are now relying on the cache to be a true representation of what we want the remote to be, so we we delay the rename of the file it will have many knock-on consequences.

Here is the fix for that - let me know how it goes!

v1.52.2-195-gf8137215-vfs-beta on branch vfs (uploaded in 15-30 mins)

There isn't a moment when we say "the file is downloaded to the cache" at the moment, just streams of files which get opened etc. I could do a log the first time stuff got downloaded to the file maybe?

The "Copied" log lines come from operations.Copy which is the workhorse or rclone to copy two objects. If I changed logs I'd have to change them here which would affect all of rclone...

No me neither... Do you remove files from the cache externally? Or start a second rclone with the same VFS cache?

Working well, thanks. Minor nitpick:
There seem to be multiple error log entries for the context cancel, any chance they can be reduced? These 2 seem pretty similar too:

2020/07/04 12:13:25 INFO  : file.partial~: vfs cache: upload cancelled sucessfully
2020/07/04 12:13:25 INFO  : file.partial~: vfs cache: cancelled upload

That's a no-go then.

That also wouldn't be very helpful since I want to see when the full file has gotten copied to the cache.

Can something be logged at the part where you add the files to the queue to be uploaded? I think that would be a pretty good point to indicate that the file has gotten copied fully.


Neither of those. This happens on a regular mount.
Btw, this used to happen with the older vfs cache too so not something that was introduced as part of these changes. Perhaps I should open a new thread about this?

1 Like