New feature: CACHE

seuffert · December 14, 2017, 1:26pm

It seems that remusb has found the culprit and is working on a fix. Please see https://github.com/ncw/rclone/issues/1904 for further updates on this issue.

Dimtar · December 15, 2017, 4:00am

To confirm: this cache feature is mostly meant to help things that access information a lot, such as Plex, by reducing API hits? It’s not going to do anything for writes?

amaklp · December 15, 2017, 1:22pm

For writes, I think the mount command with the new vfs-cache modes will improve things a lot.

Here’s a useful comment from ncw

remus.bunduc · December 18, 2017, 3:52pm

Here is a new beta that should have fixed the issue with disappearing files. I had to refactor how file info is cached though, since there was no way to fix it in the old model. I can’t say I’m 100% happy with the caching as it is now either, but as a near-term solution it should work out fine.

Some key points:

Cached file info is no longer deleted. Each entry simply has an expiry timestamp associated with it, and the next listing checks that timestamp and ignores the entry if it has expired.
When a file or folder expires, all its parents up to the root expire too. This is less than ideal but, like I said, it works as a compromise. My next milestone is to actually integrate with change notifications from the cloud provider where they exist (gdrive has them).
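
To make the two points above concrete, here is a minimal sketch of timestamp-based expiry with propagation to parents. It is a toy model only: the class and method names (`InfoCache`, `put`, `is_fresh`, `expire`) are hypothetical and not taken from rclone’s code, and the real implementation may differ.

```python
import time
from pathlib import PurePosixPath


class InfoCache:
    """Toy model of timestamp-based expiry; NOT rclone's actual code."""

    def __init__(self, info_age_seconds):
        self.info_age = info_age_seconds
        self.expires_at = {}  # path -> absolute expiry time in seconds

    def put(self, path, now=None):
        """Store (or refresh) an entry; it stays fresh for info_age seconds."""
        now = time.time() if now is None else now
        self.expires_at[str(PurePosixPath(path))] = now + self.info_age

    def is_fresh(self, path, now=None):
        """A listing would use the entry only if it has not expired yet."""
        now = time.time() if now is None else now
        expiry = self.expires_at.get(str(PurePosixPath(path)))
        return expiry is not None and now < expiry

    def expire(self, path, now=None):
        """Expire an entry and every parent up to the root.

        Entries are kept in the map; only their timestamp changes.
        """
        now = time.time() if now is None else now
        p = PurePosixPath(path)
        for node in [p, *p.parents]:
            key = str(node)
            if key in self.expires_at:
                self.expires_at[key] = now


cache = InfoCache(info_age_seconds=60)
for p in ["/", "/Plex", "/Plex/Movies", "/Plex/Movies/a.mkv", "/Plex/TV"]:
    cache.put(p, now=0)

cache.expire("/Plex/Movies/a.mkv", now=10)
print(cache.is_fresh("/Plex/Movies", now=10))  # parent was expired too
print(cache.is_fresh("/Plex/TV", now=10))      # sibling branch is untouched
```

In this model a sibling branch such as `/Plex/TV` keeps its cached info, while everything on the path from the changed file to the root has to be listed again.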

I would strongly recommend clearing the cache before running the next version.
https://beta.rclone.org/v1.38-238-g6f1ae00c/

kelinger · December 18, 2017, 5:47pm

Thanks, @remus.bunduc. Testing it now!

kelinger · December 18, 2017, 6:59pm

Before I draw any conclusions, I want to make sure I’m clearing the cache correctly. Basically, I’m doing my mount with the --cache-db-purge option. Should I also be clearing any files or running additional options? The reason I ask is that, if you look at the stats below, the directory listing is taking almost 15 minutes the first time (in earlier betas it was around 1.5 - 2 minutes). Second, the subsequent directory listing takes even longer. In fact, each time I run it (even if I reboot in between) another couple of minutes gets added, and this is with purging the cache DB on each mount.

However, I can say that it looks like files are not being lost (which is preferable to having a fast process that loses files)!

Precache directory structure:
time ls -R rclone/ > /dev/null
13 minutes, 47 seconds

Verify directory structure (should run off cache):
time ls -R rclone/ > /dev/null
15 minutes, 17 seconds

Copy a large file to temp:
time cp -R rclone/Plex/Movies/10\ Cloverfield\ Lane\ \(2016\)/10\ Cloverfield\ Lane\ \(2016\).mkv /tmp
39 seconds

Re-copy a large file to temp (should be cached):
time cp -R rclone/Plex/Movies/10\ Cloverfield\ Lane\ \(2016\)/10\ Cloverfield\ Lane\ \(2016\).mkv /tmp
37 seconds

remus.bunduc · December 18, 2017, 10:08pm

@kelinger are you doing any writes to the mount during the listing or between the two listings? Second, what’s your file age set to?

As for copying, that really should be no different from past releases. This beta doesn’t touch anything for reading. Naturally, we can’t 100% rule out an issue, even with the tests performed before.
Silly question, but are you sure you’re wrapping a cloud remote and that you’re copying from its mount point?

Here’s a similar test with rclone running on a Mac (my actual library box):

mediabox:cache-backend remus$ time cp /mnt/gdrive/test/jellyfish-110-mbps-hd-h264.mkv ~/Downloads/

real	1m29.061s
user	0m0.003s
sys	0m0.448s
mediabox:cache-backend remus$ time cp /mnt/gdrive/test/jellyfish-110-mbps-hd-h264.mkv ~/Downloads/

real	0m1.103s
user	0m0.003s
sys	0m0.333s

Oh, and to answer your question: --cache-db-purge deletes everything rclone stored previously (both chunks and file info), so there’s no need to delete anything manually.

kelinger · December 18, 2017, 10:16pm

No writes to the mount. This is actually a fresh Linux (Ubuntu 16.04) install with not much beyond all updates for Ubuntu and rclone being installed at this point. I’m not even running the copy in the background (ie, not using multiple terminals or tmux, etc.).

These are the only parameters I’m setting:
[GoogleCache]
type = cache
remote = Google:
chunk_size = 5M
info_age = 15m
chunk_age = 3h
warmup_age = 24h

Basically rclone cache is mounting from the Google rclone config. No other wrapping involved. The mount point is ~/rclone.

If my results still seem off (that is, there isn’t another option I should be using/trying or I’m not doing anything unusual) then doing a reformat and fresh Linux install is not out of the question, especially if it helps! I won’t do that until you confirm there’s nothing else you want me to try first.

remus.bunduc · December 18, 2017, 10:25pm

OS reinstall shouldn’t change anything, especially on a fresh install with stable stuff on it.

Your config seems a bit outdated:

[cache]
type = cache
remote = google:
chunk_size = 5M
info_age = 24h
chunk_total_size = 20G

That is what I have, minus the Plex info. Try adding a chunk_total_size value there. It’s quite possible that nothing is allowed to stay in the cache when no value is set. The default is 20G, but I never understood whether rclone plays well when values are left unspecified. A -v output should clarify what you’re using (here’s mine):

mediabox:~ remus$ rclone mount -v --allow-other crypt: /mnt/gdrive --cache-writes --cache-db-purge
2017/12/18 23:47:50 INFO  : cache: Storage DB path: /Users/remus/Library/Caches/rclone/cache-backend/cache.db
2017/12/18 23:47:51 INFO  : cache: Chunk Memory: true
2017/12/18 23:47:51 INFO  : cache: Chunk Size: 5M
2017/12/18 23:47:51 INFO  : cache: Chunk Total Size: 20G
2017/12/18 23:47:51 INFO  : cache: Chunk Clean Interval: 1m0s
2017/12/18 23:47:51 INFO  : cache: Workers: 4
2017/12/18 23:47:51 INFO  : cache: File Age: 24h0m0s
2017/12/18 23:47:51 INFO  : cache: Cache Writes: true
2017/12/18 23:47:51 INFO  : cache: deleted (0) chunks
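
For anyone comparing their own config against the one above, here is a small sketch that reports which of those keys a cache remote leaves unset. It assumes the rclone config can be read as plain ini text with Python’s `configparser`. The helper name `missing_cache_keys` and the list of expected keys are illustrative (the keys are simply the ones from the config in this post), not an official list of required options.

```python
import configparser

# The config posted earlier in the thread, used here as sample input.
SAMPLE = """
[GoogleCache]
type = cache
remote = Google:
chunk_size = 5M
info_age = 15m
chunk_age = 3h
warmup_age = 24h
"""

# Keys taken from the config shown above; an assumption, not an official list.
EXPECTED_KEYS = ("chunk_size", "info_age", "chunk_total_size")


def missing_cache_keys(config_text, expected=EXPECTED_KEYS):
    """Return {remote_name: [missing keys]} for every remote of type 'cache'."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    report = {}
    for name in parser.sections():
        section = parser[name]
        if section.get("type") != "cache":
            continue  # skip crypt, drive and other remote types
        report[name] = [key for key in expected if key not in section]
    return report


print(missing_cache_keys(SAMPLE))
```

An empty list for a remote means all three keys are set explicitly; it says nothing about whether the values are sensible.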

kelinger · December 18, 2017, 11:12pm

Well, that’s something different!

With these changes, the initial file cache took 5 minutes, the first file copy 2m 52s, and the second file copy 21s.

So my problem (besides the disappearing files which now seem to be fixed) was using outdated parameters. Adjusting to match yours, things are now looking quite good.

Thank you for pointing me in the right direction and great job on the ghost files!

djsecrist · December 19, 2017, 2:18am

I think the burning questions are:

How big are your movie libraries?
How does the Plex scan feature perform?
How does the Plex integration work? What should the Plex settings be? How quickly does it identify and add new movies when placed on the file system? What about if placed from outside the cache?
Has the Google ban risk been completely mitigated?
How does the movie starting, seeking, and watching compare to PlexDrive?
Have you tested it with Radarr and Sonarr?

Animosity022 · December 19, 2017, 4:36pm

I was giving it a whirl with a pretty basic config of:

[gmedia]
type = crypt
remote = GD:media
filename_encryption = standard
password = something
password2 = something

[cache]
type = cache
remote = gmedia:
chunk_size = 5M
info_age = 24h
chunk_total_size = 20G

I did a mount via:

./rclone mount -v --allow-other gmedia: /GD2 --cache-writes --cache-db-purge

and just ran ls -alR | wc -l to see what happened. I have ~30TB on my GD and noticed my API hits start to climb to 4-5 per second, so I stopped the process. Is there more config I should be using? As it stands, it looks like this would generate too many API hits and risk a ban.

I’ve been running plexdrive for months without a ban or actually any problems so I want to test, but I’m a little hesitant.

 ./rclone  -V
rclone v1.38-238-g6f1ae00cβ
- os/arch: linux/amd64
- go version: go1.9.2

remus.bunduc · December 19, 2017, 8:32pm

So yeah, it’s a tricky thing. There’s no way to be sure about getting a ban or not. Most of the bans I got weren’t even documented by Google, so trying to anticipate them is impossible.

To your case: do you actually need to list everything recursively? That is a greedy and very fast approach to warming up the cache. I prefer to do it more lazily and warm it up only when and if needed. This applies especially to listings: Plex or any other reader will wait for the listing to complete, and it usually happens in the background too, so the user doesn’t get a bad experience.

Try it this way: start the mount, start Plex and leave it on its own. If you have the library loaded, even better. If you don’t have it loaded then Plex will need to scan it, but it will do so at a slower pace than an ls -R. Check the request count during this and please tell me where you see the request count. I found the Dev Console on GDrive to be too simple and lacking in info, especially real-time info.

Animosity022 · December 19, 2017, 8:42pm

Sadly, I was just using the API console to validate the hits:

https://imgur.com/TrnmnLz

I only did the recursive listing because I know Plex scans down the directory tree.

I also do that to validate I have a good chunk of files in my mounts before starting up plex so it doesn’t blow up and/or remove files (I have library delete off but I’m cautious).

felix@plex:/TV$ time ls -alR | wc -l
18661

real	0m6.636s
user	0m0.088s
sys	0m0.092s
felix@plex:/TV$ time ls -alR /Movies/ | wc -l
8186

real	0m1.856s
user	0m0.020s
sys	0m0.060s
felix@plex:/TV$

On plexdrive, it normally builds up the cache on the first startup and that takes maybe ~15-20 minutes, and I rarely see any API hits anymore on full scans or anything since it’s all in the cache. I was more concerned about whether I should be throttling the mount via the tpslimit or something to ensure I don’t get a ban, as I’ve never got one and didn’t want to start a trend.
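
To illustrate what a transactions-per-second cap does in principle, here is a minimal sketch of a limiter that spaces calls at least 1/tps seconds apart. It is not how rclone implements its tpslimit option; the class name `TpsLimiter` and the injectable `clock` and `sleep` parameters are hypothetical and exist only to make the example self-contained and testable.

```python
import time


class TpsLimiter:
    """Spaces calls at least 1/tps seconds apart (illustrative only)."""

    def __init__(self, tps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / tps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next transaction is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage: call wait() before each API request.
limiter = TpsLimiter(tps=4)
start = time.monotonic()
for _ in range(5):
    limiter.wait()
print(f"5 calls took {time.monotonic() - start:.2f}s")
```

With a cap of 4 per second, a burst of requests like the recursive listing above would be stretched out over time rather than hitting the API at 4-5 requests per second or more.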

remus.bunduc · December 19, 2017, 8:53pm

If you don’t want to break that trend I would wait a bit more. It’s still marked as experimental even though I don’t think there will be that many updates until it gets in 1.39.

One ban isn’t permanent though, and if you want to try it out, you can share your results here as feedback for others.

kelinger · December 19, 2017, 10:02pm

The reality is that when 1.39 comes out, the comparisons to PlexDrive are going to be posted minutes later. PlexDrive is a one-trick pony but its trick is going to be hard to beat. That is, it does one thing and it does it really, really well. If you’re mounting an unencrypted volume from Google for use by Plex, PlexDrive has been the go-to for many people these past 5 months or so.

While I, too, have been using PlexDrive since the pre-1.0 days, I am looking forward to the simplification of not having to mount then re-mount in order to get access to encrypted data or having yet a third mount to handle read-write with something like UnionFS and a bunch of scripts. Likewise, there are users of other cloud services that couldn’t reap the benefits of PlexDrive.

PlexDrive does have a different way of caching filenames and data, though, so it will be important that any true comparisons go beyond that of just “API hits” and retrieval speed. In order to make it more “apples to apples” you’ll want to adjust some of the defaults on either side to match as closely as possible. That may include pre-caching the files with the ls -R on Rclone’s side.

Stokkes · December 20, 2017, 2:29am

@remus.bunduc

Didn’t want to post an issue on GitHub, but I’m having a dilemma with the new cache that I feel many will have once it’s released.

Currently, if you have a cache remote mounted (let’s say for Plex) and you are actively getting new media (via whatever means that is), you can’t execute an rclone copy or rclone move to copy/move those files to the cache remote (you’ll get an error that the DB is in use).

The dilemma comes from the info_age duration and handling writes. Sure you could attempt to move the files directly to the mount, but if there are no retries, this could fail and you’d never know it.

If you move the files to another remote that uses the same cloud provider, the cache won’t pick up those files (since they’re modified outside of the cache).

I’ve noticed that various applications that use the cache mount (Plex, Sonarr, Radarr) that do a lot of file operations will slow down to a crawl (and cause the whole system to really slow down).

So the dilemma is… How do we get our new files added to a cloud remote in a robust way and have these appear in the cache?

Set info_age too low (5m) and you’re rebuilding your cache all the time, which will probably result in bans and/or slow your system down pretty much all the time.
Set info_age too high and, for any move/copy operations you do on a different remote with the same cloud provider, you won’t see the files until the cache expires for the particular directories that were modified or had files uploaded to them.
Add files directly to the mount and you risk something going wrong, the upload not being retried and the file never reaching the remote, leaving you in a state of limbo.
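
The first two options can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes every directory is re-listed once each time its info expires, and it ignores parent expiry, file reads and any other API traffic. The directory count of 1000 is a made-up figure, and the helper names are hypothetical.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_age(text):
    """Parse simple values such as '5m' or '24h' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def info_age_tradeoff(info_age, directories):
    """Crude upper bounds implied by a given info_age (assumptions above)."""
    age = parse_age(info_age)
    return {
        # A file uploaded outside the cache can stay invisible this long.
        "worst_case_visibility_delay_s": age,
        # Each directory is re-listed at most once per expiry period.
        "max_relistings_per_day": directories * (86400 // age),
    }


for age in ("5m", "24h"):
    print(age, info_age_tradeoff(age, directories=1000))
```

Under these assumptions, moving from 24h to 5m cuts the worst-case delay from a day to five minutes but multiplies the possible re-listings by 288.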

Could --vfs-cache-mode writes help in this regard? I’m not sure.

Looking for ideas to get new media into my cloud provider and having them appear in the mounted cache remote as soon as they are uploaded.

Dimtar · December 20, 2017, 9:02am

Is there a way to confirm that rclone can connect to the Plex instance?

FoGBaV · December 20, 2017, 9:49pm

Has anyone tried the latest beta on Windows?

I am only getting an empty drive …

kelinger · December 21, 2017, 9:37pm

Quick question about Plex integration:

Since I’m going to mount the drive as a service/daemon BEFORE I start the Plex server, will it present a problem that rclone cannot immediately connect to Plex when the mount is created? Or, once Plex itself is started, will rclone (eventually) notice?