I've looked at rar2fs before, and it is a very enticing idea.
I think for me the big killer is that it can't support writing at all (the RAR licensing only permits third-party tools like rar2fs to read archives, not create them). However, you could always work around this by having a different part of the system create and update the archives, since that is pretty easy to script...
Is anyone aware of a way to work around this? I don't really understand why I couldn't just lean on my normal WinRAR installation to do that part, rather than rar2fs needing to do it internally (and not being allowed to because of the licensing problem).
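To expand on the scripting workaround, here is a rough sketch of what the "create and upload elsewhere" part could look like. The remote name and paths are just placeholders, and it assumes the regular rar command-line binary is available next to rclone:

```
# Pack a folder into a rar archive locally (the rar binary ships with WinRAR),
# then push the result up to the Cloud remote with rclone.
rar a -r /tmp/photos-2019.rar /data/photos/2019/
rclone copy /tmp/photos-2019.rar gdrive:archives/photos/
```

That way rar2fs only ever needs read access to what is already sitting on the remote.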
Anyway, I think you have identified the main problem here - that so many files need to be accessed and read just to get at their content information. Most Cloud services that aren't premium pay-per-action types tend to limit the number of requests/transfers per second. For Gdrive, for example, this is about 2-3/sec. So if you then have to read a handful of bytes from thousands of files (or more), it is going to take a lot of time regardless of your CPU or bandwidth...
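To put rough numbers on it: at ~2.5 requests per second, just opening 5,000 archives to read their headers is already 5,000 / 2.5 = 2,000 seconds - over half an hour - before a single byte of actual content has been transferred.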
I can see a few potential solutions...
The simplest possible thing would just be to make fewer, larger archives. The size of an archive doesn't make it much slower to grab the metadata from; it's the number of separate file accesses that hurts. So if it is practical to bundle more data together into bigger archives, the time spent scales down pretty much linearly with the archive count.
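Using the same rough numbers as above: merging 5,000 archives into 500 bigger ones would cut that initial scan from ~33 minutes down to ~3-4 minutes.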
The ideal solution would be to have a local cache of the metadata. Since rclone gets the hash value of each file when it lists them, it would be simple to compare those against the existing cache. Is the hash still the same? Then we know the data is the same as before and we don't have to access the file at all. Only new or changed files would have to be accessed - which would obviously speed up the process immensely, or even eliminate it entirely in a lot of cases. You'd still have to list everything at some point - but only once, until the file actually changes...
However, this would probably require some integration between rar2fs and rclone so the two programs can coordinate (i.e. rclone would probably need to own the metadata cache, since rclone is the one that knows the hashes, while rar2fs would need to be aware of the cache and use it to make optimized choices). While this would be a near-perfect solution I think - it would need some work to make happen.
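Just to illustrate the hash-comparison principle (this is not how an actual integration would look - rar2fs knows nothing about any of this today): rclone can already hand you the hashes in a normal listing, which is far cheaper than opening every archive, so spotting new or changed archives is basically a diff against the listing from last time. The remote name is a placeholder; Gdrive reports md5 natively, so nothing gets downloaded:

```
# Grab the current hash listing from the remote (a listing operation, not a
# per-file open), sorted so it can be diffed against the previous run
rclone md5sum gdrive:archives | sort > new.txt

# old.txt is the listing saved from the previous run, produced the same way.
# Lines only present in new.txt = archives that are new or changed, i.e. the
# only ones whose headers actually need to be read again.
comm -13 old.txt new.txt
```

In an actual integration, rar2fs would only need to read headers (via the mount) for the archives that show up in that diff, and could serve everything else straight from the cached metadata.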
One general optimization tip that should apply to any solution is to enable "quick open information" (under Compression --> Options in WinRAR). I recommend setting this to "enable for all files". It is not on by default, and the only downside is a (quite trivial) size increase in the archive.
What does it do? Well, it adds a duplicate of the file listing, with all the metadata, right at the front of the archive. Normally that information is spread around many different places in the file.
This usually doesn't matter so much when reading from an SSD or even an HDD, since those can random-access the file very quickly, but when reading an archive from a Cloud the benefit is immense. Rather than having to do dozens or hundreds of seeks (which are fast in rclone compared to opening new files, yet still hundreds of times slower than on a local drive), we can just read one block of data and get everything we need.
Just make a large and complex archive, compress it once with this off and once with it on, then try to open each via a mount. With the feature disabled you might have to wait several minutes for the contents to show; with it on, it is nearly instant. I still use it for all rar files - especially anything that might go to my cloud storage.
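If you'd rather run that test from the command line than the GUI, I believe the relevant switch is -qo (please double-check against your rar version's help - I normally set this in the WinRAR dialog):

```
# Same source data, once without and once with quick open information for all files
rar a -r -qo- no-quickopen.rar /data/media/some-folder/
rar a -r -qo+ with-quickopen.rar /data/media/some-folder/
```

Put both archives behind a mount and compare how long each takes to list.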
This doesn't fundamentally solve the problem that each file still needs to be accessed once, but at least each one will list much faster - especially archives that contain lots of files.
You will have to recompress the archive to enable this option, I think. In theory it might be possible to add the quick open records to an existing archive after the fact, but I'm not sure that's an option anywhere. In any case it necessitates a re-upload to the cloud, since the file changes...
I've kind of been hoping we could get a rar2fs integration at some point, as it would be a fantastically efficient way to handle lots of small files (as a single file on the cloud side) - drastically increasing performance, but also heavily mitigating the file-count limits that many Clouds have (like 400,000 for Gdrive).
@ncw An interesting topic for you to be aware of perhaps?