Potential Plex-integration enhancement or bad idea?

I was running some tests for non-rclone purposes and building and tearing down a lot of Plex servers. One of the most frustrating parts for a new Plex setup is that initial scan (yes, I know there are ways to copy this data from another, fully-populated server). Basically, Plex has to look into each file to find out more than just the filename so that it can identify other attributes and queue up downloading of artwork and so on.

So I started thinking about it and realized that, with a chunk size of 10M, for example, an rclone mount will have to grab the first 10M of each file as it is scanned (unless I’m mistaken). So I dropped the chunk size down to 1M and created a brand-new server. The initial scan of my movies dropped from 1 hour to 45 minutes, a 25% reduction in scan time. [ Test with 500 movies on Google with an rclone cache mount; time is measured from the start of the scan to the point where Plex shows a 500-movie count ]

Now, Plex still has to download the artwork and other elements but that’s outside of Rclone.

So, the question is: is it possible for rclone to “know” through Plex integration that Plex is scanning and, only then, bypass the set chunk size and instead grab a much smaller chunk since, theoretically, Plex is not going to look past that first chunk?

Alternatively, if it is impossible for rclone to know when Plex is scanning versus streaming, there could be an option so that the first chunk grabbed for any new retrieval would be smaller? For example, on a normal “grab” of a file (for copying, streaming, etc.), it might grab 10M + 10M + 10M + … EOF. Instead, though, it would grab 1M + 10M + 10M + … EOF.
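For context, with the cache backend the chunk size is fixed at mount time via `--cache-chunk-size`, so the two behaviors compared in this thread look roughly like the sketch below (the remote name `gcache-crypt:` and mount point are hypothetical, not from my setup):

```shell
# Mount an encrypted cache remote with a 10M chunk size: the first read of
# every file (e.g. a Plex scan probing the header) pulls a full 10M chunk.
rclone mount gcache-crypt: /mnt/media --cache-chunk-size=10M

# Same mount with the 1M chunk size from the experiment: the scan's first
# read per file transfers only 1M, at the cost of more chunks per stream.
rclone mount gcache-crypt: /mnt/media --cache-chunk-size=1M
```

The suggestion above would amount to a third mode: a small first chunk followed by full-size chunks, which neither mount line can express today.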

While the difference between grabbing a 10M chunk versus a 1M chunk is small for a single file, it adds up with things like scans of movies (or especially TV show episodes), so over the course of 10,000+ videos that small change becomes a noticeable speed improvement.
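As a back-of-envelope check on that claim (assuming Plex reads only the first chunk of each file during the inventory scan, which is the premise of this thread):

```shell
# Each scanned file transfers 9 MiB less with a 1M first chunk than a 10M one.
files=10000
saved_mib=$(( files * (10 - 1) ))     # MiB saved across the whole library
saved_gib=$(( saved_mib / 1024 ))     # rough GiB figure
echo "${saved_mib} MiB (~${saved_gib} GiB) less data fetched during the scan"
```

That is on the order of 90 GB of avoided transfer for a 10,000-video library, which matches the observed wall-clock savings being scan-wide rather than per-file.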

What did you notice in terms of bandwidth when it downloaded? I usually notice that if I grab 10M, it takes less than a second. I’m running gigabit FIOS.

500 movies in that time frame seems pretty damn good, though; if you kick off a meta-refresh for ~1700 movies, I know mine takes probably more than 6 hours.

Well, to be clear, 1 hour (or 45 minutes) was the time it took for Plex to show that I had an inventory of 500 movies. They still didn’t all have artwork, so the full meta-refresh was still running (but I believe the rest of the data was Plex pulling from online databases, not through rclone to Google).

With a chunk size of 10M, I would see a graph of spikes (it looks like a comb): 10 Mbps, idle, idle, 10 Mbps, idle, idle, etc. With 1M, it stayed constantly between 2-3 Mbps throughout the scan.

There’s obviously overhead in Plex that goes beyond rclone: it has to process each file once it opens it, pull the metadata from the file, and then queue up downloads of the related extended attributes from TVDB, etc. The smaller chunk size just reduces that initial request for the header of each file.

I’m not sure how big the video metadata can be (or its maximum size). That is, if it’s only in the first 256K, for example, then grabbing just 256K for the first chunk would seem to really speed things up without increasing API calls (as long as you’re not using 256K chunks for streams or copies).

I can setup a new test server and get more specific data on bandwidth usage.

I just ran a more scientific test using an Ubuntu 17.10 KVM with the latest rclone/cache/mount, a Google-drive based movie library of 626 movies, and a fresh Plex install. Prior to starting the test, I cleared the Rclone cache and took a snapshot of the KVM.
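The reset between runs looked roughly like the sketch below; the domain name, snapshot name, and remote name are assumptions for illustration, though `~/.cache/rclone/cache-backend` is the cache backend’s default chunk storage location:

```shell
# Revert the KVM guest to the pre-test snapshot (hypothetical names).
virsh snapshot-revert plex-test pre-scan

# Clear rclone's cached chunks and metadata so every read goes to Google.
rm -rf ~/.cache/rclone/cache-backend

# Re-mount with the chunk size under test ("gcache-crypt:" is hypothetical).
rclone mount gcache-crypt: /mnt/media --cache-chunk-size=10M &
```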

First test: Rclone encrypted cache/mount with --cache-chunk-size=10M

Typical bandwidth during the test: (bandwidth graph omitted)

Time to 50 movies: 0:04:57
Time to 100 movies: 0:09:14
Time to 200 movies: 0:18:01
Time to 300 movies: 0:28:32
Time to 500 movies: 0:51:58
Time to 626 movies: 1:08:50

When 626 movies were detected, artwork had been added up to the letter H (since there are holes for movies beginning with “the” and other alphabetically ignored words, this isn’t a precise measurement but stay tuned).

Test 2

Rolled back the snapshot and re-mounted Rclone encrypted cache/mount with --cache-chunk-size=1M

Typical bandwidth usage during the scan: (bandwidth graph omitted)

Time to 50 movies: 0:03:49
Time to 100 movies: 0:07:48
Time to 200 movies: 0:14:40
Time to 300 movies: 0:22:01
Time to 500 movies: 0:36:48
Time to 626 movies: 0:48:01

At the end, artwork up to the letter M had been rendered in Plex.

So, the scanning of the files on Google for Plex to get its initial inventory took 68 minutes with a 10M chunk size and 48 minutes with a 1M chunk size, cutting the time by roughly 29%. The total time to fully add all movies (outside the scope of rclone) with all movie metadata could be significantly shorter as well, since artwork had progressed much further in a shorter amount of time.
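For anyone checking the numbers, the reduction works out as follows (integer arithmetic, so it rounds down slightly):

```shell
# 68 minutes with 10M chunks vs 48 minutes with 1M chunks.
t10=68
t1=48
pct=$(( (t10 - t1) * 100 / t10 ))   # percent reduction in scan time
echo "time reduction: ${pct}%"
```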