Apache + rclone mount -- How to avoid fetching whole files to autoindex

Joel_Peshkin · February 5, 2019, 4:33am

I’m fiddling with a configuration where I rclone mount a remote from Box (read-only) and expose certain directory trees via an apache webserver.

When Apache’s mod_autoindex looks at a directory, rather than just stat-ing the files (which would be fast), it seems to be doing the equivalent of running the Linux “file” command and peeking at the magic number at the beginning of each file. That seems to cause rclone to retrieve the entire file from Box even though I should only need the first few bytes of it. Since there are dozens of files that are tens of megabytes each, the page takes a very long time to load.

I tried specifying --vfs-read-chunk-size 1M and --vfs-read-chunk-size-limit 200M, but it didn’t seem to help. I see the same problem if I just run “file *” in that directory.

Does anyone know a way to either get an rclone mount to quickly fetch the start of each file without fetching the whole thing or to keep mod_autoindex from reading from the file itself?

ncw · February 5, 2019, 11:44am

This is caused by rclone's read-ahead buffering. If you set --buffer-size 0 then I think it will fix it.

You could also use rclone serve http and proxy it via apache which may (or may not!) work better.
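
For the proxy approach, a rough sketch of the Apache side might look like the fragment below. It assumes rclone serve http is already listening on 127.0.0.1:8080 and that mod_proxy and mod_proxy_http are loaded. The remote name "box:shared" and the /files/ URL prefix are placeholders for illustration, not anything confirmed in this thread.

```apache
# Assumption: rclone is started separately, for example
#   rclone serve http box:shared --addr 127.0.0.1:8080
# "box:shared" and the port are hypothetical placeholders.

# Requires mod_proxy and mod_proxy_http to be loaded.
ProxyPass        "/files/" "http://127.0.0.1:8080/"
ProxyPassReverse "/files/" "http://127.0.0.1:8080/"
```

In this setup rclone generates the directory listings itself, so mod_autoindex is not involved for the proxied path.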

I use caddy to serve beta.rclone.org from an rclone mount, and that doesn't do the reading-the-file trick.

Diniboy · February 5, 2019, 5:33pm

A little bit off, but what do you use as a storage backend for the mentioned beta subdomain?

Joel_Peshkin · February 5, 2019, 9:21pm

Hi Nick,

I got to the bottom of the problem. The default RHEL Apache installation was using mod_mime_magic and looking at every file to determine MIME types. I removed the MimeMagicFile directive, and that stopped Apache from reading data from every file in the directories it indexes.
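
As a sketch of what that change can look like: a stock RHEL httpd.conf usually contains a block along the lines of the one below, though the exact wording and file location may differ between releases, so treat this as an assumption and check your own config. mod_mime_magic is only active when a magic file is configured, so commenting the directive out is enough to stop the content sniffing.

```apache
# Commented out so that mod_mime_magic has no magic file and
# does not open files to guess their MIME types.
# (Block shown as typically shipped; your httpd.conf may differ.)
#
# <IfModule mime_magic_module>
#     MIMEMagicFile conf/magic
# </IfModule>
```

Not loading the module at all (removing or commenting out its LoadModule line) should have the same effect.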

ncw · February 6, 2019, 9:58am

Excellent! Glad you got it sorted.

ncw · February 6, 2019, 10:01am

The storage is backed by a bucket on an OpenStack cluster.

Joel_Peshkin · February 10, 2019, 1:30pm

Another vital optimization when serving up directory trees using mod_autoindex: by default, Apache checks for index.php, index.html, etc. before even attempting to list the files that are in the directory. So it is important to specify
DirectoryIndex disabled
to avoid this behavior.
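
In context, that might look something like the following. This is a minimal sketch assuming Apache 2.4 and a hypothetical mount point of /mnt/box/public; adjust the path and access rules to your own setup.

```apache
# "/mnt/box/public" is a hypothetical rclone mount point.
<Directory "/mnt/box/public">
    # Let mod_autoindex generate listings for this tree.
    Options +Indexes
    # Do not probe for index.html, index.php, etc. (mod_dir).
    DirectoryIndex disabled
    # Apache 2.4 access control syntax.
    Require all granted
</Directory>
```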

ncw · February 10, 2019, 8:38pm

Does it make a lot of difference? If Apache is fetching the directory to look for the index, then I would have thought it would be cached for the directory listing.

Joel_Peshkin · February 11, 2019, 12:26pm

It is surprising and not obvious from Apache documentation but easy to understand from rclone’s log files.

If I have foo/one/file1, foo/two/file1 and foo/two/file2, when I try to index foo/, it looks for foo/one/index.php, then foo/one/index.html, then foo/two/index.php, then foo/two/index.html. If nobody has requested the listing within the cache expiration time, the page can take 60 seconds or longer to load on a decent-sized directory tree. With DirectoryIndex disabled, it doesn’t try these silly lookups and is essentially instantaneous.

Similarly, when mod_autoindex finds files with unknown extensions, Apache defaults to trying to figure out their MIME types from the file contents.

With those two shut off, things are nice and quick.

ncw · February 11, 2019, 1:38pm

Ah, I see. Thanks for explaining