Partial File Access in Google Drive

Hello,
I am relatively new to rclone and I am wondering how a file would be accessed if only part of it is needed?

Let's say I have a video file on Google Drive, I have it mounted with rclone, and then I play it through Plex.

As far as I understand, the whole file will be downloaded as soon as any part of it is requested by the operating system or any program. Is that correct? Does that mean I can only "jump" to another part of the video after rclone has fully downloaded it?

Will the behavior change if I use the cache backend (I heard it's not recommended anymore), crypt, or the VFS cache?

If you don't use --vfs-cache-mode full then rclone will only download the parts of the file that are needed.
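As a sketch (the remote name gdrive: and the mount point are just placeholders, not a tested recommendation), a plain mount like this serves byte ranges on demand rather than pulling down whole files:

rclone mount gdrive: /mnt/gdrive --vfs-cache-mode off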

What is your mount command line?

Thank you for the fast reply! I am still in the conception phase, so I don't have a command line - I am trying to understand the system first.

First of all, thank you for confirming this is possible. I actually want the opposite for my other use case.

What I am trying to do: I want to store my less popular torrent content on an encrypted Google Drive and seed it long term. Judging from your answer I would assume that each requested piece makes an API call to Google and I would quickly get banned. I am actually doing this at the moment on my local computer with Google Filestream and get 403 banned by Google multiple times per week.

What I came up with was using the cache backend and setting a very high limit for the cache size, like:

RCLONE_CACHE_CHUNK_TOTAL_SIZE=1500G

This way I would make very few API calls to Google, but only if the whole file is downloaded as soon as the first piece is requested, rather than each piece individually. Or at least the requests should be much bigger than the average piece size, so that multiple pieces are downloaded at once.

Can I control the minimum "request" size? So, let's say, the server downloads at least 16 MB even if the torrent application only requests 1 MB of the file? (Is that what RCLONE_CACHE_CHUNK_SIZE does?)
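(For reference, I think the flag form of those two settings would be something like --cache-chunk-size 16M --cache-chunk-total-size 1500G on the mount command, but those values are just examples and I have not tested them.)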

Is this chunk size always aligned at the start of the file?

Let's assume that I set it to 8 MB and the torrent client now tries to read the part between 14–15 MB of an uncached file. Will rclone download (and store in the cache?) the part

A) 14–15 MB
B) 8–16 MB
C) 14–22 MB
D) The whole file?

(as counted from the start of the file)

Thank you very much in advance!

Torrents and cloud storage are a very bad use case, due to the nature of torrents: they seed and get random requests for data.

That being said, rclone handles the mount based on what the application requests from it. If you are using the cache backend, the buffer size should be 0M so it will only request chunks of data based on the chunk size.

The torrent client should open the file, most likely seek to the proper spot, and just grab what it needs.

You can validate the behavior by running the mount in debug mode as well.
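As a rough sketch (the remote name gcache: and the mount point are just placeholders for a cache remote wrapping your drive remote, not a tested recommendation):

rclone mount gcache: /mnt/torrents --buffer-size 0M --cache-chunk-size 16M -vv

The -vv turns on debug logging so you can watch exactly which ranges get requested and which chunks get fetched.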

You are right and I really understand your concerns. Basically, I am not trying to seed straight from my Google Drive, but rather seeding from the hard disk and using (or abusing?) the cache to automatically move my unused files to the archive and back if needed. Based on my (somewhat limited) local experiments, I believe there won't actually be much moving going on, since many torrents that I seed are not requested for weeks or even longer. That depends, of course, on the ratio of cache size to total seed volume.

By setting the buffer to 0, would I prevent the additional workers from reading ahead and fetching chunks that are not even requested (yet)?

When using the cache backend I think the buffer becomes largely irrelevant, as it would only buffer data going from your cache to the OS - and that is already local, fast data, so there is not much point in it. Might as well leave it at 0 then.

If not using a cache backend, the buffer will effectively try to "read ahead" up to the requested amount and keep itself full. The buffer is mostly a mechanism for smoothing out data streams so they don't lose efficiency during spikes and dips.
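For example, on a plain (non-cache) mount you could set something like --buffer-size 64M for more read-ahead smoothing, or --buffer-size 0M to disable it entirely - those values are just illustrations, not recommendations.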

The buffer will not affect worker-threads in any way.
Worker threads in cache-backend will work like this...
Cache registers that the OS last requested data from chunk 3. It then assumes that the sequentially following chunks will soon be needed, even though they have not been requested yet, and it will try to grab these chunks into the local cache ahead of time. The number of chunks "read ahead" like this should be equal to the number of worker threads you have (set via --cache-workers, which is 4 by default). I do not think it is possible to control this in any way besides cutting down the worker threads, but since these are effectively your download threads as well, having too few of them may hurt performance. This kind of read-ahead is kind of the point of the cache as created (mostly intended for media streaming, I believe). I do think there is a need for a more generalized and optimized cache system, and I think we will see that integrated into the VFS moving forward.
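So, for example, mounting with --cache-workers 2 instead of the default 4 would roughly halve that read-ahead - but, as said, it also cuts your download parallelism, so treat that as a trade-off to experiment with rather than a recommendation.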

I have not tested this aspect very thoroughly, but I am at least certain that when it opens a new file it will always start to fetch X chunks, where X is the number of worker threads (based on experimentation during my study of the cache backend). Whether this holds true at all points later, I am not sure. I am not an expert on the cache backend. You would probably need to look at the code on GitHub.
