Help setting up rclone gdrive gcatche gcrypt with seedbox

vrt1990 · November 27, 2019, 2:08pm

Hi,
I have tried setting up rclone remote gdrive gcatche gcrypt in ssh for seedbox.
I have searched this forum, but most of the discussions involved Plex, which I don't want to use as of now.
Created gdrive gcatche gcrypt on ssh
After creating these three remotes I tried copying from the seedbox filemanager to the crypt folder on gdrive as encrypted files. It didn't work.

Then I've deleted all 3 remotes
And created only two remotes gdrive gcrypt
In this case copying from seedbox filemanager to crypt folder in gdrive as encrypted files,
This worked.
After copying , I wanted to mount these encrypted files on my mac as unencrypted files
So, I've installed rclone on my mac
From here i didnt understand what to do now.

for reference

the first 3 remotes i created in ssh seedbox

[gdrive]

type = drive

client_id = xxx

client_secret = xxx

scope = drive

token = xxx

[gcatche]

type = cache

remote = gdrive:/gdrive

chunk_size = 5M

info_age = 1d

chunk_total_size = 10G

[gcrypt]

type = crypt

remote = gcache:/crypt

filename_encryption = standard

directory_name_encryption = true

password = *** XXX ***

password2 = *** XXX ***

Since the above remotes created didnt copy from seedbox to crypt folder in gdrive
I deleted them and created

[gdrive]

type = drive

client_id = xxx

client_secret = xxx

scope = drive

token = xxx

remote = gdrive:/crypt

filename_encryption = standard

directory_name_encryption = true

password = *** XXX ***

password2 = *** XXX ***

the above created pattern made me able to copy files from seedbox to crypt folder in google drive as encrypted files.

Spent two days trying multiple configs with none working for me, as my knowledge on this is limited.

What i want to setup is copy files from seedbox filemanager to crypt folder in google drive and then access the google drive crypt files via mounting on my mac.

Thanking you in advance

thestigma · November 27, 2019, 10:44pm

Is this copypasted directly? (aside from what you have redacted) ?

Because your first example with 3 remotes looks about right, but it does contain some typos like the remote-name "gcatche"

The second example where you are making some kind of hybrid is all wrong. Or at least, this is not anywhere close to what is intended and I can not predict what that will do.

In short:
You need 3 remotes. A gdrive, a cache and a crypt (although the cache is very much optional).

These need to be chained together like this: OS --> crypt --> cache --> gdrive --> SERVER.
You do this by pointing the "remote =" from each to the next. So in the crypt it should say "remote=gcache:" for example. It looks like you got the gist of this in the first example except maybe typos. Whether you add additional subfolders or not is up to you.

Then with this setup, it matters which one you use. For example, if you are mounting this then you probably want to mount "gcrypt" because this will use all 3. If you mounted "gdrive" instead then it would be skipping the crypt and cache... so don't be confused by that.
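A quick way to check the chaining is to follow the "remote =" pointers in the config file and see whether every layer actually exists. The sketch below does that with plain shell tools. The function name trace_chain is made up for this example and is not part of rclone, and the loop cap of 10 hops is an arbitrary guard.

```shell
# trace_chain CONFIG REMOTE
# Follows the "remote =" pointers starting at REMOTE and prints each layer.
# Reports a layer that has no matching [section], e.g. a misspelled name.
trace_chain() {
    local conf="$1" name="$2" hops=0
    while [ -n "$name" ] && [ "$hops" -lt 10 ]; do   # cap hops in case of a loop
        if ! grep -qxF "[$name]" "$conf"; then
            echo "$name (no [$name] section found)"
            return 1
        fi
        echo "$name"
        # Take the "remote = x:path" line of this section and keep only "x".
        name=$(awk -v sec="[$name]" '
            /^\[/ { inside = ($0 == sec) }
            inside && /^remote[ \t]*=/ {
                sub(/^remote[ \t]*=[ \t]*/, "")
                sub(/:.*/, "")
                print
                exit
            }' "$conf")
        hops=$((hops + 1))
    done
}
```

Called as trace_chain /path/to/rclone.conf gcrypt (running "rclone config file" shows where the config lives), a healthy three-layer setup prints gcrypt, gcache, gdrive on separate lines. If the crypt points at gcache: but the cache section is named [gcatche], it prints gcrypt and then reports gcache as missing.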

Also it sounds like you perhaps are misunderstanding and trying to copy directly to the crypt folder? This won't work as it will effectively skip all layers (ie. not using rclone at all). You need to send the data via rclone (in through the crypt remote). This can be done in one of 3 ways:

via the commandline
rclone move /home/stigma/torrents gcrypt: (or something along those lines)
via a mount, ie simulating that gcrypt is a folder on the local filesystem (see documentation for "mount" or ask me about details). This is kind of similar to the regular mount command on Linux except rclone allows you to mount rclone remotes.
or via the rclone webGUI (the webGUI is not enabled by default however).
Each of these has its own manual page on rclone.org if you want to read up on them.

If you are new to this I would suggest the mount strategy as it is the most intuitive, and with the right settings in your client (using a "temp download folder" option if you have one) you can effectively make the downloaded files go directly into crypted gdrive storage without any more fuss... This is basically what I do, with the only exception that I do it locally rather than on a VPS (which is really no different).

Personal opinion: I would start off without using the cache backend. Just gdrive + crypt. If you want to cache stuff we can use the VFS-cache for that instead. I see no reason you actually need the cache layer here and it will just add unnecessary complications. Besides - it is easy to add later if you change your mind.

Get started with this and show me what config you come up with, or ask more about details you aren't sure about. Once we have the basics working I can suggest additional details and optimizations.

vrt1990 · November 28, 2019, 8:46am

Thank you for the reply
Yes this is copy pasted except the redacted.
gcatche is typing error on my part.
I was hesitant to edit the gcatche to gcache so i waited for reply here.

" These need to be chained together like this: OS --> crypt --> cache --> gdrive --> SERVER."

I didnt understand this
what do OS and server stand for here.

my copy command in terminal for copying from seedbox folder to google drive is

rclone copy ~/folder/folder gcrypt:/

So i've tried what you said and did the following config

[gdrive]

type = drive

client_id = xxx
client_secret = xxx

scope = drive

token = xxx

[gcache]

type = cache

remote = gdrive:

chunk_size = 5M

info_age = 1d

chunk_total_size = 10G

[gcrypt]

type = crypt

remote = gcache:/cryptfolder

filename_encryption = standard

directory_name_encryption = true

password = *** ENCRYPTED ***

password2 = *** ENCRYPTED ***

this was the setup i tried after reading your post
here cryptfolder is the folder i created in google drive

after creating this setup i tried copying from seedbox folder to google drive using the command
rclone copy ~/folder/folder gcrypt:/

then i got the following message

2019/11/28 09:37:53 ERROR : /folder/folder/.cache/rclone/cache-backend/gcache.db: Error opening storage cache. Is there another rclone running on the same remote? failed to open a cache connection to "/folder/folder/.cache/rclone/cache-backend/gcache.db": timeout
2019/11/28 09:37:55 ERROR : /folder/folder/.cache/rclone/cache-backend/gcache.db: Error opening storage cache. Is there another rclone running on the same remote? failed to open a cache connection to "/folder/folder/.cache/rclone/cache-backend/gcache.db": timeout
2019/11/28 09:37:55 Failed to create file system for "gcrypt:/": failed to make remote gcache:"/cryptfolder" to wrap: failed to start cache db: failed to open a cache connection to "/folder/folder/.cache/rclone/cache-backend/gcache.db": timeout

after reading the above i checked how to see another rclone running by searching in the forum for similar scenarios and i got this command in one of the search results here

ps -ef | grep rclone

i used this command and got this

folder 1508 1448 0 09:33 pts/39 00:00:00 grep --color=auto rclone
folder 13351 1 0 Nov27 ? 00:00:13 rclone mount gcrypt: /folder/folder/mnt/gdrive --allow-non-empty --cache-db-purge --buffer-size 64M --dir-cache-time 72h --drive-chunk-size 16M --timeout 1h --vfs-cache-mode minimal --vfs-read-chunk-size 128M --vfs-read-chunk-size-limit 1G

Clueless from here.

vrt1990 · November 28, 2019, 9:13am

Forgot to ask one more thing
The total setup goes like this
remotes on ssh seedbox
gdrive
gcache
gcrypt

then the remote created on my mac to access google drive

gdrive
gcrypt

the mount command i used

rclone mount gdrive: /folder/folder/folder/GDecrypt

If my understanding is correct if the above command is used then mounted drive should show encrypted files

then if i use
rclone mount gcrypt: /folder/folder/folder/GDecrypt

then decrypted files will be shown right

So can i remove or rename files on local mount without any problem whatsoever

I dont see an unmount command here other than unmounting the google drive using mac right click and unmounting

vrt1990 · November 28, 2019, 11:36am

Ok i killed 13351 in the ssh seedbox
now i tried copy command

rclone copy ~/downloads/rseedhost gcrypt:/

where gcrypt has remote as gcache:/cryptfolder

files seems to be copying at 30 Mbytes/s from seedbox to google drive

now
if gcrypt has remote as gdrive:/cryptfolder

will they copy to gdrive faster, at say 80 to 100 Mbytes/s?

And after copying i needed to close connection to ssh seedbox to edit files in gdrive.
Is that normal

Should the remotes i create on mac for mount have the same credentials as remotes i created on ssh seedbox

When i mounted using the command

rclone mount gcrypt: /folder/folder/folder/GDecrypt

I was able to mount it and see the files decrypted
But removing files and editing file names on the mount was very slow and finder froze for few minutes.
Is this normal.

Help appreciated regarding the uses of gcache , commands for mounting and unmounting the google drive, editing files on the mount, playing video files from the mount.

vrt1990 · November 28, 2019, 1:13pm

After checking out many configs
finally what i did now

on ssh seedbox

[gdrive]

type = drive

client_id = xxx

client_secret = xxx

scope = drive

token = XXX

[gcrypt]

type = crypt

remote = gdrive:/cryptfolder

filename_encryption = standard

directory_name_encryption = true

password = xxx

password2 = xxx

Then on my mac

[gdrive]

type = drive

client_id = xxx

client_secret = xxx

scope = drive

token = XXX

[gcrypt]

type = crypt

remote = gdrive:/cryptfolder

filename_encryption = standard

directory_name_encryption = true

password = xxx

password2 = xxx

Is this ok for copying encrypted files to google drive and mounting them via rclone

thestigma · November 28, 2019, 5:48pm

In that case it's no wonder it doesn't work even if everything else was correct. Rclone will complain there exists no "gcatche" remote and exit with an error.

Sorry, maybe it was a mistake to include those if it just added confusion. The OS is the Operating System (Linux in this case I assume). Server is the Google Drive server in some data-center somewhere. We aren't going to do anything with these two. These are just there to illustrate how the traffic flows. Only the 3 layers in the middle are related to the rclone setup. You can imagine that when you do a copy command in Linux (or drag&drop in the GUI) it starts its journey in "OS" and then goes up the chain step by step until it lands in the Google server. The download traffic follows the same route in reverse. I hope that explanation made it easier to understand? Understanding how the traffic flows through the layers is of great help to actually understanding what rclone does rather than just following a pattern someone recommended and crossing your fingers hoping it will work.

That looks correct to me, except that I am not sure what ~ signifies. I am not a Linux expert so maybe this is just some normal part of syntax I am not aware of. Aside from that it is the same as I would write.

In term of how these are linked together - this is correct.

This error specifically occurs when more than 1 rclone instance tries to use the same cache-remote. (the cache backend uses a database that requires exclusive access so this is not possible to do).
I expect that you are not in fact trying to run multiple instances - so almost certainly there is some old rclone process here that is still running and locking down the cache. We are going to have to terminate that process.

You probably know how to do this better than me, but here is how I'd probably do it:
run htop
select any processes that are named anything with rclone and terminate them with F9+ENTER. Exit with F10 when done. After that, try your mount command again.

If your process list is too long to go through manually, you could try

pkill -f rclone

This should pull the plug on any process that has "rclone" anywhere in its command line.
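If you would rather see what is running before killing anything, the process list can be filtered first. A small sketch, where the helper name only_rclone is made up for this example:

```shell
# only_rclone: keep only process-list lines that mention rclone.
# Writing the pattern as [r]clone stops grep from matching its own
# command line, so the grep process itself does not show up in the result.
only_rclone() {
    grep '[r]clone' || true   # "no match" is not an error here
}

# Usage (on the seedbox):
#   ps -ef | only_rclone      # list rclone processes with their PIDs
#   kill <PID>                # stop one stale process (sends SIGTERM)
```

This avoids taking down an rclone process you actually wanted to keep, which pkill -f rclone would not spare.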

thestigma · November 28, 2019, 6:10pm

Let us not worry about performance yet. I will help you maximize performance after you have the setup you want. But as I said earlier though, I would personally skip the cache layer as I see no reason why you would need it here. If you want to have some local caching to help speed up seeding then just remind me to address how to do this via the VFS cache (not the same cache system) later after everything else works.

No, this should not be needed. Only finished files should show up in gdrive, so any files you see there should immediately be editable. I don't see how the SSH connection to the seedbox should have any impact on this at all.

They need to use the same crypt-keys in their crypt-remote to be able to decrypt the files. But aside from that - no. You can use any credentials you wish, and aside from the crypt remote the rest of the configuration does not need to be the same. I would suggest you simply go through rclone config and authenticate again on the Mac just like you did initially for the seedbox.

It's normal for there to be some latency compared to a normal drive of course as requests need to be sent via the internet, but we are talking about less than a second typically. If that is too slow for you it is possible to precache the listings on the drive to make this respond much faster, but that is more of an advanced technique. If you want to look into that I recommend you remind me at the end when everything else is as you want it. Note that I have no experience with how OSX generally responds to rclone. I would just assume it would be similar to Windows and Linux.

If you want to edit files and/or have applications write files directly to Gdrive then I highly recommend you set these flags in your mount command (assuming that you want to use mount that is):
--vfs-cache-mode writes
--cache-dir /some/folder/on/the/local/computer
The first one is basically required for doing more than simple, direct file transfers (such as editing or letting programs write to the drive by themselves). It is not required for renaming and deleting specifically, but just use this and you will have no restrictions to worry about.
The second lets you choose the cache folder on your own. It is not required, but without it it will just use the default location. You may ideally want to set this folder to some sort of storage-drive rather than the OS-drive if you have more than one harddrive.

Here is a general example of how a mount command might look like
rclone mount gcrypt: /any/local/folder --vfs-cache-mode writes --cache-dir /some/folder/on/the/local/computer
How to unmount that directory:

umount /any/local/folder

See the rclone documentation for mount.
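Which unmount command applies depends on the system. As far as I know the rclone mount documentation lists umount for macOS and fusermount -u for Linux. The helper below just prints the matching command; the name unmount_cmd is made up for this sketch.

```shell
# unmount_cmd OS MOUNTPOINT
# Prints the unmount command that fits the OS name reported by `uname -s`.
unmount_cmd() {
    case "$1" in
        Darwin) printf 'umount %s\n' "$2" ;;          # macOS
        Linux)  printf 'fusermount -u %s\n' "$2" ;;   # FUSE mounts on Linux
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   unmount_cmd "$(uname -s)" /any/local/folder
```

If the mount is running in the foreground of a terminal, pressing Ctrl+C there should also stop it and unmount the folder.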

thestigma · November 28, 2019, 6:12pm

Sure, this is perfectly fine. It doesn't use the cache-backend obviously, but currently I don't see a reason why you'd need it on your setup. If we find out you actually need it we can just add it easily later anyway.
I would just go with this for now.

We are working a little scattershot here. I am trying to answer all your questions, but we are touching on so many points at the same time I think there is a risk of confusion and information overload.

Might I suggest that we try to do one topic/problem at a time and finish up on it before we move to the next? Because right now I don't really have a good overview of where you are in the process - what is already working for you, what isn't etc.

Just to be clear - we are trying to set up 2 systems here right? 1 VPS seedbox + local Mac ?
What is the primary use-cases for your Mac? General use, or are you attempting to set up some kind of Plex setup or something like that ?

vrt1990 · November 28, 2019, 10:23pm

Thank you for the detailed response.
This cleared out so many questions for me
Especially the traffic follow route while uploading and downloading

Ok regarding the setup i created and working now

ssh seedbox
gdrive
gcrypt

mac
gdrive
gcrypt

with gcrypt remotes created using same passwords

I tried using the copy command and was able to copy around 600 gb from seedbox folder to the google drive folder in encrypted form
This took around 2 hours with a transfer speed of 70 MBytes/s.
this is normal i think

Now i was able to mount the gcrypt folder on my mac using the mount command
Everything works ok till here

Then when i tried to rename multiple files at the same time, the response was sluggish.
probably depends on the internet connection as you said.

I have tried GUI and that works fine too

What i wanted to achieve by doing this was save storage space on my mac and move all the downloaded files from seedbox folder to google drive in encrypted form.
And when needed I can mount the decrypted google drive and download the needed files from google drive.

This was all i had in mind

When i search multiple discussion forums most people made a gcache remote
I didn't understand why this was needed and any extra benefits it would bring to my current setup.

Never used plex before so i did not understand the point of creating gcache.

And is there any way i can schedule copying/moving from seedbox folder to google drive crypt folder.
If there is a way should i keep my mac terminal open when the scheduled time for copying/moving starts.

Thanks in advance.

thestigma · November 29, 2019, 2:01am

If your VPS has more bandwidth and you are not transferring small files you can get way more than that - Google Drive is good for about 42-45MB/sec per transfer, over multiple transfers.

It's not "bad" by any means, but you may get even faster results with these flags if your VPS can keep up with it:
--drive-chunk-size 64M
--transfers 5

Note that this setup can use 5x64 = 320MB RAM for chunking. Make sure your VPS actually has enough RAM for this, or adjust down the numbers to make it fit within your constraints. I would recommend:

No more than 5 transfers, but no less than 4
As high chunk-size as possible, up to 128M. 64M is a pretty ideal compromise between speed and not using too much RAM. More helps, but each doubling helps less and less, so no need to go overboard.
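The RAM arithmetic can be turned into a small helper for picking a chunk size that fits a given budget. This is only a sketch: both function names are made up, and the lower bound of 8 is an assumption (I believe 8M is rclone's default for --drive-chunk-size). It halves rather than subtracts because the chunk size has to be a power of two.

```shell
# chunk_ram_mib TRANSFERS CHUNK_MIB
# RAM used for upload chunks: one chunk buffer per parallel transfer.
chunk_ram_mib() {
    echo $(( $1 * $2 ))
}

# pick_chunk_mib TRANSFERS BUDGET_MIB
# Halve the chunk size from 128 down to 8 until TRANSFERS chunks fit the budget.
pick_chunk_mib() {
    local chunk=128
    while [ "$chunk" -gt 8 ] && [ "$(chunk_ram_mib "$1" "$chunk")" -gt "$2" ]; do
        chunk=$((chunk / 2))
    done
    echo "$chunk"
}

chunk_ram_mib 5 64     # prints 320, the 5 x 64 example
pick_chunk_mib 5 200   # prints 32: 5 x 32 = 160 fits in 200, 5 x 64 does not
```

The result then goes into the copy command, for example --transfers 5 --drive-chunk-size 32M.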

Ok, that is about as easy as it gets then. Note that you can actually watch media straight from the gdrive however, and also even run small programs directly from it if you want (although this requires you set --file-perms 0777 , for executing programs that is. No special setting needed for streaming).

Regarding renaming folders - if they respond within about a second or so I'd consider that "normal". There is some latency on a Cloud-drive after all.

The primary benefits of a cache-backend are:

It provides a way to do read-caching (the VFS-cache that mounts use currently can only perform write-caching)
It allows for chunking and prefetching ahead in files. Some people find this useful in dealing with tricky scenarios involving streaming of media. However, given a good setup it should not usually be required. I can stream 4K videos just fine with no cache-backend. Neither is it required for Plex. It is much better to set up Plex correctly than to try to put a bandaid on the problem with adding an overly aggressive cache. The most experienced Plex-users here typically do not actually use the cache backend.
It is also worth mentioning that the cache-backend is kind of on its way out. The original author left a while back and it is not being maintained. The VFS-cache is likely to make it redundant in the future as it takes over the functions. The VFS-cache is maintained by NCW, the main author, and has a lot of improvements coming.

There are many ways...

I would say the best way when it comes to torrents is to let the torrent-client itself trigger the move.
A lot of torrent-clients can support an option called something like "temporary download folder". This means that you can set a local folder where torrents download to - and then when it is done the torrent-client automatically moves it to the final destination (which can be a folder on the Gcrypt mount). Then the whole process is automated and there isn't a lot more you need to think about. This is the strategy I use - albeit on a local computer, as I have no seedbox (I'm jelly...)
I use Qbittorrent myself, but I would say that the majority of advanced clients have a feature like this.

Those that do not usually have a function that can trigger a script on completing. This can also be used to much the same effect (see below)

Alternatively, if your torrent-client has no such option and it is out of the question to use an alternative one, we can do this via a recurring script. It can run every X hrs and move over files. I can provide bash(Linux) or batch(Windows) code that does this, and then you just schedule the task to run every so-and-so from either cron(Linux) or task-scheduler(Windows). Not sure what tool is used on a Mac - but there is guaranteed to be one... (does Mac also use bash script??)
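A rough sketch of what such a recurring script could look like on the seedbox. Everything named here is an assumption for illustration: the source folder, the lock path, the function name move_finished, and the choice of --min-age 15m and --transfers 4. The lock directory stops a new run from starting while the previous one is still uploading.

```shell
# Hypothetical defaults; override them in the environment or edit to taste.
SRC="${SRC:-$HOME/downloads/complete}"   # folder with finished torrents
DEST="${DEST:-gcrypt:}"                   # the crypt remote, so data gets encrypted
LOCK="${LOCK:-/tmp/rclone-move.lock}"     # lock directory, one run at a time
RCLONE="${RCLONE:-rclone}"

move_finished() {
    # mkdir is atomic: if the directory already exists, another run is active.
    if ! mkdir "$LOCK" 2>/dev/null; then
        echo "previous run still active, skipping"
        return 0
    fi
    local status=0
    # --min-age skips files modified in the last 15 minutes (possibly still
    # being written); -v logs what was moved.
    "$RCLONE" move "$SRC" "$DEST" --min-age 15m --transfers 4 -v || status=$?
    rmdir "$LOCK"
    return "$status"
}
```

Saved as a script that ends with a call to move_finished, it could be scheduled with a crontab line such as "0 */6 * * * /home/user/bin/move-finished.sh" (path hypothetical) to run every six hours. If a run is killed hard, the lock directory stays behind and has to be removed by hand.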

There is no need for this. The seedbox can take care of itself.
I mean, you could trigger it from your Mac at home, but why would you want that inconvenience? ...

thestigma · November 29, 2019, 2:16am

Oh, and I think this will be quite important to set on your seedbox also:

--vfs-cache-max-age 332880h
--vfs-cache-max-size 100G
(numbers here for illustration - not necessarily what you should use)

The first one sets an absolute limit for how long to keep a file in the VFS-cache (again, do not confuse this with the cache-backend. The VFS-cache is part of mount when mode writes is enabled).
Here I just set it to a really high number in order to effectively disable this limit. I think the default is 60m (minutes)

The second flag sets a max-size for the cache. Once the cache goes above this size it will remove the least recently accessed files (ie. probably files that aren't being seeded much anymore). It is important to set some limit here because I think this limit is disabled by default - and we don't want your VPS to run out of disk-space in a few weeks and have you scratching your head wondering why...
Set the number to be whatever you can reasonably afford to spend of local storage.

I already mentioned --cache-dir earlier for choosing the location of the cache. It is not needed, but it may make it easier for you to organize if you use a more accessible location - or if your system has both a slow storage disk and a fast SSD disk for the OS you probably want the cache to be on the slower/larger disk.

It is likely beneficial for you to keep a reasonably-sized VFS write-cache. That way "hot" torrents get to stay on the seedbox locally for a little while when they are most used. Older torrents that eventually get thrown out of the cache can still be seeded - but since the seedbox will have to fetch the data from Gcrypt mount it may be a little slower. Having some cache will drastically cut down on the ingest network traffic needed, and also help the seedbox serve hot files at optimal speeds.

We are sure progressing fast into the more advanced stuff here, but I am happy to see you absorbing the information so quickly. That makes it much more fun to help

vrt1990 · November 29, 2019, 11:30am

I have set the default chunk size
Regarding transfers, 3 files were being transferred at a time from seedbox folder to google drive

Regarding RAM for the seedbox I have to check the specs.

I tried playing a 1080p 6000kbps file from the gcrypt mount and it worked fine.
But when i fast forwarded it to a specific timeline, there was momentary gap and that is to be expected considering this is a cloud drive i guess.

How i set it up on the seedbox was to move the torrents completed from default torrents download folder to a completion folder.

According to what you said here i can set the torrent application move torrents from the default downloading folder to google drive crypt folder.

If this can be done i have few doubts here

Google drive daily upload limit is around 750gb so I dont know when will this limit be over if
it is set to move completed torrents to google drive crypt folder automatically.

After moving the completed torrents will still be in the torrent client actively seeding from google drive . Will there be no problem then.
Or
Should i set it in such a way that on completion the torrents enter into finished mode rather than seeding mode. I couldn't find such setting in the torrent client.

Macos terminal uses bash script.

thestigma:

--vfs-cache-max-age 332880h
--vfs-cache-max-size 100G
(numbers here for illustration - not necessarily what you should use)

The first one sets an absolute limit for how long to keep a file in the VFS-cache (again, do not confuse this with the cache-backend. The VFS-cache is part of mount when mode writes is enabled).
Here I just set it to a really high number in order to effectively disable this limit. I think the default is 60m (minutes)

The second flag sets a max-size for the cache. Once the cache goes above this size it will remove the least recently accessed files (ie. probably files that aren't being seeded much anymore). It is important to set some limit here because I think this limit is disabled by default - and we don't want your VPS to run out of disk-space in a few weeks and have you scratching your head wondering why...
Set the number to be whatever you can reasonably afford to spend of local storage.

Where do i add these commands

If added where will vfs cache store the cached files then
Is it in the hard disk provided by the seedbox or
prefetched in the google cloud drive.

This wouldn't apply in this case because it is not a fusion drive.

Thank you for detailed responses.
Was able to get most of the info given.

thestigma · November 29, 2019, 5:30pm

Yes that is unavoidable. Seeking should generally be faster than opening a new stream though. Opening might take a few seconds. Seeking might take between half to a full second. If your results are somewhat in line with that then this is functioning normally.

Yes, this is what I meant. This is generally the smoothest way to do it.
If you hit the 750GB/day quota, it will reset within 24 hrs (at a set time - usually sometime during the night, although the exact hour seems to depend on the server you are talking to).
So if this happens occasionally it's no big problem. The worst that could happen is that some files are not transferred because rclone gives up on them, but you could avoid this by setting --retries 99999 . Then rclone would just keep trying until the quota eventually resets.
Triggering the upload quota is not a "ban" and it doesn't make Google angry at you. It's just your daily allowance. It will also not interfere with other functions like downloading. You just can't upload any more until it resets.

You can seed directly from Gdrive. That works fine. I do that - although I have a somewhat limited upload volume for various non-technical reasons. You have a 10TB/day download quota, so nothing to worry about there. It might be slightly slower than local storage on very high speed torrents (not really due to bandwidth but rather because there will be a slight latency on fetching each chunk), but if you set the cache like I suggested all your really hot torrents will be in the cache anyway.

I can't answer any specific torrent-client settings. Both because I haven't used anything outside of Qbittorrent for a long while - and because you haven't even told me which client you use... google is your friend on this one.

You add them to the rclone mount command. Especially on the seedbox, as that needs to run automated and you probably don't want to have to manually clean up. The default settings have no max-size, but they do have a max-time of 1 hour (which is too low IMO). Much better to have no time-limit but a max-size limit like I suggest here. Then you get some good performance benefits out of it + much less unnecessary fetching of hot files from Gdrive.

The cache is on local storage. That is kind of the point.
If not otherwise specified rclone will use a default directory somewhere in the users home directory. If you need to find the exact path you can look at the documentation for mount.

If you want to control where the cache-directory is located (which I would probably recommend), you can use the aforementioned --cache-dir command.
For example:
--cache-dir "/home/stigma/rcloneVFScache"

The cache doesn't need to be huge, but any amount will help a lot. If you can have enough to fit a couple of your most recent torrents that would be ideal. Hopefully you should have ample storage for this once you have set torrents to transfer to Gdrive after they are done - as those will no longer have to eat up most of your local storage.
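Put together, the seedbox mount command with the cache flags from this thread could look like what the sketch below prints. The mount point, the cache folder and the function name mount_cmd are placeholders for illustration, and the two cache limits are the illustrative numbers from earlier, not recommendations.

```shell
# Hypothetical locations; change them to fit the seedbox.
MOUNTPOINT="${MOUNTPOINT:-$HOME/mnt/gdrive}"
CACHE_DIR="${CACHE_DIR:-$HOME/rcloneVFScache}"

# Print the mount command with the cache flags discussed above, one per line,
# so it can be reviewed before it is run.
mount_cmd() {
    printf '%s \\\n' \
        "rclone mount gcrypt: $MOUNTPOINT" \
        "  --vfs-cache-mode writes" \
        "  --cache-dir $CACHE_DIR" \
        "  --vfs-cache-max-age 332880h"
    printf '%s\n' "  --vfs-cache-max-size 100G"
}

mount_cmd
```

The printed lines can be pasted into a terminal or a startup script as they are, since each line except the last ends in a backslash.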

I don't understand what a "fusion drive" means... please explain. this flag simply points to a directory somewhere on local storage. Any local storage (or heck - even fast network storage). The only limitation is that it can't point to a cloud as that would defeat the point.

vrt1990 · November 29, 2019, 9:41pm

By fusion drive i meant the ones having a smaller ssd and larger hard disk in a single setup.

Thank you for making things clear.
As of now I'm fine with settings you have suggested.
Will try things out like this for few days and get back if any additional changes are required.

thestigma · November 29, 2019, 10:16pm

Ah. Well if the whole seedbox only has an HDD it would only matter in terms of organization/managability. It would end up on the same physical volume anyway.

Sure. Good luck!

It would perhaps be a good idea to show me your full rclone command (for both machines) once you've had some time to digest this and try it out. Then I can easily do a once-over and check that we haven't missed anything important. We have after all discussed a lot of different things here, and odds are that something or other will be forgotten or be misunderstood, so it's nice to have a final validation once you get a grasp on it

whiteloader · November 30, 2019, 6:24pm

Hello,
I am confused and I hope you can help me. This sentence seems to suggest that the VFS cache is only used for writing, but not for reading. However the rest of you answer hints otherwise.

thestigma · November 30, 2019, 6:55pm

Let me first define what read-cache and write-cache means.

A read-cache means that whenever you read a file from the cloud, it saves a copy of that file for a while locally, so that you can access it very fast if you use it again soon.

A write-cache does the same, except that it saves a local copy whenever you write something to the cloud, instead of when you read something.

But of course, both of them make it faster to read (ie. re-access) files you recently worked on. The difference is what triggers them to go into the cache in the first place (reading or writing from the cloud).

So yes - if you upload a file to the cloud and it gets stored in the VFS cache then you do read it back from VFScache and get all of the speed-benefits from this as long as it stays in cache.

The VFS-cache, which is currently used on mount whenever you enable --vfs-cache-mode minimal, writes or full - currently only supports write-caching. This is likely to be expanded relatively soon however as the VFS-cache is constantly improving. The VFS cache is authored by NCW, the main author and is integrated in a way that lets it be efficient with few if any drawbacks.

The cache-backend is a remote-layer (like crypt) and a completely different system. It currently only supports read-caching (despite what the optional --cache-writes flag might make you think). The cache-backend has not been developed for a while as the author has gone MIA so it looks like it will gradually become obsolete compared to the VFS cache. Personally I tend to avoid using it unless there is a very good reason as it adds some delays, inefficiencies, and there remain some bugs that will probably never get fixed.

If you must have both read and write caching you can run both caches together to accomplish this. You do end up with two entirely separate cache systems though, so it's not ideal. There's nothing stopping the exact same file from getting cached in both places for example.

Does this answer your question? If not, feel free to ask for clarifications.

whiteloader · November 30, 2019, 7:37pm

Thanks for the clarification! I explained one of my use cases in this thread:

At the moment I am trying to understand how to control how much data is downloaded if only parts of the file are requested for read.

In this case I would like to download and store the whole file, or at least bigger pieces, even if only tiny pieces are requested one after another, in order to limit API calls. I am trying to figure out whether the cache backend or the VFS cache is better for this.

I hope you don't mind the cross-post. In general I think I want to achieve the same thing as the OP, but I made the other thread because I am also interested in finding out more about the caching mechanics, and I hope somebody can clarify them for me.

thestigma · November 30, 2019, 11:45pm

Hmm, well the two caches work a bit differently here.

The cache backend fetches and stores chunks, so it can cache partial files.

The VFS cache does not chunk, so it only operates on whole files, at least so far. I wouldn't be surprised if we get chunking behaviour for partial-file caching here too at some point, but I'm not aware of this being one of the "near future" priorities. It currently doesn't make sense as long as the VFS cache only caches writes, since you would never want to partially upload anything, so read support would need to come first. I will suggest to NCW that we maybe do that right off the bat when the time comes.

This specific question is pretty straightforward though. As long as you do not use VFS mode full, you only download the parts of a file that you request. This is actually most efficient without any cache backend, because you can fetch exactly what you need, while the cache backend grabs arbitrarily sized chunks and is probably always going to download more than strictly needed in a partial read.

As an example - if I have a huge RAR archive with quick-open information, I can open it via the remote and list the contents inside very quickly within a few seconds. I don't need to download the whole thing, just the index inside the archive (at the start of the file). If I then pick a single file from the archive to extract I can similarly only download that part of the data.
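
If you want to see a partial read without a mount, rclone cat accepts --offset and --count, so it should only stream the requested byte range rather than the whole file. A small sketch; the file path in the usage line is hypothetical.

```shell
#!/bin/sh
# RCLONE can be overridden (for example RCLONE=echo) to preview the command.
RCLONE="${RCLONE:-rclone}"

# Write `count` bytes of a remote file to stdout, starting at byte `offset`.
read_range() {
    file="$1"; offset="$2"; count="$3"
    "$RCLONE" cat "$file" --offset "$offset" --count "$count"
}
```

For example, read_range Gdrive:archives/big.rar 0 65536 | wc -c reads just the first 64 KiB of the archive.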

What is the specific use case this is tailored for? Or is it an attempt at general optimization? It matters what you intend to achieve, so please describe this in as much detail as possible so I can suggest how best to optimize.

I can think of two ways to do more aggressive caching right off the bat that would be fairly practical and have some benefits, but they are slightly different in what they achieve. Before I explain them, let me quickly summarize the limitations we are trying to overcome (assuming you use Gdrive):

Opening files is fairly slow.
Seeking in files is pretty quick once the file is open (i.e. jumping around to grab different bits of it, sort of like jumping ahead in a media file). A seek does require an API call, however.
You can only open about 2-3 new files per second (on writes; I think reads are a little more forgiving). This is a backend limitation and not related to the API quota. It is the primary reason why more than 4-5 --transfers on Gdrive is rather pointless.
I rarely consider the API quota to be much of a problem outside of malfunctioning or badly behaving programs, because you are almost certainly going to run into the above limitation long before you max out the API quota of 1000 requests per 100 seconds.
Thus the strategy should be primarily not to limit API calls, but to limit the number of file-open operations we need to do. Small files are your enemy!!
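
To put rough numbers on that: 1000 requests per 100 seconds works out to 10 requests per second, while file opens top out at around 2-3 per second. A back-of-the-envelope sketch, where the file count and the 2.5 opens per second midpoint are my own assumptions:

```shell
#!/bin/sh
# Estimate how many seconds are spent just opening N files at a given rate.
estimate_open_seconds() {
    files="$1"; opens_per_sec="$2"
    awk -v n="$files" -v r="$opens_per_sec" 'BEGIN { printf "%d\n", n / r }'
}

# API quota expressed per second: 1000 requests / 100 seconds.
quota_per_sec() {
    awk 'BEGIN { printf "%d\n", 1000 / 100 }'
}
```

With these numbers, estimate_open_seconds 10000 2.5 gives 4000 seconds, so a bit over an hour goes to opens alone, at a request rate well below what the quota allows.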

Using the cache-backend:
You can use the cache/fetch function in the RC to pre-fetch the first X chunks of all files, see the documentation for that here:

This would of course make sure that all the tiny files (those that fit inside a single chunk, for example) are already in the cache and do not need to be opened from the cloud at all, assuming they haven't changed. You can also quickly access the data that resides in the first part of larger files. This often includes various metadata, so it could be very beneficial for searches and scans that do not limit themselves to basic attributes (size, name, modtime, createdtime).
To automate this you can just have a script that runs daily to refresh the cache and re-download the pieces that have changed since yesterday.
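
A minimal sketch of such a refresh script, assuming the rclone instance that uses the cache backend was started with --rc so the remote control is reachable. Here chunks=0 asks for the first chunk only. Which path form cache/fetch expects (relative to the mount or to the cache remote) is something to check against the documentation; the usage line below is only my guess.

```shell
#!/bin/sh
# RCLONE can be overridden (for example RCLONE=echo) to preview the calls.
RCLONE="${RCLONE:-rclone}"

# Read one file path per line from stdin and ask the running rclone
# instance to make sure the first chunk of each file is cached.
prefetch_first_chunk() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip empty lines
        "$RCLONE" rc cache/fetch chunks=0 file="$path"
    done
}
```

It could be fed from a recursive listing, for example rclone lsf -R --files-only gcrypt: | prefetch_first_chunk. One rc call per file is simple but not fast, so treat it as a starting point.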

Using the VFScache:
This is what I use myself currently. I do a simple rclone sync Gdrive: C:\RcloneVFSCache --max-size 10M --fast-list (Windows example). Just because the VFS cache is a write-cache doesn't mean we can't add files to it as we see fit. They are just not included automatically. This ensures that all the small files (that kill my performance) are pre-cached to the VFS cache (if they changed since last time). For my use this takes up about 150GB of storage but covers more than 70,000 of my 90,000 files. I probably have a lot more small files than most...

Basically this ensures that all files I have to fetch from the cloud because they are not in cache are large enough to get good speed on, and they usually max out my 160Mbit connection. Since I have a good connection, downloading these larger files is fast enough that I don't really care. In my experience this plus pre-listing makes a world of difference and eliminates almost all of the slowness that comes from Gdrive's poor handling of small files. It effectively works more like I was using a premium cloud without limits (like Wasabi or Backblaze).

Depending on your needs, the local storage you can afford to use for cache, and your ratio of large to small files, your ideal size limit may of course differ from mine. Some manual tuning may be required to find a number that lets you cache many files without ending up with an impractically large cache. Running rclone size Gdrive: --max-size XXM --fast-list lets you check this pretty easily.
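
To compare a few candidate limits in one go, something like this sketch could help. The thresholds in the usage line are arbitrary examples, and Gdrive: is the remote name from my command above.

```shell
#!/bin/sh
# RCLONE can be overridden (for example RCLONE=echo) to preview the commands.
RCLONE="${RCLONE:-rclone}"

# Report file count and total size for files at or below each threshold.
compare_thresholds() {
    remote="$1"; shift
    for limit in "$@"; do
        echo "== files up to $limit =="
        "$RCLONE" size "$remote" --max-size "$limit" --fast-list
    done
}
```

Called as compare_thresholds Gdrive: 1M 5M 10M 50M, it prints one rclone size report per limit so you can see where the cache would get impractically large.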

Like with the above example, you can run this operation daily to keep the cache fresh and ready.
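
A sketch of what that daily job might look like on Linux or macOS, wrapping the same sync command. The cache directory and the script path in the crontab comment are hypothetical, and the 10M default simply mirrors my own setting.

```shell
#!/bin/sh
# RCLONE can be overridden (for example RCLONE=echo) to preview the command.
RCLONE="${RCLONE:-rclone}"

# Sync every file at or below max_size from the remote into the local
# pre-cache directory; files that are already up to date are skipped.
refresh_small_file_cache() {
    remote="$1"; cache_dir="$2"; max_size="${3:-10M}"
    "$RCLONE" sync "$remote" "$cache_dir" --max-size "$max_size" --fast-list
}

# Example crontab entry (runs at 04:00 daily; script path is hypothetical):
# 0 4 * * * /home/user/bin/refresh-cache.sh
```

The script run by cron would end with a call such as refresh_small_file_cache Gdrive: "$HOME/RcloneVFSCache".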

Sorry for the answer being so long, but I think it includes a lot of good info relevant to your question.
I recommend you answer the "your intended use case" question above if you want further advice. I have deliberately not covered implementation details here so as not to make this even longer. Ask if needed.