How to optimize mount command for read-only random access

at first, i fell in love with rclone, but rclone would only like me.
now, i love rclone and rclone is starting to love me back.

i am using rclone with the mount command on microsoft windows.
seems to work very well, very stable.

veeam backup and recovery has a feature called instant restore.
let's say my windows server dies and i use veeam to back it up.
normally, doing a full system restore can take many hours to copy all the data from the .vbk backup file to the new server.
basically, veeam, on the fly, creates a new virtual machine using a previously created .vbk backup file.
very cool technology. that virtual machine will only read the 'sectors' from that .vbk that are needed to start up the virtual machine, as needed. as more 'sectors' are needed, veeam will get them from the backup file and feed them to the virtual machine. so that virtual server boots up quickly.

instant restore is designed to work over a lan, perhaps a wan with vpn, but certainly not from a file in the cloud.
so what i have done is use rclone mount to mount that backup file as local storage and then point veeam instant recovery to the 'local' file. but it is very slow to boot, as that virtual machine has to read each 'sector' from that .vbk file, which is in the cloud.
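
for reference, a rough sketch of the kind of mount i am doing (the remote name, bucket path and drive letter below are just placeholders, not my real setup):

 rclone mount remote:backups X: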

i wanted to ask how to tweak the mount command flags.

  1. the mount needs to be read-only, no need to write at all to the .vbk
  2. there will be a lot of random access reads to the .vbk file.

thanks,
david

Homewrecker! >_<

Rclone operates purely at the file level, so any operations that strictly require block-level access can't work - like block-level incremental backup solutions. You can achieve similar things at the file level too (albeit a bit less efficiently), as you basically need to save the incremental changes as separate files rather than partially modifying an existing file, so this will be up to the software you use.

TLDR: Rclone can only operate on files. It can only write whole files. It can however read arbitrary segments of files without downloading them in full - assuming your software requests the data in this manner.
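
As a quick illustration of that partial-read behaviour (the remote and file names here are made up), rclone cat can fetch just a byte range of a remote file with its --offset and --count flags:

 rclone cat remote:backups/server.vbk --offset 1048576 --count 65536 > segment.bin

The mount does essentially the same thing under the hood, issuing ranged reads for only the parts the application asks for.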

The basic problem here is that most cloud services have some significant latency on opening a read stream, and many of them also have limitations on the number of file operations per second - meaning they are often not great at small random access. That's exactly what running a VM needs, for example, so no wonder this is sluggish.

Your limitation here is very likely:

  • API burst quota of your service
  • file operations burst quota of your service
  • To a lesser extent, the general latency of requests to a cloud service compared to even the worst HDD

I have my doubts that this can be directly fixed, as the fundamental limitations are likely at the cloud-service level and not in rclone per se. You are also trying to optimize for something that is one of the weakest points of cloud storage in general. But if you give me some more info I might be able to suggest some alternative way of approaching the issue.

  • What is your cloud provider? (the restrictions can be very different between them)
  • Is some sort of write-caching solution on the table for you? (like keeping the latest revision of the most essential files available on LAN)

You can use the --read-only flag on mount for this.

This isn't rclone's sweet spot, as @thestigma explained above. It will work, but it will be sloooooow.

You may be able to tune it by reducing --buffer to 0 to prevent over-reading. Tuning these might help too:

 --vfs-read-chunk-size SizeSuffix         Read the source objects in chunks. (default 128M)
 --vfs-read-chunk-size-limit SizeSuffix   If greater than --vfs-read-chunk-size, double the chunk size after each chunk read, until the limit is reached. 'off' is unlimited. (default off)
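
For example, a read-only mount with the buffer turned off and the chunk settings made explicit might look something like this (the remote name and drive letter are placeholders, and the chunk values shown are just the defaults - these are the knobs to experiment with):

 rclone mount remote:backups X: --read-only --buffer-size 0 --vfs-read-chunk-size 128M --vfs-read-chunk-size-limit off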

I know that buffer will pre-read. Wasn't sure if it will read beyond the requested chunk though. Would be nice to know the specifics of that, because if so then a large buffer could be very crippling (and a lot of users here on the forum for some weird reason insist on using some crazy big buffer sizes...)

 --vfs-read-chunk-size

Can a high value here really result in over-read? After reading through some threads on this I was under the impression that the only thing this really did was increase the range of the request. I thought you could request 128M but just download 1MB anyway and not be penalized (except it might count the request against your 10TB daily quota - this is still not 100% clear to me).

Just to prevent any confusion for the OP, I believe the parameter is --buffer-size

Sorry for interjecting with selfishly motivated questions, but since this is also relevant for the OP here I will permit myself to pick your big beautiful brain :stuck_out_tongue:

thanks,
i want to better understand the info veeam is asking rclone mount for.

does rclone provide a way to log requests, so that i could figure out the size of the data requests?

if the file is 100MB and the chunk size=1MB, there would be 100 chunks. i want to know which chunk rclone is reading from the mounted file.

thanks,
david

It is the --buffer-size parameter which can result in over-reading... Though if you ask the cloud storage for 128M of data, it may start sending that to you even if you don't read it (the wonders of TCP buffers), so reducing the chunk size may help too.

If you use -vv you'll get plenty of debug info from the mount. If you want to see the actual request to the storage provider then use --dump headers too.

--log-file mylogfile.txt would be helpful here, as the debug log will be too unmanageable to try to read in a terminal window. But yes, this will tell you exactly what is going on.
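
Putting those together, a debugging mount might look something like this (the remote name, drive letter and log file name are just placeholders):

 rclone mount remote:backups X: --read-only -vv --dump headers --log-file mylogfile.txt

The -vv output shows the reads the VFS layer is serving, and --dump headers shows the ranged HTTP requests actually sent to the provider, so you can see which byte ranges of the .vbk are being fetched.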

Also be aware that by default the vfs chunk size is only the starting size, and it will double for each extra contiguous segment that it needs from that same file (unless seeked, which isn't relevant here). You could always cap the maximum it can grow to by setting a low chunk size limit though.

You may save a little bandwidth by all this, but all those tiny chunks will also use a lot more API calls.
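
To make that concrete, a sketch of a mount that starts with small chunks and caps their growth (the values are just for illustration, not a recommendation):

 rclone mount remote:backups X: --read-only --vfs-read-chunk-size 1M --vfs-read-chunk-size-limit 16M

Smaller chunks waste less bandwidth on each random seek, but every chunk is a separate ranged request, so the API call count goes up accordingly.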

This would still be helpful to know. It is hard to suggest pinpoint optimizations without knowing what limits and quotas we are trying to work within.

hi,
provider is wasabi and, as far as i know, there are no restrictions and no ingress/egress or other fees like amazon and micro$oft charge. $5 per TB of data, no other fees.

and keeping latest revision of the most essential files on the lan is not an option.

thanks

I'm not well versed on the details of Wasabi limits, but I think it is generally much more permissive than many others - which is definitely good in this case. NCW likely knows more here, as most of what I know about it is what I've heard him mention in passing.

I at least know it can handle a lot of concurrent transfers, so I think you can safely raise the default of 4 to something like 20 or 30 in case that might help - but that would depend on the requesting software requesting files in parallel (which it probably does to at least some extent).
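
Assuming the "default of 4" refers to rclone's --transfers flag (that mapping is my assumption, and how much it helps a read-only mount depends on the access pattern), that would look something like this with placeholder names:

 rclone mount remote:backups X: --read-only --transfers 20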
