Need advice on rclone usage for decompressing images over network

I develop an image codec for JPEG 2000, and I would like to decompress large GIS images (hundreds of GB or more) stored in cloud storage. I just set up rclone to map my remote service as a drive:

rclone mount minio:grok ~/miniogrok --vfs-cache-mode full --cache-dir ~/miniogrok_cache

In the codec, I access files by memory mapping them and accessing data as a buffer.

The file access is sequential but the data is sparse: the codec will often skip parts of the file.
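To make the access pattern concrete, here is a minimal sketch in Python (the codec itself is not written in Python; this only illustrates the idea). The function name `read_sparse` and the `(offset, length)` range list are hypothetical, introduced only for this example.

```python
import mmap


def read_sparse(path, ranges):
    """Return the bytes for each (offset, length) range via a read-only memory map.

    Slicing the map faults in only the pages that are touched (plus whatever
    read-ahead the kernel decides to do), so skipped regions are not read
    explicitly by the application.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing an mmap object copies the requested bytes out of the mapping.
            return [view[off:off + length] for off, length in ranges]
```

The ranges are visited in increasing offset order in the codec, which is what makes the access sequential but sparse.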

I want to get optimal performance. Any advice on how best to manage the cache in my mount command?

Also, in some cases, I want to prefetch part of the file before decompressing, or ideally prefetch while decompressing. Is this possible, and what is the best approach?

Many thanks for any advice or guidance !

Aaron

welcome to the forum,

can you post the output of rclone config minio: and rclone version ?

really hard to generalize a set of optimized flags.
much depends on internet connection, latency and your codec, applications using the codec, and so many other unknown factors not mentioned.

rclone mount simply random-access reads chunks of a file, as requested by an application.
then, if using the optional vfs file cache, saves the chunks for a period of time.
it is your application that will control the prefetch before/while issue.
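one way an application could drive prefetch itself is to read the wanted byte range from the mounted file in a background thread and throw the data away. a rough sketch, assuming (as described above) that rclone saves the chunks it reads into the vfs file cache; `prefetch` and `prefetch_async` are made-up names for this example, and `os.pread` is only available on Unix-like systems.

```python
import os
import threading


def prefetch(path, offset, length, block=1 << 20):
    """Read [offset, offset + length) and discard it; return the bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        end = offset + length
        total = 0
        while offset < end:
            # pread takes an explicit offset, so threads do not share a file position.
            chunk = os.pread(fd, min(block, end - offset), offset)
            if not chunk:  # reached end of file
                break
            offset += len(chunk)
            total += len(chunk)
        return total
    finally:
        os.close(fd)


def prefetch_async(path, offset, length):
    """Start prefetch() in a daemon thread and return the thread."""
    t = threading.Thread(target=prefetch, args=(path, offset, length), daemon=True)
    t.start()
    return t
```

the decoder keeps reading through its own mapping or file handle; the prefetch thread only tries to get the bytes requested earlier than the decoder would ask for them.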

depends on how you define optimal ?
imho, create a performance test for the codec, run rclone mount with default values, establish baseline performance.
based on that, we can see what, if anything, needs to be tweaked.
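a baseline test does not need to be elaborate. a minimal sketch that times the same sparse, in-order reads against the mount; the function name and the list of `(offset, length)` ranges are placeholders, and a real test would use the ranges the codec actually touches.

```python
import time


def time_sparse_reads(path, ranges):
    """Return (bytes_read, seconds) for reading each (offset, length) range in order."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        for off, length in ranges:
            f.seek(off)
            total += len(f.read(length))
    return total, time.perf_counter() - start
```

run it once against a cold cache and once again right after, with the mount flags unchanged, to see how much the cache helps before tweaking anything.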

and you can check out my summary of the two types of vfs cache


Thanks for the welcome.

rclone version

- os/version: ubuntu 24.04 (64 bit)
- os/kernel: 6.8.0-44-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.23.1
- go/linking: static
- go/tags: none

rclone config minio: just showed the help message, maybe you meant something else?

What is your advice on prefetch? Ideally I could issue a prefetch for a given part of the file, and then start decompressing, and as data comes in, I make use of the cache. But this may not be safe to do. I would issue the prefetch from another thread or process.

sorry, that should be rclone config redacted minio:

that is up to your application to ask the mount to read a chunk of the compressed GIS image file.

  1. application requests a sequence of bytes from a file.
  2. rclone checks if the bytes are already in the cache. if not, rclone downloads a chunk from minio into the cache.
  3. rclone returns the bytes to the application.
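
the three steps can be pictured with a toy read-through cache. this is only a model of the steps listed above, not rclone's actual code; `ChunkCache`, `fetch_chunk` and `chunk_size` are invented for the example.

```python
class ChunkCache:
    """Toy read-through cache keyed by chunk index."""

    def __init__(self, fetch_chunk, chunk_size):
        self.fetch_chunk = fetch_chunk  # hypothetical callable: chunk index -> bytes
        self.chunk_size = chunk_size
        self.chunks = {}
        self.downloads = 0  # number of times the remote was contacted

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index = offset // self.chunk_size
            if index not in self.chunks:
                # step 2: cache miss, so download the chunk and keep it
                self.chunks[index] = self.fetch_chunk(index)
                self.downloads += 1
            chunk = self.chunks[index]
            start = offset - index * self.chunk_size
            piece = chunk[start:start + (end - offset)]
            if not piece:  # past the end of the remote object
                break
            out += piece
            offset += len(piece)
        # step 3: hand the requested bytes back to the caller
        return bytes(out)
```

a second read of the same range is served from `chunks` without another download, which is the behaviour a prefetch relies on.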

should not be a problem for multiple threads/processes to have read-only access to the same file.
could use --read-only

[minio]
type = s3
provider = Minio
access_key_id = XXX
secret_access_key = XXX
endpoint = http://localhost:9000/
acl = private
### Double check the config for sensitive info before posting publicly

Thanks, I just added the --read-only flag to the mount command.

Ideally, I would have a bunch of threads reading and decompressing the file via memory mapping, plus a few prefetch threads that, while the decompression is taking place, issue targeted prefetch commands for data I know is going to be used. Is this safe to do?
There will not be any partial reads while a prefetch is in progress?
I guess in my case, for read only, either the data is there or it's not, it will not be in the cache and then change.

My concern is race conditions on the rclone vfs cache when two different threads are reading the same data from remote and accessing the same area of the cache.

for a moment, forget about rclone and minio.

how would the following be a problem?

  1. the entire image file is on local storage.
  2. two applications open the file for read-only random access.
  3. each application reads bytes from that file.

indeed, that would not be a problem at all.
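
if you want to check this against the mount rather than take it on trust, a small sketch like the following reads the same range from several threads, each with its own read-only descriptor, and compares the results. the function names are made up for this example, `os.pread` is Unix-only, and passing this test does not prove the absence of races; it only fails to find one.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read up to `length` bytes at `offset` using a private read-only descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while length > 0:
            chunk = os.pread(fd, length, offset)
            if not chunk:  # end of file
                break
            parts.append(chunk)
            offset += len(chunk)
            length -= len(chunk)
        return b"".join(parts)
    finally:
        os.close(fd)


def concurrent_reads_agree(path, offset, length, readers=4):
    """Return True if every concurrent reader got identical bytes."""
    with ThreadPoolExecutor(max_workers=readers) as pool:
        results = list(pool.map(lambda _: read_range(path, offset, length), range(readers)))
    return all(r == results[0] for r in results)
```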

Currently my minio instance is running in a docker instance locally, but this is just a test setup - the real scenario will be data on an S3 bucket.

So, there are two threads reading the data and rclone is writing to cache, so I want to make sure that this is safe to do - the rclone cache read/write is somehow protected with locks.

that post is from four years ago


if your application(s) open the file using the correct mode, and use correct values for rclone flags.
then i will bet rclone can handle your specific use-case.

let's see what other forum members think?
in the end, we need an expert to confirm?


perhaps @ncw can comment on whether this is still an issue
