VFS Cache Overlap

What is the problem you are having with rclone?

On the rclone mount and rclone serve pages, it is pointed out that:

You should not run two copies of rclone using the same VFS cache with the same or overlapping remotes if using --vfs-cache-mode > off. This can potentially cause data corruption if you do. You can work around this by giving each rclone its own cache hierarchy with --cache-dir. You don't need to worry about this if the remotes in use don't overlap.

What counts as an "overlapping" remote in VFS caching?

For instance, if I had a remote file structure setup like so:

remote_folder
├── folder1
└── folder2

Then set up an rclone config file such that:

[remote_folder]
type = sftp
host = example.com
user = exampleuser
key_pem = <key>
shell_type = unix
md5sum_command = md5
sha1sum_command = sha1

[folder1]
type = alias
remote = remote_folder:folder1

[folder2]
type = alias
remote = remote_folder:folder2

And finally ran:

rclone mount folder1: X: --vfs-cache-mode full
rclone mount folder2: Z: --vfs-cache-mode full

in two separate terminals (thus creating two separately running rclone processes handling these two different folders on the same remote server)...

...would this pose a problem and/or introduce the potential for data corruption, or would this be fine to do (since X: and Z: refer to otherwise mutually exclusive directories, save for them sharing the same root/parent directory and remote)?

For what it's worth, I'm using the docker volume plugin (so these would technically be rclone serve docker, but I digress); my intent is to have some of my running docker containers write some of their persistent data to a remote SFTP server.
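For concreteness, with the plugin the equivalent of the two mounts above would be something like the following. This is a sketch from memory rather than my exact commands; the volume names are made up, and I believe the option names mirror the mount flags with underscores (as in the compose file further down):

# One rclone-backed volume per alias remote, roughly equivalent to the
# two mounts above.
docker volume create folder1_data -d rclone -o remote=folder1: -o vfs_cache_mode=full
docker volume create folder2_data -d rclone -o remote=folder2: -o vfs_cache_mode=full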

As a more real-world example, I am running SFTPGo in a docker container with its data directory (the directory that holds documents I upload to the server, for example) mounted to a docker volume managed by rclone; SFTPGo writes uploaded documents to its data directory, rclone writes them to its cache directory, and they are uploaded to my SFTP remote some time later. (In effect, I get the speed of uploading to a local computer running SFTPGo while still taking more-or-less full advantage of the storage I pay for at rsync.net.)
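(The "some time later" is rclone's VFS write-back delay. On a plain mount it could be tuned with something like the line below; the 1m value is purely illustrative, and if memory serves the default is 5s:)

# --vfs-write-back sets how long rclone waits after a file is closed
# before uploading it to the remote.
rclone mount folder1: X: --vfs-cache-mode full --vfs-write-back 1m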

I would then run something else (e.g., Paperless-ngx) and have its data directory persisted similarly to the above container, just to a different directory on my remote (i.e., SFTPGo uploads to remote_folder/folder1 and Paperless-ngx uploads to remote_folder/folder2).

Is doing this "safe"? Or do I risk corrupting SFTPGo and/or Paperless-ngx's data by having them write to different folders in the same folder structure on the same remote?

Run the command 'rclone version' and share the full output of the command.

rclone v1.65.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-91-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

SFTP

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount; see above

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

N/A; see above.

A log from the command that you were trying to run with the -vv flag

N/A; see above.

If you're not sure - and frankly, with aliases, I would not be sure myself without testing - why not be safe and explicitly specify the cache locations?

rclone mount folder1: X: --vfs-cache-mode full --cache-dir /path/to/cache1
rclone mount folder2: Z: --vfs-cache-mode full --cache-dir /path/to/cache2

Otherwise, I suggest you run your mounts and inspect whether separate caches are used or not. Add the -vv (DEBUG) flag and at the beginning of the output you will see lines like:

2024/01/15 09:45:56 DEBUG : vfs cache: root is "/Users/kptsky/Library/Caches/rclone"
2024/01/15 09:45:56 DEBUG : vfs cache: data root is "/Users/kptsky/Library/Caches/rclone/vfs/remote"
2024/01/15 09:45:56 DEBUG : vfs cache: metadata root is "/Users/kptsky/Library/Caches/rclone/vfsMeta/remote"

telling you where the default cache is created.
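For example, something along these lines would quickly show whether your two alias mounts get distinct cache paths (illustrative; run each in its own terminal):

# Start each mount with -vv and filter for the cache lines printed at
# startup; if the two "data root" paths differ, the caches are separate.
rclone mount folder1: X: --vfs-cache-mode full -vv 2>&1 | grep "vfs cache"
rclone mount folder2: Z: --vfs-cache-mode full -vv 2>&1 | grep "vfs cache"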

And in your specific case, why use two mounts? Wouldn't it be easier to just run:

rclone mount remote_folder: Z:

Since these are ultimately going to be docker volumes handled by rclone's docker plugin, I don't think I can easily specify my cache location. Consider the following docker-compose.yml file:

version: "3.3"

services:
  sftpgo:
    container_name: sftpgo
    image: "drakkan/sftpgo:v2.5.6-alpine"
    volumes:
      - "data:/srv/sftpgo"
      - type: bind
        source: ./sftpgo-home
        target: /var/lib/sftpgo
# environment vars, networks, etc. have been omitted

volumes:
  data:
    driver: rclone
    driver_opts:
      remote: "sftpgo_data"
      allow_other: "true"
      vfs_cache_mode: "full"
      vfs_cache_max_size: "10G"
      poll_interval: 0
      cache_dir: "/home/dockerrunner/docker/sftpgo/sftpgo-data-cache"

The very last line, cache_dir: "/home/dockerrunner/docker/sftpgo/sftpgo-data-cache", does not work; it results in:

Error response from daemon: create sftpgo_data: VolumeDriver.Create: unsupported backend option "cache_dir"

Theoretically, this should match up with rclone mount/serve's --cache-dir option, but in practice it does not seem to work.

I suppose I could install the rclone plugin multiple times under different aliases (assuming docker supports this, which I haven't tried, though perhaps I will), but this seems inefficient.
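If I do try it, I imagine it would look roughly like this (untested; the alias and cache path are made up, and per the plugin docs the args setting is passed through to the underlying rclone process):

# Untested: a second instance of the plugin under its own alias, with a
# dedicated cache directory baked into its args.
docker plugin install rclone/docker-volume-rclone:amd64 \
    --alias rclone2 --grant-all-permissions \
    args="--cache-dir=/var/lib/rclone2-cache"

A volume declared with driver: rclone2 should then use that cache directory, separate from volumes created on the default rclone alias.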

Theoretically, yes; but again, these are ultimately docker volumes and not just rclone mount points. I could use rclone mount in its "raw" form to handle this, but then I run into the problem that if my server restarts, I have to start rclone before I can access my data (or fiddle with something to get rclone to mount on startup); ideally, docker would handle this so that I don't have to (and using rclone's docker plugin would handle this).
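(For completeness, the "fiddling" I'd be avoiding is roughly a systemd unit like this untested sketch, with made-up paths:)

sudo tee /etc/systemd/system/rclone-folder1.service <<'EOF'
[Unit]
Description=rclone mount for folder1 (illustrative)
After=network-online.target
Wants=network-online.target

[Service]
# rclone mount supports systemd's notify protocol, so the unit becomes
# "started" once the mountpoint is up; paths and flags are placeholders.
Type=notify
ExecStart=/usr/bin/rclone mount folder1: /mnt/folder1 --vfs-cache-mode full --cache-dir /path/to/cache1
ExecStop=/bin/fusermount -uz /mnt/folder1
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now rclone-folder1.service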
