Eventual consistency in the mirror (RAID 1) union after some upstreams were unavailable

What is the problem you are having with rclone?

Experimenting with the union backend. I'm trying to set up a mirror (RAID 1) union so that all the upstream backends hold an exact copy of the files.

  • During the original copy of a file into the union, 2 of 3 upstreams failed due to network/quota issues. Thus, the file appeared in only one of the upstreams.
  • The command rclone ls union-test: reports presence of the file in the union.
  • Copying the same file again to the union does nothing, i.e. the file does not appear on the now available upstreams.

The goal is to make all three upstreams eventually contain the same file, i.e. mirroring. Is it possible to achieve this without a periodic rclone sync between upstreams or something like that?

I've read:

Run the command 'rclone version' and share the full output of the command.

rclone v1.62.2
- os/version: Microsoft Windows 10 Home 21H2 (64 bit)
- os/kernel: 10.0.19044.2728 Build 19044.2728.2728 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.20.2
- go/linking: static
- go/tags: cmount

Which cloud storage system are you using?

  • Mail Ru Cloud (intermittent network issue)
  • Yandex Disk (upload quota limit reached)
  • Koofr (successful)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy TestUnionSource\ union-test:

The rclone config contents with secrets removed.

[mailru]
type = mailru
user = XXX
pass = XXX
speedup_enable = false
token = XXX

[koofr]
type = koofr
provider = koofr
user = XXX
password = XXX

[ydisk]
type = yandex
client_id = XXX
client_secret = XXX
token = XXX

[union-test]
type = union
upstreams = mailru:union-test ydisk:union-test koofr:union-test
create_policy = all

A log from the command with the -vv flag

This is the second run, after the file had already been created on one of the upstreams.
2023/03/27 11:28:29 DEBUG : rclone: Version "v1.62.2" starting with parameters ["XXX\\rclone.exe" "copy" "XXX\\TestUnionSource\\" "union-test:" "-vv"]
2023/03/27 11:28:29 DEBUG : Creating backend with remote "XXX\\TestUnionSource\\"
2023/03/27 11:28:29 DEBUG : Using config file from "XXX\\rclone.conf"
2023/03/27 11:28:29 DEBUG : fs cache: renaming cache item "XXX\\TestUnionSource\\" to be canonical "//?/XXX/TestUnionSource"
2023/03/27 11:28:29 DEBUG : Creating backend with remote "union-test:"
2023/03/27 11:28:29 DEBUG : Creating backend with remote "koofr:union-test"
2023/03/27 11:28:29 DEBUG : Creating backend with remote "mailru:union-test"
2023/03/27 11:28:29 DEBUG : Creating backend with remote "ydisk:union-test"
2023/03/27 11:28:29 DEBUG : pacer: low level retry 1/10 (error Get "https://cloud.mail.ru/api/m1/file?access_token=XXX&home=union-test&limit=2147483647&offset=0": read tcp 192.168.1.4:54821->217.69.139.55:443: wsarecv: An existing connection was forcibly closed by the remote host.)
2023/03/27 11:28:29 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2023/03/27 11:28:30 DEBUG : pacer: low level retry 2/10 (error Get "https://cloud.mail.ru/api/m1/file?access_token=XXX&home=union-test&limit=2147483647&offset=0": read tcp 192.168.1.4:54822->217.69.139.55:443: wsarecv: An existing connection was forcibly closed by the remote host.)
2023/03/27 11:28:30 DEBUG : pacer: Rate limited, increasing sleep to 40ms
2023/03/27 11:28:31 DEBUG : pacer: Reducing sleep to 30ms
2023/03/27 11:28:31 DEBUG : union root '': actionPolicy = *policy.EpAll, createPolicy = *policy.All, searchPolicy = *policy.FF
2023/03/27 11:28:31 DEBUG : pacer: Reducing sleep to 22.5ms
2023/03/27 11:28:31 DEBUG : union root '': Waiting for checks to finish
2023/03/27 11:28:31 DEBUG : File.txt: Size and modification time the same (differ by -674.1µs, within tolerance 1s)
2023/03/27 11:28:31 DEBUG : File.txt: Unchanged skipping
2023/03/27 11:28:31 DEBUG : union root '': Waiting for transfers to finish
2023/03/27 11:28:31 INFO  : There was nothing to transfer
2023/03/27 11:28:31 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         2.6s

hello and welcome to the forum,

no --magic solution.
if you want rclone to sync files, you need to run rclone sync,
and if there were errors, you have to run rclone sync again.

depending on your use-case, you can reduce the total number of files that need to be checked using filters, such as
--max-age
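To illustrate the "run it again on errors" part, a small retry wrapper around the sync pass. The rclone command in the comment is only an example, assuming koofr holds the complete copy; fake_sync is a stand-in so the sketch runs anywhere.

```shell
# Retry a sync pass until it exits cleanly. In real use, fake_sync would be
# something like:
#   rclone sync koofr:union-test mailru:union-test --max-age 24h
attempts=0
fake_sync() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]   # simulate: first run errors, second succeeds
}
until fake_sync; do
  echo "sync failed, retrying..."
done
echo "synced after $attempts attempts"
```

In a real script you would also cap the number of retries and back off between attempts.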


I might be misreading the end goal, but it is possible that any one of the upstreams, or any two of them, could be missing any particular file, correct? If so, then sync is probably not what you want, as it can and will delete files as well.

If you babysit the process and know when a file is missing you can use sync, but if you were talking about having a process that runs every 'x' hours to copy any missed files, you cannot trust sync to do this unless you have a single source that is authoritative.

You could use a series of copy commands, something like 1 to 2, then 2 to 3, then 3 to 2, and finally 2 to 1, to ensure that a single file placed on any one backend is copied to the rest. But then you can't delete files unless all the backends are online to process the delete, because any remaining copy would get re-replicated out again.
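To make the chain concrete, here is a toy sketch using local directories as the three backends, with a no-clobber cp standing in for each "rclone copy src: dst:" pass (names and layout are made up for illustration):

```shell
# Three directories model the three upstreams.
tmp=$(mktemp -d)
mkdir -p "$tmp/b1" "$tmp/b2" "$tmp/b3"
echo data > "$tmp/b3/File.txt"        # file that only reached backend 3
chain() {                             # copy everything from $1 into $2
  cp -n "$tmp/$1"/* "$tmp/$2"/ 2>/dev/null || true
}
# The chain: 1 to 2, then 2 to 3, then 3 to 2, and 2 to 1.
chain b1 b2
chain b2 b3
chain b3 b2
chain b2 b1
ls "$tmp/b1" "$tmp/b2" "$tmp/b3"
```

After one pass of the chain, a file that only reached backend 3 is present on all three.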

Another option might be to use union to write the file to one single backend (and read/delete across all backends), and then use bisync (noting the warnings about it not being production-ready) to sync up 1 <--> 2, then 1 <--> 3. In this case, though, there would be a time window where files exist on only a single backend. In theory union might be able to write to all and bisync could detect the duplicates and sort it out, but given how new bisync is, I would not even attempt this myself at this point.
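Roughly what the 1 <--> 2, then 1 <--> 3 pairing does, again modelled with local directories; a no-clobber copy in each direction is a very crude stand-in for "rclone bisync remote1: remote2:" (which also handles deletions, renames and conflicts):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/b1" "$tmp/b2" "$tmp/b3"
echo one > "$tmp/b1/a.txt"            # only reached backend 1
echo two > "$tmp/b2/b.txt"            # only reached backend 2
pair() {                              # two-way catch-up between $1 and $2
  cp -n "$tmp/$1"/* "$tmp/$2"/ 2>/dev/null || true
  cp -n "$tmp/$2"/* "$tmp/$1"/ 2>/dev/null || true
}
pair b1 b2   # 1 <--> 2
pair b1 b3   # 1 <--> 3
ls "$tmp/b1" "$tmp/b2" "$tmp/b3"
```

Because backend 1 sits in the middle of both pairs, a file landing on any backend reaches the others within one or two rounds.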

It's much simpler if you have one single "source of truth" (meaning that a file uploaded elsewhere can just be deleted) because you can use sync in this case.
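A toy illustration of why a single authoritative source makes sync safe: the mirror is simply made identical to the source, so a stray file that exists only on the mirror gets deleted, which is what "rclone sync truth: mirror:" would do (the paths here are made up):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/truth" "$tmp/mirror"
echo keep  > "$tmp/truth/keep.txt"
echo stray > "$tmp/mirror/stray.txt"  # uploaded elsewhere; safe to delete
# Emulate sync: delete side, then copy side.
rm -rf "$tmp/mirror"
cp -r "$tmp/truth" "$tmp/mirror"
ls "$tmp/mirror"
```

With no single source of truth, that same delete behaviour is exactly what makes sync dangerous.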

Seems so. Even --magic-please doesn't work, I've tried…

Right, any of the upstreams can fail during file creation. This is not only my case; it could happen to anyone using the union backend with --union-create-policy all. I've raised this point to clarify the limitations of the union backend in the mirroring (RAID 1) scenario, for myself and perhaps for the community, as the current documentation doesn't describe this aspect.

My conclusion is that currently the multi-write union cannot be configured as RAID 1 on its own and is more suitable for other scenarios, such as RAID 0. I can imagine it could be improved to handle a failed upstream, for example by caching that information locally. Though I'm not sure it's on the roadmap.

At least it might be beneficial to mention this caveat of --union-create-policy all in the documentation. I can contribute if the maintainers agree.

That's what I'm inclined to do. In my case I can afford a local backend, which will be the source of truth, and then sync it to all the other backends one by one. No union, sadly.

If you can trust your local backend, you could still use union to simultaneously write (and support reads in whatever way you prefer), with a periodic copy (or sync) to ensure the cloud targets are up to date.
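For example (purely a sketch, reusing the remote names from the original post; the local path C:\Data\truth is hypothetical), the union could include the trusted local folder as one of the upstreams:

```
[union-mirror]
type = union
upstreams = C:\Data\truth mailru:union-test ydisk:union-test koofr:union-test
create_policy = all
```

Writes then land on the local copy even when a cloud upstream is down, and a scheduled task running `rclone sync C:\Data\truth koofr:union-test` (and likewise for mailru and ydisk) brings the cloud targets back in line.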

It isn't perfect, but it would still be better-er than everything waiting for an eventual copy/sync.

Right! I didn't think about it.


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.