Union Configuration

What is the problem you are having with rclone?

When considering the Function / Category and path preservation policies, I'm having trouble deciding on the best course of action. My upstreams only support the Used quota (Jottacloud). Here are my specs and what I think are the proper policies, but I need a sanity check.

  1. Action commands that would incur ingress and egress (e.g. move, copy) need to remain on the remote where the file already exists. There is no need to move a file from one remote to another if the remote supports server-side operations (for which I actually can't find a definitive answer in the docs). However, actions on directories would need to consider the path on all remotes, as I don't want to leave orphaned files on other remotes. At first I thought I should use epff, but then I remembered the orphaned files/folders. Would epall be the right choice? If so, is the default action to keep the files on the existing remote?

i.e.

remote1:folder/file1
remote2:folder/file2
remote3:some/path

and rclone moveto union:folder union:directory (basically renaming folder to directory)

would hopefully result in

remote1:directory/file1
remote2:directory/file2
remote3:some/path
  2. Create commands would use the remote with the most free space, or conversely the least used space, to balance storage evenly across all remotes simply by size. I think this would be lus, since Used is the only quota information provided by the remote.

  3. Search, very similar to action, would be fine with ff for files, but directories could exist on all remotes, so would it be best to use epall? The idea being that if I did an ls on a directory (take union:folder from the above example), the return would be file1 and file2?
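Put together, those choices would give a union config section along these lines (a sketch only; the section name and upstream names are placeholders, and the policy names are the epall/lus/ff choices discussed above):

```
[union]
type = union
upstreams = jotta01: jotta02: jotta03:
action_policy = epall
create_policy = lus
search_policy = ff
```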

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1
- os/version: debian 10.11 (64 bit)
- os/kernel: 4.19.0-18-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Jottacloud (primarily) and local disk as a buffer for some directories

The command you were trying to run (eg rclone copy /tmp remote:tmp)

NA... yet

The rclone config contents with secrets removed.

3 identical remotes and crypts with the same crypt config (or a single crypt on top of the union, I haven't decided yet).

[jotta01-connect]
type = jottacloud
configVersion = 1
client_id = jottacli
client_secret =
tokenURL = https://id.jottacloud.com/auth/realms/jottacloud/protocol/openid-connect/token
token = {"access_token":"tokenhere"}
device =
mountpoint =

[jotta01]
type = crypt
remote = jotta01-connect:
password = passwordhere

A log from the command with the -vv flag

NA... yet

The best way is to test; I say this based on my own experience with union. Maybe it is just me, but the policy descriptions are not 100% clear, and I only discovered what I was doing wrong by trying them.

Your policies epall, lus and ff look correct for what you described you want to achieve. But the best way to check is to configure an empty union and use it with a limited amount of data for a test period. Try all the edge cases you can think of. Do not use encryption for testing: it is much easier and faster to look at the individual union members and see what is where.

It is documented here. Look at the Move feature.

You can also query your remote capabilities by running:

$ rclone backend features remote:

In addition, if you decide to use a crypt remote: Jottacloud supports base32768 encoding, and you can test it by running (only in the latest v1.64 beta):

$ rclone test info --all jottacloud:InfoTest
2023/08/10 10:42:54 NOTICE: jottacloud root 'test1/test-base32768': 0 differences found
2023/08/10 10:42:54 NOTICE: jottacloud root 'test1/test-base32768': 1028 matching files
// jottacloud
stringNeedsEscaping = []rune{
	'/', '\x00'
}
maxFileLength = 255 // for 1 byte unicode characters
maxFileLength = 255 // for 2 byte unicode characters
maxFileLength = 255 // for 3 byte unicode characters
maxFileLength = 127 // for 4 byte unicode characters
canWriteUnnormalized = true
canReadUnnormalized   = true
canReadRenormalized   = true
canStream = false
base32768isOK = true // make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters

Is there a way to "rebalance" the union? For example, if I add new remotes to the union, I want to rebalance the existing data so all new data gets spread out evenly over the remotes. This is because new data has a higher likelihood of access than old data and could potentially be a bottleneck. In my scenario, the new remotes would, under the lus policy, get written to until all remotes have the same used space.

@ncw mergerfs uses this approach in its balance tool, which basically uses rsync under the hood.

Is there anything in rclone that would accomplish this (using rclone, obviously)? I didn't find any related commands in rclone.

If there is not a built in tool, I suspect I could use a series of scripts and includes/excludes to do this, but that gets quite complicated really fast.

Nothing like that exists now as it would have to be built.

There is no rclone union re-balancing functionality.

It can be achieved, though, using a simple manual method.

Let's say I have union members U1, U2 and U3, using 400GB each.

After adding a new empty U4 I can run:

rclone move U1: U4: --max-transfer 100G
rclone move U2: U4: --max-transfer 100G
rclone move U3: U4: --max-transfer 100G

As a result I will have four remotes storing 300G each: an effectively perfectly balanced union.
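The arithmetic behind that manual method can be sketched in bash (sizes in GB; the U1–U4 names are from the example above, and this only prints the rclone move commands rather than running them):

```shell
#!/bin/bash

# Used space per existing member (GB) plus the new, empty member
declare -A used=( [U1]=400 [U2]=400 [U3]=400 [U4]=0 )

# Target per remote = total used / number of remotes
total=0
for r in "${!used[@]}"; do total=$(( total + used[$r] )); done
target=$(( total / ${#used[@]} ))
echo "target per remote: ${target}G"

# Each over-quota remote moves its surplus to the new member
for r in U1 U2 U3; do
    surplus=$(( used[$r] - target ))
    if (( surplus > 0 )); then
        echo "rclone move ${r}: U4: --max-transfer ${surplus}G"
    fi
done
```

With the numbers above this prints a 300G target and a 100G move from each of U1, U2 and U3.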

This is very straightforward for a union using non path preserving creation policies (like your lus). With path preserving policies it will be a bit more complicated, but the overall idea remains the same.


This is a good one; I didn't think of it. I'm going to add it to my script and report back when I have something (since in my case I have 3 remotes to add to the union). Very good idea.

@kapitainsky something like this (hacked up in bash):

#!/bin/bash

# Specify original and new remotes
original_remotes=("remote1:" "remote2:")
new_remotes=("remote3:" "remote4:")

# Calculate total size across original remotes
total_size=0
for remote in "${original_remotes[@]}"; do
    total_size=$(( total_size + $(rclone size "$remote" --json | jq '.bytes') ))
done

# Calculate the total number of remotes (both original and new)
all_remotes=("${original_remotes[@]}" "${new_remotes[@]}")
total_remotes=${#all_remotes[@]}

# Calculate the quota for each remote
quota_per_remote=$(( total_size / total_remotes ))

# Drain each original remote down to its quota, splitting its surplus
# evenly across the new remotes (otherwise the first new remote would
# absorb all the surplus before the others receive anything)
for source_remote in "${original_remotes[@]}"; do
    current_size=$(rclone size "$source_remote" --json | jq '.bytes')
    surplus=$(( current_size - quota_per_remote ))

    # Only proceed if there's data to transfer
    if (( surplus > 0 )); then
        per_dest=$(( surplus / ${#new_remotes[@]} ))
        for dest_remote in "${new_remotes[@]}"; do
            rclone move "$source_remote" "$dest_remote" --max-transfer=$per_dest
        done
    fi
done

No suffix would be applied, since the JSON returned bytes. Hopefully --max-transfer assumes bytes, but the docs don't say so explicitly. Can anyone confirm whether supplying a byte integer will work (i.e. 100000000000 instead of 100G)?

--max-transfer SizeSuffix   Maximum size of data to transfer (default off)

as per docs:

Size options

Options which use SIZE use KiB (multiples of 1024 bytes) by default. However, a suffix of B for Byte, K for KiB, M for MiB, G for GiB, T for TiB and P for PiB may be used. These are the binary units, e.g. 1, 2^10, 2^20, 2^30 respectively.

so in your case you need to add the B suffix
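Applied to the script above, that means appending B to the raw byte count before passing it to --max-transfer. A minimal sketch (the variable name matches the script; the byte count is hard-coded here in place of the rclone size query):

```shell
#!/bin/bash

# Byte count as returned by `rclone size --json | jq '.bytes'`
transfer_limit=100000000000

# Append the B suffix so rclone treats the value as bytes, not KiB
max_transfer_flag="--max-transfer=${transfer_limit}B"

echo "rclone move remote1: remote3: $max_transfer_flag"
```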


Ah, fantastic. I'll give this a go when my current migration is done (my first sync attempt to exit Google Drive before my deadline to reduce storage).

Ultimately, I'd like to try to write this in Go and possibly open a PR for a union command, e.g. rclone unionbalance union:

the command would be able to derive the rest (source remotes, dest remotes, and quotas).


It would be nice... but you cannot use the same simple logic for a union with a path preserving creation policy.

I'm trying to wrap my mind around this. TBQH, I'm very new to the purpose and nuances of the policies, let alone the differences between path preserving and non-preserving.

Well, hang on, correct me if I'm wrong, but when interacting with the remotes directly, the union policies don't matter, right? So if I move the files/directory from remote1 to remote4, the union would still see the file in the same place, right?

If not, what if instead of moving from remote1 to remote4, I move from remote1 to the union and let the union figure it out? I'm sure I'm wrong here, but I'm trying to figure it out.

The difference is that with path preserving policies you cannot have the same path on two remotes. So if you decide to move directory1 from U1 to U2, you have to move all of its content, all files.

To be honest, I am not sure myself what the real-life use of path preserving policies is; maybe my imagination is limited.

I agree with this. Not that they are wrong, but I just don't understand the uses.

I think most people using a union in rclone would not have path preserving policies. If they did, I suppose they should be using the combine backend instead.


In pseudo code:

  1. Parse the union to get its member remotes
  2. Check the policies and error if any are path preserving
  3. Determine the used space for each remote
  4. Compute the max transfer for each remote
  5. Launch rclone move subprocesses with the appropriate flags

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.