How to optimize use of remotes in unions?

What is the problem you are having with rclone?

I do not have a problem. I am not asking for help with how to solve a problem with my existing configuration … exactly. Instead, I am planning on doing something and am asking how best to do it. See below the boilerplate for the actual question.

Run the command 'rclone version' and share the full output of the command.

rclone v1.73.2-termux
- os/version: unknown
- os/kernel: 6.1.163-android14-11-gca93bcec643f (aarch64)
- os/type: android
- os/arch: arm64 (ARMv8 compatible)
- go/version: go1.26.0
- go/linking: dynamic
- go/tags: noselfupdate

Which cloud storage system are you using? (eg Google Drive)

A Union.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync /home/user/data myunion: --order-by size,descending --transfers 1

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[remote1]
type = alias
remote = /home/user/mount1

[remote2]
type = alias
remote = /home/user/mount2

[remote3]
type = alias
remote = /home/user/mount3

[myunion]
type = union
upstreams = remote1: remote2: remote3:

log from the command that you were trying to run with the -vv flag

I haven't tried it yet. How should I do it? See the question below.

Okay, so after all that, here is the question!


I have a union with a few remotes in it.
I'm looking for help on how to configure it and how to write the sync to optimize the use of space in the remotes.
I shall explain ...

Currently, the union is set up as:

action_policy = all
create_policy = mfs # <-- important
search_policy = ff

I also currently sync using: --order-by size,descending

This leaves me with free space scattered among the remotes, as it finds the emptiest remote and fills it first. And this eventually results in a situation where, even though I may have an excess of space collectively, there is not enough in any one remote for the next file that needs to be created. So, I have to add another remote to the union. This is frustrating, as it means the space is not being used optimally, wasting money.

What I would like to do is to have the space in the union optimized so that any free space in fuller remotes gets filled before the free space in the emptier remotes. (I do not want to serialize the filling. I want to optimize/maximize the filling.)

What this would do (again) is find the smallest contiguous space (think of the remotes in the union like disk sectors) that can hold the largest file, first. Create it. Then for the next file to be created (which should be next-to-largest), find the now-smallest contiguous space that can hold it. Etc. This should prevent having lots of unused space scattered among the remotes in the union.
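The placement described above is essentially the best-fit-decreasing heuristic from bin packing, while mfs combined with --order-by size,descending behaves like worst-fit decreasing. The toy simulation below contrasts the two. It assumes the free space of every remote is known exactly before each write; sizes are in arbitrary units, and the function names are hypothetical. It only illustrates the idea; it is not how rclone's union backend is implemented.

```python
from typing import Callable, List, Optional, Tuple

# A chooser receives the current free space of each remote and the size of
# the next file, and returns the index of the remote to write to (or None).
Chooser = Callable[[List[int], int], Optional[int]]


def most_free(free: List[int], size: int) -> Optional[int]:
    """mfs-like (worst fit): take the emptiest remote; fail if it is too small."""
    i = max(range(len(free)), key=lambda k: free[k])
    return i if free[i] >= size else None


def best_fit(free: List[int], size: int) -> Optional[int]:
    """Best fit: take the fullest remote that can still hold the file."""
    fits = [k for k in range(len(free)) if free[k] >= size]
    return min(fits, key=lambda k: free[k]) if fits else None


def place(files: List[int], free: List[int],
          choose: Chooser) -> Tuple[List[int], List[int]]:
    """Write files largest first (like --order-by size,descending).

    Returns the free space left on each remote and the sizes that did not fit.
    """
    free = list(free)  # do not modify the caller's list
    failed: List[int] = []
    for size in sorted(files, reverse=True):
        i = choose(free, size)
        if i is None:
            failed.append(size)
        else:
            free[i] -= size
    return free, failed


if __name__ == "__main__":
    remotes = [10, 7]  # free space per remote (arbitrary units)
    files = [7, 5, 5]  # file sizes to upload
    print("most free:", place(files, remotes, most_free))
    print("best fit: ", place(files, remotes, best_fit))
```

With these numbers, most-free leaves 3 + 2 = 5 units free in total but cannot place the last 5-unit file, while best fit fills both remotes exactly. Best-fit decreasing is a heuristic: it usually strands less space than worst fit, but it is not guaranteed to find the optimal packing.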

So, if I change create_policy to be lfs to achieve this, what do I need to set --union-min-free-space to? (I have files as small as a few KB to as large as hundreds of MB.) Or are there other options I am missing that would help me do what I want to do?
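One way to reason about the threshold is to assume that lfs picks the upstream with the least free space and skips any upstream with less than --union-min-free-space free, which is how the union backend documents that option. Under that assumption the file size plays no part in the choice. The threshold then has to be at least as large as the largest file written before free space is re-checked; otherwise lfs may pick a nearly full remote that cannot take the file. The toy simulation below illustrates both cases. It uses hypothetical names and arbitrary units, and it assumes free space is fresh before every write.

```python
from typing import List, Optional, Tuple


def lfs_choose(free: List[int], min_free_space: int) -> Optional[int]:
    """lfs-like choice: the remote with the least free space, ignoring remotes
    below min_free_space. The size of the file is not taken into account."""
    eligible = [k for k, f in enumerate(free) if f >= min_free_space]
    return min(eligible, key=lambda k: free[k]) if eligible else None


def write_all(files: List[int], free: List[int],
              min_free_space: int) -> Tuple[List[int], List[int]]:
    """Write files largest first; return free space left and files that failed."""
    free = list(free)
    failed: List[int] = []
    for size in sorted(files, reverse=True):
        i = lfs_choose(free, min_free_space)
        if i is None or free[i] < size:
            # No eligible remote, or lfs picked one too small for this file.
            failed.append(size)
        else:
            free[i] -= size
    return free, failed


if __name__ == "__main__":
    remotes = [10, 3]  # free space per remote (arbitrary units)
    files = [5, 4]
    print("min_free_space=1:", write_all(files, remotes, 1))
    print("min_free_space=5:", write_all(files, remotes, 5))
```

With a threshold of 1, both writes fail because lfs keeps choosing the remote with only 3 units free. With a threshold of 5 (the largest file), both succeed, but up to the threshold can be left unused on each remote. With files of hundreds of MB, that means hundreds of MB per remote.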

(I also have not started asking the question of how to deal with files that grow within an already-full remote. This is rare for me, so lower priority.)

Thanks!

PS: I have read this:
Add minfreespace flag for creation policies from mergerfs · Issue #6071 · rclone/rclone

welcome to the forum,

--log-level=DEBUG --log-file=./rclonelog.txt
fwiw, delete the log before each run of your rclone command.


as for union, maybe @kapitainsky has a suggestion?

Thanks @asdffdsa, but I don’t know how to even try it yet. Once I get some info on how to do this, I will try it like you said.

Existing union implementation does not work like that.

The union member to write to is decided every --union-cache-time (default 120s) based on the policies. For example, every 120s rclone runs rclone about and chooses the remote with the most free space (if the mfs policy is used). That remote is then used until the next free space check. It does not check the free space for every single file.

One way to optimise free space usage today is to add a chunker overlay to the equation, splitting big files into multiple smaller ones.
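Chunking helps for the following reason. If the chunker remote is layered on top of the union, each chunk is written to the union as its own object, so a big file no longer needs a single remote with room for all of it. The sketch below checks whether a file fits on a set of remotes when stored whole versus when cut into fixed-size chunks. It uses hypothetical helper names and arbitrary units, counts only chunk data (not any metadata the chunker stores), and places pieces mfs-style.

```python
from typing import List


def split(size: int, chunk_size: int) -> List[int]:
    """Sizes of the pieces a file is cut into; the last piece may be shorter."""
    full, rest = divmod(size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def fits_somewhere(pieces: List[int], free: List[int]) -> bool:
    """Place each piece, largest first, on the emptiest remote (mfs-like).

    Returns False as soon as a piece does not fit on the emptiest remote.
    """
    free = list(free)
    for p in sorted(pieces, reverse=True):
        i = max(range(len(free)), key=lambda k: free[k])
        if free[i] < p:
            return False
        free[i] -= p
    return True


if __name__ == "__main__":
    remotes = [3, 2, 2]  # 7 units free in total, at most 3 on any one remote
    print(fits_somewhere(split(6, 6), remotes))  # whole 6-unit file
    print(fits_somewhere(split(6, 2), remotes))  # same file in 2-unit chunks
```

A 6-unit file cannot be stored whole when no remote has more than 3 units free, but three 2-unit chunks fit into the scattered space.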

Let me know if you have more questions.

Thanks, @kapitainsky .

I see what you mean. I was presuming that the union backend was keeping up with what was being written during the sync and updating its local cache appropriately, so it would have a good estimate of what the current free space is.

Do you know the best way to decide what size chunks to use?

Here is a histogram showing the number of files I have by size (credit: https://superuser.com/a/1855375) :

  1k:     17 
  2k:      9 
  4k:     18 
  8k:     62 
 16k:    321 ----
 32k:   1132 ----------------
 64k:   1905 ---------------------------
128k:   2251 --------------------------------
256k:   1735 ------------------------
512k:   1968 ----------------------------
  1M:   2177 -------------------------------
  2M:   2304 --------------------------------
  4M:   4515 ----------------------------------------------------------------
  8M:   2773 ---------------------------------------
 16M:   1091 ---------------
 32M:   6952 ---------------------------------------------------------------------------------------------------
 64M:  11590 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
128M:   2285 --------------------------------
256M:    700 ----------
512M:     49 
  1G:      2 

Also, do you know if you can enable chunking on existing data (so that existing files in the union are treated normally, but new files are chunked), or do you have to migrate it?

There is no one-size-fits-all answer here. How much wasted space do you accept on your remotes? Your chosen chunk size must be smaller than that. For a similar setup I use 128MiB myself and it works well. Do not forget to adjust minfreespace (it is 1GiB by default). But if you make it too small, you risk something breaking if more data is written during the cache time.
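The sketch below connects the chunk size to the histogram above. For a given chunk size, it counts how many files would be split and roughly how many objects the union would then hold. It assumes each histogram label is the upper bound of its bucket (e.g. "128M" means up to 128 MiB); whether that holds depends on how the linked script buckets sizes. It also counts only data chunks and ignores any metadata objects the chunker may add. The function names are hypothetical.

```python
import math
from typing import Tuple

# File counts per size bucket, copied from the histogram above.
# ASSUMPTION: each label is the upper bound of its bucket.
HISTOGRAM = {
    "1k": 17, "2k": 9, "4k": 18, "8k": 62, "16k": 321, "32k": 1132,
    "64k": 1905, "128k": 2251, "256k": 1735, "512k": 1968,
    "1M": 2177, "2M": 2304, "4M": 4515, "8M": 2773, "16M": 1091,
    "32M": 6952, "64M": 11590, "128M": 2285, "256M": 700, "512M": 49,
    "1G": 2,
}

UNITS = {"k": 2**10, "M": 2**20, "G": 2**30}
MiB = 2**20


def to_bytes(label: str) -> int:
    """Convert a label such as '128M' to bytes (binary units)."""
    return int(label[:-1]) * UNITS[label[-1]]


def chunking_summary(chunk_size: int) -> Tuple[int, int]:
    """Return upper bounds on (files split, data objects stored).

    The first number is exact (under the bucket assumption) when chunk_size
    is one of the bucket boundaries, i.e. a power of two here.
    """
    split_files = 0
    objects = 0
    for label, count in HISTOGRAM.items():
        upper = to_bytes(label)
        if upper > chunk_size:
            # Files in this bucket may exceed the chunk size: count them all
            # as split, each into as many chunks as the bucket's largest file.
            split_files += count
            objects += count * math.ceil(upper / chunk_size)
        else:
            objects += count
    return split_files, objects


if __name__ == "__main__":
    for size_mib in (64, 128, 256, 2048):
        print(f"{size_mib} MiB chunks:", chunking_summary(size_mib * MiB))
```

Under these assumptions, 128 MiB chunks would split 751 files and raise the object count from 43,856 to at most 44,717. 64 MiB chunks would split 3,036 files (at most 48,614 objects). Smaller chunks strand less space per remote but create more objects, which matters if a remote handles many small files poorly. With chunker's default chunk size of 2 GiB, nothing in this histogram would be split.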

Then take into account your remotes' characteristics: do they handle a lot of small files well, do they have any hard limits, etc.

If you have enough free space, you can try this migration procedure:

I have to correct myself. --union-cache-time only applies to path-preserving policies. So in your case it is irrelevant.