When considering the Function / Category and path-preservation policies, I'm having trouble deciding on the best course of action. My upstreams (Jottacloud) only report the Used quota. Here are my specs and what I think are the proper policies, but I need a sanity check.
Action commands that would incur ingress and egress (i.e. move, copy, etc.) need to remain on the remote where the file already exists. There's no need to move a file from one remote to another if the remote supports server-side operations (though I actually can't find a definitive answer on that in the docs). However, actions on directories would need to consider the path on all remotes, as I don't want to leave orphaned files on other remotes. At first I thought I should use epff, but then I remembered the orphaned files/folders. Would epall be the right choice? If so, is the default behaviour to keep the files on the existing remote?
Create commands would use the remote with the most free space, or conversely the least used space, to balance the storage evenly across all remotes simply by size. I think this would be lus, since Used is the only quota information the remote provides.
Search, very similar to action, would be fine with ff for files, but directories could exist in all remotes, so would it be best to do epall? The idea being if I did an ls on a directory (take union:folder from the above example), would the return be file1 and file2?
The best way is to test. I'm saying this based on my own experience with union. Maybe it's just me, but the policy descriptions are not 100% clear, and I only discovered what I was doing wrong when trying them.
Your policies (epall, lus, ff) look correct for what you described you want to achieve. But the best way to check is to configure an empty union and use it with a limited amount of data for a test period. Try all the edge cases you can think of. Don't use encryption for testing - it's much easier and faster to look at individual union members and see what is where.
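A minimal test setup could look something like this (a sketch only - the local paths are placeholders, and pointing the union at two local directories makes it trivial to see which member each policy picked; the policy key names are from the union backend docs):

```ini
# rclone.conf excerpt - a throwaway union over two local directories
[testunion]
type = union
upstreams = /tmp/union/u1 /tmp/union/u2
action_policy = epall
create_policy = lus
search_policy = ff
```

Then `rclone copy somefile testunion:` followed by a look inside /tmp/union/u1 and /tmp/union/u2 shows exactly where the create policy put it.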
Is there a way to "rebalance" the union? For example, if I add new remotes to the union, I want to rebalance the existing data so all new data gets spread out evenly over the remotes. This is because new data has a higher likelihood of access than old data and could potentially be a bottleneck. In my scenario, the new remotes would, under the lus policy, get written to until all remotes have the same used space.
@ncw mergerfs uses this in the balance tool, which basically uses rsync under the hood.
Is there anything in rclone that would accomplish this (using rclone, obviously)? I didn't find any related commands in rclone.
If there is no built-in tool, I suspect I could use a series of scripts and includes/excludes to do this, but that gets quite complicated really fast.
As a result I will have four remotes storing 300G each - an effectively perfectly balanced union.
This is very straightforward for union using non path preserving creation policies (like your lus). With path preserving it will be a bit more complicated but overall idea remains the same.
This is a good one. Didn't think of this. I'm going to add it to my script. I'll report back when I have something (since in my case, I have 3 remotes to add to the union). Very good idea.
@kapitainsky something like this (hacked up in bash):
#!/bin/bash
# Specify original and new remotes
original_remotes=("remote1:" "remote2:")
new_remotes=("remote3:" "remote4:")

# Calculate total size across original remotes (requires jq)
total_size=0
for remote in "${original_remotes[@]}"; do
    total_size=$(( total_size + $(rclone size "$remote" --json | jq '.bytes') ))
done

# Calculate the total number of remotes (both original and new)
all_remotes=("${original_remotes[@]}" "${new_remotes[@]}")
total_remotes=${#all_remotes[@]}

# Calculate the quota for each remote
quota_per_remote=$(( total_size / total_remotes ))

# Distribute files to the new remotes based on the transfer limit
for dest_remote in "${new_remotes[@]}"; do
    for source_remote in "${original_remotes[@]}"; do
        current_size=$(rclone size "$source_remote" --json | jq '.bytes')
        transfer_limit=$(( current_size - quota_per_remote ))
        # Only proceed if there's data to transfer
        if (( transfer_limit > 0 )); then
            rclone move "$source_remote" "$dest_remote" --max-transfer=$transfer_limit
        fi
    done
done
No suffix would be applied, since the JSON returns bytes. Hopefully --max-transfer assumes bytes, but the docs don't say so explicitly. Can anyone confirm whether supplying a plain byte integer will work (i.e. 100000000000 instead of 100G)?
--max-transfer SizeSuffix Maximum size of data to transfer (default off)
Options which use SIZE use KiB (multiples of 1024 bytes) by default. However, a suffix of B for Byte, K for KiB, M for MiB, G for GiB, T for TiB and P for PiB may be used. These are the binary units, e.g. 1, 2^10, 2^20, 2^30 respectively.
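Given that, a bare number from `rclone size --json` would be read as KiB, not bytes, so the move line in the script needs an explicit `B` suffix. A small sketch (the 300 GiB figure is just an example value):

```shell
#!/bin/bash
# `rclone size --json | jq '.bytes'` returns a plain byte count, but
# --max-transfer interprets a bare number as KiB, so append "B" to make
# the unit explicit.
transfer_limit=$(( 300 * 1024 ** 3 ))   # example: 300 GiB as bytes
printf 'rclone move src: dst: --max-transfer=%sB\n' "$transfer_limit"
```

This prints the command with `--max-transfer=322122547200B`, which rclone will read unambiguously as bytes.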
Ah, fantastic. I'll give this a go when my current migration is done (my first sync attempt to exit google drive before my deadline to reduce storage).
I'm trying to wrap my mind around this. TBQH, I'm very new to the purpose and nuances of the policies, let alone the differences between preserving and non-preserving.
Well, hang on, correct me if I'm wrong, but when interacting with the remotes directly, the union policies don't matter, right? So say I move the files/directory from remote1 to remote4. The union would still see the file in the same place, right?
If not, what if instead of moving from remote1 to remote4, I move from remote1 to the union, and then let the union figure it out? I'm sure I'm wrong here, but I'm trying to figure it out.
The difference is that with path-preserving policies you cannot have two remotes with the same path. So if you decide to move directory1 from U1 to U2, you have to move all of its content - all files.
To be honest, I am not sure myself what the real-life use of path-preserving policies is - maybe my imagination is limited.
I think most people using union on rclone would not want path-preserving policies. If they did, I suppose they should be using the combine backend instead.
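For comparison, combine doesn't merge namespaces at all - each upstream gets its own top-level directory. A hypothetical config (remote names are placeholders):

```ini
# rclone.conf excerpt - combine keeps each upstream under its own folder,
# so there is never a policy decision to make
[mycombine]
type = combine
upstreams = r1=remote1: r2=remote2:
```

With this, `rclone ls mycombine:` shows remote1's files under r1/ and remote2's under r2/, whereas a union would interleave them in one tree.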