Advantage of new union remote?

BinsonBuzz · November 4, 2018, 10:13pm

Took me a while to work this out as well - I think this is to replace the offline writes capability of rclone cache, i.e. 'commit' defines the offline store that is moved to location-0 after 1h.

I think this would be icing on the cake, and if it were combined with some kind of scheduled write from local location-x to location-y in the cloud, that would be amazing. If not, relying on a scheduled rclone move job to upload files wouldn't be an issue for me, and I'd rather prioritise the core union functionality than replace what already works well.

jrock · November 5, 2018, 4:55am

Yep. But I agree with @BinsonBuzz: it should be optional functionality, and we can just use rclone move instead if it is too difficult to implement.

Yes that would work.

Yes, great idea.

jrock · November 5, 2018, 5:22am

Maybe this then:

[TestUnion]
type = union
remotes = location-0-remote:location-0-path /location-1-local/location-1-path /Volumes/local-union-mount-point-path
location-0-io = ro
location-1-io = rw sync-commits 1h

Where:

location-0/1 are implied by the left-to-right order in the "remotes" line
Read priority also according to the left-to-right order in the "remotes" line with read order always being location-0, then location-1
Write/Delete only to "rw" remotes at the time of the event
Periodically sync from "rw" remotes to "ro" & other "rw" remotes if the "rw" remote has a "sync-commits" flag (at 1 hourly intervals in this example). If no sync-commits flag, don't sync.
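A minimal Python sketch of how these rules could be read out of the example config. It assumes the mount path is left out of the remotes line, that a location without an -io line is ro, and that "sync-commits <duration>" is the only flag; parse_union and parse_duration are hypothetical helpers, not rclone code.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class Location:
    index: int                              # position in the "remotes" line (location-N)
    path: str
    mode: str                               # "ro" or "rw"
    sync_every: Optional[timedelta] = None  # set only when "sync-commits <dur>" is given

def parse_duration(text: str) -> timedelta:
    """Parse a simple duration such as '1h', '30m' or '45s'."""
    m = re.fullmatch(r"(\d+)([hms])", text)
    if not m:
        raise ValueError(f"bad duration: {text!r}")
    unit = {"h": "hours", "m": "minutes", "s": "seconds"}[m.group(2)]
    return timedelta(**{unit: int(m.group(1))})

def parse_union(remotes: str, io: dict) -> list:
    locations = []
    # Left-to-right order in "remotes" defines location-0, location-1, ...
    for i, path in enumerate(remotes.split()):
        words = io.get(f"location-{i}-io", "ro").split()  # assumed default: ro
        sync_every = None
        if len(words) >= 3 and words[1] == "sync-commits":
            sync_every = parse_duration(words[2])
        locations.append(Location(i, path, words[0], sync_every))
    return locations

locs = parse_union(
    "location-0-remote:location-0-path /location-1-local/location-1-path",
    {"location-0-io": "ro", "location-1-io": "rw sync-commits 1h"},
)
read_order = [loc.path for loc in locs]                        # location-0 is read first
write_targets = [loc.path for loc in locs if loc.mode == "rw"] # writes go only to rw
sync_jobs = [(loc.path, loc.sync_every) for loc in locs if loc.sync_every]
```

Splitting the remotes line on whitespace is a simplification; a path containing spaces would break it.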

@ncw - you lost me with the "up to 4 remotes" thing; I suspect the above doesn't abide by that. An example would help me understand it.

BinsonBuzz · November 5, 2018, 8:58am

I don’t think we need the ‘remote’ or ‘local’ bit:

[TestUnion]
type = union
location-0 = remote: ro sync-destination
location-1 = /Volumes/local-path rw sync-commits 1h
location-2 = 
location-3 =

Having each location on one line would be easier to organise for me.

The config would prompt for up to 4 remotes and ask which remote is the optional source for sync-commits, which is the destination (sync-destination), and what the schedule is. I think having only one sync-destination makes sense, but each location could potentially have its own sync-commits schedule.
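The one-location-per-line layout is easy to read mechanically. A minimal sketch, assuming each non-empty location-N value is a remote followed by its mode and optional flags, that there are at most four slots, and that a missing mode means ro; parse_locations is a hypothetical helper.

```python
def parse_locations(config: dict, max_locations: int = 4) -> list:
    """Read location-0 ... location-3 values of the form '<remote> <mode> [flags...]'."""
    locations = []
    for i in range(max_locations):
        value = config.get(f"location-{i}", "").strip()
        if not value:
            continue  # an empty location-N line is simply unused
        path, *flags = value.split()
        loc = {
            "index": i,
            "path": path,
            "mode": flags[0] if flags else "ro",  # assumed default
            "sync_destination": "sync-destination" in flags,
            "sync_commits": None,
        }
        if "sync-commits" in flags:
            # the schedule is the word after the flag, e.g. "1h"
            loc["sync_commits"] = flags[flags.index("sync-commits") + 1]
        locations.append(loc)
    if sum(loc["sync_destination"] for loc in locations) > 1:
        raise ValueError("only one location may be the sync-destination")
    return locations

config = {
    "location-0": "remote: ro sync-destination",
    "location-1": "/Volumes/local-path rw sync-commits 1h",
    "location-2": "",
    "location-3": "",
}
locations = parse_locations(config)
```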

Edit: removed the union mount-path, as I don’t think it belongs in the config.

jrock · November 5, 2018, 10:02am

Yeah some good points there.

OK, so are we saying that a union mount can be a union of 2, 3 or 4 remotes (not just 2 remotes)? If so, coooooool.
I also like the location per line format, but I think @ncw was hoping to keep the “remotes” line instead, to minimise the refactoring. Did I read that right?
Am I correct in assuming that the current union remote does not track/persist changes in an overlay layer? Rather, it just:

applies read priority at run time during a listing
applies writes to the 2nd remote in the “remotes” config line.

If there’s no rw history, how would we even enable built-in periodic commits to the ro location(s)?

Or am I wrong…is the rw history captured in a directory cache?

For maximum flexibility, does it make sense to do the following:

1. Don’t restrict how many of the 4 remotes can be ro or rw, i.e. all ro, all rw, or a combination.
2. Read priority defined by the order of location-0…3.
3. Writes to all rw locations (asynchronously?).
4. Periodic commits to all ro destinations that are flagged with e.g. “commit-destination 1h” (subject to 3 above…).

Thoughts?

ncw · November 5, 2018, 3:27pm

If we want location-0, location-1, etc., then the config wizard will have to prompt for each one; that is all. The config wizard can only prompt for a fixed number of things, so I suggested arbitrarily limiting it to 4.

So it would look something like this:

Enter location-0 which should be a name of a remote followed by flags
location-0> remote:
Enter location-1 which should be a name of a remote followed by flags
location-1> /tmp/whatever
Enter location-2 which should be a name of a remote followed by flags
location-2>
Enter location-3 which should be a name of a remote followed by flags
location-3>

At the moment you can put as many as you want in!

It makes a bit more work, but not a lot, and I think one location per line makes the config much easier to understand.

That is correct, except it is the last remote in the "remotes" config line as there can be more than 2.

I guess that is part of the extension work.

I'm not 100% clear what you'd want to happen - you write to local disk, but in the background rclone uploads it (moves it maybe?) to the cloud remote - is that right?

Agree with those.

rclone needs to write to (at least one) location synchronously, presumably the local disk? I don't want to build another cache layer! Is the idea that it would write to the cloud locations asynchronously, maybe even deleting the local copy after?

I see...

I think if the configuration were split like this:

location-0: remote:
location-mode-0: ro

Then the config for the location-mode could have a nice dropdown (text equivalent) with lots of explanatory text.

Can the commit time be a global? That would be most convenient.

What do we think the different location-modes are?

ro - read only - files are never written here only read
rw - files are read and written to here synchronously
wb - files are read from here and written back to here asynchronously from the rw locations after commit_time.

Essentially, what I'd like to do is run an rclone copy or rclone move from the rw location to any wb locations.

This would give some constraints - the system is read only unless you provide one rw remote. If you have a wb remote then you must have a rw remote.

The current system is all the remotes except the last one are ro and the last one is rw.
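The constraints above can be checked mechanically. A Python sketch with a hypothetical Mode enum and helper names; it only models which locations get a write immediately and which are left for the later rclone copy/move, not the transfer itself.

```python
from enum import Enum

class Mode(Enum):
    RO = "ro"  # read only: files are never written here
    RW = "rw"  # read, and written synchronously
    WB = "wb"  # read, and written back asynchronously from the rw location(s)

def validate(modes: list) -> str:
    """Apply the constraints above and say whether the union is writable."""
    if Mode.WB in modes and Mode.RW not in modes:
        raise ValueError("a wb location needs an rw location to write back from")
    return "read-write" if Mode.RW in modes else "read-only"

def plan_write(modes: list) -> tuple:
    """Indexes written immediately, and indexes queued for the later copy/move."""
    now = [i for i, m in enumerate(modes) if m is Mode.RW]
    later = [i for i, m in enumerate(modes) if m is Mode.WB]
    return now, later

def current_union(n: int) -> list:
    """Today's behaviour: every remote is ro except the last one, which is rw."""
    return [Mode.RO] * (n - 1) + [Mode.RW]
```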

Open questions

should we allow more than one rw remote? This would synchronously write a file to multiple remotes
- there is an issue about this
should we have a catch-up copy or move from the rw at regular intervals?
copy the files or move them from the rw remote?
- this could be a config option
- it could also be copy them, then delete them after a while
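The copy / move / "copy, then delete after a while" choice could be a single policy option. A sketch only: locations are in-memory dicts, the policy names and keep_for are hypothetical, and a real implementation would presumably drive rclone copy or rclone move rather than assign dict entries.

```python
from dataclasses import dataclass

@dataclass
class File:
    data: bytes
    mtime: float  # modification time, seconds since the epoch

def write_back(rw: dict, wb: dict, policy: str, now: float, keep_for: float = 0.0) -> None:
    """One write-back pass from an rw location to a wb location.

    "copy":             upload new or changed files, keep the local copies
    "move":             upload, then remove the local copy straight away
    "copy-then-delete": upload, then remove local copies older than keep_for seconds
    """
    if policy not in ("copy", "move", "copy-then-delete"):
        raise ValueError(f"unknown policy {policy!r}")
    for path, f in list(rw.items()):
        if wb.get(path) != f:
            wb[path] = File(f.data, f.mtime)  # stands in for the upload
        old_enough = now - f.mtime >= keep_for
        if policy == "move" or (policy == "copy-then-delete" and old_enough):
            del rw[path]
```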

Yes I agree

I don't understand what you mean by sync-commits and sync-destination. Is it covered by what I wrote above about wb remotes?

That is a different discussion, certainly!

BinsonBuzz · November 5, 2018, 4:58pm

Loving where this is going

1. My preference is to write to local disk, i.e. the highest priority location, so that this file is the one visible in the union mount.
2. If either of the lower priority locations has a file with the same name as the new file: if it's an rw location, delete the dup; if it's an ro location, 'hide' it.
For the hidden files it'd be good if a record of them were kept somewhere, so that if desired a script can be used to delete the files from the ro remote. E.g. I use this script to clean up my gdrive files that are 'hidden' when 'deleted' by unionfs: UnionFS Cleanup Script | enzTV

For point 2 above, I'm not sure of the exact terminology as I'm only a wannabe techie, but it'd be great if there were a 'read, delete but no write' option for a location. This would be ideal for me: if I upgrade a file (e.g. a movie) locally, the old cloud file will be deleted and I can transfer the new local file at my leisure. In my scenario I don't want the union to write to the cloud automatically, but I'm happy for it to delete.
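A rough model of the write behaviour described above, including the 'read, delete but no write' option (labelled rd here, the name jrock proposes later in the thread). Assumptions: locations are (mode, dict) pairs with the local disk first, and the hidden set stands in for the record a cleanup script would read; create and listing are hypothetical names.

```python
def create(path: str, data: bytes, locations: list, hidden: set) -> None:
    """Write a new file; locations[0] is the highest priority (local) location."""
    mode0, files0 = locations[0]
    if mode0 != "rw":
        raise PermissionError("the highest priority location must be writable")
    files0[path] = data
    for mode, files in locations[1:]:
        if path not in files:
            continue
        if mode in ("rw", "rd"):
            del files[path]    # delete the duplicate outright
        else:
            hidden.add(path)   # ro: keep the file, hide it, remember it for a cleanup script

def listing(locations: list, hidden: set) -> dict:
    """What the union mount shows: the highest priority copy of each path wins."""
    visible = {}
    for mode, files in locations:
        for path, data in files.items():
            if path in visible:
                continue
            if mode == "ro" and path in hidden:
                continue  # hidden copies stay invisible even if the local copy goes away
            visible[path] = data
    return visible
```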

I say no - sounds complicated to me, but I bet there's someone out there who needs it!

Would be nice, but I'm happy using rclone move so I can control bwlimit, max transfer, etc.

Yes

BinsonBuzz · November 5, 2018, 10:21pm

Sorry if I’m being greedy, but it’d be great if the rw location supported hardlinks, as unionfs doesn’t (mergerfs does).

Animosity022 · November 6, 2018, 12:46am

Hardlinks are only supported if things are on the same disk underneath; it would have to be done similarly to how mergerfs handles it.

ncw · November 6, 2018, 7:45am

How would that work? I'm not familiar with unionfs/mergerfs.

BinsonBuzz · November 6, 2018, 9:24am

If I download a torrent and then want to import it into my library, I can import it using a hardlink, so I only have one copy of the file and it stays in my seeding folder. Copying would create a 2nd copy and extra IO, and moving it would break seeding.

Unionfs doesn't support hardlinking, so sonarr, radarr etc. can't create a hardlink and have to create a 2nd copy, even if I'm trying to add a file to the local part of the union. Mergerfs does support hardlinking.

Animosity022 · November 6, 2018, 12:49pm

I believe at a high level it just works like a normal disk underneath. You can only hard link on the same file system. I have my torrents and local area on the same disk so I can hardlink.

Mergerfs allows that to pass through to the file system I would guess and that’s why it works.
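The same-filesystem rule is why tools tend to try a hard link first and fall back to copying. A Python sketch of that pattern (not Sonarr's or Radarr's actual code), relying on os.link failing with EXDEV when source and destination are on different filesystems; link_or_copy is a hypothetical name.

```python
import errno
import os
import shutil
import tempfile

def link_or_copy(src: str, dst: str) -> str:
    """Hard link src to dst; fall back to a second copy when that is impossible."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        # EXDEV: src and dst are on different filesystems.
        # EPERM: the filesystem does not support hard links (per link(2) on Linux).
        if e.errno not in (errno.EXDEV, errno.EPERM):
            raise
    shutil.copy2(src, dst)
    return "copied"

# Usage: both paths live in one temporary directory, so a link is expected.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "seed.mkv"), os.path.join(d, "library.mkv")
    with open(src, "wb") as f:
        f.write(b"seeding data")
    result = link_or_copy(src, dst)
    same_inode = os.stat(src).st_ino == os.stat(dst).st_ino  # one file, two names
```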

BinsonBuzz · November 6, 2018, 12:51pm

Correct - unfortunately unionfs doesn't and mergerfs isn't available for unRAID (slackware) users like me

ncw · November 6, 2018, 1:13pm

I see. Assuming we use the local backend for local disk then it doesn't care about hardlinks (in fact they are completely invisible to it) so it would work just fine. I suspect unionfs works at an inode level where rclone works at a file/path level.

jrock · November 7, 2018, 3:59pm

Yes that's correct. I was also thinking that it might manage situations where the union mount is comprised only of cloud remotes (no local) - but I can see from the rest of conversation that this is not realistic/practical.

Yes, that seems sensible.

I think that would be fine for the majority of use cases.

Agree. So:

rw - files are read, created, deleted and updated here synchronously
ro - read only - files are never written here only read
rd - files are read and deleted from here synchronously (including deletion of moved/renamed files)
rowb - files are read from here and written back to here asynchronously from the rw locations after commit_time
rdwb - files are read and deleted from here synchronously and written back to here asynchronously from the rw locations after commit_time

I don't need this. Most use cases will be 2 remotes:

location-0 = rw local folder (highest read priority)
location-1 = rdwb cloud remote

With this config I probably don't even need rclone to remember any history.

I would only use move, but if this is configurable then that will help some people I suspect.

Do you mean this should be defined instead at runtime as a parameter of the rclone mount command? Maybe a default one in rclone.conf that can be overridden at run time?

The commit_time/wb/catch-up move etc. are all optional. I'm happy to use a scheduled rclone move instead.

ncw · November 7, 2018, 4:54pm

That is looking quite like a specification!

For remotes with the delete option, I understand the specification to be

if there is a file in a higher priority remote with the same name then delete it in this remote

Is that correct?

That seems a little dangerous to me!

When would the deletion happen - when rclone detects it - ie when the user lists the directory or creates the new file? Or should rclone go hunting for them?

I think that is becoming a list of flags:

read
write (synchronously)
writeback (write asynchronously)
delete (delete duplicates)

Where writeback would be done with rclone move from the rw backend to the writeback backend(s).
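Treating the modes as combinations of these four flags keeps the list open-ended. A Python sketch; the flag names come from the list above, while the comma-separated syntax, parse_flags, and the mapping from jrock's earlier modes are assumptions, not an agreed config format.

```python
from enum import Flag, auto

class Cap(Flag):
    READ = auto()
    WRITE = auto()      # synchronous write
    WRITEBACK = auto()  # asynchronous write, e.g. by a later rclone move
    DELETE = auto()     # delete duplicates / propagate deletes

# One reading of jrock's modes expressed as these flags (an interpretation).
MODES = {
    "rw": Cap.READ | Cap.WRITE | Cap.DELETE,
    "ro": Cap.READ,
    "rd": Cap.READ | Cap.DELETE,
    "rowb": Cap.READ | Cap.WRITEBACK,
    "rdwb": Cap.READ | Cap.DELETE | Cap.WRITEBACK,
}

def parse_flags(text: str) -> Cap:
    """Turn e.g. 'read,delete,writeback' into a Cap value."""
    caps = Cap(0)
    for name in text.split(","):
        caps |= Cap[name.strip().upper()]
    return caps
```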

That is good

Mount path doesn't belong in the backend config. Not sure where it does belong yet, so for the time being it will have to go on the command line.

jrock · November 7, 2018, 6:49pm

Close. More like:

If I delete a file (from the highest priority remote, whose file is the one that is visible) then also delete that filename from other remotes that have the delete option enabled.

Only when the visible file is actively deleted by a user/application.
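The delete rule as stated can be modelled directly. A Python sketch, assuming each location is a (has_delete, files) pair ordered by read priority; delete_visible is a hypothetical name, and it runs only on an explicit delete, never during a listing.

```python
def delete_visible(path: str, locations: list) -> list:
    """locations: (has_delete, files) pairs, index 0 = highest priority.

    Called only when a user or application deletes the visible file.
    Returns the indexes of the locations the file was removed from.
    """
    holders = [i for i, (_, files) in enumerate(locations) if path in files]
    if not holders:
        raise FileNotFoundError(path)
    if not locations[holders[0]][0]:
        raise PermissionError(f"{path} is visible from a location without delete")
    removed = []
    for i in holders:
        has_delete, files = locations[i]
        if has_delete:
            del files[path]
            removed.append(i)
    return removed
```

A copy in a location without the delete option survives and becomes visible again, which is where the hiding discussed earlier comes in.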

More broadly, I think @BinsonBuzz and I are trying to accommodate the following scenario:

location-0 is a local temporary folder and has the highest read priority. location-1 is a cloud remote with most of our storage.
New files: When our local apps create new files, they are written to location-0 without any latency.
Moved/Renamed files:

1. If the original file exists only on location-0: just mv within location-0
2. If the original file exists only on location-1: delete the file in location-1, create the renamed/moved file in location-0
3. If the original filename exists in both locations: delete the file in location-1, mv the file within location-0.

Updated files: If the file is opened and saved, or replaced with a file of a larger/smaller size (same filename), delete the original file in both locations, create the updated file in location-0.
Deleted files: Delete from both locations.
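The scenario above, including the scheduled move that empties location-0, fits in a small in-memory model. A Python sketch with a hypothetical TwoLocationUnion class where dicts stand in for the two locations; note that in the location-1-only rename case the data has to come down from the cloud.

```python
class TwoLocationUnion:
    """location-0 = local temporary folder (read first), location-1 = cloud remote."""

    def __init__(self, local=None, cloud=None):
        self.local = dict(local or {})   # location-0
        self.cloud = dict(cloud or {})   # location-1

    def read(self, path):
        return self.local[path] if path in self.local else self.cloud[path]

    def rename(self, src, dst):
        if src in self.local:
            self.local[dst] = self.local.pop(src)  # mv within location-0
            self.cloud.pop(src, None)              # and drop any location-1 copy
        elif src in self.cloud:
            # only on location-1: delete there, recreate under the new name in location-0
            self.local[dst] = self.cloud.pop(src)
        else:
            raise FileNotFoundError(src)

    def update(self, path, data):
        self.local.pop(path, None)
        self.cloud.pop(path, None)
        self.local[path] = data                    # the new version always lands locally

    def delete(self, path):
        if path not in self.local and path not in self.cloud:
            raise FileNotFoundError(path)
        self.local.pop(path, None)
        self.cloud.pop(path, None)

    def scheduled_move(self):
        """Stand-in for the scheduled rclone move: empty location-0 into location-1."""
        self.cloud.update(self.local)
        self.local.clear()
```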

With this pattern:

The subsequent scheduled rclone move can just migrate all files off of location-0 to location-1 and we start again with a fresh, empty location-0.
Rclone doesn't need to remember any history.

Yep, that's fine.

BinsonBuzz · November 7, 2018, 9:28pm

If the file is just being moved within location-1, shouldn't it just be moved, rather than downloaded to location-0, just to be uploaded again unchanged?

jrock · November 8, 2018, 4:06am

I guess that depends on whether the cloud remote supports server-side moves. If it does (and probably all will eventually), then you are absolutely right.

So does our updated scenario look like this then:

location-0 is a local temporary folder and has the highest read priority. location-1 is a cloud remote with most of our storage.
New files: When our local apps create new files, they are written to location-0 without any latency.
Moved/Renamed files:

1. If the original file exists only on location-0: just mv within location-0
2. If the original file exists only on location-1: just move (server-side) the file in location-1
3. If the original filename exists in both locations: mv the file in both locations (server-side mv for location-1).

Updated files: If the file is opened and saved, or replaced with a file of a larger/smaller size (same filename):

If the original file exists only in location-0: just overwrite the original file in location-0
If the original file exists only in location-1: delete the file in location-1 and write it to location-0
If the original file exists in both locations: delete the file from location-1 and overwrite the file in location-0.

Deleted files: Delete from both locations.
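Under the updated rules, renames no longer pull data down to location-0, and the three update cases collapse into one rule. A Python sketch with dicts for the two locations; the function names are hypothetical, and the location-1 move is assumed to be server-side.

```python
def rename(local: dict, cloud: dict, src: str, dst: str) -> None:
    """Rename under the updated rules; location-1 renames are assumed server-side."""
    if src not in local and src not in cloud:
        raise FileNotFoundError(src)
    if src in local:
        local[dst] = local.pop(src)   # mv within location-0
    if src in cloud:
        cloud[dst] = cloud.pop(src)   # server-side move within location-1: no download

def update(local: dict, cloud: dict, path: str, data: bytes) -> None:
    """All three update cases reduce to: drop any location-1 copy, then (over)write location-0."""
    cloud.pop(path, None)
    local[path] = data

def delete(local: dict, cloud: dict, path: str) -> None:
    local.pop(path, None)
    cloud.pop(path, None)
```

Renaming a file that exists in both places leaves identical copies under the new name in both locations.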

That work?

jrock · November 8, 2018, 4:17am

Hmmm, thinking about this some more, one side effect of the change to mv (3) above is that the subsequent scheduled rclone move would need to handle files that have the same filename and size in both remotes.

So it would probably require an rclone move plus a deletion cleanup command. That is:

rclone move any file from location-0 to location-1 that is not identical (filename, subfolder, size) in both locations; then
delete any remaining (identical) files left over in location-0.
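A Python sketch of that pass, with both locations as dicts from 'subfolder/filename' to content and len(content) standing in for size. Comparing only path and size follows the suggestion above (rclone itself normally also compares modification times); move_then_cleanup is a hypothetical name, and both steps are done in a single loop.

```python
def move_then_cleanup(local: dict, cloud: dict) -> tuple:
    """Move files from location-0 that are not identical in location-1,
    and delete identical leftovers from location-0, in one pass.
    Keys are 'subfolder/filename'; len(value) stands in for the size."""
    moved, cleaned = [], []
    for path in sorted(local):
        data = local[path]
        if path in cloud and len(cloud[path]) == len(data):
            cleaned.append(path)   # identical leftover: just remove it locally
        else:
            cloud[path] = data     # upload (or overwrite) in location-1
            moved.append(path)
        del local[path]
    return moved, cleaned
```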

But also, if we start the union mount with location-0 being an empty local folder, then (3) should not really be able to occur anyway.