Advantage of new union remote?

That is looking quite like a specification!

For remotes with the delete option, I understand the specification to be

  • if there is a file in a higher-priority remote with the same name, then delete it in this remote

Is that correct?

That seems a little dangerous to me!

When would the deletion happen? When rclone detects it, i.e. when the user lists the directory or creates the new file? Or should rclone go hunting for them?

I think that is becoming a list of flags

  • read
  • write (synchronously)
  • writeback (write asynchronously)
  • delete (delete duplicates)

Where writeback would be done with rclone move from the `rw` backend to the `writeback` backend(s).
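To make that concrete, a rough sketch of what such a config might look like; the remote names are placeholders, and the per-remote flag syntax is invented here to mirror the list above, not real rclone syntax:

```sh
# Hypothetical union config -- the flag suffixes after each remote are
# illustrative only, mirroring the proposed read/write/writeback/delete flags.
rclone config create mypool union \
    remotes "/mnt/fast:read,write gdrive:read,writeback,delete"
```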

That is good :smile:

Mount path doesn’t belong in the backend config. Not sure where it does belong yet, so for the time being it will have to go on the command line.

Close. More like:

  • If I delete a file (from the highest priority remote, whose file is the one that is visible) then also delete that filename from other remotes that have the delete option enabled.

Only when the visible file is actively deleted by a user/application.

More broadly, I think @BinsonBuzz and I are trying to accommodate the following scenario:

  • location-0 is a local temporary folder and has the highest read priority. location-1 is a cloud remote with most of our storage.
  • New files: When our local apps create new files, they are written to location-0 without any latency.
  • Moved/Renamed files:
  1. If the original file exists only on location-0: just mv within location-0
  2. If the original file exists only on location-1: delete the file in location-1, create the renamed/moved file in location-0
  3. If the original filename exists in both locations: delete the file in location-1, mv the file within location-0.
  • Updated files: If the file is opened and saved, or replaced with a file of a larger/smaller size (same filename), delete the original file in both locations, create the updated file in location-0.
  • Deleted files: Delete from both locations.

With this pattern:

  • The subsequent scheduled rclone move can simply migrate all files off location-0 to location-1, and we start again with a fresh, empty location-0.
  • Rclone doesn’t need to remember any history.
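The scheduled migration itself should need nothing new; a sketch, assuming location-0 is a local folder and cloud: is the configured remote (names are placeholders):

```sh
# Nightly flush: move everything staged locally up to the cloud remote,
# removing directories emptied by the move.
rclone move /mnt/location-0 cloud:media --delete-empty-src-dirs
```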

Yep, that’s fine.

If the file is just being moved within location-1, shouldn’t it just be moved, rather than downloaded to location-0, just to be uploaded again unchanged?

I guess that depends on whether the cloud remote supports server-side moves. If it does (and probably all will eventually), then you are absolutely right.
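For reference, rclone can already do server-side moves where the backend supports them, e.g. with moveto (paths here are placeholders):

```sh
# A rename within the same remote is done server-side when the backend
# supports it; otherwise rclone falls back to copy + delete.
rclone moveto cloud:media/old-name.mkv cloud:media/new-name.mkv
```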

So does our updated scenario look like this then:

  • location-0 is a local temporary folder and has the highest read priority. location-1 is a cloud remote with most of our storage.
  • New files: When our local apps create new files, they are written to location-0 without any latency.
  • Moved/Renamed files:
  1. If the original file exists only on location-0: just mv within location-0
  2. If the original file exists only on location-1: just move (server-side) the file in location-1
  3. If the original filename exists in both locations: mv the file in both locations (server-side mv for location-1).
  • Updated files: If the file is opened and saved, or replaced with a file of a larger/smaller size (same filename):
  1. If the original file exists only in location-0: just overwrite the original file in location-0
  2. If the original file exists only in location-1: delete the file in location-1 and write it to location-0
  3. If the original file exists in both locations: delete the file from location-1 and overwrite the file in location-0.
  • Deleted files: Delete from both locations.

That work?


Hmmm, thinking about this some more, one side effect of the change to mv (3) above is that the subsequent scheduled rclone move would need to handle files that have the same filename and size in both remotes.

So it would probably require an rclone move plus a deletion cleanup command. That is:

  • rclone move any file from location-0 to location-1 that is not identical (filename, subfolder, size) in both locations; then
  • delete any remaining (identical) files left over in location-0.
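As a shell sketch of that two-step job (remote names are placeholders; depending on how rclone move treats unchanged source files, step 2 may even be redundant):

```sh
# Step 1: move everything that differs up to the cloud remote.
rclone move local-0: cloud-1:
# Step 2: anything still left in location-0 should be an identical
# leftover, so clear it out before the next cycle.
rclone delete local-0:
```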

But also, if we start the union mount with location-0 being an empty local folder, then (3) should not really be able to occur anyway.

Yes, that would work perfectly.

I still think moves from location-0 to location-1 should be parked and considered in phase 2, as I think the existing rclone move is good enough.

Yes I would be quite happy to manage that part manually as well if it is going to be too onerous to build initially.

I see - thank you for explaining.

Looking at your updated schema I think moves look good.

I think my preference here would be not to delete the file in step 2 or step 3, but rely on the rclone move to replace the file properly. This means that if anything goes wrong you still have the old file.


Yes that would work, agree it is safer. And slightly less latency at runtime too.

:smile:

Could one of you please make a clean copy of the design so far into a new issue on github (link to this thread too) then we can have a think about implementation :smile:


I will have a go at that.


Brilliant! @ncw, thanks for adding this to the backlog. It would solve my biggest outstanding issue: ditching unionfs, since the extra IO from its lack of hard link support (and, I also suspect, from moving files from other local paths to location-0) is killing my server setup.


Github issue created. Sorry it took a few days, got busy.


Thanks for doing that :smile:

mergerfs allows “random” selection of branches for search-type actions.
i.e., access, getattr, getxattr, ioctl, listxattr, open, readlink

In theory, this would allow a user to:

  1. create three rclone remotes, each authenticating with a different account;
  2. pool them together with mergerfs, using category.search=rand;
  3. and have most Read-Only API calls spread among the three accounts’ API quotas

If successful, this could effectively triple the standard API quota, and thus triple rclone performance in an API-bottleneck situation.
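As a sketch of that setup (mount points and remote names are placeholders):

```sh
# Three rclone mounts, each authenticated as a different account.
rclone mount acct1: /mnt/r1 --daemon
rclone mount acct2: /mnt/r2 --daemon
rclone mount acct3: /mnt/r3 --daemon
# Pool them with mergerfs, spreading search-type calls (open, getattr,
# etc.) randomly across the branches.
mergerfs -o category.search=rand /mnt/r1:/mnt/r2:/mnt/r3 /mnt/pool
```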

Is this possible with the rclone Union remote?

N.B. it’s not even clear that this would work properly with mergerfs, but maybe it could work with rclone Union.

Thank you for your attention and hard work! :smiley:

It isn’t possible currently. That being said, the only way you’d be able to use multiple users to mitigate quota issues against one gdrive is to share it or use a team drive. Both of those have their own restrictions/negatives as well.

I suppose you could use one account with multiple client IDs to mitigate client ID API only restrictions but I think most people are quota restricted.
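For the multiple-client-ID idea, a sketch (IDs/secrets are placeholders, and each remote would still need its own OAuth authorisation):

```sh
# Two remotes on the same Google account, registered with different
# OAuth client IDs (values are placeholders).
rclone config create gdrive-a drive client_id ID_A client_secret SECRET_A
rclone config create gdrive-b drive client_id ID_B client_secret SECRET_B
```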

How could this be done with mergerfs? Do you have a working config?

Is there any new progress on unionfs?

Please don’t necrobump old topics. Closing.