Multiwrite union test beta

Yes you are right - grepped the wrong branch!

You are right I think.

I put a note on the pull request: https://github.com/rclone/rclone/pull/3782#issuecomment-589940146

Those confirm that there is no max quota on those Drives as @darthShadow suspected.

Space-related policies (lus, mfs, ...) require the usage field. I would suggest using rand as the create policy if the upstream drive has infinite space.
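
For example, a minimal sketch of such a union over unlimited Shared Drives (the remote names td1/td2/td3 are placeholders), letting rand place new files so no usage lookups are needed:

[union-td]
type = union
upstreams = td1: td2: td3:
create_policy = rand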

The union backend has a cache for the about fields to avoid hitting the API every time. You can set the cache time when setting up.
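
If I remember the option name right, something like this in the union config section sets it (cache_time is in seconds; the remote names are placeholders):

[union-td]
type = union
upstreams = td1: td2:
cache_time = 300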

There is no such policy now but I do think this is a good request. I might try to do this.

The current default policy is just the same as trapexit/mergerfs. I do think using space-relevant policies or path-preserving policies would be inappropriate for rclone since they introduce too much complexity. However, I can't decide which policy should be the default.

@Max-Sum Many thanks for all the work on the union code! Very promising :+1:

The challenge for Shared/Team Drives is that rclone about does not provide any result (I think ncw said it is a null response).

I'll try rand when I get to test again. See if that works.

It is null which in rclone speak means unknown but likely unlimited.

@Max-Sum it would probably be good for the users to treat this as very large rather than failing the writes.

Q: In the debugging logs, it seems like it would be helpful to understand which upstream was acted on. Currently there isn't really a way to tell which upstream responded to a query. Would that be too noisy?

Also, I second a cookbook with examples. Reading through the policies, it's difficult to understand what each does. Perhaps further clarity is needed there (but I do understand the doc somewhat mirrors what's in trapexit's).

Question #2:
I have a webdav and a google drive I wanted to combine and use as part of this, and it works well. Except that if the webdav is taken offline, I would have expected the query to continue on the 2nd upstream when the first one was unavailable, but that failed. Is this something that should work?

[davs]
type = webdav
url = http://127.0.0.1:5443
vendor = other

[robgs-cryptp]
type = crypt
remote = xxxx
filename_encryption = standard
password = xxxx
password2 = xxxx

[mtest]
type = union
upstreams = davs:/:ro robgs-cryptp:Media:ro
action_policy = epff
create_policy = epff
search_policy = ff

This setup works fine when both are available, but if I turn off that webdav, I get the following:
time ./rclone lsl mtest:'xxxx'
2020/02/24 09:57:12 Failed to create file system for "mtest:xxxx": read metadata failed: Propfind http://127.0.0.1:5443/xxxx: dial tcp 127.0.0.1:5443: connect: connection refused

I'm going to need some help here with some standard uses so once we've worked it out - please post!

A good question - @Max-Sum ?

It's ok to treat it as very large. But policies like mfs don't state the behavior when the free space is equal. So maybe advising avoidance of space-relevant policies would be better?

Some policies return multiple upstreams in a certain order instead of a single one. However, in the current definition, an upload goes to all upstreams returned. That's what makes mirroring work, by using all as the create policy.
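
For instance, a rough sketch of a mirrored setup (the remote names are placeholders), where every new file is created on both upstreams:

[mirror]
type = union
upstreams = gdrive1:backup gdrive2:backup
create_policy = all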

I see... I'd like for it not to fail at runtime when writing files though, which is unfriendly to users. Can we fail at initialization? I think best would be to allow it, with a note in the docs that the mfs policy behaviour is undefined or something like that. Or maybe allow it and emit a runtime NOTICE that the behaviour is undefined?

I prefer failing at initialization but that would cause an about request to all upstreams.

That would be undesirable...

Perhaps a NOTICE (just one per upstream) when we wanted the max size but didn't have it for a remote so used very large instead?

I can make mfs/lus/lfs resilient to unavailable usage fields, without guaranteed behavior. What do you mean by NOTICE one per upstream?

That sounds fine.

Maybe we could log once for each upstream about the problem, with NOTICE or WARNING level.

I have pushed changes to address the issue. Can you release a new beta?

I've pushed up a new beta here

https://beta.rclone.org/branch/v1.50.2-102-g769dc85d-pr-3782-union-beta/ (uploaded in 15-30 mins)

Just noticed that drive does not have Objects counting support. So working around the 400k limit in Team Drives is not possible :joy:. @zappo

I've just tested rand as action and create policy.

rclone touch, copy and sync all appear to work, in initial tests.
rclone deletefile works, although occasionally I had to run it more than once.

Tested on both My Drive and Team Drive.
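
For anyone wanting to reproduce, roughly these commands exercise the same operations (union: stands for the union remote, paths are placeholders):

rclone touch union:testfile.txt
rclone copy /tmp/testdir union:testdir
rclone sync /tmp/testdir union:testdir
rclone deletefile union:testdir/somefile.txt
rclone lsl union: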

If write and delete work with rand that is at least a start. Are you saying that balancing between multiple Team Drives is unlikely to be possible? (Sorry if I misunderstand)

@ncw With the new beta you just pushed, is there something particular that should work differently than the prior beta?

Any other actions/flags you would like me to test?

By exploring the internal API calls from the browser, I did find 2 fields, recursiveFileCount & recursiveFolderCount, which contain all the necessary information for shared drives, but unfortunately they don't seem to be available in the public API.

thestigma mentioned the potential addition of some persistent data for VFS caches. If that happens, and if file count / size information fields are created, might it be possible to inject that info for Team Drives and any other remotes where about isn't feasible?
rclone size and tree both access file count and size. It wouldn't necessarily be current at all times, but it could be a proxy for live data that multiwrite could use. <= Just a thought.

Hi. Sorry, I didn't understand the answer here. In a 'ff' scenario I would have thought that if I have 2 endpoints and one was unavailable while the other replied, it would return data. I'm not looking to mirror writes here. I'm looking to establish redundancy across multiple endpoints.