RFC3986 remote scheme

I am creating a new thread for this as it is a larger separate change from the upload handler.
My project was going to wrap rclone in this functionality but I figure everyone would benefit if it were integrated.
@ncw What do you think about deprecating the current rclone remote:path scheme in favor of the RFC3986 standard?

The standard defines the scheme as scheme ":" hier-part [ "?" query ] [ "#" fragment ]

Pre-configured rclone remotes would be named as the authority with an rclone: scheme:
rclone://[remote]/path?query

Unconfigured remotes can be specified using the backend name as the scheme:

http://user@host:port/path?query
s3://token@host:port/path?query
ftp://user@host:port/path?query

The query portion contains key=value pairs that override or specify options for the remote.

dropbox://client_id:client_secret@/path?chunk-size=1000

If the host is omitted, it defaults to the official domain name of the service. Preconfigured rclone remotes can specify whether they allow overrides, to avoid security implications.
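To make the proposed layout concrete, here is a minimal sketch in Python that splits such a URL into the parts the proposal gives meaning to. It uses only the standard urllib.parse module. The function name describe_remote and the dictionary keys are made up for illustration and are not part of rclone.

```python
from urllib.parse import urlsplit, parse_qsl


def describe_remote(url):
    """Split a backend URL into backend, credentials, host, path and options.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    parts = urlsplit(url)
    return {
        "backend": parts.scheme,      # e.g. "dropbox", "s3", "ftp"
        "user": parts.username,       # text before ":" in the userinfo
        "secret": parts.password,     # text after ":" in the userinfo, or None
        "host": parts.hostname,       # None when the host is omitted
        "path": parts.path,
        "options": dict(parse_qsl(parts.query)),  # key=value overrides
    }


remote = describe_remote("dropbox://client_id:client_secret@/path?chunk-size=1000")
print(remote)
```

A host of None is where the "default to the official domain name of the service" rule would apply. Which domain that is would be up to each backend, so the sketch leaves it unset.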

I don't think it is currently possible to transfer between two different s3 endpoints as the --s3-endpoint argument is set globally. This specification would allow such a transfer.

Command line compatibility is maintained by making file: the default scheme. URL paths have to be URL-encoded to support file and directory names that are not RFC3986 compliant. To spare command line users from having to URL-encode their local path parameters, only arguments without a scheme are automatically encoded.

rclone copy './foo/b?ar' 'rclone://myremote/'

maps to

rclone copy 'file:./foo/b%3Far' rclone://myremote/
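The mapping above could be sketched roughly as follows in Python. This is an illustration of the rule, not rclone code. The function name normalize_arg and the regular expression used to detect a scheme are assumptions; the pattern requires at least two characters so that single letters stay free for Windows drive letters.

```python
import re
from urllib.parse import quote

# Assumption: a scheme is two or more RFC3986 scheme characters followed by ":".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")


def normalize_arg(arg):
    """Return arg unchanged if it already has a scheme, else a file: URL."""
    if SCHEME_RE.match(arg):
        return arg
    # quote() leaves "/" and unreserved characters alone and escapes the rest.
    return "file:" + quote(arg)


print(normalize_arg("./foo/b?ar"))
print(normalize_arg("rclone://myremote/"))
```

One consequence of this rule is that a local file whose name starts with something that looks like a scheme (for example foo:bar) would need to be written as ./foo:bar to be treated as a path.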

Windows drive letters are ambiguous with a scheme, so backend names would have to be longer than a single letter to disambiguate. Will it be necessary to support multi-letter drive names? I don't think that occurs often in the wild. The presence of \ in the path along with an unknown scheme might also be a good criterion for disambiguation.
This is how the paths would map:

c:\my\files\file -> file:/c/my/files/file
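A minimal Python sketch of that drive letter mapping, assuming a single-letter drive followed by a path separator. The function name windows_path_to_file_url is hypothetical, and multi-letter drive names are deliberately not handled.

```python
import re
from urllib.parse import quote

# A single letter, a colon, then a backslash or forward slash.
DRIVE_RE = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def windows_path_to_file_url(path):
    """Map a drive-letter path to a file: URL, or return None if it is not one."""
    match = DRIVE_RE.match(path)
    if match is None:
        return None
    drive, rest = match.groups()
    # Turn backslashes into forward slashes, then percent-encode the remainder.
    return "file:/" + drive + "/" + quote(rest.replace("\\", "/"))


print(windows_path_to_file_url(r"c:\my\files\file"))
```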

Many backends already specify a url scheme for their resources. I think a uniform scheme is more important than trying to forward any of the quirks of the backend schemes.

For backwards compatibility this can all be disabled unless a global --urls argument is passed to rclone.

I expect that the following changes will be required:

  • modify all of the /cmd/cmd.go newfs*() functions
  • modify /fs/cache/cache.go to key on scheme://host
  • modify /fs/fs.go ParseRemote() to accept url.URL

Doing this would cause a bit of a backwards compatibility problem, I think, as some people insist on writing remote://path even though the docs never use that form!

How would you differentiate between remote:path and remote:/path which is required by some remotes (eg sftp, dropbox)?

You are right in that you can't supply different parameters to the same backend used twice on the command line - that is a limitation of the current scheme. You can do it with environment variables though.

I think the idea of making a connection string as it were for a backend is interesting.

I think using the RFC scheme will be more typing for users so I suspect the current scheme is best here, however it would be good for machine/scripting use.

What do you see as your use case?

This is already the case on Windows - remotes have to be longer than 1 char.

PS The remote:path scheme comes from rsync just in case you were wondering!

Can you elaborate? I don't understand.

What is the difference? Absolute vs relative? Relative to what?
If the server has some concept of a working directory then remote:path would be rclone://remote/./path

What about a shorthand that also URL-encodes?

@remote/p?ath -> rclone://remote/p%3Fath

@ is not a valid scheme character, so it is not ambiguous, and it should also be command line compatible.
This shorthand would not allow query parameters.
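As a rough Python sketch of how the shorthand could expand (the function name expand_shorthand is made up for illustration): everything after the first / is treated as the path and percent-encoded, which is why a ? in the shorthand can never start a query.

```python
from urllib.parse import quote


def expand_shorthand(arg):
    """Expand @remote/path into rclone://remote/<encoded path>."""
    if not arg.startswith("@"):
        raise ValueError("shorthand must start with @")
    remote, _, path = arg[1:].partition("/")
    # The whole remainder is a path, so "?" is escaped rather than parsed.
    return "rclone://" + remote + "/" + quote(path)


print(expand_shorthand("@remote/p?ath"))
```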

We could stick with the --urls flag idea and just keep both. The rsync scheme feels janky to me.

I need to effectively serialize an rclone config entry and remote path in a way that is meaningful to other applications. This will ultimately be used to provide rcd with unconfigured remotes for actions.

I guess I was assuming that we'd be able to tell the paths apart without the --urls flag. With the --urls flag there is no problem.

Relative to what depends on the backend. For example sftp:path refers to something in your home directory whereas sftp:/path refers to something in the root directory.

I'd like to focus on your use case for a moment.

Why isn't using the rc commands to create a config then using that config enough for your apps?

Do you need an opaque connection string style, or do you want a string that the application can tweak?

Are you trying to get rid of the state in the config file? That is very hard for the oauth remotes, which need somewhere to store tokens. I don't think a connection string style connection will ever be useful for them.

There would be room in the current backend grammar to fit parameters, I think. Rclone remote names can only contain [\w_ -]+ at the moment. That doesn't include a comma, so paths could become:

remote:path
:backend:path
remote,param=value,param2=value2:path
:backend,param=value,param2=value2:path

which would solve your problems.
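To show how the four forms above might be taken apart, here is a small Python sketch. It is not rclone's parser. The function name parse_remote is hypothetical, and the sketch assumes parameter values contain neither , nor :, which a real grammar would need a quoting rule for.

```python
def parse_remote(spec):
    """Parse [:]name[,key=value...]:path into (name, on_the_fly, params, path).

    on_the_fly is True for the :backend: form, where the name is a backend
    type rather than a configured remote.
    """
    on_the_fly = spec.startswith(":")
    if on_the_fly:
        spec = spec[1:]
    head, sep, path = spec.partition(":")
    if not sep:
        raise ValueError("missing ':' between remote and path")
    name, *pairs = head.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return name, on_the_fly, params, path


print(parse_remote("remote,param=value,param2=value2:path"))
print(parse_remote(":backend:path"))
```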

Isn't the home directory simply the 'working directory'? I think you can change which folder the user is initially in at login, rather than it specifically being the home directory.

The config cannot persist after the request. Even during the request, other users could collide with it.

I specifically want the RFC3986 URL standard; it is a widespread means of referring to a network resource. Even if rclone provided an alternative I would likely just convert from RFC3986 to whatever rclone used. I figured it would benefit everyone if I just integrated this directly into rclone.

My application already makes heavy use of that standard, manipulating and generating urls.

The tokens will be managed by my application and embedded into the urls: s3://token@aws.com/path?query. These urls will likely be generated on the fly rather than stored with credentials.

I still have to convert from RFC3986 to this, and stuffing all of the RFC3986 features into the remote name field (like authority and fragment) would start to get awkward. Regardless, there is more value in supporting an existing standard than inventing our own. I get you are trying to maintain the relationship to rsync and its feel, I just don't see the value of it from a technical perspective.

You could use a UUID as the config then delete it after.

Which backends were you planning to use? Working without config will only work for the stateless backends, eg s3/swift/gcs/sftp and not for any of the backends which need oauth eg drive/dropbox/box/most of the others.

That would require a multi-step transaction, and users would be able to access other users' configs.
There would also be the issue of a config persisting due to a client connection failure. It would be better to keep rclone stateless.

All of them, except the authentication would be worked out before interacting with rclone. I am not sure about the specifics of oauth but some mechanism would have to be worked out to allow a user to preauthenticate independent of rclone and then pass a token to rclone for it to access the backend.

I think for your scheme to work we'd need to work out that mechanism.

At the moment, for example, a drive "token" is a ~1k JSON blob. It contains within it the actual token used, the expiry date and the refresh token. Rclone will use the refresh token to update the actual token when it expires (typically they only last 1 hour).

That is unexpectedly enormous. Can rclone store only the token? Storing the token and the destination that it unlocks is too much of a risk.

Indeed! The onedrive token alone is 1.6k not including the refresh token and the expiry.

Rclone needs to be able to refresh the token since they only last 1 hour normally so needs the expiry and the refresh token too.

I'm not following you here.

Sorry, I assumed that the token is stored in the config. How is it stored? I am struggling to follow the oauthutil module.

They look like this in the config where the XXX are very long!

[drive]
type = drive
token = {"access_token":"ya29.ImGXXXXXXXXXXXXXXXXXX","token_type":"Bearer","refresh_token":"1/A3AGzF9XXXXXXXXXXX","expiry":"2020-01-14T17:01:29.71196971Z"}
root_folder_id = 0ADXXXXXXXXXX
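For anyone who wants to poke at such an entry from a script, here is a hedged Python sketch that reads a config section in the shape shown above, decodes the token JSON, and checks the expiry. The sample values are placeholders, the function names load_token and is_expired are made up, and the timestamp handling assumes a UTC time ending in Z as in the example. It is not how rclone itself handles tokens.

```python
import configparser
import json
from datetime import datetime, timezone

# Placeholder config text in the same shape as the entry above.
SAMPLE = """
[drive]
type = drive
token = {"access_token":"ya29.EXAMPLE","token_type":"Bearer","refresh_token":"1/EXAMPLE","expiry":"2020-01-14T17:01:29.71196971Z"}
root_folder_id = 0ADEXAMPLE
"""


def load_token(config_text, remote):
    """Return the decoded token JSON for one remote section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return json.loads(parser[remote]["token"])


def is_expired(token, now):
    """Compare the token expiry (assumed UTC, 'Z' suffix) against now."""
    # Drop fractional seconds and the trailing Z before parsing.
    stamp = token["expiry"].split(".")[0].rstrip("Z")
    expiry = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return now >= expiry


token = load_token(SAMPLE, "drive")
print(sorted(token))
print(is_expired(token, datetime.now(timezone.utc)))
```

The access token, refresh token and expiry all travel together in the one JSON value, which is why the blob is so much larger than a bare token.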

I suggest you make a google drive remote and investigate what the config looks like!