tus does not support non-sequential uploads. I initially wanted to support them for future protocols, but it is too problematic.
Is the issue that the resulting Object won't be ready before the Response returns?
The service handler manages the server url to remote url mapping, and the fs manages the remote url to backend url mapping. Would it be possible to include a ?upload_id=abc123 query parameter in the remote url when necessary? This way the fs retains control over the remote to backend url mapping, which in Google's case is a very different url than the final one. It will also keep the upload stateless, as this query parameter can be propagated to the service url.
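A minimal sketch of the propagation being described, assuming a hypothetical rewriteForService helper (the function name and urls are illustrative, not rclone code):

```go
package main

import (
	"fmt"
	"net/url"
)

// rewriteForService maps a backend-returned remote url onto the public
// service url, carrying any query parameters (e.g. ?upload_id=abc123)
// across untouched so the upload stays stateless.
// This helper and its urls are hypothetical, not rclone code.
func rewriteForService(serviceBase, remote string) (string, error) {
	r, err := url.Parse(remote)
	if err != nil {
		return "", err
	}
	s, err := url.Parse(serviceBase)
	if err != nil {
		return "", err
	}
	s.Path = s.Path + r.Path // expose the remote path under the service prefix
	s.RawQuery = r.RawQuery  // propagate upload_id etc. as-is
	return s.String(), nil
}

func main() {
	out, _ := rewriteForService("https://example.com/files", "/user/data/file.bin?upload_id=abc123")
	fmt.Println(out)
}
```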
The interface would have to return a path which is rewritten by the service handler before being returned to the upload handler:
type ResumableUploader interface {
	ResumableUpload(path string) (u Uploader, newPath string, err error) // Start or continue an existing upload to path
	ResumableCleanup() error // Clean up expired incomplete uploads
}
I would really like to avoid creating a temporary mapping between resources. It creates the issue of generating ids and all of the subtle complexity of doing that in a distributed manner.
HTTP/1.1 201 Created
Location: https://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
Tus-Resumable: 1.0.0
Note the Location contains a server-defined URL... That could contain the upload ID and the filename quite easily.
I see what you mean. However those IDs are generated by the backends themselves so someone has to keep track of them. If we can get them into the Location URL then the client will keep track of them...
There is no issue for backends that generate upload IDs, but we will have to manage ID generation for backends that don't. It would be better if the service handler treated the query as semi-opaque and just forwarded it to the fs. Then the fs can stuff any information it needs into the returned url without needing explicit support for each parameter in the service handler code. Creating the concept of an ID, rather than just using the service/remote path to uniquely identify the upload, is unnecessary.
I was assuming that if the backend doesn't generate IDs we just use the file path as an ID.
I don't think we can get away from needing an ID to identify a multipart upload. However it looks like we can pass that straight to the client for the client to remember. The backend can stuff anything it wants into that ID (the actual ID from the cloud storage + a path + expiry, serialized and encoded in base64) and it will be opaque to everything else.
Shouldn't we try to have the url returned from the POST be the url that will eventually point to the final object? Any upload query parameters can be ignored during a GET.
I am not clear on what you have in mind for the url containing the ID.
We should be careful to not put any requirements on the structure of the path as we want this to be compatible with a variety of service url schemes.
POST is responsible for returning the URL for the uploaded resource. If that isn't the final url for the file then there is no way for the client to access the file after upload without somehow searching for it. The URL POSTed to also doesn't have to be a sub-path of the URL returned.
Automatically routed based on user credential:
Request: POST /upload
Response: Location: /user/data/filename
A url scheme for concatenated uploads:
Request: POST /user/data/
Response: Location: /user/data/filename/1
A 1:1 filesystem to url mapping that handles a very large number of uploads:
Request: POST /uploads/
Response: Location: /uploads/fi/le/filename
The service handler or the fs can decide to place the upload anywhere it wants.
tus doesn't specify anything for the url scheme but it does provide an optional GET handler. I am not sure if it handles resumable downloads.
The docs refer to the value of the Location: header as the "upload URL".
I don't think it is the URL that you are supposed to GET the file from - it doesn't say that in the spec. In fact there is no mention of GET in the spec at all!
Though it could be easily enough, and we could stick the extra metadata in a URL parameter.
I've implemented a lot of uploaders for cloud backends and the "upload URL" is a common concept meaning upload your data here. You usually get the final URL when the upload has finished.
Update: I looked at the javascript code, and it does look like you are expected to download the file from the upload URL.
There doesn't seem to be a way of saying "I've finished uploading to this file". It looks like (correct me if I'm wrong) you can add to any file at any time - is that correct? If so that will be a problem.
I don't understand why you would want your file uploaded to some random place decided by the upload handler?
Where is that? I haven't found it.
I note that tus doesn't support uploads where you don't know the size in advance. This is a bit of a limitation.
I think a straightforward POST (with form) or PUT (with headers) would be a lot easier to implement than tus and would do for 99% of the use cases of uploading. You could use it with curl or via javascript.
Once it receives the last byte it is considered complete. This is why the protocol requires transmitting the total size at some point.
It depends on the service and what the path represents. An http client should also have no say in the upload location, for security and collision handling.
Go ahead, but be aware that I have a major backlog of pull requests to review and not much time at the moment! Pull requests take a long time to review properly.
We are going to need to tweak the interface a bit. Using ? as a token in the path won't work if the backend allows ? in file names. I was thinking of passing the entire url.URL object in place of the path, or url-encoding the path, but both are clunky.
type ResumableUploader interface {
	ResumableUpload(path string, options url.Values) (u Uploader, newPath string, newOptions url.Values, err error) // Start or continue an existing upload to path
	ResumableCleanup() error // Clean up expired incomplete uploads
}
I am not sure if I am happy with pulling in a type from the url package for this interface.
Should I use an existing rclone type?
or should I just redeclare the type:
type TransactionOptions url.Values
or
type TransactionOptions map[string][]string
Can you think of anything better than TransactionOptions?
URL encoding the path is what most cloud storage systems do. They have the same problem. You need a little care encoding the path but url.URL has enough tools in it to let you do it.
I don't really want to pollute the Resumable interface with url.Values which will mean nothing to anything except the tus uploader.
I don't think any of the internal interfaces to rclone should care - they should be presented with "file with questionmark?" and work just fine.
The external interfaces might - for instance if you use rclone serve http you'll see rclone will URL encode and decode the ? for you, but I see that as the job of the external interface to get the URLs back to rclone standard format.
I managed to implement a generic Uploader for Fs that implement Concatenator.
Although I am not sure whether it is better to have a single directory contain all pending uploads, or to create them in the destination location. A single directory makes it easier to clean up dead uploads but complicates collision avoidance. What do you think of storing the paths to the temporary files in boltdb?
Cleanup can then just scan the boltdb list of pending uploads and their modtimes and delete accordingly.
If the boltdb is ever lost, a user can just manually delete the temporary files.
I am going to take a swing at the S3 implementation now.