HTTP upload contribution

Hi, I would like to contribute functionality to allow uploading to a remote via an HTTP request (from a browser).

I intend to accomplish this by integrating the tus standard: https://github.com/tus/tusd/blob/master/docs/usage-package.md

It has a rich ecosystem of clients, which makes using this functionality relatively easy.

Can anyone advise me on where in the codebase I should integrate this?
There are two parts to the tus library that I need to integrate. I have to attach the request handler to the http server instance, and I have to glue the tus.Datastore abstraction to (I am assuming) fs.Object.

I would appreciate input on this solution before I start working on it.

Edit: Investigating further, there seems to be duplicated work between the rclone serve http functionality and the rclone --rc / rclone rcd functionality. I understand that they serve different purposes, but I think the code would benefit from unifying parts of it under another layer of abstraction.

Also, I found this line: github[dot]com/rclone/rclone/blob/159f2e29a89e36942e7946333e9ffa066376aa7b/fs/rc/rcserver/rcserver.go#L36
It implies that handlers might need to be registered globally rather than explicitly as part of the server initialization.

I'm sure @ncw will be glad to give you a pointer to where it makes most sense to add.

Thanks, I should also mention that I had already started an issue for this a while back but want to start moving it forward myself: github[dot]com/rclone/rclone/issues/3412

Not sure why that happened.
It may just be that if you are very new to the forum all links may be treated as spam? (Or maybe just if you post multiple links in a short time on a new account.) This should pass really quickly once you get your "basic" badge. I've never had an issue with any links - certainly not github ones.

Hidden posts can still be viewed by the way - they are just "spoiler-hidden".

I'll give you likes to get the system to pick you up faster hopefully :smiley:
If you also just browse around a few topics, you should have it very quick and get most restrictions lifted.

" Basic

This badge is granted when you reach trust level 1. Thanks for sticking around and reading a few topics to learn what our community is about. New user restrictions have been lifted; you’ve been granted all essential community abilities, such as personal messaging, flagging, wiki editing, and the ability to post multiple images and links."

The system sometimes flags things as spam and we have to take action, as it's not perfect.

I just restored this topic and deleted the duplicate topic, I think, as they looked the same.

@Animosity022 the rclone API topic was a very different question, can you restore it?

Sure. My mistake.

This shouldn't be too hard.

What exactly is the tus standard? Presumably some kind of POST or PUT request? I'm not sure you need a library for that?

This could be part of the rclone serve http machinery or it could be part of the rclone rcd machinery - they both have different purposes.

It is already possible to upload local files using the API.

The tus standard is just a protocol for upload functionality such as resumable uploads or chunked uploads.
I only went with it because it has both server and client libraries that implement these features and there doesn't seem to be anything else as robust.

I was thinking it should be added to the httplib machinery so that all inheriting functions can utilize it. The issue is that the handler mux logic (along with most of the server init) will need to be centralized. I was thinking that the httplib could absorb all server code and simply provide a register/unregister interface for handlers. So long as a handler is registered, the server is running.
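
To make the register/unregister idea concrete, here is a minimal Go sketch. It is not rclone's httplib: `Registry`, `Register`, `Unregister`, `textHandler`, and `get` are hypothetical names invented for illustration, and the sketch only dispatches requests in-process rather than starting or stopping a real server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Registry is a hypothetical stand-in for the proposed httplib interface:
// endpoints register and unregister handlers under a path prefix.
type Registry struct {
	mu       sync.RWMutex
	handlers map[string]http.Handler
}

func NewRegistry() *Registry {
	return &Registry{handlers: make(map[string]http.Handler)}
}

// Register adds (or replaces) the handler for a path prefix.
func (r *Registry) Register(prefix string, h http.Handler) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.handlers[prefix] = h
}

// Unregister removes the handler for a prefix and reports how many remain,
// so a caller could shut the server down when the count reaches zero.
func (r *Registry) Unregister(prefix string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.handlers, prefix)
	return len(r.handlers)
}

// ServeHTTP dispatches to the handler with the longest matching prefix.
func (r *Registry) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	r.mu.RLock()
	var best string
	var h http.Handler
	for prefix, candidate := range r.handlers {
		if strings.HasPrefix(req.URL.Path, prefix) && (h == nil || len(prefix) > len(best)) {
			best, h = prefix, candidate
		}
	}
	r.mu.RUnlock()
	if h == nil {
		http.NotFound(w, req)
		return
	}
	h.ServeHTTP(w, req)
}

// textHandler returns a handler that replies with a fixed body.
func textHandler(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, body)
	})
}

// get performs an in-process GET and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	reg := NewRegistry()
	reg.Register("/rc/", textHandler("rc"))
	reg.Register("/upload/", textHandler("upload"))
	fmt.Println(get(reg, "/upload/file.bin"))
	reg.Unregister("/upload/")
	fmt.Println(get(reg, "/upload/file.bin"))
}
```

Under this design an extension such as a tus endpoint would be added once, at the registry, instead of at every place a server is instantiated.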

Actually reading a bit more of the tusd docs, maybe you are wanting to implement rclone serve tusd?

My goal is to have it as part of the rclone rcd functionality, so a web browser can fully interact with remotes.

Can you find a doc which describes how tus actually works at the http protocol level? I'd rather start from there than pull in a big library for something so conceptually simple as a POST...

tus manages an upload over multiple requests. I expect that users will be uploading multi-gigabyte files regularly.

https://tus.io/protocols/resumable-upload.html#core-protocol
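
As a rough illustration of what that document describes at the HTTP level, the Go sketch below keeps uploads in memory and handles three requests: a POST with `Upload-Length` to create an upload (this is the creation extension, not the core protocol), a HEAD to ask for the current `Upload-Offset`, and a PATCH to append bytes at that offset. `tusSketch`, `upload`, and `do` are hypothetical names, the `/files/` path is an assumption, and this is neither tusd's implementation nor a complete or compliant server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"sync"
)

// upload holds one in-progress upload entirely in memory (illustration only).
type upload struct {
	length int64
	data   []byte
}

// tusSketch is a hypothetical handler, not the tusd implementation.
type tusSketch struct {
	mu      sync.Mutex
	uploads map[string]*upload
}

func newTusSketch() *tusSketch { return &tusSketch{uploads: map[string]*upload{}} }

func (t *tusSketch) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	t.mu.Lock()
	defer t.mu.Unlock()
	w.Header().Set("Tus-Resumable", "1.0.0")
	if r.Header.Get("Tus-Resumable") != "1.0.0" {
		w.WriteHeader(http.StatusPreconditionFailed)
		return
	}
	if r.Method == http.MethodPost { // creation extension: reserve an upload URL
		length, err := strconv.ParseInt(r.Header.Get("Upload-Length"), 10, 64)
		if err != nil || length < 0 {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		id := strconv.Itoa(len(t.uploads) + 1)
		t.uploads[id] = &upload{length: length}
		w.Header().Set("Location", "/files/"+id)
		w.WriteHeader(http.StatusCreated)
		return
	}
	u, ok := t.uploads[strings.TrimPrefix(r.URL.Path, "/files/")]
	if !ok {
		w.WriteHeader(http.StatusNotFound)
		return
	}
	switch r.Method {
	case http.MethodHead: // report how many bytes have been received so far
		w.Header().Set("Upload-Offset", strconv.Itoa(len(u.data)))
		w.Header().Set("Upload-Length", strconv.FormatInt(u.length, 10))
		w.WriteHeader(http.StatusOK)
	case http.MethodPatch: // append a chunk at the expected offset
		if r.Header.Get("Content-Type") != "application/offset+octet-stream" {
			w.WriteHeader(http.StatusUnsupportedMediaType)
			return
		}
		offset, err := strconv.Atoi(r.Header.Get("Upload-Offset"))
		if err != nil || offset != len(u.data) {
			w.WriteHeader(http.StatusConflict)
			return
		}
		// Keep whatever arrived, capped at the declared Upload-Length.
		chunk, _ := io.ReadAll(io.LimitReader(r.Body, u.length-int64(len(u.data))))
		u.data = append(u.data, chunk...)
		w.Header().Set("Upload-Offset", strconv.Itoa(len(u.data)))
		w.WriteHeader(http.StatusNoContent)
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// do sends one in-process request with the Tus-Resumable header set.
func do(h http.Handler, method, path, body string, headers map[string]string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	req.Header.Set("Tus-Resumable", "1.0.0")
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec
}

func main() {
	h := newTusSketch()
	created := do(h, "POST", "/files/", "", map[string]string{"Upload-Length": "11"})
	loc := created.Header().Get("Location")
	patch := map[string]string{"Content-Type": "application/offset+octet-stream", "Upload-Offset": "0"}
	do(h, "PATCH", loc, "hello ", patch)
	fmt.Println("resume at", do(h, "HEAD", loc, "", nil).Header().Get("Upload-Offset"))
}
```

The point of the HEAD request is that a client that lost its connection can ask where to resume and then send a PATCH starting from that offset.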

In general cloud providers need data uploaded in one go and you can't go back and re-write a bit you've already uploaded. Some of the providers do support resumable uploads but rclone doesn't have a backend interface for that yet.

Looking at the tusd s3 code: https://github.com/tus/tusd/blob/master/pkg/s3store/s3store.go it looks like it does use the resumable upload feature from s3.

So I don't think all rclone backends can support tus except if

  • the file is uploaded in chunks and the client re-assembles the parts
    • this isn't desirable as it breaks the 1:1 file:object mapping rclone uses
  • or rclone makes a local copy of the file while it is being uploaded
    • it would be pretty easy to do this with the VFS layer (this is a higher level layer with filesystem semantics which will buffer files to disk if required)

Some rclone backends could be supported if

  • we invented a resumable upload interface for the backends
    • this would be useful for rclone resuming uploads

So I don't think this is going to plug in easily unfortunately :frowning:

If you wanted to try the VFS route then I'd recommend an rclone serve tusd command.

Would it be reasonable to have an option for chunked uploads to be transparently cached before being sent to the remote? Sorry, I was mistaking the VFS for the FUSE functionality. Yea, that seems like a totally desirable bit of functionality.

Are you thinking that rclone serve tusd would behave the same as rclone rcd except with an additional upload endpoint? I still need all of the other functionality of rcd exposed.

I am all for creating a resumable uploads interface. The first step is just to get the front end integrated, then the backends can start to be modified. Or do you require this to all be done in a single PR?

If you want the rcd functionality then you just add --rc.

I think some experimentation is needed first!

How well does tusd work if you run tusd (with the file backend) on an rclone mount with --vfs-cache-mode writes. That is effectively the equivalent of the VFS solution I mentioned above?

Ah, I understand what you mean now about the rclone serve tus --rc approach, that works for me. Would it be better to not explicitly refer to the tus protocol and quietly provide it for all http services that might receive files? It looks like tus is somewhat backwards compatible with ordinary POST requests:
https://tus.io/protocols/resumable-upload.html#creation-with-upload

What do you think of my idea to refactor the http server code into a handler registration service? This will allow different endpoints to be implemented modularly and combined to create a http service.

Having tusd receive into a mounted folder does not work out of the box because the example server expects the filesystem to support hardlinks as part of its file locking mechanism. This isn't an issue though, as I am assuming rclone backends provide their own file-level synchronization mechanism. If not, it would be easy enough to implement an alternative.

Looking into it a bit, the requirements for a resumable interface seem quite low.
Backends would need to provide a WriteChunk(offset, data) function and some sort of transaction start()/finish()/rollback() interface to manage possible locking, concatenation, and cleanup.
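
A minimal Go sketch of what such an interface might look like, purely as a discussion aid. `ResumableUploader`, `UploadSession`, and the in-memory `memBackend` are hypothetical names; none of them are existing rclone APIs, and a real backend would need to map these calls onto its provider's multipart or resumable upload calls.

```go
package main

import (
	"errors"
	"fmt"
)

// ResumableUploader is a hypothetical backend interface, not an rclone API.
type ResumableUploader interface {
	// Start begins an upload transaction for an object of a known size.
	Start(remote string, size int64) (UploadSession, error)
}

// UploadSession is one in-progress upload.
type UploadSession interface {
	WriteChunk(offset int64, data []byte) error // append data at offset
	Offset() int64                              // bytes accepted so far
	Finish() error                              // commit the object
	Rollback() error                            // discard partial data
}

var (
	ErrOffsetMismatch = errors.New("chunk offset does not match bytes received")
	ErrTooLong        = errors.New("chunk exceeds declared size")
	ErrIncomplete     = errors.New("upload is incomplete")
)

// memBackend is an in-memory implementation used only to exercise the interface.
type memBackend struct{ objects map[string][]byte }

func newMemBackend() *memBackend { return &memBackend{objects: map[string][]byte{}} }

type memSession struct {
	backend *memBackend
	remote  string
	size    int64
	buf     []byte
}

func (b *memBackend) Start(remote string, size int64) (UploadSession, error) {
	return &memSession{backend: b, remote: remote, size: size}, nil
}

func (s *memSession) Offset() int64 { return int64(len(s.buf)) }

// WriteChunk only accepts a chunk that starts exactly where the last one ended.
func (s *memSession) WriteChunk(offset int64, data []byte) error {
	if offset != s.Offset() {
		return ErrOffsetMismatch
	}
	if s.Offset()+int64(len(data)) > s.size {
		return ErrTooLong
	}
	s.buf = append(s.buf, data...)
	return nil
}

// Finish makes the object visible only once every byte has arrived.
func (s *memSession) Finish() error {
	if s.Offset() != s.size {
		return ErrIncomplete
	}
	s.backend.objects[s.remote] = s.buf
	return nil
}

func (s *memSession) Rollback() error {
	s.buf = nil
	return nil
}

func main() {
	var backend ResumableUploader = newMemBackend()
	session, _ := backend.Start("dir/file.txt", 11)
	fmt.Println(session.WriteChunk(0, []byte("hello ")))
	fmt.Println(session.WriteChunk(0, []byte("world"))) // wrong offset is rejected
	fmt.Println(session.WriteChunk(session.Offset(), []byte("world")))
	fmt.Println(session.Finish())
}
```

Requiring each chunk to start at the current offset keeps the sketch compatible with providers that cannot rewrite data that has already been uploaded.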

I think tus looks rather complicated and a bit specialized so I'm not sure about providing it to all http services.

I'd rather start with just a simple POST or PUT upload which you could do with a single curl command. That would enable browser integration and command line use.
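
For comparison, a single-request upload needs very little code. The Go sketch below is an assumption-laden illustration, not rclone code: `memStore` is a hypothetical in-memory destination standing in for a remote, and the `/upload/` path is made up.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// memStore is a hypothetical destination standing in for a remote.
type memStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func newMemStore() *memStore { return &memStore{objects: map[string][]byte{}} }

// Put reads r to the end and stores the bytes under name.
func (s *memStore) Put(name string, r io.Reader) (int64, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return 0, err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[name] = data
	return int64(len(data)), nil
}

// uploadHandler accepts PUT or POST to /upload/<name> and stores the body.
func uploadHandler(s *memStore) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut && r.Method != http.MethodPost {
			w.Header().Set("Allow", "PUT, POST")
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		name := strings.TrimPrefix(r.URL.Path, "/upload/")
		if name == "" {
			http.Error(w, "missing file name", http.StatusBadRequest)
			return
		}
		n, err := s.Put(name, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusCreated)
		fmt.Fprintf(w, "stored %d bytes\n", n)
	})
}

// send performs an in-process request and returns the status code.
func send(h http.Handler, method, path, body string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, strings.NewReader(body)))
	return rec.Code
}

func main() {
	store := newMemStore()
	h := uploadHandler(store)
	fmt.Println(send(h, "PUT", "/upload/a.txt", "hello"))
	fmt.Println(send(h, "GET", "/upload/a.txt", ""))
}
```

If a handler like this were mounted on a real server at, say, http://localhost:8080/upload/ (a made-up address), `curl -T file.bin http://localhost:8080/upload/file.bin` would send the file as one PUT request. The trade-off is that an interrupted transfer has to start again from the beginning.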

The http server code is somewhat modular at the moment. There is a standard part which deals with authentication / ssl etc all those boring parts. You can then plug your own http handlers into that. This is used by the rclone serve http and rclone serve webdav and also by the rcd. Note that rclone serve webdav also provides the rclone serve http service. So it would be easy to add the http service to rclone serve tusd for instance.

:frowning: Backends don't provide file locking in general. Have you tried hacking the locking out of the file backend?

The tusd backends I looked at seem to save the file and an additional "info" file which isn't really what we want in rclone either...

Maybe specifying that interface would be a good idea?

I wonder if the right initial approach for serve tusd might be to keep the additional info entirely in memory. So rclone could use normal streaming uploads via facilities that exist at the moment (so it will work with all backends). This wouldn't be perfect but it would be a start. It would mean that if you shut down rclone the resume wouldn't work.

The tus standard supports curl uploads. Here is an example (with some extra syntax to deduplicate the path):

FILE='/home/nolan/mozilla.pdf' sh -c 'curl -i -H "Upload-Length: `wc -c < $FILE`" -H "Content-Length: `wc -c < $FILE`" -H "Tus-Resumable: 1.0.0" -H "Content-Type: application/offset+octet-stream" --data-binary @$FILE http://localhost:1080/files/'

httplib does house much of the http server code. I mean to move the server instantiation code into httplib rather than have each endpoint responsible for instantiation. Then if I wanted to add tus globally (or any other extension) I don't have to shotgun its handler addition everywhere a server is instantiated.

I recompiled without the locker and ran into a different issue:

cd /path/to/mount
echo 'foo' > test.txt # success
echo 'foo' >> test.txt # echo: write error: Input/output error

tusd receives the same error from the filesystem because it touches the file and then writes to it.

tusd creates the info file for its hook system to ingest. We can completely omit these files.
A stone to kill two birds would be to use database/sql with a sqlite driver (or optionally any driver) for storing this data and manage file locks for any backend that doesn't support it. I also found that S3 does in fact support file locks.

You'll need to use --vfs-cache-mode writes on your mount I think.

I think for getting tusd working quickly, getting rclone mount + tusd working would be a good first step.