Suggestion: new command "rclone rcat" (the opposite of "rclone cat")

durval · December 19, 2016, 12:35pm

Hello everyone,

I would like to suggest a command that does the opposite of “rclone cat” (I tentatively call it “rclone rcat”, for obvious reasons): it would take everything fed into its stdin and write it to a single file in the cloud, for example:

echo "Mary had a little lamb" | rclone rcat REM:DIR/SUBDIR/testfile.txt

would create (or overwrite) a file called “testfile.txt”, located in DIR/SUBDIR on the REM remote, with the contents “Mary had a little lamb”.
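At its core, such a command only needs a loop that reads stdin in fixed-size chunks and passes each one on until EOF, so the total size is only known at the very end. A minimal Python sketch, where `sink` is a hypothetical stand-in for the remote upload stream rather than any real rclone API, and the chunk size is an arbitrary choice:

```python
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read buffer size; arbitrary for this sketch


def stream_to_sink(src: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to sink chunk by chunk and return the total number of bytes.

    The total is only known once src reaches EOF, which is why a remote that
    needs the size before the upload starts cannot take this stream directly.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF: the producer (echo, tar, ...) has finished
            break
        sink.write(chunk)  # hypothetical remote upload stream
        total += len(chunk)
    return total
```

With `sys.stdin.buffer` as `src`, this is roughly what the `echo ... | rclone rcat ...` example above would boil down to.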

The usefulness of this becomes more apparent when one considers that many backup commands (tar, cpio, etc.) produce a single output file, which is then written to a local file or magtape, or piped through ssh to be stored on a remote host. With the “rcat” command, rclone could store it in the cloud just as simply, for example:

tar czspf - DIR_BEING_BACKED-UP | \
    rclone rcat REM:BACKUPDIR/host_DIR_YYYYMMDD.tar.gz

In fact, I’m using rclone to back up large directory trees, and since many (most?) cloud services impose rather stringent throttles on the number of objects that can be created per second, the process becomes quite slow when these trees contain lots of small files. With the “rcat” command, I could instead use tar to write everything out as a single “file” and stream it to the cloud remote at the same time, making the whole process much, much faster.

If everyone (and especially @ncw) agrees this is a good idea, please let me know and I will create an issue on GitHub for it.

Thanks in advance,

Durval.

ncw · December 21, 2016, 12:01pm

I think it is a nice idea - please create an issue in github

Not 100% sure about the name, but can’t think of anything better right now!

Note that this won’t work for all remotes - some remotes need to know the size of the file in advance (e.g. B2).

Also, if it goes wrong, rclone won’t be able to retry.

It is the same problem as trying to write files via an rclone mount, so making an abstraction which rcat and rclone mount could both use would be a good idea.
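One possible shape for such a shared abstraction is to spool the incoming stream into a temporary file first, which yields the exact size in advance and makes retries possible. A minimal Python sketch; the `upload` callable is a hypothetical stand-in for a backend upload, not rclone's actual code, and the retry count is arbitrary:

```python
import shutil
import tempfile
from typing import BinaryIO, Callable

# Hypothetical uploader: receives a readable file object and its exact size.
# It stands in for a backend that needs the size up front (such as B2) and
# may raise OSError (e.g. ConnectionError) on a transient failure.
Uploader = Callable[[BinaryIO, int], None]


def spool_and_upload(src: BinaryIO, upload: Uploader, retries: int = 3) -> int:
    """Buffer a non-seekable stream in a temporary file, then upload it.

    Spooling provides what a bare pipe cannot: the exact size before the
    upload starts, and the ability to rewind and try again after a failure.
    Returns the number of bytes uploaded.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    with tempfile.TemporaryFile() as spool:
        shutil.copyfileobj(src, spool)  # drain the pipe into local storage
        size = spool.tell()  # position after the copy equals the total size
        last_error = None
        for _ in range(retries):
            spool.seek(0)  # rewind so every attempt sends the whole file
            try:
                upload(spool, size)
                return size
            except OSError as err:
                last_error = err
        raise last_error
```

The trade-off is local disk space equal to the whole stream, which is why it makes sense as an option rather than the default.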

durval · January 7, 2017, 4:12pm

Hi @ncw,

[quote="ncw, post:2, topic:494"]I think it is a nice idea - please create an issue in github
[/quote]

Just did, sorry for the delay in getting back to you: https://github.com/ncw/rclone/issues/1001

[quote="ncw, post:2, topic:494"]Not 100% sure about the name, but can’t think of anything better right now!
[/quote]

Please feel free to use whatever name you like. My first idea was “tac” (the inverse of cat), but then I remembered there is a traditional Unix command by that name which does something quite different (it outputs a file with its lines in reverse order), so I chose “rcat” instead to avoid confusion.

[quote="ncw, post:2, topic:494"]Note that this won’t work for all remotes - some remotes need to know the size of the file in advance (e.g. B2).
[/quote]

One idea: for those remotes, something like the Unix “split” command could be implemented. To elaborate, a “--split=N” option would be required when using rcat with one of those remotes as the destination, and rcat would then split its input into files of N bytes each. For example:

tar czf - ./somelocaldir | rclone rcat --split=2G remote:someremotedir/somelocaldir.tar.gz

And it would generate the following files:
remote:someremotedir/somelocaldir.tar.gz.aa
remote:someremotedir/somelocaldir.tar.gz.ab
remote:someremotedir/somelocaldir.tar.gz.ac
(…)

So that the last of these files also has the N bytes the remote requires (assuming the total number of bytes fed to rclone isn’t an exact multiple of N), rclone could simply pad it with the needed number of zero bytes.
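The proposed splitting could be sketched like this in Python, assuming two-letter suffixes as in Unix split's default (`aa`, `ab`, ...) and zero padding of the final piece. `split_padded` and its helpers are hypothetical names; this version holds one segment in memory at a time, and each yielded (name, data) pair would then be uploaded as a separate object:

```python
import itertools
import string
from typing import BinaryIO, Iterator, Tuple


def split_suffixes(length: int = 2) -> Iterator[str]:
    """Yield "aa", "ab", ..., "zz" for length 2, like Unix split's default naming."""
    for letters in itertools.product(string.ascii_lowercase, repeat=length):
        yield "".join(letters)


def read_up_to(src: BinaryIO, n: int) -> bytes:
    """Read until n bytes are collected or EOF; pipes may return short reads."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = src.read(remaining)
        if not chunk:
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


def split_padded(src: BinaryIO, base_name: str, segment_size: int) -> Iterator[Tuple[str, bytes]]:
    """Cut src into segment_size-byte pieces named base_name.aa, base_name.ab, ...

    The final piece is zero-padded so every segment has exactly segment_size
    bytes, i.e. a size that is known before each segment is uploaded.
    """
    for suffix in split_suffixes():
        chunk = read_up_to(src, segment_size)
        if not chunk:
            return  # input was an exact multiple of segment_size (or empty)
        yield f"{base_name}.{suffix}", chunk.ljust(segment_size, b"\x00")
        if len(chunk) < segment_size:
            return  # a short segment means EOF was reached
    raise ValueError("ran out of two-letter suffixes")  # simplification of this sketch
```

For example, five input bytes with a segment size of 2 produce `.aa`, `.ab` and `.ac`, the last one being one data byte plus one zero byte.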

By the way, if you need another name for that command instead of “rcat”, perhaps “split” would be a good one: it would do the same thing as the Unix “split” command, except that the resulting files would be created on the remote. If you use that name, I would also suggest using “-b, --bytes” instead of the “--split=” option I proposed above, to make it even more similar to the Unix “split”.

[quote="ncw, post:2, topic:494"]Also, if it goes wrong, rclone won’t be able to retry.
[/quote]

Why?

Cheers,

Durval.

ncw · January 8, 2017, 5:59pm

If rclone is piping the data, then it can't re-open the file to have another go, unless it makes a temporary copy of the file, which it could, but then it isn't really piping the data.
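The difference can be seen directly: a pipe cannot seek back to its start, while a temporary copy can. A small Python check, assuming a POSIX system (where `lseek()` on a pipe fails); `rewindability` is a hypothetical helper name:

```python
import os
import tempfile
from typing import Tuple


def rewindability() -> Tuple[bool, bool]:
    """Report whether a pipe and a temporary file can seek back to the start.

    A pipe is what rclone would see on stdin in `tar czf - dir | rclone rcat ...`;
    once bytes have been read from it they are gone, so a failed upload cannot
    simply be restarted from the beginning.
    """
    read_fd, write_fd = os.pipe()
    with os.fdopen(read_fd, "rb") as pipe_end, os.fdopen(write_fd, "wb"):
        pipe_seekable = pipe_end.seekable()  # False on POSIX: pipes are not seekable
    with tempfile.TemporaryFile() as spool:
        spool_seekable = spool.seekable()  # a regular file can always be rewound
    return pipe_seekable, spool_seekable
```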

I like your split idea, but it doesn't solve the fundamental problem of needing to know the file size in advance.

durval · January 21, 2017, 2:23pm

With the "split" approach, rclone could buffer just the current "segment" in a temporary file, so there would be no need to know the total size in advance, right? Every segment would have the specified number of bytes, and the last one would simply be filled with zeroes from the end of the data onwards. The program that the output of a later "rclone cat" is piped into would then have to know where its data actually ends; in my experience "tar", "cpio" and "gunzip" all do.
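For gzip at least, this can be checked with a small Python round trip: compress some data, cut it into fixed-size segments with the last one zero-padded, concatenate them as a later "rclone cat" would, and decompress. Python's gzip reader skips zero bytes that follow the compressed data. The segment size and payload are arbitrary, and this only demonstrates gzip, not tar or cpio:

```python
import gzip
import io


def pad_to_multiple(data: bytes, segment_size: int) -> bytes:
    """Zero-pad data up to the next multiple of segment_size, as the split idea would."""
    remainder = len(data) % segment_size
    return data if remainder == 0 else data + b"\x00" * (segment_size - remainder)


def roundtrip(payload: bytes, segment_size: int) -> bytes:
    """Compress, split with zero padding, rejoin, and decompress payload."""
    compressed = gzip.compress(payload)
    padded = pad_to_multiple(compressed, segment_size)
    segments = [padded[i:i + segment_size] for i in range(0, len(padded), segment_size)]
    joined = b"".join(segments)  # what reading all segments back in order yields
    # GzipFile stops at the end of the gzip member and ignores trailing zero bytes.
    with gzip.GzipFile(fileobj=io.BytesIO(joined), mode="rb") as reader:
        return reader.read()
```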

Cheers,
Durval.

ncw · January 21, 2017, 4:07pm

Yes, buffering to a file for split is a nice workaround.

I think rcat should certainly have an option to upload via temporary files, whether using split or not.

eborisch · April 4, 2017, 6:07am

In the interim (until rcat exists), I've created this workaround for piping data into an rclone remote:

It also works for reading the piped data back to stdout (--replay). It saves additional metadata (checksums) for the piped blocks of data and checks them on replay (not that anything can be done at that point other than say "hey, this is wrong").
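The workaround itself is not shown here, so purely as an illustration of the per-block checksum idea: a hypothetical Python sketch that cuts the stream into fixed-size blocks, records a SHA-256 digest for each, and verifies every block when replaying it to an output stream. The block size, the function names and the choice of SHA-256 are assumptions, not details of that tool:

```python
import hashlib
from typing import BinaryIO, Iterator, List, Sequence, Tuple

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; arbitrary for this sketch


def store(src: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[bytes, str]]:
    """Yield (block, SHA-256 hex digest) pairs; each block would become one remote object."""
    while True:
        block = src.read(block_size)
        if not block:
            return
        yield block, hashlib.sha256(block).hexdigest()


def replay(blocks: Sequence[bytes], digests: Sequence[str], out: BinaryIO) -> List[int]:
    """Write blocks to out in order and return the indices that fail their checksum.

    A mismatch can only be reported here, not repaired.
    """
    if len(blocks) != len(digests):
        raise ValueError("block count does not match checksum count")
    bad = []
    for index, (block, expected) in enumerate(zip(blocks, digests)):
        if hashlib.sha256(block).hexdigest() != expected:
            bad.append(index)
        out.write(block)
    return bad
```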