I’ve been spending a lot of time lately optimizing my scripts, given some of the new features in the latest versions of rclone. This has sparked some thought on how to combine tasks so they require less interaction with the actual remotes where possible.
I was thinking it would be really nice to be able to chain some commands together to reduce thrashing of the remotes. My scripts tend to do a lot of different things in batch, and each step needs to interrogate the remote. Wouldn’t it be neat if I could make rclone aware of what I wanted to do next, so it could preserve some of that data?
This would obviously be a semi-significant change but it would be nice to be able to tell rclone to do something like this:
rclone sync local1: local2: AND sync local2: remote1: AND dedupe remote1: AND sync remote1: remote2: AND dedupe remote2:
What would make this powerful is that the data collected in one step could be reused for the next. In the above case, when we scanned for files on local2 for that initial sync, we likely now have a full listing we can reuse to sync local2 with remote1, and so on. If rclone knows up front what it needs to collect, it could reduce the load on the cloud providers or local file systems.
There is some precedent for this in the Linux ip command, for instance, which accepts -batch filename to run a whole batch of commands from a file. -batch can also read from stdin.
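For reference, an ip batch file is just one subcommand per line with the leading "ip" dropped; a minimal sketch (the interface names and addresses are made up):

```shell
# Write a batch file: one ip subcommand per line, without the leading "ip".
cat > commands.ip <<'EOF'
link set dev eth0 up
addr add 192.168.1.10/24 dev eth0
route add default via 192.168.1.1
EOF

# Run the whole batch in a single invocation of ip:
ip -batch commands.ip

# -batch can also read from stdin:
echo "addr show" | ip -batch -
```

The point is that one process handles the whole sequence, rather than paying startup and lookup costs per command.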
In rclone terms, you would do this with another command.
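A hypothetical invocation might look like the following — note that rclone has no batch command today; the name and syntax here are invented purely to illustrate the proposal:

```shell
# HYPOTHETICAL: "rclone batch" does not exist; this sketches the proposed UI.
# One rclone subcommand per line, read from a file:
rclone batch jobs.txt

# ...or from stdin, mirroring "ip -batch -":
rclone batch - <<'EOF'
sync local1: local2:
sync local2: remote1:
dedupe remote1:
sync remote1: remote2:
dedupe remote2:
EOF
```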
Alas that would be much trickier. It would be reasonably easy to re-use the remotes, but rclone isn’t set up for re-using the intermediate results of commands.
I think making a local cache of the remote directory listing would probably work much better for this scenario.
Yes, a local cache could be used. I get a little worried about persistent cache databases, though. I suppose the cache could live only for the duration of a batch, but it would potentially need to store multiple remotes’ data, since within a batch you may connect out to multiple remotes.
Although I say I don’t like persistent caches, I’ve moved most of my batch processing to an rclone mount for this reason. I can complete syncs/copies/rmdirs/checks in a fraction of the time it was taking before. I’m talking from 4 hours down to several minutes, and that load is now offloaded from Google.
When the caches are introduced, it would be good if they were remote-wide, shared across mounts and ad-hoc copies. That would speed up syncs considerably for people.
So to speed up these commands, you are doing an rclone mount, then adding that mount as a remote of the local type, and doing whatever operations you need against it?
Just trying to figure it out, as I am trying to trim down some of these 3 hr syncs where only 30 files or so are new out of 9000.
Now I’ve converted them to use the mount that represents that remote, because the mount has a cache, and I’ve increased my cache time to 7 days. By doing ALL my work against the mounts instead of directly against the remotes, I have no need to refresh the cache. The cache is updated as the work is done and will always be correct.
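A sketch of that workflow, assuming a remote named gdrive: and made-up local paths (--dir-cache-time is a real rclone mount flag; everything else here is illustrative):

```shell
# Mount the remote with a long directory-cache lifetime (7 days = 168h).
rclone mount gdrive: /mnt/gdrive --dir-cache-time 168h &

# Run all batch work against the mount (a plain local path), so
# directory listings come from the mount's cache rather than the
# cloud provider's API on every run:
rclone sync /local/media /mnt/gdrive/media --size-only
rclone check /local/media /mnt/gdrive/media --size-only
```

Because every operation goes through the same mount, the cache stays consistent with the work being done, so it never needs an explicit refresh.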
My batch time is so fast now, since it can do size-only comparisons almost instantaneously. I actually do all the above TWICE, once for ACD and once for GD, but the code is almost the same.
It is the slowest part of my batch, and it’s all being done server-side. I’m actually going to convert that single command to process the copy directly rather than doing it from gdrive to gdrive server-side. It would finish in minutes as opposed to the hour it takes right now, again because of the cache. It’s actually less expensive for me to have my home server sync with that remote directly than to have Google do a server-side sync between them.
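Concretely, the conversion looks something like this; the remote name and paths are made up for illustration:

```shell
# Before: copy done server-side within Google Drive (Google moves
# the data, but listing both trees via the API is slow):
rclone copy gdrive:media/source gdrive:media/dest

# After: run the same copy through the cached mount; the directory
# comparison hits the local cache almost instantly, so only the
# actual file transfers take time:
rclone copy /mnt/gdrive/media/source /mnt/gdrive/media/dest --size-only
```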