I’ve been spending a lot of time lately optimizing my scripts, given some of the new features in the latest versions of rclone. This has sparked some thought on how to combine tasks so they require less interaction with the actual remotes where possible.
I was thinking it would be really nice to be able to chain some commands together to reduce thrashing of the remotes. My scripts tend to do a lot of different things in batch, and each step needs to interrogate the remote. Wouldn’t it be neat if I could make rclone aware of what I wanted to do next, so it could preserve some of that data?
This would obviously be a semi-significant change but it would be nice to be able to tell rclone to do something like this:
rclone sync local1: local2: AND sync local2: remote1: AND dedupe remote1: AND sync remote1: remote2: AND dedupe remote2:
What would make this powerful is that the data collected in one step could be reused for the next. In the above case, when we scanned for files on local2 for that initial sync, we likely now have a full listing we can reuse to sync local2 with remote1, and so on. If rclone knows up front what it needs to collect, it could reduce the load on the cloud providers or local file systems.
There is some precedent for this in the Linux ip command, for instance, which accepts -batch filename to run a whole batch of commands from a file. -batch can also read from stdin.
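For reference, an ip batch file is just one subcommand per line with the leading "ip" dropped; a minimal sketch (the interface names and addresses are made up):

```shell
# Write a batch file: one ip subcommand per line, without the leading "ip".
cat > commands.ip <<'EOF'
link set dev eth0 up
addr add 192.168.1.10/24 dev eth0
route add default via 192.168.1.1
EOF

# Run the whole batch in a single invocation of ip:
ip -batch commands.ip

# -batch can also read from stdin:
echo "addr show" | ip -batch -
```

The point is that one process handles the whole sequence, rather than paying startup and lookup costs per command.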
In rclone terms, you would do this with another command.
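A hypothetical invocation might look like the following — note that rclone has no batch command today; the name and syntax here are invented purely to illustrate the proposal:

```shell
# HYPOTHETICAL: "rclone batch" does not exist; this sketches the proposed UI.
# One rclone subcommand per line, read from a file:
rclone batch jobs.txt

# ...or from stdin, mirroring "ip -batch -":
rclone batch - <<'EOF'
sync local1: local2:
sync local2: remote1:
dedupe remote1:
sync remote1: remote2:
dedupe remote2:
EOF
```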
Alas that would be much trickier. It would be reasonably easy to re-use the remotes, but rclone isn’t set up for re-using the intermediate results of commands.
I think making a local cache of the remote directory listing would probably work much better for this scenario.
Yes, a local cache could be used. I get a little worried about persistent cache databases, though. I suppose the cache could live only for the duration of a batch, but it would potentially need to store multiple remotes’ data, since within a batch you may connect out to multiple remotes.
Although I say I don’t like persistent caches, I’ve moved most of my batch processing to an rclone mount for this reason. I can complete syncs/copies/rmdirs/checks in a fraction of the time it was taking before. I’m talking from 4 hours down to several minutes, and that load is now offloaded from Google.
When the caches are introduced, it would be good if they were remote-wide, shared across mounts and ad-hoc copies. That would speed up syncs considerably for people.
So to speed up these commands, you are doing an rclone mount, then adding that mount as a remote of the local type, and doing whatever operations you need against it?
Just trying to figure it out, as I am trying to trim down some of these 3 hr syncs where only 30 files or so are new out of 9000.
Now I’ve converted them to use the mount that represents that remote, because the mount has a cache, and I’ve increased my cache time to 7 days. By doing ALL my work against the mounts instead of directly against the remotes, I have no need to refresh the cache. The cache is updated as the work is done and will always be correct.
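A sketch of that workflow, assuming a remote named gdrive: and made-up local paths (--dir-cache-time is a real rclone mount flag; everything else here is illustrative):

```shell
# Mount the remote with a long directory-cache lifetime (7 days = 168h).
rclone mount gdrive: /mnt/gdrive --dir-cache-time 168h &

# Run all batch work against the mount (a plain local path), so
# directory listings come from the mount's cache rather than the
# cloud provider's API on every run:
rclone sync /local/media /mnt/gdrive/media --size-only
rclone check /local/media /mnt/gdrive/media --size-only
```

Because every operation goes through the same mount, the cache stays consistent with the work being done, so it never needs an explicit refresh.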
My batch time is so fast now, since it can do size-only comparisons almost instantaneously. I actually do all the above TWICE, once for ACD and once for GD, but the code is almost the same.
It is the slowest part of my batch, and it’s all being done server-side. I’m actually going to convert that single command to process the copy directly rather than doing it from gdrive to gdrive server-side. It would finish in minutes as opposed to the hour it takes right now, again because of the cache. It’s actually less expensive for me to have my home server sync with that remote directly than to have Google do a server-side sync between them.
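Concretely, the conversion looks something like this; the remote name and paths are made up for illustration:

```shell
# Before: copy done server-side within Google Drive (Google moves
# the data, but listing both trees via the API is slow):
rclone copy gdrive:media/source gdrive:media/dest

# After: run the same copy through the cached mount; the directory
# comparison hits the local cache almost instantly, so only the
# actual file transfers take time:
rclone copy /mnt/gdrive/media/source /mnt/gdrive/media/dest --size-only
```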