How to execute sync/copy or sync/sync with --dry-run option via POST request to the API

bluehelge · October 26, 2022, 2:22pm

What is the problem you are having with rclone?

I use rclone installed on a Raspberry Pi via the HTTP API to issue rclone operations, e.g. the following:
http://192.168.0.101:8888/sync/copy?srcFs=local:/home/pi/data&dstFs=aws-remote:s3.rclonebucket.testing
My problem is that I would like to run those commands in --dry-run mode, but I have no idea how to form the HTTP request to do that.

I am especially interested in getting detailed information about which files still need to be copied/synced. I plan to issue this request after a copy/sync, to check whether any files are still pending in case the copy/sync command was interrupted for some reason.

Run the command 'rclone version' and share the full output of the command.

rclone v1.58.0
- os/version: raspbian 10.12
- os/kernel: 5.10.103-v7l+ (armv7l)
- os/type: linux
- os/arch: arm
- go/version: go1.17.8
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Amazon AWS S3

The command you were trying to run (eg `rclone copy /tmp remote:tmp`)

rclone copy local:/home/pi/data aws-remote:s3.rclonebucket.testing --dry-run

This works fine on the command line, BUT I need the equivalent as a request to the HTTP API, which I would naively expect to look something like:
http://192.168.0.101:8888/sync/copy?srcFs=local:/home/pi/data&dstFs=aws-remote:s3.rclonebucket.testing&dryrun=true

The rclone config contents with secrets removed.

[aws-remote]
type = s3
provider = AWS
env_auth = false
region = eu-central-1
endpoint = 
location_constraint = 
acl = private
access_key_id = *****************************
secret_access_key = *******************************************
server_side_encryption = 
storage_class = 

[local]
type = local
copy_links = true

A log from the command with the `-vv` flag

There is no log available, because I do not know how to form a proper HTTP API POST request that carries the important dry-run flag.

Sorry, no log here.

ADDITIONAL QUESTION:
Is there a way to execute the rclone check command (documented here) via an HTTP API request?

Animosity022 · October 26, 2022, 2:46pm

I'm not sure what you mean by HTTP-API. Are you using the rclone rc stack to execute commands?

Dry run exists for sync/bisync:

Remote Control / API (rclone.org)

But not for copy:

Remote Control / API (rclone.org)

bluehelge · October 26, 2022, 3:06pm

Yes, by HTTP-API I mean the rclone remote control feature. But the documentation is not clear to me in this respect. How do I issue an HTTP POST request to the remote control (RC) so that it executes a sync/sync or sync/copy in --dry-run mode?

And if sync/bisync is the only viable option at this time, how do I form my HTTP request for that command?
I use Postman as an HTTP client to test calls against the rclone RC, so where exactly do I put the --dry-run option? Does it go in the HTTP path, in the query string (as a parameter), or do I need to add a special body containing this flag?

Ole · October 26, 2022, 3:13pm

Hi bluehelge,

Here is a simple POST example using --dry-run for sync:
https://forum.rclone.org/t/filter-in-restapi-mode/33187/7

I am pretty sure you can do the same with sync/copy.

More info on setting flags here:
https://rclone.org/rc/#setting-config-flags-with-config
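A minimal Python sketch of such a POST for sync/copy. It assumes the rcd from this thread (address 192.168.0.101:8888, user bot, password secret) and assumes that the `_config` override accepts `DryRun`, which I understand to be the main-config option behind `--dry-run`. The helper names `build_dry_run_copy_body` and `rc_post` are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: the rcd from this thread (--rc-addr=0.0.0.0:8888, user "bot",
# password "secret"); adjust to your own setup.
RC_URL = "http://192.168.0.101:8888"


def build_dry_run_copy_body(src_fs: str, dst_fs: str) -> dict:
    """JSON body for sync/copy with --dry-run switched on for this call only."""
    return {
        "srcFs": src_fs,
        "dstFs": dst_fs,
        # _config overrides main config options for this one call;
        # "DryRun" is assumed to be the option behind --dry-run.
        "_config": {"DryRun": True},
    }


def rc_post(path: str, body: dict, user: str = "bot", password: str = "secret") -> dict:
    """POST a JSON body to an rc endpoint (HTTP basic auth) and decode the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        RC_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `rc_post("/sync/copy", build_dry_run_copy_body("local:/home/pi/data", "aws-remote:s3.rclonebucket.testing"))` should run the copy without transferring anything. Expect the reply itself to be small (possibly just `{}`); the details of what would be copied go to the rcd log.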

bluehelge · October 26, 2022, 3:54pm

Thanks @Ole, that seems to work if I submit my JSON in the body as raw JSON. But to my surprise, no content/JSON is delivered back to me. I get a 200 response, but its body contains nothing other than {}, while the rcd prints lots of lines about what it actually checks. I am puzzled: what is the use of that option if I do not get a result back?

BTW, I fire up the remote control daemon with:

rclone rcd /home/pi/data --rc-user=bot --rc-pass=secret --rc-addr=0.0.0.0:8888 --rc-allow-origin=0.0.0.0 --rc-web-gui-no-open-browser --log-level=DEBUG

Ole · October 26, 2022, 8:09pm

Great!

I guess it very much depends on your needs and situation. Some start several concurrent jobs with the _async option, which returns a "job started" response rather than the actual result. That is probably the main reason you do not see a list of files from sync/copy.
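If you do start jobs with `_async`, the reply only contains a job id and the outcome has to be fetched separately. A sketch of that polling loop, assuming the documented `job/status` call and its `finished`, `success`, `error` and `output` fields; `rc_call` is a hypothetical stand-in for whatever function POSTs JSON to the rcd and returns the decoded reply.

```python
import time
from typing import Callable

# Hypothetical type: rc_call(path, body) POSTs `body` as JSON to the rcd and
# returns the decoded JSON reply.
RcCall = Callable[[str, dict], dict]


def run_async_and_wait(rc_call: RcCall, path: str, body: dict,
                       poll_seconds: float = 2.0, timeout: float = 3600.0) -> dict:
    """Start an rc call with _async=true and poll job/status until it has finished."""
    started = rc_call(path, {**body, "_async": True})
    jobid = started["jobid"]  # an async call answers {"jobid": N}, not the result
    deadline = time.monotonic() + timeout
    while True:
        job = rc_call("/job/status", {"jobid": jobid})
        if job.get("finished"):
            return job  # carries "success", "error" and "output" of the job
        if time.monotonic() > deadline:
            raise TimeoutError(f"rc job {jobid} still running after {timeout} s")
        time.sleep(poll_seconds)
```

Passing `rc_call` in keeps the loop testable without a running rcd; in practice it can be a small `urllib` or `requests` wrapper.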

So perhaps you would like to use something like this instead:

POST http://localhost:5572/core/command HTTP/1.1
content-type: application/json

{
    "command": "sync",
    "arg": [
        "/home/pi/data", 
        "aws-remote:s3.rclonebucket.testing", 
        "--dry-run", 
        "--log-level=NOTICE",
        "--use-json-log=false" 
        ],
    "returnType": "STREAM"
}

which will give you a response body containing the normal plain text output from rclone.

You can try playing with the parameters to suit your needs.

The example is based on this:
https://rclone.org/rc/#core-command

I started with "rclone version", then added the ReturnType, then moved on to "lsl" "./testfolder", then added "--dry-run", then "--log-level=NOTICE" ... et voila!
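The same request can be sent from Python using only the standard library. This is a sketch: the helper names are hypothetical, and the base URL and credentials must match your rcd (the example above uses the default localhost:5572 without auth, while an rcd started with `--rc-user`/`--rc-pass` expects HTTP basic auth).

```python
import base64
import json
import urllib.request


def core_command_body(command: str, args: list[str], return_type: str = "STREAM") -> dict:
    """JSON body for core/command, i.e. `rclone <command> <args...>` run by the rcd."""
    return {"command": command, "arg": list(args), "returnType": return_type}


def run_core_command(base_url: str, body: dict, user: str, password: str) -> str:
    """POST to core/command and return the streamed response body as text."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/core/command",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `run_core_command("http://192.168.0.101:8888", core_command_body("sync", ["/home/pi/data", "aws-remote:s3.rclonebucket.testing", "--dry-run", "--log-level=NOTICE", "--use-json-log=false"]), "bot", "secret")` should return the plain-text dry-run output as one string.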

bluehelge · October 27, 2022, 5:08pm

Whoa, that's awesome... I'll try this. Thanks @Ole, this is really helpful input. I guess I am just beginning to understand how capable rclone actually is.

Ole · October 27, 2022, 6:37pm

Thanks, happy coding!

bluehelge · October 29, 2022, 9:36am

@Ole I marked your suggestion as the solution. Nevertheless, I think the documentation of the core/command approach has a lot of room for improvement; right now it seems rather undocumented. Your example of how to form a proper POST to the rcd should be included there as a valid example of how to interact via core/command.

It still looks like a workaround to me, because if I use, e.g., the following as the raw body content of my POST request to the rcd:

{
    "command": "check",
    "arg": [
        "local:/home/pi/data/",
        "aws-remote:s3.rclonebucket.testing/", 
        "--use-json-log=true" 
        ],
    "returnType": "STREAM"
}

I do not get valid JSON back. Instead it is a mixture: each log line wrapped in curly braces, intermingled with plain text, and then at the end another JSON snippet, like so:

{$content-of-logline 1}
{$content-of-logline 2}
{$content-of-logline 3}
...
{$content-of-logline n}
Some weird not-wrapped-in-curlybraces text output usually appearing on the commandline
{$some-wellformed-JSON-summingup-the-whole-request-status}

I would rather expect something like:

{ "log_lines": [
{$content-of-logline 1},
{$content-of-logline 2},
{$content-of-logline 3},
...
{$content-of-logline n}
],
"cmdline_output" : "Some weird not-wrapped-in-curlybraces text output usually appearing on the commandline",
"request_status": "$some-wellformed-JSON-summingup-the-whole-request-status"
}

Note the additional [] array, the added commas, and the outer {} wrapping, which together make it valid JSON.
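Until rclone emits such a structure itself, a client can build it from the stream. A sketch, assuming that each JSON object starts at the beginning of a line, that anything which does not parse as JSON is plain command output, and that the last JSON object in the stream is the status snippet. `json.JSONDecoder.raw_decode` is used so that a pretty-printed (multi-line) status object is still recognized; the function name is hypothetical.

```python
import json

_DECODER = json.JSONDecoder()


def wrap_stream_output(text: str) -> dict:
    """Turn a STREAM response into {"log_lines", "cmdline_output", "request_status"}."""
    objects: list = []
    plain: list[str] = []
    pos, end = 0, len(text)
    while pos < end:
        if text[pos].isspace():
            pos += 1  # skip newlines and indentation between items
            continue
        if text[pos] == "{":
            try:
                obj, pos_after = _DECODER.raw_decode(text, pos)
            except json.JSONDecodeError:
                pass  # starts with "{" but is not JSON: treat as plain text
            else:
                objects.append(obj)
                pos = pos_after
                continue
        # Not JSON: take the rest of this line as plain command output.
        eol = text.find("\n", pos)
        eol = end if eol == -1 else eol
        plain.append(text[pos:eol])
        pos = eol
    # Assumption: the last JSON object in the stream is the rc status snippet.
    status = objects.pop() if objects else None
    return {"log_lines": objects, "cmdline_output": "\n".join(plain), "request_status": status}
```

Because the result is an ordinary dict, `json.dumps(wrap_stream_output(text))` gives the single valid JSON document proposed above.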

What do you think would be the best way to:

1. Add your example for core/command
2. Improve the core/command documentation with an overview of ALL available options
3. Form a suggestion to the developers to make the emitted JSON valid JSON
4. Form a suggestion to add a dedicated sync/check implementation for the rcd which responds with valid JSON

I think it is worth it to improve these things.

Ole · October 29, 2022, 7:51pm

Thanks!

bluehelge:

I do not get valid JSON back. It is rather a mixture of each log-line wrapped in curlybraces and intermingled text and then at the end another json-snippet, like so:
{$content-of-logline 1}
{$content-of-logline 2}
{$content-of-logline 3}
...
{$content-of-logline n}
Some weird not-wrapped-in-curlybraces text output usually appearing on the commandline
{$some-wellformed-JSON-summingup-the-whole-request-status}

The intermingled text appears because you are using "STREAM", which shows output sent to both stdout and stderr. Use "STREAM_ONLY_STDERR" to see only the log output, or "STREAM_ONLY_STDOUT" to see only the normal terminal output, e.g. the file listing from lsl.

I agree a few good examples would make it a lot more approachable.

The best way is to make a pull request following the guideline in rclone/CONTRIBUTING.md at master · rclone/rclone · GitHub

I suggest you start by making a GitHub issue (type: feature request) referring to this forum thread; then you will have a place to discuss any questions along the way. Feel free to ping me on GitHub, where my callsign is @olefrost.

I'm not quite sure what you mean here; all options (that I know of) are described. I used 3 of the 4 available options in the example. The 4th option (opt) just lets you write flag pairs like this:

POST http://localhost:5572/core/command HTTP/1.1
content-type: application/json

{
    "command": "lsl",
    "arg": [
        "myRemote:"
        ],
    "opt": {
        "log-level": "DEBUG"
    },
    "returnType": "STREAM"
}

I just prefer to place the flags in the arguments (arg) using the assignment form "--log-level=DEBUG". That may be a slight misuse, but I find it easier and better looking.
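The two styles should end up as the same command line. A minimal sketch of that equivalence, assuming (from my reading of the rclone source, not from the rc docs) that core/command expands each `opt` pair into two arguments, `--name value`, and an empty value into a bare `--name`. Both function names are hypothetical.

```python
def opt_to_args(opt: dict[str, str]) -> list[str]:
    """Expand an "opt" map the way core/command is assumed to do it."""
    args: list[str] = []
    for name, value in opt.items():
        args.append(f"--{name}")
        if value != "":
            args.append(value)  # e.g. "--log-level", "DEBUG" as two separate arguments
    return args


def assignment_form(opt: dict[str, str]) -> list[str]:
    """The same flags written for "arg" in assignment form, e.g. "--log-level=DEBUG"."""
    return [f"--{name}={value}" if value != "" else f"--{name}" for name, value in opt.items()]
```

If that assumption holds, a boolean flag such as dry-run belongs in `opt` with an empty string value, because `--dry-run true` would pass `true` as an extra positional argument.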

Perhaps you are looking for the list of commands:
https://rclone.org/commands/
or the list of available flags:
https://rclone.org/flags/

You can propose a new feature by making a new forum post of type Feature to get some feedback (after checking that it doesn't already exist in the GitHub list of open issues: Issues · rclone/rclone · GitHub).

I don't think anybody will dare to change the current line-by-line JSON format, because it would break a lot of other people's stuff. So you had better propose a new flag name, e.g. --use-json-log-strict.

Same procedure as above.

I think you will need a very strong argument to make the response deviate from sync/sync, sync/copy and sync/move. A better approach might be to add sync/check like the others, and then extend all sync/* calls with an optional parameter to have the affected files returned in the response.

Finally, it is a good idea to indicate if it is something you are prepared to develop yourself (with/without help), or something you would like to sponsor.

Ole · October 30, 2022, 8:43am

Tip:

If you use core/command with sync --dry-run, you can quite easily scan the output for file names with the regex support present in most programming languages (provided you know regex and the print statements used in the log).

I used regex101.com to help me make this:
https://forum.rclone.org/t/how-to-extract-file-names-after-move-command-is-successfully-done/33725/3

I guess the regex approach is considerably faster than first parsing the entire JSON into variables and then scanning for the desired fields and content. Would you like help making one for --dry-run, using plain or JSON output? What programming language are you using? What IDE?
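A sketch of the regex approach for plain-text (`--use-json-log=false`) dry-run output. The line format in the pattern is an assumption, roughly `NOTICE: <path>: Skipped copy as --dry-run is set (size ...)`, so verify it against the output of your own rclone version before relying on it. `DRY_RUN_LINE` and `dry_run_actions` are hypothetical names.

```python
import re

# Assumed line format (verify against your rclone version), e.g.
# "2022/10/30 08:43:00 NOTICE: dir/movie.mp4: Skipped copy as --dry-run is set (size 10Mi)"
# "." does not match newlines, so every match stays within a single log line.
DRY_RUN_LINE = re.compile(r"NOTICE: (?P<path>.+): Skipped (?P<action>.+?) as --dry-run is set")


def dry_run_actions(output: str) -> list[tuple[str, str]]:
    """Return (action, path) for every line reporting an action skipped by --dry-run."""
    return [(m["action"], m["path"]) for m in DRY_RUN_LINE.finditer(output)]
```

The greedy `path` group backtracks to the last ": Skipped" on the line, so paths that themselves contain ": " are still captured whole.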

bluehelge · October 30, 2022, 11:56am

Yeah, actually I created a full FastAPI client in Python which provides me with a well-formed and documented RESTful API to handle rclone operations. I need to trigger rclone operations in a smart way and let them do their thing, mostly without further supervision or interaction by a user. My client kicks off e.g. a sync/copy, keeps track of any issues arising during the transfer, and initiates RETRIES whenever necessary. After a sync/copy, a check should be performed automatically, and if that still reveals missing files, it should initiate another sync/copy.
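The copy, check, re-copy cycle described here can be reduced to a small loop. A sketch with hypothetical callables: `run_copy` stands for starting sync/copy and waiting for it to finish, `find_missing` for a check call that returns the list of missing files, and `max_rounds` caps the number of retries.

```python
from typing import Callable


def copy_until_complete(src: str, dst: str,
                        run_copy: Callable[[str, str], None],
                        find_missing: Callable[[str, str], list[str]],
                        max_rounds: int = 3) -> list[str]:
    """Copy, verify with check, and repeat while files are still missing.

    Returns the files still missing after the last round (empty list means done).
    """
    missing: list[str] = []
    for _ in range(max_rounds):
        run_copy(src, dst)            # hypothetical: sync/copy, waits for completion
        missing = find_missing(src, dst)  # hypothetical: check, returns missing files
        if not missing:
            break
    return missing
```

Capping the rounds keeps a permanently failing file (e.g. a permissions problem) from looping forever; whatever is returned can then be reported to the user.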

I want to automate this file-copying task as much as possible. My Python client just reports the state/results and progress to the user. I have now solved the check call with the following Python code:

    # CHECK BACKUP OF COMPLETENESS (ISSUE CHECK CMD TO RCLONE RCD TO EVALUATE MISSING FILES)
    def get_checked_backup(self, source, target):
        source_path = self.filesystem_path_for_remote_with_key(source)
        target_path = self.filesystem_path_for_remote_with_key(target)
        if not source_path or not target_path:
            self.store.source = None
            self.store.target = None
            self.store.set_state(State.FINISHED)
            if not source_path:
                return self.rclone_error_405(StateError.NO_REMOTE, source)
            if not target_path:
                return self.rclone_error_405(StateError.NO_REMOTE, target)
        else:  # STORE MOST RECENT CONTEXT FOR POSSIBLE RETRY IF OPERATIONS FAILS
            self.store.source = source
            self.store.target = target
        args = []
        args.append(source_path)
        args.append(target_path)
        args.append("--filter=- .**")
        args.append("--fast-list")
        args.append("--one-way")
        args.append("--log-level=NOTICE")
        args.append("--use-json-log=true")
        payload = { "command": "check",
                   "arg": args,
                   "returnType": "STREAM" }
        response: APIResponse = self.rclone_cmd("/core/command", autoresponse=False, payload=payload)
        status_code = response.status_code
        util.print_styled(f"RESPONSE CODE: {status_code}", 'yellow')
        some_result = response.content
        # POST-PROCESS THE STREAMED OUTPUT: WITH --use-json-log=true EACH LOG LINE IS ONE JSON
        # OBJECT, BUT BLANK LINES, PLAIN TEXT AND A FINAL STATUS OBJECT CAN BE MIXED IN
        if status_code == 200 and some_result:
            result_str = some_result.decode('utf-8')
            util.print_styled("SPLITTING RESULT NOW...", 'violet')
            files_missing = []
            log_lines_seen = 0
            first_bad_line = None
            for index, current_str in enumerate(result_str.splitlines()):
                if index % 100 == 0:
                    util.print_styled(f"PROCESSING STRING # {index}", 'red')
                if not current_str.strip():
                    continue  # SKIP BLANK LINES
                try:
                    json_obj = json.loads(current_str)
                except json.JSONDecodeError:
                    # PLAIN TEXT: REMEMBER THE FIRST SUCH LINE, BUT KEEP PARSING THE REST
                    if first_bad_line is None:
                        first_bad_line = current_str
                        util.print_styled(f"NON-JSON LINE # {index}:\n{current_str}\n", 'red')
                    continue
                if not isinstance(json_obj, dict) or 'level' not in json_obj:
                    continue  # NOT A LOG LINE, E.G. THE FINAL RC STATUS OBJECT
                log_lines_seen += 1
                # ERROR-LEVEL LOG LINES FROM check NAME THE AFFECTED FILE IN 'object'
                if json_obj['level'] == "error" and 'object' in json_obj:
                    files_missing.append(json_obj['object'])
            util.print_styled("RESULT PROCESSING COMPLETED", 'yellow')
            if log_lines_seen == 0:
                # NO LOG LINE COULD BE PARSED: THE COMMAND ITSELF MOST LIKELY FAILED
                content = { "error": first_bad_line, "command_response": result_str }
                return JSONResponse(status_code=status.HTTP_400_BAD_REQUEST, content=content)
            content = { "num_of_errors": len(files_missing), "files_missing": files_missing }
            return content
        else:
            return self.rclone_api_failed(Exception("Rclone instance did not return result for CHECK operation."))

This leads to very nice and simple JSON feedback, which tells me which files have errors on the REMOTE, like so:

{
  "num_of_errors": 3,
  "files_missing": [
    "07ce49b2-68c4-45bc-ab44-e7d8bd4ca55c/movie.mp4",
    "108c76c6-f66a-48c1-ad75-9ae4d912cc37/movie.mp4",
    "2d009e5c-eb70-4f79-915e-ab8f895bad80/movie.mp4"
  ]
}

It performs quite okay. Maybe RegEx is faster, I do not know.

My main concern is:
If a later release of rclone changes the way the output of a check call is returned, my parsing will break. I would much prefer well-defined, valid JSON output for these command calls to the rcd.

Ole · October 30, 2022, 3:11pm

Looks good, happy to see your rapid progress!

Both approaches are viable and the best choice probably depends a lot on prior experience (such as tools already mastered).

No matter the choice, I don't think you can avoid some kind of automated integration tests to verify the (expected) behavior of rclone each time you upgrade to a new version. This is just like how rclone has extensive integration tests to verify the expected behavior of all underlying packages and APIs (e.g. S3).

More info here:
https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md#testing

bluehelge · October 30, 2022, 6:52pm

Yes, I was able to pull off my auto-retry feature for failed transfers with your help. To give you an overview of what I pulled off here... this (see image/screenshot) is the API of my Python-based client. It provides a much simplified way to interact with rclone. I wanted to restrict transfer activities to a limited number of preconfigured storage nodes (those named node or from_node/to_node in the API routes). The Python client now automatically takes care of a started transfer and tries to complete it. I hope this will make my transfer processes robust and make it convenient to use rclone in a defined way where you cannot accidentally purge/delete things etc.

I'll try to open source it as soon as it has become more bulletproof. I really like Python FastAPI, because it automatically creates a valid OpenAPI specification for you if you use it the right way. This documents the client nicely for others who need to interact with it. I had a rather rough start with my first requests against rclone rcd and found it non-trivial to interact with rclone via the rcd; that was one reason why I created a separate client wrapped around rclone rcd.

I'll try to contribute to rclone. Thx for the github-hint/links.

system · November 2, 2022, 6:52pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.