--progress-json or --progress-bytes or alike to simplify parsing of progress

in our tool git-annex-remote-rclone we are using rclone copy for transferring content for annex into remotes. git-annex allows for our tool to report progress back to it in bytes, but ATM it is not trivial to establish monitoring of the rclone copy process regarding its progress: it requires parsing already humanized sizes from either stdout which likely to also carry ANSI characters, and looking for smth like Transferred: 411.125 MiB / 953.674 MiB and then converting MiB "manually" into pure bytes.
Or alternative is to configure it dumping into a log file at DEBUG level and then monitoring that file for Transferred: \t 953.674 MiB / 953.674 MiB and do similar dance with conversion.

In git-annex a similar design issue was resolved by having --json-progress include progress in JSON output and then the json record has explicit "byte-progress":16384 field which then could be easily read out and not require any conversion. Here is a more complete example of how such progress reports look like

(dandisets) dandi@drogon:/tmp/ds000003$ git annex get --json-progress sub-01/anat/sub-01_*
{"byte-progress":15854,"action":{"command":"get","note":"from s3-PUBLIC...","input":["sub-01/anat/sub-01_inplaneT2.nii.gz"],"key":"MD5E-s664614--0f8bc47f9c3047b340abfcd3ce1fb021.nii.gz","file":"sub-01/anat/sub-01_inplaneT2.nii.gz"},"total-size":664614,"percent-progress":"2.39%"}
{"byte-progress":432622,"action":{"command":"get","note":"from s3-PUBLIC...","input":["sub-01/anat/sub-01_inplaneT2.nii.gz"],"key":"MD5E-s664614--0f8bc47f9c3047b340abfcd3ce1fb021.nii.gz","file":"sub-01/anat/sub-01_inplaneT2.nii.gz"},"total-size":664614,"percent-progress":"65.09%"}
{"byte-progress":664614,"action":{"command":"get","note":"from s3-PUBLIC...","input":["sub-01/anat/sub-01_inplaneT2.nii.gz"],"key":"MD5E-s664614--0f8bc47f9c3047b340abfcd3ce1fb021.nii.gz","file":"sub-01/anat/sub-01_inplaneT2.nii.gz"},"total-size":664614,"percent-progress":"100%"}
{"command":"get","note":"from s3-PUBLIC...","success":true,"input":["sub-01/anat/sub-01_inplaneT2.nii.gz"],"key":"MD5E-s664614--0f8bc47f9c3047b340abfcd3ce1fb021.nii.gz","error-messages":[],"file":"sub-01/anat/sub-01_inplaneT2.nii.gz"}
{"byte-progress":16384,"action":{"command":"get","note":"from s3-PUBLIC...","input":["sub-01/anat/sub-01_T1w.nii.gz"],"key":"MD5E-s5712417--0d1e0a7ff7063250404f45a955a66203.nii.gz","file":"sub-01/anat/sub-01_T1w.nii.gz"},"total-size":5712417,"percent-progress":"0.29%"}
{"byte-progress":5712417,"action":{"command":"get","note":"from s3-PUBLIC...","input":["sub-01/anat/sub-01_T1w.nii.gz"],"key":"MD5E-s5712417--0d1e0a7ff7063250404f45a955a66203.nii.gz","file":"sub-01/anat/sub-01_T1w.nii.gz"},"total-size":5712417,"percent-progress":"100%"}
{"command":"get","note":"from s3-PUBLIC...","success":true,"input":["sub-01/anat/sub-01_T1w.nii.gz"],"key":"MD5E-s5712417--0d1e0a7ff7063250404f45a955a66203.nii.gz","error-messages":[],"file":"sub-01/anat/sub-01_T1w.nii.gz"}

Did you try --use-json-log? That will output the stats messages (the things you are parsing) in machine readable format.

You can set the log level of the stats independently so --stats-log-level NOTICE to make them show by default and --stats 10s to make them show every 10 seconds.

ok, let's try -- running with stats each 2 sec and timeout for 5 seconds overall ends up in empty log file suggesting it not being flushed or lacking any progress:

(dandisets) dandi@drogon:/tmp$ rm -rf HERE.nwb; timeout 5 rclone copy --progress --use-json-log --stats-log-level NOTICE --stats 2s --log-file /tmp/progress.jsonl  dandi-dandizarrs-dropbox:dandi-dandisets/annexstore/006/0bd/SHA256E-s13540840570-S1000000000-C1--1e7acce536d53c92501de98f4b8f77bcd0c0f1286fc1f58fba863dd8526faf3e.nwb  HERE.nwb 
Transferred:      834.234 MiB / 953.674 MiB, 87%, 198.234 MiB/s, ETA 0s
Transferred:            0 / 1, 0%
Elapsed time:         4.2s
Transferring:
 * SHA256E-s13540840570-S…8fba863dd8526faf3e.nwb: 87% /953.674Mi, 198.232Mi/s, 0s

(dandisets) dandi@drogon:/tmp$ ls -ld /tmp/progress.jsonl
-rw-r----- 1 dandi dandi 0 Jan  5 12:43 /tmp/progress.jsonl

running for completion shows that we would have only the end stat:

(dandisets) dandi@drogon:/tmp$ rm -rf HERE.nwb;  rclone copy --progress --use-json-log --stats-log-level NOTICE --stats 2s --log-file /tmp/progress.jsonl  dandi-dandizarrs-dropbox:dandi-dandisets/annexstore/006/0bd/SHA256E-s13540840570-S1000000000-C1--1e7acce536d53c92501de98f4b8f77bcd0c0f1286fc1f58fba863dd8526faf3e.nwb  HERE.nwb 
Transferred:      953.674 MiB / 953.674 MiB, 100%, 86.682 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        12.2s
(dandisets) dandi@drogon:/tmp$ cat /tmp/progress.jsonl
{"level":"warning","msg":"\nTransferred:   \t  953.674 MiB / 953.674 MiB, 100%, 86.682 MiB/s, ETA 0s\nTransferred:            1 / 1, 100%\nElapsed time:        12.2s\n\n","source":"accounting/stats.go:482","stats":{"bytes":1000000000,"checks":0,"deletedDirs":0,"deletes":0,"elapsedTime":12.26351236,"errors":0,"eta":0,"fatalError":false,"renames":0,"retryError":false,"speed":90893100.62791826,"totalBytes":1000000000,"totalChecks":0,"totalTransfers":1,"transferTime":0,"transfers":1},"time":"2024-01-05T12:45:54.986126-05:00"}

how to tune invocation to get stats to do get saved?

You don't want to use this - this is for console output.

I tried this on a local directory with 1000files small files in it (made with rclone test makefiles --files 1000 1000files)

rclone copy 1000files 1000files.copy --bwlimit 100b --use-json-log --stats-log-level NOTICE --stats 2s --log-file z.log

If I tail z.log in another window I can see the stats lines being emitted every 2 seconds

{"level":"warning","msg":"\nTransferred:   \t    1.374 KiB / 41.406 KiB, 3%, 117 B/s, ETA 5m49s\nChecks:               139 / 139, 100%\nTransferred:           22 / 861, 3%\nElapsed time:        12.0s\nTransferring:\n *                       layoxuq8wej/jetamup7dep:100% /94, 46/s, 0s\n *                               yecazuf6v/gonoh:100% /67, 66/s, 0s\n *                              layoxuq8wej/luni:100% /14, 13/s, 0s\n *                             xoqomez/bobinid2l:100% /47, 0/s, -\n\n","source":"accounting/stats.go:512","stats":{"bytes":1407,"checks":139,"deletedDirs":0,"deletes":0,"elapsedTime":12.008493488,"errors":0,"eta":349,"fatalError":false,"renames":0,"retryError":false,"serverSideCopies":0,"serverSideCopyBytes":0,"serverSideMoveBytes":0,"serverSideMoves":0,"speed":117.24046244137092,"totalBytes":42400,"totalChecks":139,"totalTransfers":861,"transferTime":12.000376007,"transferring":[{"bytes":94,"eta":0,"group":"global_stats","name":"layoxuq8wej/jetamup7dep","percentage":100,"size":94,"speed":45.57898598229487,"speedAvg":46.995593082249904},{"bytes":67,"eta":0,"group":"global_stats","name":"yecazuf6v/gonoh","percentage":100,"size":67,"speed":51.49169404890564,"speedAvg":66.98674278770837},{"bytes":14,"eta":0,"group":"global_stats","name":"layoxuq8wej/luni","percentage":100,"size":14,"speed":13.181250252244409,"speedAvg":13.99254301002384},{"bytes":47,"eta":null,"group":"global_stats","name":"xoqomez/bobinid2l","percentage":100,"size":47,"speed":310.3642933250031,"speedAvg":0}],"transfers":22},"time":"2024-01-07T11:40:13.129018+00:00"}

Which piped through jq look like

{
  "level": "warning",
  "msg": "\nTransferred:   \t    1.374 KiB / 41.406 KiB, 3%, 117 B/s, ETA 5m49s\nChecks:               139 / 139, 100%\nTransferred:           22 / 861, 3%\nElapsed time:        12.0s\nTransferring:\n *                       layoxuq8wej/jetamup7dep:100% /94, 46/s, 0s\n *                               yecazuf6v/gonoh:100% /67, 66/s, 0s\n *                              layoxuq8wej/luni:100% /14, 13/s, 0s\n *                             xoqomez/bobinid2l:100% /47, 0/s, -\n\n",
  "source": "accounting/stats.go:512",
  "stats": {
    "bytes": 1407,
    "checks": 139,
    "deletedDirs": 0,
    "deletes": 0,
    "elapsedTime": 12.008493488,
    "errors": 0,
    "eta": 349,
    "fatalError": false,
    "renames": 0,
    "retryError": false,
    "serverSideCopies": 0,
    "serverSideCopyBytes": 0,
    "serverSideMoveBytes": 0,
    "serverSideMoves": 0,
    "speed": 117.24046244137092,
    "totalBytes": 42400,
    "totalChecks": 139,
    "totalTransfers": 861,
    "transferTime": 12.000376007,
    "transferring": [
      {
        "bytes": 94,
        "eta": 0,
        "group": "global_stats",
        "name": "layoxuq8wej/jetamup7dep",
        "percentage": 100,
        "size": 94,
        "speed": 45.57898598229487,
        "speedAvg": 46.995593082249904
      },
      {
        "bytes": 67,
        "eta": 0,
        "group": "global_stats",
        "name": "yecazuf6v/gonoh",
        "percentage": 100,
        "size": 67,
        "speed": 51.49169404890564,
        "speedAvg": 66.98674278770837
      },
      {
        "bytes": 14,
        "eta": 0,
        "group": "global_stats",
        "name": "layoxuq8wej/luni",
        "percentage": 100,
        "size": 14,
        "speed": 13.181250252244409,
        "speedAvg": 13.99254301002384
      },
      {
        "bytes": 47,
        "eta": null,
        "group": "global_stats",
        "name": "xoqomez/bobinid2l",
        "percentage": 100,
        "size": 47,
        "speed": 310.3642933250031,
        "speedAvg": 0
      }
    ],
    "transfers": 22
  },
  "time": "2024-01-07T11:40:13.129018+00:00"
}

cool, that works, Thank you!
doing tail --follow=name z.log | jq '.stats.bytes' indeed shows me growing size... then it gets stuck at the end for a while (I guess flushing) but I hope that it is ok -- I could just keep reporting the same full size number - hopefully git-annex would be happy with it. Now I just need to figure out how to make it the most kosher way within my bash script... ok -- with chatgpt we created following abomination which seems to work

#!/bin/bash

set -eu

# Helper function to extract an integer value from a JSON string using sed
get_json_int() {
    json_string=$1
    key=$2
    echo "$json_string" | sed -En "s/^.*\"$key\":\s*([0-9]+).*$/\1/p"
}

# Function to monitor rclone progress
rclone_progress() {
    # Create a temporary file for the log
    log_file=$(mktemp "${TMPDIR:-/tmp}/rclone-annex-tmp.XXXXXXXXX")

    # Run rclone with the specified options and log to the temporary file
    rclone "$@" --use-json-log --stats-log-level NOTICE --stats 1s --log-file "$log_file" &
    rclone_pid=$!

    # Monitor the log file for changes and extract the "bytes" field
    tail -fn0 "$log_file" | while IFS= read -r line; do
        bytes=$(get_json_int "$line" "bytes")
        if [[ -n $bytes ]]; then
            echo "PROGRESS $bytes"
        fi
        
        # Check if rclone process is still running
        if ! ps -p $rclone_pid > /dev/null; then
            break
        fi
    done

    # Wait for rclone to finish and capture its exit code
    wait $rclone_pid
    exit_code=$?

    # Clean up the log file
    [ -e "$log_file" ] && rm "$log_file"

    # Return the exit code of the rclone command
    return $exit_code
}

rclone_progress "$@"

trying it now in Get progress report from rclone log file and channel it to git-annex by yarikoptic · Pull Request #77 · git-annex-remote-rclone/git-annex-remote-rclone · GitHub and locally.

Not sure what is happening here... The stats should continue from rclone as long as rclone is running.

ChatGPT is better at bash than me by the look of it :wink: The script looks good to me.