Extracting stats from status printout for monitoring

I run rclone nightly using crontab and direct the output to a log file. I would like to read this log file and push the stats printed into a monitoring tool. Here is a sample of the status output I’m talking about:

2017/08/15 13:36:25 INFO  : 
Transferred:     39 Bytes (10 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            1
Elapsed time:        3.8s

Scraping data out of a file is usually pretty easy, but rclone’s default output is tricky for a few reasons:

  1. human-readable auto-sized units
  2. status output may be printed multiple times
  3. keys in the above key/value list are not unique
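
For illustration, just getting hold of the last status block already takes something like this (a crude sketch — the log path is a placeholder):

    # crude sketch: pull the last status block out of a log that may contain
    # many of them; "Transferred:" appears twice per block, so key off the
    # "Elapsed time:" line that closes each block
    with open("/some/log") as f:          # placeholder path
        lines = [line.rstrip("\n") for line in f]

    end = max(i for i, line in enumerate(lines)
              if line.startswith("Elapsed time:"))
    last_block = lines[end - 4:end + 1]   # the five stats lines shown above
    print("\n".join(last_block))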

My primary trouble is with #1 above. Are there settings I can pass to rclone to make reading this information easier for me?

I don’t know if this helps, but you can redirect standard output to one location and keep standard error, which carries the stats, somewhere else. There will at least be less to parse, and most likely you can just remove anything that contains a date and keep the stats for loading into your tool.

rclone lsl robgs:a/ --stats 1s -v 1>/some/log 2>/stats/and/errors

I believe all of the output on standard error will be prefixed with a date, which you should be able to strip off so you are left with just the stats.
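
Something along these lines would do it — a rough, untested sketch, assuming the default "YYYY/MM/DD HH:MM:SS" prefix shown in your sample and the /stats/and/errors file from the command above:

    # rough sketch: strip the leading "YYYY/MM/DD HH:MM:SS " prefix from
    # each log line so only the stats text is left
    import re

    prefix = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")
    with open("/stats/and/errors") as f:
        for line in f:
            print(prefix.sub("", line), end="")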

Thanks. Unfortunately I’m more interested in capturing the file-count, byte-count, avg upload speed, and time elapsed. I have a lot of files and am trying to dynamically plan uploads for off-peak hours.

It’s not pretty, but here’s my temporary solution for going from the rclone output string to JSON in Python:

    # res.stdout is the rclone output

    import re

    def parse_units(number, units):
        """ parses number/units strings from rclone status into bytes """
        number = float(number)
        if units == "Bytes":
            return number
        elif units == "KBytes":
            return number * 1e3
        elif units == "MBytes":
            return number * 1e6
        elif units == "GBytes":
            return number * 1e9
        elif units == "TBytes":
            return number * 1e12
        else:
            raise ValueError("unknown units: " + units)

    with open(args.summarylog, "w") as sumlog:
        # get last status printout from rclone
        last_status = res.stdout.split("\n")[-6:]

        # parse out the various numbers
        byte_num, byte_unit  = last_status[0].split(':')[1].strip().split(' ')[:2]
        bytes_sent = parse_units(byte_num, byte_unit)

        speed_num, speed_unit = last_status[0].split('(')[1].split(')')[0].split(' ')
        avg_speed = parse_units(speed_num, speed_unit[:-2])

        err_count  = last_status[1].split(":")[1].strip()
        check_count = last_status[2].split(":")[1].strip()
        files_sent = last_status[3].split(":")[1].strip()

        # elapsed time looks like "3.8s", "1m30.5s" or "1h2m3.4s"
        time_parts = [float(p) for p in
                      re.findall(r"[-+]?\d*\.\d+|\d+", last_status[4].split(":")[1])]
        if len(time_parts) == 1:    # s only
            time_spent = time_parts[0]
        elif len(time_parts) == 2:  # m & s
            time_spent = time_parts[0]*60 + time_parts[1]
        elif len(time_parts) == 3:  # h, m & s
            time_spent = time_parts[0]*60*60 + time_parts[1]*60 + time_parts[2]
        else:
            raise ValueError("cannot parse time parts: " + str(time_parts))

        # write a summary of the rclone output to file
        sumlog.write("{\n" +
            '\t"bytes_sent":' + bytes_sent + ',\n'
            '\t"avg_speed":' + avg_speed + ',\n'
            '\t"errors":' + err_count + ',\n'
            '\t"files_checked":' + check_count + ',\n'
            '\t"files_sent":' + files_sent + ',\n'
            '\t"time_spent":' + time_spent + '\n'
        "}\n")


I wonder if it is worth a PR or an issue to add a stats-unit flag to specify a consistent unit for the output rather than auto-scaling as it does now.