Need better results when copying individual objects/files

I'm looking for an improvement to "copy" on a specified set of objects/files that provides standardized (parsable) results from which could be inferred on whether each object was copied, skipped, or failed. Ideally, I'm looking for a solution that combines "copy --files-from" and "check --combined". I would like to specify a list of objects to copy with options "--files-from" and "--combined", and get a standardized output like option "--combined" does for a "check".

Current Limitations:

  • rclone copy doesn't provide a standardized result. The only way to tell the status from each object is 1) by the stdout comments (not standardized/parsable), 2) by the return code which only works per file if only one file is provided, or 3) by querying the files with check before and after the fact.
  • The --files-from option does not provide any result for skipped files (for both copy and check). (Is this a bug? I was expecting the file that was skipped to be mentioned in the stdout comments for copy, and report the mismatch in the check.)
  • rclone check doesn't really run on individual files. If a single object or subset of objects is specified, it touches (though it skips over the actual compare) every object in a bucket. If a single object is specified and it is missing on the source, it treats the object as a bucket or folder. Also ran into a bug where rclone check on one object would hang while touching all objects in a bucket where there were many objects (1000) and at least one object (not the one specified) was being modified.
  • rclone copy's "--error-on-no-transfer" option is only useful for getting result details if it is run a single object

Without this feature improvement, it looks like I need to choose between two options:
1- Copy each file individually with option "--error-on-no-transfer". Then run rclone check with "--combined" and "--files-from" options to double-check and refine the results (to know if an object was skipped because the file doesn't exist on source or because it already exists at the destination).
2- Copy with option "--files-from". Then run rclone check on the entire bucket with "--combined".
With the first option, I miss out on parallel processing that "--files-from" provides. With the second option, the results for each object in the copy are delayed until the very end, and files not in the --files-from list are checked unnecessarily.

You'd use --log-file and parse the log file. You have option of json logging format as well to make it parse it how you want.

If you have more the file being copied, the return job would be for the command you run and the log file would have any error information / more detail information depending on what logging level you turn on.

What's the goal here? To validate the copy command worked?

It runs on a directory to match another directory. If you wanted one file, you can copy it again and check it anyway you like.

felix@gemini:~$ rclone copy /etc/hosts DB:
felix@gemini:~$ echo $?
0

That tells me based on whatever check I want to do (I used the default mod time/size), it matches.

Depending on the remote, you can use checksum if. you wanted to:

felix@gemini:~$ rclone copy /etc/hosts DB: --checksum -vvv
2022/02/04 11:39:19 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2022/02/04 11:39:19 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "copy" "/etc/hosts" "DB:" "--checksum" "-vvv"]
2022/02/04 11:39:19 DEBUG : Creating backend with remote "/etc/hosts"
2022/02/04 11:39:19 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2022/02/04 11:39:19 DEBUG : fs cache: adding new entry for parent of "/etc/hosts", "/etc"
2022/02/04 11:39:19 DEBUG : Creating backend with remote "DB:"
2022/02/04 11:39:19 DEBUG : hosts: dropbox = c4f256b92ec94ab03256dfb14d89d73fa9b6de6d90e077b915660e54400d46d6 OK
2022/02/04 11:39:19 DEBUG : hosts: Size and dropbox of src and dst objects identical
2022/02/04 11:39:19 DEBUG : hosts: Unchanged skipping
2022/02/04 11:39:19 INFO  :
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.3s

2022/02/04 11:39:19 DEBUG : 6 go routines active
2022/02/04 11:39:19 INFO  : Dropbox root '': Commiting uploads - please wait...

There are a number of open issues for summary type output:

bisync: print summary of operations · Issue #5688 · rclone/rclone (github.com)

Make stats print to stdout (feature request) · Issue #2453 · rclone/rclone (github.com)

Improve verbose output - print what file is being copied before the copy starts · Issue #881 · rclone/rclone (github.com)

Can you give an example of commands you run, output you get and what you'd expect instead?

My own scripts for my uploads (use move and not copy), but I use:

--stats-one-line -v --log-file /tmp/rclone.log

That gets me base information on what's copied (delete in my case as I use move) and my stats at the end in a pretty nice log file format for my own parsing.

2022/02/04 09:06:13 INFO  : BLAH3 - S01E08.mkv: Copied (new)
2022/02/04 09:06:13 INFO  : BLAH3 - S01E08.mkv: Deleted
2022/02/04 09:07:07 INFO  :    34.024 GiB / 38.645 GiB, 88%, 35.419 MiB/s, ETA 2m13s
2022/02/04 09:08:07 INFO  :    35.990 GiB / 38.645 GiB, 93%, 33.690 MiB/s, ETA 1m20s
2022/02/04 09:09:07 INFO  :    37.813 GiB / 38.645 GiB, 98%, 30.263 MiB/s, ETA 28s
2022/02/04 09:09:39 INFO  : BLAH2/BLAH2 - S33E01-E02.mkv: Copied (new)
2022/02/04 09:09:39 INFO  : BLAH2/BLAH2 - S33E01-E02.mkv: Deleted
2022/02/04 09:09:39 INFO  : BLAH2: Removing directory
2022/02/04 09:09:39 INFO  : BLAH: Removing directory
2022/02/04 09:09:39 INFO  :    38.646 GiB / 38.646 GiB, 100%, 23.939 MiB/s, ETA 0s
1 Like

hello and welcome to the forum, and thanks for that like.

for each source file, from the rclone debug log, what is not logged?

imho, given the narrow use-case, the existing backlog of 663 issues and the ability to script it,
your feature will not be implemented.

so you can
--- download the rclone source code and add the feature
or
--- create a new issue

in my case, for each run of rclone,
my python script scans the debug log file, uses regex to filter what info i need and sends me a summary email.

if you have found a bug, i would like to see it in detail.
what was the command, and what is the simplest test case to prove it?
what was the error in the debug log?

and not sure i understand "was being modified"?
that non-specified file, was modified?
--- before the rclone command started
or
--- at during the time rclone was running

Thanks for the quick replies! Let me experiment next week with "--stats-one-line -v --log-file" and see if it meets my needs.

Regarding the bug where rclone check hung, we reproduced this
at some point in time every time we ran the following test (at least 4 separate runs): On a bucket containing 9903 objects (sorry, I mentioned 1000 above but I had remembered the number wrong) run rclone copy on a single object, and then run rclone check on that object. 9900 objects were for copying and 3 objects were config or status files all in the same bucket. One of the objects was a status file that was being overwritten periodically. The failure would usually occur after a couple hundred objects were successfully copied/checked. Digging into a log of one of the runs showed that it happened on the next check after one of the status log writes. Note that 3 other checks were done successfully after a status log write on this particular run, so there seems to be a race condition of sorts. Rclone operations were all done in a Docker container. The checks hung indefinitely until the container processes were killed the next day.

At some point I'd like to give rclone sync the same output as rclone check, but we aren't there yet!

What backend are you using?

If you can kill rclone with SIGQUIT then it will produce a backtrace when hung and that is very useful to tell what it is doing. If you can paste one of those, I can tell what is happening.

I'm still hitting the issue where skipped files aren't reported with the "--files-from" option. In this example, I specify four files. One doesn't exist on the source, one doesn't exist on the destination, one is different than the destination, and one already exists on the destination. I only get the report on two of the four:

$ rm /tmp/rclone.log
$ cat temp.txt
unknown.txt
DancingLadies.JPG
copyme.txt
UF_Graduation.mp4
$ rclone copy local_alias:source/ remote_alias:destination/ --files-from temp.txt --stats-one-line -v --log-file /tmp/rclone.log
$ cat /tmp/rclone.log
2022/02/07 13:49:35 INFO : DancingLadies.JPG: Copied (new)
2022/02/07 13:49:35 INFO : copyme.txt: Copied (replaced existing)
2022/02/07 13:49:35 INFO : 151.367k / 151.367 kBytes, 100%, 236.826 kBytes/s, ETA 0s

What I would like as a response would be something like the following:

2022/02/07 13:49:35 INFO : unknown.txt: Skipped (-)
2022/02/07 13:49:35 INFO : DancingLadies.JPG: Copied (new)
2022/02/07 13:49:35 INFO : copyme.txt: Copied (replaced existing)
2022/02/07 13:49:35 INFO : UF_Graduation.mp4: Skipped (=)

Instead of "Skipped (=)", could be "Unchanged skipping" or any parsable string. When I use option "-vv", I do get "Unchanged skipping", but also other debug lines, mention of other files at the destination that are "Excluded", and no mention of "unknown.txt". I don't think I should rely on the verbose "-vv" output for scripting, though I suppose I could search for "objectname: Copied" or "objectname: Unchanged" and if it is not found, assume it is missing in the source. What do you think?

This is to do with the way --files-from works - files in that list that aren't found are silently skipped.

Is it possible to change how --files-from works? Maybe an option to not skip?

What would you want it to do?

Throw an error and stop everything? Or log an error, but carry on and do everything else?

The second option would be my preference probably.

This could be enabled by a flag, say --files-from-strict

I'm not thinking of an error thrown nor stopping everything. I was thinking more like "--report-on-no-transfer", which would log the fact (info level or warning at most) that an object was skipped in whatever manor of logging was chosen. If "--stats-one-line -v --log-file" is used, then expand the result set. Existing stats: "Copied (new)" and "Copied (replaced existing)". Additional stats: "Skipped (no source)" and "Skipped (unchanged)".

I had a play with this and I realised it is actually more difficult that you might hope...

For efficiency, rclone turns the --files-from list into a set of filters as it is much more efficient to list directories than see if individual files exist. If you use the --no-traverse flag then rclone will try to find the files individually.

So I could put a warning with --no-traverse but then that makes rclone inconsistent because it wouldn't log errors without it...

Interesting; I figured it had to do something with the implementation. Would I loose performance using the --no-traverse flag?

How about this... I see that with verbose logging, rclone knows when a file is "Unchanged skipping". Can this be added to the non-verbose output? We can assume if a file is not mentioned, it was not present in the source directory.

Most likely, it depends on exactly which backend you are using. Google drive absolutely hates --no-traverse and rate limits it to nothing! If you are using S3 it will depend on the exact distribution of the files in the files-from list (whether they are all in the same directory or not), but it could be faster but most likely slower.

I think what you are asking for, effectively, is the reporting facilities that rclone check has, but for rclone sync.

I have this on the ideas list!

Moving the "Unchanged skipping" to the verbose log will make it too noisy for most people, though no reason why you shouldn't parse the DEBUG logs if you want. You can log in JSON too which might help.

Thanks, I'm going to use this! Do you have any detailed documentation for --stats-one-line, or a list of the possible results for object copy/sync? I have the following and I'm missing the result for a failed copy:

Format:
<timestamp> INFO  : <object name>: <result>
where <result> is one of the following:

Copied (new)
Copied (replaced existing)
Deleted

Also, let me know if you have any additional advice for parsing the object results. I figure I'll probably want to handle the case where something like ": Copied (new)" is part of the file name.

hi,

i have a python script that i use to run rclone, 7zip, veeam, fastcopy and VSS.

for each app, there is a log file and my script parses each log using regex.

in your case, for this text
2022/02/07 13:49:35 INFO : DancingLadies.JPG: Copied (new)
this regex
^.* INFO : (.*): Copied \(new\)$
would output the filename of DancingLadies.JPG

Thanks! I will use the following to capture any copy status reported by INFO level:
^.* INFO\s*: (.*): (.+?)( \((.*)\))?$
and this to capture unchanged skipped objects:
^.* DEBUG\s*: (.*): Unchanged skipping$

1 Like

I can use this along with --no-traverse and -vv:
(r"^.* DEBUG\s*: (.*): \(doesn't exist on source\)$")

I'm still missing the string pattern for failures. Realized it probably doesn't fall under INFO, but under higher severity logging.

@ncw , will the copy failure results reported with option "--one-line" always start with the string "Failed to copy: "? I constructed this regular expression to filter error results:
^.* ERROR\s*: (.*): Failed to copy: (.*)$

Mostly...

There can also be corrupted on transfer messages from the copy routine.

If the listing traversal failed then there could be different error messages.