--stats-one-line-date broken in 1.64.0 and later

What is the problem you are having with rclone?

I no longer get transfer data for server-side moves when using --stats-one-line-date. This breaks reporting in our tooling, as we can no longer determine how much data was moved from one Google Drive account to another.

Run the command 'rclone version' and share the full output of the command.

Last working: 1.63.1
First broken: 1.64.0

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

"C:\\Users\\Chad\\Documents\\PowerShell\\Modules\\TMCTools\\bin\\rclone.exe" "move" "--checkers" "8" "--config" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939\\chad@cdsconsulting.co_stage1.conf" "--log-file" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939\\chad@cdsconsulting.co.log" "--low-level-retries" "3" "--retries" "1" "--stats" "5s" "--stats-file-name-length" "0" "--stats-one-line-date" "--temp-dir" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939" "--user-agent" "ISV|Transend Corporation|TMCTools/1.0" "--log-level" "DEBUG" "--max-depth" "20" "--drive-skip-dangling-shortcuts" "--drive-skip-shortcuts" "--check-first" "--create-empty-src-dirs" "--order-by" "size,mixed" "--transfers" "4" "--update" "--server-side-across-configs" "Source:" "Temp:"

The rclone config contents with secrets removed.

[Source]
type = drive
scope = drive
service_account_file = C:\Temp\key.json
impersonate = user@example.org

[Temp]
type = drive
team_drive = ...PVA
scope = drive
service_account_file = C:\Temp\key.json
impersonate = user@example.org

A log from the command with the -vv flag

1.63.1

2023/12/04 12:03:31 DEBUG : rclone: Version "v1.63.1" starting with parameters ["C:\\Users\\Chad\\Documents\\PowerShell\\Modules\\TMCTools\\bin\\rclone.exe" "move" "--checkers" "8" "--config" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_120331\\chad@cdsconsulting.co_stage1.conf" "--log-file" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_120331\\chad@cdsconsulting.co.log" "--low-level-retries" "3" "--retries" "1" "--stats" "5s" "--stats-file-name-length" "0" "--stats-one-line-date" "--temp-dir" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_120331" "--user-agent" "ISV|Transend Corporation|TMCTools/1.0" "--log-level" "DEBUG" "--max-depth" "20" "--drive-skip-dangling-shortcuts" "--drive-skip-shortcuts" "--check-first" "--create-empty-src-dirs" "--order-by" "size,mixed" "--transfers" "4" "--update" "--server-side-across-configs" "Source:" "Temp:"]
2023/12/04 12:03:31 DEBUG : Creating backend with remote "Source:"
2023/12/04 12:03:31 DEBUG : Using config file from "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_120331\\chad@cdsconsulting.co_stage1.conf"
2023/12/04 12:03:31 DEBUG : Source: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 12:03:32 DEBUG : Google drive root '': 'root_folder_id = 0AJMfyeX0r9BWUk9PVA' - save this in the config to speed up startup
2023/12/04 12:03:32 DEBUG : fs cache: renaming cache item "Source:" to be canonical "Source{aqL5q}:"
2023/12/04 12:03:32 DEBUG : Creating backend with remote "Temp:"
2023/12/04 12:03:32 DEBUG : Temp: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 12:03:32 DEBUG : fs cache: renaming cache item "Temp:" to be canonical "Temp{aqL5q}:"
2023/12/04 12:03:32 INFO  : Google drive root '': Running all checks before starting transfers
2023/12/04 12:03:32 DEBUG : Google drive root '': read info from Shared Drive "Intermediate_Source"
2023/12/04 12:03:32 DEBUG : 10MB.dat: Need to transfer - File not found at Destination
2023/12/04 12:03:32 DEBUG : Google drive root '': Waiting for checks to finish
2023/12/04 12:03:32 INFO  : Google drive root '': Checks finished, now starting transfers
2023/12/04 12:03:32 DEBUG : Google drive root '': Waiting for transfers to finish
2023/12/04 12:03:33 INFO  : 10MB.dat: Moved (server-side)
2023/12/04 12:03:33 INFO  : There was nothing to transfer
2023/12/04 12:03:33 INFO  : 2023/12/04 12:03:33 -        10 MiB / 10 MiB, 100%, 0 B/s, ETA -
2023/12/04 12:03:33 DEBUG : 10 go routines active

Note this line:

2023/12/04 12:03:33 INFO  : 2023/12/04 12:03:33 -        10 MiB / 10 MiB, 100%, 0 B/s, ETA -

1.64.0

2023/12/04 11:59:39 DEBUG : rclone: Version "v1.64.0" starting with parameters ["C:\\Users\\Chad\\Documents\\PowerShell\\Modules\\TMCTools\\bin\\rclone.exe" "move" "--checkers" "8" "--config" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939\\chad@cdsconsulting.co_stage1.conf" "--log-file" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939\\chad@cdsconsulting.co.log" "--low-level-retries" "3" "--retries" "1" "--stats" "5s" "--stats-file-name-length" "0" "--stats-one-line-date" "--temp-dir" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939" "--user-agent" "ISV|Transend Corporation|TMCTools/1.0" "--log-level" "DEBUG" "--max-depth" "20" "--drive-skip-dangling-shortcuts" "--drive-skip-shortcuts" "--check-first" "--create-empty-src-dirs" "--order-by" "size,mixed" "--transfers" "4" "--update" "--server-side-across-configs" "Source:" "Temp:"]
2023/12/04 11:59:39 DEBUG : Creating backend with remote "Source:"
2023/12/04 11:59:39 DEBUG : Using config file from "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115939\\chad@cdsconsulting.co_stage1.conf"
2023/12/04 11:59:39 DEBUG : Source: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 11:59:40 DEBUG : Google drive root '': 'root_folder_id = 0AJMfyeX0r9BWUk9PVA' - save this in the config to speed up startup
2023/12/04 11:59:40 DEBUG : fs cache: renaming cache item "Source:" to be canonical "Source{aqL5q}:"
2023/12/04 11:59:40 DEBUG : Creating backend with remote "Temp:"
2023/12/04 11:59:40 DEBUG : Temp: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 11:59:40 DEBUG : fs cache: renaming cache item "Temp:" to be canonical "Temp{aqL5q}:"
2023/12/04 11:59:40 INFO  : Google drive root '': Running all checks before starting transfers
2023/12/04 11:59:41 DEBUG : Google drive root '': read info from Shared Drive "Intermediate_Source"
2023/12/04 11:59:41 DEBUG : 10MB.dat: Need to transfer - File not found at Destination
2023/12/04 11:59:41 DEBUG : Google drive root '': Waiting for checks to finish
2023/12/04 11:59:41 INFO  : Google drive root '': Checks finished, now starting transfers
2023/12/04 11:59:41 DEBUG : Google drive root '': Waiting for transfers to finish
2023/12/04 11:59:42 INFO  : 10MB.dat: Moved (server-side)
2023/12/04 11:59:42 INFO  : There was nothing to transfer
2023/12/04 11:59:42 INFO  : 2023/12/04 11:59:42 -           0 B / 0 B, -, 0 B/s, ETA -
2023/12/04 11:59:42 DEBUG : 10 go routines active

Note this line:

2023/12/04 11:59:42 INFO  : 2023/12/04 11:59:42 -           0 B / 0 B, -, 0 B/s, ETA -

I believe this may be caused by the following commit:

commit de185de215c273e787cb770c287b087f5b484022
Author: Nick Craig-Wood <nick@craig-wood.com>
Date:   Sun Jul 30 05:39:01 2023 +0100

    accounting: show server side stats in own lines and not as bytes transferred
    
    Before this change we showed both server side moves and server side
    copies as bytes transferred.
    
    This made a nice easy to use stats display, but also caused confusion
    for users who saw unrealistic transfer times. It also caused a problem
    with --max-transfer and chunker which renames each chunk after
    uploading which was counted as a transfer byte.
    
    This patch instead accounts the server side move and copy statistics
    as a seperate lines in the stats display which will only appear if
    there are any server side moves / copies. This is also output in the
    rc.
    
    This gives users something to look at when transfers are running which
    was the point of the original change but it now means that transfer
    bytes represents data transfers through this rclone instance only.
    
    Fixes #7183

Note that stats in 1.64.0 still appear as before when using --dry-run:

2023/12/04 11:58:55 DEBUG : rclone: Version "v1.64.0" starting with parameters ["C:\\Users\\Chad\\Documents\\PowerShell\\Modules\\TMCTools\\bin\\rclone.exe" "move" "--checkers" "8" "--config" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115855\\chad@cdsconsulting.co_stage1.conf" "--log-file" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115855\\chad@cdsconsulting.co.log" "--low-level-retries" "3" "--retries" "1" "--stats" "5s" "--stats-file-name-length" "0" "--stats-one-line-date" "--temp-dir" "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115855" "--user-agent" "ISV|Transend Corporation|TMCTools/1.0" "--log-level" "DEBUG" "--max-depth" "20" "--drive-skip-dangling-shortcuts" "--drive-skip-shortcuts" "--check-first" "--create-empty-src-dirs" "--order-by" "size,mixed" "--transfers" "4" "--update" "--dry-run" "--server-side-across-configs" "Source:" "Temp:"]
2023/12/04 11:58:55 DEBUG : Creating backend with remote "Source:"
2023/12/04 11:58:55 DEBUG : Using config file from "C:\\Users\\Chad\\Domains\\cdsconsulting.co\\logs\\Move-GOUserDrives_20231204_115855\\chad@cdsconsulting.co_stage1.conf"
2023/12/04 11:58:55 DEBUG : Source: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 11:58:56 DEBUG : Google drive root '': 'root_folder_id = 0AJMfyeX0r9BWUk9PVA' - save this in the config to speed up startup
2023/12/04 11:58:56 DEBUG : fs cache: renaming cache item "Source:" to be canonical "Source{aqL5q}:"
2023/12/04 11:58:56 DEBUG : Creating backend with remote "Temp:"
2023/12/04 11:58:56 DEBUG : Temp: detected overridden config - adding "{aqL5q}" suffix to name
2023/12/04 11:58:56 DEBUG : fs cache: renaming cache item "Temp:" to be canonical "Temp{aqL5q}:"
2023/12/04 11:58:56 INFO  : Google drive root '': Running all checks before starting transfers
2023/12/04 11:58:56 DEBUG : Google drive root '': read info from Shared Drive "Intermediate_Source"
2023/12/04 11:58:56 DEBUG : 10MB.dat: Need to transfer - File not found at Destination
2023/12/04 11:58:56 DEBUG : Google drive root '': Waiting for checks to finish
2023/12/04 11:58:56 INFO  : Google drive root '': Checks finished, now starting transfers
2023/12/04 11:58:56 DEBUG : Google drive root '': Waiting for transfers to finish
2023/12/04 11:58:56 NOTICE: 10MB.dat: Skipped move as --dry-run is set (size 10Mi)
2023/12/04 11:58:56 INFO  : There was nothing to transfer
2023/12/04 11:58:56 NOTICE: 2023/12/04 11:58:56 -        10 MiB / 10 MiB, 100%, 0 B/s, ETA -
2023/12/04 11:58:56 DEBUG : 10 go routines active
2023/12/04 11:58:56 NOTICE: 2023/12/04 11:58:56 -        10 MiB / 10 MiB, 100%, 0 B/s, ETA -

Hi Chad!

I think you should see the stats without this flag it just that they are no longer part of the one line summary.

Having server side moves as transfers just proved too confusing so we had to move them to their own stat.

Hmm! That is because without doing the move we don't know whether this is a server side move or not so the --dry-run code assumes it is going to be a transfer. Not idea, but I can't think of an easy way round that.

Did you try using --use-json-log? This will give more accurate stats that are easier to parse.

The problem I now have is that I have no real way of determining doneness for a job. Previously, the one-liner allowed me to see the left number growing to meet the right, which is a clear way to see how approaching done the job was at any point in time.

Using the normal stats printing, I can't really do this. For example:

2023/12/04 12:43:21 INFO  : 
Transferred:   	          0 B / 134.469 MiB, 0%, 0 B/s, ETA -
Checks:                 7 / 11, 64%
Renamed:                7
Transferred:            0 / 50446, 0%
Server Side Moves:      7 @ 803.907 MiB
Elapsed time:        46.8s
Checking:
 * microsoft-graph-docs/concepts/toolkit/images/get-started.gif:  0% /4.342Mi, 0/s, -, moving
 * microsoft-graph-docs/api-reference/beta/includes/snippets/snippet-not-available.md:  0% /31, 0/s, -, moving
 * microsoft-graph-docs/concepts/images/mgt-one-productivity-hub/one-productivity-hub-overview.gif:  0% /1.777Mi, 0/s, -, moving
 * microsoft-graph-docs/api-reference/v1.0/includes/snippets/snippet-not-available.md:  0% /31, 0/s, -, moving

2023/12/04 12:43:22 INFO  : microsoft-graph-docs/concepts/toolkit/images/get-started.gif: Moved (server-side)
2023/12/04 12:43:25 INFO  : microsoft-graph-docs/api-reference/beta/includes/snippets/snippet-not-available.md: Moved (server-side)
2023/12/04 12:43:26 INFO  : microsoft-graph-docs/concepts/images/mgt-one-productivity-hub/one-productivity-hub-overview.gif: Moved (server-side)

How done is this job? I guess I can combine seven server-side moves with what I assume are 50,446 total files to make an estimate from an item perspective, but that doesn't help me determine doneness from an amount-of-data perspective.

I tried that, but it's still not obvious to me how to determine job doneness based on the data I get.

{
    "level": "info",
    "msg": "\nTransferred:   \t          0 B / 39.231 MiB, 0%, 0 B/s, ETA -\nChecks:               264 / 268, 99%\nRenamed:              264\nTransferred:            0 / 38450, 0%\nServer Side Moves:    264 @ 596.467 KiB\nElapsed time:      3m36.4s\nChecking:\n * microsoft-graph-docs/api-reference/v1.0/api/group-list-acceptedsenders.md:  0% /4.135Ki, 0/s, -, moving\n * microsoft-graph-docs/api-reference/beta/includes/snippets/go/get-privilegedapproval-2-go-snippets.md:  0% /290, 0/s, -, moving\n * microsoft-graph-docs/api-reference/v1.0/api/printoperation-get.md:  0% /4.126Ki, 0/s, -, moving\n * microsoft-graph-docs/api-reference/beta/includes/snippets/javascript/enum-contenttypes-site-list-javascript-snippets.md:  0% /290, 0/s, -, moving\n\n",
    "source": "accounting/stats.go:512",
    "stats": {
        "bytes": 0,
        "checking": ["microsoft-graph-docs/api-reference/v1.0/api/group-list-acceptedsenders.md", "microsoft-graph-docs/api-reference/beta/includes/snippets/go/get-privilegedapproval-2-go-snippets.md", "microsoft-graph-docs/api-reference/v1.0/api/printoperation-get.md", "microsoft-graph-docs/api-reference/beta/includes/snippets/javascript/enum-contenttypes-site-list-javascript-snippets.md"],
        "checks": 264,
        "deletedDirs": 0,
        "deletes": 0,
        "elapsedTime": 216.4944696,
        "errors": 0,
        "eta": null,
        "fatalError": false,
        "renames": 264,
        "retryError": false,
        "serverSideCopies": 0,
        "serverSideCopyBytes": 0,
        "serverSideMoveBytes": 610782,
        "serverSideMoves": 264,
        "speed": 0,
        "totalBytes": 41136957,
        "totalChecks": 268,
        "totalTransfers": 38450,
        "transferTime": 187.8005949,
        "transfers": 0
    },
    "time": "2023-12-05T09:39:06.270268-06:00"
}

Ultimately, I need to answer two questions:

  • What percentage of items have been moved? In other words, what is the current number moved and what is the total number that need to be moved?
  • What percentage of the data has been moved? In other words, what is the current data amount moved and what is the total amount of data that needs to be moved?

I understand that a server-side move is a rename, but we are moving files from one Google Drive account to another, so it is a transfer from our point of view. This change causes a great deal of problems for us and is unfortunate since a previous PR was added to specifically add the behavior we were relying upon.

We are no longer treating server side moves as transfers. Anything which isn't a transfer is counted as a check so you can see if they have finished with checks vs totalChecks.

It would be nice to surface the info. We could potentially have a totalServerSideMoves which would show the total to do.

Alternatively we could have a flag to treat server side moves as transfers which would bring back the existing behavior. I think it should probably be off by default and the old behaviour was wrong for reasons outlined in #7183.

Or maybe we need to rethink the stats a bit. I think the problem is that they are confusing two concepts

  1. Amount of data transferred over the network - this is what --max-transfers wants. This should be used to calculate the network speed.
  2. Amount of the sync/copy/move transferred. This shouldn't care whether the files get server side moved, copies or whatever, the interesting part is %done and bytes done - this is generally what users are interested in to see how much of their sync/copy/move is done.

Splitting the stats makes it easy to see that server side moves don't go in 1 but they do in 2. However stuff we transfer over the network goes in 1 and 2.

So maybe we need Transferred for 1) and Completed for 2) or maybe we should be showing them like this

Transferred:   	          0 B / 134.469 MiB, 0%, 0 B/s, ETA -
                          111  111111111111 111  2222   2222 which stat to use

What do you think @kapitainsky ? I know you were bitten by the problem in #7183

1 Like

Many people (including myself) use logs to extract data for script processing/reporting. I think more data and more granularity is better as long as it can be easily extracted without dis-ambiguity about data meaning. #7183 was clear improvement (removing double counting for some remotes) but maybe needs a bit more refining.

I think we do need transferred over the network not mixed with any server-side operations.

I think this is good idea.

It is not a "clear improvement." It is a regression that broke behavior for many people who use logs to extract data for script processing/reporting.

This has been ongoing problem with logs and stats as they are now - they are too specific. What for you was regression it was long awaited fix for many others.

We need solution where everybody can extract what they need - not some specialised statistic fitting into one person requirement - grep/sed/awk are cheap - so personally I do not see why not to have e.g. in this case both stats.

The problem is that the previous behavior was a deliberate design decision. Changing the stats to accommodate your use case should have been done via new behavior and not breaking existing functionality.

Again, it is ongoing saga... as things worked for me in the past until somebody changed it and broke it... so for me it was return to where it was:)

Now we definitely should be smarter moving forward and try not to break anything unless necessary - but even then only when alternative exist.

The easiest solution I can think of is to have a flag to determine if transfers are limited to wire transfers or not. Transfer is a loaded word that some take to be synonymous with wire transfer, but it can also mean a transfer of files from one account to another via a server-side move. If a flag can toggle between those two meanings, then the previous behavior would work for both use cases, and --stats-one-line-date can show whatever is expected by the user.

One other issue came to mind regarding the new split. Our tooling supports both server-side moves and export-import copying, so our log parsing would also need to detect what operation was performed and have different parsing logic depending on what is found since sometimes the data we need with be part of checkers logging and sometimes part of transfers logging.

Other challenge is introduced in v1.64:

Partial files: transfers to a temporary file for local,ftp,sftp file which is renamed when complete. Disable with the --inplace flag.

as well as the way how chunker remote works (uploading tmp files and renaming/moving them in the last step)

When both wire transfers and server side are combined it leads to double counting for above cases.

@ncw Just wanted to check back on this to see if there is anything actionable based on the continuing discussion. As indicated in the thread, we don't currently have a way to track doneness from a size-of-data perspective, even if we were to switch to using the JSON log. A switch that toggles the meaning of transfers would give us that and allow using the existing log format.

I am going to fix this, just still thinking about how! I'll drop an update here soon.

I have made an attempt to fix this - and hopefully I haven't broken the previous chunker fix.

Testing appreciated :slight_smile:

v1.66.0-beta.7594.cac662c09.fix-stats-extra on branch fix-stats-extra

This looks good for my use case.

1 Like

I've merged this to master now which means it will be in the latest beta in 15-30 minutes and released in v1.66 and in v1.65.1 which should be very soon!

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.