What is rclone checking during a purge?

When running rclone purge remotename:path/to/dir, I see the following:

rclone purge some_remote:some_bucket--progress --checkers 100000
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:              1114 / 1119, 100%
Deleted:              557 (files), 1 (dirs)
Elapsed time:        26.5s
Checking:
 *                     small/small_file_5127.bin: checking
 *                     small/small_file_5128.bin: checking
 *                     small/small_file_5129.bin: checking
 *                      small/small_file_513.bin: checking
 *                     small/small_file_5134.bin: checking

Evidently, something is being checked, but I have no idea what. Adding the flag --skip-checksum or increasing the number of checkers does not help.

What is rclone checking when purging a remote?

please post the full output of rclone version

i tested on lastest rclone, v1.61.1, on two buckets, one without versioning, one with versioning.
same basic result, no checking

rclone_1.61.1 purge wasabi01:bucket -vv --progress
DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone_1.61.1" "purge" "wasabi01:bucket" "-vv" "--progress"]
DEBUG : Creating backend with remote "wasabi01:bucket"
DEBUG : S3 bucket bucket: bucket is versioned: false
DEBUG : Waiting for deletions to finish
DEBUG : "file.ext" version false
DEBUG : file.ext: Deleting (id "")
INFO  : file.ext: Deleted
INFO  : S3 bucket bucket: Bucket "bucket" deleted
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1 (files), 1 (dirs)
rclone_1.61.1 purge wasabi01:bucket -vv --progress
DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone_1.61.1" "purge" "wasabi01:bucket" "-vv" "--progress"]
DEBUG : Creating backend with remote "wasabi01:bucket"
DEBUG : S3 bucket bucket: bucket is versioned: true
DEBUG : Waiting for deletions to finish
DEBUG : "file.ext" version false
DEBUG : file.ext: Deleting (id "001675462277672177117-cTUmw5uara")
INFO  : file.ext: Deleted
INFO  : S3 bucket bucket: Bucket "bucket" deleted
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1 (files), 1 (dirs)

Hi Vedant,

My quick guess is that you should read the stats like this:

Transferring:
 *                     small/small_file_5127.bin: deleting
 *                     small/small_file_5128.bin: deleting
 *                     small/small_file_5129.bin: deleting
 *                      small/small_file_513.bin: deleting
 *                     small/small_file_5134.bin: deleting

so increasing --transfers will speed up the purge.

Inspiration: https://forum.rclone.org/t/issue-using-checkers-flag-with-rclone-delete-on-lyve-cloud-storage/34916/7

If not, we need more info like rclone version, rclone config show some_remote: and the output of

rclone purge some_remote:some_bucket--progress --checkers 100000 -vv

That is 100% correct!

If you were to grep the rclone source for NewCheckingTransfer you'll find all the places that these are used. (What a confusing name!)

Checkers are used in rclone to mean any process affecting a file/object which isn't a transfer, so could be

  • hashing
  • reading modtimes
  • deleting
  • renaming

Amongst others.

However these are all printed in the stats as

The easiest way to fix this would be to think of another word for checking like in-use or working.

Slightly harder would be for each type of thing to get its own word, so deleting, renaming, hashing, reading modtime but that would be a lot more usable.

What do you think?

1 Like

Thanks for clearing up what's going on. While it might be more clear for rclone to specify that each type of thing gets its own word (i.e deleting, renaming, hashing, etc.), there would be a bit of a terminology mismatch where the --checkers flag would affect how much parallelization is going on when, e.g, hashing.

Ideally, the stats would hint at the possible tuning parameters to speed up the command. So at the high level there would a block of things performed by the --checkers with the heading “Checking:” and a block of things performed by the --transfers with the heading “Transferring:”. Each would then have up to 8/4 ongoing items/lines with default settings.

The text after each item (file name) would then hint at the ongoing activity using the terminology used in the corresponding tuning flags and information whether the activity is happening on source or destination. So e.g. hashing could be called “checksumming, destination” hinting at --ignore-checksum.

Having looked into the code, I don’t think this is easily achieved and it would probably also break/change the rc API interface to read stats.

So nice to have, but I don’t think it is worth the effort.

I agree with that.

That sounds possible.

I'm not sure about that - it may be possible. Will have a quick go.

I think we can arrange so it doesn't break that API quite easily.

@Vedant_Roy and @Ole Have a go with this

v1.62.0-beta.6727.99fb9dbd0.fix-checking-names on branch fix-checking-names (uploaded in 15-30 mins)

This changes the "checking" into one of

  • checking
  • deleting
  • hashing
  • importing
  • listing
  • merging
  • moving
  • renaming

Most of these things only happen in unusual corner cases - the most visible one will be "deleting"

1 Like

Great! Much better, but seems like the headings are still wrong. I tried

rclone sync ./emptyfolder/ GoogleDrive:folder/with/100/files --progress

resulting in:

Checking:
 *                                      femigid9: deleting
 *                                         fomuk: deleting
 *                                      gunocox0: deleting
 *                                   bevojeq9vel: deleting

I would like to see the heading "Transferring:" (corresponding to --transfers=4) or alternatively 8 "deleting" lines (corresponding to --checkers=8).

There can potentially be a mix of any of the words above in the "Checking" section.

I could break it into sections I guess which wouldn't be too hard. Or alternatively we could renaming the Checking heading into something like Processing, Using, Working on

Eg

Processing:
 *                                      femigid9: hashing
 *                                         fomuk: deleting
 *                                      gunocox0: moving
 *                                   bevojeq9vel: deleting

Or

Hashing:
 *                                      femigid9: hashing
Deleting:
 *                                         fomuk: deleting
 *                                   bevojeq9vel: deleting
Moving:
 *                                      gunocox0: moving

However I think there will almost always be only one type of thing going on at once so maybe the latter is the best?

Agree, my point is to make the heading reflect the tuning parameter that can be used to adjust the number of lines under the heading (the level of concurrency).

So anything limited by the number of --checkers is under the Checking: header and anything limited by --transfers is under the Transferring: header.

I think I misunderstood your post! Let me try again :slight_smile:

So what you are worried about here is that there are clearly only 4 things in the "checkers" category, but checkers is 8.

I think this is actually a bug here - we are running --transfers delete routines not --checkers as you would expect.

If that was changed to use Checkers then it would work as expected I think.

In fact I found 3 places where we were using the wrong concurrency, so I've fixed those.

Please give this a go.

v1.62.0-beta.6730.32ea1ce3e.fix-checking-names on branch fix-checking-names (uploaded in 15-30 mins)

Exactly!

Sure, this looks even better, happy to see the 8 "deleting" lines under "Checking:" :slight_smile:

Now extending the test a bit to this sequence:

rclone test makefiles ./testfolder/makefiles100/ --files=100    
rclone sync --track-renames ./testfolder/ GoogleDrive:testfolder/ --progress
mv ./testfolder/makefiles100/ ./testfolder/makefiles100renamed/
rclone sync --track-renames ./testfolder/ GoogleDrive:testfolder/ --progress 

resulting in these stats

Transferred:            318 B / 318 B, 100%, 0 B/s, ETA -
Checks:               112 / 120, 93%
Renamed:               12
Elapsed time:         8.0s
Checking:
 *               makefiles100/sewewot7c/cosezow4:  0% /26, 0/s, -
 *                   makefiles100/sewewot7c/hiju:  0% /86, 0/s, -
 *               makefiles100/sewewot7c/powovof3:  0% /91, 0/s, -
 *              makefiles100/sewewot7c/leyizub3k:  0% /78, 0/s, -
 *                makefiles100/rabokoh9pol/pesuf:  0% /0, 0/s, -
 *             makefiles100/sewewot7c/gopibim9ne:  0% /7, 0/s, -
 *             makefiles100/sewewot7c/kuwizaq7gu:  0% /20, 0/s, -
 *                makefiles100/sewewot7c/revepaf:  0% /70, 0/s, -

Which I think ideally should look more like this (note changes in all lines except "Elapsed Time" and "Checking:")

Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Checks:               100 / 100, 100%
Renamed:               12 / 100,  12%
Elapsed time:         8.0s
Checking:
 *               makefiles100/sewewot7c/cosezow4: renaming
 *                   makefiles100/sewewot7c/hiju: renaming
 *               makefiles100/sewewot7c/powovof3: renaming
 *              makefiles100/sewewot7c/leyizub3k: renaming
 *                makefiles100/rabokoh9pol/pesuf: renaming
 *             makefiles100/sewewot7c/gopibim9ne: renaming
 *             makefiles100/sewewot7c/kuwizaq7gu: renaming
 *                makefiles100/sewewot7c/revepaf: renaming

I also tried another test inspired by this and this forum post, where the time spend on checksumming (on local and SFTP) isn't (easily) visible in the stats.

This is what I see when I do a copy to a slow external USB2 drive (app. 22 seconds to copy and another 22 seconds to do the checksum) :

> rclone copy .\makefile1GB\ D:\testfolder\ --progress --stats=20s 

Transferred:   	  927.777 MiB / 1 GiB, 91%, 46.949 MiB/s, ETA 2s
Checks:                 1 / 1, 100%
Transferred:            0 / 1, 0%
Elapsed time:        20.0s
Transferring:
 *                                       file1GB: 90% /1Gi, 45.519Mi/s, 2s

Transferred:   	        1 GiB / 1 GiB, 100%, 14.430 MiB/s, ETA 0s
Checks:                 1 / 1, 100%
Transferred:            0 / 1, 0%
Elapsed time:        40.0s
Transferring:
 *                                       file1GB:100% /1Gi, 14.430Mi/s, 0s

Transferred:   	        1 GiB / 1 GiB, 100%, 11.890 MiB/s, ETA 0s
Checks:                 1 / 1, 100%
Transferred:            1 / 1, 100%
Elapsed time:        43.9s

I would ideally like the 40 seconds stats to look like something like this:

Elapsed time:        40.0s
Transferring:
 *                                       file1GB:100% /1Gi, checksumming

or like this (if changed to be performed by the --checkers):

Elapsed time:        40.0s
Checking:
 *                                       file1GB: checksumming

Great!

(I'm enjoying seeing rclone test makefiles here :slight_smile:

This is really for server side copies which can update their percentage of the way through here.

Note that renames are moves and moves can also be transfers...

What I can do is put the tag on the end if it is present, so

Checking:
 *               makefiles100/sewewot7c/powovof3:  0% /91, 0/s, -, moving
 *                 makefiles100/sewewot7c/toceca:  0% /92, 0/s, -, moving

Which I think is OKish

The other changes are possible, but quite difficult so I'm going to skip them for the moment.

...

I can't get that local to local copy to do a checksum - I think itis doing it as it is going along.

The problem here, I think, is that if there is both a transfer and a check going on, we only show the transfer.

To fix this I'd need to merge the checkers and transfers and implement a reference count for them. This may well be worth doing.

Let me know how this looks now

v1.62.0-beta.6745.fe30e8cca.fix-checking-names on branch fix-checking-names (uploaded in 15-30 mins)

I could merge this like it is for 1.62 (due in 10 days) and rework for the transfers/checkers for 1.63 or leave the change for 1.63 - what do you think?

I have tested and find it a significant improvement, it has a good balance between ideal and effort :slightly_smiling_face:

I did some additional test/debugging and you are right the checksum is calculated on the fly in local.go using

in = io.TeeReader(in, hasher)

and the issue is in fact that the stats counts at double speed compared to the actual transfer rate seen in Task Manager or when using --ignore-checksum.

So I suspect the root cause is the TeeReader making double reads to feed both pipes and therefore the stats reach 100% when the transfer is in reality only at 50%.

I therefore naively tried copying this approach from the crypt backend, but it didn't seem to help - perhaps you can easily spot the issue/solution with this additional info? (TeeReader is also used elsewhere, so there may be several places to fix)

I think we already made several significant improvements, so including them in 1.62 sounds great!

Great!

I've merged this to master now which means it will be in the latest beta in 15-30 minutes and released in v1.62

I think merging the checkers and transfers is worth doing but that can be for 1.63. I made an issue here: accounting: merge checkers and transfers · Issue #6797 · rclone/rclone · GitHub which is probably completely incomprehensible to anyone except me, but if you'd like to write extra stuff on there, that would be great!

I think the problem comes on the file we are reading not the file we are writing. Here is my demo

$ rclone --stats 0.25s --stats-one-line -v copy 1G :memory:
2023/03/01 11:21:39 INFO  :    54.996 MiB / 1 GiB, 5%, 0 B/s, ETA -
2023/03/01 11:21:39 INFO  :   110.996 MiB / 1 GiB, 11%, 0 B/s, ETA -
2023/03/01 11:21:40 INFO  :   165.996 MiB / 1 GiB, 16%, 0 B/s, ETA -
2023/03/01 11:21:40 INFO  :   221.996 MiB / 1 GiB, 22%, 221.884 MiB/s, ETA 3s
2023/03/01 11:21:40 INFO  :   276.996 MiB / 1 GiB, 27%, 221.884 MiB/s, ETA 3s
2023/03/01 11:21:40 INFO  :   330.996 MiB / 1 GiB, 32%, 221.884 MiB/s, ETA 3s
2023/03/01 11:21:41 INFO  :   386.996 MiB / 1 GiB, 38%, 221.884 MiB/s, ETA 2s
2023/03/01 11:21:41 INFO  :   448.227 MiB / 1 GiB, 44%, 221.884 MiB/s, ETA 2s
2023/03/01 11:21:41 INFO  :   485.996 MiB / 1 GiB, 47%, 224.056 MiB/s, ETA 2s
2023/03/01 11:21:41 INFO  :   545.996 MiB / 1 GiB, 53%, 224.056 MiB/s, ETA 2s
2023/03/01 11:21:42 INFO  :   560.289 MiB / 1 GiB, 55%, 224.056 MiB/s, ETA 2s
2023/03/01 11:21:42 INFO  :   628.996 MiB / 1 GiB, 61%, 224.056 MiB/s, ETA 1s
2023/03/01 11:21:42 INFO  :   688.996 MiB / 1 GiB, 67%, 209.621 MiB/s, ETA 1s
2023/03/01 11:21:42 INFO  :   700.367 MiB / 1 GiB, 68%, 209.621 MiB/s, ETA 1s
2023/03/01 11:21:43 INFO  :   755.996 MiB / 1 GiB, 74%, 209.621 MiB/s, ETA 1s
2023/03/01 11:21:43 INFO  :   813.996 MiB / 1 GiB, 79%, 203.467 MiB/s, ETA 1s
2023/03/01 11:21:43 INFO  :   873.996 MiB / 1 GiB, 85%, 203.467 MiB/s, ETA 0s
2023/03/01 11:21:43 INFO  :   900.996 MiB / 1 GiB, 88%, 203.467 MiB/s, ETA 0s
2023/03/01 11:21:44 INFO  :   962.996 MiB / 1 GiB, 94%, 203.467 MiB/s, ETA 0s
2023/03/01 11:21:44 INFO  :  1022.996 MiB / 1 GiB, 100%, 204.565 MiB/s, ETA 0s
2023/03/01 11:21:44 INFO  :         1 GiB / 1 GiB, 100%, 204.565 MiB/s, ETA 0s
2023/03/01 11:21:44 INFO  :         1 GiB / 1 GiB, 100%, 204.565 MiB/s, ETA 0s
2023/03/01 11:21:45 INFO  :         1 GiB / 1 GiB, 100%, 204.565 MiB/s, ETA 0s
2023/03/01 11:21:45 INFO  :         1 GiB / 1 GiB, 100%, 170.638 MiB/s, ETA 0s
2023/03/01 11:21:45 INFO  :         1 GiB / 1 GiB, 100%, 170.638 MiB/s, ETA 0s
2023/03/01 11:21:45 INFO  :         1 GiB / 1 GiB, 100%, 170.638 MiB/s, ETA 0s
2023/03/01 11:21:46 INFO  :         1 GiB / 1 GiB, 100%, 170.638 MiB/s, ETA 0s
2023/03/01 11:21:46 INFO  :         1 GiB / 1 GiB, 100%, 170.638 MiB/s, ETA 0s
2023/03/01 11:21:46 INFO  :         1 GiB / 1 GiB, 100%, 146.261 MiB/s, ETA 0s
2023/03/01 11:21:46 INFO  :         1 GiB / 1 GiB, 100%, 146.261 MiB/s, ETA 0s
2023/03/01 11:21:47 INFO  : 1G: Copied (new)
2023/03/01 11:21:47 INFO  :         1 GiB / 1 GiB, 100%, 146.261 MiB/s, ETA 0s

vs

$ rclone --stats 0.25s --stats-one-line -v copy 1G :memory: --ignore-checksum
2023/03/01 11:21:55 INFO  :   126.996 MiB / 1 GiB, 12%, 0 B/s, ETA -
2023/03/01 11:21:56 INFO  :   248.996 MiB / 1 GiB, 24%, 0 B/s, ETA -
2023/03/01 11:21:56 INFO  :   358.578 MiB / 1 GiB, 35%, 0 B/s, ETA -
2023/03/01 11:21:56 INFO  :   465.996 MiB / 1 GiB, 46%, 465.989 MiB/s, ETA 1s
2023/03/01 11:21:56 INFO  :   560.289 MiB / 1 GiB, 55%, 465.989 MiB/s, ETA 0s
2023/03/01 11:21:57 INFO  :   698.996 MiB / 1 GiB, 68%, 465.989 MiB/s, ETA 0s
2023/03/01 11:21:57 INFO  :   797.996 MiB / 1 GiB, 78%, 465.989 MiB/s, ETA 0s
2023/03/01 11:21:57 INFO  :   875.461 MiB / 1 GiB, 85%, 437.539 MiB/s, ETA 0s
2023/03/01 11:21:57 INFO  :  1019.996 MiB / 1 GiB, 100%, 437.539 MiB/s, ETA 0s
2023/03/01 11:21:57 INFO  : 1G: Copied (new)
2023/03/01 11:21:57 INFO  :         1 GiB / 1 GiB, 100%, 437.539 MiB/s, ETA 0s

I managed to work out what is going on here... The first part where transfer < 100% we are reading from the file and doing the MD5 hash simultaneously, hence the reduced transfer rate. The second part where transfer == 100% we are hashing the destination file.

So I think things are as expected here, though it would be nice if it said "hashing" for the second part where transfer == 100% and I think it will do once I've done the issue above.

Perfect!

Strange, I can reproduce your demo and I can reproduce my differing observations.

Perhaps I am somehow being fooled by the built-in Windows file caching, or we are seeing two different issues.

I will report back on the GitHub issue if I find a better way to replicate.

Agree!

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.