'rclone checksum' with '--min-age' or '--max-age' incorrect behavior

What is the problem you are having with rclone?

The behavior of 'rclone checksum' with '--min-age' or '--max-age' seems incorrect (note: for the demonstration I used a data directory containing a single file named "aaa.txt" that was created a few seconds before running the commands):

  • when running 'rclone checksum' without '--min-age' or '--max-age' the behavior is correct:

    • combined file shows: "= aaa.txt"
  • when running 'rclone checksum' with '--min-age 1d' or '--max-age 1ns' I expect rclone to simply ignore the file, but instead the behavior is:

    • combined file shows: "+ aaa.txt"
    • rclone with '-vv' reports "1 files missing, 1 differences found, 1 errors while checking" and "Errors: 2 (retrying may help)"

Run the command 'rclone version' and share the full output of the command.

rclone v1.66.0
- os/version: Microsoft Windows 10 Pro 22H2 (64 bit)
- os/kernel: 10.0.19045.3693 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.22.1
- go/linking: static
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

none

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone hashsum md5 "C:\TEST" --output-file "DATA/checksums.txt"

rclone checksum md5 "DATA/checksums.txt" "C:\TEST" --differ "DATA/diff_modified.txt" --missing-on-dst "DATA/diff_miss_dst.txt" --missing-on-src "DATA/diff_miss_src.txt" --combined "DATA/diff_all.txt" --log-file "DATA/log.txt" -vv

rclone checksum md5 "DATA/checksums.txt" "C:\TEST" --differ "DATA/diff_modified.txt" --missing-on-dst "DATA/diff_miss_dst.txt" --missing-on-src "DATA/diff_miss_src.txt" --combined "DATA/diff_all.txt" --log-file "DATA/log.txt" -vv --min-age 1d

rclone checksum md5 "DATA/checksums.txt" "C:\TEST" --differ "DATA/diff_modified.txt" --missing-on-dst "DATA/diff_miss_dst.txt" --missing-on-src "DATA/diff_miss_src.txt" --combined "DATA/diff_all.txt" --log-file "DATA/log.txt" -vv --max-age 1ns

The rclone config contents with secrets removed.

empty (no config)

A log from the command with the -vv flag

2023/12/06 23:38:52 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv"]
2023/12/06 23:38:53 DEBUG : Creating backend with remote "DATA/checksums.txt"
2023/12/06 23:38:53 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2023/12/06 23:38:53 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2023/12/06 23:38:53 DEBUG : Creating backend with remote "C:\\TEST"
2023/12/06 23:38:53 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2023/12/06 23:38:53 DEBUG : aaa.txt: md5 = 96a3be3cf272e017046d1b2674a52bd3 OK
2023/12/06 23:38:53 NOTICE: Local file system at //?/C:/TEST: 0 differences found
2023/12/06 23:38:53 NOTICE: Local file system at //?/C:/TEST: 1 matching files
2023/12/06 23:38:53 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.1s

2023/12/06 23:38:53 DEBUG : 2 go routines active
2023/12/06 23:39:03 DEBUG : --min-age 1d to 2023-12-05 23:39:03.8735471 +0000 UTC m=-86399.789625799
2023/12/06 23:39:03 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv" "--min-age" "1d"]
2023/12/06 23:39:03 DEBUG : Creating backend with remote "DATA/checksums.txt"
2023/12/06 23:39:03 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2023/12/06 23:39:03 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2023/12/06 23:39:03 DEBUG : Creating backend with remote "C:\\TEST"
2023/12/06 23:39:03 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2023/12/06 23:39:03 DEBUG : aaa.txt: Excluded (ModTime Filter)
2023/12/06 23:39:03 DEBUG : aaa.txt: Excluded
2023/12/06 23:39:03 ERROR : aaa.txt: file not in Local file system at //?/C:/TEST
2023/12/06 23:39:03 NOTICE: Local file system at //?/C:/TEST: 1 files missing
2023/12/06 23:39:03 NOTICE: Local file system at //?/C:/TEST: 1 differences found
2023/12/06 23:39:03 NOTICE: Local file system at //?/C:/TEST: 1 errors while checking
2023/12/06 23:39:03 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Errors:                 2 (retrying may help)
Elapsed time:         0.1s

2023/12/06 23:39:03 DEBUG : 2 go routines active
2023/12/06 23:39:03 Failed to checksum with 2 errors: last error was: file not in Local file system at //?/C:/TEST
2023/12/06 23:39:10 DEBUG : --max-age 1ns to 2023-12-06 23:39:10.851909299 +0000 UTC m=+0.211457000
2023/12/06 23:39:10 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv" "--max-age" "1ns"]
2023/12/06 23:39:10 DEBUG : Creating backend with remote "DATA/checksums.txt"
2023/12/06 23:39:10 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2023/12/06 23:39:10 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2023/12/06 23:39:10 DEBUG : Creating backend with remote "C:\\TEST"
2023/12/06 23:39:10 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2023/12/06 23:39:10 DEBUG : aaa.txt: Excluded (ModTime Filter)
2023/12/06 23:39:10 DEBUG : aaa.txt: Excluded
2023/12/06 23:39:10 ERROR : aaa.txt: file not in Local file system at //?/C:/TEST
2023/12/06 23:39:10 NOTICE: Local file system at //?/C:/TEST: 1 files missing
2023/12/06 23:39:10 NOTICE: Local file system at //?/C:/TEST: 1 differences found
2023/12/06 23:39:10 NOTICE: Local file system at //?/C:/TEST: 1 errors while checking
2023/12/06 23:39:10 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Errors:                 2 (retrying may help)
Elapsed time:         0.1s

2023/12/06 23:39:10 DEBUG : 2 go routines active
2023/12/06 23:39:10 Failed to checksum with 2 errors: last error was: file not in Local file system at //?/C:/TEST

Is it possible that your computer's clock is set incorrectly? The logs suggest that your machine thinks today's date is 2023/12/06 which, unless I'm extremely confused, was several months ago.

Another possibility is that these are old logs, but I don't think so, as rclone v1.66.0 did not yet exist on that date.

It could be relevant, as --max-age and --min-age on local will be relative to whatever your machine thinks is the current date and time.


The machine clock is indeed set to an earlier date because it's a virtual machine.

However this bug exists regardless because all the actions I performed were done inside the virtual machine (that is for the demonstration of the bug - I created a test folder and within it a test file called "aaa.txt" and immediately after ran the mentioned commands -- all inside the virtual machine).

You can also see in the output that '--min-age' and '--max-age' were converted to dates relative to the virtual machine's clock (since '--min-age' and '--max-age' are relative).

So unfortunately this isn't the reason.

To be sure, I synced the machine's clock to the current time and tried again. I can confirm that the bug still exists. This is the output:

2024/03/26 21:40:32 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv"]
2024/03/26 21:40:32 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/26 21:40:32 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/26 21:40:32 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/26 21:40:32 DEBUG : Creating backend with remote "C:\\TEST"
2024/03/26 21:40:32 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2024/03/26 21:40:32 DEBUG : aaa.txt: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2024/03/26 21:40:32 NOTICE: Local file system at //?/C:/TEST: 0 differences found
2024/03/26 21:40:32 NOTICE: Local file system at //?/C:/TEST: 1 matching files
2024/03/26 21:40:32 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.1s

2024/03/26 21:40:32 DEBUG : 2 go routines active
2024/03/26 21:40:57 DEBUG : --min-age 1d to 2024-03-25 21:40:57.2493166 +0000 UTC m=-86399.812718899
2024/03/26 21:40:57 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv" "--min-age" "1d"]
2024/03/26 21:40:57 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/26 21:40:57 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/26 21:40:57 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/26 21:40:57 DEBUG : Creating backend with remote "C:\\TEST"
2024/03/26 21:40:57 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2024/03/26 21:40:57 DEBUG : aaa.txt: Excluded (ModTime Filter)
2024/03/26 21:40:57 DEBUG : aaa.txt: Excluded
2024/03/26 21:40:57 ERROR : aaa.txt: file not in Local file system at //?/C:/TEST
2024/03/26 21:40:57 NOTICE: Local file system at //?/C:/TEST: 1 files missing
2024/03/26 21:40:57 NOTICE: Local file system at //?/C:/TEST: 1 differences found
2024/03/26 21:40:57 NOTICE: Local file system at //?/C:/TEST: 1 errors while checking
2024/03/26 21:40:57 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Errors:                 2 (retrying may help)
Elapsed time:         0.1s

2024/03/26 21:40:57 DEBUG : 2 go routines active
2024/03/26 21:40:57 Failed to checksum with 2 errors: last error was: file not in Local file system at //?/C:/TEST
2024/03/26 21:41:01 DEBUG : --max-age 1ns to 2024-03-26 21:41:01.188937299 +0000 UTC m=+0.198870100
2024/03/26 21:41:01 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "C:\\TEST" "--differ" "DATA/diff_modified.txt" "--missing-on-dst" "DATA/diff_miss_dst.txt" "--missing-on-src" "DATA/diff_miss_src.txt" "--combined" "DATA/diff_all.txt" "--log-file" "DATA/log.txt" "-vv" "--max-age" "1ns"]
2024/03/26 21:41:01 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/26 21:41:01 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/26 21:41:01 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/26 21:41:01 DEBUG : Creating backend with remote "C:\\TEST"
2024/03/26 21:41:01 DEBUG : fs cache: renaming cache item "C:\\TEST" to be canonical "//?/C:/TEST"
2024/03/26 21:41:01 DEBUG : aaa.txt: Excluded (ModTime Filter)
2024/03/26 21:41:01 DEBUG : aaa.txt: Excluded
2024/03/26 21:41:01 ERROR : aaa.txt: file not in Local file system at //?/C:/TEST
2024/03/26 21:41:01 NOTICE: Local file system at //?/C:/TEST: 1 files missing
2024/03/26 21:41:01 NOTICE: Local file system at //?/C:/TEST: 1 differences found
2024/03/26 21:41:01 NOTICE: Local file system at //?/C:/TEST: 1 errors while checking
2024/03/26 21:41:01 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Errors:                 2 (retrying may help)
Elapsed time:         0.1s

2024/03/26 21:41:01 DEBUG : 2 go routines active
2024/03/26 21:41:01 Failed to checksum with 2 errors: last error was: file not in Local file system at //?/C:/TEST

I think the actual problem here is that sumfiles don't have modtimes -- they only have a path and a hash. So there is no way to filter them out on the source side based on age.

Therefore, the real file on the dest side is getting correctly filtered out, but since we don't know the modtime of the file on the source side, all we can conclude is that the file is present on the source and missing on the dest.

So, I think the behavior is correct. Let me know if I'm missing something?
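To make the explanation above concrete, here is a rough Python model of the comparison. It is not rclone's actual code; the function names and the in-memory data layout are made up for illustration. The only things taken from this thread are that a sumfile line carries just a hash and a path, and that the combined report uses "=" for a match and "+" for a file missing on the dest.

```python
from datetime import datetime, timedelta


def parse_sumfile(text):
    """Parse md5sum-style lines ("<hash>  <path>") into {path: hash}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, path = line.partition("  ")
        entries[path] = digest
    return entries


def check(sum_entries, dest_files, now, min_age=None):
    """Return combined-style report lines.

    dest_files maps path -> (hash, modtime). The age filter can only be
    applied to dest_files, because sum_entries carry no modtime at all.
    """
    visible = {
        path: digest
        for path, (digest, modtime) in dest_files.items()
        if min_age is None or modtime <= now - min_age
    }
    report = []
    for path in sorted(set(sum_entries) | set(visible)):
        if path not in visible:
            report.append(f"+ {path}")  # in the sumfile, not seen on dest
        elif path not in sum_entries:
            report.append(f"- {path}")  # on dest, not in the sumfile
        elif sum_entries[path] == visible[path]:
            report.append(f"= {path}")  # hashes match
        else:
            report.append(f"* {path}")  # hashes differ
    return report


now = datetime(2024, 3, 26, 21, 40, 57)
sums = parse_sumfile("d41d8cd98f00b204e9800998ecf8427e  aaa.txt\n")
dest = {"aaa.txt": ("d41d8cd98f00b204e9800998ecf8427e", now - timedelta(seconds=10))}

print(check(sums, dest, now))                             # ['= aaa.txt']
print(check(sums, dest, now, min_age=timedelta(days=1)))  # ['+ aaa.txt']
```

With the age filter on, the dest file disappears from view while the sumfile entry stays, which is the "+ aaa.txt" result reported in the first post.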

I can see what you mean, and with your explanation in mind I understand why it happens with 'rclone checksum' and why it works well with other commands like 'rclone check' and 'rclone copy'.

However to me personally, these semantics seem off for 'rclone checksum' as I'd expect the files to be filtered out entirely and not reported as missing (as this command is mostly used to check for data integrity and not sync files).

I guess it's an implementation choice, although as it stands it's not very intuitive, so maybe it should be explained in the docs for 'rclone checksum'...

It sounds like in your particular case, if we had a way of knowing the source's modtime, we would find that it matches the dest. But what if it didn't? Without a way of knowing the modtime of the source at the moment the SUM file snapshot was taken, both possibilities are equally likely. I think there are data integrity use cases for the current behavior -- for example, suppose the SUM file snapshot was taken a long time ago, and we want to use --min-age to verify that files haven't been modified recently? In that scenario, one would want this reported as a difference, not silently ignored.

Some ideas for new features / improvements:

  • Perhaps --min-age and --max-age should log a warning when at least one side doesn't support modtime.
  • Perhaps there should be a new flag to make the assumption that the unknown side's modtime matches the known side for the purposes of age filters
  • Perhaps the SUM file format should be expanded to optionally support modtime (and size)
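As a sketch of what the second idea might mean in practice (purely hypothetical: no such flag exists in rclone as far as this thread shows, and the function name and data layout below are invented), the check could treat a sumfile entry as filtered whenever its counterpart on the known side was filtered by age:

```python
from datetime import datetime, timedelta


def check_assuming_same_modtime(sum_entries, dest_files, now, min_age):
    """Hypothetical variant: assume the sumfile side has the dest's modtime.

    sum_entries maps path -> hash; dest_files maps path -> (hash, modtime).
    A path whose dest file is too new is dropped from BOTH sides.
    """
    cutoff = now - min_age
    too_new = {
        path for path, (_digest, modtime) in dest_files.items() if modtime > cutoff
    }
    report = []
    for path in sorted(set(sum_entries) | set(dest_files)):
        if path in too_new:
            continue  # filtered out entirely, not reported as missing
        if path not in dest_files:
            report.append(f"+ {path}")  # genuinely absent on dest
        elif path not in sum_entries:
            report.append(f"- {path}")
        elif sum_entries[path] == dest_files[path][0]:
            report.append(f"= {path}")
        else:
            report.append(f"* {path}")
    return report


now = datetime(2024, 3, 26, 21, 40, 57)
sums = {"aaa.txt": "h1", "old.txt": "h2", "gone.txt": "h3"}
dest = {
    "aaa.txt": ("h1", now - timedelta(seconds=10)),  # too new, ignored
    "old.txt": ("h2", now - timedelta(days=30)),     # old enough, checked
}
print(check_assuming_same_modtime(sums, dest, now, timedelta(days=1)))
# ['+ gone.txt', '= old.txt']
```

The trade-off is the one described above: a file that was modified recently but should not have been would be silently skipped under this assumption.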

See also this proposed change:

One other suggestion that occurs to me in your particular case -- perhaps you should also include the age filter in your hashsum command when generating the SUM file? That way it should be excluded from the source side as well. It may not make sense depending on what you're using this for, and when -- but something to consider.

Feel free to submit a PR with the proposed doc fix!

Thanks for the suggestions.
I really like all your ideas for new features / improvements.
In particular the 2nd and the 3rd points.

The way I see it there are two ways to check for data integrity:

  1. compare the checksums of different data copies (the issue is that this takes a lot of time if you have a lot of data and multiple data copies) - rclone supports this case very well.
  2. compare the checksums of the current state of one data copy and the past checksums of the same data copy (the benefit here is that this is faster so it can be done more often) - I've tried multiple ways to achieve this with rclone (including using the hasher functionality which currently seems to be lacking) and unless I missed something there's no "smooth" way to achieve it since each method I've tried such as using 'rclone checksum' with '--min-age' presents its own issues. I think your 2nd and even more so your 3rd ideas are great since they would really allow to achieve this.

If I understand your suggestion correctly, it won't help in my case. What I do is generate the SUM file for my main data copy; the data is then used (new files created, existing files modified) for some period (say two weeks), and after that period I'd like to validate data integrity by comparing the current checksums against those in the originally generated SUM file. What '--min-age' achieves is to filter out the files that were created or modified since the SUM file was created.

However, what I do right now is something similar: I use 'rclone hashsum' with '--min-age', not when first generating the SUM file but on subsequent runs, and then manually compare the files for differences using vscode (usually there are none, so it works fine). That's why I wanted 'rclone checksum' with '--min-age' to work, so there would be no need for the manual comparison step :)
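The manual comparison step described above could be scripted along these lines. This is only a sketch, assuming both SUM files use the md5sum-style "<hash>  <path>" layout; the function names are made up, and only paths present in both files are compared, since paths missing from the newer file were either filtered out by age or deleted.

```python
def parse_sumfile(text):
    """Parse md5sum-style lines ("<hash>  <path>") into {path: hash}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, path = line.partition("  ")
        entries[path] = digest
    return entries


def changed_hashes(old_text, new_text):
    """Return paths listed in both SUM files whose hashes differ."""
    old = parse_sumfile(old_text)
    new = parse_sumfile(new_text)
    return sorted(path for path in old.keys() & new.keys() if old[path] != new[path])
```

Reading the two SUM files into strings and passing them to `changed_hashes` returns an empty list when nothing has changed, which corresponds to the "usually no differences" case.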

I think I maybe just thought of a way you could achieve what you are looking for: first generate a list of all the files that match the age filter, then use this list as a --files-from filter for your check.

Example:

rclone lsf --absolute --files-only --max-age 1d "C:\TEST" > filelist.txt
rclone checksum md5 "DATA/checksums.txt" "C:\TEST" --files-from-raw filelist.txt --differ "DATA/diff_modified.txt" --missing-on-dst "DATA/diff_miss_dst.txt" --missing-on-src "DATA/diff_miss_src.txt" --combined "DATA/diff_all.txt" --log-file "DATA/log.txt" -vv

Since the second command filters based on filename instead of modtime, I think it should work on both sides.

Note that this does make the assumption that all of the source files exist on the dest. If this is a problem, you might consider reversing the logic and excluding the files you definitely don't care about.

Another suggestion is to try the --one-way flag:

If you supply the `--one-way` flag, it will only check that files in the source match the files in the destination, not the other way around. This means that extra files in the destination that are not in the source will not be detected.

This will achieve the "created" part but not the "modified" part. (To be honest, I'm still sort of struggling to understand why you want to ignore modified files... if a file was modified but shouldn't have been, wouldn't you want to know that?)

I agree the 3rd idea especially would be useful for all sorts of reasons. Would you like to submit a Feature Request on GitHub?

Great idea!
I wasn't familiar with the 'lsf' command and the '--files-from-raw' filter.

I've tried it and it works with a slight modification (added '-R' for recursive and changed '--max-age' to '--min-age'):

rclone lsf -R --absolute --files-only --min-age 1d "C:\TEST" > filelist.txt
rclone checksum md5 "DATA/checksums.txt" "C:\TEST" --files-from-raw filelist.txt --differ "DATA/diff_modified.txt" --missing-on-dst "DATA/diff_miss_dst.txt" --missing-on-src "DATA/diff_miss_src.txt" --combined "DATA/diff_all.txt" --log-file "DATA/log.txt" -vv
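For anyone who wants to see the logic of this two-step approach spelled out, here is a pure-Python stand-in. It does not call rclone and is not how rclone implements these commands; the function names are invented, and it assumes the sumfile has already been parsed into a {relative path: md5} dict with forward-slash paths.

```python
import hashlib
import os
import time


def list_files_older_than(root, min_age_seconds, now=None):
    """Step 1: recursively list files whose modtime is at least min_age old."""
    now = time.time() if now is None else now
    cutoff = now - min_age_seconds
    selected = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) <= cutoff:
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                selected.append(rel)
    return sorted(selected)


def md5_of(path):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_selected(sum_entries, root, selected):
    """Step 2: check only the selected names, so the filter hits both sides."""
    report = []
    for rel in selected:
        if rel not in sum_entries:
            report.append(f"- {rel}")  # on disk, not in the sumfile
        elif md5_of(os.path.join(root, rel)) == sum_entries[rel]:
            report.append(f"= {rel}")
        else:
            report.append(f"* {rel}")  # same name and age, different content
    return report
```

Because the selection is by name, sumfile entries for files that are too new (or that no longer exist on disk) are never looked at, which mirrors the caveat mentioned earlier about files missing on the dest.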

I've tried '--one-way' but it doesn't work - I believe '--one-way' would need to work in the reverse direction for this... Is there such a reverse '--one-way' functionality?

I'm mainly interested in detecting data degradation in which case the file's date and size are not modified (only the checksum).

I have a little issue with my GitHub user which will probably be resolved soon (long story) and then I'd be happy to.

Good catch -- you are right, -R is important!

Yes and no... with rclone check it is possible as you can just switch the src and dst args, but with rclone checksum the sumfile is always the src, so it's not really possible to go the other direction. That would be another good feature request -- I don't think it would be too hard to implement this.

That said, I think it is already the direction you would want. You aren't interested in files that aren't in your sumfile, right?

One other thought is to use --missing-on-dst / --missing-on-src and just discard whichever one you don't want.

Ok, I think I understand what you're getting at now. The --files-from method is probably the best option for now, and a feature request for a better one would be helpful.

Note also that if bit rot is your main concern, using --download can be a more foolproof method than checking stored hashes, as some platforms simply return whatever hash was stored at the time of upload (i.e. they won't necessarily catch bit rot that happened later.) There's an interesting discussion about that here and here.

Great info, I appreciate it thanks!
