How to use mod time and size instead of checksums during sync and check?

Hello everyone!

I'm a new user of rclone. First of all, I would like to thank you all for working on such great software!

But getting to the point. Maybe I'm missing something, but I couldn't find an answer to these questions:

  1. Is there a way to sync ("rclone sync src_loc dst_loc") two directories using only modification time and size and NOT checksums? I have a lot of files, and reading all of them and computing checksums is very slow in my case. I read in the documentation that comparing file mod time OR checksum is the default option. How do I disable checksums during sync but keep comparing by file mod time and file size?
  2. Is there a way to check ("rclone check src_loc dst_loc") two directories using only modification time and size and NOT checksums (the documentation says that the default mode compares by checksum)? It's the same case as before: it takes a lot of time. I've read about the "--size-only" option, but it doesn't use file modification time. Is it possible to enforce checking on both size and mod time?

I use rclone on Windows with default settings to sync directories on a network drive, and my commands look like this:
rclone sync "\my_network_disk\dir1" "\my_network_disk\dir2" --progress
rclone check "\my_network_disk\dir1" "\my_network_disk\dir2" --progress

Thank you in advance for any advice!

welcome to the forum,

rclone sync will compare size and mod time. have not seen it use checksums.
you can run a test using flag --dry-run and use a log file to understand what is going on.

not sure there is a reason to use rclone check after a rclone sync.
as sync does a check before copying a file.

For "rclone sync" I see that the documentation mentions a checksum (MD5SUM): "Doesn’t transfer unchanged files, testing by size and modification time or MD5SUM" (https://rclone.org/commands/rclone_sync/). That's why I was a little confused. Nonetheless, I'll try --dry-run and see what happens under the hood. Thanks for the tip :slight_smile:

not sure there is a reason to use rclone check after a rclone sync.
as sync does a check before copying a file.

I don't want to run the check command right after the sync command; I'm looking for a quicker way to verify, with high probability, whether two directories are identical.

i agree that the documentation is confusing.
as i understand it, and based on testing:

rclone sync will compare using only size and mod time, not checksum.
and produce output such as

KeePass.exe: Size and modification time the same (differ by 0s, within tolerance 100ns)
KeePass.exe: Unchanged skipping

whereas
rclone sync --checksum will compare checksum and size

KeePass.exe: MD5 = 8d555802a67b5eab0ce0efaed2724cbf OK
KeePass.exe: Size and MD5 of src and dst objects identical
KeePass.exe: Unchanged skipping

and check out this flag.
https://rclone.org/docs/#backup-dir-dir


That's great. Thank you for clarifying it. I really appreciate it :slight_smile:

So one question remains. It would be awesome if anyone could answer it :slight_smile:

based on the docs, it seems that there is no way to check with both size and mod time.
and just now, i did some tests.

i changed the mod time for keepass.exe
and then i ran rclone check

KeePass.exe: MD5 = 8d555802a67b5eab0ce0efaed2724cbf OK
KeePass.exe: OK

and then ran
rclone check --size-only

KeePass.exe: OK
KeePass.exe.config: OK

so in both cases, rclone did NOT check the mod-time

for what it is worth, i do not use rclone for non-cloud syncing and copying.
if you want to use some flag or feature unique to rclone, then use it.
on windows, there are much better and more powerful tools such as robocopy and fastcopy.


That's unfortunate. So I will try to find other tools that also check by mod time. Thank you for your time and effort :slight_smile: I'm marking this thread as solved :wink:

if i find out more info, i will update this post.


i did more testing.

if you want to check files based on mod-time, you can do this
rclone sync source dest --dry-run

and i got this output in the log

KeePass.exe: Modification times differ by 720h0m0.0006181s: 2019-04-04 08:02:42.689 -0400 EDT, 2019-05-04 08:02:42.6896181 -0400 EDT


Copy and Sync use size/modtime by default.

felix@gemini:~$ rclone sync /etc/hosts GD: -vv
2020/01/25 15:19:47 DEBUG : rclone: Version "v1.50.2" starting with parameters ["rclone" "sync" "/etc/hosts" "GD:" "-vv"]
2020/01/25 15:19:47 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2020/01/25 15:19:47 DEBUG : hosts: Size and modification time the same (differ by -519.988µs, within tolerance 1ms)
2020/01/25 15:19:47 DEBUG : hosts: Unchanged skipping
2020/01/25 15:19:47 INFO  :
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 1 / 1, 100%
Transferred:            0 / 0, -
Elapsed time:          0s

2020/01/25 15:19:47 DEBUG : 5 go routines active
2020/01/25 15:19:47 DEBUG : rclone: Version "v1.50.2" finishing with parameters ["rclone" "sync" "/etc/hosts" "GD:" "-vv"]
felix@gemini:~$ rclone sync /etc/hosts GD: -vv --checksum
2020/01/25 15:19:55 DEBUG : rclone: Version "v1.50.2" starting with parameters ["rclone" "sync" "/etc/hosts" "GD:" "-vv" "--checksum"]
2020/01/25 15:19:55 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2020/01/25 15:19:56 DEBUG : hosts: MD5 = 9da946a118249403281f7ad1e178277c OK
2020/01/25 15:19:56 DEBUG : hosts: Size and MD5 of src and dst objects identical
2020/01/25 15:19:56 DEBUG : hosts: Unchanged skipping
2020/01/25 15:19:56 INFO  :
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 1 / 1, 100%
Transferred:            0 / 0, -
Elapsed time:          0s

2020/01/25 15:19:56 DEBUG : 5 go routines active
2020/01/25 15:19:56 DEBUG : rclone: Version "v1.50.2" finishing with parameters ["rclone" "sync" "/etc/hosts" "GD:" "-vv" "--checksum"]

You have to turn on checksum.

rclone check specifically validates the size and checksum. You can remove the checksum and just check size only. If the goal is to validate, just re-run the copy or sync and it would check the size/modtime and not copy anything.

If you can see it doing checksums, can you share the -vv output?

Yes I think we all agree about those observations. Let me summarize it:

  1. rclone sync command by default uses size and mod time
  2. rclone sync with the --checksum flag uses size and MD5SUM
  3. rclone check command by default uses size and checksum
  4. rclone check with the --size-only flag uses size only and nothing more
  5. there is no way to tell rclone to check using both size and mod time without also using a checksum

So, based on these points, I think the actual behavior matches what is described in the documentation :slight_smile:

Do you mean running rclone sync or rclone copy with the --dry-run flag? I think it could work. I'll check that next time :slight_smile:

If your goal is to validate modtime/size, you can just use dry-run or even just let it run again as I am not sure what the goal is.

The use for rclone check is to validate a bit more and it uses (assuming a backend supports it) a checksum to validate that.

It depends on how much you want to validate that the data on the other side is what you'd expect. For me, I am moving media so I don't care much, and if there are no errors, I assume the copy was good.

If you run back-to-back syncs and the second has no errors, all the files are in good shape from a modtime/size standpoint, so that would be a victory from what you are describing.

Let me quickly explain. I have two directories (they have different paths). I want an answer to the question "are the contents of those directories equal?" I don't want to modify either directory. I want the most accurate answer, but it would be best if I could get an answer without reading file contents.
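For illustration only, here is a minimal Python sketch of a comparison that looks at size and modification time without opening any file. It is not part of rclone and does not reproduce rclone's own logic; the names snapshot, compare_dirs and the tolerance parameter are hypothetical and chosen just for this example.

```python
import os


def snapshot(root):
    """Map each file's path relative to root to (size, mtime), using stat metadata only."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)  # metadata lookup; file contents are never read
            entries[os.path.relpath(full, root)] = (info.st_size, info.st_mtime)
    return entries


def compare_dirs(src, dst, tolerance=1.0):
    """Return (only_in_src, only_in_dst, differing) as sorted lists of relative paths.

    tolerance is the allowed modification-time difference in seconds.
    It is an assumption of this sketch, not an rclone setting.
    """
    a, b = snapshot(src), snapshot(dst)
    only_in_src = sorted(set(a) - set(b))
    only_in_dst = sorted(set(b) - set(a))
    differing = sorted(
        path
        for path in set(a) & set(b)
        if a[path][0] != b[path][0] or abs(a[path][1] - b[path][1]) > tolerance
    )
    return only_in_src, only_in_dst, differing
```

Calling compare_dirs on the two directory paths returns three lists; if all three are empty, the trees look the same by name, size and modification time. This still costs one metadata lookup per file, which can be slow on a network drive, but it avoids reading file data.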

words like equal, accurate and best are fuzzy words with no real meaning.
it all depends on your use case.

the only way to know with certainty is to checksum the files.
any other solution is just a set of compromises.

for your use case, i think this is the best
rclone sync source dest --dry-run

Actually, I didn't want to be precise, because I wanted to communicate a rough idea of my goal without going into too many details that would obscure the bigger picture.

OK, I'll rephrase it. I have two directories. I want to know if those two directories "look" the same. By "look" I mean: "do whatever you need on each file, but don't read more than 5KB per file." And yes, I'm accepting compromises; I know it won't be a 100% accurate answer. My case is that my disks are really slow and I don't want to read all the data from each file because it's a slow process. By "slow" I mean the time is comparable to running "rclone sync" into an empty directory on the same data volumes. Yes, I could buy faster disks and computers, etc. But I don't want to do that (for various reasons).

To be precise, a checksum also doesn't guarantee that files are equal, so it's also a kind of compromise :stuck_out_tongue_winking_eye: There can be hash collisions. Only a byte-by-byte comparison gives a 100% sure answer. But that's off-topic and not important in this discussion :slight_smile:

Yes, I think that fulfills my needs. Thank you for the help :slight_smile:

You could also use rclone lsf to just check the things you are interested in. So if you are interested in size and modtime, you can list like this and then do a diff:

rclone lsf --files-only -R --csv --format pst dir1 | sort > dir1-files
rclone lsf --files-only -R --csv --format pst dir2 | sort > dir2-files
diff dir1-files dir2-files

The clever thing here is the --csv --format pst, which means the output is a CSV file with three fields: p = path, s = size and t = time.
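If diff is not available (for example on a plain Windows install), the two listing files can also be compared with a short script. The following is a rough Python sketch that assumes the files were produced by the lsf commands above, are saved as UTF-8, and hold path, size and time as their three columns. The function names load_listing and diff_listings are made up for this example, and the time column is compared as an exact string with no tolerance.

```python
import csv


def load_listing(listing_file):
    """Read rows of path,size,time into a dict {path: (size, time)}."""
    with open(listing_file, newline="", encoding="utf-8") as handle:
        return {
            row[0]: (row[1], row[2])
            for row in csv.reader(handle)
            if len(row) >= 3  # skip blank or malformed lines
        }


def diff_listings(left_file, right_file):
    """Return a list of (status, path) tuples describing how the listings differ."""
    left, right = load_listing(left_file), load_listing(right_file)
    report = []
    for path in sorted(set(left) | set(right)):
        if path not in right:
            report.append(("only in left", path))
        elif path not in left:
            report.append(("only in right", path))
        elif left[path] != right[path]:
            report.append(("differs", path))  # size or time string is different
    return report
```

An empty list from diff_listings("dir1-files", "dir2-files") means both listings agree on every path, size and time. Because this matches paths through a dictionary, the sort step is not needed for this approach.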
