S3 takes too long, --checksum misses Excel files


#1

I’ve been using rclone on Linux for almost two years now, and syncing my data to the cloud daily gives me a nice sense of security. But I recently ran into something that caused me a little panic.

I sync to B2 and S3 nightly for redundancy. I happened to pull a specific Excel file off S3 and it appeared to be missing some entries… slight panic… I checked the copy on my network drive and it was all there. I checked B2 and it was all there too.

So I looked at the cron commands I’ve set up to do the sync, and the only difference is that S3 uses --checksum. After some thought and experimentation I remembered why I did this: S3 takes much longer than B2 and produces far more transactions.

First question: why does it miss changes in Excel files with --checksum? (Sorry, I know little about the inner workings of this.) Second question: is there any way to reduce the number of transactions that occur with S3 during a sync?

I’m using the newest 64-bit Linux executable. I downloaded it this morning before posting; previously I was on 1.3x.


#2

Without --checksum it is using size. If the sizes match, it assumes the file is ‘good’. You could run a less frequent --checksum sync in addition to your regular sync to catch this periodically, or just switch the main sync to --checksum if this is a problem for you.


#3

The sizes do match, so that part makes sense, but my interpretation is that --checksum uses the checksum instead of mod-time. Here’s the description of that command-line switch:

Skip based on checksum & size, not mod-time & size

I’m not sure which checksum this is (one every file has, or one rclone generates), but I’d think that if the file is different at all, the checksum shouldn’t match.
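A minimal Python sketch of why a size-only comparison can miss an edit that a hash comparison catches. The two byte strings are hypothetical stand-ins for two versions of a spreadsheet (a real .xlsx is a zip archive, but the principle is the same): changing one value can leave the byte count unchanged while the MD5 digest changes.

```python
import hashlib

# Hypothetical file contents: same length, one value edited (300 -> 399).
old_version = b"Invoice,Amount\nA-100,250\nA-101,300\n"
new_version = b"Invoice,Amount\nA-100,250\nA-101,399\n"

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()

same_size = len(old_version) == len(new_version)
same_hash = md5_hex(old_version) == md5_hex(new_version)

print("sizes match:", same_size)   # sizes match: True
print("hashes match:", same_hash)  # hashes match: False
```

A check that stops at size would call these two files identical; any real hash comparison would not.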

Is the B2 protocol just that much better than S3 in keeping the number of transactions low?


#4

Sorry, I misunderstood. I thought you were disabling checksum. Are you syncing the contents of a crypt? If so, crypt doesn’t store checksums, so it would default to size unless you explicitly enable it (like you did), which would in turn download the file to check its checksum before syncing.

If it isn’t a crypt, then you can run rclone check -v to see what’s going on with the file(s) in question.


#5

It is crypt, yes. So by default it must then check mod-time and size? Because it catches the difference if I don’t add any extraneous switches. It just runs up a large number of transactions.

which would in turn download the file to check its checksum before syncing

If you mean it downloads the entire file, I know that can’t be happening, as the sync takes about 5 minutes on a 180GB share over a 50/5 Mb internet connection.
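A quick back-of-the-envelope check of that reasoning, assuming decimal gigabytes, the 50 Mb/s figure as the download rate, and a link running flat out with no protocol overhead (all simplifying assumptions):

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours to move size_gb (decimal gigabytes) over a link_mbps (megabits/s) link,
    assuming full line rate and no protocol overhead."""
    megabits = size_gb * 1000 * 8  # GB -> MB -> Mb
    seconds = megabits / link_mbps
    return seconds / 3600

print(transfer_hours(180, 50))  # 8.0
```

Roughly 8 hours even in the best case, against the ~5 minutes observed, so downloading every file in full can’t be what is happening.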

Maybe I could use --ignore-size and let it check mod-time only? Would that significantly reduce transactions? I wish it were easy to find the transaction count on S3. I used to know where it was!


#6

You must be syncing to S3 outside of crypt, then, and syncing the actual encrypted files. If you do that, it can check the checksums. It is within the crypt that it cannot check them.
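A toy illustration of why the checksums can’t be used through crypt. The "encryption" below is a hypothetical stand-in (random nonce plus an XOR keystream, NOT real cryptography), but it shares the relevant property with rclone’s crypt: each upload gets a fresh nonce, so the hash the storage backend computes is of ciphertext that differs both from your file and from any earlier upload of the same file.

```python
import hashlib
import os

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy stand-in for crypt (NOT real cryptography): a fresh random nonce is
    prepended and the data is XORed with a SHA-256 based keystream."""
    nonce = os.urandom(24)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

key = b"hypothetical-key"
data = b"same spreadsheet bytes"
local_md5 = hashlib.md5(data).hexdigest()
upload_1 = hashlib.md5(toy_encrypt(data, key)).hexdigest()
upload_2 = hashlib.md5(toy_encrypt(data, key)).hexdigest()

print(local_md5 == upload_1)  # False: the backend hashes ciphertext, not your file
print(upload_1 == upload_2)   # False: a new nonce means new ciphertext every time
```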

You may want to share your full commands and try the check/cryptcheck functions as those will give you more information.


#7

I’m syncing the crypt, “S3”, which is “type = crypt” and “remote = S3_BASE:”. “S3_BASE” is “type = s3”.

The command is fairly basic:

/usr/sbin/rclone sync --transfers=3 --bwlimit "05:00,400K 22:00,700K" -v /NetworkShare/backup/ S3:
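The --bwlimit value here is a timetable. As the rclone docs describe it, each entry sets the limit from its start time until the next entry, and the last entry carries over past midnight until the first one. The Python below is a hypothetical model of that lookup (not rclone’s code); rate strings are passed through unchanged rather than converted to bytes.

```python
from datetime import time

def parse_timetable(spec: str) -> list[tuple[time, str]]:
    """Split a 'HH:MM,RATE HH:MM,RATE' string into sorted (start, rate) pairs."""
    entries = []
    for item in spec.split():
        clock, rate = item.split(",")
        hour, minute = map(int, clock.split(":"))
        entries.append((time(hour, minute), rate))
    return sorted(entries)

def active_limit(spec: str, now: time) -> str:
    """Return the rate in force at `now`: the latest entry at or before `now`,
    wrapping to the last entry of the day if `now` is before the first one."""
    entries = parse_timetable(spec)
    current = entries[-1][1]  # before the first slot, the previous day's last slot applies
    for start, rate in entries:
        if start <= now:
            current = rate
    return current

spec = "05:00,400K 22:00,700K"
print(active_limit(spec, time(3, 0)))    # 700K
print(active_limit(spec, time(12, 0)))   # 400K
print(active_limit(spec, time(23, 30)))  # 700K
```

Note that the timetable must reach the shell in straight quotes; typographic quotes pasted from a web page are not treated as quoting by the shell.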

That does a perfect sync. Once I add --checksum it syncs much faster but misses some Excel files, and possibly others.


#8

Maybe this will help you understand what is happening.

Without checksum:

notebook2:/data/bin$ rclone sync robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ --checkers=8  --include=*.nfo -vv --dry-run  2>&1 | egrep -v "Excluded| OK" 
2019/01/09 13:01:15 DEBUG : rclone: Version "v1.45-034-gc1dd7678-beta" starting with parameters ["rclone" "sync" "robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "--checkers=8" "--include=*.nfo" "-vv" "--dry-run"]
2019/01/09 13:01:15 DEBUG : Using config file from "/home//.rclone.conf"
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E01.1080p.webdl.h264.nfo: Modification times differ by -202h2m10.534s: 2019-01-09 15:11:05.937 +0000 UTC, 2019-01-01 05:08:55.403 +0000 UTC
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E01.1080p.webdl.h264.nfo: Not copying as --dry-run
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E02.1080p.webdl.h264.nfo: Modification times differ by -202h2m9.198s: 2019-01-09 15:11:08.021 +0000 UTC, 2019-01-01 05:08:58.823 +0000 UTC
2019/01/09 13:01:30 INFO  : Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': Waiting for checks to finish
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E03.1080p.webdl.h264.nfo: Modification times differ by -202h2m7.822s: 2019-01-09 15:11:09.889 +0000 UTC, 2019-01-01 05:09:02.067 +0000 UTC
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E02.1080p.webdl.h264.nfo: Not copying as --dry-run
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E03.1080p.webdl.h264.nfo: Not copying as --dry-run
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E04.1080p.webrip.h264.dd+5.1.nfo: Modification times differ by -202h2m5.643s: 2019-01-09 15:11:11.625 +0000 UTC, 2019-01-01 05:09:05.982 +0000 UTC
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E06.1080p.webrip.h264.nfo: Sizes differ (src 3396 vs dst 3390)
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E05.720p.webdl.h264.dd+5.1.nfo: Modification times differ by -202h2m3.571s: 2019-01-09 15:11:13.413 +0000 UTC, 2019-01-01 05:09:09.842 +0000 UTC
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E05.720p.webdl.h264.dd+5.1.nfo: Not copying as --dry-run
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E06.1080p.webrip.h264.nfo: Not copying as --dry-run
2019/01/09 13:01:30 DEBUG : Escape.at.Dannemora.S01E07.1080p.webrip.h264.dd+5.1.nfo: Sizes differ (src 3471 vs dst 3461)
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E04.1080p.webrip.h264.dd+5.1.nfo: Not copying as --dry-run
2019/01/09 13:01:30 INFO  : Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': Waiting for transfers to finish
2019/01/09 13:01:30 NOTICE: Escape.at.Dannemora.S01E07.1080p.webrip.h264.dd+5.1.nfo: Not copying as --dry-run
2019/01/09 13:01:30 INFO  : Waiting for deletions to finish
2019/01/09 13:01:30 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 7 / 7, 100%
Transferred:            7 / 7, 100%
Elapsed time:       15.4s

With checksum:

notebook2:/data/bin$ rclone sync robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ --checkers=8  --include=*.nfo -vv --checksum --dry-run  2>&1 | egrep -v "Excluded| OK"
2019/01/09 13:06:04 DEBUG : rclone: Version "v1.45-034-gc1dd7678-beta" starting with parameters ["rclone" "sync" "robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "--checkers=8" "--include=*.nfo" "-vv" "--checksum" "--dry-run"]
2019/01/09 13:06:04 DEBUG : Using config file from "/home//.rclone.conf"
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E01.1080p.webdl.h264.nfo: Size of src and dst objects identical
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E01.1080p.webdl.h264.nfo: Unchanged skipping
2019/01/09 13:06:11 INFO  : Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': Waiting for checks to finish
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E02.1080p.webdl.h264.nfo: Size of src and dst objects identical
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E02.1080p.webdl.h264.nfo: Unchanged skipping
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E03.1080p.webdl.h264.nfo: Size of src and dst objects identical
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E03.1080p.webdl.h264.nfo: Unchanged skipping
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E05.720p.webdl.h264.dd+5.1.nfo: Size of src and dst objects identical
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E05.720p.webdl.h264.dd+5.1.nfo: Unchanged skipping
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E04.1080p.webrip.h264.dd+5.1.nfo: Size of src and dst objects identical
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E06.1080p.webrip.h264.nfo: Sizes differ (src 3396 vs dst 3390)
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E04.1080p.webrip.h264.dd+5.1.nfo: Unchanged skipping
2019/01/09 13:06:11 DEBUG : Escape.at.Dannemora.S01E07.1080p.webrip.h264.dd+5.1.nfo: Sizes differ (src 3471 vs dst 3461)
2019/01/09 13:06:11 INFO  : Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': Waiting for transfers to finish
2019/01/09 13:06:11 NOTICE: Escape.at.Dannemora.S01E06.1080p.webrip.h264.nfo: Not copying as --dry-run
2019/01/09 13:06:11 NOTICE: Escape.at.Dannemora.S01E07.1080p.webrip.h264.dd+5.1.nfo: Not copying as --dry-run
2019/01/09 13:06:11 INFO  : Waiting for deletions to finish
2019/01/09 13:06:11 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:                 7 / 7, 100%
Transferred:            2 / 2, 100%
Elapsed time:        7.4s

Cryptcheck

notebook2:/data/bin$ rclone cryptcheck robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/ --checkers=8  --include=*.nfo -vv   2>&1 | egrep -v "Excluded| OK"
2019/01/09 12:58:35 DEBUG : rclone: Version "v1.45-034-gc1dd7678-beta" starting with parameters ["rclone" "cryptcheck" "robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/" "--checkers=8" "--include=*.nfo" "-vv"]
2019/01/09 12:58:35 DEBUG : Using config file from "/home//.rclone.conf"
2019/01/09 12:58:44 INFO  : Using MD5 for hash comparisons
2019/01/09 12:58:44 INFO  : Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': Waiting for checks to finish
2019/01/09 12:58:46 ERROR : Escape.at.Dannemora.S01E03.1080p.webdl.h264.nfo: hashes differ (pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/) "ef69e8e0c40be6e7c8d6bd546371517c" vs (robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/) "9436164d4bbfa789f4603d4675b8eb52"
2019/01/09 12:58:48 ERROR : Escape.at.Dannemora.S01E05.720p.webdl.h264.dd+5.1.nfo: hashes differ (pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/) "2d04d9fe505683c2f02d80b66e975083" vs (robgs-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/) "53b9786ebb3cdff90723251f95d2331e"
2019/01/09 12:58:48 ERROR : Escape.at.Dannemora.S01E06.1080p.webrip.h264.nfo: Sizes differ
2019/01/09 12:58:48 ERROR : Escape.at.Dannemora.S01E07.1080p.webrip.h264.dd+5.1.nfo: Sizes differ
2019/01/09 12:58:48 NOTICE: Encrypted drive 'pinagd-cryptp:Media/Videos/Series/Escape.at.Dannemora/Season.1/': 4 differences found
2019/01/09 12:58:48 Failed to cryptcheck: 4 differences found

Without --checksum on a crypt, it looks at the modification time and size to decide whether a file is dirty.
With --checksum on a crypt, it looks at size and checksum, but crypt has no checksums, so in practice it is just size.
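The decision described above, as a simplified Python model. It is a sketch of the logic discussed in this thread, not rclone’s implementation: real rclone also applies a modify window when comparing times, and can update a mismatched mod-time instead of re-uploading when hashes are available. The sizes and times in the usage lines are partly hypothetical, loosely based on the S01E01 and S01E06 log entries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ObjectInfo:
    size: int
    modtime: datetime
    md5: Optional[str]  # None when no hash is available, as through a crypt remote

def needs_transfer(src: ObjectInfo, dst: ObjectInfo, checksum: bool) -> bool:
    """Simplified model of the per-file comparison discussed in this thread."""
    if src.size != dst.size:
        return True                    # differing sizes always mean a copy
    if checksum:
        if src.md5 is None or dst.md5 is None:
            return False               # no hashes to compare: size alone decides
        return src.md5 != dst.md5
    return src.modtime != dst.modtime  # default mode: size plus modification time

# Hypothetical sizes; the times are the S01E01 ones from the log, truncated to seconds.
e01_src = ObjectInfo(3400, datetime(2019, 1, 9, 15, 11, 5), None)
e01_dst = ObjectInfo(3400, datetime(2019, 1, 1, 5, 8, 55), None)
print(needs_transfer(e01_src, e01_dst, checksum=False))  # True
print(needs_transfer(e01_src, e01_dst, checksum=True))   # False

# S01E06: sizes from the log (3396 vs 3390), times hypothetical.
t = datetime(2019, 1, 1)
print(needs_transfer(ObjectInfo(3396, t, None), ObjectInfo(3390, t, None), checksum=True))  # True
```

This mirrors the two logs: S01E01 is copied in the default run but skipped with --checksum, while S01E06 is copied either way because its size changed.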

If you look at the cryptcheck results you can see what differences there actually are.
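cryptcheck gets around the missing checksums by, per the rclone docs, reading the nonce from each encrypted file, encrypting the corresponding source file with that nonce, and comparing the hash of the result with the hash the underlying remote reports. A toy Python model of that idea (the XOR "cipher", key, and nonce length are hypothetical and NOT real cryptography):

```python
import hashlib
import os

NONCE_LEN = 24  # hypothetical header size for this toy format

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(plaintext: bytes, key: bytes, nonce: bytes) -> bytes:
    """Toy stand-in for crypt: nonce header followed by XOR-ed data."""
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def toy_cryptcheck(local: bytes, stored: bytes, key: bytes, stored_md5: str) -> bool:
    """Re-encrypt the local file with the stored file's nonce and compare MD5s."""
    nonce = stored[:NONCE_LEN]
    candidate = toy_encrypt(local, key, nonce)
    return hashlib.md5(candidate).hexdigest() == stored_md5

key = b"hypothetical-key"
original = b"row1,100\nrow2,200\n"
stored = toy_encrypt(original, key, os.urandom(NONCE_LEN))
stored_md5 = hashlib.md5(stored).hexdigest()  # the hash the backend would report

print(toy_cryptcheck(original, stored, key, stored_md5))                  # True
print(toy_cryptcheck(b"row1,100\nrow2,299\n", stored, key, stored_md5))  # False
```

The second call is a same-size edit, which is exactly the case a size-only comparison misses and cryptcheck reports as "hashes differ".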

So, to answer your questions:

When you specify checksum and they aren’t there, it uses just the size component.

I believe you can try --fast-list. By adding --checksum you’ve effectively stopped rclone from reading modification times, which according to the docs requires an extra transaction per object.
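A rough, hypothetical request-count model for one S3 sync pass, under these assumptions: a listing request returns at most 1000 keys, --fast-list does one recursive listing while a normal sync issues at least one listing per directory, and reading the mod-time costs one HEAD request per object whose size already matches (S3 listings don’t return the mod-time metadata rclone stores). Uploads and pagination inside large directories are ignored, and the object and directory counts are made up for illustration.

```python
import math

LIST_PAGE_SIZE = 1000  # assumption: max keys returned per S3 listing request

def estimate_requests(n_objects: int, n_dirs: int, fast_list: bool,
                      reads_modtime: bool, n_size_matches: int) -> int:
    """Rough request count for one sync pass (hypothetical model, not rclone's code)."""
    if fast_list:
        # One recursive listing of the whole bucket, paged.
        lists = max(1, math.ceil(n_objects / LIST_PAGE_SIZE))
    else:
        # At least one listing per directory.
        lists = n_dirs
    # The mod-time is only needed when the sizes already match.
    heads = n_size_matches if reads_modtime else 0
    return lists + heads

# Hypothetical share: 50,000 files in 2,000 directories, 49,900 unchanged in size.
objs, dirs, unchanged = 50_000, 2_000, 49_900
print(estimate_requests(objs, dirs, False, True, unchanged))   # 51900 default sync
print(estimate_requests(objs, dirs, False, False, unchanged))  # 2000  --checksum on crypt
print(estimate_requests(objs, dirs, True, True, unchanged))    # 49950 --fast-list
print(estimate_requests(objs, dirs, True, False, unchanged))   # 50    --fast-list --checksum
```

Under these assumptions, skipping the mod-time reads (which is what --checksum on a crypt effectively does) removes most of the requests, while --fast-list mainly trims the listing part.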

https://rclone.org/s3/#update-and-use-server-modtime


#9

rclone’s command-line help could benefit from updating the description of --checksum from

-c, --checksum Skip based on checksum & size, not mod-time & size

to

-c, --checksum Skip based on checksum & size, not mod-time & size. If checksums don't exist (crypt), then just size.

Also I was wrong about downloading the file. I thought this was implemented already.


#10

I believe you can try --fast-list

That seems like a great thing to try. RAM isn’t an issue on this computer so that’s a no-brainer for me. Thanks for your help!


#11

Btw, I opened this issue to try to make this behavior clearer.