Sync to previous backup made with rsync

What is the problem you are having with rclone?

I have a previous backup on B2 made with rsync. I want to migrate to rclone and avoid re-uploading the data as much as possible. However, the --dry-run output from the sync command seems to indicate that it would update modification times on some files, but copy others, even though they haven't changed.

Using rclone v1.52.2 on an Ubuntu 16.04 LTS VM with 1.5G RAM.

rclone -vv --dry-run --b2-chunk-size=40M --b2-upload-cutoff=200M --fast-list --progress --transfers 16 --include="pb-0*.zip" sync /mnt/Photos/5DayDeal b2:qnap-media-sync/Photos/5DayDeal

These are the files on the remote end:

  3637511 2019-02-09 04:03:29.000000000 pb-01-freebies-and-bundle-info.zip
 55824226 2019-02-09 04:03:36.000000000 pb-02-portrait-shoot-playbook-tamara-lackey.zip
116426426 2019-02-09 04:03:34.000000000 pb-03-travel-pro-kit-viktor-elizarov.zip
 40543464 2019-02-09 04:03:32.000000000 pb-04-artistry3-actions-dave-seeram.zip
 66689102 2019-02-09 04:03:31.000000000 pb-05-lightroom-mastery-ebook-contrastly.zip
573478665 2019-02-09 04:04:53.000000000 pb-06-art-of-black-and-white-andrew-gibson.zip
989713410 2019-02-09 04:05:21.000000000 pb-07-creativity-on-budget-lindsay-adler-1-of-2.zip
1184532245 2019-02-09 04:07:40.000000000 pb-08-creativity-on-budget-lindsay-adler-2-of-2.zip
1625542099 2019-02-09 04:08:44.000000000 pb-09-photo-101-zack-arias-1-of-2.zip
2020/06/24 18:32:22 DEBUG : rclone: Version "v1.52.2" starting with parameters ["rclone" "-vv" "--dry-run" "--b2-chunk-size=40M" "--b2-upload-cutoff=200M" "--fast-list" "--progress" "--transfers" "16" "--include=pb-0*.zip" "sync" "/mnt/Photos/5DayDeal" "b2:qnap-media-sync/Photos/5DayDeal"]
2020/06/24 18:32:22 DEBUG : Using config file from "/home/andrei/.config/rclone/rclone.conf"
2020-06-24 18:32:22 DEBUG : pb-01-freebies-and-bundle-info.zip: Modification times differ by 7h14m55.3635648s: 2019-02-08 20:48:33.6364352 -0600 CST, 2019-02-09 10:03:29 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-02-portrait-shoot-playbook-tamara-lackey.zip: Modification times differ by 7h14m59.0105647s: 2019-02-08 20:48:36.9894353 -0600 CST, 2019-02-09 10:03:36 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-03-travel-pro-kit-viktor-elizarov.zip: Modification times differ by 7h14m50.3825645s: 2019-02-08 20:48:43.6174355 -0600 CST, 2019-02-09 10:03:34 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-04-artistry3-actions-dave-seeram.zip: Modification times differ by 7h14m51.9045646s: 2019-02-08 20:48:40.0954354 -0600 CST, 2019-02-09 10:03:32 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-06-art-of-black-and-white-andrew-gibson.zip: Modification times differ by 7h15m22.6315634s: 2019-02-08 20:49:30.3684366 -0600 CST, 2019-02-09 10:04:53 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-07-creativity-on-budget-lindsay-adler-1-of-2.zip: Modification times differ by 7h14m31.8875616s: 2019-02-08 20:50:49.1124384 -0600 CST, 2019-02-09 10:05:21 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-05-lightroom-mastery-ebook-contrastly.zip: Modification times differ by 7h14m53.5475647s: 2019-02-08 20:48:37.4524353 -0600 CST, 2019-02-09 10:03:31 +0000 UTC
2020-06-24 18:32:22 DEBUG : pb-08-creativity-on-budget-lindsay-adler-2-of-2.zip: Modification times differ by 7h16m46.4865614s: 2019-02-08 20:50:53.5134386 -0600 CST, 2019-02-09 10:07:40 +0000 UTC
2020-06-24 18:32:22 DEBUG : B2 bucket qnap-media-sync path Photos/5DayDeal: Waiting for checks to finish
2020-06-24 18:32:22 DEBUG : pb-01-freebies-and-bundle-info.zip: SHA-1 = 8af0d7ea0d28bf600f49fc252363d814bc04931a OK
2020-06-24 18:32:22 NOTICE: pb-01-freebies-and-bundle-info.zip: Not updating modification time as --dry-run
2020-06-24 18:32:22 DEBUG : pb-01-freebies-and-bundle-info.zip: Unchanged skipping
2020-06-24 18:32:22 DEBUG : pb-09-photo-101-zack-arias-1-of-2.zip: Modification times differ by 7h16m57.2315602s: 2019-02-08 20:51:46.7684398 -0600 CST, 2019-02-09 10:08:44 +0000 UTC
2020-06-24 18:32:23 DEBUG : pb-04-artistry3-actions-dave-seeram.zip: SHA-1 = 8c85ca09e789219b893b81812ec87254a29f1c6a OK
2020-06-24 18:32:23 NOTICE: pb-04-artistry3-actions-dave-seeram.zip: Not updating modification time as --dry-run
2020-06-24 18:32:23 DEBUG : pb-04-artistry3-actions-dave-seeram.zip: Unchanged skipping
2020-06-24 18:32:23 DEBUG : pb-02-portrait-shoot-playbook-tamara-lackey.zip: SHA-1 = 1e73fef076511b2ba277d0d25cfaf5127e912ad7 OK
2020-06-24 18:32:23 NOTICE: pb-02-portrait-shoot-playbook-tamara-lackey.zip: Not updating modification time as --dry-run
2020-06-24 18:32:23 DEBUG : pb-02-portrait-shoot-playbook-tamara-lackey.zip: Unchanged skipping
2020-06-24 18:32:23 DEBUG : pb-05-lightroom-mastery-ebook-contrastly.zip: SHA-1 = 4844a4b681a6096a79aa7f0bdce229a5662f100a OK
2020-06-24 18:32:23 NOTICE: pb-05-lightroom-mastery-ebook-contrastly.zip: Not updating modification time as --dry-run
2020-06-24 18:32:23 DEBUG : pb-05-lightroom-mastery-ebook-contrastly.zip: Unchanged skipping
2020-06-24 18:32:24 NOTICE: pb-03-travel-pro-kit-viktor-elizarov.zip: Not copying as --dry-run
2020-06-24 18:32:28 NOTICE: pb-06-art-of-black-and-white-andrew-gibson.zip: Not copying as --dry-run
2020-06-24 18:32:31 NOTICE: pb-07-creativity-on-budget-lindsay-adler-1-of-2.zip: Not copying as --dry-run
2020-06-24 18:32:32 NOTICE: pb-08-creativity-on-budget-lindsay-adler-2-of-2.zip: Not copying as --dry-run
2020-06-24 18:32:33 DEBUG : B2 bucket qnap-media-sync path Photos/5DayDeal: Waiting for transfers to finish
2020-06-24 18:32:33 NOTICE: pb-09-photo-101-zack-arias-1-of-2.zip: Not copying as --dry-run
2020-06-24 18:32:33 DEBUG : Waiting for deletions to finish
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:                 9 / 9, 100%
Transferred:            5 / 5, 100%
Elapsed time:        11.3s

So it seems to want to copy any file that's "large", something like over 100M in size.

I thought the sync command relied on mod time and size only, but the log seems to say it's checking SHA-1 checksums as well. What am I doing wrong here?

Rclone generally uses checksums if they are available on each remote.

You can use --size-only if you want.

But why is it using checksums for only some of the files?

Also, ideally I'd like rclone to update the mod time on the remote according to the local files so that going forward I don't have to use --size-only and such.

After some more digging, it turns out that the previous backup did not keep the modification times of the files, but set them to the time of the backup. So on the remote end, the files have a newer timestamp than on the source.

I tested sync on 2 files, one 39M in size and one 140M. For the first one rclone simply updated the modification time on the B2 remote (setting it correctly to the older timestamp that the source file had), but for the second one it uploaded the file and also modified the time. Then I touched the large file and ran rclone again; this time it simply updated the modification time and did not re-upload the file.

Any idea why rclone isn't simply doing the "update the timestamp" operation for all the files?

Can you share a log of what you are running with -vv? That should have the answer.

Here you go.

First, the "small" file.

andrei@docker-vm:/mnt/Photos/5DayDeal$ rclone -vv --fast-list --progress --b2-chunk-size=40M --b2-upload-cutoff=200M --transfers 16 --exclude .DS_Store sync pb-01-freebies-and-bundle-info.zip b2:qnap-media-sync/Photos/5DayDeal
2020/06/25 20:20:48 DEBUG : rclone: Version "v1.52.2" starting with parameters ["rclone" "-vv" "--fast-list" "--progress" "--b2-chunk-size=40M" "--b2-upload-cutoff=200M" "--transfers" "16" "--exclude" ".DS_Store" "sync" "pb-01-freebies-and-bundle-info.zip" "b2:qnap-media-sync/Photos/5DayDeal"]
2020/06/25 20:20:48 DEBUG : Using config file from "/home/andrei/.config/rclone/rclone.conf"
2020/06/25 20:20:48 DEBUG : fs cache: renaming cache item "pb-01-freebies-and-bundle-info.zip" to be canonical "/mnt/Photos/5DayDeal"
2020-06-25 20:20:48 DEBUG : pb-01-freebies-and-bundle-info.zip: Modification times differ by 7h14m55.3635648s: 2019-02-08 20:48:33.6364352 -0600 CST, 2019-02-09 10:03:29 +0000 UTC
2020-06-25 20:20:48 DEBUG : pb-01-freebies-and-bundle-info.zip: SHA-1 = 8af0d7ea0d28bf600f49fc252363d814bc04931a OK
2020-06-25 20:20:50 INFO  : pb-01-freebies-and-bundle-info.zip: Updated modification time in destination
2020-06-25 20:20:50 DEBUG : pb-01-freebies-and-bundle-info.zip: Unchanged skipping
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.0s

And the larger one.

andrei@docker-vm:/mnt/Photos/5DayDeal$ rclone -vv --fast-list --progress --b2-chunk-size=40M --b2-upload-cutoff=200M --transfers 16 --exclude .DS_Store sync pb-03-travel-pro-kit-viktor-elizarov.zip b2:qnap-media-sync/Photos/5DayDeal
2020/06/25 20:22:02 DEBUG : rclone: Version "v1.52.2" starting with parameters ["rclone" "-vv" "--fast-list" "--progress" "--b2-chunk-size=40M" "--b2-upload-cutoff=200M" "--transfers" "16" "--exclude" ".DS_Store" "sync" "pb-03-travel-pro-kit-viktor-elizarov.zip" "b2:qnap-media-sync/Photos/5DayDeal"]
2020/06/25 20:22:02 DEBUG : Using config file from "/home/andrei/.config/rclone/rclone.conf"
2020/06/25 20:22:02 DEBUG : fs cache: renaming cache item "pb-03-travel-pro-kit-viktor-elizarov.zip" to be canonical "/mnt/Photos/5DayDeal"
2020-06-25 20:22:02 DEBUG : pb-03-travel-pro-kit-viktor-elizarov.zip: Modification times differ by 7h14m50.3825645s: 2019-02-08 20:48:43.6174355 -0600 CST, 2019-02-09 10:03:34 +0000 UTC
2020-06-25 20:22:38 DEBUG : pb-03-travel-pro-kit-viktor-elizarov.zip: SHA-1 = d125385715ad983193804c53cb980102b53b359d OK
2020-06-25 20:22:38 INFO  : pb-03-travel-pro-kit-viktor-elizarov.zip: Copied (replaced existing)
Transferred:   	  111.033M / 111.033 MBytes, 100%, 3.135 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        35.4s

So it looks like the smaller file had a SHA-1 checksum on the remote, but the larger one didn't.
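One way to check this for a whole directory is to list the remote's stored SHA-1 values and pick out the entries that have none. The sketch below is only that: missing_sha1 is a made-up helper name, and it assumes that rclone sha1sum prints a fixed-width 40-character hash column followed by two spaces and the file name, leaving the hash column blank when no checksum is stored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "HASH  NAME" lines on stdin and prints the names
# whose hash column is not a 40-digit hex SHA-1.
# Assumption: the hash column is exactly 40 characters wide, then two spaces.
missing_sha1() {
  local line hash name
  while IFS= read -r line; do
    hash=${line:0:40}   # fixed-width hash column
    name=${line:42}     # file name starts after the two-space separator
    if ! [[ $hash =~ ^[0-9a-f]{40}$ ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Intended use (not run here):
#   rclone sha1sum b2:qnap-media-sync/Photos/5DayDeal --include "pb-0*.zip" | missing_sha1
```

Any file this prints would be a candidate for the re-upload behaviour seen in the log above.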

I thought rclone wouldn't be using the checksums by default, see this in the docs:

-c, --checksum
Normally rclone will look at modification time and size of files to see if they are equal. If you set this flag then rclone will check the file hash and size to determine if files are equal.

Is there a way to force rclone to update the larger file's mod time?

I think the problem is how the files got uploaded when you used rsync. How did you upload them - to an rclone mount? With which version of rclone?

I don't think these files got checksums (for large files, checksums have to be added by the client), so rclone refuses to just set the modtime because it can't be sure the files are identical.
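To make that reasoning concrete, here is a rough sketch of the decision as it is described in this thread. It is a simplified model and not rclone's actual code: decide_action and its arguments are made up for illustration, and the modification times are the ones from the log above, converted to epoch seconds and truncated to whole seconds.

```shell
#!/usr/bin/env bash
# Simplified model of the behaviour discussed in this thread (not rclone's real code).
# decide_action is a hypothetical name. Arguments:
#   $1 src size  $2 dst size  $3 src mtime  $4 dst mtime  $5 src SHA-1  $6 dst SHA-1
# An empty SHA-1 means "no checksum stored".
decide_action() {
  local src_size=$1 dst_size=$2 src_mtime=$3 dst_mtime=$4 src_sha1=$5 dst_sha1=$6

  # Different sizes: the files cannot be identical.
  if [ "$src_size" != "$dst_size" ]; then
    echo "copy"; return
  fi
  # Same size and same modtime: treated as unchanged.
  if [ "$src_mtime" = "$dst_mtime" ]; then
    echo "skip"; return
  fi
  # Modtimes differ: a checksum on both sides is needed to prove the content matches.
  if [ -z "$src_sha1" ] || [ -z "$dst_sha1" ]; then
    echo "copy"; return
  fi
  if [ "$src_sha1" = "$dst_sha1" ]; then
    echo "update-modtime"
  else
    echo "copy"
  fi
}

# Small file from the log: SHA-1 present on both sides and equal.
decide_action 3637511 3637511 1549680513 1549706609 \
  8af0d7ea0d28bf600f49fc252363d814bc04931a 8af0d7ea0d28bf600f49fc252363d814bc04931a
# Large file from the log: no SHA-1 stored on the remote.
decide_action 116426426 116426426 1549680523 1549706614 \
  d125385715ad983193804c53cb980102b53b359d ""
```

The first call prints update-modtime and the second prints copy, which matches what the two test runs did.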

However, if you try the latest beta, you can do a sync with

  --refresh-times   Refresh the modtime of remote files.

This will set the modtime even if the files don't have a checksum. I suggest you try it with --dry-run first, then on a few files, and then run it on everything. You won't need that flag again once the modtimes are synced.
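Put together, that procedure might look like the sketch below. refresh_modtimes is a hypothetical wrapper, SRC and DST are the paths used earlier in this thread, and it assumes an rclone build that already has the --refresh-times flag.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sync suggested above.
# Assumes an rclone build that supports --refresh-times.
SRC=/mnt/Photos/5DayDeal
DST=b2:qnap-media-sync/Photos/5DayDeal

refresh_modtimes() {
  # $1: "real" applies the changes; anything else only previews them.
  # Remaining arguments are passed through to rclone (for example --include filters).
  local mode=$1
  shift
  local args=(sync --refresh-times -v)
  if [ "$mode" != "real" ]; then
    args+=(--dry-run)
  fi
  rclone "${args[@]}" "$@" "$SRC" "$DST"
}

# Suggested order (not run here):
#   refresh_modtimes dry                             # 1. preview everything
#   refresh_modtimes real --include "pb-03*.zip"     # 2. try it on a few files
#   refresh_modtimes real                            # 3. run it for the whole directory
```

Defaulting to --dry-run unless "real" is passed is a deliberate choice here, so that a mistyped call only previews.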

Actually, I was wrong about rsync: the initial backup was seeded with QNAP's HybridBackupSync tool, so I think you're right that it didn't add the checksums for large files.

I'll give --refresh-times a shot!

Can rclone add checksums to those large files too without uploading or is that impossible?

Great!

It is theoretically possible. You'd have to download the file to checksum it or get the checksum from a local copy. Rclone can't do it right now though, sorry!
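For the "local copy" half of that, computing the checksum yourself is straightforward. This is a minimal sketch that assumes GNU coreutils sha1sum is installed; local_sha1 and sha1_matches are made-up helper names, and nothing here writes the checksum to B2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a local file against a known SHA-1.
# Assumes GNU coreutils sha1sum is available.

# Print only the hex digest of a local file.
local_sha1() {
  sha1sum -- "$1" | cut -d' ' -f1
}

# Succeed if the local file's SHA-1 equals the expected digest.
sha1_matches() {
  [ "$(local_sha1 "$1")" = "$2" ]
}

# Intended use, with the digest rclone logged for this file (not run here):
#   sha1_matches /mnt/Photos/5DayDeal/pb-03-travel-pro-kit-viktor-elizarov.zip \
#     d125385715ad983193804c53cb980102b53b359d && echo "local copy matches"
```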

Actually, it looks like --refresh-times is implemented via the b2_copy_file API call, and that does create a SHA-1 checksum. Killing 2 birds with 1 stone!

Excellent!

I think that must be b2 adding the checksum!