What is the problem you are having with rclone?
When using --gphotos-read-size
for Google Photos sync, I'm seeing different sizes being returned for the same photo on re-runs (intermittently - this is not reproducible on every run). Upon further inspection of a single instance of this, the only main difference seems to be an extra "_1_" appearing within the downloaded image (more details below).
@ncw this is something I found while running rclone sync
with --gphotos-read-size
as I mentioned in Add flag to exit immediately when --max-duration reached - #26 by timevortex
What is your rclone version (output from rclone version
)
rclone v1.55.0
- os/type: linux
- os/arch: amd64
- go/version: go1.16.2
- go/linking: static
- go/tags: cmount
Details
In my logs I noticed that some photos were being downloaded over and over again. On investigation I found at least one photo in my logs that had listed the size in logs because I had --log-level DEBUG
enabled for those runs.
The important lines from the logs:
timevortex@timevortex:/REDACTED$ egrep -r "PXL_20210212_103513851.PORTRAIT.jpg" *
rclone_photos_2021-04-01_13-02_media.by-month.2021.txt:2021/04/01 13:35:14 DEBUG : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Sizes differ (src 3142138 vs dst 3142222)
rclone_photos_2021-04-01_13-02_media.by-month.2021.txt:2021/04/01 13:35:17 INFO : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Copied (replaced existing)
rclone_photos_2021-04-07_09-32_media.by-month.2021.txt:2021/04/07 09:58:25 DEBUG : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Sizes differ (src 3142222 vs dst 3142138)
rclone_photos_2021-04-07_09-32_media.by-month.2021.txt:2021/04/07 09:58:28 INFO : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Copied (replaced existing)
timevortex@timevortex:/REDACTED$
At first I thought maybe Google Photos was reporting different sizes but was downloading the same size. I have hourly zfs snapshots so I checked the file size and noticed that the size downloaded actually matched what was in rclone logs:
timevortex@timevortex:/REDACTED/.zfs/snapshot$ ls -al */REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_11.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_12.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_13.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-01_14.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-01_15.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_07.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_08.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_09.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_10.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_11.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_12.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_13.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
timevortex@timevortex:/REDACTED/.zfs/snapshot$
Quick check on hashsums helped me see there are only 2 versions of the file (I expected three since the file was replaced twice). This meant that the file wasn't changing wildly:
timevortex@timevortex:/REDACTED/.zfs/snapshot$ sha256sum */REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg | cut -d\ -f1 | sort | uniq
521f55ec8402b04c4346bde8cd6466c4dd66b4e1c516eee6275d41484afeb7b0
b786334f39975fb8dbfae5d80d9d23d821d4cc3c2aad3903442356e134286409
timevortex@timevortex:/REDACTED/.zfs/snapshot$
Using vbindiff
, xxd
and hexer
I figured out that that the content of the image seemed to be identical, header seemed to be different in several places where, in one file there were extra "_1_". e.g.:
- one file had
Container="http://ns.google.com/photos/dd/1.0/container/"
and the other file hadContainer_1_="http://ns.google.com/photos/dd/1.0/container/"
- one file had
xmlns:Item="http://ns.google.com/photos/dd/1.0/item/"
and the other file hadxmlns:Item_1_="http://ns.google.com/photos/dd/1.0/item/"
. - and so on and on
If I manually delete all instances of _1_
from the header of the file using hexer
than vbindiff
shows that there is only a one byte difference (also in header), just prior to http://ns.adobe.com
: https://photos.app.goo.gl/1Nnjxa8RtVVzRNtz7
Here are the two files downloaded by rclone
if you want to look at the binary diffs yourself:
The question
Is the difference between Container
and Container_1_
decided based on the HTTP request that rclone
makes or is this entirely something that Google Photos controlled? If it's rclone
dependent then this is a bug since we don't get reliable size. Otherwise this is should be documented as yet another quirk of Google Photos API
Lucky this time
Finding this was pure luck because:
- I am no longer including the
--gphotos-read-size
flag in my runs. - I don't normally run with
--log-level DEBUG
. With--log-level INFO
the only message I see in the logs is "Copied (replaced existing)" without the size being listed. - The photo in the above instance is publicly sharable so you can look at the binary diff yourself.
If this single instance is not enough to help figure out if rclone
is responsible then I can dig through more examples from my logs & backups - although not sure if I can share the raw images.