Google Photos `--gphotos-read-size` returns different sizes

What is the problem you are having with rclone?

When using --gphotos-read-size for Google Photos sync, I'm seeing different sizes being returned for the same photo on re-runs (intermittently - this is not reproducible on every run). Upon further inspection of a single instance of this, the only main difference seems to be an extra "_1_" appearing within the downloaded image (more details below).

@ncw this is something I found while running rclone sync with --gphotos-read-size as I mentioned in Add flag to exit immediately when --max-duration reached - #26 by timevortex

What is your rclone version (output from rclone version)

rclone v1.55.0
- os/type: linux
- os/arch: amd64
- go/version: go1.16.2
- go/linking: static
- go/tags: cmount

Details

In my logs I noticed that some photos were being downloaded over and over again. On investigation I found at least one photo in my logs that had listed the size in logs because I had --log-level DEBUG enabled for those runs.

The important lines from the logs:

timevortex@timevortex:/REDACTED$ egrep -r "PXL_20210212_103513851.PORTRAIT.jpg" *
rclone_photos_2021-04-01_13-02_media.by-month.2021.txt:2021/04/01 13:35:14 DEBUG : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Sizes differ (src 3142138 vs dst 3142222)
rclone_photos_2021-04-01_13-02_media.by-month.2021.txt:2021/04/01 13:35:17 INFO  : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Copied (replaced existing)
rclone_photos_2021-04-07_09-32_media.by-month.2021.txt:2021/04/07 09:58:25 DEBUG : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Sizes differ (src 3142222 vs dst 3142138)
rclone_photos_2021-04-07_09-32_media.by-month.2021.txt:2021/04/07 09:58:28 INFO  : 2021-02/PXL_20210212_103513851.PORTRAIT.jpg: Copied (replaced existing)
timevortex@timevortex:/REDACTED$

At first I thought maybe Google Photos was reporting different sizes but was downloading the same size. I have hourly zfs snapshots so I checked the file size and noticed that the size downloaded actually matched what was in rclone logs:

timevortex@timevortex:/REDACTED/.zfs/snapshot$ ls -al */REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_11.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_12.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-01_13.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-01_14.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-01_15.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_07.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_08.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142138 Feb 12 11:35 4AutoBackup_2021-04-07_09.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_10.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_11.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_12.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
-rw-rw-r-- 1 rclone rclone 3142222 Feb 12 11:35 4AutoBackup_2021-04-07_13.03.01/REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg
'[.. cut intentionally as size & date does not change for snapshots in between ..]
timevortex@timevortex:/REDACTED/.zfs/snapshot$ 

Quick check on hashsums helped me see there are only 2 versions of the file (I expected three since the file was replaced twice). This meant that the file wasn't changing wildly:

timevortex@timevortex:/REDACTED/.zfs/snapshot$ sha256sum */REDACTED/media/by-month/2021/2021-02/PXL_20210212_103513851.PORTRAIT.jpg | cut -d\  -f1 | sort | uniq
521f55ec8402b04c4346bde8cd6466c4dd66b4e1c516eee6275d41484afeb7b0
b786334f39975fb8dbfae5d80d9d23d821d4cc3c2aad3903442356e134286409
timevortex@timevortex:/REDACTED/.zfs/snapshot$ 

Using vbindiff, xxd and hexer I figured out that that the content of the image seemed to be identical, header seemed to be different in several places where, in one file there were extra "_1_". e.g.:

  1. one file had Container="http://ns.google.com/photos/dd/1.0/container/" and the other file had Container_1_="http://ns.google.com/photos/dd/1.0/container/"
  2. one file had xmlns:Item="http://ns.google.com/photos/dd/1.0/item/" and the other file had xmlns:Item_1_="http://ns.google.com/photos/dd/1.0/item/".
  3. and so on and on

If I manually delete all instances of _1_ from the header of the file using hexer than vbindiff shows that there is only a one byte difference (also in header), just prior to http://ns.adobe.com: https://photos.app.goo.gl/1Nnjxa8RtVVzRNtz7

Here are the two files downloaded by rclone if you want to look at the binary diffs yourself:

The question

Is the difference between Container and Container_1_ decided based on the HTTP request that rclone makes or is this entirely something that Google Photos controlled? If it's rclone dependent then this is a bug since we don't get reliable size. Otherwise this is should be documented as yet another quirk of Google Photos API :confused:

Lucky this time

Finding this was pure luck because:

  1. I am no longer including the --gphotos-read-size flag in my runs.
  2. I don't normally run with --log-level DEBUG. With --log-level INFO the only message I see in the logs is "Copied (replaced existing)" without the size being listed.
  3. The photo in the above instance is publicly sharable so you can look at the binary diff yourself.

If this single instance is not enough to help figure out if rclone is responsible then I can dig through more examples from my logs & backups - although not sure if I can share the raw images.

Unfortunately this isn't under rclone's control.

To read the size of a file Rclone does a HEAD request with the same parameters that it would do to download the file with a GET request.

So I think your photos aren't just stored as file on a disk, they are database entries or being modified on the fly and Google has two different versions of software for modifying them. You'll probably find that this problem will go away once the new version has rolled out fully. (Or maybe it is an A/B test - who knows!).

I'll just note that Google Docs have the same problem - sometimes the HEAD requests indicate the docs are a different size. We had to give up HEAD requests on google docs because they were too unreliable. This was a while ago before rclone has the concept of an unknown length file.

I think that is what we will have to do :frowning:

Do you want to send a pull request? Easiest way is to click the pen icon at top right here: rclone/googlephotos.md at master · rclone/rclone · GitHub

Note that if you want to change the docs for the read_size option then they need to be changed here: rclone/googlephotos.go at f52ae75a51a04188d728d8888112fb1fedb6c519 · rclone/rclone · GitHub

Happy to do this - sorry for the slow response - things are a bit hectic at the moment - I'll take a look when I get a free moment next :slight_smile:

1 Like