Google Photos downloading into wrong by-month folder

What is the problem you are having with rclone?

I'm noticing a lot of photos/images from Google photos being downloaded into media/by-month/1970/1970-01 folder incorrectly.

What is your rclone version (output from rclone version)

$ rclone --version
rclone v1.56.0
- os/version: ubuntu 20.04 (64 bit)
- os/kernel: 5.4.0-80-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.16.5
- go/linking: static
- go/tags: none
$

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 20.04 64 bit

Which cloud storage system are you using? (eg Google Drive)

Google Photos

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync --transfers=10 --max-duration=59m --retries 1 --timeout 1m --fast-list --gphotos-include-archived --gphotos-start-year=1970 --log-level INFO --log-file=logfile.txt gphotos:media/by-month /FULL_PATH_REDACTED/media/by-month

Problem

I noticed more than ~28k images got downloaded into my media/by-month/1970/1970-01 local folder over the last few days via my rclone cron script. Looking through the logs I see lots of images/photos that clearly aren't dated to be in the 1970s based on filename alone:

2021/07/29 01:24:45 INFO  : 1970/1970-01/Screenshot_20190618-171319.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190612-091950.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190504-214251.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190618-101003.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190628-142016.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190225-121649.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190711-195259.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190628-142009.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190719-191252.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190711-201325.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190727-105449.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190725-160802.png: Copied (Rcat, new)
2021/07/29 01:24:46 INFO  : 1970/1970-01/Screenshot_20190825-150217.png: Copied (Rcat, new)
2021/07/29 01:24:47 INFO  : 1970/1970-01/Screenshot_20190825-150536.png: Copied (Rcat, new)
2021/07/29 01:24:47 INFO  : 1970/1970-01/Screenshot_20190904-203310.png: Copied (Rcat, new)
2021/07/29 01:24:47 INFO  : 1970/1970-01/Screenshot_20190825-150557.png: Copied (Rcat, new)
[TRUNCATED]
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-27_20-19-23_DSC_0403.JPG: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-08_10-06-09_DSC_6044.JPG: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-08_09-13-28_DSC_6038.JPG: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-08_11-31-46_DSC_6093.JPG: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-27_20-19-33_DSC_0405.JPG: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-04-01_18-21-04_DSC_9468-MOTION.gif: Copied (Rcat, new)
2021/07/29 01:31:17 INFO  : 1970/1970-01/sr_2010-07-08_10-08-00_DSC_6060-MOTION.gif: Copied (Rcat, new)
2021/07/29 01:31:18 INFO  : 1970/1970-01/sr_2010-09-20_20-08-17_DSC_2031.JPG: Copied (Rcat, new)
2021/07/29 01:31:18 INFO  : 1970/1970-01/sr_2010-09-10_11-21-25_DSC_0647.JPG: Copied (Rcat, new)
2021/07/29 01:31:18 INFO  : 1970/1970-01/sr_2010-09-17_16-05-59_DSC_0706.JPG: Copied (Rcat, new)
2021/07/29 01:31:18 INFO  : 1970/1970-01/sr_2010-09-17_16-07-52_DSC_0725.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-09-20_20-11-40_DSC_2050.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-09-17_16-02-23_DSC_0675.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-10-08_07-52-52_DSC_9592.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-10-08_08-02-11_DSC_9610.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-09-17_16-07-16_DSC_0720.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-09-17_16-03-34_DSC_0683.JPG: Copied (Rcat, new)
2021/07/29 01:31:19 INFO  : 1970/1970-01/sr_2010-10-08_08-16-07_DSC_9625.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_09-09-13_DSC_9654.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_08-16-37_DSC_9627.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_08-05-35_DSC_9611.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_09-33-31_DSC_9724.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_09-34-08_DSC_9732.JPG: Copied (Rcat, new)
2021/07/29 01:31:20 INFO  : 1970/1970-01/sr_2010-10-08_09-32-36_DSC_9705.JPG: Copied (Rcat, new)
2021/07/29 01:31:21 INFO  : 1970/1970-01/sr_2010-10-08_09-33-00_DSC_9714.JPG: Copied (Rcat, new)

I also manually checked via Google Photos UI and yes I don't have any photos going back to January 1970. I also haven't added anywhere near ~28k of new photos in the last few days - these are photos/images that were already part of my library that are being re-downloaded into the wrong year-month folder for some reason.

Using the filenames from above I did a quick spot check against previously downloaded images of the same name. The newly downloaded images are identical to the previously downloaded images for every random file name I checked (I didn't do an exhaustive check given there are 28k+ files and I was doing this manually):

$ find -type f -name sr_2010-10-08_09-33-00_DSC_9714.JPG -printf '%P\0' | xargs -r0 sha256sum
7b3e23875fcc97470524707be95f2bf78609c8d27f8aa08c706ead40fa5ed4d8  album/2010-10-08 - Urumqi to Seoul via Beijing - Plane_DONE/sr_2010-10-08_09-33-00_DSC_9714.JPG
7b3e23875fcc97470524707be95f2bf78609c8d27f8aa08c706ead40fa5ed4d8  media/by-month/1970/1970-01/sr_2010-10-08_09-33-00_DSC_9714.JPG
7b3e23875fcc97470524707be95f2bf78609c8d27f8aa08c706ead40fa5ed4d8  media/by-month/2010/2010-10/sr_2010-10-08_09-33-00_DSC_9714.JPG
$

Worries

Storage is thankfully not a worry as these are hash identical files and as I'm using ZFS with deduplication turned on I'm not really impacted by the duplicate downloads.

However my main worry is that rclone will start deleting files from the correct media-month folder. So far my rclone runs over the last few days have ended with Local file system at /FULL_PATH_REDACTED/media/by-month: not deleting files as there were IO errors. As a result none of the local files have been deleted yet but this may not always remain true.

For example, the file above is currently in the 2010-10 folder and I fear that when rclone has no more IO errors during its run, it will start deleting existing files in the 2010-10 folder because it thinks these have "moved" from 2010-10 to the 1970-01 folder for some reason.

I'm not sure if something within Google Photos API changed or within rclone.

I did some digging through my logs and here's some more info:

  1. rclone was auto upgraded on 21 July 2021 at 5:37am to the current version of 1.56.0. This was a Wednesday.
  2. My script is setup to run rclone sync for media/by-month only on Thursdays and Saturdays at the 2 minutes past the following hours: 1am, 4am, 7am, 4pm, 7pm and 10pm.
  3. Looking at all my logs since I started syncing Google Photos locally, no downloading into the 1970-01 folder occured before 21 July. Strong evidence this is related to the rclone upgrade.
  4. Note I'm in UTC+1 so Google Photos API limit resets 10am local time and my cron job times are all in local time. Keep this in mind when looking at below.
  5. Also Note I run rclone sync for other things (e.g. album or feature) in between the sync for media/by-month hence why resource exhaustion is seen often below.

Here is a look at every rclone sync for media/by-month since upgrading with info on what happened at each run.

Thursday 22 July

  • 1am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) after 13 minutes of running.
  • 4am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.
  • 7am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.
  • 4pm: No 1970-01 download.
    Finished after 41m6.8s with 19 errors.
    All errors were of Failed to copy: failed to open source object: (500 500 Internal Server Error)
  • 7pm: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) after 25 minutes of running.
    Also 500 Internal Server Error before that.
  • 10pm: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.

Saturday 24 July

  • 1am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.
  • 4am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.
  • 7am: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.
  • 4pm: YES for 1970-01 download.
    24585 files downloaded. Some into 1970-01 many files elsewhere which is a surprise too as I haven't added many files to Google Photos.
    rclone sync exited because it reached the 59m maximum run time.
    It finished with: Transferred: 24585 / 34586, 71%
  • 7pm: YES for 1970-01 download.
    24052 files downloaded. Some into 1970-01 many files elsewhere which is a surprise too as I haven't added many files to Google Photos.
    rclone sync exited because it reached the 59m maximum run time.
    It finished with: Transferred: 24052 / 31164, 77%
  • 10pm: No 1970-01 download.
    (429 RESOURCE_EXHAUSTED) from the start.

Thursday 29 July (today)

  • 1am: YES for 1970-01 download.
    12588 files downloaded. Almost all into 1970-01 from a quick glance at the logs.
    rclone sync exited because it reached the 59m maximum run time.
    It finished with: Transferred: 12588 / 12588, 100%. Although rclone had reached this at 34 minutes of runtime. It was hung on transferring 1 file until 59m when context deadline exceeded

Definitions for above

(429 RESOURCE_EXHAUSTED) = error reading source directory: couldn't list files: Quota exceeded for quota metric 'All requests' and limit 'All requests per day' of service 'photoslibrary.googleapis.com' for consumer 'project_number:REDACTED'. (429 RESOURCE_EXHAUSTED)

A little more digging into the log files where I had YES before:

root@host:/logs$ egrep "(Rcat, new)" photos_2021-07-24_16-02_media.by-month.txt photos_2021-07-24_19-02_media.by-photos_2021-07-29_01-02_media.by-month.txt | wc -l
30441
root@host:/logs$ egrep "(Rcat, new)" photos_2021-07-24_16-02_media.by-month.txt photos_2021-07-24_19-02_media.by-photos_2021-07-29_01-02_media.by-month.txt | egrep -v "1970-01" | wc -l
1953
root@host:/logs$ egrep "(Rcat, new)" photos_2021-07-24_16-02_media.by-month.txt photos_2021-07-24_19-02_media.by-photos_2021-07-29_01-02_media.by-month.txt | egrep "1970-01" | wc -l
28488
root@host:~/media/by-month/1970/1970-01$ ls -1 | wc -l
28489
root@host:~/media/by-month/1970/1970-01$ 

Conclusion: Most of the newly downloaded were actually into the 1970-01 folder unlike my comment in the above post.

I don't understand why the "Transferred" numbers add up to 61225 (24585+24052+12588) even though when greping for "Copied" we only get a total of ~28.4k files. :man_shrugging:

I remember in the last week or so I manually went through the list of files that were producing these failed to open source object while looking at logs. I found them in Google Photos web UI based on filename search and then deleted them.

I don't know when I did this. This was the first time I had cleaned up in months. It turned out these errors had been occuring in the rclone logs for months due to a few 2 to 5 seconds video clips that Google Photos was not able to process for some reason (I couldn't play them via the Web UI either).

The reason I mention this is because maybe rclone sync was getting stuck on the errors of these files it encountered and once I deleted them from Google Photos it was able to continue and download other files that it wasn't able to get to in the queue in previous runs. The fact that the rclone sync finished by itself after ~41 minutes instead of ending because it reached the 59m deadline makes me think this theory is probably bonkers :slightly_smiling_face:

The reason I'm thinking along this line is also because of the 1953 photos downloaded outside of 1970-01 as part of the media/by-month syncs as per above. That is a bit of a surprise - I did a quick check for some of the file names and they are all unique (first time downloads locally) from quite a few years ago (2010 to 2017) even though I did not add any new photos from that time frame in the last week to my Google Photos. I've been syncing for months so it's a bit of a mystery why these photos decide to download for the first time now of all times :sweat_smile: (specially since this above 4pm run ended successfully after 41m and only a few days later these 1953 new photos were downloaded without me adding new ones)

Although I see strong reason to think that the upgrade to 1.56 is where a problem was introduced I'm mentioning this in case there is something else at play that I'm not aware of.

I think @ncw can opine on this one as there was only 1 change I could see for Photos in 1.56.

The changes to the google photos backend were very small.

I don't think any of them would have made any difference unless you re-authed (rclone config reconnect or re-config) the backend.

I note that 1970-01-01 is the unix "epoch" where time = 0, so if times are being misread somehow, then 1970 is a plausible value for them to take.

So what I really need to see is the info for one of these files.

You can generate that by doing something like

rclone lsf -vv --gphotos-start-year=1970 --dump bodies gphotos:media/by-month/1970/1970-01 --log-file rclone.log

I really only need the JSON blob for one of these files so if you feel confident lifting that out then just paste that. If not then attach the entire log file.

I suspect that Google doesn't know how old these files really are since they are PNGs and don't have EXIF data, but the JSON blob will show for certain.

I haven't taken any action - it's been running automatically for months.

I'm out of API quota for today so I've unscheduled all my cron jobs for tomorrow and I'll run this tomorrow and report back.

I wonder if I changed --gphotos-start-year=1970 to another year if it would simply change to that year (e.g. if I change to 1960 maybe it'll start downloading these photos into the 1960-01 folder?). Depending on how much quota the above command uses, I might try that tomorrow.

@ncw here is the logfile from running this command:
rclone lsf -vv --gphotos-start-year=1970 --dump bodies pparth_photos:media/by-month/1970/1970-01 --log-file rclone-downloading-in-1970-01.log.txt &> rclone-downloading-in-1970-01.run.txt &

I have redacted what looked like sensitive info with _REDACTED_. Once you've had a chance to look at the file, please let me know and I'll un-share since I'm not entirely sure what's sensitive and what's not.

I also ran these commands but they produced no results:

  1. rclone lsf -vv --gphotos-start-year=1960 --dump bodies pparth_photos:media/by-month/1960 --log-file rclone-downloading-in-1960.log.txt &> rclone-downloading-in-1960.run.txt &
  2. rclone lsf -vv --gphotos-start-year=1969 --dump bodies pparth_photos:media/by-month/1969 --log-file rclone-downloading-in-1969.log.txt &> rclone-downloading-in-1969.run.txt &

My thoughts

I see "filters":{"dateFilter":{"dates":[{"year":1970,"month":1}]} is being passed correctly to the GPhotos API by rclone but for whatever reason GPhotos is returning photos with creation times that are not in January 1970.

So it seems this might be a bug or regression with GPhotos API? Unless the responding JSON is missing a field that rclone further relies on that's normally included?

Yes, I think you are right, this is a bug with Google Photos.

Rclone sends this

2021/07/30 10:00:37 DEBUG : HTTP REQUEST (req 0xc000862000)
2021/07/30 10:00:37 DEBUG : POST /v1/mediaItems:search HTTP/1.1
Host: photoslibrary.googleapis.com
User-Agent: rclone/v1.56.0
Content-Length: 105
Authorization: XXXX
Content-Type: application/json
Accept-Encoding: gzip

{"pageSize":100,"filters":{"dateFilter":{"dates":[{"year":1970,"month":1}]},"includeArchivedMedia":true}}

Which should limit to files in 1970-01 but gets back this

  "mediaItems": [
    {
...
      "mimeType": "image/jpeg",
      "mediaMetadata": {
        "creationTime": "2020-05-12T13:56:05Z",
        "width": "4032",
        "height": "3024",
...

Which is a media item clearly not in 1970-01...

Thanks for taking a look. This is truly unfortunate :confused:

I'll mark this as resolved since this is not an rclone issue.

As per with previous issues, really appreciate you taking the time to help me dig into what's happening :slight_smile:

1 Like

You could try reporting it as a bug in the google bug tracker

This is the only bug report with the text "1970" that I could find

https://issuetracker.google.com/issues/132616053