Checking existing files in google photos

So the google photos API doesn't allow you to check existence by size (let alone checksum), as stated in the documentation:

"The Google Photos API does not return the size of media. This means that when syncing to Google Photos, rclone can only do a file existence check."

Even the file existence check is not 100% accurate, since maybe the file was previously uploaded with another name, and you would upload the whole file again, only to have it de-duplicated by Google Photos (since the service only stores files with the same binary data once).

Since Google Photos de-duplicates files, all you would really need is a way to check whether a file with the same binary data already exists in Google Photos. If it does, you don't need to upload it.

This seems to be pretty much impossible with the API, but it's exactly what the Google Photos web app does. When you upload a new file via the web interface, even if it's a huge 1GB video, it only uploads it if the video doesn't already exist. If the video already exists, it takes very little time to realize this, and moves on to the next file you're trying to upload.

I don't completely understand how it does this, but it seems to make a request containing some sort of hash of the file's data, and gets a response indicating whether the file exists or not. Here are some examples (where sde3Dddead+dRTOD343Dsfas3d stands in for the hash of the data; I haven't figured out how the web app's JS computes it):

Request:

Request Method: POST
Status Code: 200 (from ServiceWorker)
Referrer Policy: origin

Form Data:

f.req: [[["swbisb","[[\"sde3Dddead+dRTOD343Dsfas3d=\"],null,3]",null,"generic"]]]
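For reference, here is a rough Python sketch of how that f.req form field could be assembled from a hash. It only reproduces the nesting seen in the captured request: the function name build_f_req is made up for illustration, and the meaning of the constants (null, 3, "generic") is unknown to me.

```python
import json

# RPC identifier copied from the captured request; its meaning is an assumption.
RPC_ID = "swbisb"


def build_f_req(file_hash: str) -> str:
    """Build the f.req form value for one hash (hypothetical helper)."""
    compact = (",", ":")  # no spaces, to match the captured payload byte for byte
    # The inner payload is itself JSON, embedded as a string in the outer array.
    inner = json.dumps([[file_hash], None, 3], separators=compact)
    return json.dumps([[[RPC_ID, inner, None, "generic"]]], separators=compact)


if __name__ == "__main__":
    print(build_f_req("sde3Dddead+dRTOD343Dsfas3d="))
```

Note the double encoding: the hash list is serialized to JSON first and then embedded as a string, which is where the escaped quotes in the capture come from.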

Response when file exists:

[["wrb.fr","swbisb","[[[\"sde3Dddead+dRTOD343Dsfas3d\\u003d\",[\"XXXXXXXXXXXX\",[\"<Link to photo>\",1197,1596,null,null,null,null,null,[1197,1596,3]\n,[99999999]\n]\n,99999999,\"sde3Dddead+dRTOD343Dsfas3d\",-14400000,999999999999,null,null,2,null,null,null,null,null,999999]\n]\n]\n]\n",null,null,null,"generic"]

Response when file doesn't exist:

[["wrb.fr","swbisb","[]\n",null,null,null,"generic"]

As you can see, when the file exists you get the url of the file in google photos (I had to replace it with <Link to photo> since this forum doesn't allow me to post links).
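A minimal sketch of how the two responses could be told apart in Python. It assumes the body is a complete JSON array (the samples above look like they are missing the final closing bracket, and the real endpoint may add framing that needs stripping first). The function name and the index path used to reach the link are guesses based only on the sample shown.

```python
import json


def parse_existence_response(body: str):
    """Return (exists, link) from a response body shaped like the samples.

    Hypothetical helper: the layout is inferred from one captured response.
    """
    envelope = json.loads(body)
    for entry in envelope:
        is_result = isinstance(entry, list) and len(entry) > 2 and entry[0] == "wrb.fr"
        if not is_result or not isinstance(entry[2], str):
            continue
        payload = json.loads(entry[2])  # the third element is JSON inside a string
        if not payload:
            return False, None  # "[]" means no matching file was found
        try:
            link = payload[0][0][1][1][0]  # position of the link in the sample
        except (IndexError, TypeError):
            link = None
        return True, link
    raise ValueError("no wrb.fr entry found in response")
```

An empty inner list maps to "file doesn't exist"; anything else is treated as a match, with the link returned when it sits where the sample has it.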

I was thinking this could be pretty interesting to investigate for sync purposes. It would really speed things up when syncing folders with Google Photos: all you need to do is make a request for each of your files, check existence, and then upload those that don't exist. I have tested this in the web app with thousands of existing files and it runs pretty quickly.
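The sync flow described here is simple enough to sketch. The existence check is passed in as a callable because how to perform it outside the web session is still an open question; both names below are hypothetical.

```python
from typing import Callable, Iterable, List


def files_to_upload(paths: Iterable[str],
                    exists_remotely: Callable[[str], bool]) -> List[str]:
    """Keep only the files for which the existence check reports no match."""
    return [path for path in paths if not exists_remotely(path)]


if __name__ == "__main__":
    already_there = {"a.jpg"}  # stand-in for the real remote check
    print(files_to_upload(["a.jpg", "b.jpg"], lambda p: p in already_there))
```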

Still, there are quite a few problems I would need to investigate:

  • How to get hash from the file
  • Doing the request using the API token rclone has (instead of the web session I'm using for the requests shown earlier).

I'm willing to keep investigating this, but I would love some feedback on whether this looks promising or not (so I don't waste time if it doesn't). For example:

  1. Do you see some roadblock that would make this not work as I'm imagining it?
  2. Do you know of anyone who has tried this before and failed?
  3. Is this a feature that would be interesting for rclone?

Thanks!

That is an interesting idea!

It is annoying that the google photos web app doesn't use the Google Photos public API. (Same with the Google Drive web app). It seems Google aren't fond of their own dog food!

I think the major stumbling block is likely to be this:

The google web app is likely to be using an API that isn't publicly available.

Those requests above look nothing like the normal Google Photos API. Are they protocol buffers, maybe?

Blobs of data in google storage have MD5 sums so I'd try the MD5 sum to start with.

sde3Dddead+dRTOD343Dsfas3d= looks like it could be a base64 encoded md5sum - I'm guessing you've redacted it as it doesn't decode.

You can make these with rclone:

rclone hashsum MD5 --base64 /path/to/file
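If it helps with comparing against the value the web app sends, here is a small Python sketch that computes base64-encoded digests for a few candidate algorithms in one pass over the file. MD5 is the suggestion above; SHA-1 and SHA-256 are extra guesses of mine, and the function name is made up. It uses the standard base64 alphabet, which may or may not be what the web app uses.

```python
import base64
import hashlib


def base64_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1024 * 1024):
    """Return {algorithm: base64 digest} for a file, reading it in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read in chunks so large videos are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {
        name: base64.b64encode(hasher.digest()).decode("ascii")
        for name, hasher in hashers.items()
    }
```

The length of the captured value is a quick hint at which algorithm is in play: a base64 MD5 is 24 characters ending in "==", a SHA-1 is 28 characters ending in "=", and a SHA-256 is 44 characters ending in "=".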

To your numbered questions:

  1. See above!
  2. No, but I haven't strayed from the official API.
  3. If we can integrate it neatly, then yes!

Hey Nick, thanks for the answer!

I'll check the MD5 approach you posted to see if the hashes match (yeah, the hash I posted was redacted, I'm a bit paranoid :grinning:).

I'll also investigate further to see if the requests can be made with the API token.

Thanks again!
