Can we use the approach from this repo to download photos at full resolution, along with metadata, from rclone?
That project is driving a web browser behind the scenes
This project uses Playwright to open Google Photos in a headless browser. It then downloads all the images, starting from the last image and working back to the top, downloading incrementally and sorting them into year and month folders. It also saves its progress in a .lastdone file, so if you stop the download partway through, you can resume from where you left off.
We certainly could do that....
The API works fine for photo discovery etc; it is just the download that is the problem.
So I guess rclone could download a file using the web browser - it seems you can just open the web browser at https://photos.google.com/photo/ID - we know the ID already, so maybe this wouldn't be too hard....
I believe it would be perfect - there is a lot of demand for this, since Google Photos basically lacks any useful way to export your data.
You'd be the first "big" open source project that can actually download from Google Photos. Billions of people use Google Photos, only to find out later that they are locked in without a way out.
I had a quick play with the headless browser thing, but it looks like Google have started cracking down on it, which is super annoying - login is refused with the error "This browser or app may not be secure".
Here is the Google bug report, which was closed with no comments despite having over 500 upvotes! "When using Chrome with remote debugging enabled, you can't sign in to a Google account"
In theory rclone has auth for Google Photos already, but it isn't clear how (or even if it is possible) to translate that for use with the browser.
I asked ChatGPT how you would detect that Chrome is running with remote debugging enabled, and this is what it said:
Google's ability to detect remote debugging in Chrome is likely due to a combination of techniques that are not entirely available to regular developers. They might rely on the Chrome DevTools Protocol, performance and timing changes, or specific browser internals that are beyond the scope of the typical JavaScript environment.
For most web developers, detecting remote debugging reliably in the same way Google does is not feasible, as it involves deeper integration with the browser or privileged access to debugging tools.
So detecting that remote debugging is enabled is very hard, which means blocking that detection is going to be very hard!
I think you could use a different browser like Firefox or Chromium.
Or maybe, if we provide our own cookies, you could bypass the sign-in thing?
I figured out how to do this. I have to run the browser without remote control and get the user to log in. The cookies get saved, and then it will work with remote control.
I've made a prototype of this and it seems to work. I just need to wedge it into rclone.
It will be a separate binary you run which rclone will connect to. I could make rclone start it automatically in a future iteration.
I'll hopefully have something for you to try soon...
Have you tried just exporting the cookies from Chrome with an extension or something?
Then users wouldn't have to do anything else but provide the cookie, which they can get in many different ways, if you'd prefer that approach.
I didn't know you could do that with cookies - will bear that in mind!
I made the downloader work. However, I now realise my original assumption that the photo ID in the API == the photo ID on the Google Photos website is wrong. It feels like Google really doesn't want me to do this!
Looks like Google deliberately broke this link - see the release notes:
- If you stored IDs (such as IDs for albums, media items or enrichment items) before 5 September 2018, they will no longer work as the format has changed. You will need to obtain new IDs for use in your application.
Currently trying to see if I can translate the IDs somehow (unlikely - I expect they encrypted them as they are bigger than the real IDs).
I can probably make it work with some kind of search.
I can't speak from personal experience because I had issues running that repo due to my Linux distro, but my friend did run it and said it was working, so their approach still works?
OK I have made this work! Turns out there is another URL you can go to which will translate the API ID into the real ID... Anyway...
Please follow the instructions here if you'd like to give it a try!
You will also need a beta of rclone: v1.69.0-beta.8308.6215d41dc.fix-gphotos-download on branch fix-gphotos-download (uploaded in 15-30 mins).
This is actually a very small change to the rclone code - all the magic is in the gphotosdl tool.
I could add this functionality directly to rclone, but I decided not to for the moment, for a few reasons:
- It would add over 10 MB to the rclone binary!
- Being able to stop and start the proxy independently of rclone is useful
Anyway, please have a go with this and tell me what you think. If you find a bug with gphotosdl then run it with the -debug flag and post the log.
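In case it helps, the rough shape of the workflow is: start the proxy in one terminal, then point the beta rclone at it in another. The remote name gphotos:, the path, and the proxy URL below are just an illustration - check the linked instructions for the exact values.
# terminal 1: start the proxy (add -debug if something goes wrong)
gphotosdl
# terminal 2: tell rclone to fetch full resolution downloads via the locally running proxy
rclone copy gphotos:media/by-year/2024 /path/to/local/backup \
    --gphotos-proxy "http://localhost:8282" -v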
The first time I ran gphotosdl I didn't log in, because I thought I would log in at photos.google.com, but I can't sign in at that URL, and when I try to run it again with --login I don't get the prompt to log in again. How do I reset it?
Running gphotosdl -login again should be enough to fix it. You need to log in to Google Photos and then close the browser that opened.
You can't log into Google Photos unless you use the -login flag (Google security stuff). Then you should be able to use gphotosdl without the -login flag.
If you need to reset the state of the browser, all the state is stored in ~/.config/gphotosdl/browser/ and you can just delete that directory.
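In other words, to reset and start again, something like this (using the path above):
# wipe the saved browser state
rm -rf ~/.config/gphotosdl/browser/
# log in interactively again - sign in to Google Photos, then close the browser window
gphotosdl -login
# after that, run the proxy normally, without -login
gphotosdl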
Sorry for the delay.
I just wiped that folder, tried again, and now it works. I even compared the hashes between the files downloaded via rclone and files downloaded using a normal Chrome browser, and it's a match!
Amazing work as always! You just freed billions of users from google photos.
When is this going to hit stable?
Also I believe nobody would care about 10 MB more in the binary if this means google photos works and is seamless.
Also, any tips on speeding up downloading and uploading from Google Photos to other remotes like Dropbox?
After I get all my photos downloaded I will experiment with moving the folder to a headless server and setting up a cron job to sync my Google Photos to Dropbox daily.
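Something like this crontab entry is what I have in mind (the remote names gphotos: and dropbox: and the paths are placeholders for my real setup):
# run daily at 03:00 - gphotosdl needs to already be running on the server
0 3 * * * /usr/local/bin/rclone sync gphotos:media/by-year dropbox:photos-backup --gphotos-proxy "http://localhost:8282" --log-file /home/me/rclone-gphotos.log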
After trying to transfer with only 8 simultaneous files, and just 530 files in, I started to get a lot of:
2024/09/26 04:38:35 DEBUG : pacer: low level retry 8/10 (error Get "http://localhost:8282/id/XXXXXXX": read tcp 127.0.0.1:34252->127.0.0.1:8282: i/o timeout)
2024/09/26 04:38:35 DEBUG : pacer: Rate limited, increasing sleep to 16.198895902s
And it seems it stalled... it appears to be a rate limit on google photos itself
That number of simultaneous transfers never really worked with Google; it always results in rate limits.
Make sure you create your own client id as per the docs. Not sure whether that applies to this solution though. And use only 1 or 2 simultaneous transfers.
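For reference, the client id and secret end up in the remote's section of rclone.conf, roughly like this (the values and the remote name here are placeholders):
[gphotos]
type = google photos
client_id = 1234567890-abcdefg.apps.googleusercontent.com
client_secret = YOUR_CLIENT_SECRET
token = {"access_token":"...","refresh_token":"...","expiry":"..."}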
Great! You and I think alike here - I compared the hashes for a year of photos on my phone vs those downloaded
At the moment the Google Photos download only downloads one photo at a time. I could make it do more than one at once, but I decided to get it stable first.
Only one file transfers at once at the moment, so the other 7 will be waiting to download and then timing out. Probably --transfers 1 is the correct setting at the moment, maybe --transfers 2 to give it a bit of parallelism in the setup phase.
I've merged the --gphotos-proxy flag into master now, which means it will be in the latest beta in 15-30 minutes and released in v1.69.
Let me know the results of testing
After 9 hours, no more files were being downloaded. I tried with --transfers 1 now but was still getting:
net/http: timeout awaiting response headers
I had to use my own client id and key to get it working again. 1 transfer is very slow - I remember it was going very fast with 8 transfers before being rate limited.
After 53 photos I started getting:
Failed to copy: couldn't list files: Post "https://photoslibrary.googleapis.com/v1/mediaItems:search": couldn't fetch token: unauthorized_client: if you're using your own client id/secret, make sure they're properly set up following the docs
Which is weird, because why did it work for 53 photos? I copied my client_id and key from my Google Drive remote - could that be the reason? I don't know if I ever enabled Google Photos on it or something.
I had to run rclone authorize again and enable the API in the Google Cloud console! Now it's working fine again.
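For anyone else hitting this, roughly what fixed it for me (the remote name is a placeholder, and the API to enable in the Cloud console should be the Photos Library API):
# re-run the oauth flow with your own client id and secret (opens a browser)
rclone authorize "google photos" YOUR_CLIENT_ID YOUR_CLIENT_SECRET
# or just refresh the token stored in the existing remote
rclone config reconnect gphotos: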
Let's see how many files I can transfer today. Thanks!
I just noticed a bug: when downloading videos, it plays sound in the background lol. Not a big issue, but it makes it harder to use the PC if you need to hear sounds from other stuff.