Possible data damage: Rclone Google Photos upload appears to be corrupting photos

Hello, I have found a possible bug when uploading photos from certain types of cameras to Google Photos using Rclone. Images from some cameras are regularly uploaded in a damaged form, often overwriting previous good versions on Google Photos.
I see this happening on some, but not all, photos from compact cameras and iPhones that I use now or have used in the past. There are three types of corruption that I have seen:

  1. A vertical and horizontal shift of the edge of the image, causing the image to appear cut in half.
  2. Grey bars at the top or bottom of an image.
  3. Total corruption, causing colorful rainbow type effects.

I have added an example of each. EDIT: I am not allowed to attach images as a new user, so I have added a Dropbox link instead.
Of all the images from a given camera that Rclone uploaded, about 1 in 10 shows this corruption. When opened from the local disk, the image in question appears fine in both photo-editing software and the file browser. When downloaded from Google Photos, the corruption is also present in the downloaded file.

The command that I am using to upload is as follows in crontab:
0 */20 * * * /usr/bin/rclone copy /mnt/cephfs/Fotomateriaal/ gphotos:album/Fotomateriaal --log-file=/opt/logs/rclone-upload-gphotos.log & >/dev/null 2>&1
I have disabled this for the time being.

My file system is CephFS. Rclone runs in an LXC on Proxmox 6. CephFS is accessed read-only via a bind mount to the host's CephFS kernel mount.
rclone v1.49.5

  • os/arch: linux/amd64
  • go version: go1.12.10

The camera models that have shown these types of corruption are:

  • Olympus TG-6 in jpeg mode
  • Olympus TG-6 in raw mode
  • Fujifilm Finepix F30
  • iPhone 6s

The examples can be seen here as I am not allowed to attach images:

There are a number of questions and concerns that I have:

  • What could be the cause of this?
  • Why is Rclone overwriting good images on Google Photos?
  • Have I misconfigured Rclone?
  • How can I fix the already overwritten images on Google Photos?

Thanks in advance to anyone willing to lend their assistance.

I'm sorry to hear that :frowning:

No-one else has reported this, so I think we need to pin down what is unique about your setup. The thing that jumps out at me is CephFS.

So can you try some transfer tests using rclone from your CephFS to, say, Google Drive? And if you see corruption there, try transfers from CephFS to local disk?

Could you also try uploading uncorrupted images from local disk (not CephFS) to Google Photos?

It would be helpful to have an original photo and its corrupted counterpart, if you've got a pair like that.

All good questions...

  • What could be the cause of this?

Unknown... My guess would be some interaction between rclone and CephFS but that may be completely wrong.

It might be a bug in the rclone upload code.

It could also be Google Photos - note that Google mangles the photos, so it isn't inconceivable that there is a problem there: https://rclone.org/googlephotos/#downloading-images

  • Why is Rclone overwriting good images on Google Photos?

It should say in the log if you run with -vv why rclone is transferring a file.
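
For example, something like this (paths as in your crontab; --dry-run just shows the decisions without actually transferring anything):

  rclone copy /mnt/cephfs/Fotomateriaal/ gphotos:album/Fotomateriaal -vv --dry-run --log-file=debug.log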

  • Have I misconfigured Rclone?

Unlikely.

  • How can I fix the already overwritten images on Google Photos?

Unfortunately google photos doesn't provide a checksum so we can't easily see what has been corrupted there.

Do you have copies of the originals?

Hello @ncw thanks a lot for your reply, I really appreciate the support!

I have a few other LXCs with Rclone backing up other parts of our files from CephFS, in this case my business files. I have not seen similar corruption there. Also, the corruption has not appeared in any Canon raw files (CR2), even though those are the bulk of my photos. This could also be coincidental, of course.

The command I use in my other Rclone LXCs (or variations thereof):
0 */8 * * * rclone sync /mnt/cephfs/Google\ Drive/ cephfsbackup:drive --backup-dir cephfsbackup:drive-backup/$(date -I) -v --transfers=20 --checkers=50 --min-age 30m --log-file=/opt/logs/rclone-upload-cephfs.log & >/dev/null 2>&1
So that makes daily backups of changed files as well (which is a super useful and awesome feature!).

As you suggested, I have added the original files from the file system to the Dropbox folder:


Hopefully that can give some insight!

As for your question: Yes, I have copies of all my originals (of course) and I have several layers of backups of all these originals using a combination of Rclone syncs and Duplicacy to backup to various clouds. Rclone is a huge part of my copying and backup strategy and I really would not be able to do much without it!

Possibly unrelated: as I was browsing for images, I came across another oddity just now. Rclone's Google Photos upload seems to create many, many duplicate albums on Google Photos. I have an Ingress folder containing to-be-processed images, and Rclone seems to have made dozens of albums with this name, each containing one (1) different image.

I will also run a -vv version of my copy command but I am slightly worried that it will overwrite more images.

Again, thanks for the help!

A transfer to Google Drive would notice if the size was incorrect after upload. Google Photos doesn't have sizes on objects, so it doesn't notice :frowning:
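
As a sketch, for a backend that does report sizes and hashes you could verify a copy after the fact with rclone check (the gdrive: remote name here is illustrative):

  rclone check /mnt/cephfs/Fotomateriaal/ gdrive:Fotomateriaal -v

Google Photos can't be verified this way, as it reports neither sizes nor hashes.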

Are these CR2 files bigger than the other files?

As far as rclone is concerned they are just blobs of data - it doesn't do any image processing. In fact, the only part of the chain which does do image processing is Google Photos...

Looking at the F30 file compared to its original...

There are 12288 bytes = 12k missing, starting at an offset of 0x3000 = 12k.

The fact that this is page-aligned and page-sized is very suspicious.

Likewise

  diff -u <(hexdump 'iPhone6s grey band.JPG') <(hexdump 'iPhone6s grey band.JPG original IMG_1873.JPG') | less
  diff -u <(hexdump -v -e '/1 "%02X\n"' 'iPhone6s grey band.JPG') <(hexdump -v -e '/1 "%02X\n"' 'iPhone6s grey band.JPG original IMG_1873.JPG') | less
  diff -u <(hexdump -v -e '/1 "%02X\n"' 'iPhone6s grey band.JPG') <(hexdump -v -e '/1 "%02X\n"' 'iPhone6s grey band.JPG original IMG_1873.JPG') | diffstat

Which gives 61440 bytes = 60k missing, from offset 0x7000 = 28k.

4k is the typical page size for a Linux box, so this looks like some interaction between CephFS, the kernel, and rclone. Though I guess this could be happening at Google also...
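
For reference, a quick way to reproduce this comparison (filenames as in the Dropbox folder):

  # The size difference should equal the missing span (61440 bytes here):
  stat -c %s 'iPhone6s grey band.JPG' 'iPhone6s grey band.JPG original IMG_1873.JPG'
  # cmp reports the offset of the first byte at which the two files differ:
  cmp 'iPhone6s grey band.JPG' 'iPhone6s grey band.JPG original IMG_1873.JPG'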

How easy is it to replicate this problem? If I send you some code to test would you be able to say with reasonable confidence that the problem was fixed?

A -vv log should show what is happening here.

Here is another idea...

Could you run

rclone md5sum /path/to/cephfs/images | sort > md5sums.1

several times (incrementing the .1 each time) and then compare the output... I'd do md5sum md5sums.* as a first step and then, if any of the files had differences, run diff -u md5sums.1 md5sums.2

That will see if rclone occasionally mis-reads the data.
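
Scripted, the whole test looks something like this (same placeholder path as above):

  for i in 1 2 3 4 5; do
      rclone md5sum /path/to/cephfs/images | sort > md5sums.$i
  done
  md5sum md5sums.*              # identical hashes here mean identical reads
  diff -u md5sums.1 md5sums.2   # shows per-file differences, if any turn up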

The way rclone is reading the data may be tickling a bug in the kernel or CephFS.

Hello @ncw thanks again for your help and suggestions.

The first thing I tried was your idea of using md5sum to see if misreads occur. I manually ran the command at various points over the past ~18 hours from 3 different places: the Proxmox host (straight to the kernel Ceph mount), the LXC where Rclone-google-photos lives, and another LXC that I use for uploading parts of Ceph to Google Drive. I took care to point them all at the same CephFS directory.

root@Rclone-backup-cephfs:~# md5sum md5sums.*
9978b366e3526eddc0aabc0d702b93dc  md5sums.1
9978b366e3526eddc0aabc0d702b93dc  md5sums.2
9978b366e3526eddc0aabc0d702b93dc  md5sums.3
9978b366e3526eddc0aabc0d702b93dc  md5sums.4
9978b366e3526eddc0aabc0d702b93dc  md5sums.5

root@Rclone-gphotos:~# md5sum md5sums.*
9978b366e3526eddc0aabc0d702b93dc  md5sums.1
9978b366e3526eddc0aabc0d702b93dc  md5sums.2
9978b366e3526eddc0aabc0d702b93dc  md5sums.3
9978b366e3526eddc0aabc0d702b93dc  md5sums.4
9978b366e3526eddc0aabc0d702b93dc  md5sums.5

root@pve3:~# md5sum md5sums.*
9978b366e3526eddc0aabc0d702b93dc  md5sums.1
9978b366e3526eddc0aabc0d702b93dc  md5sums.2
9978b366e3526eddc0aabc0d702b93dc  md5sums.3
9978b366e3526eddc0aabc0d702b93dc  md5sums.4
9978b366e3526eddc0aabc0d702b93dc  md5sums.5

It appears that in these 15 cases, the MD5s came out identical.
However, this was done on a subfolder, and I worried slightly that the problem might only occur on large folders. So I also ran three checksums from the same three hosts against a much larger folder, just to make sure it was not somehow related to the number of files in a folder.

root@pve3:~# md5sum md5sums.*
dfefdf1b30d9e022b17244c608e1694c  md5sums.1
dfefdf1b30d9e022b17244c608e1694c  md5sums.2
dfefdf1b30d9e022b17244c608e1694c  md5sums.3

root@Rclone-backup-cephfs:~# md5sum md5sums.*
dfefdf1b30d9e022b17244c608e1694c  md5sums.1
dfefdf1b30d9e022b17244c608e1694c  md5sums.2
dfefdf1b30d9e022b17244c608e1694c  md5sums.3

root@Rclone-gphotos:~# md5sum md5sums.*
dfefdf1b30d9e022b17244c608e1694c  md5sums.1
dfefdf1b30d9e022b17244c608e1694c  md5sums.2
dfefdf1b30d9e022b17244c608e1694c  md5sums.3

This does not seem to be the case, however.

As for your earlier post:

Yes, the CR2 files are around 20-30 MB each. Most JPEG files are much smaller, at between 2-8 MB each. The ORF files are around 5-12 MB.

Well, I could point Rclone at my Ingress directory, which contains some 10,000 images at the moment, and sort Google Photos by 'added recently'. That was how I first discovered the corrupted images. I think this should show pretty quickly if any new images are getting corrupted.

Ok, I will have it do a run on a specific folder tonight and see what kind of messages it gives me. Again, I really appreciate the help.

Thanks for doing that! rclone md5sum will read the data in the same way as rclone copy, so it should be a reasonable test of that data transfer path.

If you can capture a -vv log of a problem upload which corrupts files that would be really helpful.

Can you also try doing a similar upload from local disk to see if we see corruptions like that?

@ncw As you suggested, I set Rclone to upload using -vv, and this shows some errors. This time I used the path gphotos:upload and left it to run for two days. Here is an example of an image that got damaged.

2019/10/17 20:09:09 DEBUG : _7190137.ORF: >Update: err=failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED)
2019/10/17 20:09:09 DEBUG : _7190137.ORF: Received error: failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED) - low level retry 1/10
2019/10/17 20:09:09 DEBUG : Google Photos path "upload": Put: src=_7190137.ORF
2019/10/17 20:09:09 DEBUG : _7190137.ORF: Update: src=_7190137.ORF
 *                                  _7190137.ORF:154% /13.226M, 120.193k/s, -
2019/10/17 20:10:22 DEBUG : _7190137.ORF: >Update: err=<nil>
2019/10/17 20:10:22 DEBUG : _7190137.ORF: Size:
2019/10/17 20:10:22 DEBUG : _7190137.ORF: >Size:
2019/10/17 20:10:22 INFO  : _7190137.ORF: Copied (new)

I have added the image to the Dropbox folder, both the version from Google Photos and the version pulled locally from the file system.

This also happens with JPEG images:

2019/10/17 18:45:58 DEBUG : P9211403.JPG: >Update: err=failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED)
2019/10/17 18:45:58 DEBUG : P9211403.JPG: Received error: failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED) - low level retry 1/10
2019/10/17 18:45:58 DEBUG : Google Photos path "upload": Put: src=P9211403.JPG
2019/10/17 18:45:58 DEBUG : P9211403.JPG: Update: src=P9211403.JPG
 *                                  P9211403.JPG:197% /5.469M, 85.343k/s, -
2019/10/17 18:47:10 DEBUG : P9211403.JPG: >Update: err=<nil>
2019/10/17 18:47:10 DEBUG : P9211403.JPG: Size:
2019/10/17 18:47:10 DEBUG : P9211403.JPG: >Size:
2019/10/17 18:47:10 INFO  : P9211403.JPG: Copied (new)

This image contains people, so I have not added it to Dropbox, but I wanted to show you the log in case it helps.

In the past I have done this often when I wanted to quickly upload a small set of images, basically drag and drop to the web interface. This has, to my knowledge, never led to this kind of corruption.

How many times do you see this in your log? Only with corrupted photos? Maybe the retry on that bit of code does something wrong.

@ncw Thank you again for your reply.
Well, 429 RESOURCE_EXHAUSTED shows up quite a lot:

root@Rclone-gphotos:~# grep -o -i '429 RESOURCE_EXHAUSTED' upload-gphotos.log | wc -l
22166

But the corruption is not present for most of the uploads. At the risk of confusing causation and correlation, there appears to be some kind of link: after manually checking a few images that showed the error, I found that 2 of the 10 I checked were corrupted.

And I also checked for the >100% completed part; this also shows up a few times:

root@Rclone-gphotos:~# awk -F: '{if($2>101)print$2}' <upload-gphotos.log | wc -l
61
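
(For reference, that awk filter keys on the progress lines quoted earlier, e.g. '_7190137.ORF:154% /13.226M, 120.193k/s, -': splitting on ':' puts the percentage at the start of field 2, and awk's numeric coercion reads it as 154.)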

As soon as I get the chance I will manually cross check a few of these errors to see if that matches.

Edit: the >100% uploads have no relation; I checked another 6 and none of them were corrupted.

Is there any chance you could send me the complete log? If you don't want to make it public then email it to me nick@craig-wood.com with a link to this forum page - thanks!

I can't see a mechanism for how retries could cause corruption, unless something funny is happening at Google. Though I can't work out why the create-media-item call isn't being retried, which I'm hoping the full logs will reveal!

Thank you, I have sent you an email.


Ah, things are a little clearer now...

This is the first time we try to upload the corrupted file:

2019/10/16 06:36:22 DEBUG : _7190137.ORF: Update: src=_7190137.ORF
2019/10/16 06:36:28 DEBUG : _7190137.ORF: >Update: err=<nil>
2019/10/16 06:36:28 DEBUG : _7190137.ORF: Size: 
2019/10/16 06:36:28 DEBUG : _7190137.ORF: >Size: 
2019/10/16 06:36:28 INFO  : _7190137.ORF: Copied (new)
2019/10/17 01:36:45 DEBUG : _7190137.ORF: Update: src=_7190137.ORF
 *                                  _7190137.ORF:100% /13.226M, 1.322M/s, 0s
 *                                  _7190137.ORF:100% /13.226M, 25.940k/s, 0s
2019/10/17 01:37:57 DEBUG : _7190137.ORF: >Update: err=<nil>
2019/10/17 01:37:57 DEBUG : _7190137.ORF: Size: 
2019/10/17 01:37:57 DEBUG : _7190137.ORF: >Size: 
2019/10/17 01:37:57 INFO  : _7190137.ORF: Copied (new)

It is successful.

However, on the next run it tries to upload it again, which is wrong...

2019/10/17 20:09:02 DEBUG : _7190137.ORF: Update: src=_7190137.ORF
2019/10/17 20:09:09 DEBUG : _7190137.ORF: >Update: err=failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED)
2019/10/17 20:09:09 DEBUG : _7190137.ORF: Received error: failed to create media item: Quota exceeded for quota metric 'photoslibrary.googleapis.com/write_requests' and limit 'WritesPerMinutePerUser' of service 'photoslibrary.googleapis.com' for consumer 'project_number:202264815644'. (429 RESOURCE_EXHAUSTED) - low level retry 1/10
2019/10/17 20:09:09 DEBUG : _7190137.ORF: Update: src=_7190137.ORF
 *                                  _7190137.ORF:154% /13.226M, 120.193k/s, -
2019/10/17 20:10:22 DEBUG : _7190137.ORF: >Update: err=<nil>
2019/10/17 20:10:22 DEBUG : _7190137.ORF: Size: 
2019/10/17 20:10:22 DEBUG : _7190137.ORF: >Size: 
2019/10/17 20:10:22 INFO  : _7190137.ORF: Copied (new)

I think that explains your many duplicates.

It doesn't explain the corruption though...

I just realised what is causing the duplicates...

When you upload to "gphotos:upload" it will duplicate things if you upload them again. I tried to make this clear in the docs but maybe didn't succeed!

The upload directory is for uploading files you don’t want to put into albums. This will be empty to start with and will contain the files you’ve uploaded for one rclone session only, becoming empty again when you restart rclone. The use case for this would be if you have a load of files you just want to once off dump into Google Photos. For repeated syncing, uploading to album will work better.

If you copy the same thing again and again, then you want to upload to an album which can prevent you uploading the same thing again.
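
A minimal sketch of the difference (path and album name illustrative):

  rclone copy /mnt/photos gphotos:upload          # re-running this re-uploads everything, creating duplicates
  rclone copy /mnt/photos gphotos:album/MyAlbum   # re-running this can skip files already in the album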


Back to the corruption...

Could you try slowing down the upload rate a bit with --tpslimit 10 or maybe --tpslimit 1 to try to reduce the number of 429 errors? Then we can see if they really are correlated with the corruption?

No, this is clear: I only used gphotos:upload this time for the log; my crontab script uses albums.

What I do not understand is why it did multiple copy operations, as I only ran the command for this log once (unless I am very mistaken, which is a possibility). It is certainly not your docs' fault. The duplication was also seen with my regular crontab command, which uses albums.

But this is of lesser concern; the corruption is more problematic.

Agreed

I will do that right away with an age limit (so it will not take another 2 days). The command will be:
/usr/bin/rclone copy /mnt/cephfs/Fotomateriaal/Ingress/ gphotos:upload -vv --log-file=upload-gphotos-again.log --max-age 2M --tpslimit 1

That sounds great - thanks!

@ncw this is taking a long time, because I had not quite realized that 2 months would still be quite a few images, and obviously it only transfers about 1 file per second now.

However! None of the files that have been transferred so far show any corruption!
The log so far still shows the 429 errors though (57 so far).
But I think that does not really matter, so long as images do not get damaged.

That is good! So if that is the case then I can examine the retry code more carefully to find out what is going on.

I'd like to see if I can replicate this myself but I'm not sure how you detect the image corruption.

Do you do that by eye, or is there some other way to detect?

Hello @ncw, apologies for the late reply; I was away for a few days (taking pictures :smile:).

Your assumption is correct: I look at the 'recently added' search option on Google Photos and scroll down until I see corruption. It is not a very efficient way of doing it, but it is how I first discovered the corruption, and the really badly corrupted images are very easy to spot. The images with partly shifted x/y borders are similarly easy to find.
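
One possible way to script the hunt (assuming ImageMagick is installed; I have not tried this against the downloaded files) would be:

  # Flag JPEGs whose decode produces warnings; -regard-warnings turns warnings into errors:
  for f in *.JPG; do
      identify -regard-warnings "$f" >/dev/null 2>&1 || echo "possibly corrupt: $f"
  done

since truncated JPEG data usually produces decode warnings.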

I'll see if I can reproduce this myself...

In the meantime, can you please make a new issue on GitHub about this so we don't forget about it? Please put a link to this forum thread in it.