I have finally got to looking at this properly. It left one instance of the file
JP/Tax/NRLT/2013/Q3 to dec 2012/National grid wayleave oct 2012 115.30gbp.PDF
and I think I have manually reinstated another instance in another folder, but that is somewhat confused by subsequent changes so I have started looking at a different file.
The problem area I came across, which might account for many of the earlier issues, is a file that exists in several locations (has several labels), each location/label being different. In this case the file is called About-24.docx. (It is a native Google Drive text document, but similar issues occur for PDF files.) It started life as About.docx, but I think dedupe mangled its name earlier; that doesn't matter in this case.
[user@user ~]$ rclone -v dedupe edrclone: --dedupe-mode interactive
2020/03/30 14:56:29 INFO : Google drive root '': Looking for duplicates using interactive mode.
2020/03/30 14:57:48 NOTICE: Ed/Ed edu/Ed Wye/Cricket Week 1985/Commem Ball 1985/0000 - About-24.docx: Found 24 duplicates - deleting identical copies
Ed/Ed edu/Ed Wye/Cricket Week 1985/Commem Ball 1985/0000 - About-24.docx: 24 duplicates remain
1: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
2: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
.................. snip ...............................
20: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
21: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
22: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
23: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
24: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r>
So now add the file to another arbitrary folder that does not already contain it (from the Google Drive web page, select the file, press Shift Z, and pick a folder).
[user@user ~]$ rclone -v dedupe edrclone: --dedupe-mode interactive
2020/03/30 15:05:06 INFO : Google drive root '': Looking for duplicates using interactive mode.
2020/03/30 15:06:15 NOTICE: Ed/Ed edu/Ed Wye/Cricket Week 1985/Commem Ball 1985/0000 - About-24.docx: Found 25 duplicates - deleting identical copies
Ed/Ed edu/Ed Wye/Cricket Week 1985/Commem Ball 1985/0000 - About-24.docx: 25 duplicates remain
1: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
2: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
3: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
...................... snip ...............................
22: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
23: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
24: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
25: -1 bytes, 2020-01-30 20:53:41.583000000, MD5
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r>
Now it finds 25, not 24. They are not duplicates in the same folder, but the same Google Drive file in several folders. I'm guessing that if dedupe were run in rename mode, the file would become 0000 - About-25.docx.
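To make that distinction concrete, here is a minimal Python sketch. It is not rclone's code; it assumes each listing entry carries the Drive file ID alongside its path, and `Entry` and `split_duplicates` are hypothetical names used only for illustration. The idea is that entries sharing a path are real duplicates only if their IDs differ; if the IDs match, it is one file reached through several parent folders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical listing entry: a path plus the backend's file ID."""
    path: str
    file_id: str


def split_duplicates(entries):
    """Group entries by path and separate real duplicates from aliases.

    Returns (real, aliases), two dicts mapping path -> sorted list of IDs.
    'real' holds paths listed with more than one distinct ID; 'aliases'
    holds paths listed more than once but always with the same ID.
    """
    by_path = defaultdict(list)
    for e in entries:
        by_path[e.path].append(e.file_id)

    real, aliases = {}, {}
    for path, ids in by_path.items():
        if len(ids) < 2:
            continue  # listed once, nothing to decide
        distinct = sorted(set(ids))
        if len(distinct) > 1:
            real[path] = distinct
        else:
            aliases[path] = distinct
    return real, aliases


listing = [
    Entry("Test for rclone.docx", "id-A"),
    Entry("Test for rclone.docx", "id-A"),  # same file via a second folder
    Entry("notes.txt", "id-B"),
    Entry("notes.txt", "id-C"),             # genuinely different file
]
real, aliases = split_duplicates(listing)
print("real duplicates:", real)
print("same file, several folders:", aliases)
```

With a check like this, only `notes.txt` would be offered for deduplication; the multi-folder document would be left alone.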
So to create a simple case, create a Google Docs file, add it to another folder (the Shift Z thing again), and then run dedupe:
[user@user ~]$ rclone -v dedupe edrclone: --dedupe-mode interactive
2020/03/30 15:34:07 INFO : Google drive root '': Looking for duplicates using interactive mode.
2020/03/30 15:35:29 NOTICE: Test for rclone.docx: Found 2 duplicates - deleting identical copies
Test for rclone.docx: 2 duplicates remain
1: -1 bytes, 2020-03-30 15:31:49.951000000, MD5
2: -1 bytes, 2020-03-30 15:31:49.951000000, MD5
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r>
And keep going. Choose rename for those files (I may have done all that twice) and rerun dedupe. The file seems to have been renamed, but dedupe still thinks it is a duplicate, so you get:
[user@user ~]$ rclone -v dedupe edrclone: --dedupe-mode interactive
2020/03/30 15:40:09 INFO : Google drive root '': Looking for duplicates using interactive mode.
2020/03/30 15:41:19 NOTICE: Test for rclone-2-2.docx: Found 2 duplicates - deleting identical copies
Test for rclone-2-2.docx: 2 duplicates remain
1: -1 bytes, 2020-03-30 15:31:49.951000000, MD5
2: -1 bytes, 2020-03-30 15:31:49.951000000, MD5
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r>
I do appreciate that having a file in more than one folder was always going to be problematic, but this is legacy data which, until 1.51.0, rclone seemed to tolerate OK.
That may not cover all the issues, and certainly not the deleted files, but it is a start.
- Rclone dedupe is now, and I posit wrongly, treating Google Drive files in more than one folder as if they were duplicate filenames in the same folder; there is a crazy flawed symmetry. Rclone sync still gets this bit right; the flawed logic is only in rclone dedupe.
- The subsequent rclone dedupe rename behaviour just repeatedly, incrementally renames the same file, so next time around rclone still thinks there is a duplicate.
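The rename loop in the second point can be reproduced with a small Python simulation. This is only a guess at the mechanism, not rclone's implementation: it assumes a rename pass gives the i-th listed entry the suffix `-i`, and that every listing of a multi-folder file resolves to the same underlying object, so the last rename wins. `rename_pass` and the IDs are hypothetical.

```python
import os


def rename_pass(names_by_id, listings):
    """One hypothetical 'rename all to be different' pass.

    names_by_id: dict of file_id -> current name (one name per real file).
    listings: file IDs as the lister sees them; an ID appears once per
    parent folder that contains the file.
    """
    # Snapshot the names the pass starts from, as a lister would see them.
    seen = [names_by_id[fid] for fid in listings]
    for i, (fid, name) in enumerate(zip(listings, seen), start=1):
        stem, ext = os.path.splitext(name)
        names_by_id[fid] = f"{stem}-{i}{ext}"  # last write wins per ID


names = {"id-A": "Test for rclone.docx"}
listings = ["id-A", "id-A"]  # one file, two parent folders

for _ in range(2):
    rename_pass(names, listings)
    visible = [names[fid] for fid in listings]
    print(visible, "still duplicated:", len(set(visible)) < len(visible))
```

Under these assumptions two passes leave the single file named `Test for rclone-2-2.docx`, still listed twice, which matches the name in the last log above.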
Ed.