GDrive -> rclone sync: multiple issues found with rclone on Windows and Google Drive

Hello,

I’ll start off with thanks for making rclone; it’s been useful for syncing some large backups from Google Drive.

If you’d prefer that these issues be filed as separate bugs or feature requests, please let me know and I’ll break them down so they can be dealt with in separate entries.

(These issues have been tested with 1.34 and 1.34-04 beta)

A couple of issues are working together to cause me grief. While possibly not ‘bugs’, they are potentially bug-like to users. They can be circumvented with better file-naming hygiene, but they still exist and may affect other rclone Windows users.

[EDIT: Sorry, 1) can probably be ignored, I overlooked the feature request in github to fix this kind of thing]

  1. Repeatedly overwriting files that share the exact same name. For example, three DIFFERENT files all named ‘test1’ will cause only one file to copy, which then gets modified an additional 2 times due to the mismatch, leaving only the third ‘test1’ file on the local disk. (If there is a way for rclone to pull the fileID instead of the fileName when writing files, it could store this, ‘know’ they are actually different files, and automatically rename them as they’re written to disk, eg oldest fileID first or something.)

  2. Files that are renamed due to special characters (eg characters that are okay in gdrive / linux but not windows) are not handled properly by sync commands. They will be detected, they will copy, then at the end of the copy the sync will double check and say “These files do not exist in the GDrive so I will delete them”, even if they were copied and automatically renamed earlier on in the same sync process. (This is just because the new filename does not match the old filename when sync compares file lists. It is circumvented by ‘--delete-before’, but that still causes the file(s) to be re-downloaded every time.)

  3. Filenames not matching their file titles (eg extra whitespace at the end of the document TITLE that is automatically cleaned up in Drive’s file name, so you won’t find the issue until you go to do a copy/sync). Example: a file named “test123” in the drive, but the title of the document is "test123 ". This breaks rclone’s download attempt because it seems to use the file ‘title’ when making the download request to Drive, but Drive knows it by its drive file name (without the whitespace).

While I have logs, they are pretty huge, so I’d rather just provide a list of the example entries / errors given by rclone when the above issues arise. I’m pretty sure I’ve already narrowed them down far enough for the info to be useful.

Let me know if you would like anything else and I apologise if my explanations don’t make sense.
Thanks

You are welcome :slight_smile:

Discussing them here first is fine. Once we’ve worked out what is going on, we can make issues if necessary.

On issue 1: rclone dedupe can help with this too. It is a rather nasty mismatch, being able to have multiple things with the same name on Google Drive vs every other filing system in common use. I don’t think there is anything rclone could do about it sensibly, except show the duplicates with a different file name. I could make a flag for that, but I’d rather people used rclone dedupe, which can also rename duplicates.
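
The idea of showing same-named files under different local names can be sketched roughly as follows. This is only an illustration of the approach suggested in the original post (ordering by file ID), not how rclone or `rclone dedupe` actually works. The function name `disambiguate`, the `(file_id, name)` input shape, and the `-1`, `-2` suffix scheme are all assumptions made for the example.

```python
from collections import defaultdict


def disambiguate(entries):
    """Map each file ID to a unique local name.

    entries: iterable of (file_id, name) pairs (hypothetical input shape).
    Files sharing a name are ordered by file ID; the first keeps the
    original name, later ones get a numeric suffix before the extension.
    """
    ids_by_name = defaultdict(list)
    for file_id, name in entries:
        ids_by_name[name].append(file_id)

    local_names = {}
    for name, ids in ids_by_name.items():
        stem, dot, ext = name.rpartition(".")
        if not dot or not stem or not ext:
            # No usable extension (e.g. "test1", ".profile", "a."):
            # treat the whole name as the stem.
            stem, ext = name, ""
        for n, file_id in enumerate(sorted(ids)):
            if n == 0:
                local_names[file_id] = name
            elif ext:
                local_names[file_id] = f"{stem}-{n}.{ext}"
            else:
                local_names[file_id] = f"{stem}-{n}"
    return local_names


names = disambiguate([("id3", "test1"), ("id1", "test1"), ("id2", "test1")])
```

The sketch does not check whether a generated name such as `test1-1` collides with a file that already has that name, which a real implementation would need to handle.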

On issue 2: can you give a concrete example for this one please? Ideally a sequence of commands which demonstrates the problem.

On issue 3: I tried to replicate this but didn’t succeed. Can you make an example for me?

I made a file with a trailing space and successfully synced it. The xxd -g1 output shows the trailing space quite clearly.

$ rclone ls drive:a_file
        6 a_file 
2016/11/16 10:17:07 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            0
Elapsed time:       800ms
$ rclone ls -q drive:a_file | xxd -g1
00000000: 20 20 20 20 20 20 20 20 36 20 61 5f 66 69 6c 65          6 a_file
00000010: 20 0a                                             .
$ rclone -q copy drive:a_file /tmp/a_file
$ ls -l /tmp/a_file/
total 4
-rw-rw-r-- 1 ncw ncw 6 Nov 16 10:16 a_file 
$ ls /tmp/a_file/ | xxd -g1
00000000: 61 5f 66 69 6c 65 20 0a                          a_file .
$ rclone -q cat "drive:a_file/a_file "
hello

Hello!

Thanks for the response, I’ll address what I can

Issue 1) I agree and want to use the dedupe function. Unfortunately the data doesn’t belong to me; I’m just in charge of backing it up. I am in the process of getting the owners to re-assess how they store things so that it better fits how data SHOULD be stored on a local system.

Issue 2) I can replicate the issue with any drive file whose name contains special chars that are incompatible with windows ( primarily * and : ). It shows up when the sync command is run at least one additional time after the first download, since the first copy can work fine.

(I tested this under default conditions, but then also ran --checkers 1 --transfers 1 just to rule out any “one checker found it but another didn’t” scenario as I’m unaware of the inner workings of rclone)

The command ‘rclone sync’ using the default (--delete-during) or --delete-after causes the problem, but using --delete-before circumvents the issue.

Example log: (This was run twice to show the success on first sync and failure on second. I’ve also copied the logs to gdrive files that are view only, because the forum system here was deleting extraneous white/tab space when trying to input the logs, which defeated the purpose; plus they were spammy.)

If you wish to replicate it yourself the instructions are as follows:

  1. Create a file in Google Drive using a non-compatible special character ( * or : for example)
  2. Sync the file using rclone for windows to a windows device (doesn’t appear to occur in first sync)
  3. Attempt to sync again. rclone should say that it is replacing invalid characters again and copying the file again, but it then deletes the file at the end of the run.
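
The suspected mechanism behind these steps can be modelled in a few lines of Python. This is a guess at the behaviour based on the symptoms described above, not rclone’s actual code. The helper names (`to_local`, `extras_naive`, `extras_mapped`) and the assumption that only `*` and `:` are replaced are made up for the illustration.

```python
INVALID = set("*:")  # assumption: only the two characters named above


def to_local(name):
    """Replace characters Windows rejects with '_' (simplified)."""
    return "".join("_" if ch in INVALID else ch for ch in name)


def extras_naive(remote_names, local_names):
    """Local files that have no remote file with the identical name."""
    remote = set(remote_names)
    return sorted(n for n in local_names if n not in remote)


def extras_mapped(remote_names, local_names):
    """Same check, but remote names are first mapped to their local form."""
    remote = {to_local(n) for n in remote_names}
    return sorted(n for n in local_names if n not in remote)


remote = ["notes: draft*.txt", "plain.txt"]
local = [to_local(n) for n in remote]  # what ends up on disk after a copy

to_delete_naive = extras_naive(remote, local)    # the renamed file looks like an extra
to_delete_mapped = extras_mapped(remote, local)  # nothing is treated as an extra
```

Comparing the two lists on their original names makes the renamed local file look like something that no longer exists remotely, which would match the "copy, then delete" behaviour seen on the second sync.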

Issue 3) I was mistaken, it wasn’t just plain white space, it was tab space - the likely result of a user copying the name and pasting it with the included tab character(s) (whether Google Drive autofilled the name based on the first line of the document or they manually copied it into the document name). Tested this with normal spaces and as you showed it works fine, but with tab space it errors out.

This can, again, be fixed with better user naming hygiene. I’ve got no idea why the hell tabs should be in filenames to begin with, so it is likely an edge case.

Example log:

I hope that covered everything, and I can’t say I expect issue 3 to be ‘fixed’ given how weird of an edge case it should be :slight_smile:

Thanks for clarifying those. I’ll investigate some more tomorrow!

I made an issue about the special characters here - subscribe to receive updates!

I tried to reproduce this under Linux and failed, which makes sense now that I have looked at your log more closely.
What the error message seems to be saying is that rclone is having trouble writing a filename with a TAB in it under Windows. You can have a TAB in a file name under Linux.

2016/11/16 22:57:45 Whitespace, test, document		.docx: Failed to copy: open \\?\D:\Work\inspire\backup\Testing\Whitespace, test, document		.docx: The filename, directory name, or volume label syntax is incorrect.

What do you think rclone should do here - replace it with _ like the other characters that you can't have in a Windows file name?

If you like that idea then please make an issue about it - thanks.

Hi,

Great! And yeah, I’d say any incompatible character should be detected and replaced with an _. Perhaps they could even be the same issue, just with two test cases, since they are both actually about Windows compatibility of characters.
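
As a rough sketch of that suggestion, the replacement rule could look like the Python below. It is not rclone’s implementation; the function name `sanitize_windows_name` is hypothetical. The character set follows the Windows file naming rules, which reserve `< > : " / \ | ? *` and do not allow control characters (codes 0 to 31, which includes TAB) in file names.

```python
WINDOWS_RESERVED = set('<>:"/\\|?*')


def sanitize_windows_name(name, replacement="_"):
    """Replace reserved and control characters in a single file name."""
    return "".join(
        replacement if (ch in WINDOWS_RESERVED or ord(ch) < 32) else ch
        for ch in name
    )


fixed = sanitize_windows_name("Whitespace, test, document\t\t.docx")
```

This covers both test cases from the thread (`*` / `:` and TAB) with one rule. It deliberately ignores other Windows restrictions, such as names ending in a space or a period and reserved device names like `CON`.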

I made an issue for this too

Ah, thank you, it does make more sense as a second issue after you pointed out it’s control character related, thank you very much.