Another special character issue (This time the lovely / and \ characters)

Hello again!

Wasn’t able to find it as an existing issue, and I just love the users in my office :slight_smile:

I have another character-specific gem for you: the slash characters have a tendency to generate directory trees matching the filename instead of actually writing a file whose literal name contains the slashes (for relatively obvious reasons).

(Tested with v1.34-23 linux and windows)

Test scenario:

  1. Name two files, one “this/is/a/test” the other “this\is\another\test”
  2. Copy/sync the files with rclone

Results:

Windows) The log file shows both files copying successfully, though rclone appears to always consider the file with \'s in it a ‘new file’, while the / file is seen as an existing file. Both are copied into directories (e.g. the directory structures .\this\is\a\test.docx and .\this\is\another\test.docx).

Linux) Sees both files and writes them: the file with \'s is written as a single file named “this\is\another\test.docx”, while the file containing /'s is rendered out to the directory structure ./this/is/a/test.docx

Overall, the default directory separator for each OS (e.g. \ for Windows and / for Linux) results in a directory structure matching the name of the file, while the two OSes handle the other character differently.
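
To illustrate the separator behaviour described above, here is a minimal Python sketch. It is not rclone code, and `split_components` is a hypothetical helper; it only uses the standard library's `pathlib` to show how each OS convention splits the two test names into path components.

```python
from pathlib import PurePosixPath, PureWindowsPath


def split_components(name):
    """Return the path components of `name` under each OS convention."""
    return {
        "posix": PurePosixPath(name).parts,      # only "/" separates
        "windows": PureWindowsPath(name).parts,  # both "/" and "\" separate
    }


for name in ["this/is/a/test.docx", "this\\is\\another\\test.docx"]:
    print(repr(name), split_components(name))
```

Under the POSIX rules the backslash name stays a single component, while under the Windows rules both names break into four components, which lines up with the results reported for each OS.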

Suggested fix is probably to smack users with a ruler to get them to stop using special characters------ I mean, possibly another renaming handler such as the _'s already being used for other files. Otherwise, if a copy/sync was then done from the local drive back to the Google drive, it would cause a bunch of mismatches, and directories would be created with potentially nonsensical filenames within them.

Using ‘Download’ from the Google drive automatically replaces all slashes with underscores in the file name, so it would make sense to do the same.

Any issues with my explanation let me know and I’ll try again.

Thanks again!

I think I’ve got that.

Here are some ideas…

First

  • make google drive replace / chars with _, since / can’t appear in a filename on linux, mac or windows
  • make sure the local handler replaces \ in remote file names with _
  • doesn’t fix the uploading it back problem though

Second

  • make google drive replace / and \ characters with _ in filenames
  • doesn’t fix the uploading it back problem though

Third

  • make google drive replace / and \ characters with unicode characters which look similar, eg ‘＼’ (U+FF3C), which is FULLWIDTH REVERSE SOLIDUS. This would have the advantage of making the sync up and down work correctly. I do this already in onedrive (but the other way round, to allow you to store file names with * in them, which is perfectly legal on linux but not on windows or onedrive).
  • Could also combine this with the first idea and only substitute / then do \ to _ translation for windows only.
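
As a rough sketch of the third idea (an illustration only, not how rclone does or will do it), the mapping could be a pair of translation tables. The function names `to_local` and `to_remote` are made up for this example, and U+FF0F (FULLWIDTH SOLIDUS) is assumed as the lookalike for /.

```python
# Assumed lookalikes: U+FF0F FULLWIDTH SOLIDUS, U+FF3C FULLWIDTH REVERSE SOLIDUS
TO_LOCAL = str.maketrans({"/": "\uFF0F", "\\": "\uFF3C"})
TO_REMOTE = str.maketrans({"\uFF0F": "/", "\uFF3C": "\\"})


def to_local(remote_name):
    """Swap slashes for lookalikes so the name is one path component."""
    return remote_name.translate(TO_LOCAL)


def to_remote(local_name):
    """Swap the lookalikes back when uploading again."""
    return local_name.translate(TO_REMOTE)


for remote_name in ["this/is/a/test", "this\\is\\another\\test"]:
    local_name = to_local(remote_name)
    # Neither separator is left in the local name, and the trip back is exact.
    assert "/" not in local_name and "\\" not in local_name
    assert to_remote(local_name) == remote_name
    print(remote_name, "->", local_name)
```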

In fact now that I think about it the whole replace with _ scheme that rclone uses for funny characters in windows is doomed to fail on re-upload. I guess I’ve been kind of assuming that if people want to upload back again then they will have to rename their files.
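
A tiny Python sketch of why the _ scheme can't be undone (illustrative only; `replace_with_underscore` is a hypothetical stand-in, not rclone's actual code): several different remote names collapse onto one local name, so there is nothing left to tell them apart on re-upload.

```python
def replace_with_underscore(name):
    """Lossy renaming: every slash becomes an underscore."""
    return name.replace("/", "_").replace("\\", "_")


remote_names = ["report 2016/2017", "report 2016\\2017", "report 2016_2017"]
local_names = [replace_with_underscore(n) for n in remote_names]

# All three distinct remote names end up identical locally.
assert len(set(remote_names)) == 3
assert len(set(local_names)) == 1
print(local_names[0])
```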

Thoughts?

Hehe - tempting :wink:

I always knew that renaming things when re-uploading them would be a problem, but figured that users who were sticking with rclone would eventually start renaming things appropriately in their drives once re-uploads started changing all the filenames (which, face it, is an automated process if someone syncs everything down then syncs it back up). Google is really the side that’s ‘in the wrong’ here (which isn’t ‘wrong’ so much as not backwards compatible with daily storage) regarding naming limitations (or lack thereof), purely because they CAN allow every character, since their storage systems are likely all object based and thus store things completely differently to a normal file system. (They also already have systems in place on their end to rename files on the fly as you pull out the objects when you use the ‘download my drive’ functions.)

As for the options, phwoar I like the third ‘visually similar unicode characters’ option since the likely replacement characters should all be acceptable.

It is really starting to seem like Google’s going to have to start helping out with compatibility stuff like this properly too, unless their Google Drive program houses a local database for the objects (e.g. for Windows, since it doesn’t exist to its full extent beyond a git-like push/pull on linux/etc), or unless they want to keep their Drive ‘them only’ so other systems can’t interact with it properly (which I doubt, because why give us an API and let us do what we want with it? They don’t seem like those kind of people historically).

If they’re still constantly working on their Drive API stuff, they could have the Drive API sanitise filenames with incompatible characters whenever a file is ‘pulled’ through the API, which is how rclone grabs the files anyway, isn’t it? Or is that literally what you meant when you were talking about 'make google drive '? It would be worth getting them involved to see if they can do stuff on their end as well, because overall the same problems would persist when people download drive files, edit them locally, then re-upload them: they 1) end up with a duplicate and don’t get the ‘versions’ they should, and 2) have to rename the files anyway. With a large enough quantity of files that becomes an issue for anyone using the service.

Great to see this is being discussed, and thanks a lot for making rclone! I signed up here because of this bug.

I think implementing any of the proposed solutions would be a significant improvement. Currently, uploading files back seems to cause more of a mess than the solutions you proposed; replacing slashes by some other character would not be nearly as bad (even though it doesn’t solve all of the issues). I’d prefer the first or second solution as the third one seems somewhat unintuitive and the first/second match Google’s own desktop sync behavior.

(Personally, I only use rclone for 1-way backups, so I don’t care about the re-uploading issues.)

I’m having this problem in the opposite direction.
I’m trying to sync/copy from Google Drive to Local (Windows and Linux), and the source directory has a slash in its name (drive:“FEDERAL FULL 2016/2017”).

Nothing I do seems to work, and I’m currently unable to download.

Hey,

This is an issue with syncing from drive to local. The only thing I can suggest is changing the name of the source folder and the drive: entry.

Have you got a log of the errors? Try running in verbose mode and output to a log file so we can see the error.

Thanks

Thank you for the reply.
Unfortunately renaming the folder is not an option.

The error log:

2017/02/03 11:34:50 Attempt 1/3 failed with 0 errors and: error listing source: Google drive root ‘FEDERAL FULL 2016/2017’: directory not found

I've found an issue for this and I've scheduled it for the 1.37 release!

Please subscribe to that issue for betas and updates.

https://github.com/ncw/rclone/issues/62

Is there any chance of an option to treat one end as the source of truth and do escaping and unescaping when syncing files with the other end?

e.g. On Google Drive: “foo/bar”
…syncs locally to: “foo%2fbar”

My worry is that renaming invalid symbols to Unicode characters loses information — what if I have files using those characters? (And I do.) I won’t be able to restore the backup and end up with an exact copy of what I had, because rclone can’t tell which Solidus code points need to be turned back into slashes and which ones don’t.
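
Here is a minimal Python sketch of the kind of escaping I mean. It is only an illustration of the idea, not an existing rclone option; `escape`, `unescape` and the `UNSAFE` set are names I made up. The trick is that the escape character % gets escaped too, so the trip back is unambiguous.

```python
import re

UNSAFE = "/\\"  # assumption: the characters the local side can't hold


def escape(name):
    """Percent-escape the unsafe characters and '%' itself."""
    return "".join(
        "%{:02x}".format(ord(ch)) if ch == "%" or ch in UNSAFE else ch
        for ch in name
    )


def unescape(name):
    """Turn every %xx sequence back into the character it stands for."""
    return re.sub(
        r"%([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), name
    )


# Names with a slash, a literal "%2f", or a fullwidth solidus all survive.
for remote_name in ["foo/bar", "foo%2fbar", "foo\uFF0Fbar"]:
    local_name = escape(remote_name)
    assert unescape(local_name) == remote_name
    print(repr(remote_name), "->", repr(local_name))
```

This only round-trips when the escaped side is always produced by `escape`, which is why one end has to be treated as the source of truth. A file created locally with a literal %2f in its name would come back as a slash.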

You are correct. I haven’t come up with a satisfactory scheme yet, but I’m thinking about it. I’ve put your comments in my notes file!

However / is a particularly inimical problem as rclone file names can’t have / in them. They can have absolutely anything else though!