Onedrive: File names with '#' converted to non-ascii equivalent

What is the problem you are having with rclone?

Syncing a file or directory whose name contains a pound sign (#) changes the name: the pound sign is replaced with a similar-looking non-ASCII character (＃ instead of #). Depending on the font, the two may or may not look the same, but the OS definitely treats them as different names.
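To see why the OS treats the two names as different, here is a minimal Python sketch. It assumes the replacement character is the fullwidth number sign (U+FF03), which is what the rclone encoding docs list as the stand-in for '#'. The `describe` helper is made up for illustration.

```python
import unicodedata

ascii_hash = "#"           # U+0023, the character in the local file name
fullwidth_hash = "\uff03"  # U+FF03, assumed to be the look-alike on the remote


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(ascii_hash))      # U+0023 NUMBER SIGN
print(describe(fullwidth_hash))  # U+FF03 FULLWIDTH NUMBER SIGN

# The two file names compare as different strings, so they are different files.
print("test #1.txt" == "test \uff031.txt")  # False
```

Fullwidth forms sit at a fixed offset (0xFEE0) from their ASCII counterparts, which is why the glyphs look so similar in many fonts.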

What is your rclone version (output from rclone version)

1.53 - just installed the newest version. Same problem with 1.50

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux Mint 64 bit, latest version with all updates.

Which cloud storage system are you using? (eg Google Drive)

Microsoft OneDrive.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync ~/OneDrive/xxx  onedrive:xxx

The rclone config contents with secrets removed.

[onedrive]
type = onedrive
token = {"access_token"...}
drive_id = xxxxxxxxxxxxxx
drive_type = personal

A log from the command with the -vv flag

 rclone sync  ~/OneDrive/xxx/ onedrive:xxx -vv
2020/10/19 20:51:09 DEBUG : rclone: Version "v1.53.1" starting with parameters ["rclone" "sync" "/home/marty/OneDrive/xxx/" "onedrive:xxx" "-vv"]
2020/10/19 20:51:09 DEBUG : Creating backend with remote "/home/marty/OneDrive/xxx/"
2020/10/19 20:51:09 DEBUG : Using config file from "/home/marty/.config/rclone/rclone.conf"
2020/10/19 20:51:09 DEBUG : Creating backend with remote "onedrive:xxx"
2020/10/19 20:51:10 DEBUG : test #1.txt: Sizes differ (src 42 vs dst 17)
2020/10/19 20:51:10 DEBUG : One drive root 'xxx': Waiting for checks to finish
2020/10/19 20:51:10 DEBUG : test#2.txt: Sizes differ (src 35 vs dst 20)
2020/10/19 20:51:10 DEBUG : One drive root 'xxx': Waiting for transfers to finish
2020/10/19 20:51:10 DEBUG : test #1.txt: Starting multipart upload
2020/10/19 20:51:10 DEBUG : test#2.txt: Starting multipart upload
2020/10/19 20:51:10 DEBUG : test#2.txt: Uploading segment 0/35 size 35
2020/10/19 20:51:10 DEBUG : test #1.txt: Uploading segment 0/42 size 42
2020/10/19 20:51:16 DEBUG : test #1.txt: SHA-1 = 01bbc361c1556e2d525c9b38abebcd9e307f38d2 OK
2020/10/19 20:51:16 INFO  : test #1.txt: Copied (replaced existing)
2020/10/19 20:51:16 DEBUG : test#2.txt: SHA-1 = d7a1214e6daa707848a471e11c9ae78588029ad7 OK
2020/10/19 20:51:16 INFO  : test#2.txt: Copied (replaced existing)
2020/10/19 20:51:16 DEBUG : Waiting for deletions to finish
2020/10/19 20:51:16 INFO  : 
Transferred:   	        77 / 77 Bytes, 100%, 12 Bytes/s, ETA 0s
Checks:                 2 / 2, 100%
Transferred:            2 / 2, 100%
Elapsed time:         7.2s

2020/10/19 20:51:16 DEBUG : 10 go routines active

On my local drive, I had:
total 16
drwxrwxr-x 2 x x 4096 Oct 19 20:31 ./
drwxrwxr-x 4 x x 4096 Oct 19 20:28 ../
-rw-rw-r-- 1 x x 42 Oct 19 20:31 'test #1.txt'
-rw-rw-r-- 1 x x 35 Oct 19 20:31 test#2.txt

Remote drive (Windows 10) ended up with:
10/19/2020 08:49 PM .
10/19/2020 08:49 PM ..
10/19/2020 08:21 PM 17 test #1.txt
10/19/2020 08:31 PM (42) test ＃1.txt
10/19/2020 08:21 PM 20 test#2.txt
10/19/2020 08:31 PM (35) test＃2.txt

In this case, the remote simply ends up with the two original files, unchanged, plus two new files. But if the original files didn't already exist and were being created, it would end up with two misnamed files that look OK but do not have the same names as the local ones.
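A rough way to spot such look-alike pairs in a directory listing is to fold every name with Unicode NFKC normalization and group the names that collapse to the same string. This is only a sketch: the `lookalike_groups` helper is hypothetical, and the sample listing assumes the new files were stored with the fullwidth number sign (U+FF03).

```python
import unicodedata
from collections import defaultdict


def lookalike_groups(names):
    """Group names that become identical after NFKC folding."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    # Keep only the folded names that more than one real name maps to.
    return {folded: found for folded, found in groups.items() if len(found) > 1}


# Assumed remote listing: originals plus the re-encoded uploads.
remote = ["test #1.txt", "test \uff031.txt", "test#2.txt", "test\uff032.txt"]

for folded, variants in lookalike_groups(remote).items():
    # Print escaped forms so the difference is visible in any font.
    print(folded, [v.encode("unicode_escape").decode() for v in variants])
```

NFKC folds more than fullwidth characters (ligatures and some other compatibility forms too), so treat a match as a hint to look closer rather than proof of a duplicate.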

This is to do with file name encoding: https://rclone.org/onedrive/#restricted-filename-characters

As far as I am aware you can't have a '#' sign in onedrive file names. It is possible this restriction has been lifted.

If it has, then you can change the encoding using the --onedrive-encoding flag.

Thanks a lot for the link. I'm embarrassed that I didn't read enough to see that. I followed the guide to set it up but only skimmed the options and didn't notice anything at first glance, so I never thought I needed to RTFM, at least not the pertinent section.

Obviously, that particular restriction is no longer in play on onedrive, since it doesn't have any problems with it. I'll have to check some of the other characters to make sure I don't have problems with other filenames.

I'll mark it as solved, since it seems that it is. I think I understand now how to specify it both in the command and config file, but if I have problems, I'll ask elsewhere.

Thanks again for the quick response.

I have another question about how to do this. Hopefully, this might still get an answer here.

Will I need to specify every encoding option except the hash? It seems that way, although I really wish there was a way to just add a "not hash" to the defaults.

Would I need to specify all the other onedrive options, plus all the other default options?

The directions about using the advanced config options weren't much help; the prompt just asked for an input, with nothing about the expected format.

you should be able to set that by adding this line to the config file:
encoding =

Thanks, I got that far. My question is what goes after the equals sign: do I have to enter every possible encoding name except for the one I don't want? I don't even know all the defaults.

i have no idea, never used encoding.
you can read about it here and a few simple tests should provide the answer.
https://rclone.org/overview/#encoding

Thanks, I did read about it there; that's where they talk about adding to the default list, which is what prompted my question. However, you actually helped more than you might think: looking again, I gave another thought to encoding = None and realized that it might just work OK for me. I'm only using OneDrive for my Calibre ebook database, and Calibre, which names the files according to author and title and is a cross-platform application, already seems to avoid any special characters. I did notice it substituted an underscore for a colon in one title. So I think I'll try None and see how that works.

glad to help a fellow skeptic+calibre user.

Ha, ha, these days you really need to be a skeptic. There's so much BS floating around. Unfortunately, I don't think it's going away, so we need to be skeptical of everything.

My father's advice from long before the internet: "Don't believe anything you hear, and only half of what you see."

That is good to know...

I found the docs again

The set rclone uses is

[image: the set of characters rclone encodes for onedrive]

Which includes | and # where it probably shouldn't for onedrive home.

However if I actually use rclone's test program to see what characters you can store I get this

This is a little difficult to interpret but it tries each character at the Left, Middle and Right of a file and checks it can Write it, Get it and List it.
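To make the Left/Middle/Right idea concrete, here is a small Python sketch of how such probe names could be built for one character. This is not rclone's test program, just an illustration; the `probe_names` helper and the `test` base name are made up.

```python
def probe_names(ch: str, base: str = "test") -> dict:
    """Return file names with ch at the left, middle and right of base."""
    mid = len(base) // 2
    return {
        "left": ch + base,
        "middle": base[:mid] + ch + base[mid:],
        "right": base + ch,
    }


# Each of these names would then be written, fetched and listed on the remote.
for position, name in probe_names("#").items():
    print(f"{position:6} {name!r}")
```

The position matters because some backends only reject a character at one end of a name, such as a trailing space or period.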

Onedrive home and business test remarkably similarly: onedrive business does not allow a ' on the end of a string, but both allow # and %, which rclone currently escapes.

So probably the correct thing (ignoring backwards compatibility) would be to remove # and % from the onedrive encoding. This will then match the Windows encoding which is probably sensible.

Removing characters from the encoding has the potential for rclone to have to re-upload files which can confuse users...

Yes you will.
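Since there is no "default minus Hash" shorthand, one way to avoid typing the list by hand is to start from the default and drop the entries you don't want. The Python sketch below does that. The `ASSUMED_DEFAULT` string is what I believe the onedrive default was around v1.53, but that is an assumption; check it against the onedrive docs for your version before using the result. The `without` helper is hypothetical.

```python
# ASSUMPTION: onedrive default encoding around rclone v1.53. Verify in the docs.
ASSUMED_DEFAULT = (
    "Slash,LtGt,DoubleQuote,Colon,Question,Asterisk,Pipe,Hash,Percent,"
    "BackSlash,Del,Ctl,LeftSpace,LeftTilde,RightSpace,RightPeriod,"
    "InvalidUtf8,Dot"
)


def without(encoding: str, *drop: str) -> str:
    """Return the comma-separated encoding list minus the named entries."""
    dropped = set(drop)
    return ",".join(part for part in encoding.split(",") if part not in dropped)


value = without(ASSUMED_DEFAULT, "Hash")

# Line for the [onedrive] section of rclone.conf:
print("encoding = " + value)

# Or the same thing as a command-line flag:
print("--onedrive-encoding " + value)
```

Passing more names, for example `without(ASSUMED_DEFAULT, "Hash", "Percent")`, drops both entries in one go.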

You can use None but if you try to upload a file with a forbidden character then rclone will error at that point.
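If you go with None, it may be worth checking your file names before a sync so the error doesn't come as a surprise. A minimal Python sketch follows, assuming the forbidden set is the usual Windows-style list `" * : < > ? / \ |`. That set is an assumption here, and it ignores position-dependent rules such as trailing spaces or periods. The `risky_names` helper is hypothetical and expects single name components, not full paths.

```python
# ASSUMPTION: characters OneDrive rejects anywhere in a name.
FORBIDDEN = set('"*:<>?/\\|')


def risky_names(names):
    """Map each name containing forbidden characters to those characters."""
    found = {}
    for name in names:
        bad = sorted(set(name) & FORBIDDEN)
        if bad:
            found[name] = bad
    return found


sample = ["test #1.txt", "Title: Subtitle.epub", "a|b.txt"]
for name, bad in risky_names(sample).items():
    print(f"{name!r} contains {bad}")
```

A name with '#' passes this check, which matches the observation earlier in the thread that OneDrive now accepts it.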

Wow, that's great information, and I really appreciate that you went to so much trouble finding it.

It seems like most of the forbidden characters are either not legal on Linux or, in the case of the backslash, rarely used. I personally think I'm safe using "none" as a filter for my current use, and probably for any personal use. But I will probably try the full filter just in case things change in the future and I forget about it.

Using "none", I was able to sync my 5 GB database to OneDrive with no errors, so I will definitely not be buying Insync, which I evaluated. I don't need or really like having a sync program constantly running, and I won't be using rclone's mount feature because it is slow. Insync was pretty fast with its sync function on OneDrive, but when I tried Google Drive, it used way too much CPU and pegged two of my 6 cores.

wow, insync is very expensive for use on a server, seems to be licensed per cloud account and slow email based support.

have you considered https://rclone.org/donate/
i do not profit from this, ncw does.

Well, Insync for me would have been a $29 one-time cost. Not prohibitive, but not worth paying for if I don't need it, which I don't. I actually answered their email asking what I thought, thanking them for the tryout but saying that rclone seems to be what I need.

I have actually already considered a donation; I usually do donate to software I use, often multiple times. I've donated to Calibre at least 3 times because I use it a lot, and it's one of the best programs I use in many ways. I like to encourage and support the programs I use regularly, partly for selfish reasons: I want them to stick around so I can continue to use them.

I plan to make a one-time donation, and then if everything works well, I'll consider either periodic donations, or a smaller monthly one. I have no idea what the sponsorship is.

me too, i donated to Calibre, hard to imagine it not being there.

I wrote this up in an issue here: https://github.com/rclone/rclone/issues/4700 about changing the encoding to remove # and |

That's great. But one thing that is confusing, and I think is probably a mistake in the title:
The title says
" onedrive: remove '|' and '#' ...".
Shouldn't it say
" onedrive: remove '%' and '#' ..."?

Fixed. Thanks for pointing it out.
