Conflicts caused by accents in filenames with Amazon Cloud Drive

When I run rclone copy from Amazon Cloud Drive, I get a conflict if the file name or path contains an accent.

For example:

root@Tower:/mnt/user/Music/FLAC/Michael Bublé/Crazy Love_ Hollywood Edition# ls -l
total 797640
-rw-rw-rw- 1 root   root  27699902 Nov 25 06:38 1-01\ Michael\ Bublé\ -\ Cry\ Me\ a\ River.flac
-rw-r--r-- 1 coliny games 20329206 Nov 25 06:38 1-02\ Michael\ Bublé\ -\ All\ of\ Me.flac
-rw-rw-rw- 1 root   root  20329206 Nov 25 06:38 1-02\ Michael\ Bublé\ -\ All\ of\ Me.flac
-rw-rw-rw- 1 root   root  19417258 Nov 25 06:39 1-03\ Michael\ Bublé\ -\ Georgia\ on\ My\ Mind.flac
-rw-rw-rw- 1 root   root  24182475 Nov 25 06:39 1-04\ Michael\ Bublé\ -\ Crazy\ Love.flac
-rw-r--r-- 1 coliny games 31061482 Nov 25 06:40 1-05\ Michael\ Bublé\ -\ Haven't\ Met\ You\ Yet.flac
-rw-rw-rw- 1 root   root  31061482 Nov 25 06:40 1-05\ Michael\ Bublé\ -\ Haven't\ Met\ You\ Yet.flac
-rw-rw-rw- 1 root   root  18312133 Nov 25 06:40 1-06\ Michael\ Bublé\ -\ All\ I\ Do\ Is\ Dream\ of\ You.flac
-rw-rw-rw- 1 root   root  26683335 Nov 25 06:41 1-07\ Michael\ Bublé\ -\ Hold\ On.flac
-rw-r--r-- 1 coliny games 29192746 Nov 25 06:42 1-08\ Michael\ Bublé\ -\ Heartache\ Tonight.flac
-rw-rw-rw- 1 root   root  29192746 Nov 25 06:42 1-08\ Michael\ Bublé\ -\ Heartache\ Tonight.flac
-rw-rw-rw- 1 root   root  21026244 Nov 25 06:42 1-09\ Michael\ Bublé\ -\ You're\ Nobody\ Till\ Somebody\ Loves\ You.flac
-rw-r--r-- 1 coliny games 22251285 Nov 25 06:43 1-10\ Michael\ Bublé\ with\ Sharon\ Jones\ and\ the\ Dap-Kings\ -\ Baby\ (You've\ Got\ What\ It\ Takes).flac
-rw-rw-rw- 1 root   root  22251285 Nov 25 06:43 1-10\ Michael\ Bublé\ with\ Sharon\ Jones\ and\ the\ Dap-Kings\ -\ Baby\ (You've\ Got\ What\ It\ Takes).flac
-rw-r--r-- 1 coliny games 29716676 Nov 25 06:43 1-11\ Michael\ Bublé\ -\ At\ This\ Moment.flac
-rw-rw-rw- 1 root   root  29716676 Nov 25 06:43 1-11\ Michael\ Bublé\ -\ At\ This\ Moment.flac
-rw-rw-rw- 1 root   root  22269568 Nov 25 06:44 1-12\ Michael\ Bublé\ with\ Naturally\ 7\ -\ Stardust.flac
-rw-r--r-- 1 coliny games 32526928 Nov 25 06:45 1-13\ Michael\ Bublé\ with\ Ron\ Sexsmith\ -\ Whatever\ It\ Takes.flac
-rw-rw-rw- 1 root   root  32526928 Nov 25 06:45 1-13\ Michael\ Bublé\ with\ Ron\ Sexsmith\ -\ Whatever\ It\ Takes.flac
-rw-r--r-- 1 coliny games 21571104 Nov 25 06:45 1-14\ Michael\ Bublé\ -\ Some\ Kind\ of\ Wonderful.flac
-rw-rw-rw- 1 root   root  21571104 Nov 25 06:45 1-14\ Michael\ Bublé\ -\ Some\ Kind\ of\ Wonderful.flac
-rw-rw-rw- 1 root   root  34014782 Nov 25 06:46 2-01\ Michael\ Bublé\ -\ Hollywood.flac
-rw-rw-rw- 1 root   root  31178542 Nov 25 06:47 2-02\ Michael\ Bublé\ -\ At\ This\ Moment\ (live).flac
-rw-r--r-- 1 coliny games 40712447 Nov 25 06:47 2-03\ Michael\ Bublé\ -\ Haven't\ Met\ You\ Yet\ (live).flac
-rw-rw-rw- 1 root   root  40712447 Nov 25 06:47 2-03\ Michael\ Bublé\ -\ Haven't\ Met\ You\ Yet\ (live).flac
-rw-r--r-- 1 coliny games 21772080 Nov 25 06:48 2-04\ Michael\ Bublé\ -\ End\ of\ May.flac
-rw-rw-rw- 1 root   root  21772080 Nov 25 06:48 2-04\ Michael\ Bublé\ -\ End\ of\ May.flac
-rw-rw-rw- 1 root   root  25937469 Nov 25 06:49 2-05\ Michael\ Bublé\ -\ Me\ and\ Mrs.\ Jones\ (live).flac
-rw-rw-rw- 1 root   root  13976059 Nov 25 06:49 2-06\ Michael\ Bublé\ -\ Twist\ &\ Shout\ (live).flac
-rw-rw-rw- 1 root   root  28878004 Nov 25 06:50 2-07\ Michael\ Bublé\ -\ Heartache\ Tonight\ (live).flac
-rw-rw-rw- 1 root   root  24865880 Nov 25 06:50 2-08\ Michael\ Bublé\ -\ Best\ of\ Me.flac
root@Tower:/mnt/user/Music/FLAC/Michael Bublé/Crazy Love_ Hollywood Edition# 

As you can see, some of the files above appear twice, and the two copies are identical.

I consider this a bug, but I thought I would check here before raising it as an issue on GitHub; searching there didn’t turn anything up.

Is this known?

EDIT: I can confirm there is only one copy on Amazon.

What do you mean by a conflict?

I expect you are using OS X and uploaded those files with the Amazon uploader (or an old version of rclone).

The Amazon uploader (and old versions of rclone) doesn't normalise the UTF-8 characters in file names coming from OS X, which makes rclone think the names are different.
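To make the normalisation point concrete, here is a minimal POSIX shell sketch that prints the bytes of "Bublé" in both Unicode forms, using od rather than xxd. In NFC, é is the single code point U+00E9; in NFD it is e followed by U+0301 COMBINING ACUTE ACCENT, which is the form OS X's HFS+ file system typically stores. The hex_bytes helper and the variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative helper: print the bytes of a string as space-separated hex pairs.
hex_bytes() {
  # Unquoted expansion word-splits od's output, collapsing its column spacing.
  set -- $(printf '%s' "$1" | od -An -tx1)
  echo "$*"
}

# Precomposed (NFC): é is U+00E9, encoded in UTF-8 as c3 a9.
nfc=$(printf 'Bubl\303\251')
# Decomposed (NFD): e (U+0065) followed by U+0301, encoded as 65 cc 81.
nfd=$(printf 'Bubl\145\314\201')

hex_bytes "$nfc"   # 42 75 62 6c c3 a9
hex_bytes "$nfd"   # 42 75 62 6c 65 cc 81

# The two render identically on screen but are different byte strings.
if [ "$nfc" = "$nfd" ]; then echo same; else echo different; fi
```

Any tool that compares names as raw bytes will therefore treat these as two separate files unless it normalises them first.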

You can't see it in what you've pasted because something has normalised the listing (maybe the forum software). However, the two file names must differ, because a directory can't contain two files with exactly the same name.
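Because common Linux file systems such as ext4 and XFS compare names byte for byte, both encodings can sit side by side in one directory and look like duplicates in ls. A sketch that reproduces this, assuming a Linux box with mktemp and find available (the scratch directory and file names are made up for the experiment):

```shell
#!/bin/sh
# Scratch directory for the experiment.
dir=$(mktemp -d)

# Two names that both display as "Bublé.flac" but differ in their bytes.
touch "$dir/$(printf 'Bubl\303\251.flac')"      # NFC: é is c3 a9
touch "$dir/$(printf 'Bubl\145\314\201.flac')"  # NFD: e + U+0301 is 65 cc 81

# The file system compares names byte for byte, so both files coexist.
count=$(find "$dir" -type f | wc -l)
echo "files: $count"

rm -rf "$dir"
```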

This answer on Stack Overflow explains what is going on:

It looks like you downloaded the files to a Linux box, which isn't de-normalising the file names.

So the question is: how did you upload those files? Which version of rclone are you using now? And what exactly do you mean by a conflict? If that is an rclone error message, please post it.

Thanks for responding @ncw - that is some awesome guessing skills you have there :-).

It was uploaded from Windows (using GoodSync) and downloaded using a plugin on unRaid (Slackware Linux). The rclone version is v1.34.

I don’t have the exact message rclone gave, but it mentioned a conflict and downloaded a second copy of the file. If you like, I could delete the local copy and re-download the files from Amazon to capture the exact message.

The paste was a simple ls -al from a tmux session on the unRaid box - here is an image of the listing:

As a further experiment, I deleted the directory from unRaid and re-downloaded it from Amazon Cloud Drive, which completed successfully.

However, if I run the rclone copy again, it wants to re-download every file with an accent in its name.
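One way to check whether the local copies really have decomposed (NFD) names, without hex-dumping each one, is a heuristic search for names containing the byte 0xCC or 0xCD. In valid UTF-8 those bytes only begin code points U+0300 to U+037F, which are mostly combining diacritical marks, so a match suggests an NFD name. The find_decomposed function is hypothetical, and the check can miss decomposed characters outside that range.

```shell
#!/bin/sh
# Hypothetical helper: list paths under $1 whose names contain byte 0xCC or 0xCD.
# In valid UTF-8 these bytes only start code points U+0300..U+037F (mostly
# combining diacritical marks), so a hit is a likely sign of an NFD name.
# Heuristic only: decomposed characters outside that range are not detected.
find_decomposed() {
  # LC_ALL=C makes find match the pattern byte by byte instead of by character.
  LC_ALL=C find "$1" -name "*$(printf '[\314\315]')*"
}

# Usage (path taken from the listing above):
#   find_decomposed "/mnt/user/Music/FLAC/Michael Bublé"
```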

I wonder whether something unRaid specific (e.g. its disk pooling library) is getting in the way here?

Interesting…

I have difficulty understanding how there can be two files with the same name! I can think of two possible reasons:

  • different Unicode encodings - your cut and paste should have shown that up, though
  • something to do with unRaid

Could you try copying to a normal (non-unRaid) directory and see if that works properly?

Also, could you run something like this in the directory where you did the listing:

ls -l 1-02* | xxd -g1

That should list all the files starting with 1-02 and hex dump the listing, which will let me double-check the UTF-8 encoding.

Hi @ncw, since my last post I deleted my local copy so I don’t actually have two duplicate files anymore. I only have the fresh copy from ACD.

However, a subsequent copy still claims that the files with an accent are duplicated on unRAID. If I then download them again, it overwrites the existing file.

I tried this on my Mac and yep, everything is hunky dory there.

I think this might be something really specific to unRAID - let me experiment more and see if I can get a small reproducible case.

One thought: I am using an unRAID plugin that stores a hash of each file in the file's extended attributes. If that were the cause, though, I would expect all files to be considered changed.

Weird. I will keep finding attractive straws and giving them a pull…

Actually, I have found some more duplicates but xxd isn’t installed(!) on unRAID and simply extracting it from the Slackware package gives an error.
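When xxd is unavailable, od (a POSIX utility shipped in GNU coreutils, so likely already present on unRAID's Slackware base) can produce an equivalent per-byte dump. A sketch, with dump_listing as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: a stand-in for `ls -l 1-02* | xxd -g1` when xxd is missing.
#   -A x   print offsets in hexadecimal, like xxd's left-hand column
#   -t x1  print each byte as its own two-digit hex value, like xxd -g1
dump_listing() {
  ls "$@" | od -A x -t x1
}

# Usage, run inside the album directory:
#   dump_listing -l 1-02*
# A precomposed é shows up as "c3 a9"; e plus a combining acute accent as
# "65 cc 81" (a sequence can wrap onto the next line of the dump).
```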

All of this music is legally owned by me, but Amazon might get a bit twitchy about me sharing it. I do, however, also have some publicly available music uploaded that exhibits the same behaviour.

Would it help if I shared that directory with you?

Yes, then I can verify the UTF-8 encoding, which might be a red herring. I’m nick@craig-wood.com BTW!

Emailed. And fluff to exceed 20 chars:-).

Thanks. I’ve checked that out and I can see the UTF-8 encoding is standard (not OS X mangled), so that is a red herring.

So I think we are left with some unRAID compatibility problem…

I think so too. Let’s “close” this for now then (not sure how to close a forum post ;-)). And thanks for your time Nick.

Plan!
[quote="coliny, post:12, topic:392"]
not sure how to close a forum post :wink:
[/quote]

No, me neither!

You are welcome.