InvalidUri errors


#1

I’ve been using Rclone for some weeks to back up some websites from FTP to S3. Thanks for the great work on this project! Today I ran into an error when syncing one of my websites. The logs show the following error:
2018/03/16 17:28:44 ERROR : wp-content/uploads/43-LKW-transporte-Dã¤nemark-280x80.jpg: Failed to copy: InvalidURI: Couldn’t parse the specified URI.
status code: 400, request id: 80D97B1F429F8404, host id: UURsrFhQQDkCejofB9/z3i2kCBxvOYHg1Dy4IrFbYTBq9wyKeTP+DV1NpkWMqOw8i0uoX5aTx0o=

I think it might have something to do with the encoding of the filename.


#2

It would be worth trying the latest beta to see if the issue is still present.

Also, judging from your paste, one of your computers isn’t set to a UTF-8 locale. That could also be the problem.


#3

I just tested syncing with the latest beta (v1.40-012-g0ed0d9a7), but I still get the same error on both Windows and Unix.

NB: when connecting to the FTP server via a traditional FTP client, I get the error: “Invalid character sequence received, disabling UTF-8.”, and the filename displays as 43-LKW-transporte-Dã¤nemark-280x80.jpg.


#4

Ah, so you think the FTP server is dishing out invalid UTF-8?

That would make sense of the errors.

Can you change the locale of the FTP server so it runs in UTF-8? That will make rclone’s life easier, since rclone works entirely in UTF-8.


#5

Sorry for the late reply. Changing the locale of the FTP servers is not really an option, as we don’t control these servers. Today I ran into a file named Transport-Roemeni*-600x235.jpg on one FTP server, where * denotes the bytes \xE3\xAB. This filename URL-encodes to Transport-Roemeni%E3%AB-600x235.jpg, and S3 rightfully complains that it cannot decode this string to valid UTF-8, yielding a response containing <Error><Code>InvalidURI</Code><Message>Couldn't parse the specified URI.</Message><URI>.../Transport-Roemeni%E3%AB-600x235.jpg</URI> etc.
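To make the failure concrete, here is a minimal Python sketch (not rclone’s code, and only an approximation of what S3 does on its side). It assumes the raw filename bytes are exactly as described above; `raw_name` and `is_valid_utf8` are names made up for this illustration. It shows that percent-encoding passes the invalid bytes through unchanged, so whoever decodes the URI ends up with the same invalid UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes

# Raw filename bytes as reported above: 0xE3 0xAB is the sequence shown as "*".
raw_name = b"Transport-Roemeni\xe3\xab-600x235.jpg"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Percent-encoding operates on raw bytes, so the invalid sequence survives as-is.
encoded = quote(raw_name)
print(encoded)  # Transport-Roemeni%E3%AB-600x235.jpg

# Percent-decoding gives back the same bytes, which are still not valid UTF-8:
# 0xE3 opens a three-byte sequence, but only one continuation byte follows.
decoded = unquote_to_bytes(encoded)
print(is_valid_utf8(decoded))  # False
```

A check like `is_valid_utf8` could in principle be used to list the affected files before a sync, but that is a suggestion for a separate script, not an existing rclone feature.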

The files with invalid UTF-8 names make up a very small portion of the files I’m trying to transfer, so I was wondering whether I could ignore just these specific errors as a workaround. The option --ignore-errors would be overkill, as I don’t want to ignore all I/O errors.


#6

In the local backend rclone replaces invalid utf-8 with the broken utf-8 symbol. Is that the sort of work-around which would be useful?
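As a rough picture of what such a replacement would do to the filename from the previous post, here is a small Python sketch. It is an illustration only: `sanitize_name` is a hypothetical helper, and rclone’s own implementation may replace a different number of characters per invalid sequence.

```python
REPLACEMENT = "\ufffd"  # Unicode replacement character


def sanitize_name(raw: bytes) -> str:
    """Decode raw filename bytes, substituting U+FFFD for invalid UTF-8."""
    return raw.decode("utf-8", errors="replace")


raw_name = b"Transport-Roemeni\xe3\xab-600x235.jpg"
safe_name = sanitize_name(raw_name)
print(safe_name)

# The sanitized name is a normal string, so it re-encodes to valid UTF-8
# and can be percent-encoded into a URI that decodes cleanly.
safe_bytes = safe_name.encode("utf-8")
```

The substitution is lossy: two different invalid names can map to the same sanitized name, and the original bytes cannot be recovered from the result.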