InvalidUri errors

I’ve been using rclone for some weeks to back up some websites from FTP to S3. Thanks for the great work on this project! Today I ran into an error when syncing one of my websites. The logs show the following error:
2018/03/16 17:28:44 ERROR : wp-content/uploads/43-LKW-transporte-Dã¤nemark-280x80.jpg: Failed to copy: InvalidURI: Couldn’t parse the specified URI.
status code: 400, request id: 80D97B1F429F8404, host id: UURsrFhQQDkCejofB9/z3i2kCBxvOYHg1Dy4IrFbYTBq9wyKeTP+DV1NpkWMqOw8i0uoX5aTx0o=

I think it might have something to do with the encoding of the filename.

It would be worth trying the latest beta to see if the issue is still present.

Also, it looks like from your paste that one of your computers isn’t set to a UTF-8 locale - that could also be the problem.

I just tested syncing with the latest beta (v1.40-012-g0ed0d9a7), but I still get the same error (on both Windows and Unix).

NB: when connecting to the FTP server via a traditional FTP client, I get the error: “Invalid character sequence received, disabling UTF-8.”, and the filename displays as 43-LKW-transporte-Dã¤nemark-280x80.jpg.

Ah, so you think the FTP server is dishing out invalid UTF-8?

That would make sense of the errors.

Can you change the locale of the FTP server so it runs in UTF-8 - that will make rclone’s life easier. rclone works entirely in UTF-8.

Sorry for the late reply. Changing the locale of the FTP servers is not really an option, as we don’t control these servers. Today I ran into a file named Transport-Roemeni*600x235.jpg, where * denotes the bytes \xE3\xAB, on some FTP server. This filename URL-encodes to Transport-Roemeni%E3%AB-600x235.jpg, so S3 rightfully complains that it cannot decode this string as valid UTF-8, returning a response containing <Error><Code>InvalidURI</Code><Message>Couldn't parse the specified URI.</Message><URI>.../Transport-Roemeni%E3%AB-600x235.jpg</URI> etc.

The files with invalid UTF-8 names make up a very small portion of the files I’m trying to transfer, hence I was wondering whether I could ignore these specific errors as a workaround. The option --ignore-errors would be overkill, as I wouldn’t like to ignore all I/O errors.

In the local backend rclone replaces invalid utf-8 with the broken utf-8 symbol (the Unicode replacement character, �). Is that the sort of workaround which would be useful?

Hey, bumping this as I'm hitting the same error transferring from an SFTP source to S3:

<Message>Couldn't parse the specified URI.</Message>

I also don't have control over the SFTP source server. I tried the latest beta (rclone v1.48.0-002-g62853036-beta) to no avail.

Is there any workaround if I want to transfer the files today? (I would prefer not to ignore the errors, but actually back up the files.)

Alternatively, would it be a large effort to have rclone replace the invalid utf-8 characters, just as it does in the local backend?

Thank you very much! rclone is an amazing piece of software and I'm very very grateful for it!!

Cheers

Greg

There is a long running pull request which will fix this (and many more problems): https://github.com/ncw/rclone/pull/3148

If you want to do something today to patch the problem, then you need to fix up the remote string here: https://github.com/ncw/rclone/blob/master/backend/sftp/sftp.go#L562 - adding remote = string([]rune(remote)) just after that line should do it. That will replace any invalid utf-8 bytes with the Unicode replacement character. This isn't a particularly good fix, but it might get you going if you are desperate!
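For anyone following along, here is a minimal standalone sketch of what that one-line patch does, using the example filename from earlier in the thread:

```go
package main

import "fmt"

func main() {
	remote := "Transport-Roemeni\xe3\xab-600x235.jpg"

	// Round-tripping through []rune maps each byte that cannot be
	// decoded as UTF-8 to U+FFFD, the Unicode replacement character.
	remote = string([]rune(remote))

	fmt.Println(remote) // Transport-Roemeni??-600x235.jpg (?? = two U+FFFD)
}
```

One caveat: after the replacement, the name rclone uses no longer matches the byte-for-byte name stored on the server, so opening the file by the fixed name can fail.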

Thank you very much for taking the time to look into this! I applied the patch you proposed but am now getting

Failed to copy: failed to open source object: Open failed: file does not exist

errors for the files with invalid utf-8 names.

I gather that the https://github.com/ncw/rclone/pull/3148 patch does not support sftp, so it's not worth trying it?

Also, I'm seeing some Transferred amount, but re-running rclone a second time shows the same amount. Does rclone attempt to transfer the files from sftp, count the file sizes towards Transferred, but then fail just as it tries to get the files because of the utf-8 names?

Maybe the best at this point would be to pipe stderr, extract the failing file paths, and try using rsync to bring the files over and rename them?

Thanks again!

Thinking a bit more about that patch, that is probably to be expected :frowning:

It does support s3 though, so it is probably worth trying.

It is s3 that is failing so depending on exactly where in the process that happens, rclone may have already uploaded the file to s3.

You could use rclone to fetch the files locally, then run convmv to fix the encoding, then upload?
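If convmv isn't available, a rough equivalent can be scripted. This is just a sketch under assumptions (the program name is made up, it only touches regular files, not directories, and it does the same U+FFFD replacement as the patch above rather than convmv's charset conversion):

```go
// fixnames: rename local files whose names contain invalid UTF-8,
// replacing the bad bytes with the Unicode replacement character.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"unicode/utf8"
)

// fixName replaces any invalid UTF-8 bytes in a file name with U+FFFD.
func fixName(name string) string {
	return string([]rune(name))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fixnames <dir>")
		return
	}
	err := filepath.Walk(os.Args[1], func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		base := filepath.Base(path)
		if utf8.ValidString(base) {
			return nil // name is fine, leave it alone
		}
		fixed := filepath.Join(filepath.Dir(path), fixName(base))
		fmt.Printf("renaming %q -> %q\n", path, fixed)
		return os.Rename(path, fixed)
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

After running it over the fetched tree, a plain rclone sync of the renamed files to S3 should go through, since every name is then valid UTF-8.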