I’ve been using Rclone for some weeks to back up some websites from FTP to S3. Thanks for the great work on this project! Today I ran into an error when syncing one of my websites. The logs show the following error:
2018/03/16 17:28:44 ERROR : wp-content/uploads/43-LKW-transporte-Dã¤nemark-280x80.jpg: Failed to copy: InvalidURI: Couldn’t parse the specified URI.
status code: 400, request id: 80D97B1F429F8404, host id: UURsrFhQQDkCejofB9/z3i2kCBxvOYHg1Dy4IrFbYTBq9wyKeTP+DV1NpkWMqOw8i0uoX5aTx0o=
I think it might have something to do with the encoding of the filename.
I just tested syncing with the latest beta (v1.40-012-g0ed0d9a7), but I still get the same error (on both Windows and Unix).
NB: when connecting to the FTP server via a traditional FTP client, I get the error: “Invalid character sequence received, disabling UTF-8.”, and the filename displays as 43-LKW-transporte-Dã¤nemark-280x80.jpg.
Sorry for the late reply. Changing the locale of the FTP servers is not really an option, as we don’t control these servers. Today I ran into a file named Transport-Roemeni*-600x235.jpg, where * denotes the bytes \xE3\xAB, on one of the FTP servers. This filename URL-encodes to Transport-Roemeni%E3%AB-600x235.jpg, and S3 rightfully complains that it cannot decode this to a valid UTF-8 string, yielding a response containing <Error><Code>InvalidURI</Code><Message>Couldn't parse the specified URI.</Message><URI>.../Transport-Roemeni%E3%AB-600x235.jpg</URI> etc.
The files with invalid UTF-8 names make up a very small portion of the files I’m trying to transfer, hence I was wondering whether I could ignore these specific errors as a workaround. The option --ignore-errors would be overkill, as I wouldn’t like to ignore all I/O errors.
If you wanted to do something today to patch the problem then you want to fix up the remote string here: https://github.com/ncw/rclone/blob/master/backend/sftp/sftp.go#L562 - adding remote = string([]rune(remote)) just after that line should do it. That will replace each invalid UTF-8 byte with the Unicode replacement character (U+FFFD), so the name on the destination will no longer match the name on the source. This isn't a particularly good fix, but might get you going if you are desperate!
Also, I'm seeing a nonzero Transferred amount, but re-running rclone a second time shows the same amount again. Does rclone attempt to transfer the files from sftp and count their sizes towards Transferred, but then fail just as it tries to get the files because of the UTF-8 names?
Maybe the best option at this point would be to capture stderr, extract the paths of the failing files, and try using rsync to bring those files over and rename them?
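A rough sketch of the "extract the failing paths" step, assuming the error lines look like the one quoted at the top of the thread (`... ERROR : <path>: Failed to copy: InvalidURI: ...`). The function name `failedPath` and the regular expression are my own assumptions; the log format may differ between rclone versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidURILine matches error lines of the assumed form
// "<date> <time> ERROR : <path>: Failed to copy: InvalidURI: ...".
var invalidURILine = regexp.MustCompile(`ERROR : (.+?): Failed to copy: InvalidURI`)

// failedPath is a hypothetical helper: it returns the path from one log
// line and reports whether the line was an InvalidURI copy failure.
func failedPath(line string) (string, bool) {
	m := invalidURILine.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample lines for illustration; in practice, feed in the captured stderr.
	lines := []string{
		"2018/03/16 17:28:44 ERROR : wp-content/uploads/a.jpg: Failed to copy: InvalidURI: Couldn't parse the specified URI.",
		"2018/03/16 17:28:45 INFO  : wp-content/uploads/b.jpg: Copied (new)",
	}
	for _, l := range lines {
		if p, ok := failedPath(l); ok {
			fmt.Println(p)
		}
	}
}
```

The resulting list of paths could then be handed to rsync or another tool for the rename step.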