S3 failed upload large files bad request 400

I am having trouble uploading large files to S3. I have managed to sync all the pictures with no problems. The larger video files fail to upload. They get to nearly 100% then fail.

C:\Users\Karl>rclone --version

rclone v1.57.0

- os/version: Microsoft Windows 10 Pro 2009 (64 bit)

- os/kernel: 10.0.19043.1348 (x86_64)

- os/type: windows

- os/arch: amd64

- go/version: go1.17.2

- go/linking: dynamic

- go/tags: cmount

Sync to Amazon S3 bucket located in Ireland (I am located in the UK).

Sync script:

rclone copy F:\XXXXX\XXXXX\XXXXX\ S3:\\XXXXX\XXXXXX\ --progress --bwlimit 10M

Nb. I am using rclone encryption here.

Error with mp4 files. There is one error shown, but if I left the script to continue, each video would fail. The same problem persists if I remove --bwlimit.

S3 upload 400 bad request

Of note, the offending video file is 145 MB. I am under the allowable put request limit:

“You can send a PUT request to upload an object of up to 5 GB in a single operation. For more information, see the PutObject example in the AWS CLI Command Reference.” REF

My fibre bandwidth:

I’ve tried this on different days for a few months.

Next I tried multipart uploads with rclone. Used the rclone manual for recommended settings:

https://rclone.org/s3/

“Increasing --s3-upload-concurrency will increase throughput (8 would be a sensible value) and increasing --s3-chunk-size also increases throughput (16M would be sensible). Increasing either of these will use more memory. The default values are high enough to gain most of the possible performance without using too much memory. ”

rclone copy F:\XXXX\XXXXX\XXXXX\ S3:\\XXXXXX\XXXX\ --progress --s3-chunk-size 16M --s3-upload-concurrency 8

This resulted in the same errors. I left it running for a while and the script reported failed uploads of 5 files (all mp4 video files).
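
Worth checking here: the rclone S3 docs describe a separate flag, --s3-upload-cutoff, with a documented default of 200Mi, and files at or below that size are sent as a single PUT rather than in chunks. Assuming that default applies (the remote's config is not shown in this thread), a 145 MB video would not be chunked at all, so --s3-chunk-size and --s3-upload-concurrency would not change how it is uploaded. The DEBUG log later in the thread shows a single Put request for one of the failing files, which would be consistent with this. A rough sketch of the arithmetic; `upload_plan` is a made-up helper, not part of rclone:

```python
import math

MIB = 1024 * 1024


def upload_plan(size_bytes, cutoff_bytes=200 * MIB, chunk_bytes=16 * MIB):
    """Return ("single", 1) or ("multipart", parts) for a file of this size.

    cutoff_bytes mirrors the documented default of --s3-upload-cutoff (200Mi),
    which is an assumption about this setup.
    chunk_bytes mirrors the --s3-chunk-size 16M used in the command above.
    """
    if size_bytes <= cutoff_bytes:
        return ("single", 1)
    return ("multipart", math.ceil(size_bytes / chunk_bytes))


video = 145 * 1000 * 1000  # the 145 MB file, assuming decimal megabytes

print(upload_plan(video))                         # with the default cutoff
print(upload_plan(video, cutoff_bytes=16 * MIB))  # with a lowered cutoff
```

Lowering the cutoff (for example --s3-upload-cutoff 16M) should force chunked uploads for files of this size. Whether that avoids the timeout on this connection is untested.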

Is there anything I am missing here?

I've never seen a RequestTimeout before... What is your internet connection like? Does it go through a proxy/firewall which could be dropping connections?

Your upload speed appears to be about 700KiB/s which, while not very fast should be fast enough at about 5.5 Mbit/s.
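
For anyone wanting to redo that conversion, here is a small sketch. It assumes KiB means 1024 bytes, Mbit means 1,000,000 bits, and that the 145 MB file is 145,000,000 bytes; the function names are only for illustration.

```python
KIB = 1024


def kib_per_s_to_mbit_per_s(kib_per_s):
    """Convert KiB/s to Mbit/s (decimal megabits)."""
    return kib_per_s * KIB * 8 / 1_000_000


def seconds_to_upload(size_bytes, kib_per_s):
    """Time needed to send size_bytes at a steady rate of kib_per_s."""
    return size_bytes / (kib_per_s * KIB)


rate = 700  # KiB/s, the figure quoted above
size = 145_000_000  # bytes, assuming decimal megabytes

print(f"{kib_per_s_to_mbit_per_s(rate):.2f} Mbit/s")
print(f"{seconds_to_upload(size, rate):.0f} s at the full rate")
print(f"{seconds_to_upload(size, rate / 4):.0f} s at a quarter of the rate")
```

This gives a little over 5.7 Mbit/s, in the same ballpark as the 5.5 Mbit/s quoted. The last line matters because several files upload at once and share the link: at a quarter of the rate a single 145 MB file takes over 13 minutes, so one request stays open for a long time.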

Can you try this with the latest beta which has slightly different error handling.

Can you also try this which attempts to retry those errors explicitly (I thought the SDK was doing this for us but maybe it isn't)

v1.58.0-beta.5910.f4e5d0c34.fix-s3-timeout on branch fix-s3-timeout (uploaded in 15-30 mins)


Thank you for replying. I must say that rclone is my favourite CLI tool and used for all my off-site cloud back-up. The built-in encryption tool is great. I am happy to be able to contribute to the project reporting this problem.

As suggested, I have tried the latest beta release:

rclone v1.58.0-beta.5910.b91c349cd
- os/version: Microsoft Windows 10 Pro 2009 (64 bit)
- os/kernel: 10.0.19043.1348 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.17.3
- go/linking: dynamic
- go/tags: cmount

With command:

rclone copy F:\XXXX\XXXX\XXX\ S3:\\XXXXXX\XXXX\ --progress --s3-chunk-size 16M --s3-upload-concurrency 8 --log-file F:\rclone.log --log-level DEBUG

Here's the result:

Success (however, it failed again later in this post): So far, it's uploading with no problems. I did notice the upload speed has increased, to between 500 and 1010 KiB/s.

Just to be sure, I used the latest stable version of rclone.

C:\XXX\Karl>rclone --version
rclone v1.57.0
- os/version: Microsoft Windows 10 Pro 2009 (64 bit)
- os/kernel: 10.0.19043.1348 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.17.2
- go/linking: dynamic
- go/tags: cmount

With command:

rclone copy F:\XXXX\XXXX\XXX\ S3:\\XXXXXX\XXXX\ --progress --s3-chunk-size 16M --s3-upload-concurrency 8 --log-file F:\rclone.log --log-level DEBUG

Failed: The error occurred. I captured more information with debug logging.

2021/11/24 18:41:47 DEBUG : Encrypted drive 'S3://XXXXXX/XXXX/': Waiting for transfers to finish
2021/11/24 18:42:23 ERROR : Camera/2016-09-23 07.58.30.mp4: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>RK921GADFR5KF5A</RequestId><HostId>4blKLJ/Ndrd2QZxi7r6kbFRGt336REp8pOpeA9FFhD2s2PsdFgHHs+i/8pRWUZff99FEoT0=</HostId></Error>
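
The HTTP status (400 Bad Request) is generic; the useful part is the Code element inside the XML body. As a minimal sketch, this is one way to pull it out with Python's standard library. The RequestId and HostId are shortened to XXX here, and `parse_s3_error` is an illustrative name only.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body):
    """Return (code, message) from an S3 <Error> XML document."""
    # Encode to bytes so the parser honours the encoding in the XML declaration.
    root = ET.fromstring(body.encode("utf-8"))
    return root.findtext("Code"), root.findtext("Message")


body = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>RequestTimeout</Code>"
    "<Message>Your socket connection to the server was not read from or "
    "written to within the timeout period. Idle connections will be closed."
    "</Message><RequestId>XXX</RequestId><HostId>XXX</HostId></Error>"
)

code, message = parse_s3_error(body)
print(code)
print(message)
```

Here the code is RequestTimeout, meaning the server gave up waiting for data on the connection, which points at the network path rather than at the request being malformed.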

HOWEVER, I tried again with the beta release, left it running for a few hours, and got the error again. Perhaps it's certain files.

Here's the log file output:

2021/11/24 19:00:43 DEBUG : pacer: low level retry 1/1 (error Put "https://XXXXXXX.s3.eu-west-1.amazonaws.com/2016/Camera/2016-10-01%2012.35.48.mp4.bin?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXX&X-Amz-Date=20211124T185830Z&X-Amz-Expires=900&X-Amz-SignedHeaders=content-md5%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-mtime%3Bx-amz-storage-class&X-Amz-Signature=XXX": write tcp 10.1.1.1:58921->52.218.53.251:443: wsasend: An existing connection was forcibly closed by the remote host.)
2021/11/24 19:00:43 DEBUG : pacer: Rate limited, increasing sleep to 10ms
2021/11/24 19:00:43 DEBUG : Camera/2016-10-01 12.35.48.mp4: Received error: Put "https://XXXXXX.s3.eu-west-1.amazonaws.com/2016/Camera/2016-10-01%2012.35.48.mp4.bin?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXAmz-Date=20211124T185830Z&X-Amz-Expires=900&X-Amz-SignedHeaders=content-md5%3Bcontent-type%3Bhost%3Bx-amz-acl%3Bx-amz-meta-mtime%3Bx-amz-storage-class&X-Amz-Signature=XXXXX": write tcp 10.1.1.1:58921->52.218.53.251:443: wsasend: An existing connection was forcibly closed by the remote host. - low level retry 1/10
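
To see how often this happens over a long run, the "low level retry N/M" markers can be counted in the log file. A minimal sketch, assuming the log format matches the lines above; the sample lines are abbreviated copies of them, and `retry_attempts` is a hypothetical helper.

```python
import re

# Matches the "low level retry 1/10" marker seen in the DEBUG lines above.
RETRY_RE = re.compile(r"low level retry (\d+)/(\d+)")


def retry_attempts(log_lines):
    """Return (attempt, limit) pairs for every retry marker in the log lines."""
    found = []
    for line in log_lines:
        match = RETRY_RE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


# Abbreviated copies of the three log lines above (URLs cut for brevity).
sample = [
    "2021/11/24 19:00:43 DEBUG : pacer: low level retry 1/1 (error Put ...)",
    "2021/11/24 19:00:43 DEBUG : pacer: Rate limited, increasing sleep to 10ms",
    "2021/11/24 19:00:43 DEBUG : X.mp4: Received error: Put ... - low level retry 1/10",
]

print(retry_attempts(sample))
```

For a real log, pass an open file instead of the sample list, for example `open(r"F:\rclone.log", encoding="utf-8", errors="replace")`.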

As for the internet: I must admit I do not always get the best connection. The symptoms are videos buffering at times, but it generally never disconnects. While running these back-ups the connection seemed good, with no problems. I regularly back up documents and pictures to S3 with no problems (I have reviewed the log files).

Firewall is Microsoft Windows firewall, and usual built-in consumer router firewalls. No proxy servers.


Thanks for testing.

Can you give this one a go too?


Thanks for latest beta build.

I have downloaded the latest beta and got the same error.
This may not be an rclone issue.

Example:

c:\XXX\rclone-v1.58.0-beta.5910.f4e5d0c34.fix-s3-timeout-windows-amd64>rclone copy F:\XXXX\ S3:\\XXXXX\ --progress --s3-chunk-size 16M --s3-upload-concurrency 8
2021-11-25 16:17:39 ERROR : 2016-10-09 18.10.50.mp4: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>XXX</RequestId><HostId>XXX</HostId></Error>
Transferred:      594.013 MiB / 9.580 GiB, 6%, 801.396 KiB/s, ETA 3h16m15s
Errors:                 1 (retrying may help)
Checks:              8138 / 8138, 100%
Transferred:            3 / 141, 2%
Elapsed time:     11m25.7s
Transferring:
 *                       X.mp4: 87% /174.676Mi, 180.782Ki/s, 2m0s
 *                       X.mp4: 81% /111.492Mi, 203.792Ki/s, 1m43s
 *                       X.mp4:  7% /167.394Mi, 222.561Ki/s, 11m55s
 *                       X.mp4:  9% /153.732Mi, 187.604Ki/s, 12m37s

The above failed file (2016-10-09 18.10.50.mp4) is 132 MB.

I will repeat these steps on my Linux server using the latest stable rclone (I believe it's a few versions behind) and the latest beta build. Perhaps I will try another client to see if I can reproduce this error.

I will need to plan this so may take a few days.

Not sure if this website is any good, but my latency to AWS Ireland is between 100 ms and 800 ms.

I will get back to the forum when I have more data.

Hi Karl,

You could also check if you have periods with (significant) packet loss or latency by running this command for some time (or while you do your rclone copy):

ping s3.eu-west-1.amazonaws.com -t -l 1000

If so, then you may be able to locate the hop where the packets are lost/delayed with this command:

tracert s3.eu-west-1.amazonaws.com

You may be able to find a more specific endpoint on this page: https://docs.aws.amazon.com/general/latest/gr/s3.html


What that beta should do is retry the failure. The failure will still happen as that is part of your internet connection I think, but it should retry it.

I'm not sure that actually worked though looking at your logs.
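
For readers unfamiliar with the idea, this is roughly what "retry the failure" means. It is a generic Python sketch of retry with backoff, not rclone's actual code; the exception class, the delays, and the simulated upload are all stand-ins.

```python
import time


class RequestTimeout(Exception):
    """Stand-in for the S3 RequestTimeout error (hypothetical class)."""


def with_retries(operation, attempts=10, base_delay=0.01, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RequestTimeout:
            if attempt == attempts:
                raise  # out of attempts: report the failure to the caller
            sleep(delay)
            delay *= 2  # wait longer before each further attempt


calls = []


def flaky_upload():
    """Simulated upload that times out twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RequestTimeout("simulated timeout")
    return "uploaded"


result = with_retries(flaky_upload, sleep=lambda seconds: None)
print(result, "after", len(calls), "attempts")
```

In rclone itself the number of attempts is controlled by --low-level-retries (default 10) for individual operations and --retries (default 3) for the whole run.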


Thanks Ole, that's great advice.

Here's my output:

ping s3.eu-west-1.amazonaws.com -t -l 1000
Ping statistics for 52.218.1.147:
    Packets: Sent = 56, Received = 52, Lost = 4 (7% loss),
Approximate round trip times in milli-seconds:
    Minimum = 23ms, Maximum = 34ms, Average = 24ms

Second command

tracert s3.eu-west-1.amazonaws.com

Outputs:

Tracing route to s3.eu-west-1.amazonaws.com [52.218.80.210]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  10.X.X.X
  2     5 ms     1 ms     1 ms  dsldevice.lan [192.168.1.254]
  3     5 ms     5 ms     5 ms  172.XX.XX.XX
  4     *        *        *     Request timed out.
  5    12 ms    12 ms    11 ms  132.hiper04.sheff.dial.plus.net.uk [195.166.143.132]
  6    12 ms    11 ms    11 ms  peer7-et-0-1-4.telehouse.ukcore.bt.net [109.159.252.94]
  7    12 ms    16 ms    11 ms  109.159.253.121
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.

I will have to look into the tracert command to understand the request time outs and have a play round.

Yes of course, I missed that (the fix was just for the retry). Regarding the failed retry, when do the retries occur? I stopped the script after the error, so have I prevented the retry?

for tracert, it is common to have multiple Request timed out

am i correct that your internet connection is dsl?

some tools i have used for many years when i was doing tech support for voip, is
the windows built-in tool https://en.wikipedia.org/wiki/PathPing
and
https://sourceforge.net/projects/winmtr/
"WinMTR is a free MS Windows visual application that combines the functionality of the traceroute and ping in a single network diagnostic tool"


Your round trips are fine but your packet loss seems a bit high - unless you pushed your connection to throttle. This is what I see from Copenhagen:

Ping statistics for 52.218.21.146:
    Packets: Sent = 52, Received = 52, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 30ms, Maximum = 36ms, Average = 30ms

7% packet loss to an AWS endpoint is very bad in my opinion! There is something up with your Internet connectivity IMHO.

Your traceroute doesn't show where the problem is though - those rows with all * are probably normal. What you need to do is run it long enough so that you see a * in a row where there are numbers as well - that will be the problem hop.
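
If you save several tracert runs to a file, that check can be automated. A minimal Python sketch, assuming the Windows tracert layout shown above; the third sample row is invented to show what a lossy hop would look like and does not come from Karl's output.

```python
import re

# One probe column of a Windows tracert row: "12 ms", "<1 ms" or "*".
PROBE_RE = re.compile(r"<?\d+ ms|\*")
# A hop row starts with the hop number followed by the probe columns.
HOP_RE = re.compile(r"\s*(\d+)\s+(.*)")


def classify_hops(tracert_lines):
    """Map hop number to "ok", "silent" (all *) or "lossy" (mix of * and times)."""
    result = {}
    for line in tracert_lines:
        match = HOP_RE.match(line)
        if not match:
            continue
        probes = PROBE_RE.findall(match.group(2))[:3]
        if len(probes) < 3:
            continue  # not a hop row
        stars = probes.count("*")
        if stars == 0:
            result[int(match.group(1))] = "ok"
        elif stars == 3:
            result[int(match.group(1))] = "silent"
        else:
            result[int(match.group(1))] = "lossy"
    return result


sample = [
    "  2     5 ms     1 ms     1 ms  dsldevice.lan [192.168.1.254]",
    "  4     *        *        *     Request timed out.",
    "  7    12 ms     *       11 ms  109.159.253.121",  # hypothetical lossy row
]

print(classify_hops(sample))
```

Hops marked "silent" are usually routers that simply do not answer; the ones to look at are the hops marked "lossy".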

If you run with -vv you'll see DEBUG messages about low level retries - are you seeing any of those?


Thank you for the suggestion. I have been testing my connection with this tool.

Through my tests I am getting high packet loss. I will take this to ISP to see if they can troubleshoot.

Here are all my test outputs in case anyone can find any insights. If @ncw is happy, I can mark this as solved, as this is not a problem with the rclone tool.

Ping aws eu-west servers

Ping statistics for 52.218.100.83:
	Packets: Sent = 78, Received = 65, Lost = 13 (16% loss),
Approximate round trip times in milli-seconds:
	Minimum = 23ms, Maximum = 41ms, Average = 25ms

Test internal network ping (192.168.1.254)

Ping statistics for 192.168.1.254:
	Packets: Sent = 115, Received = 115, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
	Minimum = 1ms, Maximum = 26ms, Average = 5ms

Ping to forum.rclone.org

Ping statistics for 31.25.187.150:
	Packets: Sent = 104, Received = 96, Lost = 8 (7% loss),
Approximate round trip times in milli-seconds:
	Minimum = 13ms, Maximum = 30ms, Average = 15ms

Ping to google.com

Ping statistics for 172.217.169.14:
	Packets: Sent = 58, Received = 53, Lost = 5 (8% loss),
Approximate round trip times in milli-seconds:
	Minimum = 12ms, Maximum = 26ms, Average = 14ms
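
To compare the four targets side by side, the loss can be recomputed from the Sent and Lost counts. A small Python sketch, assuming the Windows ping summary format shown above; the labels in the dictionary are mine and `loss_percent` is an illustrative name.

```python
import re

STATS_RE = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def loss_percent(stats_line):
    """Return packet loss in percent from a Windows ping 'Packets:' line."""
    match = STATS_RE.search(stats_line)
    if match is None:
        raise ValueError("not a ping statistics line")
    sent, _received, lost = map(int, match.groups())
    return 100.0 * lost / sent


# The four summary lines from the tests above.
results = {
    "AWS eu-west (52.218.100.83)": "Packets: Sent = 78, Received = 65, Lost = 13 (16% loss),",
    "router (192.168.1.254)": "Packets: Sent = 115, Received = 115, Lost = 0 (0% loss),",
    "forum.rclone.org": "Packets: Sent = 104, Received = 96, Lost = 8 (7% loss),",
    "google.com": "Packets: Sent = 58, Received = 53, Lost = 5 (8% loss),",
}

for target, line in results.items():
    print(f"{target}: {loss_percent(line):.1f}% loss")
```

The recomputed figures are slightly higher than the printed ones (for example 16.7% against 16%), so the printed percentages appear to be truncated rather than rounded. Either way the pattern is the same: no loss to the local router and loss to every external host.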

Again, thank you for the WinMTR tool it's great.

image

image

image

With options
image

Perfect, it looks like your packet loss happens on the connection between dsldevice.lan and 172.16.18.21.

Try these two commands and let each run for at least 100 pings:

ping dsldevice.lan -t -l 1000
ping 172.16.18.21 -t -l 1000

Do you know what these two intermediate devices are? Are they yours or part of your ISP’s infrastructure?

If you are to report to your ISP, then do a similar ping report to your own last device (typically your dsl modem) to prove your own network is OK. I guess this is the command (or maybe dsldevice.lan):

ping 10.0.0.1 -t -l 1000

and then a ping to the homepage of your provider (it will travel entirely in their own net). I guess this is the command:

ping www.bt.com -t -l 1000    # owner of www.plus.net

If it doesn’t respond at all, then try pinging a major local news site, e.g.:

ping bbc.co.uk -t -l 1000

This will (hopefully) prove that packets are lost while being handled by your ISP.

You may also be able to log in to your dsl modem and find a diagnostic menu that will allow you to send a series of pings to your provider's homepage. If you can prove packet loss from the dsl modem they provided to their own homepage, then you have extremely solid proof.


image

From my tests, packets are being dropped after they leave my network. I have tested individual devices internally. I do not know what 172.16.18.21 is (I believe it is outside my network, but it is using an internal IP range). All devices tested are connected via Cat5e (solid line in diagram).

Ping dsldevice.lan (gateway modem router)

Ping statistics for 192.168.1.254:
    Packets: Sent = 286, Received = 286, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 20ms, Average = 3ms

Ping 172.16.18.21

Ping statistics for 172.16.18.21:
    Packets: Sent = 22, Received = 0, Lost = 22 (100% loss)

Each external website I try I have been getting request time outs.

Many thanks for looking into this and providing support. I will raise this with my ISP and give them the evidence I have collected.


No problem, my/our pleasure.

The solution may be to pin your dsl connection at a lower (but more stable) speed.
This may actually improve your throughput and experienced response times :slight_smile:


i know this is real basic, i would reboot all the network equipment, update all firmware.

tho i do not fully understand your setup based on that image.
looks like double nat.

if you want to do further testing - about the dsldevice.lan, is it

  • a dsl modem+router, where the wan is POTS rj-11 or equivalent phone cable.
    or
  • a standard router, where the wan is a rj-45 jack.
    if true, then i would replace the dsldevice.lan with your desktop pc and test that way.

Good input, sometimes we tend to forget the basics.

Inspired by Jojo’s list:

If the wan is POTS rj-11 or equivalent, then test with your PC connected directly to dsldevice.lan to eliminate any possibility of nat or routing errors in your own network.

That is, test with the simplest possible setup after the signal leaves the equipment provided by your ISP.


thanks, i had written about that but somehow it did not post...


I have removed the second router. It already had the latest firmware but from 2015 I believe. This device is old and was used to give me more network options (control over DNS, own network IP range, etc.) as ISP routers are cheaper and have fewer configuration options. The main purpose was to provide excellent WiFi but since then I have a dedicated access point so disabled the WiFi on the device.

I have reconfigured the network without the second router, so the desktop PC now effectively plugs into the switch and onward to the ISP router. I believe I have noticed increased network performance; it's the weekend, so I will see how this continues into the week. It was this comment that made me realise that the hardware is old and may be behind my struggle with the new fibre speeds.

I am still getting the dropped packets though. So I will continue to raise this with my ISP. Not sure if a firewall along the way is blocking these pings.

Thanks again.


good, we are making progress.

i have turned dozens of dsl modem+routers into a straight thru bridge mode.
in effect, removing the router part so the dsl device outputs a public ip address.

tho in your case, there does not seem to be a public ip, just that 172.16.18.21.
which i guesstimate to be some sort of carrier-grade nat.

but still, from the isp, i would ask to turn that modem+router into a bridge and use a good router
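
A quick way to check which range an address belongs to, sketched with Python's standard ipaddress module. The range list is my own selection: the three RFC 1918 private blocks plus 100.64.0.0/10, the shared address space that RFC 6598 reserves for carrier-grade NAT. The `classify` helper and its labels are made up for illustration.

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")
# RFC 1918 private address blocks.
RFC1918_RANGES = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify(addr):
    """Label an IPv4 address as cgnat-shared, rfc1918-private or other."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_RANGE:
        return "cgnat-shared"
    if any(ip in block for block in RFC1918_RANGES):
        return "rfc1918-private"
    return "other"


# Addresses that appear in this thread.
for addr in ("192.168.1.254", "172.16.18.21", "52.218.53.251"):
    print(addr, classify(addr))
```

172.16.18.21 comes out as an ordinary RFC 1918 private address rather than one from the dedicated carrier-grade NAT block. That does not rule out carrier-grade NAT, since an ISP can also use private addresses inside its own network, so it is still a question for the ISP.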