2024/02/01 10:59:45 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/01 10:59:45 ERROR : Attempt 1/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 1; INTERNAL_ERROR; received from peer
2024/02/01 10:59:45 ERROR : Attempt 2/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 3; INTERNAL_ERROR; received from peer
2024/02/01 10:59:45 ERROR : Attempt 3/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 5; INTERNAL_ERROR; received from peer
Additional info
I can download this file via browser, using curl and wget.
It does not, however, work on two EC2 instances on AWS. I tried arm64 (r6g) and amd64 (t3), and I get the stream error pasted above on both. The arm64 version is in the first post; the amd64 one is this:
To my understanding these instances are as "default" as it gets, with no specific configuration. I am able to download files from other locations, for example:
rclone copyurl https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_USA_shp.zip out
Maybe there is some kind of test suite I could run?
Sometimes things break when some network infrastructure is not configured properly (this can be anywhere between your server and geodata.ucdavis.edu). It is often related to HTTP2 support or IPv6 path configuration.
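There is no ready-made test suite that I know of, but a rough way to separate the HTTP version from the IP family is to probe the URL with curl in all four combinations. This is only a sketch: the `probe` helper and the `RUN` switch are names made up for this example, and the one-byte range request assumes the server honours `Range` (if it does not, `--max-time` stops the transfer after 60 seconds).

```shell
#!/bin/sh
# Sketch: probe one URL over HTTP/1.1 and HTTP/2, on IPv4 and IPv6.
# Commands are only printed by default; set RUN=1 to actually execute them.

URL="https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip"

# probe <label> <extra curl flags...>
probe() {
  label=$1; shift
  # -r 0-0 asks for the first byte only; -w reports the negotiated HTTP
  # version and the status code once the transfer ends.
  set -- curl -sS -o /dev/null -r 0-0 --max-time 60 \
    -w '%{http_version} %{http_code}\n' "$@" "$URL"
  printf '== %s ==\n' "$label"
  if [ "${RUN:-0}" = "1" ]; then
    "$@" || echo "($label failed)"
  else
    printf '%s\n' "$*"
  fi
}

probe http2-ipv4 --http2 -4
probe http2-ipv6 --http2 -6
probe http1-ipv4 --http1.1 -4
probe http1-ipv6 --http1.1 -6
```

Run it as `RUN=1 sh probe.sh` on both the working and the failing machine and compare the output line by line.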
As @kapitainsky noted above, this is an HTTP2 error. It comes from the server ("received from peer"), so the server looks unhappy for some reason.
The experiments with --bind mean that it isn't an IPv4 vs IPv6 issue.
The download works fine with and without --disable-http2 so it isn't an HTTP1 vs HTTP2 problem.
I think the remaining reasons could be:
- some kind of network proxy between you and the server. Assuming you are just starting up a vanilla VM this is unlikely, but if you are in some kind of VPC then I guess it could be possible.
- the server has banned downloads from AWS IPs for some reason (abuse maybe).
You might also want to experiment with setting a --user-agent to that used by a browser - that might help.
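The experiments mentioned in this post can be collected into one small script so that they are run the same way on every machine. This is a sketch, not an official rclone tool: the `run_case` helper and the `RUN` switch are hypothetical names, and the user-agent string is just an example of a browser-like value. The flags themselves (`--bind`, `--disable-http2`, `--user-agent`, `--retries`) are standard rclone flags.

```shell
#!/bin/sh
# Sketch: run the same rclone copyurl with one variation at a time.
# Commands are only printed by default; set RUN=1 to actually execute them.

URL="https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip"
# Assumption: any browser-like string will do here.
UA="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# run_case <label> <extra rclone flags...>
run_case() {
  label=$1; shift
  echo "== $label =="
  if [ "${RUN:-0}" = "1" ]; then
    rclone copyurl "$URL" out -vv --retries 1 "$@" || echo "($label failed)"
  else
    # Display only: quoting of arguments that contain spaces is not preserved.
    echo "rclone copyurl $URL out -vv --retries 1 $*"
  fi
}

run_case baseline
run_case ipv4-only --bind 0.0.0.0    # force IPv4
run_case ipv6-only --bind ::0        # force IPv6
run_case http1-only --disable-http2
run_case browser-ua --user-agent "$UA"
```

If only some of the cases fail, that narrows things down; if all of them fail on EC2 and all succeed elsewhere, the difference is more likely on the network or server side than in rclone's flags.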
I created a brand new AWS account, created a default EC2 instance (next, next, next...), and installed rclone with the https://rclone.org/install.sh script. Same failure. Version displays this:
[ec2-user@ip-172-31-18-212 ~]$ rclone copyurl https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip out --user-agent="MozilMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.3"
2024/02/02 14:01:24 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/02 14:01:24 ERROR : Attempt 1/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 1; INTERNAL_ERROR; received from peer
2024/02/02 14:01:24 ERROR : Attempt 2/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 3; INTERNAL_ERROR; received from peer
2024/02/02 14:01:24 ERROR : Attempt 3/3 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 5; INTERNAL_ERROR; received from peer
2024/02/02 14:01:24 Failed to copyurl: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 5; INTERNAL_ERROR; received from peer
IP is not blocked, as I can get that file with curl & wget:
[ec2-user@ip-172-31-25-145 ~]$ rclone copyurl https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip out -vv --dump headers --retries 1
2024/02/04 10:20:06 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "copyurl" "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip" "out" "-vv" "--dump" "headers" "--retries" "1"]
2024/02/04 10:20:06 DEBUG : Creating backend with remote "."
2024/02/04 10:20:06 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/04 10:20:06 DEBUG : fs cache: renaming cache item "." to be canonical "/home/ec2-user"
2024/02/04 10:20:06 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/02/04 10:20:06 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/04 10:20:06 DEBUG : HTTP REQUEST (req 0xc00090d200)
2024/02/04 10:20:06 DEBUG : GET /Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip HTTP/1.1
Host: www.nass.usda.gov
User-Agent: rclone/v1.65.2
Accept-Encoding: gzip
2024/02/04 10:20:06 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/04 10:20:06 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/04 10:20:06 DEBUG : HTTP RESPONSE (req 0xc00090d200)
2024/02/04 10:20:06 DEBUG : Error: stream error: stream ID 1; INTERNAL_ERROR; received from peer
2024/02/04 10:20:06 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/04 10:20:06 ERROR : Attempt 1/1 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 1; INTERNAL_ERROR; received from peer
2024/02/04 10:20:06 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Elapsed time: 0.0s
2024/02/04 10:20:06 DEBUG : 5 go routines active
2024/02/04 10:20:06 Failed to copyurl: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": stream error: stream ID 1; INTERNAL_ERROR; received from peer
[ec2-user@ip-172-31-25-145 ~]$
[ec2-user@ip-172-31-23-19 ~]$ rclone copyurl https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip out -vv --dump headers --retries 1 --disable-http2
2024/02/05 09:23:31 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "copyurl" "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip" "out" "-vv" "--dump" "headers" "--retries" "1" "--disable-http2"]
2024/02/05 09:23:31 DEBUG : Creating backend with remote "."
2024/02/05 09:23:31 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/05 09:23:31 DEBUG : fs cache: renaming cache item "." to be canonical "/home/ec2-user"
2024/02/05 09:23:31 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/02/05 09:23:31 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 09:23:31 DEBUG : HTTP REQUEST (req 0xc00094d200)
2024/02/05 09:23:31 DEBUG : GET /Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip HTTP/1.1
Host: www.nass.usda.gov
User-Agent: rclone/v1.65.2
Accept-Encoding: gzip
2024/02/05 09:23:31 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 09:24:31 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 1m0.0s
2024/02/05 09:25:31 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 2m0.0s
2024/02/05 09:26:31 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 3m0.0s
The second command (this time I left it running for longer and it timed out):
[ec2-user@ip-172-31-23-19 ~]$ rclone copyurl https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip out -vv --dump bodies --retries 1 --disable-http2
2024/02/05 09:31:02 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "copyurl" "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip" "out" "-vv" "--dump" "bodies" "--retries" "1" "--disable-http2"]
2024/02/05 09:31:02 DEBUG : Creating backend with remote "."
2024/02/05 09:31:02 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/05 09:31:02 DEBUG : fs cache: renaming cache item "." to be canonical "/home/ec2-user"
2024/02/05 09:31:02 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/02/05 09:31:02 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 09:31:02 DEBUG : HTTP REQUEST (req 0xc00090b200)
2024/02/05 09:31:02 DEBUG : GET /Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip HTTP/1.1
Host: www.nass.usda.gov
User-Agent: rclone/v1.65.2
Accept-Encoding: gzip
2024/02/05 09:31:02 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 09:32:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 1m0.0s
2024/02/05 09:33:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 2m0.0s
2024/02/05 09:34:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 3m0.0s
2024/02/05 09:35:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 4m0.0s
2024/02/05 09:36:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 5m0.0s
2024/02/05 09:36:02 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/05 09:36:02 DEBUG : HTTP RESPONSE (req 0xc00090b200)
2024/02/05 09:36:02 DEBUG : Error: net/http: timeout awaiting response headers
2024/02/05 09:36:02 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/05 09:36:02 ERROR : Attempt 1/1 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": net/http: timeout awaiting response headers
2024/02/05 09:36:02 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Elapsed time: 5m0.0s
2024/02/05 09:36:02 DEBUG : 6 go routines active
2024/02/05 09:36:02 Failed to copyurl: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": net/http: timeout awaiting response headers
[ec2-user@ip-172-31-23-19 ~]$ rclone copyurl https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip out -vv --dump headers --retries 1 --disable-http2 --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
2024/02/05 15:09:34 DEBUG : rclone: Version "v1.65.2" starting with parameters ["rclone" "copyurl" "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip" "out" "-vv" "--dump" "headers" "--retries" "1" "--disable-http2" "--user-agent" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"]
2024/02/05 15:09:34 DEBUG : Creating backend with remote "."
2024/02/05 15:09:34 NOTICE: Config file "/home/ec2-user/.config/rclone/rclone.conf" not found - using defaults
2024/02/05 15:09:34 DEBUG : fs cache: renaming cache item "." to be canonical "/home/ec2-user"
2024/02/05 15:09:34 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/02/05 15:09:34 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 15:09:34 DEBUG : HTTP REQUEST (req 0xc000949200)
2024/02/05 15:09:34 DEBUG : GET /Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip HTTP/1.1
Host: www.nass.usda.gov
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept-Encoding: gzip
2024/02/05 15:09:34 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/02/05 15:10:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 1m0.0s
2024/02/05 15:11:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 2m0.0s
2024/02/05 15:12:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 3m0.0s
2024/02/05 15:13:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 4m0.0s
2024/02/05 15:14:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 5m0.0s
2024/02/05 15:14:34 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/05 15:14:34 DEBUG : HTTP RESPONSE (req 0xc000949200)
2024/02/05 15:14:34 DEBUG : Error: net/http: timeout awaiting response headers
2024/02/05 15:14:34 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/02/05 15:14:34 ERROR : Attempt 1/1 failed with 1 errors and: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": net/http: timeout awaiting response headers
2024/02/05 15:14:34 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Elapsed time: 5m0.0s
2024/02/05 15:14:34 DEBUG : 6 go routines active
2024/02/05 15:14:34 Failed to copyurl: Get "https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip": net/http: timeout awaiting response headers
psarka@xps:~$ host www.nass.usda.gov
www.nass.usda.gov is an alias for www.nass.usda.gov.edgekey.net.
www.nass.usda.gov.edgekey.net is an alias for e10552.dscx.akamaiedge.net.
e10552.dscx.akamaiedge.net has address 23.197.137.116
e10552.dscx.akamaiedge.net has IPv6 address 2a02:26f0:3900:3af::2938
e10552.dscx.akamaiedge.net has IPv6 address 2a02:26f0:3900:3a4::2938
EC2 instance (where it does not work):
[ec2-user@ip-172-31-23-19 ~]$ host www.nass.usda.gov
www.nass.usda.gov is an alias for www.nass.usda.gov.edgekey.net.
www.nass.usda.gov.edgekey.net is an alias for e10552.dscx.akamaiedge.net.
e10552.dscx.akamaiedge.net has address 23.6.101.171
e10552.dscx.akamaiedge.net has IPv6 address 2600:1409:9800:985::2938
e10552.dscx.akamaiedge.net has IPv6 address 2600:1409:9800:989::2938
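Since the two machines resolve the name to different Akamai edge addresses, one way to check whether the edge node matters is to pin the request to a chosen address with curl's `--resolve` option and run it from both machines. A sketch under assumptions: `resolve_arg`, `pinned_fetch` and the `RUN` switch are hypothetical helper names, only the two IPv4 addresses from the `host` output above are used, and the one-byte range request assumes the server honours `Range`.

```shell
#!/bin/sh
# Sketch: fetch the same URL through a specific edge address.
# Commands are only printed by default; set RUN=1 to actually execute them.

HOST="www.nass.usda.gov"
FILE_PATH="/Research_and_Science/Cropland/Release/datasets/2023_30m_cdls.zip"

# Build the host:port:address triple that curl --resolve expects.
resolve_arg() {
  printf '%s:443:%s' "$1" "$2"
}

# pinned_fetch <ipv4 address>
pinned_fetch() {
  ip=$1
  # -w prints the address actually used, the HTTP version and the status.
  set -- curl -sS -o /dev/null -r 0-0 --max-time 60 \
    -w '%{remote_ip} %{http_version} %{http_code}\n' \
    --resolve "$(resolve_arg "$HOST" "$ip")" "https://$HOST$FILE_PATH"
  if [ "${RUN:-0}" = "1" ]; then
    "$@" || echo "(fetch via $ip failed)"
  else
    printf '%s\n' "$*"
  fi
}

pinned_fetch 23.197.137.116   # address seen from the laptop
pinned_fetch 23.6.101.171     # address seen from the EC2 instance
```

If the laptop's edge address also fails from EC2, the edge node itself is probably not the cause.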
Netstat confirms that rclone is connecting to the address returned by host:
[ec2-user@ip-172-31-23-19 ~]$ sudo netstat -tuanp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2162/sshd: /usr/sbi
tcp 0 540 172.31.23.19:22 18.237.140.164:31818 ESTABLISHED 68019/sshd: ec2-use
tcp 0 0 172.31.23.19:33842 52.119.167.123:443 ESTABLISHED 2157/amazon-ssm-age
tcp 0 0 172.31.23.19:22 138.2.234.220:48850 TIME_WAIT -
tcp 0 0 172.31.23.19:58744 23.6.101.171:443 ESTABLISHED 69516/rclone
tcp6 0 0 :::22 :::* LISTEN 2162/sshd: /usr/sbi
udp 0 0 127.0.0.1:323 0.0.0.0:* 2191/chronyd
udp 0 0 172.31.23.19:68 0.0.0.0:* 1966/systemd-networ
udp6 0 0 ::1:323 :::* 2191/chronyd
udp6 0 0 fe80::17:97ff:fe61::546 :::* 1966/systemd-networ
wget is using the same IP and downloading successfully: