Using Cloudflare DNS causes rclone mount speeds to drop to 0 every few seconds

What is the problem you are having with rclone?

Network throughput drops to near 0 when using an rclone union remote with Cloudflare DNS servers in /etc/resolv.conf

What is your rclone version (output from rclone version)

v.51.0-148-g2a62471e-beta

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18.04

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

ExecStart=/usr/bin/rclone mount \
  --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' \
  --config=/opt/rclone.conf \
  --allow-other \
  --fast-list \
  --vfs-read-chunk-size=10M \
  --vfs-read-chunk-size-limit=0 \
  --buffer-size=0 \
  --poll-interval=1m \
  --no-modtime \
  --drive-pacer-min-sleep=10ms \
  --dir-cache-time=24h \
  --timeout=10m \
  --umask=002 \
  --log-level INFO \
  --log-file /opt/rclone.log \
  union: /mnt/remote

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Nothing unusual in the logs.

I'm sure it was the Cloudflare DNS because as soon as I reverted to my default DNS, the drops in speed stopped, and I was able to reproduce it on 5 different servers. I don't know if I'm getting rate limited by Cloudflare or something, but I'm sharing this in case someone else has the same issue. I spent hours trying to find the source of this issue.

That sounds like a peering issue that shows up with one set of DNS servers and not the other.

(Apologies if I am telling you something you know)

The way DNS works is it takes a name and resolves that to a numeric IP address. If you are pointing to Cloudflare's DNS, you may get a different IP back than you would from another DNS server.

The host machine then talks to that IP address to communicate, so I'd surmise one is slower than the other. I wouldn't think it would drop to 0 though, as that seems a bit odd.

Can you share a debug log of the issue while pointing to Cloudflare?

You can test by copying a file rather than using a mount, if that makes it easier, and see if you can reproduce it.
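A minimal sketch of such a copy test, assuming the `union:` remote and `/opt/rclone.conf` from the mount command above. The source path, destination directory and log file name are placeholders and not something from this thread, and `debug_copy` is just a helper name made up here.

```shell
# Copy one file through rclone with debug logging, without involving the mount.
# $1: source on the remote (placeholder, e.g. union:path/to/big-file)
# $2: debug log file to write (placeholder, e.g. /tmp/rclone-cloudflare.log)
debug_copy() {
    rclone copy -vv \
        --config=/opt/rclone.conf \
        --log-file="$2" \
        "$1" /tmp/rclone-dns-test/
}
```

Running `debug_copy union:path/to/big-file /tmp/rclone-cloudflare.log` once with each resolver in /etc/resolv.conf gives two debug logs that can be compared.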

Looking at the output of traceroute www.googleapis.com with and without the Cloudflare DNS would show the different network routing.
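To check whether the two resolvers hand back different addresses in the first place, something like the following sketch could be used. It assumes `dig` is installed (the dnsutils package on Ubuntu); `only_ipv4` and `resolve_with` are helper names made up for this example.

```shell
# Keep only lines that look like IPv4 addresses
# (dig +short also prints CNAME targets, which are dropped here).
only_ipv4() {
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# Print the A records that a specific resolver returns for a hostname.
# $1: resolver IP (e.g. 1.1.1.1), $2: hostname
resolve_with() {
    dig "@$1" "$2" A +short | only_ipv4
}
```

Comparing `resolve_with 1.1.1.1 www.googleapis.com` with `resolve_with 8.8.8.8 www.googleapis.com` shows whether the answers differ. If they do, running `traceroute -n` against each address shows the two routes.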

This is a dedicated server in a datacenter, so I didn't expect to have this kind of issue.

Should I use Google DNS then? It makes sense that it would speed up rclone, right?

Google will probably be better at geolocating their datacenters than Cloudflare, so it may work to speed up rclone!

Any way to benchmark this to be sure? I can run tests in multiple datacenter locations to see if there is any performance gain.

So I ran traceroute to this address with the 1.1.1.1 DNS and with the 8.8.8.8 DNS, and the difference in run time between them was seconds!

But then I noticed that when I use -n, so that IP addresses are not resolved to hostnames, the difference is minimal.

So I don't know...
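One possible reading of this observation: without `-n`, traceroute does a reverse DNS lookup for every hop it prints, so a slow resolver makes the whole command take longer even when the network path is the same. A rough sketch to separate the two effects, assuming a `date` that supports `+%s`; `elapsed` is a helper name made up here.

```shell
# Run a command, discard its output, and print how many whole seconds it took.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# With reverse lookups (depends on the resolver in /etc/resolv.conf):
#   elapsed traceroute www.googleapis.com
# Without reverse lookups (the name is still resolved once, then network only):
#   elapsed traceroute -n www.googleapis.com
```

If only the first number changes when switching resolvers, the extra seconds are resolver time spent on reverse lookups and not extra latency on the route.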

For me, both Cloudflare and Google DNS return the same IPv4 address:

felix@gemini:~$ host www.googleapis.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:

www.googleapis.com has address 172.217.6.234
www.googleapis.com has IPv6 address 2607:f8b0:4006:815::200a
felix@gemini:~$ host www.googleapis.com 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

www.googleapis.com has address 172.217.6.234
www.googleapis.com has IPv6 address 2607:f8b0:4006:81a::200a

The only real way to benchmark it is to use their APIs and measure the performance you get. You can't really "fix" bad peering yourself; your ISP has to address that, if they'll even acknowledge it.

I generally get much better results with 8.8.8.8 instead of 1.1.1.1 as a resolver. This is on a server in a datacenter with private peering with Google.

darthshadow@server:~$ host googleapis.com 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

googleapis.com has address 172.217.17.36
googleapis.com has IPv6 address 2a00:1450:400e:804::2004
darthshadow@server:~$ ping -n -c 4 172.217.17.36
PING 172.217.17.36 (172.217.17.36) 56(84) bytes of data.
64 bytes from 172.217.17.36: icmp_seq=1 ttl=53 time=0.828 ms
64 bytes from 172.217.17.36: icmp_seq=2 ttl=53 time=0.839 ms
64 bytes from 172.217.17.36: icmp_seq=3 ttl=53 time=0.853 ms
64 bytes from 172.217.17.36: icmp_seq=4 ttl=53 time=0.831 ms

--- 172.217.17.36 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3054ms
rtt min/avg/max/mdev = 0.828/0.837/0.853/0.036 ms
darthshadow@server:~$ host googleapis.com 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases:

googleapis.com has address 216.58.207.228
googleapis.com has IPv6 address 2a00:1450:400f:80c::2004
darthshadow@server:~$ ping -n -c 4 216.58.207.228
PING 216.58.207.228 (216.58.207.228) 56(84) bytes of data.
64 bytes from 216.58.207.228: icmp_seq=1 ttl=53 time=24.9 ms
64 bytes from 216.58.207.228: icmp_seq=2 ttl=53 time=24.9 ms
64 bytes from 216.58.207.228: icmp_seq=3 ttl=53 time=24.9 ms
64 bytes from 216.58.207.228: icmp_seq=4 ttl=53 time=24.9 ms

--- 216.58.207.228 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 24.957/24.972/24.997/0.112 ms
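The comparison above can be scripted so that it is easy to repeat from several servers. This is a sketch that assumes the Linux iputils `ping` summary format shown in the transcripts (the `rtt min/avg/max/mdev` line); `avg_rtt` is a helper name made up for this example.

```shell
# Read ping output on stdin and print the average round-trip time in ms.
# Relies on the summary line "rtt min/avg/max/mdev = a/b/c/d ms" from iputils ping.
avg_rtt() {
    awk -F'/' '/^rtt / { print $5 }'
}
```

For example, `ping -n -c 4 172.217.17.36 | avg_rtt` prints just the average for that address. ICMP latency is only a proxy for API performance, so it is worth confirming any difference with a real rclone transfer.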

As an anecdotal data point, I tried using the 1.1.1.3 filtered DNS at home (I have kids), but lots of services (in particular Twitter and GitHub) were much slower than with my ISP's filtered DNS server, so I stopped using it.

I think Cloudflare are losing some or all of the geolocation info. Some DNS servers use the IP of whoever sends the query to work out which server is closest and send you there. If Cloudflare is doing the resolution for you then those servers see Cloudflare's IP, not yours. I suspect Cloudflare do the query from a server near you, but how near is the question.

In theory BGP anycast shouldn't be susceptible to this but I don't think all geo friendly stuff is done with it.

Since DNS impacts rclone performance so much, isn't it possible to do some optimizations on the rclone side?

You guys were running the wrong benchmark...

Try running time traceroute www.googleapis.com with 1.1.1.1 and then with 8.8.8.8 set in resolv.conf and you'll see that the difference between them is often seconds!

A traceroute just shows the route from you to the endpoint you are tracing: how many hops you go through to reach the other side, and the latency to each hop.

DNS just resolves a name to an IP, and that is a system service: any application or program asks the system it's running on to do the lookup.

The reason you see differences is how the DNS server you are pointing to answers your request: many give geographically based responses, so you get different answers back depending on your location. Some even filter out stuff for you and do a myriad of other things.

More hops tend to mean more latency, as you have more devices to go through to get there, and it's one factor in determining how good or bad a route is. You as an end user cannot change how you get to something, as your ISP does that. If you have a bad hop or a congested hop, things are going to be bad for you, and you cannot change it other than by complaining to the ISP, changing ISPs, or using a VPN to take a different route (which has its own capacity issues).

The challenge when offering blanket advice is that for me, your advice would not apply, because both Google and Cloudflare resolve back to the same endpoint, so there would be zero difference for me.

Others would find benefits, as they may have better peering with Cloudflare or better peering with Google; it's all about testing.

I've probably spent far too much of my time / career troubleshooting network issues so it's something I have a passion for and enjoy.

The problem here is how much the DNS impacts rclone. Shouldn't it try to cache the DNS requests or try to make fewer of them?

Why does the DNS impact rclone performance so much?

I think you are a little confused on how DNS operates as I didn't do a good job at explaining it.

When any application requests a host name, DNS just translates that host name to a numeric IP address, like looking up a phone number from a name, as applications operate on numbers and not names.

The reason you see an issue is not the speed of the lookup or whether the result is cached, as the connections are already made.

The issue lies in the fact that different numbers are provided depending on how the DNS server you are using answers your phone number request.

You may have a better call connection when using one number over another, or better network peering in networking terms.

There is not a thing rclone can do to make that better, as it's all core operating system and networking. What works for you might be bad for me; those are the pitfalls of the Internet and how things route to each other.

Still, when using rclone with the 1.1.1.1 DNS, the mount is unusable.

That shouldn't happen. A few more seconds of latency shouldn't make rclone unusable.

And I'm 100% sure it's the DNS because as soon as I change it, the mount is usable again.

When you are using Cloudflare, you have a bad peer.

It has nothing to do with rclone directly.

It's you, your ISP and how your ISP peers to the IP returned from DNS.
