Rclone DNS name resolution issues under FreeNAS 11.1 (FreeBSD 11.1-STABLE variant)


#1

Hello everyone,

I’m having what I interpret as DNS name resolution issues under FreeNAS 11.1 (based on FreeBSD 11.1-STABLE), with both rclone v1.38 (which comes already installed on FreeNAS) and v1.40 (which I downloaded from the rclone web repo and installed myself).

The problem is somewhat intermittent, but when it manifests itself, all rclone commands (not only the lsd shown below) fail like this:

rclone lsd MYGOOGLEDRIVE:
     2018/04/06 12:03:35 Failed to create file system for "MYGOOGLEDRIVE:": couldn't read info about Drive: Get https://www.googleapis.com/drive/v2/about?alt=json: Post https://accounts.google.com/o/oauth2/token: dial tcp: lookup accounts.google.com on 172.20.25.99:53: read udp 172.20.25.99:19427->172.20.25.99:53: i/o timeout

172.20.25.99 is the internal DNS server, which works perfectly well with everything else, to wit:

host accounts.google.com 172.20.25.99
Using domain server:
Name: 172.20.25.99
Address: 172.20.25.99#53
Aliases:

accounts.google.com has address 216.58.202.141
accounts.google.com has IPv6 address 2800:3f0:4001:809::200d
accounts.google.com mail is handled by 10 alt1.gmr-smtp-in.l.google.com.
accounts.google.com mail is handled by 5 gmr-smtp-in.l.google.com.
accounts.google.com mail is handled by 30 alt3.gmr-smtp-in.l.google.com.
accounts.google.com mail is handled by 40 alt4.gmr-smtp-in.l.google.com.
accounts.google.com mail is handled by 20 alt2.gmr-smtp-in.l.google.com.

Moreover, demonstrating that the problem is indeed DNS name resolution: as soon as I add 216.58.202.141 accounts.google.com (plus whatever rclone complains about next: www.googleapis.com, etc.) to the machine’s /etc/hosts file, rclone works perfectly.

So, am I missing anything? Is this an rclone bug or what?

Thanks,
– Durval.


#2

What software is that name server running? I’ve had a lot of problems with the systemd name resolver.

Can you try a different name resolver and see if that fixes the problem (eg 8.8.8.8 or 1.1.1.1) then at least we’ll know where to start looking.
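
For reference, a Go program can point Go’s built-in resolver at a specific server without touching /etc/resolv.conf. This is only an illustration of the mechanism, not something rclone exposes: newPinnedResolver is a hypothetical helper, 8.8.8.8:53 is just the example server from the question above, and the net.Resolver Dial hook only affects Go’s own DNS client (PreferGo), not the cgo/system resolver.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newPinnedResolver (hypothetical helper) returns a resolver that uses Go's
// built-in DNS client and sends every query to server, ignoring the
// nameserver lines in /etc/resolv.conf. /etc/hosts is still consulted.
func newPinnedResolver(server string, timeout time.Duration) *net.Resolver {
	d := net.Dialer{Timeout: timeout}
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// network is "udp" or "tcp" as chosen by the resolver; the
			// nameserver address it picked is replaced by server.
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	r := newPinnedResolver("8.8.8.8:53", 5*time.Second)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, "accounts.google.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

If the lookup succeeds with the pinned server but fails with the default configuration, that points at the local nameserver rather than at the Go resolver itself.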

You may find this issue interesting: https://github.com/ncw/rclone/issues/683

Which contains a possible work-around: export GODEBUG=netdns=go
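
GODEBUG is a comma-separated list of name=value settings read by the Go runtime at startup. If you would rather scope the workaround to a single rclone invocation than export it for the whole shell, a minimal sketch of a Go wrapper might look like this; withGODEBUG is a hypothetical helper and MYGOOGLEDRIVE: is the placeholder remote from this thread. Existing GODEBUG settings are kept and any previous netdns entry is replaced.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGODEBUG (hypothetical helper) returns a copy of env in which GODEBUG
// contains key=value, preserving any other comma-separated settings.
func withGODEBUG(env []string, key, value string) []string {
	out := make([]string, 0, len(env)+1)
	var settings []string
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			for _, s := range strings.Split(strings.TrimPrefix(kv, "GODEBUG="), ",") {
				// Drop empty items and any earlier value for the same key.
				if s != "" && !strings.HasPrefix(s, key+"=") {
					settings = append(settings, s)
				}
			}
			continue
		}
		out = append(out, kv)
	}
	settings = append(settings, key+"="+value)
	return append(out, "GODEBUG="+strings.Join(settings, ","))
}

func main() {
	// Equivalent to: GODEBUG=netdns=go rclone lsd MYGOOGLEDRIVE:
	cmd := exec.Command("rclone", "lsd", "MYGOOGLEDRIVE:")
	cmd.Env = withGODEBUG(os.Environ(), "netdns", "go")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "rclone failed:", err)
		os.Exit(1)
	}
}
```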


#3

Hello @ncw,

This machine is a Samba AD domain controller, so it has to use the internal Samba DC DNS (which in turn forwards any queries not ending in “.MYDOMAIN.local” to a BIND-based DNS server).

Can you try a different name resolver and see if that fixes the problem (eg 8.8.8.8 or 1.1.1.1) then at least we’ll know where to start looking.

Is there a way to do that just for rclone (some environment variable, etc)? I can’t change the system-wide resolver via /etc/resolv.conf or Samba will break down badly… :-/

You may find this issue interesting: https://github.com/ncw/rclone/issues/683
Which contains a possible work-around: export GODEBUG=netdns=go

Thanks for the tips. I did also find this: https://golang.org/pkg/net/#hdr-Name_Resolution

Do you think it’s relevant? Could export GODEBUG=netdns=cgo also help? Would either of those possibly bring other problems or side-effects to rclone’s operation?

Thanks!
– Durval.


#4

I see!

Not as far as I know, unless you put it in a docker container or something like that.

I think it is worth having an experiment with those options.

Note that my builds are built without cgo, so they don’t have access to the system resolver. It might be worth building your own rclone on the FreeNAS box: if you just use go build to build it, it will have access to the system resolver.

I don’t think so. They’ll either help with the name resolution or they won’t.


#5

Howdy @ncw,

I’m happy to report that export GODEBUG=netdns=go seems to have fixed the issue!

And not only on the above FreeBSD/FreeNAS/AD nameserver setup, but also in a very different situation (a home laptop running Linux Mint 17.x amd64 with “nameserver 8.8.8.8” in /etc/resolv.conf), where the problem was apparently caused by some weird “DNS captive portal” run by the upstream ISP. In both cases rclone was stuck on dial [...] timeout errors, and after just exporting the above envvar and rerunning the same command, everything started working perfectly.

Thanks again Nick for the great assistance, and the great program that is rclone!

Cheers,
– Durval.


#6

Excellent news :smiley:

I would have thought that would be the default unless you built rclone yourself?

If you do

$ ldd `which rclone`
not a dynamic executable

If it prints not a dynamic executable, as above, then I think go doesn’t have access to the other resolver, only the go one.


#7

Hello Nick,

This is weird, as I definitely did not build any of these (FreeNAS/FreeBSD or Linux) rclones.

Checking them with ldd:

  • FreeNAS/FreeBSD:
    ldd /mnt/REDACTED/bin/rclone
        ldd: /mnt/REDACTED/bin/rclone: not a dynamic ELF executable
  • Linux:
    ldd /usr/local/bin/rclone
            not a dynamic executable

In fact, the issue seemed to be intermittent on both machines (it would suddenly start happening, sometimes even in the middle of a long sync/copy), and I’ve already seen it “go away” without explanation. So it is possible that the above export worked only as a “placebo” for the operator (me) and the problem is still there; it just hasn’t manifested itself again yet… :frowning:

Well, I will keep on rcloneing and post a followup if (most probably when, by what you just said) the issue happens again.

In the meantime, if you have any more info/tips/suggestions/recommendations right off the bat, I’m all ears :wink:

Cheers,
– Durval.


#8

One thing you can do is use this to get the go runtime to tell you which resolver package it is using.

Here is the release build:

$ GODEBUG=netdns=1 rclone-v1.40 lsd drive: >/dev/null
go package net: built with netgo build tag; using Go's DNS resolver

And here is one I built locally

$ GODEBUG=netdns=1 rclone lsd drive: >/dev/null
go package net: dynamic selection of DNS resolver
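
If you want to collect that line from several machines, a small Go wrapper could run rclone with GODEBUG=netdns=1 and pick the report out of stderr. A sketch, assuming the report keeps the go package net: prefix shown above (its exact wording varies between Go versions) and that the command actually performs a DNS lookup, since as far as I can tell the line is printed when the resolver is first used. resolverReport is a hypothetical helper and drive: is a placeholder remote.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// resolverReport (hypothetical helper) extracts the text after the
// "go package net:" prefix that the Go runtime writes to stderr
// when GODEBUG=netdns=1 is set.
func resolverReport(stderr string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(stderr))
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "go package net:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "go package net:")), true
		}
	}
	return "", false
}

func main() {
	cmd := exec.Command("rclone", "lsd", "drive:")
	// A later duplicate key wins in exec.Cmd.Env, so this overrides any GODEBUG.
	cmd.Env = append(os.Environ(), "GODEBUG=netdns=1")
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	_ = cmd.Run() // exit status ignored; only the stderr report matters here
	if msg, ok := resolverReport(stderr.String()); ok {
		fmt.Println(msg)
	} else {
		fmt.Println("no resolver report found")
	}
}
```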

Not at the moment! I’ve seen a lot of niggly low level problems with Go’s name resolution (hence the FAQ entry) and I don’t really know why!


#9

Hello again, @ncw:

Here it is, for both machines:

  • FreeNAS/FreeBSD:
       GODEBUG=netdns=1 /mnt/REDACTED/bin/rclone lsd MYGOOGLEDRIVE1: >/dev/null
           go package net: built with netgo build tag; using Go's DNS resolver
  • Linux:
      GODEBUG=netdns=1 /usr/local/bin/rclone lsd MYGOOGLEDRIVE2: >/dev/null
          go package net: built with netgo build tag; using Go's DNS resolver

Possibly relevant info: rclone's version on the Linux machine is still v1.39 (unlike the FreeNAS/FreeBSD machine, I haven’t recently upgraded it).

Cheers,
– Durval.


#10

That is what I’d expect, which probably means the GODEBUG=netdns=go is not doing anything useful :frowning:

v1.39 was compiled with go1.9 whereas v1.40 was compiled with go1.10 so it will have any improvements to the go name resolver code in it.


#11

Hello Nick,

Thanks for the clarifications. I will proceed as per your guidance, possibly compiling a custom rclone build here (or is there one already compiled somewhere?) in order to enable the other resolvers, and let you know how it goes.

Cheers,
– Durval


#12

It’s easy to cross-compile go, but not if you want cgo support (and hence the other resolvers).

To compile for FreeBSD/Linux, the easiest way would be to install go and then install rclone from source…

I can send you a build for linux very easily if you want - I don’t have a FreeBSD VM to hand at the moment though.


#13

Hello @ncw,

EDIT: just to clarify, the issue I’m discussing below and in the next few messages, while related to DNS, is subtly different: the previous issue (reported above) is that rclone could not get an answer from the DNS server. The issue now is that it does get an answer, but when that answer contains an IPv6 address, rclone tries to connect to it and fails (as I have no IPv6 connectivity on my rclone clients).

Long time since this issue last happened with me, but now it has started biting again :frowning: And in linux, not FreeBSD, this time.

I was thinking about your suggestion (recompiling rclone so as to enable the dynamic DNS resolver selection) and I would like to avoid it if possible; not only because that would require me to compile and keep separate (ie, not-downloaded-from-the-official-location) versions of rclone around, but also because it would mean doing it on a multitude of machines.

I’ve been researching the subject and just came upon what seems a more workable solution: to reconfigure my local DNS servers (which are all BIND9+) so as to block AAAA records from any responses to clients. Then all my machines (which are already DNS clients of these servers) would automatically stop seeing any AAAA records, and so would rclone (and other programs which have also been giving me IPv6 issues).

This is very easily accomplished: it suffices to add filter-aaaa-on-v4 yes; to the options block of /etc/named.conf (or whatever file is your BIND named master config file). Reference docs from BIND on this can be found here: https://kb.isc.org/article/AA-00576/0/Filter-AAAA-option-in-BIND-9-.html

An added benefit of doing it this way is that, when my ISP finally provides working IPv6, I won’t have to change anything on the clients; I’ll just remove the above option from the BIND config file and restart it.

What do you think? Would this be a good solution? Or do you foresee any issues (with rclone or otherwise)?

Cheers, – Durval


#14

I take it the problem is that rclone will choose IPv6 connectivity by default, but that doesn’t work 100% reliably? Is that it?

You should be able to configure your preferences in /etc/gai.conf - I don’t know whether the standard build of rclone will obey that or not but it might be worth a try.

Otherwise your BIND solution seems like a reasonable one if you really don’t want IPv6 any more! As there are very, very few IPv6-only services (I can’t think of any except for test things), I don’t see a downside.


#15

Hello @ncw,

I take it the problem is that rclone will choose IPv6 connectivity by default, but that doesn’t work 100% reliably? Is that it?

IPv6 ‘works’ 100% reliably on my current setup: as my ISP doesn’t offer it, it fails every damn time :wink: What isn’t “reliable” is whether rclone uses the IPv6 address or not. I’m not sure whether Google’s DNS servers only sometimes return IPv6 addresses, or whether rclone just doesn’t pick them every time, but whatever it is, it makes the end result very unreliable: even after I remove IPv6 completely from the machine (eg, echo 1 > /proc/sys/net/ipv6/conf/<interface-name>/disable_ipv6), rclone sometimes starts trying to connect to Google servers’ IPv6 addresses… and in the middle of a large sync/copy/cryptcheck, that screws up the entire process… :frowning:

Nice reminder about /etc/gai.conf; I was not aware that anything used it these days :wink: Here on my machines it either doesn’t exist (FreeBSD11), is a zero-byte file (EL6) or is full of comments (Devuan)… but I will have a look, create it on my rclone machines and do some experimenting.

Otherwise your BIND solution seems like a reasonable one if you really don’t want IPv6 any more! As there are very, very few IPv6-only services (I can’t think of any except for test things), I don’t see a downside.

Don’t get me wrong, I have nothing against IPv6. But the fact is that the local ISP here doesn’t have it (yet!), so there’s no use keeping it configured and dealing with the resulting issues… When the ISP gets a clue and starts offering it, or better yet, when I get a less-sucky ISP, I will certainly reconfigure my BIND named server to let IPv6 records through again…

Thanks for helping me again, Nick. I will keep this topic posted.

Cheers, – Durval.


#16

OK, first report; I just created the following /etc/gai.conf on my current test machine (running EL6 Linux):

#/etc/gai.conf getaddrinfo() precedence configuration file.
#2018/05/14 Configured to always put IPv4 addresses first (Durval Menezes)

# ::ffff:0:0/96 is the IPv4 address space.

label  ::1/128       0
label  ::/0          1
label  2002::/16     2
label ::/96          3
label ::ffff:0:0/96  4

precedence  ::1/128       50
precedence  ::/0          40
precedence  2002::/16     30
precedence ::/96          20
#precedence ::ffff:0:0/96  10
precedence ::ffff:0:0/96  100

#eof /etc/gai.conf

I’m starting a test right away with rclone cryptcheck on a large dirtree (a couple of TB in a few hundred thousand files), with a -v -v log. If no IPv6 trouble shows up, then I guess I’m happy :slight_smile: But it certainly does not look promising:

sudo strings /usr/local/bin/rclone | grep resolv.conf
     [...] /etc/resolv.conf [...] 
sudo strings /usr/local/bin/rclone | grep gai.conf
     [nothing is shown] 

Again, will keep the topic posted.

Cheers, – Durval.


#17

Looking at the go docs I see:

When using TCP, and the host resolves to multiple IP addresses, Dial will try each IP address in order until one succeeds.

So I guess the go runtime tries the IPv6 addresses if they happen to come first, and sometimes on your machine, for reasons unknown, they don’t fail immediately. That might be a clue as to what is going on.

Yes, I did the same thing in the go runtime source with the same result!

I expect you’ve seen go’s docs on its name resolvers. The standard builds are built without cgo and so don’t have the cgo resolver available.


#18

Hi @ncw,

Just coming back to report: as expected, fiddling with /etc/gai.conf did not succeed. But filtering the AAAA records out on my central named server fixed it! So I’m a happy rcloner again.

I expect you’ve seen go’s docs on its name resolvers. The standard builds are built without cgo and so don’t have the cgo resolver available.

<rant> I understand and find laudable Go’s idea of having a built-in DNS resolver instead of using the one in the system library (so the executables can be static and fully self-contained, with no dependencies on external shared libraries, and also for the thread efficiency reasons the Go docs explain). But it would really help if that built-in resolver had externally accessible “knobs” (eg, environment variables we could set, /etc or ~/. config files to edit, &c.) so we sysadmins could tune up its behavior without having to compile and spread special versions around… </rant>

But that’s outside your (and rclone’s) scope, Nick. Again, many thanks for helping me, and for the great piece of software that is rclone.

Cheers,
– Durval.


#19

Great!

Yes, more knobs would be nice… DNS resolution is one of those things that seems simple, but there are an awful lot of tweaks and workarounds in the standard resolver!

Anyway, glad you’ve got it working!