Tyk ignoring IPv4 preference for DNS lookup

I’ve spent way too much time figuring this one out.

Tyk, or rather Go itself, is probably using the wrong underlying function to resolve DNS names on Linux systems (at least Debian-based ones). As a result it seems to completely ignore the settings in /etc/gai.conf, which would otherwise avoid the delays caused by lookups for non-existent IPv6 (AAAA) records on IPv4-only networks.
The only way I’ve found to “fix” this is to set the resolv.conf timeout to 1 second (options timeout:1), which is still too long for requests that normally take under 120 ms.

See: https://serverfault.com/a/708492

Sample tcpdump of a standard ping:

14:17:44.561556 IP (tos 0x0, ttl 64, id 35995, offset 0, flags [DF], proto UDP (17), length 64)
10.24.101.74.36220 > 10.24.220.11.53: [bad udp cksum 0x55c3 -> 0xe996!] 50548+ A? staging.local.lan. (36)
14:17:44.561753 IP (tos 0x0, ttl 128, id 18256, offset 0, flags [DF], proto UDP (17), length 80)
10.24.220.11.53 > 10.24.101.74.36220: [udp sum ok] 50548* q: A? staging.local.lan. 1/0/0 staging.local.lan. [1h] A 10.24.101.70 (52)
14:17:44.562208 IP (tos 0x0, ttl 64, id 35996, offset 0, flags [DF], proto UDP (17), length 71)
10.24.101.74.45836 > 10.24.220.11.53: [bad udp cksum 0x55ca -> 0x7bbe!] 13596+ PTR? 70.101.24.10.in-addr.arpa. (43)
14:17:44.562370 IP (tos 0x0, ttl 128, id 18257, offset 0, flags [DF], proto UDP (17), length 103)
10.24.220.11.53 > 10.24.101.74.45836: [udp sum ok] 13596* q: PTR? 70.101.24.10.in-addr.arpa. 1/0/0 70.101.24.10.in-addr.arpa. [1h] PTR staging.local.lan. (75)

Sample tcpdump of a request from Tyk:

14:19:34.068973 IP (tos 0x0, ttl 64, id 58519, offset 0, flags [DF], proto UDP (17), length 64)
10.24.101.74.35798 > 10.24.220.11.53: [bad udp cksum 0x55c3 -> 0xeba2!] 50419+ AAAA? staging.local.lan. (36)
14:19:34.069196 IP (tos 0x0, ttl 64, id 58520, offset 0, flags [DF], proto UDP (17), length 64)
10.24.101.74.34889 > 10.24.220.11.53: [bad udp cksum 0x55c3 -> 0x16be!] 40320+ A? staging.local.lan. (36)
14:19:34.069417 IP (tos 0x0, ttl 128, id 9101, offset 0, flags [none], proto UDP (17), length 80)
10.24.220.11.53 > 10.24.101.74.34889: [udp sum ok] 40320* q: A? staging.local.lan. 1/0/0 staging.local.lan. [1h] A 10.24.101.70 (52)
14:19:35.069113 IP (tos 0x0, ttl 64, id 58576, offset 0, flags [DF], proto UDP (17), length 64)
10.24.101.74.54043 > 10.24.220.11.53: [bad udp cksum 0x55c3 -> 0xe49f!] 33969+ AAAA? staging.local.lan. (36)
14:19:35.069337 IP (tos 0x0, ttl 128, id 9424, offset 0, flags [DF], proto UDP (17), length 119)
10.24.220.11.53 > 10.24.101.74.54043: [udp sum ok] 33969* q: AAAA? staging.local.lan. 0/1/0 ns: local.lan. [1h] SOA dc.local.lan. hostmaster.local.lan. 1846122 900 600 86400 3600 (91)

Please note the lack of AAAA requests for ping, which appears to honor /etc/gai.conf properly. The relevant setting in this case:

# For sites which prefer IPv4 connections change the last line to
precedence ::ffff:0:0/96 100

You could try using Go's cgo DNS resolver. From the net package documentation:

### Name Resolution
The method for resolving domain names, whether indirectly with functions like Dial or directly with functions like LookupHost and LookupAddr, varies by operating system.

On Unix systems, the resolver has two options for resolving names. It can use a pure Go resolver that sends DNS requests directly to the servers listed in /etc/resolv.conf, or it can use a cgo-based resolver that calls C library routines such as getaddrinfo and getnameinfo.

By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.

The resolver decision can be overridden by setting the netdns value of the GODEBUG environment variable (see package runtime) to go or cgo, as in:

export GODEBUG=netdns=go    # force pure Go resolver
export GODEBUG=netdns=cgo   # force cgo resolver

It’s not perfect, but I’ll take it. It still queries IPv6 records, but this resolver seems to have a much lower timeout (~200 ms).

Many thanks!