How Solid Is Your DNS Architecture?

Over the weekend it was pointed out on the NANOG mailing list that DNS names under the youtube.com zone were having trouble resolving, and that some ISPs’ customers were complaining. While no detailed information was made available indicating precisely what the problem was, some DNS query output posted to the list indicates that the two authoritative servers for the youtube.com domain were at least intermittently unreachable.

The authoritative servers for the youtube.com domain, as returned by one of the .com gTLD name servers when queried with the dig tool, are:

danny@pork% dig +short ns youtube.com @a.gtld-servers.net
dns1.sjl.youtube.com.
dns2.sjl.youtube.com.

Normally, from here in the name resolution path, these authoritative servers would be queried for the youtube.com record of interest. If both of these servers are unavailable, then all of youtube.com would effectively be offline. But what are the odds of both of these servers being offline or unavailable at the same time? Well, let’s have a gander…

As you can see from the verbose dig output below, when querying a.gtld-servers.net for the NS (name server) records associated with youtube.com, the following is returned:

danny@pork% dig ns youtube.com @a.gtld-servers.net

; <<>> DiG 9.4.1-P1 <<>> ns youtube.com @a.gtld-servers.net
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21301
;; flags: qr rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;youtube.com. IN NS

;; ANSWER SECTION:
youtube.com. 172800 IN NS dns1.sjl.youtube.com.
youtube.com. 172800 IN NS dns2.sjl.youtube.com.

;; ADDITIONAL SECTION:
dns1.sjl.youtube.com. 172800 IN A 208.65.152.201
dns2.sjl.youtube.com. 172800 IN A 208.65.152.137

;; Query time: 106 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Mon May 5 12:29:00 2008
;; MSG SIZE rcvd: 10

The most relevant bits here are the NS records, pointing to dns1.sjl.youtube.com and dns2.sjl.youtube.com, and the A (address) records associated with those names (208.65.152.201 & 208.65.152.137). The first thing to point out is that the two authoritative servers appear to be out of the same /24 address prefix, which, coincidentally, is adjacent to the 208.65.153.0/24 prefix that was hijacked just a couple of months back. Furthermore, all the primary contact Internet addresses for www.youtube.com, as well as both authoritative name servers for the zone, are covered by a single BGP route announcement for 208.65.152.0/22 originated by AS 36561 (YouTube Inc.):

route-views.oregon-ix.net>sh ip bgp 208.65.152.0
BGP routing table entry for 208.65.152.0/22, version 295866
Paths: (38 available, best #38, table Default-IP-Routing-Table)
Not advertised to any peer
2905 701 3356 36561
196.7.106.245 from 196.7.106.245 (196.7.106.245)
Origin IGP, metric 0, localpref 100, valid, external

route-views.oregon-ix.net>sh ip bgp 208.65.153.0
BGP routing table entry for 208.65.152.0/22, version 295866
Paths: (38 available, best #38, table Default-IP-Routing-Table)
Not advertised to any peer
2905 701 3356 36561
196.7.106.245 from 196.7.106.245 (196.7.106.245)
Origin IGP, metric 0, localpref 100, valid, external

As RFC 2182 – Selection and Operation of Secondary DNS Servers recommends:

A number of problems in DNS operations today are attributable to poor choices of secondary servers for DNS zones. The geographic placement as well as the diversity of network connectivity exhibited by the set of DNS servers for a zone can increase the reliability of that zone as well as improve overall network performance and access characteristics. Other considerations in server choice can unexpectedly lower reliability or impose extra demands on the network.

While these authoritative servers are within the same /24 prefix, they may not reside on the same physical LAN within the YouTube network, and may even be anycasted within that domain. However, with no externally-reachable (i.e., outside of AS 36561 space) authoritative server, and the close proximity in addresses from within the same /24, it does increase the probability of errors resulting in both authoritative servers being unavailable.
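The address-proximity observation can be checked mechanically. Here is a minimal sketch using Python's standard ipaddress module; the two addresses and the /22 come from the dig and BGP output above, while the helper name distinct_prefixes and the choice of /24 as the grouping size are assumptions made for illustration.

```python
import ipaddress

# Addresses from the ADDITIONAL section of the gTLD server's response above.
NAME_SERVERS = {
    "dns1.sjl.youtube.com.": "208.65.152.201",
    "dns2.sjl.youtube.com.": "208.65.152.137",
}

# The single BGP announcement seen at route-views.
ANNOUNCED_ROUTE = ipaddress.ip_network("208.65.152.0/22")


def distinct_prefixes(addresses, prefix_len=24):
    """Return the set of prefixes (of the given length) the addresses fall into."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


prefixes = distinct_prefixes(NAME_SERVERS.values())
all_in_route = all(
    ipaddress.ip_address(addr) in ANNOUNCED_ROUTE
    for addr in NAME_SERVERS.values()
)

print("distinct /24 prefixes:", sorted(str(p) for p in prefixes))
print("all covered by", ANNOUNCED_ROUTE, ":", all_in_route)
```

A single /24 in the result, with every address inside one announced route, is the pattern described above: it says nothing about the physical layout, only that the name servers share fate at the routing level.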

Interestingly, if you can reach one of the two authoritative name servers for the zone in order to access the actual youtube.com zone file, they list two additional authoritative servers:

danny@pork% dig ns @dns1.sjl.youtube.com youtube.com soa
;; Warning, extra type option

; <<>> DiG 9.4.1-P1 <<>> ns @dns1.sjl.youtube.com youtube.com soa
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32282
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;youtube.com. IN SOA

;; ANSWER SECTION:
youtube.com. 3600 IN SOA sjl-ins1.sjl.youtube.com. dns-admin.youtube.com. 2008050501 10800 3600 604800 86400

;; AUTHORITY SECTION:
youtube.com. 3600 IN NS dns2.sjl.youtube.com.
youtube.com. 3600 IN NS dns3.sjl.youtube.com.
youtube.com. 3600 IN NS sjl-ins2.sjl.youtube.com.
youtube.com. 3600 IN NS dns1.sjl.youtube.com.

;; ADDITIONAL SECTION:
dns1.sjl.youtube.com. 3600 IN A 208.65.152.201
dns2.sjl.youtube.com. 3600 IN A 208.65.152.137
dns3.sjl.youtube.com. 3600 IN A 64.15.123.241
sjl-ins2.sjl.youtube.com. 3600 IN A 208.65.153.140

;; Query time: 82 msec
;; SERVER: 208.65.152.201#53(208.65.152.201)
;; WHEN: Mon May 5 13:11:50 2008
;; MSG SIZE rcvd: 232

Of the additional servers, dns3.sjl.youtube.com (64.15.123.241) is out of a different block of addresses, and adds some resiliency in the event that dns1 & dns2 become unavailable, but only IF the NS records for the zone have recently been cached by the user’s DNS resolver. Let’s see if it’s reachable and serving the zone:

danny@pork% dig ns @dns3.sjl.youtube.com youtube.com soa +short
;; Warning, extra type option
sjl-ins1.sjl.youtube.com. dns-admin.youtube.com. 2008050501 10800 3600 604800 8640

Yep, appears to be fine. The sjl-ins2.sjl.youtube.com server is addressed from the /24 IP address block adjacent to the other two dns*.sjl servers, and is covered by the /22 BGP announcement displayed above. There’s an additional server, sjl-ins1.sjl.youtube.com, that is listed in the Primary Master (dynamic DNS name-server) field of the SOA record. Interestingly, when I was looking at this on Saturday, the NS record and SOA entry for the sjl-ins1.sjl.youtube.com server were present, but had no corresponding A record. This is normally NOT a problem, as DNS resolvers are usually smart enough to sort these things out. However, the NS and A records for sjl-ins2.sjl.youtube.com are a recent addition to the zone file, and appear to have been added today given the recent serial number (2008050501) on the current copy of the SOA. That server, unfortunately, is NOT reachable at the moment:

danny@pork% dig @208.65.153.140 youtube.com ns

; <<>> DiG 9.4.1-P1 <<>> @208.65.153.140 youtube.com ns
; (1 server found)
;; global options: printcmd
;; connection timed out; no servers could be reached
danny@pork% dig @208.65.153.140 youtube.com ns

; <<>> DiG 9.4.1-P1 <<>> @208.65.153.140 youtube.com ns
; (1 server found)
;; global options: printcmd
;; connection timed out; no servers could be reached

Apparently, they added an A record corresponding to the NS record for sjl-ins2.sjl.youtube.com, yet they failed to verify that it was externally reachable AND serving the zone correctly, which it was not, ugh! So, now, in the past few minutes, they’ve removed the NS & A records for the sjl-ins2.sjl.youtube.com server and published yet another version of the zone file (2008050502):

danny@pork% dig ns @dns1.sjl.youtube.com youtube.com soa
;; Warning, extra type option

; <<>> DiG 9.4.1-P1 <<>> ns @dns1.sjl.youtube.com youtube.com soa
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27093
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;youtube.com. IN SOA

;; ANSWER SECTION:
youtube.com. 3600 IN SOA sjl-ins1.sjl.youtube.com. dns-admin.youtube.com. 2008050502 10800 3600 604800 86400

;; AUTHORITY SECTION:
youtube.com. 3600 IN NS dns2.sjl.youtube.com.
youtube.com. 3600 IN NS dns3.sjl.youtube.com.
youtube.com. 3600 IN NS dns1.sjl.youtube.com.

;; ADDITIONAL SECTION:
dns1.sjl.youtube.com. 3600 IN A 208.65.152.201
dns2.sjl.youtube.com. 3600 IN A 208.65.152.137
dns3.sjl.youtube.com. 3600 IN A 64.15.123.241

;; Query time: 84 msec
;; SERVER: 208.65.152.201#53(208.65.152.201)
;; WHEN: Mon May 5 13:50:11 2008
;; MSG SIZE rcvd: 193

Now, this looks reasonable to me. If you’re wondering what the impact of the mistake above was, which has come and gone since I began writing this blog post, it likely caused some DNS resolution and sporadic YouTube reachability problems for folks that didn’t have the previous version of the NS records cached, or that picked up the new ones and tried to use the sjl-ins2.sjl.youtube.com NS record for youtube.com name resolution. However, that piece of the problem seems to have been corrected.
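Reading the two serials in this post (2008050501 and 2008050502) as "changed today, first and second revision" relies on the common YYYYMMDDnn serial convention. A serial is really just a 32-bit number, so this is an assumption about how the zone is maintained, not something DNS guarantees. A small sketch of that reading, with parse_serial being a hypothetical helper name:

```python
from datetime import date


def parse_serial(serial):
    """Split a YYYYMMDDnn-style SOA serial into (date, revision).

    Assumes the zone operator follows the date-based convention;
    raises ValueError when the serial does not fit that shape.
    """
    text = str(serial)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"{serial!r} is not a 10-digit YYYYMMDDnn serial")
    changed_on = date(int(text[0:4]), int(text[4:6]), int(text[6:8]))
    revision = int(text[8:10])
    return changed_on, revision


for serial in (2008050501, 2008050502):
    changed_on, revision = parse_serial(serial)
    print(serial, "->", changed_on.isoformat(), "revision", revision)
```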

So, where’s that leave us now? Well, because the dns3.sjl.youtube.com NS entry is absent from the delegation in the parent .com zone, dns3 provides little value if the two delegated authoritative servers (with IP addresses in close proximity) are unavailable. Getting it added will provide some additional resiliency for YouTube. I recall many moons ago when Hotmail fell victim to a similar lack of resiliency with regard to DNS name servers.
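The mismatch between what the parent delegates and what the zone itself lists can be expressed as a simple set comparison. The sketch below hard-codes the NS names from the dig output earlier in this post (the gTLD server's answer and the zone's own AUTHORITY section at serial 2008050502); the function names are made up for illustration.

```python
# NS set handed out by the .com gTLD servers (the delegation).
DELEGATION_NS = {
    "dns1.sjl.youtube.com.",
    "dns2.sjl.youtube.com.",
}

# NS set published inside the youtube.com zone itself (serial 2008050502).
ZONE_NS = {
    "dns1.sjl.youtube.com.",
    "dns2.sjl.youtube.com.",
    "dns3.sjl.youtube.com.",
}


def normalize(name):
    """Lower-case a host name and make sure it ends with exactly one dot."""
    return name.strip().lower().rstrip(".") + "."


def compare_ns_sets(parent, child):
    """Report name servers that appear on only one side of the delegation."""
    parent = {normalize(n) for n in parent}
    child = {normalize(n) for n in child}
    return {
        "only_in_zone": sorted(child - parent),
        "only_in_parent": sorted(parent - child),
    }


result = compare_ns_sets(DELEGATION_NS, ZONE_NS)
print("listed in the zone but not delegated:", result["only_in_zone"])
print("delegated but not listed in the zone:", result["only_in_parent"])
```

Anything that shows up under only_in_zone is a server a resolver can learn about only after it has already reached one of the delegated servers.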

Some of the recommendations provided in RFC 2182 relate not just to improved name resolution resiliency (redundant, topologically diverse servers, etc.), but to better load distribution and query response times as well. There are also simple things: IF you add an A record and NS record to a zone, make sure the host listed is serving the zone, and is reachable both internally AND externally. Again, many things that on the surface may seem intuitive but are oft-overlooked are covered in RFC 2182.
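The "reachable AND serving the zone" check can be automated before an NS record is published. What follows is a rough sketch, not a replacement for dig: it sends one SOA query over UDP using only the Python standard library and treats the server as healthy if it answers authoritatively. The function names, the fixed query ID, and the 3-second timeout are all assumptions, and the check only means something when run from outside the network being tested.

```python
import socket
import struct

SOA_TYPE = 6   # QTYPE value for SOA
IN_CLASS = 1   # QCLASS value for IN


def build_soa_query(zone, query_id):
    """Build a minimal DNS query packet asking for the zone's SOA record."""
    # Header: ID, flags (all zero: standard query, no recursion), 1 question.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", SOA_TYPE, IN_CLASS)


def is_authoritative_answer(response, query_id):
    """True if the header shows a matching, authoritative NOERROR answer."""
    if len(response) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)     # QR bit
    authoritative = bool(flags & 0x0400)   # AA bit
    rcode = flags & 0x000F                 # 0 means NOERROR
    return (rid == query_id and is_response and authoritative
            and rcode == 0 and ancount >= 1)


def serves_zone(server_ip, zone, timeout=3.0, query_id=0x1234):
    """Send one SOA query over UDP; False on timeout or any socket error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_soa_query(zone, query_id), (server_ip, 53))
            response, _addr = sock.recvfrom(512)
        except OSError:
            return False
    return is_authoritative_answer(response, query_id)
```

Calling serves_zone("208.65.153.140", "youtube.com") corresponds to the timed-out dig query shown earlier. A production check would also retry, use a random query ID, and fall back to TCP.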

Additionally, there are many companies that focus on DNS services (e.g., Neustar’s UltraDNS Services), some ISPs offer authoritative services as part of their connectivity portfolio (e.g., Verizon Business), and many of the registrars and other folks also provide authoritative DNS hosting services. I would, however, remind readers that some caution should be exercised when selecting DNS service providers, as outlined here.

The reason I believe this is important is that it’s something that’s transparent to most folks, and while it’s not nearly as shiny or sexy as, say, Pakistan hijacking YouTube’s address space, the effect is the same.
