Impact of Configuration Errors on DNS Robustness
V. Pappas*, Z. Xu*, S. Lu*, D. Massey**, A. Terzis***, L. Zhang*
* UCLA, ** Colorado State, *** Johns Hopkins
Motivation
• DNS: part of the Internet core infrastructure
  • Applications: web, e-mail, E.164, CDNs …
• DNS: considered a very reliable system
  • Works almost always
• Question: is DNS a robust system?
  • User-perceived robustness
  • System robustness
  • Are they the same?
Motivation
Short answer:
“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration” (Wired News, Jan 2001)
• Thousands or even millions of users affected
• All due to a single DNS configuration error
Related Work
• Traffic & implementation error studies:
  • Danzig et al. [SIGCOMM92]: bugs
  • CAIDA: traffic & bugs
• Performance studies:
  • Jung et al. [IMW01]: caching
  • Cohen et al. [SAINT01]: proactive caching
  • Liston et al. [IMW02]: diversity
• Server availability:
  • To appear [OSDI04, IMC04]
Our Work: Study DNS Robustness
• Classify DNS operational errors:
  • Study known errors
  • Identify new types of errors
• Measure their pervasiveness
• Quantify their impact on DNS:
  • Availability
  • Performance
Outline
• DNS Overview
• Measurement Methodology
• DNS Configuration Errors
  • Example Cases
  • Measurement Results
• Discussion & Summary
Background
[Figure: DNS name tree; node labels: com, net, jp, uk, ca, foo, buz, bar, bar1, bar2, bar3]
• Zone:
  • Occupies a contiguous subspace of the name tree
  • Served by the same name servers
• Resource records of the bar zone (the NS records list its name servers):
  bar.foo.com. NS ns1.bar.foo.com.
  bar.foo.com. NS ns2.bar.foo.com.
  bar.foo.com. NS ns3.bar.foo.com.
  bar.foo.com. MX mail.bar.foo.com.
  www.bar.foo.com. A 10.10.10.10
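[Figure: a client asks its caching server for www.bar.foo.com; the caching server follows referrals down the tree]
• client → caching server: asking for www.bar.foo.com
• root zone → caching server: referral (com NS RRs, com A RRs)
• com zone → caching server: referral (foo NS RRs, foo A RRs)
• foo zone → caching server: referral (bar NS RRs, bar A RRs)
• bar zone → caching server: answer (www.bar.foo.com A 10.10.10.10)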
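The referral chain on this slide can be sketched as a small simulation. This is only an illustration: the `ZONES` table and the `resolve` function are hypothetical, the data is an in-memory copy of the slide's example, and no real DNS queries are sent.

```python
# Toy model of iterative resolution (hypothetical data mirroring the slide).
# Each zone either holds the answer or delegates part of its name space
# to a child zone, which the caching server contacts next.
ZONES = {
    ".": {"delegations": ["com."], "records": {}},
    "com.": {"delegations": ["foo.com."], "records": {}},
    "foo.com.": {"delegations": ["bar.foo.com."], "records": {}},
    "bar.foo.com.": {
        "delegations": [],
        "records": {"www.bar.foo.com.": "10.10.10.10"},
    },
}


def resolve(name, zones=ZONES):
    """Follow referrals from the root; return (address or None, zones visited)."""
    zone, path = ".", []
    while True:
        path.append(zone)
        data = zones[zone]
        if name in data["records"]:
            return data["records"][name], path  # final answer
        # Referral: pick the delegated child zone that is a suffix of the name.
        child = next(
            (d for d in data["delegations"] if name == d or name.endswith("." + d)),
            None,
        )
        if child is None:
            return None, path  # no answer and no referral
        zone = child


address, visited = resolve("www.bar.foo.com.")
print(address, visited)
```

Each entry in `visited` corresponds to one referral step, which is why an error at any level of the chain can affect every name below it.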
Infrastructure RRs
• NS resource record:
  • Provides the names of a zone’s authoritative servers
  • Stored both at the parent and at the child zone
• A resource record:
  • Associated with an NS resource record
  • Stored at the parent zone (glue A record)
[Figure: the same record set shown at the parent zone (com) and at the child zone (foo.com)]
  foo.com. NS ns1.foo.com.
  foo.com. NS ns2.foo.com.
  foo.com. NS ns3.foo.com.
  ns1.foo.com. A 1.1.1.1
  ns2.foo.com. A 2.2.2.2
  ns3.foo.com. A 3.3.3.3
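Because the NS set is stored at both the parent and the child, the two copies can drift apart. A minimal sketch of a consistency check follows; the function name `compare_delegation` and its return format are assumptions made for illustration, and the NS names are passed in as plain lists rather than fetched from real servers.

```python
def compare_delegation(parent_ns, child_ns):
    """Compare the NS names stored at the parent with those at the child.

    DNS names are case-insensitive, so names are lower-cased and the
    trailing dot is dropped before comparing.
    """
    def normalize(names):
        return {n.lower().rstrip(".") for n in names}

    parent, child = normalize(parent_ns), normalize(child_ns)
    return {
        "consistent": parent == child,
        "only_at_parent": sorted(parent - child),
        "only_at_child": sorted(child - parent),
    }


report = compare_delegation(
    ["ns1.foo.com.", "ns2.foo.com.", "ns3.foo.com."],  # as listed at com
    ["NS1.foo.com.", "ns2.foo.com."],                  # as listed at foo.com
)
print(report)
```

A server that appears only at the parent is one the child zone no longer claims, which makes it a natural candidate for closer inspection.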
What Affects DNS Availability
• Name servers:
  • Software failures
  • Network failures
  • Scheduled maintenance tasks
• Infrastructure resource records:
  • Availability of these records
  • Configuration errors
Focus of our work: configuration errors in infrastructure resource records
Classification of Measured Errors
• Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name servers
  • Lame Delegation
  • Delegation Inconsistency
• Dependency: multiple name servers share a common point of failure
  • Diminished Redundancy
  • Cyclic Dependency
What is Measured?
• Frequency of configuration errors:
  • System parameters: TLDs, DNS level, zone size (i.e., the number of delegations)
• Impact on availability:
  • Number of servers lost due to these errors
  • Zone’s availability: probability of resolving a name
• Impact on performance:
  • Total time to resolve a query
    • Starting when the query is issued
    • Ending when the final answer arrives
Measurement Methodology
• Error frequency and availability impact:
  • 3 sets of active measurements
    • Random set of 50K zones
    • 20K zones that allow zone transfers
    • 500 popular zones
• Performance impact:
  • 2 sets of passive measurements: 1-week DNS packet traces
Lame Delegation
[Figure: the com zone delegates foo to A.foo.com and B.foo.com]
  foo.com. NS A.foo.com.
  foo.com. NS B.foo.com.
  A.foo.com. A 1.1.1.1
  B.foo.com. A 2.2.2.2
Types of lame delegation:
1) Non-existing server: 3-second performance penalty
2) DNS error code: 1 RTT performance penalty
3) Useless referral: 1 RTT performance penalty
4) Non-authoritative answer (cached)
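The four cases above can be sketched as a classifier over the reply a listed name server gives when asked about the zone it is supposed to serve. This is an illustrative sketch, not the measurement tool used in the study: the `response` dictionary and its keys (`rcode`, `aa`, `answer`) are hypothetical stand-ins for a parsed DNS reply, and `None` stands for a timeout.

```python
from enum import Enum


class Lame(Enum):
    NOT_LAME = "authoritative answer"
    NO_SERVER = "non-existing server"
    ERROR_CODE = "DNS error code"
    USELESS_REFERRAL = "useless referral"
    NON_AUTH_ANSWER = "non-authoritative (cached) answer"


# Performance penalties as stated on the slide (none is given for case 4).
PENALTY = {
    Lame.NO_SERVER: "3 seconds",
    Lame.ERROR_CODE: "1 RTT",
    Lame.USELESS_REFERRAL: "1 RTT",
}


def classify(response):
    """Classify the reply of a server listed in the zone's NS set.

    Assumes the query asked for the zone's own records, so any error
    code is treated as a sign of lameness.
    """
    if response is None:                      # no reply at all
        return Lame.NO_SERVER
    if response["rcode"] != "NOERROR":        # e.g. SERVFAIL or REFUSED
        return Lame.ERROR_CODE
    if response["answer"]:
        # An answer without the authoritative flag came from a cache.
        return Lame.NOT_LAME if response["aa"] else Lame.NON_AUTH_ANSWER
    return Lame.USELESS_REFERRAL              # no answer, only a referral


kind = classify({"rcode": "SERVFAIL", "aa": False, "answer": False})
print(kind.value, "->", PENALTY.get(kind, "not stated"))
```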
Lame Delegation Results
[Figure: plot; surviving labels: 50%, 0.06 sec, 0.4 sec, 3 sec]
Lame Delegation Results
• Error frequency:
  • 15% of the zones
  • 8% for the 500 most popular zones
  • Independent of the zone’s size; varies a lot per TLD
• Impact:
  • 70% of the zones with errors lose half or more of their authoritative servers
  • 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation
Diminished Server Redundancy
[Figure: the com zone delegates foo to A.foo.com and B.foo.com]
  foo.com. NS A.foo.com.
  foo.com. NS B.foo.com.
  A.foo.com. A 1.1.1.1
  B.foo.com. A 2.2.2.2
A) Network level: servers belong to the same subnet
B) Autonomous system level: servers belong to the same AS
C) Geographic location level: servers belong to the same city
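The three levels can be sketched as a count of distinct failure domains per zone. In this hypothetical sketch only the /24 grouping is computed from the address itself; the `asn` and `city` fields are assumed to come from external lookups that are not shown, and the values below are made up for illustration.

```python
import ipaddress


def redundancy(servers):
    """Count distinct /24 subnets, ASes, and cities among a zone's servers.

    servers: list of dicts with keys 'ip', 'asn', 'city' (hypothetical
    format; 'asn' and 'city' must be supplied by the caller).
    """
    subnets = {
        ipaddress.ip_network(s["ip"] + "/24", strict=False) for s in servers
    }
    return {
        "subnets": len(subnets),
        "ases": len({s["asn"] for s in servers}),
        "cities": len({s["city"] for s in servers}),
    }


zone_servers = [
    {"ip": "1.1.1.1", "asn": 64500, "city": "Los Angeles"},
    {"ip": "1.1.1.200", "asn": 64500, "city": "Los Angeles"},
]
print(redundancy(zone_servers))
```

A count of 1 at any level means all servers share that point of failure, regardless of how many servers the zone lists.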
Diminished Server Redundancy Results
• Error frequency:
  • 45% of all zones have all servers in the same /24 subnet
  • 75% of all zones have servers in the same AS
  • Large & popular zones: better AS and geographic diversity
• Impact:
  • Less than 99.9% availability: all servers in the same /24 subnet
  • More than 99.99% availability: 3 servers in different ASes or different cities
Cyclic Zone Dependency (1)
[Figure: the com zone delegates foo to A.foo.com and B.foo.com]
  foo.com. NS A.foo.com.
  foo.com. NS B.foo.com.
  A.foo.com. A 1.1.1.1
  B.foo.com. A 2.2.2.2 (missing)
• The glue A RR for B.foo.com is missing
• B.foo.com depends on A.foo.com
• If A.foo.com is unavailable, then B.foo.com is too
Cyclic Zone Dependency (2)
[Figure: the com zone delegates foo to A.foo.com and B.bar.com, and bar to A.bar.com and B.foo.com]
  foo.com. NS A.foo.com.
  foo.com. NS B.bar.com.
  A.foo.com. A 1.1.1.1
  bar.com. NS A.bar.com.
  bar.com. NS B.foo.com.
  A.bar.com. A 2.2.2.2
• The foo.com zone seems correctly configured
• The combination of the foo.com and bar.com zones is wrongly configured
• The B servers depend on the A servers
• If A.foo.com and A.bar.com are unavailable, the B server addresses are unresolvable
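Both cyclic cases can be sketched as a cycle search over a zone dependency graph. The model below is an assumption made for illustration, not the detection procedure used in the study: a zone depends on another zone (or on itself) whenever one of its name servers has no glue A record and its name falls under that zone. The `zones` dictionary format and all function names are hypothetical.

```python
def zone_of(server, zones):
    """Return the longest known zone that is a suffix of the server name."""
    matches = [z for z in zones if server == z or server.endswith("." + z)]
    return max(matches, key=len) if matches else None


def dependency_graph(zones):
    """Add an edge z -> z2 when a glue-less name server of z lives in z2."""
    graph = {z: set() for z in zones}
    for z, cfg in zones.items():
        for ns in cfg["ns"]:
            if ns not in cfg["glue"]:
                home = zone_of(ns, zones)
                if home is not None:
                    graph[z].add(home)
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of zones, or None."""
    state = {}  # zone -> "visiting" or "done"

    def visit(zone, path):
        state[zone] = "visiting"
        for nxt in sorted(graph[zone]):
            if state.get(nxt) == "visiting":
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt, path + [nxt])
                if found:
                    return found
        state[zone] = "done"
        return None

    for zone in sorted(graph):
        if zone not in state:
            found = visit(zone, [zone])
            if found:
                return found
    return None


# Case (2) from the slides: each zone looks fine on its own.
zones = {
    "foo.com.": {"ns": ["A.foo.com.", "B.bar.com."],
                 "glue": {"A.foo.com.": "1.1.1.1"}},
    "bar.com.": {"ns": ["A.bar.com.", "B.foo.com."],
                 "glue": {"A.bar.com.": "2.2.2.2"}},
}
print(find_cycle(dependency_graph(zones)))
```

Case (1) shows up in the same model as a self-loop: foo.com depends on itself because B.foo.com has no glue record.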
Cyclic Zone Dependency Results
• Error frequency:
  • 2% of the zones
  • None of the 500 most popular zones
• Impact:
  • 90% of the zones with cyclic dependency errors lose 25% or more of their servers
  • 2 or 4 zones are involved in most errors
Discussion: User-Perceived != System Robustness
• User-perceived robustness:
  • Data replication: only one server is needed
  • Data caching: temporarily masks infrastructure failures
  • Popular zones: fewer configuration errors
• System robustness:
  • Fewer available servers: due to inconsistency errors
  • Fewer redundant servers: due to dependency errors
Discussion: Why so many errors?
• Superficially, the errors are due to operators:
  • Unaware of these errors
  • Lack of coordination
    • Parent-child zones, secondary server hosting
• Fundamentally, the errors are due to protocol design:
  • Lack of mechanisms to handle these errors
    • Proactively or reactively
  • Design choices that embrace some of them:
    • Name servers are identified by names
    • Glue NS & A records are necessary to set up the DNS tree
Summary
• DNS operational errors are widespread
• DNS operational errors affect availability:
  • 50% of the servers lost
  • Less than 99.9% availability
• DNS operational errors affect performance:
  • 1 or even 2 orders of magnitude
• DNS system robustness is lower than user perception
  • Due to protocol design, not just operator errors
Ongoing Work
• Reactive mechanisms:
  • DNS Troubleshooting [NetTs 04]
• Proactive mechanisms:
  • Enhancing DNS replication & caching