Protecting the BGP Routes to Top Level DNS Servers
UC Davis: Felix Wu
USC/ISI: Xiaoliang Zhao, Dan Massey, Allison Mankin
UCLA: Lan Wang, Dan Pei, Lixia Zhang
AT&T: Randy Bush
NANOG-25, June 11, 2002
Routing To Top Level Servers
• Critical points of failure near root.
  • 13 root and 13 gTLD servers
  • 25 BGP routes (2 share a prefix)
• Fault/attack near root could have disproportionately large impact.
  • 13 bogus routes to deny service.
  • 1 bogus route to provide bogus DNS.
• Scale helps contain risk in lower tree.
  • Many millions of DNS servers.
[Figure: DNS tree. Node labels: Root; net, org, com; cairn, nanog, ietf, bogus, isi, ops]
NANOG 25 - Toronto
Example DNS Routing Problem
• Invalid BGP routes exist in everyone’s table.
• These can include routes to root/gTLD servers.
• One example observed on 4/16/01.
[Figure: diagram labels: "originates route to 192.26.92/24" (originator not named in the text); "ISPs announce new path"; "3 lasted 20 minutes"; "1 lasted 3 hours"; Internet; c.gtld-servers.net 192.26.92.30; rrc00 monitor]
A Simple Filter
• Current BGP provides dynamic routes.
• Explore the opposite extreme...
  • Select a single static route to each server.
  • Apply AS path filters to block all other announcements.
  • Also filter against more specifics.
• Routes change on a time scale of months, if at all.
  • Change in IP address, origin AS, or transit policy.
  • Adjust route only after off-line verification.
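
The simple filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the configuration used in the study: the `TRUSTED` table and the `accept` function are hypothetical names, the AS numbers are documentation-range placeholders, and only the prefix 192.26.92/24 comes from the slides.

```python
from ipaddress import ip_network

# Hypothetical trusted table: protected prefix -> the single AS path accepted.
# The prefix is the c.gtld-servers.net example from the slides; the AS numbers
# are documentation-range placeholders, not real data.
TRUSTED = {
    ip_network("192.26.92.0/24"): (64500, 64501, 64502),
}

def accept(prefix, as_path):
    """Return True if an announcement passes the simple static filter."""
    net = ip_network(prefix)
    for trusted_net, trusted_path in TRUSTED.items():
        if net == trusted_net:
            # Protected prefix: only the one verified AS path is allowed.
            return tuple(as_path) == trusted_path
        if net.version == trusted_net.version and net.subnet_of(trusted_net):
            # More specific of a protected prefix: always blocked.
            return False
    # Not a protected prefix: this filter has no opinion.
    return True

print(accept("192.26.92.0/24", (64500, 64501, 64502)))  # trusted path
print(accept("192.26.92.0/24", (64500, 64999)))         # different path
print(accept("192.26.92.0/25", (64500, 64501, 64502)))  # more specific
```

Changing an entry in the table corresponds to the off-line verification step: nothing learned from BGP alone can alter which path is trusted.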
Why This Works: Theory
• Scale is limited to a small number of routes.
  • No exponential growth in top level DNS servers.
• Loss of a server is tolerable; an invalid server is not.
  • Resolvers detect and time out unreachable servers.
  • Provided surviving servers handle load, cost is some delay.
• Expect predictable properties and stable routes.
  • Servers don’t change without non-trivial effort.
  • Servers located in highly available locations.
Why This Works: Data
• Analysis based on BGP updates from RIPE.
  • Archive of BGP updates sent by each peer.
  • 9 ISPs from US, Europe, and Japan.
  • February 2001 - April 2002
• Some data collection notes
  • Used only peers that exchange full routing tables.
    • Otherwise some route changes are hidden by policies.
  • Adjusted data to discount multi-hop effect.
    • Multi-hop peering session resets don’t reflect ISP ops.
Simple Filter - Impact on Reachability
[Figure: ISP1 (US/Tier 1)]
How Static Are The Routes?
• 3 changes in route to “A” over 14 months.
  • 2 (valid) changes in the origin AS
    • 5/19/01 origin AS changed from 6245 to 11840
    • 6/4/01 origin AS changed from 11840 to 19836
  • 1 change in transit AS routing policy
    • 11/8/01 (*, 10913, 10913, 10913, *) -> (*, 10913, *)
    • Could have built filter to allow this...
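
The 11/8/01 change only removed repeated copies of AS 10913 from the path. One way a filter could tolerate that kind of change, sketched here as an assumption rather than as the authors' method, is to collapse consecutive repeats of an AS number before comparing paths. The function names are hypothetical, and 64500 is a documentation-range placeholder standing in for the "*" portion of the path.

```python
from itertools import groupby

def collapse_prepending(as_path):
    """Drop consecutive repeats of the same AS number (AS path prepending)."""
    return tuple(asn for asn, _ in groupby(as_path))

def same_route(trusted_path, announced_path):
    """Compare two AS paths while ignoring differences in prepending."""
    return collapse_prepending(trusted_path) == collapse_prepending(announced_path)

# 64500 is a placeholder for the unspecified "*" part of the path.
before = (64500, 10913, 10913, 10913, 19836)
after = (64500, 10913, 19836)

print(collapse_prepending(before))
print(same_route(before, after))
```

A change of origin AS or the appearance of a new transit AS would still fail the comparison, so this relaxation does not admit the kinds of changes the filter is meant to hold for off-line verification.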
What Routes Are Lost?
• Results from 3/1/01 until 5/19/01 AS change.
  • Reduced reachability to “A” from 99.997% to 99.904%
• 18 events when trusted route was withdrawn
  • 2 resulted in no route available (28 secs, 103 secs)
  • 8 instances of a back-up route lasting over 3 minutes
  • Longest lasting back-up advertised for 15 minutes
• Similar results for other time periods and servers.
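
To give the two reachability percentages a rough sense of scale, they can be converted into time without a route. This is a back-of-the-envelope sketch that assumes the measurement window runs exactly from 3/1/01 to 5/19/01 and that reachability is a fraction of wall-clock time; the slides do not state how the percentages were computed, so the results are indicative only.

```python
from datetime import date

# Assumed measurement window: 3/1/01 up to the 5/19/01 origin AS change.
window_s = (date(2001, 5, 19) - date(2001, 3, 1)).total_seconds()

def unreachable_seconds(reachability_pct):
    """Seconds without a usable route, given reachability as a percentage."""
    return window_s * (100.0 - reachability_pct) / 100.0

for label, pct in (("unfiltered", 99.997), ("simple filter", 99.904)):
    secs = unreachable_seconds(pct)
    print(f"{label}: about {secs:.0f} s ({secs / 60:.1f} min) without a route")
```

Under these assumptions the simple filter turns a few minutes without a route over the window into somewhat under two hours, which is the robustness cost being traded for protection against invalid routes.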
Example of Filtered Routes
[Figure: route diagram with labels 1239, 10913, *, server, 19836, 701]
• No route at 16:08:30
• With filter no route at 16:06:32
Simple Filter - Worst Case In Study
[Figure: ISP 3 (Europe)]
ISP 3 used one main route and a small number of consistent back-up routes.
Toward a More Balanced Approach
• Required infrequent updates to the filter.
  • Especially useful to automate infrequent tasks.
  • Natural tendency to forget task or forget how to do task
• More paths improve robustness.
  • Simple filter allowed only 1 path.
  • ISP3’s reachability can be improved if filter allows two routes…
• Strike a balance between allowing dynamic changes and restricting to trusted paths.
Our Adaptive Filter
• Slow down the route dynamics and add validation.
• Apply hysteresis before accepting new paths.
• Add options for validating new paths:
  • Believe route based purely on hysteresis
  • Probabilistic query/response testing against known data.
  • Trigger off-line checking (did origin AS really change?)
• Algorithm details in upcoming paper http://fniisc.nge.isi.edu
Impacts on Reachability (Adaptive Filter)
[Figure: ISP1; labels: gTLD servers, Root servers]
Impacts on Reachability (Adaptive Filter)
[Figure: ISP3; labels: gTLD servers, Root servers]
Conclusions
• Routing faults can affect top level DNS servers.
  • Faults were observed in the current infrastructure.
  • Potential large scale denial of service.
• Solution is to make these routes less dynamic.
  • Relies on unique properties of top level servers.
  • Lose some robustness to failure
  • Gain protection against invalid routes.
Discussion
• Merit of the problem
  • Lots of concern over “securing” BGP and DNS
  • Routes to DNS servers are an interesting special case
• Do less dynamic routes make sense?
  • Only applies to this unique scenario
  • Our data shows the trade-off is effective
  • Very interested in access to data for counterexamples…
You had to ask… algorithm detail
• Path usage: Uk(p) = Tk(p)/T, where T = time period and Tk(p) = time path p was advertised during the period
• Adjust filter at end of time period T
• Smooth with an exponentially weighted moving average: U(p) = (1-a)*U(p) + a*Uk(p)
• Allowable routes have Uk(p) > Umin or U(p) > Umin
• Validate all new routes and check old routes with Pv
• Allow interim addition during T if Tk(p) > Tr
• Parameters used in this presentation: T = 1 week, Tr = 1 hour, Umin = 10%, a = 0.25, Pv = 0.1
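
The usage-based rules on this slide can be sketched as follows. This is a minimal illustration using the slide's parameters, not the authors' implementation: the function names and the dictionary-based state are hypothetical, a path never seen before is assumed to start with U(p) = 0, and the validation step (new-route validation and the Pv check of old routes) is left out because the slides do not describe it in enough detail.

```python
# Parameters from the slide: T = 1 week, Tr = 1 hour, Umin = 10%, a = 0.25.
T = 7 * 24 * 3600   # period length in seconds
TR = 3600           # Tr in seconds
U_MIN = 0.10
A = 0.25

def end_of_period(smoothed, advertised):
    """Update U(p) for every known path and return (new_smoothed, allowed).

    smoothed:   dict path -> U(p) carried over from earlier periods
    advertised: dict path -> Tk(p), seconds the path was advertised this period
    Assumption: a path never seen before starts with U(p) = 0.
    """
    new_smoothed, allowed = {}, set()
    for path in set(smoothed) | set(advertised):
        u_k = advertised.get(path, 0.0) / T               # Uk(p) = Tk(p)/T
        u = (1 - A) * smoothed.get(path, 0.0) + A * u_k   # smoothed U(p)
        new_smoothed[path] = u
        if u_k > U_MIN or u > U_MIN:
            allowed.add(path)
    return new_smoothed, allowed

def interim_addition_ok(advertised_so_far):
    """Mid-period rule: admit a path once Tk(p) exceeds Tr."""
    return advertised_so_far > TR

# Hypothetical week: a main path, a back-up used 20% of the time, and a blip.
state, allowed = end_of_period(
    {"main": 0.9},
    {"main": 0.95 * T, "backup": 0.2 * T, "flap": 0.01 * T},
)
print(sorted(allowed))
```

In this example the back-up path is admitted because its usage in the current period exceeds Umin, while the main path would stay admitted through a later week of silence because its smoothed usage decays only gradually. That slow decay is where the hysteresis comes from.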