1 / 30

IRTF-RR

IRTF-RR. ahuja@umich.edu. IRTF agenda. Agenda issues (5 sec) Intro - why are we here (10 sec) - abha Goals of the group, etc (30 min)- sean Topics of Interest Convergence (10 minutes) - abha Nimrod (20 minutes) - noel Questions and Answers/Feedback. IRTF RR intro. Who are we?

mika
Download Presentation

IRTF-RR

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. IRTF-RR ahuja@umich.edu

  2. IRTF agenda • Agenda issues (5 sec) • Intro - why are we here (10 sec) - abha • Goals of the group, etc (30 min)- sean • Topics of Interest • Convergence (10 minutes) - abha • Nimrod (20 minutes) - noel • Questions and Answers/Feedback

  3. IRTF RR intro • Who are we? • ahuja@umich.edu • smd@ebone.net • irtf-rr-chairs@nether.net • Where is the info? • http://www.nether.net/irtf-rr • irtf-rr-request@nether.net

  4. IRTF RR • Why are we here? • Resurrect this working group • Open session to tell folks what we are working on • Get feedback from the public for additional topics to add to our list

  5. IRTF-RR goals • do routing research :) • most of work done in mailing list and small groups

  6. Approaching the issues... • What is going on now? • Routing issues today • What are the problems? • What can we do fix it? • What should we do in the future?

  7. Routing Research • Topics of interest • routing convergence, stability and scalability • fault tolerance • Quality of Service routing • multicast routing • Extremely dynamic contraint-based routing • Traffic engineering • NAT and IPv6 routing • optical networks and routing • operational concerns of routing

  8. Q&A • What issues do you think are important to address? • QoS? • Convergence? • Scalability?

  9. Experimental Measurement of Delayed Convergence Craig Labovitz Microsoft Research/Merit Network, Inc. Abha Ahuja Merit Network, Inc. Farnam Jahanian, Abhijit Bose University of Michigan Slides originally presented at NANOG. IRTF-RR at Pittsburgh IETF email: ahuja@umich.edu

  10. Mostly seems to work Time Mostly seems to work The Internet: Failure Analysis Something happens. Doesn’t work.

  11. Routing Protocol Convergence • Unlike connection oriented PSTN (~30 ms), Internet does not have fail-over. • Instead, each node recalculates on a hop-per-hop basis (i.e. no flooding of changes) • Distance-vector algorithms (e.g. RIP, BGP) exhibit slower convergence than link state protocols • During convergence • Latency, loss, out of order • Additional update messages (CPU processing)

  12. Distance Vector (BF) Protocols • Suffer from counting to infinity problem • Solutions • Poison reverse • Split horizon • Path vectors B Example A C

  13. Conventional Wisdom • “Restoral is not an issue in the IP world” • Just reroute around in a few milliseconds or whatever • BGP convergence takes only a few _____ • “Bad news travels fast” • Fast withdraw propagation valid goal • Announcements slower because bundled • BGP has great convergence properties • ASPath solved the convergence and counting to infinity problems • All my customers are multi-homed, triple-homed • Convergence -- what, me worry?

  14. More Conventional Wisdom • Enough bandwidth will solve anything “It will all be one big network one day soon anyways” (Especially after yesterday)

  15. Internet Failures • Replication, round-robin DNS, etc. helps reliability of inter-domain content oriented services • Inter-domain transaction oriented services (e.g. VoIP, EBay, database commits, etc.) still pose a challenge • Important model how long does it take for the Internet to converge • After Failure • After Fail-Over • After Repair

  16. BGP: Bad news • With unconstrained policies (Griffin99, Varadhan96) • Divergence • Possible create mutually unsatisfiable policies • NP-complete to identify these policies in IRR • Happening today? • With constrained policies (e.g. shortest path first) • Transient oscillations • BGP usually converges • It might just take a very long time…. • This talk is about constrained policies

  17. Some Observations • How do we study convergence? • From BGP logs (e.g. debug ip bgp), difficult to determine causal relationships • Earlier work studied BGP pathologies and failures • Still lots of BGP duplicates and oscillations • Failure/repair data (next slide) for default-free routes shows 30 minute curve • Examined long-lived default-free routes from 24 providers for a year • Restoral time for given provider after failure (i.e. route withdrawn)

  18. How long until routes return? (From A Study of Internet Failures) What is happening here?

  19. 16 Month Study of Convergence • Instrument the Internet • Inject routes into geographically and topologically diverse provider BGP peering sessions (Mae-West, Japan, Michigan, London) • Periodically fail and change these routes (i.e. send withdraws or new attributes) • Time events using ICMP echos and NTP synchronized BGP “routeviews” monitoring machines (also http gets) • Write lots of Perl scripts • Wait a sixteen months… (45,000 routing events)

  20. Setup

  21. How Many Announcements Does it Take For an AS to Withdraw a Route? 7/5 19:33:25 Route R is withdrawn 7/5 19:34:15 AS6543 announceR 6543 66665 8918 1 5696 999 7/5 19:35:00 AS6543 announceR 6543 66665 8918 67455 6461 5696 999 7/5 19:35:37 AS6543 announceR 6543 66665 4332 6461 5696 999 7/5 19:35:39 AS6543 announceR 6543 66665 5378 6660 67455 6461 5696 999 7/5 19:35:39 AS6543 announceR 6543 66665 65 6461 5696 999 7/5 19:35:52 AS6543 announceR 6543 66665 6461 5696 999 7/5 19:36:00 AS6543 announceR 6543 66665 5378 6765 6660 67455 6461 5696 999 … 7/5 19:38:22 AS6543 withdrawR Answer: Up to 19 (AS6543 chosen as an example – all AS’es exhibit similar behavior) Abha made me change the AS numbers

  22. Withdraw Convergence After a BGP route is withdrawn, barring other failures, how long does it take Internet routing tables to reach steady-state?

  23. Withdraw Convergence AS1 AS2 AS3 AS4

  24. Withdraw Convergence • Probability distribution • Providers exhibit different, but related convergence behaviors • 80% of withdraws from all ISPs take more than a minute • For ISP4, 20% withdraws took more than three minutes to converge

  25. Fail-Overs and Repairs What are the relative convergence latencies for fail-overs and repairs? Does bad news (withdraws) travel faster?

  26. Failures, Fail-overs and Repairs

  27. Failures, Fail-overs and Repairs • Bad news does not travel fast… • Repairs (Tup) exhibit similar convergence properties as long-short ASPath fail-over • Failures (Tdown) and short-long fail-overs (e.g. primary to secondary path) also similar • Slower than Tup (e.g. a repair) • 60% take longer than two minutes • Fail-over times degrade the greater the degree of multi-homing!

  28. What is Happening? • Non-deterministic ordering of BGP update messages leads to • Transient oscillations • Each change in FIB adds delay (CPU, BGP bundling timer) • At extreme, convergence triggers BGP dampening

  29. ABCD BACD CBAD DBCA ABDC BADC CBDA DBAC ACBD BCAD CABD DCBA ADBC BDAC CDBA DABC ACDB BCDA CADB DCAB ADCB BDCA CDAB DACD A B C D BGP and RIP • RIP precisely monotonically increasing. Can explore metrics (1…N) • BGP monotonically increasing. Multiple (N!) ways to represent a path metric of N. • BGP “solved” RIP routing table loop problem by making it exponentially worse… N=4

  30. Questions? send email to ahuja@umich.edu

More Related