
RON: Resilient Overlay Networks

David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris. MIT Laboratory for Computer Science. http://nms.lcs.mit.edu/ron/


Presentation Transcript


  1. RON: Resilient Overlay Networks
     David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris
     MIT Laboratory for Computer Science
     http://nms.lcs.mit.edu/ron/

  2. Fault-tolerant networking
     [Figure: a network connecting nodes A, B, C, and D]
     • Packet switching, and routing around failures

  3. Internet: network of networks
     [Figure: Sites 1 through 5 connected through ISP1, ISP2, and ISP3]
     • ISPs peer to forward packets
     • ISPs exchange route information using BGP

  4. The Internet is ill suited to mission-critical applications
     • Commercial peer architecture
       • Performance bottlenecks at peering points
       • Ignores many existing alternate paths
       • Directly conflicts with robustness
     • Internet’s global scale:
       • Prevents sophisticated algorithms
       • Route selection uses fixed, simple metrics
       • Routing isn’t sensitive to path quality

  5. How robust is Internet routing?

  6. Our goal
     To improve communication availability for small groups by at least a factor of 10
     • Many applications
       • Collaboration and conferencing
       • Virtual Private Networks (VPNs) across the public Internet
       • Overlay Internet service

  7. Overlay routes around Internet failures
     [Figure labels: MIT, Utah, Utah Company, Cable Modem]
     • Failures:
       • Outages: configuration/operational errors, backhoes, etc.
       • Performance failures: severe congestion, denial-of-service attacks, etc.

  8. Scalability versus recovery
     • Internet scalability comes at a price:
       • Slow recovery
     • RON recovers fast by
       • Limiting the size of the overlay
       • Exploiting redundancy in the underlying Internet

  9. Redundant links
     • Multiple paths between all sites
     [Figure labels: MIT, Utah, Internet 2, Utah Company, Cable Modem]

  10. Redundant links
      • But many of them are hidden
      [Figure labels: MIT, Utah, Utah Company, Cable Modem]

  11. Resilient overlay networks
      • Measure all links between nodes
      • Compute path properties
      • Determine best route
      • Forward traffic over that path

  12. RON design
      [Figure: architecture diagram. Labels: nodes in different routing domains (ASes); RON library; Conduit; Forwarder; Router; Prober; performance database; application-specific routing tables; policy routing module]

  13. Routing and path selection
      • Path selection at the entry node
      • Specialized for routing through one intermediate node
      • Router computes the forwarding tables
        • Link-state dissemination through RON
      • Path evaluation and selection
        • Latency minimizer: EWMA of round-trip samples
        • Loss-rate minimizer: average of the last k samples
        • Throughput optimizer: TCP throughput equation
          • Select when estimated throughput improves by 2x
        • 5% hysteresis to avoid flapping
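The latency minimizer and one-intermediate-node selection on slide 13 can be sketched roughly as follows. This is an illustrative sketch, not RON's code: the names `LinkStats` and `best_one_hop_path`, the EWMA weight `alpha`, the window size `k`, and the way the 5% hysteresis is applied (stay on the current path unless a candidate is at least 5% faster) are all assumptions made for the example.

```python
from collections import deque


class LinkStats:
    """Per-link estimators (hypothetical helper, not part of the RON library)."""

    def __init__(self, alpha=0.9, k=100):
        self.alpha = alpha               # assumed EWMA weight on the old estimate
        self.latency = None              # smoothed round-trip time in ms
        self.outcomes = deque(maxlen=k)  # True = probe lost, last k probes only

    def record(self, rtt_ms=None):
        """Record one probe; rtt_ms is None when the probe was lost."""
        self.outcomes.append(rtt_ms is None)
        if rtt_ms is not None:
            if self.latency is None:
                self.latency = rtt_ms
            else:
                self.latency = self.alpha * self.latency + (1 - self.alpha) * rtt_ms

    def loss_rate(self):
        """Average loss over the last k probes."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


def best_one_hop_path(latency, src, dst, current=None, hysteresis=0.05):
    """Pick the lowest-latency path using at most one intermediate node.

    latency maps (a, b) -> estimated latency in ms for the link a -> b.
    Returns a tuple of nodes, or None when no path is known.
    """
    nodes = {a for a, _ in latency} | {b for _, b in latency}
    candidates = {}
    if (src, dst) in latency:
        candidates[(src, dst)] = latency[(src, dst)]
    for mid in nodes - {src, dst}:
        if (src, mid) in latency and (mid, dst) in latency:
            candidates[(src, mid, dst)] = latency[(src, mid)] + latency[(mid, dst)]
    if not candidates:
        return None
    best = min(candidates, key=candidates.get)
    # Hysteresis: keep the current path unless the best one is >= 5% faster.
    if current in candidates and candidates[best] > (1 - hysteresis) * candidates[current]:
        return current
    return best


links = {("A", "B"): 100.0, ("A", "C"): 30.0, ("C", "B"): 40.0}
print(best_one_hop_path(links, "A", "B"))
```

Restricting the search to one intermediate node keeps the candidate set linear in the number of overlay nodes, which fits the slide's point that path selection is specialized for this case.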

  14. Policy routing
      • Router computes a forwarding table for each policy
      • Two ways of describing policies:
        • Exclusive cliques (e.g., educational only)
        • General policies
          • BPF-like packet matcher, which returns a policy
          • Links that are denied by a policy
      • Entry node classifies packet with a policy tag
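To make the policy-routing idea on slide 14 concrete, here is a minimal sketch of a packet classifier that returns a policy tag, plus a per-policy set of usable links. Everything named here is hypothetical: the packet is modeled as a plain dict with a made-up `src_domain` field, the matcher is an ordinary Python predicate rather than a real BPF program, and the link names are invented for the example.

```python
def classify(packet, rules, default_tag="default"):
    """Return the policy tag of the first rule whose matcher accepts the packet."""
    for matches, tag in rules:
        if matches(packet):
            return tag
    return default_tag


def build_policy_tables(links, denied_by_policy):
    """For each policy tag, keep only the links that policy does not deny."""
    return {tag: links - denied for tag, denied in denied_by_policy.items()}


# Hypothetical overlay: the mit-utah link stands in for a research-only link.
links = {("mit", "utah"), ("mit", "cable"), ("cable", "utah")}
denied_by_policy = {
    "commercial": {("mit", "utah")},  # commercial traffic may not use it
    "default": set(),
}
rules = [(lambda pkt: pkt["src_domain"].endswith(".com"), "commercial")]

tables = build_policy_tables(links, denied_by_policy)
tag = classify({"src_domain": "example.com"}, rules)
print(tag, sorted(tables[tag]))
```

In this sketch the entry node would call `classify` once per packet and then route using only the links in `tables[tag]`, which mirrors the slide's split between tagging at the entry node and one forwarding table per policy.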

  15. Responding to failure
      • Probe interval: 12 seconds
      • Probe timeout: 3 seconds
      • Routing update interval: 14 seconds

  16. RON overhead
      • Probe overhead: 69 bytes
      • RON routing overhead: 60 + 20 (N-1)
      • 50: allows recovery times between 12 and 25 s
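The routing-overhead formula on slide 16 is easy to evaluate for different overlay sizes. The sketch below assumes the formula is in bytes per routing packet (like the probe figure), and additionally assumes that each node sends one routing packet to each of its N-1 peers every 14-second routing update interval; the slides do not state either point, and probe traffic is left out, so the bandwidth figure is only a rough estimate. The function names are made up for the example.

```python
PROBE_BYTES = 69                 # from the slide
ROUTING_UPDATE_INTERVAL_S = 14   # from the "Responding to failure" slide


def routing_packet_bytes(n_nodes):
    """Slide formula 60 + 20 (N-1), assumed to be in bytes."""
    return 60 + 20 * (n_nodes - 1)


def routing_bits_per_second(n_nodes, interval_s=ROUTING_UPDATE_INTERVAL_S):
    """Outgoing routing traffic per node under the stated assumptions."""
    packets_per_interval = n_nodes - 1  # assumption: one packet per peer
    return packets_per_interval * routing_packet_bytes(n_nodes) * 8 / interval_s


for n in (10, 30, 50):
    print(n, routing_packet_bytes(n), round(routing_bits_per_second(n)))
```

Because both the packet size and the number of peers grow with N, the per-node routing traffic grows roughly quadratically, which is one reason the design limits the size of each overlay.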

  17. Many research questions
      • Does the RON approach work at all?
        • Each RON is small in size, no more than 50 or 100 nodes
        • How fast can failure detection & recovery happen?
      • Policy routing
        • Doesn’t RON violate AUPs and other policies?
      • Routing behavior
        • Can stable routing be achieved?
        • Implementing efficient multi-criteria routing
        • Is it safe to deploy a large number of (small) interacting RONs on the Internet?

  18. IP forwarder
      • A RON application
      • Transparently forwards IP traffic over RON
      • Allows comparisons of IP traffic over RON versus over the direct Internet

  19. RON deployment (19 sites)
      [Figure: map of the sites, with arrows to vu.nl, lulea.se, ucl.uk, kaist.kr, and .ve]
      Sites: .com (ca), .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit, vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela

  20. AS view

  21. Experiments
      • Measure loss, latency, and throughput with and without RON
      • RON1: 12 hosts in the US and Europe
        • 64 hours of measurements in March 2001
      • RON2: 16 hosts
        • 85 hours of measurements in May 2001
      • 30-minute average loss rates
        • A 30-minute outage is very serious!
      • Note: Experiments done with “No-Internet2-for-commercial-use” policy

  22. Take-home messages
      • RON reduced outages by a factor of 5 to 10, and routed around all major outages
      • RON takes 18 s (average) to route around a failure, and can do so in the face of flooding attacks
      • Single route indirection delivers the majority of RON's benefits

  23. RON improves loss rate
      [Figure: plot of 13,000 samples. Axes: 30-min average loss rate on Internet; 30-min average loss rate with RON]
      • RON loss rate never more than 30%

  24. An order-of-magnitude fewer failures
      • 30-minute average loss rates
      • 6,825 “path hours” represented here
      • 12 “path hours” of essentially complete outage
      • 72 “path hours” of TCP outage
      • RON routed around all of these!
      • One indirection hop provides almost all the benefit!

  25. Why does one hop work?
      [Figure labels: RON source, target, R RON nodes, Good (p), Bad (1-p)]
      P(good path) = (1 - (1-p)^2)^(R+1)
      In RON testbed:
      • P(direct path is good) is 48.8%
      • P(intermediate path is good) is 51%

  26. Resilience Against DoS Attacks

  27. Latency using RON

  28. What’s next for RON?
      • Data mining of collected samples
      • Applications
      • Routing policies (e.g., rate control)

  29. Other progress: Chord
      • Chord: a peer-to-peer lookup system
      • CFS: a peer-to-peer file sharing application
      www.pdos.lcs.mit.edu/chord

  30. Conclusion
      • Improved availability of Internet communication paths using small overlays
        • Layered above scalable IP substrate
      • RON provides a set of libraries and programs to facilitate this application-specific routing
      • Experimental data suggest that the approach works
        • Over 10X availability
        • Outage detection and recovery in about 15 seconds
        • Able to route around certain denial-of-service attacks
      • Many interesting questions remain…
      http://nms.lcs.mit.edu/ron/
