Resilient Overlay Networks CS294-4 Presentation Nikita Borisov Sep 15, 2003
Internet Routing Inefficient • BGP is designed for scalability, sacrificing performance • Link outages common, but routing tables take minutes to update • Summarized data creates inefficient paths • No response to congestion
Network Redundancies • Multiple paths exist between most hosts • Many are not advertised due to private peering • Link outages lead to non-transitive reachability • A and C can’t reach each other but B can reach them both • Indirect paths often offer better performance • (though possibly violate AUPs)
RON goals • Fast failure detection and recovery • Seconds, not minutes • Integration with application • Optimize routes for latency, throughput, etc. • Fine-grained policy specification • E.g. keep commercial traffic off Internet2
Overlay Network • Small network - 3-50 nodes • Continuous measurement of each pairwise link • Connectivity/performance stats distributed globally • Pick best path out of direct and indirect ones • Restrict search to one indirect hop
Failure Detection • Active monitoring • Send probes on each virtual link • One probe every 14s • Fast timeout probes if one is lost • Detect failure in under 20s • Faster than any TCP timeout • Good enough for even human scale
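The probing scheme on this slide can be sketched as a small per-link state machine. This is a minimal illustration, not the RON implementation: the 14 s routine interval comes from the slide, while `FAST_INTERVAL`, `MAX_LOSSES`, and the class and function names are assumptions chosen for the example. The sketch also ignores the time spent waiting for each probe to time out.

```python
from dataclasses import dataclass

PROBE_INTERVAL = 14.0   # seconds between routine probes (from the slide)
FAST_INTERVAL = 1.0     # assumed spacing of follow-up probes after a loss
MAX_LOSSES = 4          # assumed consecutive losses before declaring failure


@dataclass
class LinkMonitor:
    """Tracks probe results for one virtual link."""
    consecutive_losses: int = 0
    down: bool = False

    def on_probe_result(self, answered: bool) -> float:
        """Record one probe outcome; return the delay until the next probe."""
        if answered:
            self.consecutive_losses = 0
            self.down = False
            return PROBE_INTERVAL
        self.consecutive_losses += 1
        if self.consecutive_losses >= MAX_LOSSES:
            self.down = True
        # After any loss, switch to fast probing until a reply arrives.
        return FAST_INTERVAL


def time_to_detect(monitor: LinkMonitor) -> float:
    """Detection time if the link dies right after an answered probe."""
    elapsed = monitor.on_probe_result(True)   # wait one routine interval
    while not monitor.down:
        delay = monitor.on_probe_result(False)
        if not monitor.down:
            elapsed += delay
    return elapsed
```

Under these assumed values, `time_to_detect(LinkMonitor())` stays below the 20 s bound quoted on the slide; different fast-probe settings would shift that figure.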
Performance Metrics • Estimate latency based on RTT of probes • Moving weighted average • Assume latency is symmetric • Estimate loss rate based on probes received • Average of last 100 samples • Estimate TCP throughput • Model TCP performance based on latency and loss rate
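The three estimators on this slide can be sketched together as follows. The slide does not give the averaging weight or the exact TCP model, so `alpha`, `min_loss`, and the class name `LinkStats` are assumptions, and the throughput score uses the well-known simplified Mathis et al. approximation (rate proportional to 1 / (RTT · sqrt(p))) as one plausible choice, not necessarily the one the system uses.

```python
import math
from collections import deque


class LinkStats:
    """Per-link estimates built from probe results (illustrative names)."""

    def __init__(self, alpha=0.9, window=100, min_loss=0.01):
        self.alpha = alpha            # assumed weight on the previous estimate
        self.min_loss = min_loss      # assumed floor so the model stays finite
        self.latency = None           # one-way latency estimate, in seconds
        self.outcomes = deque(maxlen=window)  # True means the probe was answered

    def record_probe(self, rtt):
        """Record one probe; rtt is in seconds, or None if the probe was lost."""
        self.outcomes.append(rtt is not None)
        if rtt is None:
            return
        one_way = rtt / 2.0           # symmetric-latency assumption from the slide
        if self.latency is None:
            self.latency = one_way
        else:
            self.latency = self.alpha * self.latency + (1 - self.alpha) * one_way

    def loss_rate(self):
        """Fraction of probes lost over the last `window` samples."""
        if not self.outcomes:
            return 0.0
        return 1.0 - sum(self.outcomes) / len(self.outcomes)

    def throughput_score(self):
        """Mathis-style score in packets per second: sqrt(1.5) / (rtt * sqrt(p))."""
        if self.latency is None:
            return 0.0
        rtt = 2.0 * self.latency
        p = max(self.loss_rate(), self.min_loss)
        return math.sqrt(1.5) / (rtt * math.sqrt(p))
```

The score is only meaningful for ranking paths against each other; it is not a prediction of the throughput an application would actually see.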
Path Selection • Always route around outages • Application can optimize for latency, loss rate, throughput • Throughput hard to optimize • Avoid bad-throughput routes instead • Exhaustively search all one-hop paths • Introduce hysteresis to prevent “route flapping”
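A minimal sketch of the one-hop search with hysteresis, optimizing for latency. The function name, the dictionary layout, and the 5% switching margin are assumptions for illustration; the slide only says that all one-hop paths are searched and that hysteresis is applied.

```python
import math


def best_path(src, dst, nodes, latency, current=None, hysteresis=0.05):
    """Pick the lowest-latency path among the direct link and all one-hop detours.

    latency maps (a, b) to seconds; a missing or infinite entry means an outage.
    current is the path in use, if any; it is kept unless the best candidate
    improves on it by more than the hysteresis fraction (an assumed margin).
    """
    def cost(path):
        return sum(latency.get((a, b), math.inf) for a, b in zip(path, path[1:]))

    candidates = [(src, dst)]
    candidates += [(src, hop, dst) for hop in nodes if hop not in (src, dst)]
    best = min(candidates, key=cost)

    # Stay on a working path unless the gain clears the hysteresis margin.
    if current is not None and cost(current) < math.inf:
        if cost(best) > (1 - hysteresis) * cost(current):
            return current
    return best


# Hypothetical three-node overlay where the direct A-C link is down.
nodes = ["A", "B", "C"]
latency = {("A", "B"): 0.030, ("B", "C"): 0.040}
detour = best_path("A", "C", nodes, latency)   # routes via B
```

Because an outage makes the current path's cost infinite, the hysteresis check is skipped in that case and the search always routes around the failure, matching the first bullet.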
Routing Policy • Policies specify which virtual links to use • Separate routing tables per policy • Packets classified with policy tag and routed accordingly • Sample policy: exclusive clique • Only members of clique can use links between each other • E.g. Internet2 hosts
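The exclusive-clique policy can be sketched as one set of usable virtual links per policy tag. The tag names `"clique"` and `"default"`, the host names, and the function name are hypothetical; the slide only states that links between clique members are reserved for clique traffic and that each policy has its own routing table.

```python
def build_policy_tables(links, clique):
    """Return the virtual links each policy tag may use.

    links is an iterable of (a, b) node pairs; clique is the set of member nodes.
    Links with both endpoints inside the clique are reserved for clique traffic.
    """
    clique = frozenset(clique)
    all_links = set(links)
    internal = {(a, b) for a, b in all_links if a in clique and b in clique}
    return {
        "clique": all_links,              # clique traffic may use every link
        "default": all_links - internal,  # other traffic avoids clique-internal links
    }


def link_allowed(tables, tag, link):
    """Check whether a packet carrying the given policy tag may use a link."""
    return link in tables[tag]


# Hypothetical example: two Internet2 hosts and one commercial host.
links = [("univ1", "univ2"), ("univ1", "isp"), ("isp", "univ2")]
tables = build_policy_tables(links, clique={"univ1", "univ2"})
```

Path selection would then run once per tag over that tag's link set, which is one way to realize the separate routing tables per policy mentioned on the slide.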
Measurements • Two studies (RON1 and RON2) • RON recovers from 100% (RON1) or 60% (RON2) of outages and periods of high loss • Routes around bad-throughput failures • Doubles TCP throughput in 5% of all samples • Reduces loss rate by 0.05 in 5% of samples
Performance Problems • RON is worse in some cases • Measurement inaccuracies • Information propagation delays • Hysteresis • But … • RON wins in most cases • RON losses are never very large • RON wins, though, can be dramatic
Overhead • Probing traffic - grows O(N) • Routing state traffic - grows O(N^2) • Total BW consumed • 2.2 Kbps with 10 nodes • 33 Kbps with 50 nodes • A limiting factor for scaling
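As a rough consistency check on these figures, the two data points can be fitted to a model with one linear term (probing) and one quadratic term (routing state). The model form `bw = a*n + b*n**2` and the function names are assumptions; with only two points the fit is exact by construction, so it illustrates the mix of growth rates and does not validate it.

```python
def fit_overhead(n1, bw1, n2, bw2):
    """Solve bw = a*n + b*n**2 for (a, b) from two (nodes, Kbps) measurements."""
    det = n1 * n2**2 - n2 * n1**2
    a = (bw1 * n2**2 - bw2 * n1**2) / det
    b = (n1 * bw2 - n2 * bw1) / det
    return a, b


def overhead_kbps(n, a, b):
    """Evaluate the fitted model for an overlay of n nodes."""
    return a * n + b * n**2


# Figures from the slide: 2.2 Kbps at 10 nodes, 33 Kbps at 50 nodes.
a, b = fit_overhead(10, 2.2, 50, 33.0)
growth = 33.0 / 2.2   # about 15x for 5x the nodes
```

The 15x growth for a 5x increase in nodes sits between purely linear (5x) and purely quadratic (25x) scaling, which is consistent with the two terms listed on the slide.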
Question • Is this overhead excessive? • Less than 10% of a broadband link • What if RONs become more popular? • Is using a RON “cheating”?
Applications • Videoconferencing • Cooperating ISPs • Branch offices of companies • Others?