Improving Internet Availability with Path Splicing. Murtaza Motiwala Nick Feamster Santosh Vempala. Availability. “It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder.
Improving Internet Availability with Path Splicing • Murtaza Motiwala, Nick Feamster, Santosh Vempala
Availability • “It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. • Over time, our list will evolve. It should be: • Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today. • …”
Availability of Other Services • Carrier Airlines (2002 FAA Fact Book) • 41 accidents, 6.7M departures • 99.9993% availability • 911 Phone service (1993 NRIC report +) • 29 minutes per year per line • 99.994% availability • Std. Phone service (various sources) • 53+ minutes per line per year • 99.99+% availability
Can the Internet Be “Always On”? • Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines” • More “critical” (or at least availability-centric) applications on the Internet • At the same time, the Internet is getting more difficult to debug • Increasing scale, complexity, disconnection, etc. • Is it possible to get to “5 nines” of availability? If so, how?
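To make the “nines” concrete, the following Python sketch converts between availability, yearly downtime, and number of nines. It assumes a 365-day year and takes the number of nines to be -log10(1 - availability); the function names are illustrative and do not come from the talk.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def availability_from_downtime(minutes_down_per_year):
    """Fraction of the year a service is up, given its yearly downtime."""
    return 1.0 - minutes_down_per_year / MINUTES_PER_YEAR


def nines(availability):
    """Number of nines, e.g. 0.999 -> 3 (up to rounding)."""
    return -math.log10(1.0 - availability)


def downtime_for_nines(n):
    """Yearly downtime in minutes allowed by n nines of availability."""
    return MINUTES_PER_YEAR * 10.0 ** (-n)


if __name__ == "__main__":
    print(downtime_for_nines(5))         # minutes per year at five nines
    print(downtime_for_nines(2.5) / 60)  # hours per year at 2.5 nines
```

Under these assumptions, five nines allows a little over five minutes of downtime per year, while 2.5 nines corresponds to roughly 27.7 hours per year.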
High Availability: Two Aspects • Reliability: Connectivity in the routing tables should approach that of the underlying graph • If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic reaching t • Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
Where Today’s Protocols Stand • Reliability: Routing protocols are single path. • Approach: Compute backup paths • Challenge: Many possible failure scenarios! • Recovery: With today’s Internet routing protocols, when a link or node failure occurs, routers must recompute new paths to each destination • Meanwhile, packets are dropped, reordered, etc. • Approach: Switch to a backup when a failure occurs • Challenge: Must quickly discover a new working path
Multipath: Promise and Problems • Bad: If a link fails on each of the two paths, s is disconnected from t • Want: End systems remain connected unless the underlying graph has a cut
Path Splicing: Main Idea • Compute multiple forwarding trees per destination. Allow packets to switch slices midstream. • Step 1 (Perturbations): Run multiple instances of the routing protocol, each with a slightly perturbed version of the configuration • Step 2 (Slicing): Allow traffic to switch between instances at any node in the protocol
Outline • Path Splicing • Achieving Reliable Connectivity • Mechanism #1: Random Perturbations • Mechanism #2: Network Slicing • Forwarding • Recovery • Properties • High Reliability • Bounded Stretch • Fast recovery • Open Questions
Mechanism #1: Perturbations • Goal: Each instance provides different paths • Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight • Two schemes: Uniform and degree-based [Figure: “Base” graph between s and t with edge weights 3, 3, 3; perturbed graph with edge weights 1.5, 4, 1.5, 5, 1.25, 3.5]
How to Perturb the Link Weights? • Uniform: Perturbation is a function of the initial weight of the link • Degree-based: Perturbation is a linear function of the degrees of the incident nodes • Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
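The two schemes can be sketched in Python as follows. This is a minimal illustration, not the authors' formula: it assumes perturbations are added to the original weight, that the uniform scheme draws from U(0, scale · w), and that the degree-based scheme scales a draw from U(0, w) by a linear function a + b · (deg(u) + deg(v)). The parameters `scale`, `a`, `b` and the sample graph are hypothetical.

```python
import random


def node_degrees(weights):
    """Degree of every node in an undirected graph given as {(u, v): weight}."""
    deg = {}
    for u, v in weights:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def perturb_uniform(weights, scale, rng):
    """Assumed uniform scheme: w' = w + U(0, scale * w)."""
    return {e: w + rng.uniform(0.0, scale * w) for e, w in weights.items()}


def perturb_degree_based(weights, a, b, rng):
    """Assumed degree-based scheme: w' = w + (a + b * (deg u + deg v)) * U(0, w)."""
    deg = node_degrees(weights)
    perturbed = {}
    for (u, v), w in weights.items():
        factor = a + b * (deg[u] + deg[v])  # linear in the incident degrees
        perturbed[(u, v)] = w + factor * rng.uniform(0.0, w)
    return perturbed


# Hypothetical base graph; one perturbed weight set per slice.
base = {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "t"): 3.0, ("a", "b"): 3.0}
slices = [perturb_degree_based(base, 0.0, 0.25, random.Random(i)) for i in range(3)]
```

Links attached to high-degree nodes receive larger perturbations on average, which matches the stated intuition of pushing traffic away from nodes it would otherwise pass through by default.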
Mechanism #2: Network Slicing • Goal: Allow multiple instances to co-exist • Mechanism: Virtual forwarding tables [Figure: topology with nodes s, t, a, b, c; forwarding table entries (dst, next-hop): Slice 1: t, a; Slice 2: t, c]
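As a rough sketch of how per-slice forwarding tables could be derived, the Python code below runs Dijkstra's algorithm from the destination once per slice and records each node's next hop. The helper name `next_hops`, the undirected-graph assumption, and the link weights are all hypothetical; the weights are chosen only so that s forwards to a in one slice and to c in the other.

```python
import heapq


def next_hops(weights, dst):
    """Return {node: next_hop} along shortest paths toward dst.

    weights is an undirected graph given as {(u, v): weight}.
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {dst: 0.0}
    table = {}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                table[v] = u  # v reaches dst most cheaply via u
                heapq.heappush(heap, (nd, v))
    return table


# Two slices over the same links, with hypothetical perturbed weights.
slice_weights = [
    {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "c"): 2.0, ("c", "t"): 2.0},
    {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "c"): 1.0, ("c", "t"): 1.0},
]
tables = [next_hops(w, "t") for w in slice_weights]
```

Each slice keeps its own table, so a router stores one next hop per destination per slice.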
Forwarding Traffic • Packet has shim header with forwarding bits • Routers use lg(k) bits to index forwarding tables • Shift bits after inspection • To access different (or multiple) paths, end systems simply change the forwarding bits • Incremental deployment is trivial • Persistent loops cannot occur
Putting It Together • End system sets forwarding bits in packet header • Forwarding bits specify slice to be used at any hop • Router: examines/shifts forwarding bits, and forwards
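The per-hop behaviour described above can be sketched as below. This is an illustration under assumptions, not the prototype's code: the forwarding bits are modelled as an integer read from the low-order end, each hop consumes ceil(lg k) bits, exhausted bits read as slice 0, and the two hand-written tables are hypothetical.

```python
def forward(tables, src, dst, bits, k, max_hops=32):
    """Follow the slice chosen by the forwarding bits at every hop."""
    bits_per_hop = max(1, (k - 1).bit_length())  # lg(k) when k is a power of two
    mask = (1 << bits_per_hop) - 1
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (bits & mask) % k   # examine the low-order bits
        bits >>= bits_per_hop          # shift them out for the next hop
        node = tables[slice_id][node]
        path.append(node)
    return path


# Hypothetical next-hop tables toward t: slice 0 prefers a, slice 1 prefers b.
tables = [
    {"s": "a", "a": "t", "b": "t"},
    {"s": "b", "a": "b", "b": "t"},
]

default_path = forward(tables, "s", "t", 0b00, k=2)
spliced_path = forward(tables, "s", "t", 0b10, k=2)
```

Here `default_path` stays in slice 0, while `spliced_path` uses slice 0 at s and switches to slice 1 at a, which steers the packet around the a to t link.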
A Definition Motivated by Reliability • Reliability: the probability that, upon failing each edge with probability p, the graph remains connected • Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p • The underlying graph has an underlying reliability (and reliability curve) • Goal: Reliability of routing system should approach that of the underlying graph.
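One point on the reliability curve of the underlying graph can be estimated by simulation, as in the Python sketch below. It assumes independent link failures and an undirected graph; the function names and the four-node ring are hypothetical. The same estimator could be applied to the links reachable through the routing tables to compare a routing system against the underlying graph.

```python
import random


def connected_pairs(nodes, edges):
    """Count node pairs that lie in the same connected component (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sum(s * (s - 1) // 2 for s in sizes.values())


def fraction_disconnected(nodes, edges, p, trials, rng):
    """Average fraction of pairs disconnected when each edge fails with probability p."""
    total = len(nodes) * (len(nodes) - 1) // 2
    lost = 0
    for _ in range(trials):
        alive = [e for e in edges if rng.random() >= p]
        lost += total - connected_pairs(nodes, alive)
    return lost / (trials * total)


# Hypothetical four-node ring.
ring_nodes = ["a", "b", "c", "d"]
ring_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
curve = [(p, fraction_disconnected(ring_nodes, ring_edges, p, 1000, random.Random(0)))
         for p in (0.0, 0.05, 0.1, 0.2)]
```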
Reliability Curve: Illustration • More edges available to end systems -> Better reliability [Figure: fraction of source-dest pairs disconnected (y-axis) vs. probability of link failure p (x-axis), annotated “Better reliability”]
Reliability Approaches Optimal • Sprint (Rocketfuel) topology • 1,000 trials • p indicates probability edge was removed from base graph • Reliability approaches optimal • Average stretch is only 1.3 [Figure: Sprint topology, degree-based perturbations]
Recovery is Fast • Which paths can be recovered within 5 trials? • Sequential trials: 5 round-trip times • …but trials could also be made in parallel • Recovery approaches maximum possible • Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
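A trial-based recovery loop might look like the Python sketch below, assuming the end host simply draws fresh random forwarding bits for each trial and treats a trial as failed if the packet meets a failed link. The bit string is decoded as base-k digits (equivalent to lg(k)-bit fields when k is a power of two); the tables, the failed link, and the 16-bit header size are hypothetical.

```python
import random


def walk(tables, src, dst, bits, failed, max_hops=16):
    """Return the path selected by bits, or None if it hits a failed link."""
    k = len(tables)
    node, path = src, [src]
    for _ in range(max_hops):
        if node == dst:
            return path
        slice_id, bits = bits % k, bits // k  # consume one base-k digit per hop
        nxt = tables[slice_id].get(node)
        if nxt is None or frozenset((node, nxt)) in failed:
            return None
        node = nxt
        path.append(node)
    return path if node == dst else None


def recover(tables, src, dst, failed, rng, trials=5, header_bits=16):
    """Try up to `trials` random bit strings; return (attempt, path) or None."""
    for attempt in range(1, trials + 1):
        bits = rng.getrandbits(header_bits)
        path = walk(tables, src, dst, bits, failed)
        if path is not None:
            return attempt, path
    return None


# Hypothetical next-hop tables toward t and one failed link.
tables = [
    {"s": "a", "a": "t", "b": "t"},
    {"s": "b", "a": "b", "b": "t"},
]
failed = {frozenset(("a", "t"))}
outcome = recover(tables, "s", "t", failed, random.Random(0))
```

The loop runs trials sequentially, costing one round trip each; the trials are independent, so they could equally be issued in parallel.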
Stretch is Bounded • Stretch: How much longer is the path taken by packets over the “optimal” path? • Stretch is bounded in one slice by amount of perturbation • …but what about the stretch of spliced paths? • As long as “significant progress” (a large fraction of the distance to d) is achieved for each hop, stretch is bounded • Implication: Loops are rare.
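One way to see the single-slice bound is the short derivation below. It rests on an assumption that the slide does not state: every perturbed weight satisfies w(e) ≤ w'(e) ≤ (1 + ε) w(e). Here P* denotes the shortest path under the original weights w and P' the shortest path under the perturbed weights w'; the symbols are introduced only for this sketch.

```latex
w(P') \;\le\; w'(P') \;\le\; w'(P^{*}) \;\le\; (1+\varepsilon)\, w(P^{*})
\quad\Longrightarrow\quad
\text{stretch} \;=\; \frac{w(P')}{w(P^{*})} \;\le\; 1+\varepsilon
```

The first inequality holds because perturbations only increase weights, the second because P' is optimal under w', and the third because no weight grows by more than a factor of 1 + ε.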
High Availability with Splicing • Reliability: Connectivity in the routing tables should approach that of the underlying graph • Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them. • Result: Splicing ~ 10 trees achieves near-optimal reliability • Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path • Approach: End nodes randomly select new bits. • Result: Recovery within five trials approaches best possible.
Open Questions and Future Work • How does splicing interact with traffic engineering? • (How) can the bits be best encoded? • What changes are required to today’s routers to make splicing possible? • Can splicing eliminate dynamic routing?
Conclusion • Simple: Forwarding bits provide access to different paths through the network • Scalable: Exponential increase in available paths, linear increase in state • Stable: Fast recovery does not require fast routing protocols • No modifications to existing routing protocols http://www.cc.gatech.edu/~feamster/tmp/splicing-hotnets.pdf
History: Network Embedding • Given: virtual (V) and physical (P) network • Topology, constraints, etc. • Problem: find the appropriate mapping onto available physical resources (nodes and edges) • Idea: Define a virtual graph G’ onto which G can be embedded • A link in G can be mapped to multiple links in G’ • How to forward traffic over multiple links in G’? • …
Possible Applications/Future Work • Fast recovery from poorly performing paths • Data transfer with easy multi-path • Overlay networks, CDNs, etc. • Transfer of video with multiple description • Security applications • Spatial diversity in wireless networks
Significant Novelty for Modest Stretch • Novelty: difference in nodes in a perturbed shortest path from the original shortest path • Novelty = 1 – (fraction of edges on short path shared with long path) • Example (path from s to d): Novelty: 1 – (1/3) = 2/3
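The edge-based form of the definition can be computed as in the Python sketch below. The paths are hypothetical and are chosen only to reproduce the 1 – (1/3) = 2/3 example; links are treated as undirected, which is an assumption.

```python
def path_edges(path):
    """Set of undirected edges along a path given as a list of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def novelty(short_path, long_path):
    """1 minus the fraction of the short path's edges that the long path shares."""
    short = path_edges(short_path)
    shared = short & path_edges(long_path)
    return 1.0 - len(shared) / len(short)


# Hypothetical paths from s to d: three edges on the short path, one of them shared.
short_path = ["s", "a", "b", "d"]
long_path = ["s", "a", "c", "e", "d"]
value = novelty(short_path, long_path)
```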
Related Work • Pre-Computed Backup Paths • Multi-Topology Routing • Multiple Router Configuration • MPLS Fast Reroute • End-Node Controlled Traffic • Source routing • Routing deflections • Multipath routing (ECMP, MIRO, etc.) • IGP link-weight optimization • Measurement of path diversity and multihoming • Layer-3 VPNs
Other Properties • Scalable • Exponential increase in paths, linear increase in state • Fast recovery from underlying failures • Automatic tuning (e.g., for traffic engineering) • Perturbations achieve property of automatically spreading traffic across different links • Standard link-weight optimization is potentially brittle in the face of link failures • Incrementally deployable
Prototype Implementation • Click and Quagga on PL-VINI • http://www.vini-veritas.net/ [Figure: a Classifier and, per slice, a Control Plane Daemon with its Forwarding Table]
Variation: BGP Splicing • Observation: Many routers already learn multiple alternate routes to each destination. • Idea: Use the forwarding bits to index into these alternate routes at an AS’s ingress and egress routers. • Splice paths at ingress and egress routers [Figure: default and alternate routes to destination d] • Required new functionality • Storing multiple entries per prefix • Indexing into them based on packet headers • Selecting the “best” k routes for each destination
Loops, Reconsidered • Problem: Potential for loops between ASes • AS-level loops can be longer than intra-AS loops • Two possible approaches • Detection: routers mark packets and determine that packets have traversed the same AS twice • Prevention: Exploit “common” routing policies to ensure that packets are only deflected along valley-free paths
Preventing Inter-AS Loops with Policy • Observation: inter-AS loops inherently involve a traversal that violates the valley-free property • Constraints: 1. once a “down” deflection has occurred, do not deflect again 2. only allow one “across” deflection • Possible relaxation: allow a limited number of violations, specified by source
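The two constraints can be expressed as a small check, sketched below in Python. The sketch assumes each deflection is labelled "up" (toward a provider), "across" (to a peer), or "down" (toward a customer), following the usual valley-free terminology; the function name and labels are illustrative, and the source-specified relaxation is not modelled.

```python
def may_deflect(past_deflections, candidate):
    """Decide whether a further deflection of type `candidate` is allowed.

    past_deflections lists the deflection types already taken by the packet.
    """
    if "down" in past_deflections:
        return False  # constraint 1: no deflection after a "down" deflection
    if candidate == "across" and "across" in past_deflections:
        return False  # constraint 2: at most one "across" deflection
    return True
```

For example, a packet that has already been deflected across a peering link may still be deflected down, but not across a second time.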
Definitions of Path Diversity • Connectivity: Minimum number of edges whose failure disconnects the graph (min cut) • Expansion: Intuitively, small cuts disconnect small groups of nodes from the graph
Design Goals • Reachability: allow endpoints to communicate • High Diversity: expose paths to end hosts that survive failures • Capacity: the total available data rate between each source-destination pair should be high • Fault tolerance: the number of disjoint paths should be high, and the network should remain connected under failures • Low Stretch: paths should not be too circuitous • Scalability: scale to a large number of networks, destinations, routers, etc. • Today’s routing protocols do not exploit the diversity of the underlying network graph