330 likes | 436 Views
Path Splicing. Nick Feamster, Murtaza Motiwala, Megan Elmore, Santosh Vempala. Idea: Backup/Multipath. For intradomain routing IP and MPLS fast re-route Packet deflections [Yang 2006] ECMP, NotVia, Loop-Free Alternates [Cisco] For interdomain routing MIRO [Rexford 2006] Problem
E N D
Path Splicing Nick Feamster, Murtaza Motiwala, Megan Elmore, Santosh Vempala
Idea: Backup/Multipath • For intradomain routing • IP and MPLS fast re-route • Packet deflections [Yang 2006] • ECMP, NotVia, Loop-Free Alternates [Cisco] • For interdomain routing • MIRO [Rexford 2006] • Problem • Scale:Protecting against arbitrary failures requires storing lots of state, exchanging lots of messages • Control:End systems can’t signal when they think a path has “failed”
Backup Paths: Promise and Problems • Bad: If any link fails on both paths, s is disconnected from t • Want:End systems remain connected unless the underlying graph has a cut s t
t Path Splicing: Main Idea Compute multiple forwarding trees per destination.Allow packets to switch slices midstream. • Step 1 (Generate slices): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration • Step 2 (Splice end-to-end paths): Allow traffic to switch between instances at any node in the protocol s
Outline • Path Splicing for Intradomain Routing • Generating slices • Constructing paths • Forwarding • Recovery • Evaluation • Reliability and recovery • Stretch • Effects on traffic • Path Splicing for Interdomain Routing • Ongoing: Prototype and Deployment Paths
Perturbed Graph 1.5 4 1.5 5 s t 1.25 3.5 Generating Slices • Goal: Each instance provides different paths • Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight • Two schemes: Uniform and degree-based “Base” Graph 3 3 s t 3
How to Perturb the Link Weights? • Uniform: Perturbation is a function of the initial weight of the link • Degree-based:Perturbation is a linear function of the degrees of the incident nodes • Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
a s t b dst next-hop c t a Slice 1 t c Slice 2 Constructing Paths • Goal: Allow multiple instances to co-exist • Mechanism: Virtual forwarding tables
Forwarding Traffic • One approach: shim header with forwarding bits • Routers use lg(k) bits to index forwarding tables • Shift bits after inspection • To access different (or multiple) paths, end systems simply change the forwarding bits • Incremental deployment is trivial • Persistent loops cannot occur • Other variations are possible
Alternate Approach • Use fewer bits per packet • Each router along the path uses the same set of random bits as an input to select the next hop • Advantages • Less per-packet overhead • Disadvantages • Less direct control over path • No explicit prevention of loops
Recovery Mechanisms • End-system recovery • Switch slices at every hop with probability 0.5 • Network-based recovery • Router switches to a random slice if next hop is unreachable • Continue for a fixed number of hops until destination is reached 12
Availability Evaluation: Two Aspects • Reliability: Connectivity in the routing tables should approach the that of the underlying graph • If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic • Recovery:In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
Availability Evaluation • A definition for reliability • Does path splicing improve reliability? • How close can splicing get to the best possible reliability (i.e., that of the underlying graph)? • Can path splicing enable fast recovery? • Can end systems (or intermediate nodes) find alternate paths fast enough?
Reliability Definition • Reliability:the probability that, upon failing each edge with probability p, the graph remains connected • Reliability curve:the fraction of source-destination pairs that remain connected for various link failure probabilities p • The underlying graph has an underlying reliability (and reliability curve) • Goal: Reliability of routing system should approach that of the underlying graph.
Reliability Curve: Illustration Fraction of source-dest pairs disconnected Better reliability Probability of link failure (p) More edges available to end systems -> Better reliability
Experimental Setup • Evaluation on two topologies • GEANT (Real) and Sprint (Rocketfuel) • Compute base graph by taking the union of k perturbed graphs • Remove an edge from the base graph with probability p • Compute number of pairs that could reach one another (average over 1,000 trials)
Reliability Approaches Optimal • Sprint (Rocketfuel) topology • 1,000 trials • p indicates probability edge was removed from base graph Reliability approaches optimal Average stretch is only 1.3 Sprint topology,degree-based perturbations
Simple Recovery Strategies Work Well • Which paths can be recovered within 5 trials? • Sequential trials: 5 round-trip times • …but trials could also be made in parallel Recovery approaches maximum possible Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
Significant Novelty for Modest Stretch • Novelty: difference in nodes in a perturbed shortest path from the original shortest path Fraction of edges on short path shared with long path Example s d Novelty: 1 – (1/3) = 2/3
Evaluation Summary: Splicing Can Improve Availability • Reliability: Connectivity in the routing tables should approach the that of the underlying graph • Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them • Result: Splicing ~ 10 trees achieves near-optimal reliability • Recovery:In case of failure, nodes should quickly be able to discover a new path • Approach: End nodes randomly select new bits • Result: Recovery within 5 trials approaches best possible.
Does Splicing Create Loops? • Persistent loops are avoidable • In the simple scheme, path bits are exhausted from the header • Never switching back to the same • Transient loops can still be a problem because they increase end-to-end delay (“stretch”) • Longer end-to-end paths • Wasted capacity • Two-hop loops do occur (around 1 in 100 trials for k=2, more for higher values of k), but can be avoided with the mechanisms above
Interactions with Traffic Maximum utilization unaffected
Required new functionality • Storing multiple entries per prefix • Indexing into them based on packet headers • Selecting the “best” k routes for each destination Path Splicing for Interdomain Routing • Observation: Many routers already learn multiple alternate routes to each destination. • Idea: Use the bits to index into these alternate routes at an AS’s ingress and egress routers. default d alternate Splice paths at ingress and egress routers
Interdomain Splicing Header • Intradomain bits function as before • Interdomain: Three sections • Ingress and egress • Policy: restrict “illegal” entries in the forwarding table
Experimental Setup • 2,500-node policy-annotated AS graph • Use C-BGP to compute routes on base graph • Remove each inter-AS edge with probability p • Test connectivity between a random subset of AS pairs • Compute base reliability without policy restrictions
Interdomain Splicing: Reliability 2-slice deployment approaches best possible
Incremental Deployment Partial deployment provides some gains
Effects on Traffic Sprint Abilene • Data: Abilene traces and synthetic Sprint traffic • Observations: • No adverse effects on traffic • Slightly balances traffic in the network • Due to longer paths, the overall utilization increases but is not significant ( ~ 4% for Abilene) 30
Effects on Traffic Abilene Topology 31
Ongoing Work • Software implementation • Click Element • PlanetLab/VINI deployment • Open questions • What API should the network layer provide? • How to perform monitoring/failure detection? • Extension to Cisco Multi-Topology Routing • In-progress IETF draft
Open Questions and Ongoing Work • How does splicing interact with traffic engineering? Sources controlling traffic? • What are the best mechanisms for generating slices and recovering paths? • Can splicing eliminate dynamic routing?
Conclusion • Simple: Forwarding bits provide access to different paths through the network • Scalable: Exponential increase in available paths, linear increase in state • Stable: Fast recovery does not require fast routing protocols http://www.cc.gatech.edu/~feamster/papers/splicing.pdf