
Consensus Routing




  1. Consensus Routing Antonio-Gabriel Sturzu, SCPD

  2. Table of Contents • Introduction • Consistency issues • Consensus Routing Overview • Stable Mode • Transient Mode • Performance and Overhead

  3. Introduction • Internet routing, especially interdomain routing, has favored responsiveness over consistency • In interdomain routing, a router applies a received update to its forwarding table immediately, before propagating it to other routers • BGP updates are known to cause up to 30% packet loss for two minutes or more after a routing change • Transient loops account for 90% of all packet loss

  4. Introduction(2) • The primary contribution of the article is that it separates the safety concept from the liveness concept, associating consistency with safety and responsiveness with liveness • Consistency safety means that a router forwards a packet along a path adopted by the upstream routers • Liveness means that the system reacts quickly to failures or policy changes • Separating safety and liveness improves end-to-end availability • They are obtained through the stable and transient modes, respectively

  5. Consistency Issues • BGP link failures

  6. Consistency issues(2) • BGP policy change

  7. Consistency issues(3) • iBGP link recovery • Such blackholes can cause packet loss for tens of seconds

  8. Consistency issues(4) • BGP policy cycles

  9. Consensus Routing Overview • Forwards packets using • Stable mode • Transient mode • Consensus routers simply log the new routes computed by the policy engine • Periodically all routers engage in a distributed coordination algorithm that determines the most recent set of complete updates

  10. Consensus Routing Overview(2) • The coordination is based on classical distributed snapshot and consensus algorithms • The routers use the output of the coordination to compute a set of stable forwarding tables (SFTs) that are guaranteed to be consistent
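The overview above can be illustrated with a minimal sketch: updates are only logged when they arrive, and the table used for forwarding changes only at an epoch boundary, once the coordination has reported which triggers are still incomplete. The class `ConsensusRouter`, its method names, and the flat `(trigger, prefix, next_hop)` log format are hypothetical simplifications, not the data layout used in the article.

```python
from typing import Dict, List, Set, Tuple

Trigger = Tuple[int, int]  # (AS number, trigger number)


class ConsensusRouter:
    """Toy router that logs updates and applies them only at epoch boundaries."""

    def __init__(self) -> None:
        # Pending updates, in arrival order (hypothetical log format).
        self.log: List[Tuple[Trigger, str, str]] = []
        # Stable forwarding table: destination prefix -> next-hop interface.
        self.sft: Dict[str, str] = {}

    def on_update(self, trigger: Trigger, prefix: str, next_hop: str) -> None:
        # A new route is logged, not applied to the forwarding table.
        self.log.append((trigger, prefix, next_hop))

    def end_epoch(self, incomplete: Set[Trigger]) -> None:
        # Apply only updates whose trigger is not in the incomplete set;
        # the rest stay in the log for a later epoch.
        remaining = []
        for trigger, prefix, next_hop in self.log:
            if trigger in incomplete:
                remaining.append((trigger, prefix, next_hop))
            else:
                self.sft[prefix] = next_hop
        self.log = remaining
```

In this sketch the forwarding table never reflects a half-propagated update, which is the safety side of the design; reacting quickly to failures is left to the transient mode.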

  11. Stable Mode • The distributed coordination algorithm proceeds in epochs • Steps of an epoch k: • Update log • Distributed snapshot • The snapshot is a globally consistent view of all the updates in the system (complete or incomplete) • Frontier computation • Aggregation • Consensus • Flood

  12. Stable Mode(2) • SFT computation • View change • Versioning • Garbage collection

  13. Router State • Routing Information Base (RIB) • Stores for each destination • Route update received from each neighbor • Locally selected best route • Route advertised to each neighbor • History • Stores for each destination a chronological list of received and selected routes in the RIB • SFTs • Store for each destination the next-hop interfaces corresponding to the stable routes
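As a rough illustration of the three pieces of state listed above, the following sketch models them as plain Python containers. All type and field names are hypothetical, routes are reduced to AS-path tuples, and keying the SFTs by epoch number is an assumption made here to reflect the versioning step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Prefix = str             # destination prefix, e.g. "10.0.0.0/8"
Neighbor = int           # neighbor AS number
Route = Tuple[int, ...]  # AS path, simplified


@dataclass
class RibEntry:
    """Per-destination RIB state."""
    received: Dict[Neighbor, Route] = field(default_factory=dict)
    best: Optional[Route] = None
    advertised: Dict[Neighbor, Route] = field(default_factory=dict)


@dataclass
class RouterState:
    rib: Dict[Prefix, RibEntry] = field(default_factory=dict)
    # Chronological list of ("received" | "selected", route) per destination.
    history: Dict[Prefix, List[Tuple[str, Route]]] = field(default_factory=dict)
    # Assumption: one SFT per epoch, each mapping destination -> next-hop interface.
    sfts: Dict[int, Dict[Prefix, str]] = field(default_factory=dict)

    def receive(self, prefix: Prefix, neighbor: Neighbor, route: Route) -> None:
        # Record the neighbor's route in the RIB and in the history.
        self.rib.setdefault(prefix, RibEntry()).received[neighbor] = route
        self.history.setdefault(prefix, []).append(("received", route))

    def select(self, prefix: Prefix, route: Route) -> None:
        # Record the locally selected best route in the RIB and in the history.
        self.rib.setdefault(prefix, RibEntry()).best = route
        self.history.setdefault(prefix, []).append(("selected", route))
```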

  14. Router State(2) • Triggers • Globally unique identifier for a set of causally related events propagating through the network • (AS number, trigger number) • In consensus routing each update carries a trigger that is associated with the route being implicitly withdrawn and replaced by the route announced in the update • It tracks when the implicit withdrawal is complete

  15. Router State(3) • In order to maintain the safety property an AS A generates a new trigger to be sent along with an update upon • A failure of the next-hop in A’s current route to the destination • A policy change that causes A to prefer another route to the destination over the current one • Receiving a route from a neighbor B that it prefers over its current route via a different neighbor C
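The three conditions above can be sketched as a small decision function. The `Event` enum, the `TriggerSource` class, and the per-AS counter are hypothetical names introduced for illustration; reusing the incoming trigger when none of the three conditions applies is an assumption that the slides do not state.

```python
from dataclasses import dataclass
from enum import Enum, auto
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    as_number: int       # AS that generated the trigger
    trigger_number: int  # counter local to that AS


class Event(Enum):
    NEXT_HOP_FAILURE = auto()                 # next hop of the current route failed
    POLICY_CHANGE = auto()                    # policy now prefers another route
    BETTER_ROUTE_VIA_OTHER_NEIGHBOR = auto()  # preferred route arrives from B, current is via C
    OTHER = auto()                            # anything else


NEW_TRIGGER_EVENTS = {
    Event.NEXT_HOP_FAILURE,
    Event.POLICY_CHANGE,
    Event.BETTER_ROUTE_VIA_OTHER_NEIGHBOR,
}


class TriggerSource:
    """Hands out globally unique triggers for one AS."""

    def __init__(self, as_number: int) -> None:
        self.as_number = as_number
        self._counter = count(1)

    def trigger_for(self, event: Event, incoming: Optional[Trigger] = None) -> Optional[Trigger]:
        if event in NEW_TRIGGER_EVENTS:
            # One of the three listed conditions: mint a fresh trigger.
            return Trigger(self.as_number, next(self._counter))
        # Assumption: otherwise the trigger of the incoming update is passed along.
        return incoming
```

Pairing the AS number with a local counter keeps identifiers unique across the network without any coordination between AS-es.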

  16. Update Processing

  17. Distributed Snapshot

  18. Frontier Computation • Aggregation • Send the set of triggers (complete or incomplete) • Consensus • Consolidators ensure that • There is no single point of failure • No single AS is trusted with the task of consolidating the snapshot • A consolidator is reachable from every AS with high probability • When consensus ends the consolidators use the snapshot report in order to compute the set of incomplete triggers I in the network

  19. Frontier Computation(2) • In order to compute the set I they use the following idea: • A trigger is said to depend on all triggers that precede it in the history table • A trigger t is treated as complete only if neither t nor any of its predecessors is incomplete • Flood • The set of incomplete triggers I and the set S of AS-es that successfully participated in the distributed snapshot are sent to all AS-es
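The dependency rule above amounts to a closure computation: starting from the triggers reported as incomplete, every trigger that follows an incomplete one in some history table also becomes incomplete, and this repeats until nothing changes. The sketch below assumes a simplified snapshot report in which each AS contributes one ordered list of triggers; the function name and input shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Trigger = Tuple[int, int]  # (AS number, trigger number)


def incomplete_set(
    histories: Dict[int, List[Trigger]],
    reported_incomplete: Set[Trigger],
) -> Set[Trigger]:
    """Return I: reported-incomplete triggers plus everything that depends on them.

    histories maps an AS number to its triggers in history-table order
    (a simplification of the per-destination history tables).
    """
    incomplete = set(reported_incomplete)
    changed = True
    while changed:  # iterate to a fixpoint across all AS-es
        changed = False
        for order in histories.values():
            tainted = False
            for trig in order:
                if trig in incomplete:
                    tainted = True
                elif tainted:
                    # trig follows an incomplete trigger, so it depends on it.
                    incomplete.add(trig)
                    changed = True
    return incomplete
```

For example, if trigger `(1, 1)` is incomplete, `(2, 1)` follows it in one AS's history, and `(3, 1)` follows `(2, 1)` in another AS's history, then all three end up in I.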

  20. Building SFTs

  21. Transient Mode • Routing deflections • Backtracking • Detour routing • Backup routes • Use RBGP • Choosing the most link-disjoint backup route from the primary route protects against single link failures
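The backup-route criterion above, picking the candidate that shares the fewest links with the primary route, can be sketched as follows. Paths are modeled as tuples of AS numbers and links as unordered pairs of adjacent AS-es; the function names and the tie-break on shorter paths are assumptions for illustration and are not taken from RBGP.

```python
from typing import FrozenSet, Optional, Sequence, Set, Tuple

Path = Tuple[int, ...]  # sequence of AS numbers


def links(path: Path) -> Set[FrozenSet[int]]:
    """Undirected links between consecutive AS-es on a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def most_disjoint_backup(primary: Path, candidates: Sequence[Path]) -> Optional[Path]:
    """Pick the candidate sharing the fewest links with the primary route."""
    if not candidates:
        return None
    primary_links = links(primary)
    # Assumption: ties are broken in favor of the shorter path.
    return min(candidates, key=lambda p: (len(links(p) & primary_links), len(p)))
```

A backup that shares no link with the primary survives any single link failure on the primary, which is why link-disjointness is the selection criterion.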

  22. Performance • Link failures • For BGP, 13% of failures cause at least half of all AS-es to experience routing loops • For Consensus Routing with transient forwarding • Backtracking enables continuous connectivity for at least 74% of all AS-es following 99% of failure cases • With detouring, connectivity is 98.5% • With backup routes, connectivity is 98%

  23. Performance(2) • Policy change • For BGP, in more than 55% of the test cases AS-es were disconnected from the destination due to transient loops formed during convergence • Consensus routing transitions from one set of consistent loop-free routes to another, completely avoiding transient loops

  24. Overhead • Volume of control traffic

  25. Overhead(2) • Cost of consensus • For 9 nodes, all the nodes learnt the agreed value in under 450 milliseconds • For 18 and 27 nodes, the times were 1.4 and 1.8 seconds, respectively • Path dilation • Measures how far packets have to be redirected

  26. Overhead(3) • Path dilation

  27. Overhead(4) • Response time • A 30 second epoch results in more than 90% of the paths being adopted in less than 2 minutes

  28. Overhead(5) • Implementation Overhead • Consensus Routing adds 8% overhead in update processing and about 11% more lines of code to the BGP implementation
