Dive into BGP oscillation issues and solutions, explore causes and steps, detect oscillation, and implement preventive measures.
BGP Oscillation
…the Internet routing protocol is diverging!
Fabien Berger, CCIE#6143, IP-Plus Backbone Engineering
Fabien Berger, berger@ip-plus.net
Well known issue?
• Does a BGP system always converge?
• NO!
• feature not a bug :)
• Researchers have shown theoretical eBGP convergence issues
• [Griffin]:
– “bad gadget” topology diverges!
– backup scenario diverges!
• iBGP diverges in complex RR/confederation environment (draft-ietf-idr-route-oscillation-00.txt)
Goal of the presentation
• make you aware of the issue (before the customer is :)
• troubleshooting is not easy
• pointers to solutions/discussions
• presentation based on [NANOG] [Cisco] [IETF]
Convergence
• convergence = “process of bringing all route tables to a state of consistency”
• no loops!
• does not converge -> you see (on a RR or a confed border):
#show ip bgp 10.0.0.0 | include best
Paths: (3 available, best #3)
#show ip bgp 10.0.0.0 | include best
Paths: (3 available, best #2)
#show ip bgp 10.0.0.0 | include best
Paths: (3 available, best #3)
...
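The symptom above can be checked mechanically. The following is a minimal sketch, assuming the output of the repeated command has already been captured as text; the function names `best_indices` and `count_best_changes` are hypothetical helpers, not part of any router tooling.

```python
import re

# Matches the "best #N" part of a "Paths: (3 available, best #3)" line.
BEST_RE = re.compile(r"best #(\d+)")


def best_indices(outputs):
    """Extract the best-path index from each captured output line."""
    indices = []
    for text in outputs:
        match = BEST_RE.search(text)
        if match:
            indices.append(int(match.group(1)))
    return indices


def count_best_changes(indices):
    """Count how often the best-path index differs from the previous sample."""
    return sum(1 for prev, cur in zip(indices, indices[1:]) if prev != cur)


# The three samples shown on the slide.
samples = [
    "Paths: (3 available, best #3)",
    "Paths: (3 available, best #2)",
    "Paths: (3 available, best #3)",
]
indices = best_indices(samples)
changes = count_best_changes(indices)
```

A best path that changes on every sample is only a hint: the index can also move for legitimate reasons (a path was added or withdrawn), so the result should be read together with the route age.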
Cause of the Oscillation
• RR/confederation hides some information
• RR/confederation sends best path only
• not all routers know all best paths
• MED (Multi Exit Discriminator) vs IGP cost to the neighbor:
– A: path 200 100, igp cost 5, med 2
– B: path 300 100, igp cost 50
– C: path 200 100, igp cost 500, med 1
– if A,B,C are known: B is best (assuming “deterministic-med” is enabled [detMED] :)
– if C is hidden: A is best
– A<B<C<A, i.e. the preference is cyclic: A beats B on IGP cost, B beats C on IGP cost, C beats A on MED (same neighbor AS 200)
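The cyclic preference between A, B and C can be reproduced with a few lines of code. This is a simplified sketch that models only two steps of the decision process (MED within the same neighbor AS, then IGP cost); the `Path` type and `best_path` function are illustrative names, and B's missing MED is assumed to be 0.

```python
from collections import namedtuple

# neighbor_as is the first AS in the AS path (200 for "200 100").
Path = namedtuple("Path", "name neighbor_as igp_cost med")


def best_path(paths):
    """Simplified deterministic-MED selection: only MED and IGP cost are modelled."""
    # Step 1: group paths by neighbor AS and keep the lowest MED in each group.
    groups = {}
    for path in paths:
        groups.setdefault(path.neighbor_as, []).append(path)
    survivors = []
    for group in groups.values():
        lowest_med = min(path.med for path in group)
        survivors.extend(path for path in group if path.med == lowest_med)
    # Step 2: among the survivors, the lowest IGP cost wins.
    return min(survivors, key=lambda path: path.igp_cost)


A = Path("A", neighbor_as=200, igp_cost=5, med=2)
B = Path("B", neighbor_as=300, igp_cost=50, med=0)  # no MED on the slide; 0 assumed
C = Path("C", neighbor_as=200, igp_cost=500, med=1)

print(best_path([A, B, C]).name)  # all three known
print(best_path([A, B]).name)     # C hidden
```

With all three paths visible, C removes A on MED and then loses to B on IGP cost. With C hidden, nothing removes A, and A wins on IGP cost. The outcome therefore depends on which paths a router happens to see.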
Oscillation Step 1
– B selects Y0
– C selects Y1
[Figure: Cluster 1 and Cluster 2 with routers A, B, C, D, E. Legend: Route Reflector, Client, Advertisement, Withdrawal. External paths: AS Y MED 0, AS X, AS Y MED 1. IGP link costs: 1, 10, 3, 2.]
Path tables (AS_PATH, MED, IGP; * = selected):
B: * Y 0 10
C: X (no MED) 3
C: * Y 1 2
Oscillation Step 2
– C selects X
[Figure: Cluster 1 and Cluster 2 with routers A, B, C, D, E. Legend: Route Reflector, Client, Advertisement, Withdrawal. External paths: AS Y MED 0, AS X, AS Y MED 1. IGP link costs: 1, 10, 3, 2.]
Path tables (AS_PATH, MED, IGP; * = selected):
B: Y 1 3
B: * Y 0 10
C: * X (no MED) 3
C: Y 1 2
C: Y 0 11
Oscillation Step 3
– B selects X
[Figure: Cluster 1 and Cluster 2 with routers A, B, C, D, E. Legend: Route Reflector, Client, Advertisement, Withdrawal. External paths: AS Y MED 0, AS X, AS Y MED 1. IGP link costs: 1, 10, 3, 2.]
Path tables (AS_PATH, MED, IGP; * = selected):
B: * X (no MED) 4
B: Y 0 10
C: * X (no MED) 3
C: Y 1 2
C: Y 0 11
Oscillation Step 4
– C selects Y1
[Figure: Cluster 1 and Cluster 2 with routers A, B, C, D, E. Legend: Route Reflector, Client, Advertisement, Withdrawal. External paths: AS Y MED 0, AS X, AS Y MED 1. IGP link costs: 1, 10, 3, 2.]
Path tables (AS_PATH, MED, IGP; * = selected):
B: * X (no MED) 4
B: Y 0 10
C: X (no MED) 3
C: * Y 1 2
Oscillation Step 5
– B selects Y0 = Step 1!!
[Figure: Cluster 1 and Cluster 2 with routers A, B, C, D, E. Legend: Route Reflector, Client, Advertisement, Withdrawal. External paths: AS Y MED 0, AS X, AS Y MED 1. IGP link costs: 1, 10, 3, 2.]
Path tables (AS_PATH, MED, IGP; * = selected):
B: * Y 0 10
C: X (no MED) 3
C: * Y 1 2
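The five steps can be replayed with a small simulation. This is a sketch under stated assumptions, not a model of a real BGP implementation: B and C are treated as the two route reflectors, the IGP costs (10, 3, 2, and 1 for the B–C link) are read off the flattened figure, X is given MED 0 because it has none, and an RR passes a path to the other RR only when it learned that path from its own client. All names (`LOCAL`, `select`, `advertised`, `simulate`) are hypothetical.

```python
from collections import namedtuple

Path = namedtuple("Path", "name neighbor_as med igp_cost")

LINK_COST = 1  # assumed IGP cost between the two route reflectors

# Paths each RR learns from its own clients (values taken from the slides).
LOCAL = {
    "B": [Path("Y0", "Y", 0, 10)],
    "C": [Path("X", "X", 0, 3), Path("Y1", "Y", 1, 2)],
}


def best_path(paths):
    """Lowest MED within each neighbor AS, then lowest IGP cost overall."""
    groups = {}
    for path in paths:
        groups.setdefault(path.neighbor_as, []).append(path)
    survivors = []
    for group in groups.values():
        lowest_med = min(path.med for path in group)
        survivors.extend(path for path in group if path.med == lowest_med)
    return min(survivors, key=lambda path: path.igp_cost)


def select(rr, peer_path):
    """Best path at rr, given the path (or None) advertised by the other RR."""
    candidates = list(LOCAL[rr])
    if peer_path is not None:
        # A path learned via the other RR costs one more IGP hop.
        candidates.append(peer_path._replace(igp_cost=peer_path.igp_cost + LINK_COST))
    return best_path(candidates)


def advertised(rr, best):
    """An RR sends its best path to the other RR only if a client supplied it."""
    return best if best in LOCAL[rr] else None


def simulate(turns):
    """Alternate C and B updates; return the (B best, C best) pair after each turn."""
    best = {"B": select("B", None), "C": select("C", None)}
    history = [(best["B"].name, best["C"].name)]
    for turn in range(turns):
        rr, peer = ("C", "B") if turn % 2 == 0 else ("B", "C")
        best[rr] = select(rr, advertised(peer, best[peer]))
        history.append((best["B"].name, best["C"].name))
    return history


history = simulate(4)
for step, (b_best, c_best) in enumerate(history, start=1):
    print(f"Step {step}: B selects {b_best}, C selects {c_best}")
```

After four turns the pair of selections is the same as at the start, so the sequence repeats indefinitely as long as nothing else changes.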
How to detect an oscillation?
• Observe the latest received routes:
– run every minute for 5 minutes
– #show ip route | include ^B_.*_00:00:
– prefixes that appear 60% of the time are probably oscillating
– full routing table must be traversed :(
• Via SNMP:
– poll ipRouteAge of ipRouteTable
• Observation should be made in the core (top level RR, backbone sub-AS)
• eBGP within a confed applies flap damping
• RR client may see only the replacement route
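The sampling heuristic above can be sketched as follows, assuming the output of each run has been saved as text. The sample lines are invented for illustration and only approximate the `show ip route` layout; `recent_bgp_prefixes` and `suspects` are hypothetical helper names.

```python
import re
from collections import Counter

# A BGP route ("B" code) whose age is below one minute (00:00:ss).
ROUTE_RE = re.compile(r"^B\S*\s+(\d+\.\d+\.\d+\.\d+(?:/\d+)?)\s.*\b00:00:\d{2}\b")


def recent_bgp_prefixes(sample_text):
    """Return the set of BGP prefixes in one sample that are younger than a minute."""
    prefixes = set()
    for line in sample_text.splitlines():
        match = ROUTE_RE.match(line)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def suspects(samples, threshold=0.6):
    """Prefixes that look freshly installed in at least `threshold` of the samples."""
    counts = Counter()
    for sample in samples:
        counts.update(recent_bgp_prefixes(sample))
    return sorted(p for p, n in counts.items() if n / len(samples) >= threshold)


# Five invented samples, one per minute.
samples = [
    "B    10.0.0.0/8 [200/0] via 192.0.2.1, 00:00:12\n"
    "B    192.168.1.0/24 [200/0] via 192.0.2.2, 00:00:40",
    "B    10.0.0.0/8 [200/0] via 192.0.2.1, 00:00:05",
    "B    10.0.0.0/8 [200/0] via 192.0.2.1, 00:00:31\n"
    "B    192.168.1.0/24 [200/0] via 192.0.2.2, 00:00:02\n"
    "B    172.16.0.0/16 [200/0] via 192.0.2.3, 00:00:50",
    "B    10.0.0.0/8 [200/0] via 192.0.2.1, 00:03:10\n"
    "B    192.168.1.0/24 [200/0] via 192.0.2.2, 00:00:17",
    "B    10.0.0.0/8 [200/0] via 192.0.2.1, 00:00:44",
]
print(suspects(samples))
```

The 60% threshold comes from the slide; a prefix seen once with a young age (172.16.0.0/16 here) is more likely a normal update than an oscillation.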
Shall we care?
• MED usage
• 34% of the prefixes we receive have the MED set
• 75% of our peers have > 1 prefix with MED set
• Potential ASes that can oscillate (AS received via > 2 peers)
• 60% (upper bound, as-path not taken into account!)
• Oscillation not propagated to customers because of damping
• Oscillation seen in our backbone :( but cured :)
Solutions
• configure bgp deterministic-med
• full iBGP mesh when you can
• do not listen to the MED (or only with stub-AS)
• set metric 0 on all prefixes
• bgp always-compare-med
• use local-pref to force decision
• exit no longer chosen by peer = more work :(
• allow peer to set local-pref using community
• protocol improvement
• RR/confederation should send more than just the best path
• closer to the iBGP full mesh :(
Conclusion
• It’s happening today :(
• It is possible to detect it
• Solutions (fixes) exist today
• Protocol improvement is on the way at the IETF
References
• [Cisco] http://www.cisco.com/warp/customer/770/fn12942.html
• [IETF] draft-ietf-idr-route-oscillation-00.txt, www.ietf.org
• [NANOG] NANOG 21 Atlanta February 2001, www.nanog.org
• [Griffin] http://www.research.att.com/~griffin/
• [detMED] http://www.cisco.com/warp/public/459/37.html
• [bgpDecision] http://www.cisco.com/warp/public/459/25.shtml
BGP Oscillation
comments? questions? experiences?
BGP Decision Process [bgpDecision]
1. Largest weight
2. Largest local preference
3. Locally originated
4. Shortest AS-Path length
5. Lowest origin
6. Lowest Multi Exit Discriminator (cisco default = 0 unless “bgp-bestpath-missing-as-worst”)
7. Prefer EBGP over IBGP (confederation EBGP = IBGP)
8. Lowest IGP metric
9. Lowest BGP router ID