Explicit Marking and Prioritized Treatment of Specific OSPF Packets for Faster Convergence and Improved Network Scalability and Stability
(draft-ietf-ospf-scalability-02.txt)

Anurag Maunder, Sanera Systems, amaunder@sanera.net
Gagan Choudhury, AT&T, gchoudhury@att.com
Vishwas Manral, NetPlane Systems, VishwasM@netplane.com
Vera Sapozhnikova, AT&T, sapozhnikova@att.com

The Basic Issue
• In Large Operational Networks Running Link-State Protocols we have Often Observed Sustained CPU Congestion (Often Memory Congestion as well) Caused by LSA Storms Triggered By
  • Link/Node Failures
  • Synchronization of Refreshes
  • Software Bugs or Procedural Errors
• Congestion is Reinforced by a Positive Feedback Loop
  • LSA retransmissions, possible packet drops, and possible link failures due to missed Hellos, followed by eventual recoveries, all generate more LSAs
• On Rare Occasions the Congestion Spreads to Many Nodes and Causes Significant Failures
• We Propose Prioritization of Hello and LSA Acknowledgment Packets to Improve Network Stability and Scalability
• Prioritized Treatment may be Facilitated by Special Marking
• “Smart” Proprietary Implementations are perhaps already doing this, but we propose these mechanisms as Best Current Practices so that all implementations benefit

Simulation Study
• Three Priority Scenarios
  1. Incoming LSUs, Hellos, and LSA Acks at the Same Priority
  2. Hellos have Priority over LSUs and LSA Acks
  3. Hellos and LSA Acks have Priority over LSUs
• Network Scenarios
  • Network 1: 100 Nodes, 1200 Links, Max Node Adjacency 50
  • Network 2: 50 Nodes, 600 Links, Max Node Adjacency 48
• LSA Scenarios
  • 1 Router LSA per Node, 1 TE LSA per Link
  • 1 Router LSA per Node, 10 ASE LSAs per Every Other Node
• LSA Retransmission Timer Value: 5 Seconds or 10 Seconds
• LSU Processing Time: ~1 ms or ~0.5 ms
• Hello/Router-Dead Interval: 10 Sec/40 Sec or 2 Sec/8 Sec
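
The three priority scenarios can be illustrated with a small two-class input queue. This is only a sketch under stated assumptions: the names `PacketType`, `HIGH_PRIORITY`, `InputQueue`, and `service_order` are hypothetical and not taken from the draft, and strict priority between the two classes is assumed because the slides do not say how the simulated node schedules them.

```python
from collections import deque
from enum import Enum


class PacketType(Enum):
    HELLO = "hello"
    LSA_ACK = "lsa_ack"
    LSU = "lsu"


# Packet types placed in the high-priority class under each scenario.
# Scenario 1: everything shares one class, so the high class is empty.
HIGH_PRIORITY = {
    1: frozenset(),
    2: frozenset({PacketType.HELLO}),
    3: frozenset({PacketType.HELLO, PacketType.LSA_ACK}),
}


class InputQueue:
    """Two-class input queue (hypothetical helper, strict priority assumed)."""

    def __init__(self, scenario):
        self.high_types = HIGH_PRIORITY[scenario]
        self.high = deque()
        self.low = deque()

    def enqueue(self, packet_type, payload=None):
        # Classify on arrival; FIFO order is kept within each class.
        queue = self.high if packet_type in self.high_types else self.low
        queue.append((packet_type, payload))

    def dequeue(self):
        # Always drain the high-priority class before the low one.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


def service_order(scenario, arrivals):
    """Return the order in which a backlog of packets would be processed."""
    queue = InputQueue(scenario)
    for packet_type in arrivals:
        queue.enqueue(packet_type)
    order = []
    while (item := queue.dequeue()) is not None:
        order.append(item[0])
    return order
```

With a backlog of two LSUs followed by a Hello and an LSA Ack, Scenario 1 serves the packets in arrival order, Scenario 2 moves only the Hello to the front, and Scenario 3 moves both the Hello and the LSA Ack ahead of the LSUs.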

Six Simulation Cases
• Case 1: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~1 ms, Hello/Router-Dead Interval = 10/40 Sec
• Case 2: Network 1, ASE LSAs, Retransmission Timer = 10 Sec, Proc. Time ~1 ms, Hello/Router-Dead Interval = 10/40 Sec
• Case 3: Network 1, Link LSAs, Retransmission Timer = 5 Sec, Proc. Time ~1 ms, Hello/Router-Dead Interval = 10/40 Sec
• Case 4: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~0.5 ms, Hello/Router-Dead Interval = 10/40 Sec
• Case 5: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~1 ms, Hello/Router-Dead Interval = 2/8 Sec
• Case 6: Network 2, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~1 ms, Hello/Router-Dead Interval = 10/40 Sec

Number of Non-Converged LSAs vs. LSA Storm
• Case 1, No Priority to Hello, Ack
• LSA Storm Starts Between 20 and 30 Seconds

LSA Storm Threshold for Sustained CPU Congestion
* Congestion Due to Retransmissions and Adjacency Loss Due to Missed Hello
** Congestion Due to Retransmissions Only (Adjacency Stays Up)

Proposal
• Process Critical OSPF Packets (Hello, LSA Ack) at Higher Priority Compared to Other OSPF Packets
• This May be Facilitated by Special Marking (e.g., use two Diffserv Codepoints for OSPF Packets, one for the Higher and the other for the Lower Priority Class)
• During Congestion, use Any Packet Received over an Interface as a Surrogate for Hello in order to Keep the Link Alive (Same Impact as Prioritized Hello)
• Other Potential OSPF Packets to Get High Priority
  • LSA Carrying Topology Change Information
  • Database Description Packet from Slave That is Used as Ack
• These or Similar Mechanisms are Perhaps Already Being Used in Smart Proprietary Implementations
• Proposal as BCP would Benefit All Implementations
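
The "any packet as a surrogate for Hello" idea can be sketched as a per-neighbor inactivity timer that is refreshed by non-Hello packets while the node is congested. This is a minimal illustration, not the draft's implementation: the class `NeighborLiveness` and the `congested` flag are hypothetical, and how a router decides that it is congested is left open here because the slides do not specify it.

```python
class NeighborLiveness:
    """Tracks the inactivity (Router-Dead) timer for one neighbor.

    Hypothetical helper: times are plain seconds supplied by the caller.
    """

    def __init__(self, dead_interval, now=0.0):
        self.dead_interval = dead_interval
        self.last_heard = now

    def on_packet(self, now, is_hello, congested):
        """Record a received packet; return True if it refreshed the timer.

        A Hello always refreshes the timer. Any other OSPF packet does so
        only while the node considers itself congested (assumed flag).
        """
        if is_hello or congested:
            self.last_heard = now
            return True
        return False

    def is_alive(self, now):
        # The adjacency is declared down once a full dead interval passes
        # without a refresh.
        return (now - self.last_heard) < self.dead_interval
```

With a 40-second Router-Dead Interval, an LSU received at t = 35 s during congestion keeps the neighbor alive at t = 60 s even if every Hello in between was dropped or still queued. Without the surrogate rule, the same neighbor would already have been declared down at t = 40 s.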