Backward Congestion Notification Version 2.0 Davide Bergamasco (davide@cisco.com) Rong Pan (ropan@cisco.com) Cisco Systems, Inc. IEEE 802.1 Interim Meeting Garden Grove, CA (USA) September 22, 2005
Credits • Valentina Alaria (Cisco) • Andrea Baldini (Cisco) • Flavio Bonomi (Cisco) • Manoj K. Wadekar (Intel)
BCN v2.0 • Mick expressed a desire to see an analytical study of BCN stability • BCN v2.0 improvements • Linear control loop allows analysis of stability • Simplified detection mechanism • Reduced signaling rate • Original BCN framework remains the same
Suggested BCN Message Format
 0                             15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+   DA = SA of sampled frame    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    SA = MAC Address of CP     +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   IEEE 802.1Q Tag or S-Tag                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        EtherType = BCN        |Version|       Reserved        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                             CPID                              +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Qoff              |            Qdelta             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Timestamp           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
|        First N bytes of sampled frame starting from DA        |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              FCS                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
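As an illustration of the BCN message layout, the following Python sketch packs the header fields with the standard `struct` module. The field widths come from the diagram: 6-byte MAC addresses, a 4-byte 802.1Q tag, a 4-bit Version with 12 Reserved bits, a 64-bit CPID, and 16-bit Qoff, Qdelta and Timestamp. Several details are assumptions: treating Qoff and Qdelta as signed, the EtherType value, the default N, and the function name `build_bcn_message`. The FCS is left to the MAC.

```python
import struct

# Hypothetical EtherType for BCN: the slide gives no value, so this uses the
# IEEE 802 "Local Experimental EtherType 1" (0x88B5) as a stand-in.
BCN_ETHERTYPE = 0x88B5
TPID_8021Q = 0x8100


def build_bcn_message(sampled_frame: bytes, cp_mac: bytes, cpid: int,
                      qoff: int, qdelta: int, timestamp: int,
                      vlan_tci: int = 0, version: int = 0,
                      n_bytes: int = 64) -> bytes:
    """Pack a BCN message (without FCS) following the suggested layout.

    Assumed widths: 6-byte MACs, 4-byte 802.1Q tag, 4-bit Version plus
    12-bit Reserved, 64-bit CPID, signed 16-bit Qoff/Qdelta, 16-bit Timestamp.
    """
    if len(cp_mac) != 6 or len(sampled_frame) < 12:
        raise ValueError("need a 6-byte CP MAC and a frame with DA and SA")
    da = sampled_frame[6:12]              # DA of the BCN = SA of the sampled frame
    header = struct.pack(
        "!6s6sHHHHQhhH",                  # network byte order, no padding
        da, cp_mac,
        TPID_8021Q, vlan_tci,             # IEEE 802.1Q tag
        BCN_ETHERTYPE,
        (version & 0xF) << 12,            # Version in the top 4 bits, Reserved = 0
        cpid, qoff, qdelta, timestamp,
    )
    # First N bytes of the sampled frame, starting from its DA
    return header + sampled_frame[:n_bytes]
```

The fixed part of the header is 34 bytes under these assumptions, so a message carrying 64 sampled bytes is 98 bytes before the FCS.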
Suggested RLT Tag Format
 0     3       7               15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+   DA of rate-limited frame    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   SA of rate-limited frame    +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        IEEE 802.1Q Tag or S-Tag of rate-limited frame         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        EtherType = RLT        |Version|       Reserved        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                             CPID                              +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Timestamp           |EtherType of rate-limited frame|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                 Payload of rate-limited frame                 +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              FCS                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
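The RLT tag can likewise be read as 14 bytes inserted between the 802.1Q tag and the original EtherType of the rate-limited frame: EtherType = RLT, Version/Reserved, CPID and Timestamp. The Python sketch below adds and removes such a tag under the same assumed field widths. It assumes the frame already carries a single 4-byte 802.1Q tag. The EtherType value and the function names are hypothetical.

```python
import struct

# Hypothetical RLT EtherType: the slide gives no value, so this uses the
# IEEE 802 "Local Experimental EtherType 2" (0x88B6) as a placeholder.
RLT_ETHERTYPE = 0x88B6
# EtherType = RLT, Version|Reserved, CPID, Timestamp (network byte order)
RLT_FIELDS = struct.Struct("!HHQH")
VLAN_END = 16                         # DA (6) + SA (6) + 802.1Q tag (4)


def add_rlt_tag(frame: bytes, cpid: int, timestamp: int, version: int = 0) -> bytes:
    """Insert an RLT tag right after the 802.1Q tag of an already-tagged frame.

    The original EtherType and payload follow the tag unchanged, matching the
    'EtherType of rate-limited frame' and 'Payload' fields of the layout.
    """
    if len(frame) < VLAN_END + 2:
        raise ValueError("expected DA, SA, 802.1Q tag and EtherType")
    tag = RLT_FIELDS.pack(RLT_ETHERTYPE, (version & 0xF) << 12, cpid, timestamp)
    return frame[:VLAN_END] + tag + frame[VLAN_END:]


def strip_rlt_tag(frame: bytes) -> tuple[bytes, int, int]:
    """Remove the RLT tag and return (original frame, CPID, Timestamp)."""
    ethertype, _ver_res, cpid, timestamp = RLT_FIELDS.unpack_from(frame, VLAN_END)
    if ethertype != RLT_ETHERTYPE:
        raise ValueError("frame does not carry an RLT tag")
    return frame[:VLAN_END] + frame[VLAN_END + RLT_FIELDS.size:], cpid, timestamp
```

Keeping the original EtherType inside the tag means removing the tag restores the frame byte for byte.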
Simulation Environment (1) [Figure: network topology with a core switch, edge switches ES1-ES6, TCP sources ST1-ST4, UDP sources SU1-SU4, and nodes SJ, SR1, SR2, DT, DU, DR1, DR2; legend: TCP Bulk, UDP On/Off, Congestion]
Simulation Environment (2) • Short Range, High Speed DC Network • Link Capacity = 10 Gbps • Switch latency = 1 µs • Link Length = 100 m (0.5 µs propagation delay) • Control loop • Delay ~ 3 µs • Parameters • w = 2 • Gi = 4 • Gd = 1/64 • Ru = 8 Mbps • Workload • ST1-ST4: 10 parallel TCP connections, each continuously transferring 1 MB • SU1-SU4: 64 KB bursts of UDP traffic starting at t = 10 ms
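The propagation delays quoted for the two environments are consistent with a signal speed of about 2 × 10^8 m/s, or 5 ns per metre. The short Python helper below assumes that speed and recomputes the delay for both link lengths used in the simulations (100 m and 20000 m).

```python
# Assumed signal propagation speed in the link: about 2e8 m/s (5 ns per metre).
PROPAGATION_SPEED_M_PER_S = 2e8


def propagation_delay_us(length_m: float) -> float:
    """One-way propagation delay of a link of the given length, in microseconds."""
    return length_m / PROPAGATION_SPEED_M_PER_S * 1e6


for length_m in (100, 20_000):
    print(f"{length_m} m -> {propagation_delay_us(length_m):.1f} us")
```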
BCN v2.0: Faster Transient Response, Higher Stability @ Steady State
Simulation Environment (3) • Long Range, High Speed DC Network • Link Capacity = 10 Gbps • Switch latency = 1 µs • Link Length = 20000 m (100 µs propagation delay) • Control loop • Delay ~ 200 µs • Parameters • w = 2 • Gi = 4 • Gd = 1/64 • Ru = 8 Mbps • Workload • ST1-ST4: 10 parallel TCP connections, each continuously transferring 1 MB • SU1-SU4: 64 KB bursts of UDP traffic starting at t = 10 ms
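The long-range loop is harder to control for a simple reason. By the time feedback reaches a source, that source may already have sent a full loop delay's worth of data at line rate. The Python sketch below computes this bandwidth-delay product for the two approximate loop delays, 3 µs and 200 µs at 10 Gbps. It is plain arithmetic on the figures above, not part of the BCN proposal.

```python
LINK_CAPACITY_BPS = 10e9  # 10 Gbps, as in both simulation environments


def bytes_during_loop_delay(delay_us: float,
                            capacity_bps: float = LINK_CAPACITY_BPS) -> float:
    """Bytes a source can send at full rate before feedback reaches it."""
    return capacity_bps * delay_us * 1e-6 / 8


for label, delay_us in (("short range", 3.0), ("long range", 200.0)):
    print(f"{label}: ~{bytes_during_loop_delay(delay_us):,.0f} bytes per loop delay")
```

This gives about 3.75 KB for the short-range network and about 250 KB for the long-range one, almost four times the size of the 64 KB UDP bursts in the workload.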
BCN v2.0: Much Higher Stability @ Steady State with Larger Loop Delays
Summary • BCN v2 has a number of advantages … • Can be studied analytically • Better protection of TCP flows in mixed TCP and UDP traffic scenarios • Detection algorithm independent of Switch implementation • Better Performance • Lower signaling frequency (from 10% to 1%) • Better stability • Increased tolerance to loop delays • … and one disadvantage • Slower convergence to fairness
Notation • N: Number of Flows • C: Link Capacity • τ: Round Trip Delay • w: Weight of the Derivative • Pm: Sampling Probability • Gi: Additive Increase Gain • Gd: Multiplicative Decrease Gain
Block Diagram of BCN Congestion Control [Figure: feedback loop with signals C, ∆R, R and q, gains Pm, Gi and Gd, and a time-delay block]
Non-linear Differential Equations • Link Control • Source Control, with separate cases for Fb(t-τ) > 0 and Fb(t-τ) < 0
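The slide's equations are not reproduced here. The Python sketch below illustrates the two source-control cases with a per-message rate update of the form commonly described for BCN:

- additive increase R ← R + Gi·Fb·Ru when Fb > 0;
- multiplicative decrease R ← R·(1 + Gd·Fb) when Fb < 0.

Several elements are assumptions, not taken from the slides:

- the feedback formula Fb = Qoff - w·Qdelta, with Qoff = Qeq - Q and Qdelta the queue change since the previous sample;
- the queue units;
- the rate floor;
- all names.

The defaults reuse w, Gi, Gd and Ru from Simulation Environment (2).

```python
from dataclasses import dataclass


@dataclass
class BcnParams:
    w: float = 2.0              # weight of the derivative term
    gi: float = 4.0             # additive increase gain
    gd: float = 1 / 64          # multiplicative decrease gain
    ru_bps: float = 8e6         # rate unit Ru (8 Mbps)
    min_rate_bps: float = 1e6   # hypothetical rate floor, not from the slides
    max_rate_bps: float = 10e9  # link capacity C


def feedback(q: float, q_prev: float, q_eq: float, w: float) -> float:
    """Assumed BCN feedback: Fb = Qoff - w * Qdelta, with Qoff = Qeq - Q."""
    q_off = q_eq - q            # positive when the queue is below equilibrium
    q_delta = q - q_prev        # positive when the queue is growing
    return q_off - w * q_delta


def update_rate(rate_bps: float, fb: float, p: BcnParams) -> float:
    """Piecewise source reaction: additive increase if Fb > 0, multiplicative decrease if Fb < 0."""
    if fb > 0:
        rate_bps += p.gi * fb * p.ru_bps
    elif fb < 0:
        rate_bps *= 1 + p.gd * fb
    # Keep the rate inside [floor, link capacity]
    return min(max(rate_bps, p.min_rate_bps), p.max_rate_bps)
```

With Gd = 1/64, a single message with Fb below -64 would make the multiplicative factor negative. The sketch therefore clamps the result to a floor; how a real rate limiter bounds Fb is not stated on these slides.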
Linearization Around Operating Point • Use feedback control theory to analyze local stability • Operating point: • R = C/N; • q’ = qeq – q = 0; • Linearization • Difficulty: the system responds differently depending on sgn(Fb(t-d)) • Fortunately, the response is a piecewise-linear function • Details are in the appendix
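This assumes the usual fluid model for the bottleneck queue, in which all N flows feed a link of capacity C. Under that model the operating point follows directly:

```latex
\frac{dq}{dt} = N\,R(t) - C = 0
\;\Longrightarrow\; R^{*} = \frac{C}{N},
\qquad q' = q_{eq} - q = 0 .
```

For example, with C = 10 Gbps and N = 50, the values used in the parameter example, R* = 200 Mbps per flow. Now suppose the source reacts additively when Fb > 0, with a rate change proportional to Gi·Fb, and multiplicatively when Fb < 0, with a change proportional to Gd·R·Fb. Evaluating R at R* = C/N then makes each branch linear in Fb, with gains proportional to Gi and to Gd·C/N respectively. This is one reading of the piecewise-linear remark above, valid only under that assumption.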
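Block Diagram of BCN Feedback Control [Figure: loop from R to q with feedback Fb; branches labeled Additive Increase (+) and Multiplicative Decrease (-); annotations: "lose 90° margin" and "add lead zero to compensate"]
The Effect of the Zero from the Time Domain's Perspective [Figure: R and q versus time; the zero corresponds to the dq/dt term]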
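The slide's argument is that the queue behaves as an integrator, costing 90° of phase, and that the derivative term weighted by w adds a lead zero that recovers some of it. The Python sketch below evaluates the textbook response (1 + sT)/s at a single frequency to show the effect. T is a hypothetical zero time constant; in BCN it would grow with w. This is not the full BCN loop transfer function.

```python
import cmath
import math


def integrator_with_zero(omega: float, t_zero: float) -> complex:
    """Frequency response of (1 + s*T) / s at s = j*omega (T = zero time constant)."""
    s = 1j * omega
    return (1 + s * t_zero) / s


def phase_deg(h: complex) -> float:
    """Phase of a complex frequency response, in degrees."""
    return math.degrees(cmath.phase(h))


# Pure integrator (T = 0) versus integrator plus zero, evaluated at omega = 1 rad/s
for t_zero in (0.0, 1.0, 4.0):
    print(f"T = {t_zero}: phase = {phase_deg(integrator_with_zero(1.0, t_zero)):.1f} deg")
```

At ω = 1 rad/s the pure integrator sits at -90°. Adding the zero with T = 1 lifts the phase to -45°, and T = 4 lifts it to about -14°. A larger zero time constant therefore leaves more phase margin at a given crossover frequency.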
Choosing Parameters – an example • Network conditions (10G link) • N = 50 • τ = 200 µs • Choose parameters such that the feedback loop is stable with a 35° phase margin • w = 4 • Gi = 2 Mbps • Gd = 1/128 • Pm = 0.01
Stability Result • With N = 50 and delay = 200 µs, the system is stable • The phase margin lets the system tolerate extreme network conditions of up to N → 1000 flows or τ → 1 ms before it oscillates [Figure annotation: lost 90° margin]
Simulation Result: A Stable System for N = 50, Delay = 200 µs
Simulation Result: The System Is Stable but on the Verge of Oscillation for N = 50, Delay = 1 ms
Change w = 4 → 1 • With w = 1, a system with N = 50 and delay = 200 µs already runs out of margin and is on the verge of oscillation • With w = 1 the effect of the zero diminishes, so the system cannot cope with a wide range of network conditions
Indeed, with w = 1.0 the system is stable but on the verge of oscillation even for N = 50, Delay = 200 µs
Requests to 802.1 • Start a Task Force on Congestion Management • Use BCN as a Baseline Proposal
Issue #1: Non-linearity • ISSUE: Overshoots and undershoots accumulate over time • SOLUTION: Signal only when • Q > Qeq && dQ/dt > 0 • Q < Qeq && dQ/dt < 0 • Easy to implement in hardware: just an Up/Down counter • Increment @ every enqueue • Decrement @ every dequeue • Reduces signaling rate by 50%! [Figure: queue length Q versus time t around Qeq, with dQ/dt signs + - - + + - - +; annotation: Stop Generation of BCN Messages]
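The up/down-counter scheme is simple enough to sketch directly. The Python below illustrates the idea; it is not a hardware design. The counter tracks the queue length in frames. The sign of dQ/dt is approximated by the change in the counter between consecutive sampling instants, an assumption since the slide does not say how dQ/dt is measured. The class and method names are also assumptions.

```python
class BcnDetector:
    """Queue tracker with the v2.0 signaling filter (hypothetical names)."""

    def __init__(self, q_eq: int):
        self.q_eq = q_eq
        self.q = 0          # up/down counter: current queue length in frames
        self.q_last = 0     # counter value at the previous sampling instant

    def enqueue(self) -> None:
        self.q += 1         # increment at every enqueue

    def dequeue(self) -> None:
        if self.q > 0:
            self.q -= 1     # decrement at every dequeue

    def should_signal(self) -> bool:
        """Call at each sampling instant; True only when Q moves away from Qeq."""
        dq = self.q - self.q_last   # sign approximates dQ/dt since the last sample
        self.q_last = self.q
        return (self.q > self.q_eq and dq > 0) or (self.q < self.q_eq and dq < 0)
```

For example, with Qeq = 5, filling the queue to 8 frames and then sampling triggers a message. Draining back toward Qeq between samples does not.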