Inferring a Network Congestion Map with Zero Traffic Overhead
Florin Dinu, T. S. Eugene Ng
Rice University
Effects of Congestion
Need to identify, quantify, and localize congestion.
The Vision: Passively Inferred Congestion Map
[Figure: congestion map over routers R0 to R6 and X7, X8 spanning AS1 and AS2]
• Without any dedicated measurement (probing) traffic
• At fine time granularities (seconds)
• With good accuracy
How does it work? Why does it work? Where is it applicable?
Benefits of Passive Inference
Passive inference is complementary to active reporting.
Overview – Passively Inferring Congestion Maps
[Figure: routers R0 to R6 and X7, X8 across AS1 and AS2]
• Step 1:
  • Use congestion markings from existing traffic
  • Get path-level congestion information
  • Routers are AQM/ECN capable and can mark existing traffic
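Overview – Passively Inferring Congestion Maps
[Figure: path MPs P06 and P04 are known; the MP P46 of the segment from R4 to R6 is unknown]
• Step 2:
  • Use topological information to complete the congestion map
  • With independent marking at each router, $1 - P_{06} = (1 - P_{04})(1 - P_{46})$, hence $P_{46} = \mathrm{func}(P_{06}, P_{04}) = \frac{P_{06} - P_{04}}{1 - P_{04}}$
Next: expanding on Step 1, i.e., obtaining path-level congestion from AQM/ECN markings.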
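A minimal Python sketch of the Step 2 decomposition, assuming independent marking as above. The name `segment_mp` and the clamping of the result to [0, 1] (to absorb noise in measured MPs) are our own illustrative choices, not part of the original method.

```python
def segment_mp(p_whole: float, p_prefix: float) -> float:
    """Infer the MP of the segment after a known prefix path, e.g. P46 from P06 and P04.

    Assumes independent marking: 1 - p_whole = (1 - p_prefix) * (1 - p_segment).
    """
    if not 0.0 <= p_prefix < 1.0:
        # At p_prefix = 1 every packet is already marked, so the segment is unobservable.
        raise ValueError("prefix MP must be in [0, 1)")
    raw = (p_whole - p_prefix) / (1.0 - p_prefix)
    # Measured MPs are noisy, so the raw value may fall slightly outside [0, 1].
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    # P06 = 0.28 and P04 = 0.10 give P46 = 0.18 / 0.90
    print(round(segment_mp(0.28, 0.10), 6))  # 0.2
```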
AQM Background
• AQM = Active Queue Management
• A router marks or drops packets probabilistically as a function of congestion severity
• There are many different definitions of congestion severity
[Figure: marking probability (MP) vs. congestion severity for REM and for RED, PI]
We use the marking probability (MP) as the congestion measure.
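To make "marking probability as a function of congestion severity" concrete, here is a sketch of two textbook marking functions: basic RED, which is linear in the average queue length between two thresholds (the count-based correction of full RED is omitted), and REM, which marks with probability 1 - φ^(-price) for a constant φ > 1. PI is not shown. All parameter values below are illustrative assumptions.

```python
def red_mp(avg_queue: float, min_th: float, max_th: float, max_p: float) -> float:
    """Basic RED marking probability: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def rem_mp(price: float, phi: float = 1.001) -> float:
    """REM marking probability for a given link 'price' (phi > 1 is a constant)."""
    return 1.0 - phi ** (-price)


if __name__ == "__main__":
    # Illustrative thresholds in packets; illustrative prices for REM.
    for q in (10, 30, 50, 70):
        print("RED", q, round(red_mp(q, min_th=20, max_th=60, max_p=0.1), 3))
    for price in (0, 500, 1000, 2000):
        print("REM", price, round(rem_mp(price), 3))
```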
ECN Background – Marking Data Packets
• ECN = Explicit Congestion Notification
[Figure: data packets from sender S to destination D cross an AQM/ECN router]
• Data packets are marked probabilistically
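At the packet level, ECN uses the two low-order bits of the IP TOS / Traffic Class byte (RFC 3168). The sketch below shows how an AQM decision taken with probability `mp` becomes a CE mark for an ECN-capable packet and a drop otherwise; `aqm_mark` is a hypothetical helper, not a router API.

```python
import random
from typing import Tuple

# The ECN field is the two least-significant bits of the IPv4 TOS /
# IPv6 Traffic Class byte (RFC 3168).
NOT_ECT = 0b00  # transport is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, ECT(1)
ECT_0 = 0b10    # ECN-capable transport, ECT(0)
CE = 0b11       # Congestion Experienced


def ecn_field(tos: int) -> int:
    return tos & 0b11


def aqm_mark(tos: int, mp: float, rng: random.Random) -> Tuple[int, bool]:
    """Apply an AQM decision with probability mp; return (new_tos, dropped).

    An ECN-capable packet is marked CE instead of being dropped; a Not-ECT
    packet is dropped. The upper (DSCP) bits are left untouched."""
    if rng.random() >= mp:
        return tos, False          # no congestion action for this packet
    if ecn_field(tos) == NOT_ECT:
        return tos, True           # cannot be marked, so it is dropped
    return (tos & ~0b11) | CE, False


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    marked = sum(aqm_mark(ECT_0, 0.2, rng)[0] == CE for _ in range(n))
    print(f"fraction marked: {marked / n:.3f}")  # close to 0.2
```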
Use of the Data Markings
[Figure: example ingress paths P30, P40, P60 among routers R0 to R6]
• Data markings describe congestion on routers' ingress paths
• Data packet marking is probabilistic, so
  • the ratio of marked data packets gives the MP on the ingress path
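Assuming each data packet is marked independently with the ingress-path MP, the fraction of CE-marked data packets is a binomial estimate of that MP. The hypothetical helper below also returns the usual binomial standard error sqrt(p(1 - p)/n), which shrinks as more packets are observed per interval.

```python
import math
from typing import Iterable, Tuple


def mp_from_data_marks(marks: Iterable[bool]) -> Tuple[float, float]:
    """Estimate the ingress-path MP as the fraction of CE-marked data packets.

    Returns (estimate, standard_error), assuming independent per-packet marking."""
    observed = list(marks)
    n = len(observed)
    if n == 0:
        raise ValueError("no data packets observed")
    p = sum(observed) / n
    return p, math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    p, se = mp_from_data_marks([True, False, False, False] * 250)
    print(f"MP ~ {p:.3f} +/- {se:.3f}")
```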
ECN Background – Echoing
Echoing the markings from data packets to ACKs:
[Figure: DATA packets from S to D, ACKs from D back to S]
The ACK markings are an altered version of the data packet markings.
ECN Background – Responding to Markings
• Responding to marked ACKs: the sender S reacts to a marked ACK and sets CWR on its next DATA packet
• Stopping the echoing: D stops marking ACKs after receiving a CWR packet
The ACK markings are an altered version of the data packet markings.
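The echoing behaviour can be sketched as a small receiver state machine, following the RFC 3168 rule that the receiver keeps setting ECE until it sees a CWR. This assumes one ACK per data packet (no delayed ACKs) and no reordering; `ecn_echo` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def ecn_echo(data: Sequence[Tuple[bool, bool]]) -> List[bool]:
    """Simplified ECN receiver with one ACK per data packet.

    data: (ce, cwr) flags of each arriving data packet, in order.
    Returns the ECE flag of each ACK. After a CE-marked packet the receiver keeps
    setting ECE until a CWR packet arrives; a CWR packet that is itself CE-marked
    turns echoing straight back on."""
    echoing = False
    ece: List[bool] = []
    for ce, cwr in data:
        if cwr:
            echoing = False  # sender has reacted; stop echoing
        if ce:
            echoing = True   # new congestion indication
        ece.append(echoing)
    return ece


if __name__ == "__main__":
    data = [(False, False), (True, False), (False, False), (False, True), (False, False)]
    print(ecn_echo(data))  # [False, True, True, False, False]
```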
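Groups – Effect of ECN Echoing
• Echoing turns the ACK stream into alternating groups of marked and unmarked ACKs
• Groups of unmarked ACKs can have "size zero": when the CWR data packet is itself marked, echoing resumes immediately and no unmarked ACK appears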
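A router that sees only the ACK path can recover groups of unmarked ACKs as maximal runs between marked ACKs. The sketch below (hypothetical name `unmarked_groups`) also shows why groups of size zero are invisible from the ACK path alone.

```python
from typing import List, Optional, Sequence


def unmarked_groups(ece: Sequence[bool]) -> List[int]:
    """Sizes of complete groups of unmarked ACKs, seen on the ACK path only.

    A group is a maximal run of unmarked ACKs between two marked ACKs. Runs cut
    off by the start or end of the trace are incomplete and are dropped. Groups
    of size zero cannot be detected: two consecutive marked ACKs look the same
    whether or not a CWR separated them."""
    sizes: List[int] = []
    run: Optional[int] = None  # None until the first marked ACK is seen
    for marked in ece:
        if marked:
            if run is not None and run > 0:
                sizes.append(run)
            run = 0
        elif run is not None:
            run += 1
    return sizes


if __name__ == "__main__":
    print(unmarked_groups([False, True, False, False, True, True, False, True, False]))  # [2, 1]
```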
Use of the ACK Markings
[Figure: forward paths P03, P04, P05 among routers R0 to R6]
• ACK markings describe congestion on the forward paths of the flows
• ACK markings describe congestion on routers' egress paths
• The ratio of marked ACKs is an inaccurate measure
ACK markings are very important and more challenging to use.
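Obtaining MP from ACK Markings
• p = MP on the forward path
• A group of unmarked ACKs has size n with probability $(1-p)^n \cdot p$, so
  $\mathrm{AVG\_SZ\_UNMARKED} = \sum_{n=0}^{\infty} n \cdot (1-p)^n \cdot p = \frac{1-p}{p}$, i.e., $p = \frac{1}{1 + \mathrm{AVG\_SZ\_UNMARKED}}$
To get the MP, we need to compute the average size of groups of unmarked ACKs.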
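A quick numerical check of the formula and its inversion: a truncated version of the sum matches (1 - p)/p, and inverting an average group size recovers p. The function names are illustrative.

```python
def avg_group_size(p: float) -> float:
    """Mean of P(n) = (1 - p)^n * p, n = 0, 1, 2, ..., in closed form."""
    return (1.0 - p) / p


def mp_from_avg_group_size(avg: float) -> float:
    """Invert avg = (1 - p) / p, giving p = 1 / (1 + avg)."""
    if avg < 0:
        raise ValueError("average group size must be non-negative")
    return 1.0 / (1.0 + avg)


if __name__ == "__main__":
    p = 0.2
    truncated = sum(n * (1 - p) ** n * p for n in range(10_000))
    print(round(truncated, 6), avg_group_size(p))
    print(mp_from_avg_group_size(avg_group_size(p)))
```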
Average Size of Groups of Unmarked ACKs
[Figure: an Estimation Interval (EI) divided into Sampling Intervals (SI) and starting with a training period; Flow1 to Flow5, some not selected]
• Select flows until a limit is reached
• During the training period, only select flows; do not compute samples
• For each following SI:
  • Sample = average size of the groups of unmarked ACKs that finish in that SI
  • Discard groups that start or end in a different EI
• At the end of the EI, use AVG(SAMPLES) = (1-p)/p to obtain p
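The procedure above can be sketched for one path and one EI. This is a sketch under stated assumptions, not the authors' implementation: groups are assumed to be already extracted with start and end times in seconds, flows are assumed to be selected in order of their first group during the training period, and `Group` and `estimate_mp` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Group:
    """A completed group of unmarked ACKs on one path (hypothetical record)."""
    flow: int     # flow identifier
    start: float  # time the group starts (s)
    end: float    # time the group finishes (s)
    size: int     # number of unmarked ACKs in the group


def estimate_mp(groups: List[Group], ei_start: float, ei_len: float,
                si_len: float, training: float, max_flows: int) -> Optional[float]:
    """Inferred MP for one Estimation Interval (EI), or None if no sample was taken."""
    ei_end = ei_start + ei_len
    sampling_start = ei_start + training

    # Training period: select flows (by their first group) until the limit is reached.
    selected = set()
    for g in sorted(groups, key=lambda grp: grp.start):
        if ei_start <= g.start < sampling_start and len(selected) < max_flows:
            selected.add(g.flow)

    # One bucket per Sampling Interval (SI), keyed by the SI in which a group finishes.
    buckets: Dict[int, List[int]] = {}
    for g in groups:
        if g.flow not in selected:
            continue
        if g.start < ei_start or g.end > ei_end:
            continue  # group starts or ends in a different EI
        if g.end < sampling_start:
            continue  # no samples during the training period
        si = int((g.end - sampling_start) // si_len)
        buckets.setdefault(si, []).append(g.size)

    # Sample = average group size in each SI; the EI estimate averages the samples.
    samples = [sum(sizes) / len(sizes) for sizes in buckets.values()]
    if not samples:
        return None
    avg = sum(samples) / len(samples)
    return 1.0 / (1.0 + avg)  # invert AVG = (1 - p) / p
```

With an EI of 3 s, SI = 0.5 s and a 1 s training period, a group that ends at t = 1.3 s falls into the first SI, while a group that ends after t = 3 s is discarded because it crosses into the next EI.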
Optimization – the Use of Groups of Size Zero
[Figure: a CWR DATA packet followed directly by a marked ACK forms a group of size zero]
• The probability that a group has size zero is $(1-p)^0 \cdot p = p$
• If p is high, most groups will be of size zero
• Using groups of size zero gives better statistical significance
• Routers need to be on both the data and the ACK path of a flow
Using groups of size zero increases accuracy.
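When a router also sees the data path, it can open a group at every CWR and close it at the next marked ACK, which makes groups of size zero countable. A sketch under the same one-ACK-per-data-packet assumption; `groups_with_cwr` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def groups_with_cwr(trace: Sequence[Tuple[bool, bool]]) -> List[int]:
    """Group sizes for a router on both the data and the ACK path of a flow.

    trace: (cwr, ack_marked) for each data packet and its ACK, in order.
    A group opens at each CWR data packet and closes at the next marked ACK;
    its size is the number of unmarked ACKs in between, which is zero when the
    ACK of the CWR packet is itself marked. A group still open at the end of
    the trace is incomplete and is dropped."""
    sizes: List[int] = []
    open_size: Optional[int] = None
    for cwr, ack_marked in trace:
        if cwr:
            open_size = 0
        if open_size is None:
            continue  # not inside a group
        if ack_marked:
            sizes.append(open_size)
            open_size = None
        else:
            open_size += 1
    return sizes


if __name__ == "__main__":
    trace = [(False, True), (True, False), (False, False), (False, True),
             (False, True), (True, True), (False, True), (True, False)]
    print(groups_with_cwr(trace))  # [2, 0]
```

On the same trace, an observer limited to the ACK path would see only the group of size 2.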
Evaluation – Parameter Settings
• ns-2 simulations, 500 s simulation time
• AQM algorithms: RED, PI, REM (RED by default)
• SI = 0.5 s (a congestion sample is computed every 0.5 s)
• Monitor at most 1000 flows per EI and path
• Groups of size zero used in all experiments
Evaluation – Traffic & Topology
[Figure: chain of routers R0, R1, ..., R10 (up to hop 10) with UDP sources along the chain]
• 5 ms link delay, 500 Mbps link bandwidth
• R0 to Ri: 250·i² TCP flows
• Ri to Ri+2: 100 TCP flows
• Metric: 50th and 90th percentiles of |inferred MP - real MP| for each link
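The per-link accuracy metric can be computed as below. The slides do not say which percentile definition is used; the nearest-rank method here is an assumption, as are the function names and the pairing of inferred and real MPs per estimate.

```python
import math
from typing import Sequence, Tuple


def nearest_rank(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 < q <= 100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def link_error_percentiles(inferred: Sequence[float], real: Sequence[float]) -> Tuple[float, float]:
    """50th and 90th percentiles of |inferred MP - real MP| over one link's estimates."""
    if len(inferred) != len(real) or not inferred:
        raise ValueError("need two non-empty series of equal length")
    errors = [abs(i - r) for i, r in zip(inferred, real)]
    return nearest_rank(errors, 50), nearest_rank(errors, 90)


if __name__ == "__main__":
    print(link_error_percentiles([0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]))
```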
Evaluation – vs. Baseline Solution
• Our group-based solution (GROUP): ACK markings altered by ECN echoing
• Baseline solution with no alteration (REFERENCE): ACK markings not altered by echoing
[Figure: GROUP vs. REFERENCE]
Sensitivity to the Length of the EI
[Figure: inference error vs. value of EI (s), log scale]
Accuracy decreases with hop count, but the error is within 0.1 in most cases.
Sensitivity to Drastic Changes
• UDP sources vary their sending rate by 50 Mbps, between 250 Mbps and 750 Mbps
• Every 10 s, we start 3000 TCP flows between random nodes, each running for a random time (0 to 10 s)
How well does our solution track these sudden and large variations?
Sensitivity to Drastic Changes
[Figure: 90th and 50th percentile errors for EI = 3 s and EI = 10 s]
Accuracy decreases with hop count, but the error is within 0.1 to 0.15 in most cases.
Sensitivity to AQM Marking Function
[Figure: marking/drop probability vs. congestion severity for REM and for RED, PI]
• Why does REM perform much worse?
  • Abrupt variations in marking probability
  • Limited visibility
A linear marking function allows better inference for our solution.
Limited Visibility
[Figure: routers R0, R1, R2; P10 and P20 are observed, P12 = ??]
• R1 marks 100% of packets; R2 marks 30% of packets
• If P20 = P10 = 100%, P12 is unknown (any value is possible)
• At high MP (below 100%), the problem still exists because very few packets are left unmarked
Limited visibility appears at high MP and is more likely with REM.
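The loss of visibility follows from the independent-marking composition behind Step 2: a path's MP is 1 - ∏(1 - p_i) over its routers. The sketch (hypothetical `path_mp`) shows that once one router marks every packet, the path MP no longer depends on the others, and that near saturation even a large change at another router barely moves the observed value.

```python
def path_mp(*link_mps: float) -> float:
    """MP of a path whose routers mark independently: 1 - prod(1 - p_i)."""
    unmarked = 1.0
    for p in link_mps:
        unmarked *= 1.0 - p
    return 1.0 - unmarked


if __name__ == "__main__":
    # R1 marks every packet: the observed path MP is 1 whatever R2 does.
    print([path_mp(1.0, x) for x in (0.0, 0.3, 0.9)])  # [1.0, 1.0, 1.0]
    # Below 100%, R2's MP (0.3 vs 0.5) still shows up, but only faintly when R1's MP is high.
    for first in (0.2, 0.99):
        print(first, round(path_mp(first, 0.3), 4), round(path_mp(first, 0.5), 4))
```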
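Sensitivity to Dropped ACKs – Numerical
• ACKs can be dropped by non-AQM/ECN routers
• Pure ACKs can be dropped even by AQM/ECN routers (pure ACKs are not ECN-capable, so they cannot be marked)
• Example: group sizes 4, 5, 1, 5 give an average size of 3.75; with dropped ACKs, group sizes 8, 1, 4 give an average size of 4.33
Dropped ACKs can modify the average size of groups of unmarked ACKs.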
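The slide's numbers can be reproduced with a toy ACK trace. Which ACKs are dropped is our own choice, picked so that the result matches the slide: losing the marked ACK between the first two groups merges them, and one unmarked ACK is lost from each of the two resulting large groups. Applying p = 1 / (1 + AVG) then shows how the drops bias the inferred MP. Function names are hypothetical.

```python
from typing import List, Optional, Sequence


def ack_trace(sizes: Sequence[int]) -> List[bool]:
    """ACK marks (True = marked) for the given group sizes, each group bounded by marked ACKs."""
    acks = [True]
    for s in sizes:
        acks += [False] * s + [True]
    return acks


def unmarked_groups(acks: Sequence[bool]) -> List[int]:
    """Sizes of complete runs of unmarked ACKs between two marked ACKs."""
    sizes: List[int] = []
    run: Optional[int] = None  # None until the first marked ACK has been seen
    for marked in acks:
        if marked:
            if run is not None and run > 0:
                sizes.append(run)
            run = 0
        elif run is not None:
            run += 1
    return sizes


if __name__ == "__main__":
    acks = ack_trace([4, 5, 1, 5])
    # Illustrative drops: index 5 is the marked ACK between the first two groups;
    # indices 6 and 14 are unmarked ACKs.
    dropped = {5, 6, 14}
    survivors = [a for i, a in enumerate(acks) if i not in dropped]
    for name, trace in (("no drops", acks), ("with drops", survivors)):
        groups = unmarked_groups(trace)
        avg = sum(groups) / len(groups)
        print(name, groups, round(avg, 2), "inferred MP", round(1 / (1 + avg), 3))
```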
Sensitivity to Dropped ACKs – Numerical
At reasonable drop probabilities, the additional error is low.
Other Advantages of Our Solution
• Incremental deployment
  • On specific paths
  • Around non-AQM/ECN routers
• Useful in heterogeneous environments
  • Different AQM types
Related Work
• Re-ECN [SIGCOMM 2005], ConEx IETF WG
  • Extends ECN with one step
  • Sources re-echo congestion information from ACK markings
  • A router on the forward path knows upstream, downstream, and whole-path congestion
  • Useful for traffic policing or traffic management
  • Lower precision, limited by the available header bits
  • Needs modifications to ECN and to headers
  • Does not address the challenge posed by ACK markings
  • Does not go beyond path-level congestion inference
Conclusion
• Novel method for inferring congestion with zero network overhead
• Does not require changes to hosts, headers, or protocols
• Incrementally deployable and useful in heterogeneous environments
• Good accuracy even in very congested environments
Thank you
Credits for the pictures:
• http://networkequipment.net/wp-content/uploads/2011/02/voip-telephone.jpg
• http://www.freefoto.com/images/04/28/04_28_50---US-Dollar-Bills_web.jpg
• http://www.ciscorouting.com/routing_engine.jpg
• http://www.rvoice.co.uk/uploads/Image/Green%20Tick.jpg
Why Not Use the Ratio for ACK Markings?
The ratio of marked ACKs is very inaccurate; a better solution is needed.
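A toy simulation illustrates the point. It assumes one ACK per data packet, independent marking of data packets with probability p, and a CWR that reaches the receiver a fixed d packets after echoing starts (d stands in for roughly one round trip of packets; its value here is arbitrary). Under these assumptions the fraction of marked ACKs is close to d / (d + (1 - p)/p), far from p, while the average size of groups of unmarked ACKs still recovers p. This is not the paper's ns-2 setup; `simulate` is a hypothetical name.

```python
import random
from typing import List, Optional, Tuple


def simulate(p: float, d: int, n: int, seed: int = 1) -> Tuple[List[bool], List[int]]:
    """Toy model of one ECN flow over n data packets (all assumptions, see lead-in).

    Returns the ACK marks and the sizes of completed groups of unmarked ACKs;
    each group starts at a CWR and may have size zero."""
    if d < 1:
        raise ValueError("d must be at least 1")
    rng = random.Random(seed)
    acks: List[bool] = []
    groups: List[int] = []
    echoing = False
    cwr_at: Optional[int] = None  # packet index at which the CWR arrives
    group: Optional[int] = None   # size of the currently open group
    for t in range(n):
        ce = rng.random() < p
        if t == cwr_at:            # CWR arrives: stop echoing and open a new group
            echoing, cwr_at, group = False, None, 0
        if ce and not echoing:     # start echoing and close the open group, if any
            echoing, cwr_at = True, t + d
            if group is not None:
                groups.append(group)
                group = None
        if not echoing and group is not None:
            group += 1             # this ACK is unmarked and belongs to the group
        acks.append(echoing)
    return acks, groups


if __name__ == "__main__":
    p = 0.05
    acks, groups = simulate(p, d=10, n=200_000)
    ratio = sum(acks) / len(acks)
    group_est = 1.0 / (1.0 + sum(groups) / len(groups))
    print(f"true MP {p}, ratio of marked ACKs {ratio:.3f}, group-based {group_est:.3f}")
```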
Effects of Using Delayed ACK – Numerical
Additional error introduced by the use of delayed ACKs.
Sensitivity to Bandwidth (EI = 3 s)
Accuracy increases with bandwidth.
Sensitivity to Flow Size (EI = 3 s)
Good accuracy even with many small flows.
Severity of False Positives (EI = 3 s)
Small false positives are inherent in a probabilistic approach.
Granularity of Inference
[Figure: example paths P40 and P06 among routers R0 to R6; Sampling Intervals (SI) within an Estimation Interval (EI)]
estimate(P06) = AVG({samples(P06)})
Implementation
• Per-path counters
  • Length and number of all groups of unmarked ACKs
• Per-flow counters
  • Current group of unmarked ACKs
• Prefix matching for source and destination
• Transport protocol header matching for flow identification
• Sequence numbers for CWR
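A sketch of the counters listed above, using a 5-tuple from the IP and transport headers as the flow key and longest-prefix matching on source and destination to pick the path. The class names, the flow-key layout, and the rule that a CWR opens a group (taken from the size-zero optimization) are assumptions; the use of sequence numbers for CWR and any per-EI reset of the counters are not modelled.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key, always in the data direction:
# (source address, destination address, source port, destination port, protocol).
FlowKey = Tuple[str, str, int, int, int]
PathKey = Tuple[str, str]  # (source prefix, destination prefix)


@dataclass
class PathCounters:
    total_len: int = 0   # summed size of all completed groups of unmarked ACKs
    num_groups: int = 0  # number of completed groups

    def mp(self) -> Optional[float]:
        """Invert AVG = (1 - p) / p; None until a group has completed."""
        if self.num_groups == 0:
            return None
        return 1.0 / (1.0 + self.total_len / self.num_groups)


class Monitor:
    """Sketch of a monitor on both the data and the ACK path of the flows."""

    def __init__(self, prefixes: List[str]) -> None:
        # Most specific prefix first, so the first match is the longest match.
        self.prefixes = sorted((ipaddress.ip_network(p) for p in prefixes),
                               key=lambda net: net.prefixlen, reverse=True)
        self.paths: Dict[PathKey, PathCounters] = defaultdict(PathCounters)
        self.current: Dict[FlowKey, int] = {}  # per-flow size of the open group

    def _match(self, addr: str) -> Optional[str]:
        ip = ipaddress.ip_address(addr)
        for net in self.prefixes:
            if ip in net:
                return str(net)
        return None

    def path_of(self, flow: FlowKey) -> Optional[PathKey]:
        src, dst = self._match(flow[0]), self._match(flow[1])
        return (src, dst) if src is not None and dst is not None else None

    def on_data(self, flow: FlowKey, cwr: bool) -> None:
        """Data packet: a CWR opens a new group for a flow on a known path."""
        if cwr and self.path_of(flow) is not None:
            self.current[flow] = 0

    def on_ack(self, flow: FlowKey, ece: bool) -> None:
        """ACK of `flow`: an unmarked ACK extends the open group, a marked one closes it."""
        if flow not in self.current:
            return
        if ece:
            path = self.path_of(flow)
            if path is None:
                return
            counters = self.paths[path]
            counters.total_len += self.current.pop(flow)
            counters.num_groups += 1
        else:
            self.current[flow] += 1


if __name__ == "__main__":
    m = Monitor(["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/16"])
    f = ("10.1.2.3", "192.168.1.1", 5000, 80, 6)
    m.on_data(f, cwr=True); m.on_ack(f, ece=False); m.on_ack(f, ece=False); m.on_ack(f, ece=True)
    m.on_data(f, cwr=True); m.on_ack(f, ece=True)  # group of size zero
    for path, counters in m.paths.items():
        print(path, counters, counters.mp())
```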
Coverage of Congestion Maps
• Six real network topologies (Internet2, TEIN2, iLight, GEANT, SUNET, NLR)
• Assume an all-to-all traffic pattern
• Average congestion map coverage:
  • NLR, Internet2, GEANT: ~60%
  • TEIN2: ~91%
  • iLight: ~94%
  • SUNET: ~95%