End-to-End Detection of Shared Bottlenecks Sridhar Machiraju and Weidong Cui Sahara Winter Retreat 2003
Problem Statement • Given 2 end-to-end flows f1 and f2, do they share a bottleneck (a congested link, i.e., a link with packet drops)? • Equivalently: given 2 routes R1 and R2 on the Internet, do they share a bottleneck link?
Why is this hard? • No information from the network • Only information available – delay and drops. • Lots of noise – delay from intermediate links and drops on other links • Bottlenecks may change over time
Why solve this problem? • Overlays • RON – decide whether rerouting flows bypasses congestion points • RON – does such rerouting affect existing flows? Which ones? • Cooperative overlays – an overlay does not want to share a bottleneck with a “friendly overlay” • OverQoS – useful to cluster overlay links based on shared bottlenecks
Why solve this problem (cont.)? • Other applications • Massive backups of data from different servers – do them in parallel? • Content distribution – is the use of multipath going to improve performance? • Kazaa – parallel downloads from peers • Multihomed ASs can evaluate the “orthogonality” in terms other than fault-tolerance
Related Work • Past work done only with Y or inverted-Y topologies, using Poisson probes, packet pairs, and inter-arrival times. [Figure: Y and inverted-Y topologies with senders and receivers]
Goals • Provide a general solution for double-Y topology • Work with multiple bottlenecks and provide an indicator of shared congestion • Be able to use active probe flows and also passively observed (TCP) flows • Complexity issues for clustering flows
Motivation of Our Techniques • Droptail queues + TCP – queues alternate between bursty loss periods and loss-free periods • Queues build up until a loss burst occurs, then drain before building up again • This motivates correlating periods of drops and delays (delay is proportional to queue size) • But…
Synchronization Lag [Figure: packets 0, 1, 2, … of Flow 1 (Sender 1) and Flow 2 (Sender 2) sent at interval T, with delays d1 and d2 to the bottleneck; in the example the synchronization lag is 3T. Note: the synchronization lag is bounded by RTTmax/2]
Overview of Our Techniques • We propose 2 techniques – • Probability Distribution (PD) technique • Cross-Correlation (CC) technique • PD is based on the peak of the discrete probability distribution of the minimum time between a drop of one flow and a drop of the other • CC is based on finding the maximum cross-correlation over a range of assumed synch. lags
PD Technique • For each dropped packet of one flow, plot the PD of the minimum time difference between its sending time and the sending times of dropped packets of the other flow • If the bottleneck is shared, we expect (ideally) a peak of 1 at d2 − d1 + the synchronization lag; both flows may not see drops during the same burst, so use a threshold < 1 for the peak • We may see more than 1 drop in a burst, so cluster drops into bursts and use the time differences between burst starts
PD technique (contd.) [Figure: packet losses and per-flow delays (Delay1, Delay2) illustrating the robustness issue] • Robustness issue: the synch. lag must be smaller than the time difference between consecutive drops of a flow • A minimal sketch of the PD computation follows below
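As a concrete illustration of the PD technique, here is a minimal Python sketch. It assumes each flow's drops are given as sending timestamps in seconds; the helper names, the 50 ms burst-clustering gap, the 10 ms bin width, and the 0.5 peak threshold are illustrative assumptions, not the authors' parameters.

```python
from collections import Counter

def cluster_bursts(drop_times, gap=0.05):
    """Group drops separated by less than `gap` seconds into one burst;
    return the start time of each burst."""
    bursts = []
    for t in sorted(drop_times):
        if not bursts or t - bursts[-1][-1] > gap:
            bursts.append([t])
        else:
            bursts[-1].append(t)
    return [b[0] for b in bursts]

def pd_peak(drops1, drops2, bin_width=0.01):
    """Fraction of probability mass in the tallest bin of the distribution of
    (signed) time differences to the closest drop burst of the other flow."""
    starts1, starts2 = cluster_bursts(drops1), cluster_bursts(drops2)
    if not starts1 or not starts2:
        return 0.0
    diffs = [min((t2 - t1 for t2 in starts2), key=abs) for t1 in starts1]
    bins = Counter(round(d / bin_width) for d in diffs)
    return max(bins.values()) / len(diffs)

def shared_bottleneck_pd(drops1, drops2, peak_threshold=0.5):
    # Ideally the peak is 1 at d2 - d1 + lag; since both flows may not see
    # drops in every burst, a threshold below 1 is used.
    return pd_peak(drops1, drops2) >= peak_threshold
```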
Cross-Correlation (CC) Technique • Key ideas • Two “back-to-back” packets from two different flows will experience similar packet drop/delay at the bottleneck • If we can generate two sequences of “back-to-back” packets from two different flows, then we can calculate the cross-correlation coefficient of their losses or delays to measure their “similarity”. • If the cross-correlation coefficient is greater than some threshold, then the two flows share a bottleneck.
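The slides do not spell out the coefficient; assuming the standard sample (Pearson) cross-correlation over the per-probe loss or delay samples x_i and y_i of the two flows, it would be:

\rho(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where \bar{x} and \bar{y} are the sample means; \rho close to 1 indicates the two flows see strongly correlated losses/delays.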
Questions about the CC Technique • How to generate two sequences of “back-to-back” packets? • UDP probes with a constant interval T • average interval <= T/2 • Shift the sequence to overcome the synch. lag • How long should the two sequences be to get a significant result? • When the CC coefficient becomes relatively stable • But no less than a minimum period of time • What should the threshold be? • Use 0.1 in the experiments • Why 0.1?
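A minimal sketch of one such constant-rate UDP prober, assuming the 40-byte / 10 ms probes used in the experiments; the destination, port, payload format, and the simple sleep-based pacing are illustrative assumptions (one prober per flow, so that probes of the two flows are interleaved within T/2).

```python
import socket
import struct
import time

def send_probes(dst, port=9000, interval=0.010, duration=600, size=40):
    """Send `size`-byte UDP probes to `dst` every `interval` seconds,
    stamping each with a sequence number and send time for later matching
    against the receiver-side arrival log."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq, start = 0, time.time()
    while time.time() - start < duration:
        payload = struct.pack("!Id", seq, time.time())      # 12-byte header
        sock.sendto(payload.ljust(size, b"\x00"), (dst, port))
        seq += 1
        time.sleep(interval)  # drift-prone; a real prober would resynchronize
    sock.close()
```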
Overcome the Synchronization Problem [Figure: Delay1 and Delay2 sequences with packet losses; shifting one sequence by 2 packets aligns them] • Find the max cross-correlation by shifting one of the two sequences within some range • The value of the optimal shift is an estimate of the synchronization lag.
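A minimal sketch of the shift-and-correlate step, assuming the two flows' per-probe delay (or 0/1 loss) samples have been aligned by probe index; the `max_shift` range and function names are illustrative, while the 0.1 threshold is the value used in the experiments.

```python
import math

def corr(xs, ys):
    """Sample (Pearson) cross-correlation coefficient of two sequences,
    truncated to their common length."""
    n = min(len(xs), len(ys))
    if n == 0:
        return 0.0
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def max_shifted_corr(xs, ys, max_shift=10):
    """Shift one sequence by up to max_shift probes in either direction and
    keep the largest coefficient; the best shift estimates the synch. lag."""
    best_c, best_s = 0.0, 0
    for s in range(-max_shift, max_shift + 1):
        c = corr(xs[max(s, 0):], ys[max(-s, 0):])
        if c > best_c:
            best_c, best_s = c, s
    return best_c, best_s   # (coefficient, shift in probe intervals)

def shared_bottleneck_cc(xs, ys, threshold=0.1):
    return max_shifted_corr(xs, ys)[0] > threshold
```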
Wide-Area Experiments • Challenges • Access to hosts distributed globally? • How to verify our experimental results? • Solutions • PlanetLab (http://www.planet-lab.org) • Set up an overlay network with double-Y topology • Application-level routers monitor losses and delays
Topology with Shared Bottleneck (I) [Figure: overlay topology over PlanetLab nodes at Vancouver, Bologna, Seattle, Wisc, Atlanta, and Sydney, with the two flows sharing a bottleneck link]
Topology without Shared Bottleneck (II) [Figure: overlay topology over the same PlanetLab nodes (Vancouver, Bologna, Seattle, Wisc, Atlanta, Sydney), with the two flows not sharing a bottleneck link]
Experimental Setup • Active Probing • 40 bytes per packet • Every 10 ms • Log packet arrival times on every node • Loss information can also be derived from these logs • Traces from 10 to 60 minutes • Threshold = 0.1 for the PD and CC techniques
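For completeness, a minimal sketch of how the per-probe loss and delay sequences consumed by the techniques above could be reconstructed from the send/arrival logs; the log representation (dicts keyed by probe sequence number) and the carry-forward of the last delay for lost probes are assumptions made for illustration.

```python
def build_sequences(sent, received):
    """sent: {seq: send_time}; received: {seq: recv_time}.
    Returns parallel loss (0/1) and one-way-delay sequences indexed by seq."""
    losses, delays = [], []
    last_delay = 0.0
    for seq in sorted(sent):
        if seq in received:
            last_delay = received[seq] - sent[seq]
            losses.append(0)
        else:
            losses.append(1)
        delays.append(last_delay)  # reuse last observed delay for lost probes
    return losses, delays
```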
Overall Results [Table: per-experiment results for the PD and CC techniques, with failed cases highlighted]
Why does the Delay CC Technique fail? • Delay spikes on the non-shared part of the path.
Why does the PD Technique fail? • Large synchronization lag • Too few drops at the bottleneck
Open Issues • Parameter Selection • What should the thresholds be? • Active vs. Passive Probing • Active probing: wastes network resources • Passive probing: cannot control the size/rate of the probing sequences • Multiple Bottlenecks • Our techniques are not limited to single-bottleneck cases • But they need more quantitative evaluation • Probability of sharing a bottleneck • How often should we generate probing sequences to detect whether two flows share a bottleneck? • Can we give a probability rather than a 0-1 decision?
Conclusions • Problem • Detect if 2 end-to-end flows share a bottleneck • Challenge • Synchronization lag in the double-Y topology • Techniques • The Probability Distribution Technique • The Loss/Delay Cross-Correlation Technique • Experimental Results • The Loss CC technique succeeds in all experiments • The Delay CC technique fails in some experiments due to delay spikes on the non-shared part of the path • The PD technique fails in some experiments due to a large synch. lag and too few losses at the bottleneck