End-to-End Detection of Shared Bottlenecks Sridhar Machiraju and Weidong Cui Sahara Winter Retreat 2003
Problem Statement • Given 2 end-to-end flows f1 and f2, do they share a bottleneck (a congested link, i.e., a link with packet drops)? • Equivalently: given 2 routes R1 and R2 on the Internet, do they share a bottleneck link?
Why is this hard? • No information from the network • Only information available – delay and drops. • Lots of noise – delay from intermediate links and drops on other links • Bottlenecks may change over time
Why solve this problem? • Overlays • RON – decide whether rerouting flows bypasses congestion points • RON – does such rerouting affect existing flows? Which ones? • Cooperative overlays – an overlay does not want to share a bottleneck with a “friendly overlay” • OverQoS – useful to cluster overlay links based on shared bottlenecks
Why solve this problem (cont.)? • Other applications • Massive backups of data from different servers – do them in parallel? • Content distribution – is the use of multipath going to improve performance? • Kazaa – parallel downloads from peers • Multihomed ASs can evaluate the “orthogonality” in terms other than fault-tolerance
Related Work • Past work done only with Y or inverted-Y topologies, using Poisson probes, packet pairs, and inter-arrival times. [Figure: Y and inverted-Y topologies with senders and receivers]
Goals • Provide a general solution for double-Y topology • Work with multiple bottlenecks and provide an indicator of shared congestion • Be able to use active probe flows and also passively observed (TCP) flows • Complexity issues for clustering flows
Motivation of Our Techniques • Droptail queues + TCP – queues alternate between bursty loss periods and loss-free periods • Queues build up until a loss burst occurs, then drain before building up again • This motivates correlating periods of drops and delays (delay is proportional to queue size) • But…
Synchronization Lag [Figure: packets 0, 1, 2, … of Flow 1 (Sender 1) and Flow 2 (Sender 2) sent at interval T, with delays d1 and d2 to the bottleneck; in the example the synchronization lag is 3T. Note: the synchronization lag is bounded by RTTmax/2]
Overview of Our Techniques • We propose 2 techniques – • Probability Distribution (PD) technique • Cross-Correlation (CC) technique • PD is based on the peak of the discrete probability distribution of the minimum time between a drop of one flow and a drop of the other • CC is based on finding the maximum cross-correlation over a range of assumed synch. lags
PD Technique • For each dropped packet of one flow, plot the PD of the minimum time difference between its sending time and the sending times of dropped packets of the other flow • If the bottleneck is shared, we expect (ideally) a peak of 1 at d2 − d1 + the synchronization lag; both flows may not see drops during the same burst, so use a threshold < 1 for the peak • We may see more than 1 drop in a burst, so cluster drops into bursts and use the time differences between burst starts
PD technique (contd.) [Figure: packet losses and per-flow delays (Delay1, Delay2) illustrating the robustness issue] • Robustness issue: the synch. lag must be smaller than the time difference between consecutive drops of a flow • A minimal sketch of the PD computation follows below
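As a concrete illustration of the PD technique, here is a minimal Python sketch. It assumes each flow's drops are given as sending timestamps in seconds; the helper names, the 50 ms burst-clustering gap, the 10 ms bin width, and the 0.5 peak threshold are illustrative assumptions, not the authors' parameters.

```python
from collections import Counter

def cluster_bursts(drop_times, gap=0.05):
    """Group drops separated by less than `gap` seconds into one burst;
    return the start time of each burst."""
    bursts = []
    for t in sorted(drop_times):
        if not bursts or t - bursts[-1][-1] > gap:
            bursts.append([t])
        else:
            bursts[-1].append(t)
    return [b[0] for b in bursts]

def pd_peak(drops1, drops2, bin_width=0.01):
    """Fraction of probability mass in the tallest bin of the distribution of
    (signed) time differences to the closest drop burst of the other flow."""
    starts1, starts2 = cluster_bursts(drops1), cluster_bursts(drops2)
    if not starts1 or not starts2:
        return 0.0
    diffs = [min((t2 - t1 for t2 in starts2), key=abs) for t1 in starts1]
    bins = Counter(round(d / bin_width) for d in diffs)
    return max(bins.values()) / len(diffs)

def shared_bottleneck_pd(drops1, drops2, peak_threshold=0.5):
    # Ideally the peak is 1 at d2 - d1 + lag; since both flows may not see
    # drops in every burst, a threshold below 1 is used.
    return pd_peak(drops1, drops2) >= peak_threshold
```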
Cross-Correlation (CC) Technique • Key ideas • Two “back-to-back” packets from two different flows will experience similar packet drop/delay at the bottleneck • If we can generate two sequences of “back-to-back” packets from two different flows, then we can calculate the cross-correlation coefficient of their losses or delays to measure their “similarity”. • If the cross-correlation coefficient is greater than some threshold, then the two flows share a bottleneck.
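The slides do not spell out the coefficient; assuming the standard sample (Pearson) cross-correlation over the per-probe loss or delay samples x_i and y_i of the two flows, it would be:

\rho(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where \bar{x} and \bar{y} are the sample means; \rho close to 1 indicates the two flows see strongly correlated losses/delays.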
Questions about the CC Technique • How to generate two sequences of “back-to-back” packets? • UDP probes with a constant interval T • average interval <= T/2 • Shift the sequence to overcome the synch. lag • How long should the two sequences be to get a significant result? • When the CC coefficient becomes relatively stable • But no less than a minimum period of time • What should the threshold be? • Use 0.1 in the experiments • Why 0.1?
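A minimal sketch of one such constant-rate UDP prober, assuming the 40-byte / 10 ms probes used in the experiments; the destination, port, payload format, and the simple sleep-based pacing are illustrative assumptions (one prober per flow, so that probes of the two flows are interleaved within T/2).

```python
import socket
import struct
import time

def send_probes(dst, port=9000, interval=0.010, duration=600, size=40):
    """Send `size`-byte UDP probes to `dst` every `interval` seconds,
    stamping each with a sequence number and send time for later matching
    against the receiver-side arrival log."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq, start = 0, time.time()
    while time.time() - start < duration:
        payload = struct.pack("!Id", seq, time.time())      # 12-byte header
        sock.sendto(payload.ljust(size, b"\x00"), (dst, port))
        seq += 1
        time.sleep(interval)  # drift-prone; a real prober would resynchronize
    sock.close()
```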
Overcome the Synchronization Problem [Figure: Delay1 and Delay2 sequences with packet losses; shifting one sequence by 2 packets aligns them] • Find the max cross-correlation by shifting one of the two sequences within some range • The value of the optimal shift is an estimate of the synchronization lag.
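A minimal sketch of the shift-and-correlate step, assuming the two flows' per-probe delay (or 0/1 loss) samples have been aligned by probe index; the `max_shift` range and function names are illustrative, while the 0.1 threshold is the value used in the experiments.

```python
import math

def corr(xs, ys):
    """Sample (Pearson) cross-correlation coefficient of two sequences,
    truncated to their common length."""
    n = min(len(xs), len(ys))
    if n == 0:
        return 0.0
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def max_shifted_corr(xs, ys, max_shift=10):
    """Shift one sequence by up to max_shift probes in either direction and
    keep the largest coefficient; the best shift estimates the synch. lag."""
    best_c, best_s = 0.0, 0
    for s in range(-max_shift, max_shift + 1):
        c = corr(xs[max(s, 0):], ys[max(-s, 0):])
        if c > best_c:
            best_c, best_s = c, s
    return best_c, best_s   # (coefficient, shift in probe intervals)

def shared_bottleneck_cc(xs, ys, threshold=0.1):
    return max_shifted_corr(xs, ys)[0] > threshold
```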
Wide-Area Experiments • Challenges • Access to hosts distributed globally? • How to verify our experimental results? • Solutions • PlanetLab (http://www.planet-lab.org) • Set up an overlay network with double-Y topology • Application-level routers monitor losses and delays
Topology with Shared Bottleneck (I) [Figure: overlay topology over PlanetLab nodes at Vancouver, Bologna, Seattle, Wisc, Atlanta, and Sydney, with the two flows sharing a bottleneck link]
Topology without Shared Bottleneck (II) [Figure: overlay topology over the same PlanetLab nodes (Vancouver, Bologna, Seattle, Wisc, Atlanta, Sydney), with the two flows not sharing a bottleneck link]
Experimental Setup • Active Probing • 40 bytes per packet • Every 10 ms • Log packet arrival times on every node • Loss information can also be derived from these logs • Traces from 10 to 60 minutes • Threshold = 0.1 for the PD and CC techniques
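For completeness, a minimal sketch of how the per-probe loss and delay sequences consumed by the techniques above could be reconstructed from the send/arrival logs; the log representation (dicts keyed by probe sequence number) and the carry-forward of the last delay for lost probes are assumptions made for illustration.

```python
def build_sequences(sent, received):
    """sent: {seq: send_time}; received: {seq: recv_time}.
    Returns parallel loss (0/1) and one-way-delay sequences indexed by seq."""
    losses, delays = [], []
    last_delay = 0.0
    for seq in sorted(sent):
        if seq in received:
            last_delay = received[seq] - sent[seq]
            losses.append(0)
        else:
            losses.append(1)
        delays.append(last_delay)  # reuse last observed delay for lost probes
    return losses, delays
```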
Overall Results [Table: per-experiment results for the PD and CC techniques, with failed cases highlighted]
Why does the Delay CC Technique fail? • Delay spikes on the non-shared part of the path.
Why does the PD Technique fail? • Large synchronization lag • Too few drops at the bottleneck
Open Issues • Parameter Selection • What should the thresholds be? • Active vs. Passive Probing • Active probing: wastes network resources • Passive probing: cannot control the size/rate of the probing sequences • Multiple Bottlenecks • Our techniques are not limited to single-bottleneck cases • But they need more quantitative evaluation • Probability of sharing a bottleneck • How often should we generate probing sequences to detect whether two flows share a bottleneck? • Can we give a probability rather than a 0-1 decision?
Conclusions • Problem • Detect if 2 end-to-end flows share a bottleneck • Challenge • Synchronization lag in the double-Y topology • Techniques • The Probability Distribution Technique • The Loss/Delay Cross-Correlation Technique • Experimental Results • The Loss CC technique succeeds in all experiments • The Delay CC technique fails in some experiments due to delay spikes on the non-shared part of the path • The PD technique fails in some experiments due to a large synch. lag and too few losses at the bottleneck