Predicting and Bypassing End-to-End Internet Service Degradation
Anat Bremler-Barr, Edith Cohen, Haim Kaplan, Yishay Mansour
Tel-Aviv University, AT&T Labs
Talk: Omer Ben-Shalom, Tel-Aviv University
Outline:
• Degradation: deviation from “normal” (minimum) RTT
• Predicting degradation: different predictors
• Performance evaluation: precision/recall methodology
• Suggested application: gateway selection
Motivating Application
[Figure: intelligent routing device, peering links, AS 41, AS 123, AS 56, AS 12]
• Gateway selection (intelligent routing device)
• Choosing peering links
Data and Measurements: Sources
• Base measurements from 4 different locations (ASes), simulating 4 gateways:
  • California (CA): AT&T (CA1) + ACIRI (CA2)
  • New Jersey (NJ): AT&T (NJ1) + Princeton (NJ2)
Data and Measurements: Destinations
• Obtaining a representative set of web servers and their weights (derived from a proxy log)
Data and Measurements: RTT
• Data: one week of RTT (SYN) measurements, end to end (path + server)
• Hourly measurements to 35,124 servers
• Once-a-minute measurements to a weighted sample of 100 servers
Degradation: Definition
• Deviation from the minimum recorded RTT (which approximates the propagation delay)
• Discrete degradation levels 1-6
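The slides do not give the thresholds that separate levels 1-6, so this Python sketch uses hypothetical cutoffs (`LEVEL_CUTOFFS_MS` is an assumption, not the authors' values). It shows how RTTs could be mapped to discrete levels relative to the minimum recorded RTT. It also shows how a level-l failure sequence could be derived, assuming "failure at level l" means a degradation level of at least l.

```python
from bisect import bisect_right

# HYPOTHETICAL cutoffs: excess RTT (ms above the minimum) needed to reach
# levels 1..6. The slides do not state the real thresholds.
LEVEL_CUTOFFS_MS = [25, 50, 100, 200, 400, 800]


def degradation_levels(rtts_ms):
    """Map each RTT to a discrete degradation level 0..6.

    The baseline is the minimum recorded RTT (a proxy for propagation delay);
    level k means the excess over the baseline is at least LEVEL_CUTOFFS_MS[k-1].
    """
    baseline = min(rtts_ms)
    return [bisect_right(LEVEL_CUTOFFS_MS, rtt - baseline) for rtt in rtts_ms]


def failures_at_level(levels, l):
    """0/1 failure sequence for level l (assumed: failure iff level >= l)."""
    return [1 if lev >= l else 0 for lev in levels]
```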
Objective: Avoiding Degradation
• Attempt to reroute through a different gateway.
• Two conditions must hold:
  • We must be able to predict the failure through a gateway.
  • A substitute gateway must be available (low correlation between gateways).
• Blackout: consecutive degradation through one gateway.
Blackout durations
• The longer the blackout, the easier it is to predict.
• Most blackouts are short: 1-3 consecutive measurement points.
• However, a considerable fraction of blackouts last longer.
[Figure: long-duration blackout]
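Here is a minimal sketch that assumes the measurements for one gateway and destination are already reduced to a 0/1 failure sequence. Blackout durations are then the lengths of maximal runs of consecutive failures. The function names are illustrative only.

```python
from itertools import groupby


def blackout_durations(failures):
    """Lengths of maximal runs of consecutive failures (1s) in a 0/1 sequence."""
    return [sum(1 for _ in run) for value, run in groupby(failures) if value == 1]


def fraction_short(durations, max_len=3):
    """Share of blackouts lasting at most max_len consecutive points."""
    if not durations:
        return 0.0
    return sum(1 for d in durations if d <= max_len) / len(durations)
```

For example, `blackout_durations([0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1])` yields runs of length 2, 1 and 4.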
Gateway Correlation
• Gateways are correlated, but the correlation is often not strong.
Gateway Correlation
• Longer blackouts are more likely to be shared by several gateways, which suggests a failure closer to the server.
• Most blackouts shared by two gateways involved same-coast pairs.
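One rough way to quantify gateway correlation takes two time-aligned 0/1 failure sequences for the same destination. First, estimate the conditional probability that gateway B is degraded when A is. Then label each blackout on A as shared if B is degraded throughout it. This definition of "shared" is an assumption made for illustration.

```python
def conditional_failure_rate(fail_a, fail_b):
    """Estimate P(B degraded | A degraded) from aligned 0/1 sequences."""
    both = sum(1 for a, b in zip(fail_a, fail_b) if a and b)
    a_count = sum(fail_a)
    return both / a_count if a_count else 0.0


def shared_blackouts(fail_a, fail_b):
    """List (duration, shared) for each blackout on A.

    Assumed definition: a blackout is shared if B is degraded at every
    time point of A's blackout.
    """
    result = []
    i, n = 0, len(fail_a)
    while i < n:
        if fail_a[i]:
            j = i
            while j < n and fail_a[j]:
                j += 1
            result.append((j - i, all(fail_b[i:j])))
            i = j
        else:
            i += 1
    return result
```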
Building predictors
• For a given degradation level l.
• Prediction is per IP address.
• Input: previous RTT measurements for the IP address.
• Output: probability of a failure.
• Predict “failure” if the probability > Φ.
Precision/Recall Methodology
$$\text{Precision} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Predicted degraded}|}, \qquad \text{Recall} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Actual degraded}|}$$
Precision-recall curve
• Sweep the threshold Φ over [0,1] to obtain a precision-recall curve.
• In other words, let P(t) be the predicted failure probability at time t; a failure is predicted at time t if P(t) > Φ.
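The precision/recall definitions and the threshold sweep translate directly into code. In this sketch, `probs` holds P(t) and `actual` holds the 0/1 failures. Two choices are assumptions of the sketch, not taken from the slides: precision is set to 1.0 when nothing is predicted, and the sweep uses evenly spaced thresholds (`steps`).

```python
def precision_recall(probs, actual, phi):
    """Precision and recall when predicting failure iff P(t) > phi."""
    predicted = [p > phi for p in probs]
    tp = sum(1 for pr, a in zip(predicted, actual) if pr and a)
    n_pred = sum(predicted)
    n_act = sum(actual)
    precision = tp / n_pred if n_pred else 1.0  # convention: no predictions
    recall = tp / n_act if n_act else 0.0
    return precision, recall


def precision_recall_curve(probs, actual, steps=100):
    """List of (phi, precision, recall) for phi = 0, 1/steps, ..., 1."""
    return [(phi, *precision_recall(probs, actual, phi))
            for phi in (i / steps for i in range(steps + 1))]
```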
What is important for prediction?
• Recency principle
  • More recent RTTs are more important.
• Quantity principle
  • The more measurements, the higher the accuracy.
Recency Principle: Importance
• Test case: single-measurement predictor
  • Predict according to the measurement taken x minutes ago.
  • Observe how the quality of the prediction changes with x.
• There is a 15% difference between using the measurement from the last minute and the one from 15 minutes ago.
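This is a sketch of the single-measurement test case, assuming one measurement per step. The prediction for time t is the failure indicator observed x steps earlier. Because this predictor is binary, each lag x yields a single precision/recall point rather than a curve. The function name is hypothetical.

```python
def lagged_precision_recall(failures, x):
    """Precision/recall of predicting f[t] by f[t - x] (x >= 1 steps back)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    # (prediction, actual) for every t that has a measurement x steps earlier
    pairs = [(failures[t - x], failures[t]) for t in range(x, len(failures))]
    tp = sum(1 for pred, act in pairs if pred and act)
    n_pred = sum(pred for pred, _ in pairs)
    n_act = sum(act for _, act in pairs)
    precision = tp / n_pred if n_pred else 1.0
    recall = tp / n_act if n_act else 0.0
    return precision, recall
```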
Quantity Principle: Importance
• Test case: Fixed-Window-Count (FWC)
  • The prediction is the fraction of failures among the W most recent measurements.
• Using more measurements achieves better precision at high recall.
[Figure: precision-recall curves for FWC 1, FWC 5, FWC 10, FWC 50]
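FWC can be sketched as a sliding-window mean over the failure indicators. Two choices here belong to this sketch and are not stated on the slides: the running-sum update, and a prediction of 0.0 when there is no history yet.

```python
from collections import deque


def fwc_predictions(failures, window):
    """P(t) = fraction of failures among the `window` most recent measurements before t."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()
    total = 0  # number of failures currently in the window
    probs = []
    for f in failures:
        probs.append(total / len(recent) if recent else 0.0)
        recent.append(f)
        total += f
        if len(recent) > window:
            total -= recent.popleft()
    return probs
```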
Our predictors
• Exponential decay
• Polynomial decay
• Model-based predictors:
  • VW-Cover: Variable Window Cover algorithm
  • HMM: Hidden Markov Model
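Exponential-decay predictors
• The weight of each measurement decreases exponentially with its age, by a factor λ per time unit.
• The binary variable $f_t$ indicates a failure at time $t$.
• For consecutive measurements (one per time unit, unbounded history):
$$P(t+1) = \lambda P(t) + (1-\lambda) f_t$$
• In general, for measurements taken at times $t_1 < \dots < t_k < t$:
$$P(t) = \frac{\sum_{j=1}^{k} \lambda^{t-t_j} f_{t_j}}{\sum_{j=1}^{k} \lambda^{t-t_j}}$$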
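This sketch of an exponential-decay predictor handles irregularly spaced measurements. It assumes the prediction is the λ-weighted fraction of past failures, P(t) = Σ_j λ^(t−t_j) f(t_j) / Σ_j λ^(t−t_j). Multiplying both sums by λ^Δt whenever a new measurement arrives keeps the state constant-size. The class name is hypothetical.

```python
class ExpDecayPredictor:
    """Sketch: weighted failure fraction with weight lam**age per measurement."""

    def __init__(self, lam):
        if not 0.0 < lam < 1.0:
            raise ValueError("lam must be in (0, 1)")
        self.lam = lam
        self.num = 0.0  # decayed weight of failed measurements
        self.den = 0.0  # decayed weight of all measurements
        self.last_time = None

    def update(self, time, failed):
        """Add a measurement taken at `time` (non-decreasing times)."""
        if self.last_time is not None:
            decay = self.lam ** (time - self.last_time)
            self.num *= decay
            self.den *= decay
        self.num += 1.0 if failed else 0.0
        self.den += 1.0
        self.last_time = time

    def predict(self):
        """Current failure probability (0.0 before any measurement)."""
        return self.num / self.den if self.den else 0.0
```

Decay between the last measurement and the present scales the numerator and denominator equally, so `predict()` does not need the current time.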
Polynomial-decay predictors
• Exact computation requires maintaining the complete history.
• We use an approximation.
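The slides do not specify the polynomial weighting or the approximation the authors used. This sketch assumes weights of the form age^(−α), where α is a hypothetical parameter, and computes the prediction exactly. It shows why the full history is needed: unlike exponential decay, the prediction changes as time advances even when no new measurement arrives, so a constant-size state cannot maintain it.

```python
def poly_decay_prediction(history, now, alpha):
    """Exact polynomial-decay prediction at time `now`.

    history: list of (time, failed) pairs with time < now.
    Assumed weight of a measurement: (now - time) ** (-alpha).
    """
    num = den = 0.0
    for time, failed in history:
        w = (now - time) ** (-alpha)
        num += w if failed else 0.0
        den += w
    return num / den if den else 0.0
```

With `history = [(0, 1), (1, 0)]` and `alpha = 1.0`, the prediction moves from 1/3 at `now = 2` to 0.4 at `now = 3` without any new measurement.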
The VW-Cover predictor
• Consists of a list of pairs (a1, b1), (a2, b2), …, (an, bn).
• Predicts a failure if there exists i such that at least bi of the previous ai measurements are failures.
VW-Cover predictor: Building
• Build the predictor greedily to cover the failures.
• Use a learning set of measurements.
• Pick (a1, b1) to be the pair that maximizes precision.
• Pick (ai, bi) to be the pair that maximizes precision among the uncovered failures.
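This sketch covers both VW-Cover prediction and its greedy construction. Several details are assumptions that the slides do not state:
• A pair fires only when a full window of a_i past measurements exists.
• "Precision among uncovered failures" is read as precision over the time points that earlier pairs have not already predicted.
• Construction stops after `max_pairs` pairs, or when no candidate covers at least `min_new_hits` new failures. Both parameters are hypothetical.

```python
def fires(failures, t, a, b):
    """True if at least b of the a measurements before time t are failures."""
    if t < a:
        return False  # assumption: require a full window of a measurements
    return sum(failures[t - a:t]) >= b


def vw_cover_predict(pairs, failures, t):
    """Predict a failure at t if any (a, b) pair fires."""
    return any(fires(failures, t, a, b) for a, b in pairs)


def build_vw_cover(failures, candidates, max_pairs=5, min_new_hits=1):
    """Greedily add the candidate pair with the best precision on uncovered points."""
    covered = set()  # time points already predicted by chosen pairs
    chosen = []
    for _ in range(max_pairs):
        best, best_prec, best_hits = None, -1.0, set()
        for a, b in candidates:
            if (a, b) in chosen:
                continue
            hits = {t for t in range(len(failures))
                    if t not in covered and fires(failures, t, a, b)}
            tp = sum(failures[t] for t in hits)
            if not hits or tp < min_new_hits:
                continue
            prec = tp / len(hits)
            if prec > best_prec:
                best, best_prec, best_hits = (a, b), prec, hits
        if best is None:
            break
        chosen.append(best)
        covered |= best_hits
    return chosen
```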
Hidden Markov Model
• A finite set of states S (we use 3 states).
• Output probabilities $a_s(0)$, $a_s(1)$ for each state $s$.
• A transition function, which determines the probability distribution of the next state.
• The probability of a failure:
$$P(t) = \sum_{s \in S} p_s(t)\, a_s(1)$$
where $p_s(t)$ is the probability of being in state $s$ at time $t$. $p_s(t)$ is updated according to the output at time $t-1$.
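This is a forward-filtering sketch of an HMM failure predictor. At each step it computes P(t) = Σ_s p_s(t) a_s(1). It then conditions p_s on the observed output with Bayes' rule and propagates it one step through the transition function. The model parameters are taken as given: the slides do not show how the authors trained their 3-state model, and the argument names are illustrative.

```python
def hmm_failure_probs(observations, init, trans, emit_fail):
    """Return P(t) for each t, computed before observing f_t.

    init[s]:       initial state distribution p_s(0)
    trans[s][s2]:  probability of moving from state s to state s2
    emit_fail[s]:  a_s(1), probability that state s outputs a failure
                   (a_s(0) = 1 - a_s(1))
    """
    n = len(init)
    p = list(init)
    probs = []
    for f in observations:
        probs.append(sum(p[s] * emit_fail[s] for s in range(n)))
        # Condition on the observed output f_t (Bayes' rule) ...
        like = [emit_fail[s] if f else 1.0 - emit_fail[s] for s in range(n)]
        post = [p[s] * like[s] for s in range(n)]
        z = sum(post)
        # Assumption: if the output is impossible under the model, keep the prior.
        post = [x / z for x in post] if z > 0 else list(p)
        # ... then propagate one step through the transition function.
        p = [sum(post[s] * trans[s][s2] for s in range(n)) for s2 in range(n)]
    return probs
```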
Predictor Performance: Level 3
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
• At recall 0.5, precision is close to 0.9.
Predictor Performance: Level 6
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]
• Level-6 degradations are harder to predict: at recall 0.5, precision is about 0.4.
Predictor Performance: Conclusion
• The best predictors at levels 3 and 6 are VW-Cover and HMM.
• However, they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement.
Gateway Selection
[Figure: gateway-selection results, panels for level 6 and level 3]
Gateway Selection: Conclusion
• Active gateway selection reduced the degradation rate by 50% relative to the best single gateway.
• Static gateway selection can avoid at most 25% of degradations.
• Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
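This sketch of active gateway selection assumes time-aligned failure sequences and predicted probabilities for each gateway. At each step, the destination is reached through the gateway with the lowest predicted failure probability. The resulting degradation rate is then compared with that of the best single gateway. The gateway names in the example are hypothetical.

```python
def degradation_rate(failures):
    """Fraction of time points that are degraded."""
    return sum(failures) / len(failures)


def active_selection_rate(failures_by_gw, probs_by_gw):
    """Degradation rate when each step uses the gateway with the lowest predicted P(t)."""
    gws = list(failures_by_gw)
    steps = len(failures_by_gw[gws[0]])
    degraded = 0
    for t in range(steps):
        choice = min(gws, key=lambda g: probs_by_gw[g][t])
        degraded += failures_by_gw[choice][t]
    return degraded / steps


def best_single_gateway_rate(failures_by_gw):
    """Degradation rate of the best gateway used alone."""
    return min(degradation_rate(f) for f in failures_by_gw.values())
```

The relative reduction is `1 - active / best_single`. For example, it is 1.0 when perfect predictions route around every degradation.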
Correlation between coasts
• Gateway selection over a same-coast pair yielded only a 10% reduction.
• Choose independent gateways.
Controlling prediction overhead
• Types of measurements:
  • Active measurements: initiate probes (SYN, ping, HTTP request); these raise a scalability problem.
  • Passive measurements: collected from regular traffic.
• Controlling the prediction overhead:
  • Use less-recent measurements.
  • Send active measurements only to a small set of destinations that cover the majority of traffic.
  • Cluster destinations, so that the measurements of one destination can be used to predict another.
Questions?
natali@cs.tau.ac.il, edith@research.att.com, haimk@cs.tau.ac.il, mansour@cs.tau.ac.il