Trace-based Network Bandwidth Analysis and Prediction Yi QIAO 06/10/2002
OUTLINE • Introduction • Data Collection and Transformation • Basic Statistical Analysis of Bandwidth • Trace Classification • Bandwidth Prediction • Conclusion
Introduction • Fact: Network bandwidth is one of the most important characteristics of both WANs and LANs • We want to know: • What does a bandwidth time series look like? • Are there correlations between bandwidth values at different times? • Does bandwidth from different traces share any common properties? • Is network bandwidth predictable or not? • Are there any differences between bandwidth data from long-period traces and from short-period traces?
Step by step: Trace Collection and Transformation → Classification of the Traces → Bandwidth Prediction
2. Data Collection and Transformation • Three data sets: • NLANR short-period (90-second) WAN traces • AUCKLAND long-period (1-day) WAN traces • BC traces: 2 WAN traces and 2 LAN traces
Converting a trace file to bandwidth data: Original trace file (timestamp + IP header + TCP header) → timestamp + packet length (from the IP header) → assign packets to bins according to their timestamps and compute the instantaneous bandwidth of each bin → final bandwidth file
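The binning step can be sketched in Python. This is a minimal sketch, not the tool used for these traces: it assumes the trace has already been reduced to `(timestamp_seconds, ip_length_bytes)` pairs, reports bandwidth in bytes per second, starts the first bin at the first timestamp, and keeps empty bins as zero bandwidth. The name `packets_to_bandwidth` is hypothetical.

```python
import math
from typing import List, Tuple


def packets_to_bandwidth(packets: List[Tuple[float, int]], bin_size: float) -> List[float]:
    """Bin (timestamp_s, length_bytes) records and return bytes/second per bin.

    Hypothetical helper: a packet goes into bin floor((t - t0) / bin_size),
    where t0 is the first timestamp; empty bins stay at zero bandwidth.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if not packets:
        return []
    t0 = min(t for t, _ in packets)
    t_last = max(t for t, _ in packets)
    n_bins = int(math.floor((t_last - t0) / bin_size)) + 1
    bytes_per_bin = [0] * n_bins
    for t, length in packets:
        bytes_per_bin[int(math.floor((t - t0) / bin_size))] += length
    # Instantaneous bandwidth of a bin = bytes observed in it / bin width.
    return [b / bin_size for b in bytes_per_bin]


# Four packets binned at 0.1 s.
trace = [(0.0, 1500), (0.05, 500), (0.12, 1000), (0.31, 40)]
print(packets_to_bandwidth(trace, 0.1))
```

Multiplying each value by 8 gives bits per second if that is the preferred unit.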
3. Basic Statistical Analysis: After some basic statistical analysis of the bandwidth data (mean and maximum bandwidth, standard deviation of bandwidth, and so on), we get … [Table: correlation coefficients; relationship between mean, min and max bandwidth] Now, what's the effect of bin size on these properties? Correlation coefficients between bin size and: • COV (coefficient of variation): -0.4411 • Max/Mean: -0.2967 • Min/Mean: 0.6788. Larger bins therefore tend to lower COV and Max/Mean and raise Min/Mean.
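The ratios and correlation coefficients above can be computed as sketched below. This assumes COV is the population standard deviation divided by the mean and that the correlation is Pearson's r over (bin size, ratio) pairs; the slides do not say how traces were pooled, so the pooling, the helper names `bandwidth_ratios` and `pearson_r`, and the toy data are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def bandwidth_ratios(bw: Sequence[float]) -> Dict[str, float]:
    """COV (std/mean), Max/Mean and Min/Mean of one bandwidth series."""
    mean = statistics.fmean(bw)
    if mean == 0:
        raise ValueError("mean bandwidth is zero")
    return {
        "cov": statistics.pstdev(bw) / mean,
        "max_mean": max(bw) / mean,
        "min_mean": min(bw) / mean,
    }


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up toy data: one bandwidth series per bin size.
sizes = [0.001, 0.01, 0.1]
series = [[0, 0, 9, 1], [1, 0, 5, 4], [2, 3, 3, 2]]
covs = [bandwidth_ratios(s)["cov"] for s in series]
print(pearson_r(sizes, covs))
```

Correlating against the logarithm of the bin size instead of the raw value is an equally plausible choice; the slides do not specify which was used.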
[Figures: Relationship between bin size and COV; Relationship between bin size and Max/Mean]
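To study these properties at several bin sizes without re-reading the trace, a fine-grained series can be aggregated into coarser bins. The sketch assumes bandwidth is stored as a rate, so a coarse bin is the mean of the `factor` fine bins it covers (total bytes over the wider bin), and it drops a trailing partial bin. `rebin` is a hypothetical helper name, and the series in the usage line is a placeholder.

```python
from typing import List, Sequence


def rebin(bw: Sequence[float], factor: int) -> List[float]:
    """Aggregate a bandwidth (rate) series into bins `factor` times wider."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_full = len(bw) // factor  # a trailing partial bin is discarded
    return [sum(bw[i * factor:(i + 1) * factor]) / factor for i in range(n_full)]


# From a 0.001 s series to the 0.01 s, 0.1 s and 1 s views of the same trace.
fine = [float(i % 7) for i in range(5000)]  # made-up placeholder series
views = {f"{0.001 * f:g} s": rebin(fine, f) for f in (10, 100, 1000)}
print({size: len(series) for size, series in views.items()})
```

When the coarse bin edges line up with the fine ones, this gives the same result as binning the raw packets at the coarser size.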
4. Trace Classification: How to classify? • What does the time series plot look like? • What is the shape of the ACF plot? • What percentage of the ACFs is significant? • What best describes the distribution (histogram) of bandwidth? • What does the PSD plot look like? Does it decrease linearly (in a log-log plot) as the frequency increases? Result: 12 classes for NLANR traces, 8 classes for AUCKLAND traces.
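The "percentage of significant ACFs" criterion can be made concrete as follows. This sketch assumes a lag counts as significant when its sample autocorrelation falls outside the approximate 95% white-noise band ±1.96/√N; the slides do not state the exact test or the maximum lag, so `max_lag`, the function names and the synthetic series are illustrative.

```python
import math
import random
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(d[i] * d[i + k] for i in range(n - k)) / (n * c0)
        for k in range(1, max_lag + 1)
    ]


def significant_acf_fraction(x: Sequence[float], max_lag: int, z: float = 1.96) -> float:
    """Fraction of lags whose |r_k| exceeds the white-noise bound z / sqrt(N)."""
    bound = z / math.sqrt(len(x))
    return sum(abs(r) > bound for r in sample_acf(x, max_lag)) / max_lag


rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(2000)]  # about 5% of lags significant by chance
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.95 * ar1[-1] + rng.gauss(0, 1))  # strongly autocorrelated series
print(significant_acf_fraction(noise, 100), significant_acf_fraction(ar1, 20))
```

A trace resembling `noise` would fall into the "low percentage of significant ACFs" classes, and one resembling `ar1` into the predictable ones.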
I. NLANR short-period WAN traces classification: A. Class 1: Not predictable, under-utilized • Bin size: 0.001 s • ACF: small values; a low percentage of ACFs are significant • Bandwidth distribution: heavy-tailed, y = x^(-α) • PSD: flat; contains all frequency components, like white noise
Effect of different bin sizes: [Plots at bin sizes 0.01 s, 0.1 s and 1 s] Each bin size gives us some useful information, so we should examine all of these bin sizes for each trace.
B. Class 2: Little predictability, under-utilized • Bin size: 0.1 s for ACF; 0.001 s for other plots • ACF: small values; a low percentage of significant ACFs • Bandwidth distribution: multiple heavy-tailed distributions, y = x^(-α) • PSD: flat; contains all frequency components, like white noise
C. Class 2a: No predictability, well-utilized • Bin size: 0.1 s for ACF; 0.001 s for other plots • ACF: small values; a low percentage of significant ACFs • Bandwidth distribution: left branch, half a normal distribution; right branch, heavy-tailed distribution y = x^(-α) • PSD: flat; contains all frequency components, like white noise
D. Class 4: Some predictability, under-utilized • Bin size: 0.1 s for ACF; 0.001 s for other plots • ACF: over 50% of ACFs are significant • Bandwidth distribution: multiple heavy-tailed distributions of the form y = x^(-α) • PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant
E. Class 5: Some predictability, fairly utilized • Bin size: 0.01 s for ACF; 0.001 s for other plots • ACF: over 50% of ACFs are significant, with high-frequency oscillation • Bandwidth distribution: left branch, half a normal distribution; right branch, heavy-tailed distribution y = x^(-α) • PSD: a dominant frequency (or frequency band) component
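The class descriptions above repeatedly describe the bandwidth distribution, or its right branch, as heavy-tailed, y = x^(-α). A rough way to estimate α is to fit a straight line to a log-log histogram, as sketched below with logarithmically spaced bins and least squares. The threshold `x_min`, the bin count, the minimum count per bin and the function name are assumptions; a maximum-likelihood (Hill-type) estimator is usually more robust than a histogram fit.

```python
import math
import random
from typing import Sequence


def powerlaw_density_exponent(samples: Sequence[float], x_min: float,
                              n_bins: int = 20, min_count: int = 20) -> float:
    """Estimate alpha in density ~ x^(-alpha) from samples >= x_min."""
    tail = [s for s in samples if s >= x_min]
    log_span = math.log(max(tail) / x_min)
    if log_span == 0:
        raise ValueError("all tail samples are equal")
    edges = [x_min * math.exp(log_span * i / n_bins) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for s in tail:
        counts[min(int(n_bins * math.log(s / x_min) / log_span), n_bins - 1)] += 1
    xs, ys = [], []
    for i, c in enumerate(counts):
        if c >= min_count:  # skip sparse bins whose density estimate is too noisy
            width = edges[i + 1] - edges[i]
            centre = math.sqrt(edges[i] * edges[i + 1])
            xs.append(math.log(centre))
            ys.append(math.log(c / (len(tail) * width)))  # empirical density
    if len(xs) < 2:
        raise ValueError("not enough populated bins to fit a line")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return -slope


# Classical Pareto samples with shape 1.5 have density exponent 1.5 + 1 = 2.5.
rng = random.Random(0)
pareto = [rng.paretovariate(1.5) for _ in range(200000)]
print(powerlaw_density_exponent(pareto, x_min=1.0))
```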
II. AUCKLAND long-period WAN traces classification: A. Class 1: Good predictability, fairly utilized • Bin size: 1 s for all plots • ACF: over 90% of ACFs are significant; regular, smooth plot • Bandwidth distribution: two separate parts with two separate peaks, both heavy-tailed • PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant
B. Class 1a: Good predictability, fairly utilized • Bin size: 1 s for all plots • ACF: over 85% of ACFs are significant; regular, smooth plot • Bandwidth distribution: two separate parts with two separate peaks, which overlap to a large extent • PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant
C. Class 2: Some predictability, well-utilized • Bin size: 1 s for all plots • ACF: over 70% of ACFs are significant, with some high-frequency fluctuation • Bandwidth distribution: left branch, half a normal distribution; right branch, heavy-tailed distribution y = x^(-α) • PSD: decreases linearly in a log-log plot as frequency increases; low-frequency components are dominant
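Several of the classes above are distinguished by whether the PSD is flat or decreases linearly in a log-log plot. One way to check this is to fit a line to the log of a periodogram against log frequency, as sketched below. To stay dependency-free the sketch uses a direct O(N²) DFT (real traces would call an FFT routine such as `numpy.fft.rfft`), and it assumes a raw, unsmoothed periodogram, which is noisy, so only the fitted slope should be read: near 0 for white-noise-like traces, clearly negative when low frequencies dominate. Function names and test series are illustrative.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def periodogram(x: Sequence[float]) -> List[Tuple[float, float]]:
    """(frequency in cycles per bin, power) for k = 1..N//2, mean removed."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    result = []
    for k in range(1, n // 2 + 1):
        s = sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        result.append((k / n, abs(s) ** 2 / n))
    return result


def loglog_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of log10(power) against log10(frequency)."""
    pts = [(math.log10(f), math.log10(p)) for f, p in points if p > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))


rng = random.Random(2)
white = [rng.gauss(0, 1) for _ in range(512)]  # flat spectrum
walk = [0.0]
for _ in range(511):
    walk.append(walk[-1] + rng.gauss(0, 1))  # random walk: low frequencies dominate
print(loglog_slope(periodogram(white)), loglog_slope(periodogram(walk)))
```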
III. Tree-based Classification: Why do this? Some classes are very similar to each other while others are quite different, a relationship best described by a tree structure. Tree-based classification lets us classify traces at different granularities.
IV. Summary of Trace Classification [Table: summary for NLANR traces (12 classes)]
What else can we learn? • All the long traces have some predictability. • Most of the short traces are not predictable, and even the short traces that are predictable are less predictable than the long traces. • Only a small fraction of short traces make good use of the bandwidth, while all the long traces have good (or fairly good) bandwidth utilization. • All traces that are predictable show some degree of long-range dependency, including both short NLANR traces and long AUCKLAND traces.
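Long-range dependency is commonly quantified by the Hurst parameter H, with H > 0.5 indicating LRD. The slides do not say which estimator was used; the sketch below uses the aggregated-variance method, in which the variance of block means of size m scales as m^(2H-2), so H = 1 + slope/2 from a log-log fit. The aggregation levels and function name are assumptions, and non-stationarity (for example daily trends in long traces) can bias this estimate upward.

```python
import math
import random
import statistics
from typing import Sequence


def hurst_aggregated_variance(x: Sequence[float], levels: Sequence[int]) -> float:
    """Estimate H from Var(block means of size m) ~ m^(2H - 2)."""
    log_m, log_var = [], []
    for m in levels:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        means = [statistics.fmean(x[i * m:(i + 1) * m]) for i in range(n_blocks)]
        var = statistics.pvariance(means)
        if var > 0:
            log_m.append(math.log10(m))
            log_var.append(math.log10(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable aggregation levels")
    mx, my = statistics.fmean(log_m), statistics.fmean(log_var)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_m, log_var))
             / sum((a - mx) ** 2 for a in log_m))
    return 1 + slope / 2


# Uncorrelated noise has no long-range dependency, so H should come out near 0.5.
rng = random.Random(4)
noise = [rng.gauss(0, 1) for _ in range(2 ** 15)]
print(hurst_aggregated_variance(noise, [2 ** k for k in range(8)]))
```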
5. Bandwidth Prediction: What do we want to know? • What is the real predictability of each class we identified? • Which prediction model is best suited for bandwidth prediction? • What is the effect of different bin sizes on bandwidth prediction? Prediction models used (part of the RPS Toolkit): MEAN, LAST, MA, BM, AR, ARMA, ARIMA, ARFIMA
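The simplest of these predictors can be sketched in a few lines. The code below is not the RPS Toolkit, which has its own interfaces; it is a hypothetical pure-Python illustration of MEAN (predict the mean of the fitting data), LAST (predict the last observed value) and AR(p), with AR coefficients fitted by solving the Yule-Walker equations with the Levinson-Durbin recursion. Whether RPS defines MEAN this way and which AR fitting method it uses are assumptions here, as are the function names and the synthetic AR(1) test data.

```python
import random
from typing import List, Sequence


def autocov(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased sample autocovariances c_0..c_max_lag."""
    n = len(x)
    mu = sum(x) / n
    d = [v - mu for v in x]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def fit_ar(x: Sequence[float], p: int) -> List[float]:
    """AR(p) coefficients phi_1..phi_p via Levinson-Durbin on the Yule-Walker equations."""
    r = autocov(x, p)
    phi: List[float] = []  # phi[j] holds phi_{j+1}
    err = r[0]
    for k in range(1, p + 1):
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1 - kappa * kappa  # prediction-error variance for order k
    return phi


def one_step_predictions(train: Sequence[float], test: Sequence[float],
                         model: str, p: int = 8) -> List[float]:
    """Predict each test value from all values before it; the model is fit on train only."""
    mu = sum(train) / len(train)
    phi = fit_ar(train, p) if model == "AR" else []
    history = list(train)
    preds = []
    for actual in test:
        if model == "MEAN":
            preds.append(mu)
        elif model == "LAST":
            preds.append(history[-1])
        elif model == "AR":
            # x_hat_t = mu + sum_j phi_j * (x_{t-j} - mu)
            preds.append(mu + sum(c * (history[-1 - j] - mu) for j, c in enumerate(phi)))
        else:
            raise ValueError(f"unknown model {model!r}")
        history.append(actual)
    return preds


rng = random.Random(5)
series = [0.0]
for _ in range(3999):
    series.append(0.8 * series[-1] + rng.gauss(0, 1))  # synthetic AR(1) data
train, test = series[:3000], series[3000:]
for name in ("MEAN", "LAST", "AR"):
    preds = one_step_predictions(train, test, name, p=4)
    print(name, sum((a - b) ** 2 for a, b in zip(test, preds)) / len(test))
```

Levinson-Durbin is used here because it solves the Toeplitz Yule-Walker system in O(p²) without a general linear solver, which keeps the sketch dependency-free.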
How to evaluate predictability? Three evaluation criteria: I. The ratio of the mean squared error (msqerr) to the variance of the testing sequence, msqerr / Var(testing sequence); lower is better, and a ratio of 1 is what simply predicting the mean of the testing sequence would achieve. II. How well does the error distribution fit the normal distribution? (=1 ideally) III. What percentage of the ACFs of the prediction error is significant? (=0 ideally)
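The three criteria can be computed from a testing sequence and its predictions as sketched below. Assumptions: the variance is the population variance of the testing sequence, the normal-fit score is the correlation coefficient of a normal Q-Q plot (1 for a perfect fit), and an error ACF counts as significant outside ±1.96/√N. The slides do not spell out these details, and the function names are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence


def msqerr_ratio(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Criterion I: mean squared error divided by the variance of the testing sequence."""
    mse = statistics.fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])
    return mse / statistics.pvariance(actual)


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def normal_qq_fit(errors: Sequence[float]) -> float:
    """Criterion II: correlation of sorted errors with standard normal quantiles."""
    n = len(errors)
    std_normal = statistics.NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return _pearson(sorted(errors), quantiles)


def sig_acf_fraction(errors: Sequence[float], max_lag: int, z: float = 1.96) -> float:
    """Criterion III: fraction of error autocorrelations outside +/- z / sqrt(N)."""
    n = len(errors)
    mu = statistics.fmean(errors)
    d = [e - mu for e in errors]
    c0 = sum(v * v for v in d)
    bound = z / math.sqrt(n)
    acfs = [sum(d[i] * d[i + k] for i in range(n - k)) / c0 for k in range(1, max_lag + 1)]
    return sum(abs(r) > bound for r in acfs) / max_lag


rng = random.Random(7)
errs = [rng.gauss(0, 1) for _ in range(1000)]  # ideal errors: normal and uncorrelated
print(normal_qq_fit(errs), sig_acf_fraction(errs, 50))
print(msqerr_ratio([1, 2, 3, 4], [2.5] * 4))  # predicting the mean gives exactly 1.0
```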
I. Effectiveness of different predictors: A. Bandwidth prediction for NLANR traces [Figure: mean squared error / variance of testing sequence; bin size 0.01 s]
[Figures: normal distribution fit, bin size 0.01 s; percentage of error ACFs that are significant, bin size 0.01 s]
B. Bandwidth prediction for AUCKLAND traces [Figure: mean squared error / variance of testing sequence; bin size 10 s]
[Figures: normal distribution fit, bin size 10 s; percentage of error ACFs that are significant, bin size 10 s]
C. Bandwidth prediction for BC traces [Figure: mean squared error / variance of testing sequence; bin size 10 s for the 2 WAN traces, 0.1 s for the 2 LAN traces]
What does bandwidth prediction really look like? [Figures: an AUCKLAND trace at bin sizes 1000 s, 100 s, 10 s and 1 s; an NLANR trace at bin sizes 1 s, 0.1 s, 0.01 s and 0.001 s]
D. Observations • For almost all classes of traces, the AR model yields optimal or near-optimal prediction results among the eight predictors tested. • For almost all classes and all predictors, the error distribution is very close to a normal distribution. • For every class, the AR model has nearly the lowest sigacffrac (fraction of significant error ACFs) of all predictors. • Our expectations of predictability for the different classes have been confirmed by the actual results: all the long traces are predictable, and a large fraction of them have very good predictability, while only 20% of the short traces have some predictability. BC traces also have some predictability.
II. Influence of bin size on bandwidth prediction: A. NLANR traces (AR 32) [Figure: mean squared error / variance of testing sequence at bin sizes 0.001 s, 0.01 s, 0.1 s and 1 s]
B. AUCKLAND traces (AR 32) [Figure: mean squared error / variance of testing sequence at bin sizes 1 s, 10 s, 100 s and 1000 s]
C. Observations • For NLANR traces, a bin size of 0.1 s gives the best prediction of the four bin sizes. • For most AUCKLAND traces, a bin size of 100 s or 10 s gives the best prediction performance of the four. • For any trace, there probably exists an optimal bin size that gives the best prediction performance.
D. Further Probe: For AUCKLAND traces, there seems to be an optimal bin size for bandwidth prediction… [Figure: red, a Class 1 trace; green, a Class 1c trace] The optimal bin size appears to be around 20 seconds.
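Searching for an "optimal" bin size amounts to scoring the same trace at several aggregation levels and keeping the best one. The sketch below assumes a fine-grained bandwidth series and integer aggregation factors. To stay short it scores each level with the msqerr/variance ratio of the LAST predictor on the second half of the series; this is a hypothetical stand-in for the AR 32 evaluation used in these experiments, and any other scoring function can be passed as `score`. The function names and synthetic series are illustrative.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def rebin(bw: Sequence[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` bins (a trailing partial group is dropped)."""
    return [statistics.fmean(bw[i * factor:(i + 1) * factor]) for i in range(len(bw) // factor)]


def last_predictor_score(series: Sequence[float]) -> float:
    """msqerr / variance of the LAST predictor over the second half of the series."""
    half = len(series) // 2
    test = series[half:]
    preds = series[half - 1:-1]  # the previous value for each test point
    mse = statistics.fmean([(a - p) ** 2 for a, p in zip(test, preds)])
    return mse / statistics.pvariance(test)


def best_bin_size(fine: Sequence[float], base_bin_s: float, factors: Sequence[int],
                  score: Callable[[Sequence[float]], float] = last_predictor_score,
                  ) -> Tuple[float, Dict[float, float]]:
    """Return (best bin size in seconds, {bin size: score}); lower scores are better."""
    scores = {base_bin_s * f: score(rebin(fine, f)) for f in factors}
    best = min(scores, key=scores.get)
    return best, scores


rng = random.Random(11)
level, fine = 0.0, []
for _ in range(20000):
    level = 0.999 * level + rng.gauss(0, 0.05)  # slowly varying component
    fine.append(level + rng.gauss(0, 1))  # plus bin-level noise
best, table = best_bin_size(fine, base_bin_s=1.0, factors=[1, 10, 100, 1000])
print(best, table)
```

Evaluating a denser grid of factors around the best coarse result is one way to refine the estimate, in the spirit of the finer probe on the AUCKLAND traces.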
6. Conclusion • Bandwidth traces can be classified based on their time series plot, ACF plot, bandwidth distribution, and PSD plot. • Most long-period WAN traces are predictable, with some degree of long-range dependency. • A small fraction of short-period WAN traces have some predictability, also with some degree of long-range dependency. • The BC LAN traces are also predictable. • The AR model is an ideal model for prediction because of its accuracy and efficiency. • For each trace, there exists an "optimal" bin size that gives the best prediction performance.
Acknowledgement Many Thanks to Peter, Dong, and Jason!