Network Traffic Modeling Punit Shah (pshah@cse.ogi.edu) CSE581 Internet Technologies OGI, OHSU 2002, March 6
Papers
• Generating Representative Web Workloads for Network and Server Performance Evaluation
  • Paul Barford, Mark Crovella. Comp Sci Department, Boston University.
• Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
  • Mark Crovella, Azer Bestavros. Comp Sci Department, Boston University.
• On the Self-Similar Nature of Ethernet Traffic
  • Will Leland et al. IEEE members. Funded by Boston University.
CSE581, Winter 2002 | Punit Shah (pshah@cse.ogi.edu)
Traffic modeling
• Understand the nature of the network traffic
• Establish a traffic pattern
• Characteristics and metrics vary by network stack layer
Why model traffic?
• Understand the behavior of servers, networks etc. under workload conditions
• Capacity management
• Infrastructure planning
• Performance improvement
• Design of software and services
• Testing and validation
• Developing simulators (workload generators), e.g. ns (CMU), SURGE, SpecWeb96 and many commercially available simulators
Model Parameters
• Application layer (HTTP)
  • server file size distribution
  • request size distribution (file size + protocol headers)
  • temporal locality (caching) etc.
• Data Link layer (Ethernet)
  • packets per second
  • mean time between two consecutive packets
  • bandwidth utilization
  • effect of number of hosts etc.
Time Series Analysis Primer
• Correlation
  • If two events exhibit an identical (opposite) pattern under similar circumstances, they are called positively (negatively) correlated.
  • The degree of correlation lies in the range [-1, 1].
  • Correlation models.
• Long range dependence
  • The current event stays positively correlated with events far in the future; the correlation decays slowly with the lag.
• Heavy tail
  • Non-negligible probability mass in the tail of the distribution, e.g. a hyperbolic CDF plot. The simplest such distribution is Pareto: $P[X > x] \sim x^{-\alpha}$ as $x \to \infty$, $0 < \alpha < 2$
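The heavy-tail bullet can be made concrete with a small sampling experiment. The sketch below draws Pareto values by inverse-transform sampling and compares the empirical tail with $x^{-\alpha}$. The function names, the seed, the scale `k = 1`, and the tail index `alpha = 1.2` are assumptions chosen for illustration, not values from the papers.

```python
import random

def pareto_sample(alpha, k=1.0, rng=random):
    """Draw one Pareto(alpha, k) value by inverse-transform sampling.

    The survival function is P[X > x] = (k / x) ** alpha for x >= k.
    """
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
    return k / u ** (1.0 / alpha)

def empirical_ccdf(samples, x):
    """Fraction of samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)

rng = random.Random(42)             # fixed seed, only for reproducibility
alpha = 1.2                         # assumed tail index in (0, 2)
samples = [pareto_sample(alpha, rng=rng) for _ in range(100_000)]

# Empirical tail next to the theoretical x ** -alpha (valid because k = 1).
for x in (2, 10, 100):
    print(x, empirical_ccdf(samples, x), x ** -alpha)
```

Even at x = 100 a visible fraction of the samples remains in the tail, which is what "non-negligible" means here; an exponential distribution with the same mean would have essentially nothing left at that point.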
Self-Similarity
• Term introduced by Mandelbrot in 1965.
• Let $X = (X_t : t = 0, 1, 2, \ldots)$ be a time series with mean $\mu$, variance $\sigma^2$, and autocorrelation function $r(k)$, where $r(k) \sim k^{-\beta}$ as $k \to \infty$, $0 < \beta < 1$.
• For each $m = 1, 2, 3, \ldots$, $X^{(m)} = (X_k^{(m)} : k = 1, 2, 3, \ldots)$ is a new time series obtained by averaging the original series over non-overlapping blocks of size $m$: $X_k^{(m)} = (X_{km-m+1} + \cdots + X_{km})/m$. Its autocorrelation function is $r^{(m)}(k)$.
• If $r^{(m)}(k) = r(k)$ for all $m$ (or $r^{(m)}(k) \to r(k)$ as $m \to \infty$), then $X$ is called exactly (asymptotically) second-order self-similar with degree $H = 1 - \beta/2$.
• Also, since $\sum_k r(k) = \infty$, self-similarity implies long-range dependence.
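A consequence of this definition that is handy to keep in mind, written out as a sketch: with $\beta$ denoting the decay exponent of the autocorrelation function and $\sigma^2$ the variance of $X$, the variance of the block averages shrinks more slowly than it would for an uncorrelated series. The numeric value of $\beta$ in the last line is an assumed example, not a measurement.

```latex
% Self-similar series: variance of block averages decays slower than 1/m
\operatorname{Var}\bigl(X^{(m)}\bigr) \sim \sigma^2 m^{-\beta}, \qquad 0 < \beta < 1

% Uncorrelated series, for comparison
\operatorname{Var}\bigl(X^{(m)}\bigr) = \sigma^2 m^{-1}

% Assumed example: \beta = 0.4 gives H = 1 - 0.4/2 = 0.8
H = 1 - \beta/2, \qquad \tfrac{1}{2} < H < 1
```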
Self-similar
• Series: 81 17 6 99 25 | 21 45 4 20 18 | 56 7 21 82 11 | 8 65 34 9 20
• Block sums with $m = 5$: $\sum_{i=1}^{m} X_i$ = 228, 108, 177, 136
• 'Self-Similarity' == Burstiness
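The aggregation on this slide can be reproduced in a few lines. This is a minimal sketch; the helper name `aggregate` is made up for illustration, and the block size of 5 is inferred from the four sums shown for the twenty values.

```python
def aggregate(series, m):
    """Split series into non-overlapping blocks of size m.

    Returns (block sums, block means). An incomplete trailing block is dropped.
    """
    n_blocks = len(series) // m
    sums = [sum(series[i * m:(i + 1) * m]) for i in range(n_blocks)]
    means = [s / m for s in sums]   # the X^(m) series uses the means
    return sums, means

series = [81, 17, 6, 99, 25, 21, 45, 4, 20, 18,
          56, 7, 21, 82, 11, 8, 65, 34, 9, 20]
sums, means = aggregate(series, 5)
print(sums)    # [228, 108, 177, 136]
print(means)   # [45.6, 21.6, 35.4, 27.2]
```

The aggregated values still swing widely (108 to 228), which is the burstiness the slide points at: averaging does not smooth the series out quickly.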
Ethernet Traffic Data Collection
• Data collected over four calendar years (Aug 1989 to Feb 1992) to account for various network topologies.
• Main traffic at the time (1994): rlogin, e-mail, NFS, local radio station audio.
• Hosts: 140 to 1200. ~27M packets.
• Each instance of data collection encompassed low, medium, and busy hours.
• Timestamps with 20 µs accuracy.
Packets/unit time (empirical)
Packets/unit time (synthetic)
Statistical tests for self-similarity
• Variance-time plot
  • $\log \operatorname{Var}(X^{(m)})$ is plotted against $\log m$; a self-similar series gives a straight line with slope $-\beta > -1$; $H = 1 - \beta/2$
• R/S plot (rescaled adjusted range statistic)
  • $R(n)/S(n)$ grows according to a power law with exponent $H$ as $n \to \infty$, i.e. like $n^H$
• Periodogram
  • slope of the power spectrum of the series
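The variance-time test can be sketched as a small estimator: aggregate the series at several block sizes, fit a least-squares line to log variance against log block size, and convert the slope to H. This is an illustrative sketch, not the procedure used in the papers; the function names, the block sizes, and the white-noise test input are assumptions.

```python
import math
import random

def block_means(series, m):
    """Means of non-overlapping blocks of size m (trailing remainder dropped)."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def hurst_variance_time(series, block_sizes):
    """Estimate H from the slope of log Var(X^(m)) against log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(block_means(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0        # slope = -beta, so H = 1 - beta/2

# Sanity check on uncorrelated Gaussian noise, where the slope should be near -1.
rng = random.Random(1)              # fixed seed, only for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
h = hurst_variance_time(noise, [1, 2, 4, 8, 16, 32, 64])
print(round(h, 2))                  # expected to land close to 0.5
```

For a self-similar trace the same call would be expected to return a value noticeably above 0.5; the closer to 1, the burstier the traffic.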
Ethernet Variance Time plot
• As m increases, the variance decreases only slowly.
• The curve would cross the threshold line (slope -1) if the traffic were not self-similar.
Ethernet Traffic Analysis
• Ethernet traffic is self-similar.
• Contrary to common belief, the degree of self-similarity (burstiness) increases during busy times.
• Well over 50% of the traffic is TCP packets, but the non-TCP packets have no apparent effect.
Web Traffic Data collection
• Traces collected from real users accessing web documents (Nov 94 - May 95) using HTTP v0.9 and 1.0 (no parallel connections)
  • 4700 sessions
  • 591 users
  • 575,775 URL requests (46,830 unique per session)
  • 130,140 files transferred
• Each file request is logged
  • URL
  • session, user, workstation ID
  • timestamp
  • size of doc, file transfer time
Trace Analysis
Web traffic is self-similar
Reasons for the self-similarity
• Web transmission times
  • The distribution is highly variable.
  • Sizes of the available files are heavy-tailed.
  • Multimedia files are to blame (image, audio, video)
• Quiet time
  • Active off and inactive off
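The mechanism named here, heavy-tailed transmission (ON) times alternating with heavy-tailed quiet (OFF) times, can be sketched as a toy simulation that superposes independent ON/OFF sources. This is an illustration under stated assumptions and not the papers' methodology: the function names, the one-packet-per-slot sending rate, the random initial state, and the tail indices 1.2 and 1.5 are all made up for the example.

```python
import random

def pareto(alpha, k, rng):
    """One Pareto(alpha, k) draw via inverse-transform sampling."""
    u = 1.0 - rng.random()                  # uniform on (0, 1]
    return k / u ** (1.0 / alpha)

def on_off_traffic(n_sources, n_slots, alpha_on, alpha_off, rng):
    """Packets per time slot from n_sources independent ON/OFF sources.

    Each source alternates between ON periods (one packet per slot) and
    silent OFF periods, both with Pareto-distributed durations.
    """
    counts = [0] * n_slots
    for _ in range(n_sources):
        t = 0
        sending = rng.random() < 0.5        # assumed random initial state
        while t < n_slots:
            alpha = alpha_on if sending else alpha_off
            duration = max(1, int(pareto(alpha, 1.0, rng)))
            if sending:
                for slot in range(t, min(t + duration, n_slots)):
                    counts[slot] += 1
            t += duration
            sending = not sending
    return counts

rng = random.Random(7)                      # fixed seed, only for reproducibility
counts = on_off_traffic(50, 10_000, alpha_on=1.2, alpha_off=1.5, rng=rng)
print(min(counts), max(counts), sum(counts) / len(counts))
```

Replacing the Pareto durations with exponential ones of the same mean would be the natural control experiment: the aggregate should then smooth out much faster under averaging.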
Quiet Times
Quiet Time Distribution
Generating Web Workload: SURGE
• User Equivalence (UE)
  • Synthesized behavior should emulate the users
  • Multi-threaded program. HTTP v1.0. No parallel connections
• Distribution models
  • File sizes
  • Request sizes
    • File size + protocol headers
    • zero, if already cached
  • Popularity
    • Zipf's law: if files are ordered by decreasing popularity, then the number of references to a file is inversely proportional to its rank, $P \propto 1/r$
    • Empirical data shows that the popular web docs are extremely popular while the others receive only a few hits
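Zipf's law as stated on this slide translates directly into a popularity distribution. The sketch below normalizes the weights 1/r into probabilities and draws a synthetic request stream from them. The function name, the catalogue size of 1000 files, and the seed are assumptions for illustration; this is not SURGE's actual code.

```python
import random

def zipf_probabilities(n_files):
    """Request probability per popularity rank, proportional to 1 / rank."""
    weights = [1.0 / rank for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_probabilities(1000)            # assumed catalogue of 1000 files
rng = random.Random(3)                      # fixed seed, only for reproducibility

# Draw 10,000 requests; each value is the popularity rank of the requested file.
requests = rng.choices(range(1, 1001), weights=probs, k=10_000)

top10_share = sum(1 for r in requests if r <= 10) / len(requests)
print(probs[0] / probs[1])                  # rank 1 is twice as likely as rank 2
print(top10_share)                          # share of requests going to the top 10 files
```

The share captured by the ten most popular files illustrates the last bullet: a handful of documents absorb a large part of the load while most files are rarely touched.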
Model Parameters (contd.)
• Embedded object count
  • Determines a quiet time, specifically 'active off'
• Temporal Locality (Caching)
  • Probability that the same object will be requested again
  • Effect on network access
  • Stack distance
• OFF Times
  • Important parameter: self-similarity is lost if OFF times are ignored
Matching problem: assign a popularity to each file, given the distribution of file sizes and the empirical request size (count?) distribution
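Stack distance, listed above as the measure of temporal locality, can be computed from a request trace with an LRU stack. In this sketch the distance of a request is taken to be the number of distinct other objects referenced since the previous request for the same object, and `None` marks a first reference. That convention and the function name are assumptions made for illustration; other definitions count from 1.

```python
def stack_distances(requests):
    """LRU stack distance for each request in a trace.

    Returns one entry per request: the 0-based depth at which the object was
    found in the LRU stack, or None if it had not been seen before.
    """
    stack = []                      # most recently used object first
    distances = []
    for obj in requests:
        if obj in stack:
            depth = stack.index(obj)
            stack.pop(depth)
        else:
            depth = None            # first reference, no distance defined
        distances.append(depth)
        stack.insert(0, obj)        # obj becomes the most recently used
    return distances

print(stack_distances(["a", "b", "a", "c", "b", "a"]))
# [None, None, 1, None, 2, 2]
```

Small distances mean strong temporal locality, so a cache holding only a few objects would already absorb many of the requests.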
SURGE Approach
Use a different (well-known) model for each model parameter
SURGE Validation
• Compared with SpecWeb96 (specbench.org)
  • # of HTTP requests per second (h)
  • # of threads (t), h/t requests per thread
• Packets/sec as the baseline
  • tests for 70, 300, and 500 packets/sec
Results
• Roughly similar # of TCP packets and requests in a 30 min run
• Mean number of active TCP connections is 0.028 for SpecWeb96 v/s 13.9 for SURGE, with a very high variance of 3.92 (0.18), indicating self-similarity
• Server CPU utilization and active TCP connections are considerably higher than with SpecWeb96
Active TCP Connections (figure: SpecWeb96 v/s SURGE at 70, 300, and 500 packets/sec)
CPU Utilization (figure: SpecWeb96 v/s SURGE at 70, 300, and 500 packets/sec)
Self-Similarity (figure: SpecWeb96 v/s SURGE at 70, 300, and 500 packets/sec)
Conclusion
• Self-similarity (burstiness) is an integral part of network traffic behavior.
• The degree of self-similarity increases with the load.
• Server and network load differ radically from what non-self-similar models predict.
• The congestion produced by self-similar traffic is drastically different in nature from that produced by non-self-similar traffic.