Rapid generation of structural model from network measurement Kun-chan Lan USC/ISI kclan@isi.edu http://www.isi.edu/~kclan Ph.D Dissertation Proposal
Simulation vs. traffic model • Simulation and analysis rely heavily on good traffic models • But there is no “typical” traffic model! • End system: different users, different applications, different protocols • Network: different links, different scheduling, different QoS
Traffic is different any which way you look • Traffic changes over time • Traffic is different at different locations • Traffic is different in different directions [Figure panels: different time, different location, different direction]
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models → the model might be obsolete • It focuses on fitting statistics derived from a small set of traces to well-known analytical functions → no “typical” model • It is based entirely on measurements taken at a single point of the network → cannot get a network-wide view of the traffic
Our goal • Rapidly generate realistic traffic models in a constantly changing, distributed network environment such as the Internet
Potential applications of our work • Traffic planning and provisioning • On-line simulation for network control • Input to network prediction algorithms • Detection of network attacks • etc.
Our solution • Goal: rapidly generate realistic traffic models in a constantly changing, distributed network environment such as the Internet • Structural modeling • Rapid model parameterization (RAMP) • Integration of distributed measurement
Agenda • Motivation • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and plan
Agenda • Motivation • Structural modeling • Problems with existing approaches • Why structural modeling • A case study on RealAudio • Rapid model parameterization (RAMP) • Future work • Conclusion and plan
Traditional traffic modeling.. • Trace-driven time-series analysis • Can reproduce time series similar to the actual traffic • But… [Diagram labels: trace; time series analysis; autocorrelation; marginal distribution; Markov, ARIMA, TES, FBM, FGN; result]
..traditional traffic modeling • Provides little or no insight into the observed characteristics of measured traffic and their underlying causes • Cannot capture the feedback effects in the protocols • Internet protocols behave differently across a range of time scales
Multiple levels of feedback effects in Web traffic [Diagram: client-server timeline showing three levels of feedback: user (> 10 sec), HTTP (1~10 sec), TCP (< 1 sec). Labels: request, response, page transmission, user click, user think time, end of page]
Structural modeling • First proposed by Willinger (1998) • Emphasizes characterizing the source-level pattern in which data is sent • Explicitly takes into account the hierarchical structure of the application and its underlying networking mechanisms when modeling the traffic
Application of structural modeling: a case study on RealAudio • Why RealAudio? • Different behavior at different time scales • A good example of how traffic changes because of user/application interaction • Case study • Characteristics of RealAudio • Model for RealAudio • Validation of the model
Background of the trace • Collected from audio servers at Broadcast.com, a popular audio service provider • 5.5 million packets • The servers ran RealServer V5.0 and employed a proprietary protocol called PNA • Same traces used by Mena et al.
Characteristics of RealAudio.. • Individual flow • Bursty at small time scales (single second) • Constant bit rates at medium time scales (tens of seconds)
Characteristics of RealAudio • Aggregated traffic • Bursty on-off behavior, as in the individual flows (off period ~1.8 seconds) • Flows are synchronized! • Multiple clients listening to the same live music
Structural model of RealAudio.. • User behavior: user arrival, number of flows per user • Flow data: packet length, flow duration, flow rate, flow synchronization • Packet data: on period, off period, number of packets sent in each on-off period [Diagram: user, flow, packet hierarchy]
Structural model of RealAudio • Each user picks a number of flows * • For each flow, sequentially: • Pick an overall rate * and a duration * • While the flow is active: • Pick on and off times • Calculate the number of packets to send in the on-time to satisfy the rate • To simulate the flow synchronization effect, all flows start at a multiple of 1.8 sec
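The generation steps on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names, the `pick_*` callables (stand-ins for the starred quantities, which would presumably be drawn from trace statistics), and the choice to spread packets evenly across the on period are all hypothetical. Only the 1.8-second synchronization interval comes from the slides.

```python
import math
import random

SYNC_INTERVAL = 1.8  # seconds; flow start times are multiples of this (from the slide)


def generate_flow(rng, arrival, rate_pps, duration, pick_on, pick_off):
    """Return (packet send times, flow end time) for one on-off flow.

    pick_on and pick_off are hypothetical callables returning an on/off
    time in seconds; their sum must be positive so the loop advances.
    """
    # Flow synchronization: start at the next multiple of SYNC_INTERVAL.
    start = math.ceil(arrival / SYNC_INTERVAL) * SYNC_INTERVAL
    end = start + duration
    t, times = start, []
    while t < end:
        on, off = pick_on(rng), pick_off(rng)
        # Number of packets in the on period so that the average rate over
        # the whole on-off cycle is rate_pps (at least one packet per cycle).
        n = max(1, round(rate_pps * (on + off)))
        times.extend(t + i * on / n for i in range(n))
        t += on + off
    return times, end


def generate_user(rng, arrival, pick_num_flows, pick_rate, pick_duration,
                  pick_on, pick_off):
    """Generate one user's flows sequentially; returns a list of time lists."""
    flows, t = [], arrival
    for _ in range(pick_num_flows(rng)):
        times, t = generate_flow(rng, t, pick_rate(rng), pick_duration(rng),
                                 pick_on, pick_off)
        flows.append(times)
    return flows
```

Passing the pickers as callables keeps the sketch independent of any particular distribution, so empirical distributions measured from a trace could be plugged in without changing the generation logic.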
Validation of RealAudio model • Individual flow • Time-sequence plot • Aggregated flows • Qualitatively • Comparisons of multi-scaling plots • Time-variance plot • Wavelet scaling plot • Comparison of first-order statistics • user duration, flow rate,……. • Quantitatively • Mean value of first-order statistics
Time-variance plot • $X = (X_t;\ t = 0, 1, 2, 3, \ldots)$ is a stationary time series • In our work, we use the run test to verify the stationarity of the data • $X^{(m)}$ is a new time series at scale $m$, obtained by averaging the original series $X$ over non-overlapping blocks of size $m$: $X^{(m)}_k = \frac{1}{m}(X_{km-m+1} + \cdots + X_{km})$, $k = 1, 2, 3, \ldots$ • Example for $m = 4$: $X^{(4)}_1 = (X_1 + X_2 + X_3 + X_4)/4$, $X^{(4)}_2 = (X_5 + X_6 + X_7 + X_8)/4$ [Plot: $\log_{10}(\mathrm{Var}(X^{(m)}))$ versus $\log_{10}(m)$, annotated "self-similar"]
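A minimal Python sketch of the aggregation and of the points on the time-variance plot is given below. The function names are illustrative and not taken from the slides. For an uncorrelated series the variance of the block averages decays roughly as 1/m, so the points fall on a line of slope about -1, while a self-similar series decays more slowly.

```python
import math
import statistics


def aggregate(x, m):
    """Average x over non-overlapping blocks of size m (the series X^(m))."""
    n_blocks = len(x) // m  # a trailing partial block is dropped
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance_time_points(x, scales):
    """Return (log10(m), log10(Var(X^(m)))) pairs for the given scales."""
    points = []
    for m in scales:
        xm = aggregate(x, m)
        if len(xm) < 2:
            continue  # need at least two blocks to compute a variance
        var = statistics.pvariance(xm)
        if var > 0:  # log10 is undefined for zero variance
            points.append((math.log10(m), math.log10(var)))
    return points
```

For example, `aggregate([1, 2, 3, 4, 5, 6, 7, 8], 4)` gives the two block averages 2.5 and 6.5, matching the worked example for m = 4.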
Wavelet scaling plot • $X_{0,k}$, $k = 0, 1, 2, 3, \ldots$, is a stationary time series • Coarsen $X_0$ by averaging over non-overlapping blocks of size two to obtain a new time series $X_1$: $X_{1,k} = 2^{-1/2}(X_{0,2k} + X_{0,2k+1})$ (1) • The difference between $X_0$ and $X_1$ is defined as $D_{1,k} = 2^{-1/2}(X_{0,2k} - X_{0,2k+1})$ (2) • $X_0 = 2^{-1/2}(X_1 + D_1)$ • If we iterate this process: $X_0 = 2^{-n/2}X_n + 2^{-n/2}D_n + \cdots + 2^{-1/2}D_1$ • The energy $E_j$ is defined as $E_j = \sum_k (D_{j,k})^2 / N_j$ (3), where $N_j$ is the number of coefficients at scale $j$ (Huang et al., 99) [Plot annotations: RTT, self-similar]
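Equations (1) to (3) translate into the short Python sketch below, which returns the energy at each scale. The function name is illustrative. Such plots commonly show the base-2 logarithm of the energy against the scale j, but the slides do not state the axes, so that part is an assumption.

```python
import math


def haar_energies(x, levels):
    """Return [E_1, ..., E_levels] for the series x using Eqs. (1)-(3)."""
    s = 1 / math.sqrt(2)
    approx = list(x)
    energies = []
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break  # series too short for another level
        # Eq. (2): detail coefficients D_j at this scale.
        detail = [s * (approx[2 * k] - approx[2 * k + 1]) for k in range(pairs)]
        # Eq. (1): coarsened series X_j used at the next scale.
        approx = [s * (approx[2 * k] + approx[2 * k + 1]) for k in range(pairs)]
        # Eq. (3): average squared detail coefficient, with N_j = pairs.
        energies.append(sum(d * d for d in detail) / pairs)
    return energies
```

A series that alternates between two values puts all of its energy at the first scale, because every adjacent pair differs while the coarsened series is constant.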
Use multi-scaling plot to debug the model [Plot panels: trace; model that takes flow synchronization into account; model that does not take flow synchronization into account]
Use multi-scaling plot to validate the model • Similar bumps at multiples of 1.8 sec in the time-variance plot • Similar turning point at ~5 sec • Approximately flat line at larger time scales • RealAudio is NOT self-similar [Plot panels: trace, model]
First-order statistics • The agreement is not surprising, since the model is driven by statistics taken from the trace • But it still gives us some confidence
Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and future plan
Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Traffic is different anywhere you look • Design of RAMP • Validation of RAMP • Future work • Conclusion and future plan
ISI and Internet Traffic Archive • ISI data was collected at the ISI gateway; ITA data was collected at ita.ee.lbl.gov, which serves as a Web/FTP server • ITA traffic is sparser than ISI traffic • ITA traffic is bimodal: HTTP + FTP
Metrics used for comparison • Qualitative: CDF plots (packet, flow, user), wavelet scaling plot • Quantitative: Kolmogorov-Smirnov goodness-of-fit test
Kolmogorov-Smirnov goodness-of-fit test • D value: the largest absolute difference between the cumulative distributions of two sets of data • Critical value = $c/\sqrt{n}$, where $c$ is distribution-dependent and $n$ is the number of samples • At the 0.05 significance level • Normal distribution: c = 1.36 • Exponential distribution: c = 1.08 • Weibull distribution: c = 0.874 • Example: for 10000 samples at the 0.05 significance level, the critical value is $0.874/\sqrt{10000} = 0.00874$ for data drawn from a Weibull distribution • If D < critical value → no statistically significant difference
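The test on this slide can be sketched in a few lines of Python. The coefficients in the table are the ones listed above; the function names are illustrative, and the D value is computed directly from the two empirical CDFs rather than through a statistics library.

```python
import bisect
import math

# Coefficients c at the 0.05 significance level, as listed on the slide.
C_AT_005 = {"normal": 1.36, "exponential": 1.08, "weibull": 0.874}


def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap between two step functions occurs at a sample point.
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def critical_value(n, distribution):
    """Critical value c / sqrt(n) for n samples."""
    return C_AT_005[distribution] / math.sqrt(n)


def no_significant_difference(a, b, distribution):
    """True if D is below the critical value for len(a) samples."""
    return ks_statistic(a, b) < critical_value(len(a), distribution)
```

With 10000 samples and the Weibull coefficient, `critical_value` reproduces the 0.00874 threshold used in the comparisons that follow.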
Traffic in different directions (ISI inbound and outbound traffic).. • Inbound is dominated by News traffic, outbound by Web traffic • Inbound has a smaller RTT (~40 ms)
Traffic in different directions (ISI inbound and outbound traffic) • Inbound traffic has more short flows, mostly contributed by DNS • D > critical value = 0.00874 → the two directions are statistically different [Plots annotated with K-S test results: D = 0.121, D = 0.097]
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models • It focuses on fitting statistics derived from a small set of traces to well-known analytical functions • It is based entirely on measurements taken at a single point of the network → We need a tool that can quickly parameterize the model from measurements, with no implicit assumptions about the properties of the traffic!
RApid Model Parameterization (RAMP) • Network characteristics: link BW, RTT • Application behavior • Web: # of pages per user, page size, object arrival, ….. • FTP: file arrival, file size [Diagram: tcpdump trace → RAMP → CDFs → NS simulation model]
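The slides do not say how the simulation consumes the CDFs that RAMP produces. One common way to drive a simulation from measured data without assuming an analytical form is inverse-transform sampling from the empirical CDF, sketched below in Python. The function names are illustrative and this is an assumption about usage, not a description of RAMP's internals.

```python
import bisect
import random


def empirical_cdf(values):
    """Return (sorted distinct values, cumulative probabilities)."""
    xs = sorted(values)
    n = len(xs)
    points, probs = [], []
    for i, v in enumerate(xs, start=1):
        if points and points[-1] == v:
            probs[-1] = i / n  # repeated value: raise its cumulative probability
        else:
            points.append(v)
            probs.append(i / n)
    return points, probs


def sample(points, probs, rng):
    """Draw one value by inverse-transform sampling from the empirical CDF."""
    u = rng.random()  # uniform in [0, 1)
    idx = bisect.bisect_left(probs, u)  # first point whose CDF value is >= u
    return points[min(idx, len(points) - 1)]
```

Because the samples are drawn only from values that were actually observed, the generated traffic inherits whatever shape the measured distribution has, heavy-tailed or not.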
Structural model of Web traffic.. • User behavior: user arrival, number of pages per user session, server popularity • Page: page size, page arrival, number of objects per page, request size, persistent connection (HTTP/1.1) • Object: object arrival, object size, TCP window size [Diagram: user, page, object hierarchy]
What’s new in our Web model • Request size: request sizes are increasing due to the popularity of web email • TCP window size: restricts the sending rate of the server • Persistent connection
Structural model of FTP traffic • User behavior: user arrival, number of files per user session, server popularity • File: file size, file arrival, TCP window size [Diagram: user, file hierarchy]
Validation of RAMP • First-order statistics • CDF plot • Kolmogorov-Smirnov test • Wavelet scaling plot • Comparison with SURGE
Comparison of first-order statistics (ISI-1 outbound trace against the model) • D < critical value = 0.00874 → there is no statistically significant difference [Plot panels: flow size, flow inter-arrival, flow duration; K-S test results: D = 0.0019, D = 0.0013, D = 0.0018]
Comparison of wavelet scaling plots • Similar RTT • Similar energy level [Plot panels: ISI-1, ITA]
Comparison with SURGE • Run SURGE in a LAN with one server and 4 clients • Feed the SURGE trace into RAMP, and compare the simulation result against the SURGE trace [Plot panels: packet inter-arrival; wavelet scaling plot, annotated "self-similar"]
Limitation of SURGE • Tries to fit the model to well-known analytical functions • For example, it uses a Pareto distribution to describe page size • Not all websites fit this profile • The page size distribution of ITA traffic is NOT heavy-tailed • ITA traffic can NOT be modeled by SURGE • RAMP does not make implicit assumptions as SURGE does [Plot annotation: not heavy-tailed!]
Performance of RAMP • The running time of RAMP is approximately a function of the number of packets in the trace • Close to real-time!
Agenda • Challenge • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and plan
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models, while the network is constantly changing • It focuses on fitting statistics derived from a small set of traces to well-known analytical functions • It is based entirely on measurements taken at a single point of the network
The need for distributed measurement • Get a complete, network-wide view of traffic • Infer traffic at places where taking measurements is infeasible [Diagram labels: data, integration, traffic model]
Challenges • Where to take the measurements? • How to merge traffic? • How to infer traffic where taking measurements is infeasible, if not impossible?
Where to take the measurement • Vertex cover: the minimum subset of nodes that connects all the other nodes; such nodes are typically located at the core of the network • All the edge nodes: not scalable • A subset of the edge nodes: needs techniques to infer traffic at the edge nodes where no measurement is taken
How to merge traffic • Classify traffic into source/destination groups based on IP prefix • Each collector is responsible for a set of groups and forwards traffic statistics of the other groups to their responsible collectors • All the data collectors join a well-known multicast session to exchange information
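The first step, classifying traffic into source/destination groups by IP prefix, could look like the Python sketch below. The /16 prefix length, the `(src_ip, dst_ip, byte_count)` record layout, and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
import ipaddress
from collections import Counter


def prefix_group(addr, prefix_len=16):
    """Map an IP address to its enclosing prefix, e.g. 10.1.2.3 to 10.1.0.0/16."""
    # strict=False lets ip_network mask off the host bits for us.
    return ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)


def bytes_per_group(flows, prefix_len=16):
    """Sum bytes per (source prefix, destination prefix) group.

    flows is an iterable of (src_ip, dst_ip, byte_count) tuples
    (a hypothetical record layout).
    """
    totals = Counter()
    for src, dst, nbytes in flows:
        key = (prefix_group(src, prefix_len), prefix_group(dst, prefix_len))
        totals[key] += nbytes
    return totals
```

Each collector could then keep the groups it is responsible for and forward the per-group totals of the remaining groups to the collectors that own them.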
How to infer traffic • Assume the network can be simplified into a tree topology and node U is where traffic needs to be inferred • Node U is either a root/internal node or a leaf node • If node U is the root node: $\mathrm{TRAFFIC}_U = \mathrm{TRAFFIC}_a + \mathrm{TRAFFIC}_b - \mathrm{CROSSTRAFFIC}_{a,b}$ ? • If node U is a leaf node: can we infer U based on measurements at V (and other sibling nodes)? [Diagram labels: U, V, a, b]