
Rapid generation of structural model from network measurement



Presentation Transcript


  1. Rapid generation of structural model from network measurement. Kun-chan Lan, USC/ISI, kclan@isi.edu, http://www.isi.edu/~kclan. Ph.D. Dissertation Proposal

  2. Simulation vs. traffic model • Simulation and analysis rely heavily on good traffic models • But there is no “typical” traffic model!! • End systems: different users, different applications, different protocols • Network: different links, different scheduling, different QoS

  3. Traffic is different any way you look • Traffic changes over time • Traffic differs by location • Traffic differs by direction [Figure: traffic at different times, different locations, and in different directions]

  4. Problem with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models → the model might be obsolete • Focus on fitting statistics derived from a small set of traces to well-known analytical functions → no “typical” model • All based on measurements taken at a single point in the network → cannot get a network-wide view of the traffic

  5. Our goal: rapidly generate realistic traffic models in a constantly changing, distributed network environment like the Internet

  6. Potential applications of our work • Traffic planning and provisioning • On-line simulation for network control • Input to network prediction algorithms • Detection of network attacks • etc.

  7. Our solution: rapidly generate realistic traffic models in a constantly changing, distributed network environment like the Internet • Structural modeling • Rapid model parameterization (RAMP) • Integration of distributed measurement

  8. Agenda • Motivation • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and plan

  9. Agenda • Motivation • Structural modeling • Problems with existing approaches • Why structural modeling • A case study on RealAudio • Rapid model parameterization (RAMP) • Future work • Conclusion and plan

  10. Traditional traffic modeling… • Trace-driven time-series analysis • Can reproduce time series similar to the actual traffic • But… [Figure: trace, time-series analysis (Markov, ARIMA, TES, FBM, FGN), autocorrelation, marginal distribution, result]

  11. …traditional traffic modeling • Provides little or no insight into the observed characteristics of measured traffic and their underlying causes • Cannot capture feedback effects in the protocols • Internet protocols behave differently across a range of time scales

  12. Multiple levels of feedback effects in Web traffic [Figure: client/server timeline with three levels: user (> 10 sec; user clicks separated by user think time, through the end of the page), HTTP (1 to 10 sec; request, response, page transmission), TCP (< 1 sec; request/response exchanges)]

  13. Structural modeling • First proposed by Willinger (1998) • Emphasizes characterizing the source-level pattern in which data is sent • Explicitly takes into account the hierarchical structure of the application and its underlying networking mechanisms when modeling traffic

  14. Application of structural modeling: a case study on RealAudio • Why RealAudio? • Different behavior at different time scales • A good example of how traffic changes because of user/application interaction • Case study • Characteristics of RealAudio • Model for RealAudio • Validation of the model

  15. Background of the trace • Collected from audio servers at Broadcast.com, a popular audio service provider • 5.5 million packets • The servers ran RealServer V5.0 and used a proprietary protocol called PNA • Same traces as used by Mena et al.

  16. Characteristics of RealAudio… • Individual flow • Bursty at small time scales (single seconds) • Constant bit rate at medium time scales (tens of seconds)

  17. Characteristics of RealAudio • Aggregated traffic • Same bursty on-off behavior as the individual flow (off period ≈ 1.8 seconds) • Flows are synchronized!! • Multiple clients listening to the same live music

  18. Structural model of RealAudio… • User behavior • User arrival • Number of flows per user • Flow data • Packet length • Flow duration • Flow rate • Flow synchronization • Packet data • On period • Off period • Number of packets sent in each on-off period [Figure: model hierarchy: user → flow → packet]

  19. Structural model of RealAudio • Each user picks a number of flows* • For each flow, sequentially: • Pick an overall rate* and duration* • While the flow is active: • Pick on and off times • Calculate the number of packets to send during the on time to satisfy the rate • To simulate the flow-synchronization effect, all flows start at multiples of 1.8 sec [Figure: user, RealAudio]
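
The generation steps on slide 19 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's implementation. The starred quantities are taken to be values drawn from distributions measured in the trace, and they are represented here by hypothetical `Sampler` callables. Packet sizes come from a hypothetical `pkt_len` sampler. The packet count in each on period is rounded so that the whole on-off cycle carries the target rate on average, with packets spread evenly over the on time.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Off period observed in the trace (slide 17). Every flow starts on a multiple
# of it to reproduce the flow-synchronization effect.
SYNC_PERIOD = 1.8  # seconds

# Hypothetical: a sampler draws one value from a distribution. In the model these
# would be the empirical distributions measured from the trace.
Sampler = Callable[[random.Random], float]


@dataclass
class Flow:
    start: float  # seconds, aligned to SYNC_PERIOD
    rate: float   # target rate, bytes per second
    packets: List[Tuple[float, int]] = field(default_factory=list)  # (time, bytes)


def align_to_sync(t: float) -> float:
    """Round t up to the next multiple of SYNC_PERIOD (tolerating float error)."""
    return math.ceil(t / SYNC_PERIOD - 1e-9) * SYNC_PERIOD


def generate_user(rng: random.Random, arrival: float, n_flows: Sampler,
                  rate: Sampler, duration: Sampler, on_time: Sampler,
                  off_time: Sampler, pkt_len: Sampler) -> List[Flow]:
    """Generate the flows of one user, one flow after another."""
    flows = []
    t = arrival
    for _ in range(max(1, int(n_flows(rng)))):
        t = align_to_sync(t)                 # synchronized flow start
        flow = Flow(start=t, rate=rate(rng))
        end = t + duration(rng)
        while t < end:                       # while the flow is active
            on, off = on_time(rng), off_time(rng)
            size = int(pkt_len(rng))
            # Packets to send in this on period so that the on+off cycle
            # carries flow.rate bytes per second on average (assumption).
            n_pkts = max(1, round(flow.rate * (on + off) / size))
            for i in range(n_pkts):
                flow.packets.append((t + i * on / n_pkts, size))
            t += on + off
        flows.append(flow)
    return flows
```

In a full model, each sampler would draw from the corresponding measured distribution. Passing constant lambdas instead, as in `generate_user(random.Random(0), 0.5, lambda r: 2, lambda r: 1000.0, lambda r: 4.0, lambda r: 0.2, lambda r: 1.6, lambda r: 250)`, makes the synchronization easy to see: both flows of the user start on multiples of 1.8 s.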

  20. Validation of the RealAudio model • Individual flow • Time-sequence plot • Aggregated flows • Qualitatively • Comparison of multi-scaling plots • Time-variance plot • Wavelet scaling plot • Comparison of first-order statistics • User duration, flow rate, … • Quantitatively • Mean values of first-order statistics

  21. Time-variance plot • $X = (X_t;\ t = 1, 2, 3, \ldots)$ is a stationary time series • In our work, we use a runs test to verify the stationarity of the data • $X^{(m)}$ is a new time series at scale $m$, obtained by averaging the original series $X$ over non-overlapping blocks of size $m$: $X^{(m)}_k = \frac{1}{m}(X_{km-m+1} + \cdots + X_{km})$, $k = 1, 2, 3, \ldots$ • Example for $m = 4$: $X^{(4)}_1 = (X_1 + X_2 + X_3 + X_4)/4$, $X^{(4)}_2 = (X_5 + X_6 + X_7 + X_8)/4$, … [Figure: $\log_{10}(\mathrm{Var}(X^{(m)}))$ versus $\log_{10}(m)$, with a self-similar reference]
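
The aggregation on slide 21 and the resulting variance-time plot can be sketched as below. Several assumptions apply. The series is taken to be stationary already, since the runs test is not reproduced here. A trailing partial block is dropped. The slope comes from a plain least-squares fit. `hurst_from_slope` uses the standard relation $H = 1 + \text{slope}/2$: independent data gives a slope of −1 (H = 0.5), and a shallower slope indicates self-similarity.

```python
import math
import random
from statistics import pvariance
from typing import List, Sequence, Tuple


def aggregate(x: Sequence[float], m: int) -> List[float]:
    """X^(m): means of x over non-overlapping blocks of size m.

    Block k (0-based here) covers x[k*m:(k+1)*m], i.e. X_{km-m+1}..X_{km}
    in the slide's 1-based notation. A trailing partial block is dropped.
    """
    n_blocks = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance_time(x: Sequence[float], scales: Sequence[int]) -> List[Tuple[float, float]]:
    """Points (log10 m, log10 Var(X^(m))) of the variance-time plot."""
    points = []
    for m in scales:
        xm = aggregate(x, m)
        if len(xm) < 2:
            break  # too few blocks left to estimate a variance
        v = pvariance(xm)
        if v > 0:
            points.append((math.log10(m), math.log10(v)))
    return points


def slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of the plotted points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    num = sum((a - mx) * (b - my) for a, b in points)
    den = sum((a - mx) ** 2 for a, _ in points)
    return num / den


def hurst_from_slope(s: float) -> float:
    """Var(X^(m)) ~ m^s  =>  H = 1 + s / 2."""
    return 1 + s / 2


if __name__ == "__main__":
    rng = random.Random(1)
    iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]
    s = slope(variance_time(iid, [2 ** i for i in range(9)]))
    # Independent data: theory predicts a slope near -1 (H near 0.5).
    print(f"slope={s:.2f}  H={hurst_from_slope(s):.2f}")
```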

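  22. Wavelet scaling plot • $X_{0,k}$, $k = 0, 1, 2, 3, \ldots$, is a stationary time series • Coarsen $X_0$ by averaging over non-overlapping blocks of size two to obtain a new time series $X_1$: $X_{1,k} = 2^{-1/2}(X_{0,2k} + X_{0,2k+1})$ (1) • The difference between $X_0$ and $X_1$ is defined as $D_{1,k} = 2^{-1/2}(X_{0,2k} - X_{0,2k+1})$ (2) • $X_0 = 2^{-1/2}(X_1 + D_1)$ • Iterating this process gives $X_0 = 2^{-n/2}X_n + 2^{-n/2}D_n + 2^{-(n-1)/2}D_{n-1} + \cdots + 2^{-1/2}D_1$ • Energy $E_j$ is defined as $E_j = \sum_k (D_{j,k})^2 / N_j$ (3), where $N_j$ is the number of coefficients at scale $j$ (Huang et al., '99) [Figure: wavelet scaling plot; annotations mark the RTT time scale and the self-similar region]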
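
Equations (1) to (3) on slide 22 amount to an orthonormal Haar transform. The sketch below computes the energy $E_j$ at each scale, and plotting $\log_2 E_j$ against $j$ gives the wavelet scaling plot. It rests on two assumptions: a trailing odd sample at each level is dropped, and $N_j$ is taken as the number of detail coefficients at scale $j$. It illustrates the definitions and is not the tool used in the proposal.

```python
import math
import random
from typing import List, Sequence, Tuple


def haar_energy(x: Sequence[float], levels: int) -> List[float]:
    """Return [E_1, ..., E_J] with E_j = sum_k D_{j,k}^2 / N_j."""
    s = 2 ** -0.5
    approx = list(x)  # X_0
    energies = []
    for _ in range(levels):
        half = len(approx) // 2  # N_j for this level
        if half == 0:
            break
        # Eq. (2): D_{j,k} = 2^{-1/2} (X_{j-1,2k} - X_{j-1,2k+1})
        detail = [s * (approx[2 * k] - approx[2 * k + 1]) for k in range(half)]
        # Eq. (1): X_{j,k} = 2^{-1/2} (X_{j-1,2k} + X_{j-1,2k+1})
        approx = [s * (approx[2 * k] + approx[2 * k + 1]) for k in range(half)]
        energies.append(sum(d * d for d in detail) / half)  # Eq. (3)
    return energies


def scaling_plot(energies: Sequence[float]) -> List[Tuple[int, float]]:
    """Points (j, log2 E_j); scales with zero energy are skipped."""
    return [(j, math.log2(e)) for j, e in enumerate(energies, start=1) if e > 0]


if __name__ == "__main__":
    rng = random.Random(2)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    # White noise has equal energy at every scale, so the plot is roughly flat.
    print(scaling_plot(haar_energy(noise, 8)))
```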

  23. Use multi-scaling plots to debug the model [Figure: the trace compared with a model that takes flow synchronization into account and a model that doesn’t]

  24. Use multi-scaling plots to validate the model • Similar bumps at multiples of 1.8 sec in the variance-time (V-T) plot • Similar turning point at ~5 sec • Approximately flat line at larger time scales • RealAudio is NOT self-similar [Figure: trace vs. model]

  25. First-order statistics • The close match is not surprising, since the model is driven by statistics taken from the trace • But it still gives us some confidence

  26. Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and future plan

  27. Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Traffic is different anywhere you look • Design of RAMP • Validation of RAMP • Future work • Conclusion and future plan

  28. ISI and Internet Traffic Archive • ISI data was collected at the ISI gateway; ITA data was collected at ita.ee.lbl.gov, which serves as a Web/FTP server • ITA traffic is sparser than ISI traffic • ITA traffic is bimodal: HTTP + FTP

  29. Metrics used for comparison • Qualitatively • CDF plots • Packet • Flow • User • Wavelet scaling plot • Quantitatively • Kolmogorov-Smirnov goodness-of-fit test

  30. Kolmogorov-Smirnov goodness-of-fit test • D value: the largest absolute difference between the cumulative distributions of two data sets • Critical value $= c/\sqrt{n}$, where $c$ is distribution-dependent and $n$ is the number of samples • At the 0.05 significance level: • Normal distribution: c = 1.36 • Exponential distribution: c = 1.08 • Weibull distribution: c = 0.874 • Example: for 10,000 samples from a Weibull distribution at the 0.05 significance level, critical value = 0.874/√10000 = 0.00874 • If D < critical value → no statistically significant difference
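
The D statistic and critical value on slide 30 can be computed as below. This is a sketch. The c constants are copied from the slide, `ks_d` is the two-sample statistic (both empirical CDFs are evaluated at every observed value), and the slide's single n is used as given. For two samples of different sizes $n_1$ and $n_2$, the usual two-sample form replaces n with $n_1 n_2/(n_1 + n_2)$.

```python
import bisect
import math
from typing import Sequence

# c at the 0.05 significance level, as listed on slide 30.
C_AT_005 = {"normal": 1.36, "exponential": 1.08, "weibull": 0.874}


def ks_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so checking every observed value finds the maximum gap.
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def critical_value(n: float, dist: str) -> float:
    """Critical value c / sqrt(n) at the 0.05 level."""
    return C_AT_005[dist] / math.sqrt(n)


if __name__ == "__main__":
    # The slide's example: 10,000 samples, Weibull, 0.05 level.
    print(critical_value(10000, "weibull"))
```

As on slides 32 and 39, a D below the critical value is read as no statistically significant difference between the two data sets.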

  31. Traffic in different directions (ISI inbound and outbound traffic)… • Inbound is dominated by News traffic, outbound by Web traffic • Inbound has a smaller RTT (~40 ms)

  32. Traffic in different directions (ISI inbound and outbound traffic) • Inbound traffic has more short flows, mostly contributed by DNS • K-S test: D = 0.121, D = 0.097 • D > critical value = 0.00874 → they are statistically different

  33. Problem with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models • Focus on fitting statistics derived from a small set of traces to well-known analytical functions • All based on measurements taken at a single point in the network • Need a tool that can quickly parameterize the model from measurements with no implicit assumptions about the properties of the traffic!!

  34. RApid Model Parameterization (RAMP) • CDFs of network characteristics • Link BW • RTT • CDFs of application behavior • Web • Number of pages per user • Page size • Object arrival • … • FTP • File arrival • File size [Figure: tcpdump trace → RAMP → CDFs → NS simulation model]
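
On slide 34, RAMP turns a tcpdump trace into CDFs that drive the NS simulation model. One common way to use such CDFs without fitting an analytical function is inverse-transform sampling from the empirical CDF. The Python sketch below illustrates that assumption only. It is not RAMP's code or file format, and the `EmpiricalCDF` class and the page sizes in the demo are hypothetical.

```python
import bisect
import random
from typing import List, Sequence


class EmpiricalCDF:
    """Hypothetical helper: CDF built directly from trace observations."""

    def __init__(self, samples: Sequence[float]):
        if not samples:
            raise ValueError("need at least one sample")
        xs = sorted(samples)
        n = len(xs)
        self.values: List[float] = []  # distinct observed values, ascending
        self.probs: List[float] = []   # F(value) for each distinct value
        for i, v in enumerate(xs):
            if self.values and self.values[-1] == v:
                self.probs[-1] = (i + 1) / n  # repeated value: raise its F
            else:
                self.values.append(v)
                self.probs.append((i + 1) / n)

    def cdf(self, x: float) -> float:
        """Fraction of observations <= x."""
        i = bisect.bisect_right(self.values, x)
        return self.probs[i - 1] if i else 0.0

    def sample(self, rng: random.Random) -> float:
        """Inverse transform: smallest observed value v with F(v) > u."""
        u = rng.random()  # uniform in [0, 1); the last prob is 1.0 > u
        return self.values[bisect.bisect_right(self.probs, u)]


if __name__ == "__main__":
    # Hypothetical page sizes (bytes) standing in for values extracted from a trace.
    page_sizes = EmpiricalCDF([512, 1024, 1024, 4096, 20480])
    rng = random.Random(0)
    print([page_sizes.sample(rng) for _ in range(5)])
```

Because values are drawn straight from the observed distribution, a page-size distribution that is not heavy-tailed (as with ITA on slide 42) is reproduced as observed rather than forced into a Pareto shape.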

  35. Structural model of Web traffic… • User behavior • User arrival • Number of pages per user session • Server popularity • Page • Page size • Page arrival • Number of objects per page • Request size • Persistent connection (HTTP/1.1) • Object • Object arrival • Object size • TCP window size [Figure: model hierarchy: user → page → object]

  36. What’s new in our Web model • Request size • Request sizes are increasing due to the popularity of Web email • TCP window size • Restricts the server's sending rate • Persistent connections

  37. Structural model of FTP traffic • User behavior • User arrival • Number of files per user session • Server popularity • File • File size • File arrival • TCP window size [Figure: model hierarchy: user → file]

  38. Validation of RAMP • First-order statistics • CDF plot • Kolmogorov-Smirnov test • Wavelet scaling plot • Comparison with SURGE

  39. Comparison of first-order statistics (ISI-1 outbound trace against the model) • K-S test: D = 0.0019, D = 0.0013, D = 0.0018 • D < critical value = 0.00874 → there is no statistically significant difference [Figure: panels for flow size, flow inter-arrival, and flow duration]

  40. Comparison of wavelet scaling plots • Similar RTT • Similar energy levels [Figure: panels for ISI-1 and ITA]

  41. Comparison with SURGE • Run SURGE in a LAN with one server and four clients • Feed the SURGE trace into RAMP and compare the simulation result against the SURGE trace [Figure: packet inter-arrival and wavelet scaling plot, with the self-similar region marked]

  42. Limitation of SURGE • Tries to fit the model to well-known analytical functions • For example, it uses a Pareto distribution to describe page sizes • Not all Web sites fit this profile • The page-size distribution of ITA traffic is NOT heavy-tailed • ITA traffic can NOT be modeled by SURGE • RAMP makes no implicit assumptions like those SURGE makes [Figure: ITA page-size distribution, annotated “not heavy-tailed!!”]

  43. Performance of RAMP • The running time of RAMP is roughly a function of the number of packets in the trace • Close to real time!!

  44. Agenda • Challenge • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and plan

  45. Problem with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models, while the network is constantly changing • Focus on fitting statistics derived from a small set of traces to well-known analytical functions • All based on measurements taken at a single point in the network

  46. The need for distributed measurement • Get a complete, network-wide view of traffic • Infer traffic at places where taking measurements is infeasible [Figure: data, integration, traffic model]

  47. Challenges • Where to take measurements? • How to merge traffic? • How to infer traffic where taking measurements is impractical, if not impossible?

  48. Where to take the measurements? • Vertex cover • The minimum set of nodes such that every link has at least one endpoint in the set • Such nodes are typically located in the core of the network • All the edge nodes • Not scalable • Some of the edge nodes • Need techniques to infer traffic at the edge nodes where measurements are not taken [Figure: network topology with unmeasured nodes marked “?”]
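
Finding a minimum vertex cover is NP-hard, and the proposal does not say how the cover would be computed. A standard substitute, sketched below, is the greedy 2-approximation. It scans the links and, whenever a link has neither endpoint selected, selects both endpoints. The result covers every link and is at most twice the size of the minimum cover. The topology and node names in the demo are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def approx_vertex_cover(edges: Iterable[Edge]) -> Set[Hashable]:
    """2-approximate vertex cover: the endpoints of a maximal matching."""
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # No chosen node observes this link yet, so take both endpoints.
            cover.update((u, v))
    return cover


def is_vertex_cover(edges: Sequence[Edge], nodes: Set[Hashable]) -> bool:
    """True if every link has at least one endpoint in nodes."""
    return all(u in nodes or v in nodes for u, v in edges)


if __name__ == "__main__":
    # Hypothetical topology: one core router with three edge routers.
    links = [("core", "edge1"), ("core", "edge2"), ("core", "edge3")]
    monitors = approx_vertex_cover(links)
    print(sorted(monitors), is_vertex_cover(links, monitors))
```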

  49. How to merge traffic • Classify traffic into source/destination groups based on IP prefix • Each collector is responsible for a set of groups and forwards traffic statistics for the other groups to the collectors responsible for them • All the data collectors join a well-known multicast session to exchange information
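
The grouping and forwarding on slide 49 can be sketched as below. Everything concrete here is an assumption: the /16 prefix length (`PREFIX_LEN`), flow records given as (source, destination, bytes) tuples, and a hash of the group name as the rule for deciding which collector is responsible. The proposal does not specify an assignment rule. A hash only works if all collectors share the same, identically ordered collector list, which they could agree on over the multicast session. The multicast exchange itself is not shown.

```python
import hashlib
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PREFIX_LEN = 16  # hypothetical grouping granularity

Group = Tuple[str, str]  # (source prefix, destination prefix)


def group_of(src: str, dst: str, prefix_len: int = PREFIX_LEN) -> Group:
    """Map a flow to its source/destination prefix group."""
    s = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
    d = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
    return str(s), str(d)


def owner(group: Group, collectors: List[str]) -> str:
    """Hypothetical assignment rule: hash the group onto the collector list."""
    digest = hashlib.sha256("|".join(group).encode()).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


def partition(records: Iterable[Tuple[str, str, int]], me: str,
              collectors: List[str]) -> Tuple[Dict[Group, int], Dict[str, Dict[Group, int]]]:
    """Split local byte counts into the groups this collector keeps and
    per-collector batches to forward to the responsible collectors."""
    stats: Dict[Group, int] = defaultdict(int)
    for src, dst, nbytes in records:
        stats[group_of(src, dst)] += nbytes
    keep: Dict[Group, int] = {}
    forward: Dict[str, Dict[Group, int]] = defaultdict(dict)
    for g, total in stats.items():
        target = owner(g, collectors)
        if target == me:
            keep[g] = total
        else:
            forward[target][g] = total
    return keep, dict(forward)
```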

  50. How to infer traffic at U? • Assume the network can be simplified into a tree topology, and node U is where traffic needs to be inferred • Node U is either a root/internal node or a leaf node • If node U is the root node • If node U is a leaf node • Can we infer U based on measurements at V (and other sibling nodes)? • $\mathrm{TRAFFIC}_U = \mathrm{TRAFFIC}_a + \mathrm{TRAFFIC}_b - \mathrm{CROSSTRAFFIC}_{a,b}$? [Figure: tree with nodes U, V, a, and b]
