Ph.D. Dissertation Proposal focusing on rapidly generating traffic models for constantly changing distributed networks like the Internet. The proposed solution involves structural modeling and rapid model parameterization to address the limitations of existing approaches.
Rapid generation of structural model from network measurement Kun-chan Lan USC/ISI kclan@isi.edu http://www.isi.edu/~kclan Ph.D Dissertation Proposal
Simulation vs. traffic model • Simulation and analysis rely heavily on a good traffic model • But there is no “typical” traffic model! • End system: different users, different applications, different protocols • Network: different links, different scheduling, different QoS
Traffic is different any which way you look • Traffic changes over time • Traffic differs from location to location • Traffic differs by direction [Figure panels: different time, different location, different direction]
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models, so the model might be obsolete by then • The focus is on fitting statistics derived from a small set of traces to some well-known analytical functions, which yields no “typical” model • Everything is based on measurements taken at a single point of the network, so we cannot get a network-wide view of the traffic
Our goal • Rapidly generate realistic traffic models in a constantly changing distributed network environment like the Internet
Potential applications of our work • Traffic planning and provisioning • On-line simulation for network control • Input to network prediction algorithms • Detection of network attacks • etc.
Our solution • Goal: rapidly generate realistic traffic models in a constantly changing distributed network environment like the Internet • Structural modeling • Rapid model parameterization (RAMP) • Integration of distributed measurement
Agenda • Motivation • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of distributed measurement • Conclusion and plan
Agenda • Motivation • Structural modeling • Problems with existing approaches • Why structural modeling • A case study on RealAudio • Rapid model parameterization (RAMP) • Future work • Conclusion and plan
Traditional traffic modeling.. • Trace-driven time-series analysis • Can reproduce time series similar to the actual traffic • But… [Figure labels: Trace; Time series analysis (autocorrelation, marginal distribution); Markov, ARIMA, TES, FBM, FGN; result]
..traditional traffic modeling • Provides little or no insight into the observed characteristics of measured traffic and their underlying causes • Cannot capture the feedback effects in the protocols • Internet protocols behave differently across a range of time scales
Multiple levels of feedback effects in Web traffic [Figure: client-server timeline with three levels: user (> 10 sec; user clicks separated by user think time), HTTP (1~10 sec; request, response, page transmission until end of page), TCP (< 1 sec; request, response)]
Structural modeling • First proposed by Willinger (1998) • Emphasizes characterizing the source-level pattern in which data is sent • Explicitly takes into account the hierarchical structure of the application and its underlying networking mechanisms when modeling the traffic
Application of structural modeling: a case study on RealAudio • Why RealAudio? • Different behavior at different time scales • A good example of how traffic changes because of user/application interaction • Case study • Characteristics of RealAudio • Model for RealAudio • Validation of the model
Background of the trace • Collected from audio servers at Broadcast.com, a popular audio service provider • 5.5 million packets • The servers ran RealServer V5.0 and used a proprietary protocol called PNA • Same traces used by Mena et al.
Characteristics of RealAudio.. • Individual flow • Bursty at small time scales (single second) • Constant bit rates at medium time scales (tens of seconds)
Characteristics of RealAudio • Aggregated traffic • Same bursty on-off behavior as the individual flow (off period ~1.8 seconds) • Flows are synchronized! • Multiple clients listening to the same live music
Structural model of RealAudio.. [Figure: user, flow, packet hierarchy] • User behavior • User arrival • Number of flows per user • Flow data • Packet length • Flow duration • Flow rate • Flow synchronization • Packet data • On period • Off period • Number of packets sent in each on-off period
Structural model of RealAudio • Each user picks a number of flows * • For each flow, sequentially: • Pick an overall rate * and a duration * • While the flow is active: • Pick on and off times • Calculate the number of packets to send in the on-time to satisfy the rate • To simulate the flow synchronization effect, all flows start at a multiple of 1.8 sec
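The generation loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the proposal's ns implementation: the function name `generate_user`, the `pick_*` callables (stand-ins for whatever distributions the starred quantities are drawn from), and the 500-byte packet size are all assumptions made for the example. Only the 1.8-second alignment comes from the slide.

```python
import math
import random

SYNC_INTERVAL = 1.8  # seconds; flows start at multiples of this (from the slide)


def generate_user(rng, pick_num_flows, pick_rate, pick_duration,
                  pick_on_off, packet_size=500):
    """Return [(flow_start, [packet_send_times]), ...] for one user.

    Every pick_* argument is a hypothetical callable that takes rng and
    returns a sample; packet_size (bytes) is an assumed constant.
    """
    flows = []
    clock = 0.0
    for _ in range(pick_num_flows(rng)):
        # Align the flow start to a multiple of SYNC_INTERVAL.
        start = math.ceil(clock / SYNC_INTERVAL) * SYNC_INTERVAL
        rate = pick_rate(rng)              # bytes per second
        end = start + pick_duration(rng)   # seconds
        packets = []
        t = start
        while t < end:
            on, off = pick_on_off(rng)
            # Packets needed in the on period to sustain the rate over on + off.
            count = max(1, round(rate * (on + off) / packet_size))
            packets.extend(t + i * on / count for i in range(count))
            t += on + off
        flows.append((start, packets))
        clock = end                        # flows of one user run sequentially
    return flows


if __name__ == "__main__":
    rng = random.Random(1)
    demo = generate_user(
        rng,
        pick_num_flows=lambda r: r.randint(1, 3),
        pick_rate=lambda r: r.choice([1000, 2000]),
        pick_duration=lambda r: r.uniform(30.0, 300.0),
        pick_on_off=lambda r: (0.2, 1.6),
    )
```

Passing the samplers in as callables keeps the structure of the model separate from its parameters, so the same loop can be driven by values measured from a trace.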
Validation of RealAudio model • Individual flow • Time-sequence plot • Aggregated flows • Qualitatively • Comparisons of multi-scaling plots • Time-variance plot • Wavelet scaling plot • Comparison of first-order statistics • user duration, flow rate, … • Quantitatively • Mean value of first-order statistics
Time-variance plot • $X = (X_t;\ t = 0, 1, 2, 3, \ldots)$ is a stationary time series • In our work, we use the run test to verify the stationarity of the data • $X^{(m)}$ is a new time series at scale $m$, obtained by averaging the original series $X$ over non-overlapping blocks of size $m$: $X^{(m)}_k = \frac{1}{m}\,(X_{km-m+1} + \cdots + X_{km}),\ k = 1, 2, 3, \ldots$ • Example: for $X = X_1, X_2, \ldots, X_{10}, \ldots$, the series $X^{(4)} = X^{(4)}_1, X^{(4)}_2, \ldots$ has $X^{(4)}_1 = (X_1 + X_2 + X_3 + X_4)/4$ and $X^{(4)}_2 = (X_5 + X_6 + X_7 + X_8)/4$ [Figure: $\log_{10}(\mathrm{Var}(X^{(m)}))$ plotted against $\log_{10}(m)$, with a self-similar reference]
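The block averaging and the points of the time-variance plot can be computed as in the sketch below. The function names are made up for illustration, and the population variance is an assumed choice (the slide does not say which estimator is used).

```python
import math
import statistics


def aggregate(series, m):
    """Average the series over non-overlapping blocks of size m (X^(m))."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance_time_points(series, scales):
    """Return (log10(m), log10(Var(X^(m)))) for each usable scale m."""
    points = []
    for m in scales:
        agg = aggregate(series, m)
        if len(agg) < 2:
            continue                     # too few blocks to estimate a variance
        var = statistics.pvariance(agg)  # assumed estimator: population variance
        if var > 0:                      # log10 is undefined for zero variance
            points.append((math.log10(m), math.log10(var)))
    return points
```

For example, `aggregate([1, 2, 3, 4, 5, 6, 7, 8], 4)` gives `[2.5, 6.5]`, matching the $X^{(4)}$ example on the slide. For a series with no long-range dependence the points fall roughly on a line of slope -1, while a self-similar series decays more slowly.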
Wavelet scaling plot • $X_{0,k}$, $k = 0, 1, 2, 3, \ldots$, is a stationary time series • Coarsen $X_0$ by averaging over non-overlapping blocks of size two to obtain a new time series $X_1$: $X_{1,k} = 2^{-1/2}(X_{0,2k} + X_{0,2k+1})$ (1) • The difference between $X_0$ and $X_1$ is defined as $D_{1,k} = 2^{-1/2}(X_{0,2k} - X_{0,2k+1})$ (2) • $X_0 = 2^{-1/2}(X_1 + D_1)$ • If we iterate this process: $X_0 = 2^{-n/2} X_n + 2^{-n/2} D_n + \cdots + 2^{-1/2} D_1$ • Energy $E_j$ is defined as $E_j = \sum_k (D_{j,k})^2 / N_j$ (3) (Huang et al., 99) [Figure annotations: RTT, self-similar]
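Equations (1) to (3) translate almost directly into code. The sketch below is illustrative only: the function name is made up, and it assumes $N_j$ is the number of detail coefficients at scale $j$, which the slide does not state explicitly.

```python
import math


def haar_energies(x0, levels):
    """Return [E_1, ..., E_levels] following equations (1)-(3)."""
    scale = 1 / math.sqrt(2)
    approx = list(x0)
    energies = []
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break                                   # series too short to coarsen
        # Equation (2): detail coefficients D_{j,k}.
        detail = [scale * (approx[2 * k] - approx[2 * k + 1])
                  for k in range(pairs)]
        # Equation (1): coarsened series X_{j,k}.
        approx = [scale * (approx[2 * k] + approx[2 * k + 1])
                  for k in range(pairs)]
        # Equation (3): E_j, with N_j assumed to be the coefficient count.
        energies.append(sum(d * d for d in detail) / pairs)
    return energies
```

As a small check, `haar_energies([1, 3, 5, 7], 2)` yields energies close to 2 and 16. The scaling plot is then usually drawn as $\log_2 E_j$ against the scale $j$.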
Use multi-scaling plot to debug the model [Figure panels: trace; model that takes flow synchronization into account; model that doesn’t take flow synchronization into account]
Use multi-scaling plot to validate the model • Similar bumps at multiples of 1.8 sec in the V-T plot • Similar turning point at ~5 sec • Approximately flat line at larger time scales • RealAudio is NOT self-similar [Figure panels: Trace, Model]
First-order statistics • The agreement is not surprising, since the model is driven by statistics taken from the trace • But it still gives us some confidence
Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Future work • Integration of Distributed measurement • Conclusion and future plan
Agenda • Motivation • Structural modeling • RApid model parameterization (RAMP) • Traffic is different anywhere you look • Design of RAMP • Validation of RAMP • Future work • Conclusion and future plan
ISI and Internet Traffic Archive (ITA) • ISI data was collected at the ISI gateway; ITA data was collected at ita.ee.lbl.gov, which serves as a Web/FTP server • ITA traffic is sparser than ISI traffic • ITA traffic is bimodal: HTTP + FTP
Metrics used for comparison • Qualitatively • CDF plots • Packet • Flow • User • Wavelet scaling plot • Quantitatively • Kolmogorov-Smirnov goodness of fit test
Kolmogorov-Smirnov goodness of fit test • D value: the largest absolute difference between the cumulative distributions of two sets of data • Critical value = $c/\sqrt{n}$, where c is distribution-dependent and n is the number of samples • At the 0.05 significance level • Normal distribution: c = 1.36 • Exponential distribution: c = 1.08 • Weibull distribution: c = 0.874 • Example: for 10000 samples at the 0.05 significance level, critical value = $0.874/\sqrt{10000}$ = 0.00874 for data that comes from a Weibull distribution • If D < critical value, there is no statistically significant difference
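The D value and the critical value described on this slide can be computed as in the sketch below. The function names are made up for illustration; the constant `c` must be supplied by the caller from the values listed on the slide.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def critical_value(c, n):
    """Critical value c / sqrt(n) for n samples."""
    return c / math.sqrt(n)


def significantly_different(d, c, n):
    """True when D exceeds the critical value."""
    return d > critical_value(c, n)
```

With the slide's example, `critical_value(0.874, 10000)` is 0.00874, so a D value of 0.121 counts as a significant difference while 0.0019 does not.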
Traffic in different directions (ISI inbound and outbound traffic).. • Inbound is dominated by News traffic, outbound by Web traffic • Inbound has a smaller RTT (~40 ms)
Traffic in different directions (ISI inbound and outbound traffic) • Inbound traffic has more short flows, mostly contributed by DNS • K-S test: D = 0.121, D = 0.097 • D > critical value = 0.00874, so they are statistically different
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models • The focus is on fitting statistics derived from a small set of traces to some well-known analytical functions • Everything is based on measurements taken at a single point of the network • Need a tool that can quickly parameterize the model from measurements with no implicit assumptions about the properties of the traffic!
RApid Model Parameterization (RAMP) [Figure labels: tcpdump trace, RAMP, CDFs, NS simulation model] • Network characteristics • Link BW • RTT • Application behavior • Web • # of pages per user • Page size • Object arrival • … • FTP • File arrival • File size
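The idea of driving a simulation from measured CDFs, rather than from a fitted analytical function, can be illustrated with an empirical CDF and inverse-transform sampling. This is a minimal sketch and not RAMP's actual code; the function names and the sample values in the usage note are hypothetical.

```python
import bisect
import random


def empirical_cdf(values):
    """Return (xs, ps): sorted values and their cumulative probabilities."""
    xs = sorted(values)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


def sample_from_cdf(xs, ps, rng):
    """Draw one value by inverting the empirical CDF at a uniform random u."""
    u = rng.random()                 # u is in [0, 1)
    i = bisect.bisect_left(ps, u)    # first index whose probability is >= u
    return xs[min(i, len(xs) - 1)]
```

For instance, after `xs, ps = empirical_cdf([300, 100, 200, 100])`, repeated calls to `sample_from_cdf(xs, ps, random.Random(7))` only ever return values that appeared in the measurement, in roughly the measured proportions. No distributional shape is assumed.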
Structural model of Web traffic.. [Figure: user, page, object hierarchy] • User behavior • User arrival • Number of pages per user session • Server popularity • Page • Page size • Page arrival • Number of objects per page • Request size • Persistent connection (HTTP1.1) • Object • Object arrival • Object size • TCP window size
What’s new in our Web model • Request size • The size of requests is increasing due to the popularity of web email • TCP window size • Restricts the sending rate of the server • Persistent connection
Structural model of FTP traffic [Figure: user, file hierarchy] • User behavior • User arrival • Number of files per user session • Server popularity • File • File size • File arrival • TCP window size
Validation of RAMP • First-order statistics • CDF plot • Kolmogorov-Smirnov test • Wavelet scaling plot • Comparison with SURGE
Comparison of first-order statistics (ISI-1 outbound trace against the model) [Figure panels: flow size, flow inter-arrival, flow duration] • K-S test: D = 0.0019, D = 0.0013, D = 0.0018 • D < critical value = 0.00874, so there is no statistically significant difference
Comparison of wavelet scaling plot • Similar RTT • Similar energy level [Figure panels: ISI-1, ITA]
Comparison with SURGE • Run SURGE in a LAN with one server and 4 clients • Feed the SURGE trace into RAMP, and compare the simulation result against the SURGE trace [Figure panels: packet inter-arrival, wavelet scaling plot; annotation: self-similar]
Limitation of SURGE • Tries to fit the model to some well-known analytical functions • For example: uses Pareto to describe the distribution of page size • Not all websites fit this profile • Page size distribution of ITA traffic is NOT heavy-tailed • ITA traffic can NOT be modeled by SURGE • RAMP does not make implicit assumptions the way SURGE does [Figure annotation: not heavy-tailed!]
Performance of RAMP • The speed of RAMP depends approximately on the number of packets in the trace • Close to real-time!
Agenda • Challenge • Structural modeling • Rapid model parameterization (RAMP) • Future work • Integration of Distributed measurement • Conclusion and plan
Problems with the existing traffic modeling approach • It takes years to go from collecting traces and analyzing data to generating and implementing models, while the network is constantly changing • The focus is on fitting statistics derived from a small set of traces to some well-known analytical functions • Everything is based on measurements taken at a single point of the network
The need for distributed measurement • Get a complete, network-wide picture of the traffic • Infer traffic at places where taking measurements is infeasible [Figure labels: Traffic model, data, Integration]
Challenges • Where to take the measurement? • How to merge traffic? • How to infer traffic where taking measurements is infeasible, if not impossible?
Where to take the measurement [Figure: topology with nodes marked “?”] • Vertex cover • The minimum subset of nodes such that every link has at least one endpoint in the subset • Such nodes are typically located at the core of the network • All the edge nodes • Not scalable • Partial edge nodes • Need techniques to infer traffic at the edge nodes where measurement is not taken
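Finding a minimum vertex cover is NP-hard in general, so a placement tool would likely settle for an approximation. The sketch below uses the classic greedy rule that takes both endpoints of any still-uncovered link, which yields a cover at most twice the minimum size. It is offered only as an illustration of the vertex cover option; the proposal does not say which algorithm would be used, and the node names in the usage note are hypothetical.

```python
def approx_vertex_cover(edges):
    """Return a set of nodes that touches every link in edges.

    Greedy 2-approximation: whenever a link has neither endpoint in the
    cover yet, add both of its endpoints.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def is_vertex_cover(cover, edges):
    """Check that every link has at least one endpoint in cover."""
    return all(u in cover or v in cover for u, v in edges)
```

On a star topology such as `[("core", "a"), ("core", "b"), ("core", "c")]`, the result contains the core node, which is consistent with the observation that such nodes tend to sit at the core of the network.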
How to merge traffic • Classify traffic into source/destination groups based on IP prefix • Each collector is responsible for a set of groups and forwards traffic statistics for the other groups to their responsible collectors • All the data collectors join a well-known multicast session to exchange information
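The grouping and forwarding steps could look roughly like the sketch below. Several details are assumptions made for the example and are not taken from the proposal: the /16 prefix length, the CRC-based mapping from group to collector, the byte-count statistic, and all function names. The multicast exchange itself is not modeled.

```python
import ipaddress
import zlib
from collections import defaultdict


def prefix_group(src_ip, dst_ip, prefix_len=16):
    """Map a flow to its (source prefix, destination prefix) group."""
    src = ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False)
    dst = ipaddress.ip_network(f"{dst_ip}/{prefix_len}", strict=False)
    return (str(src), str(dst))


def responsible_collector(group, collectors):
    """Pick the collector for a group (assumed rule: CRC32 of the group key)."""
    key = "|".join(group).encode()
    return collectors[zlib.crc32(key) % len(collectors)]


def route_statistics(local_collector, flows, collectors):
    """Split locally observed byte counts into kept and forwarded statistics.

    flows is an iterable of (src_ip, dst_ip, nbytes) tuples.
    """
    kept = defaultdict(int)
    forwarded = defaultdict(lambda: defaultdict(int))
    for src_ip, dst_ip, nbytes in flows:
        group = prefix_group(src_ip, dst_ip)
        owner = responsible_collector(group, collectors)
        if owner == local_collector:
            kept[group] += nbytes
        else:
            forwarded[owner][group] += nbytes
    return kept, forwarded
```

Because every collector applies the same deterministic mapping, statistics for a given group always end up at one place, whichever collector observed the traffic.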
How to infer traffic [Figure: tree topology with nodes a, b, U, V; the node to infer is marked “?”] • Assume the network can be simplified into a tree topology and node U is where traffic needs to be inferred • Node U is either a root/internal node or a leaf node • If node U is the root node: $\mathrm{TRAFFIC}_U = \mathrm{TRAFFIC}_a + \mathrm{TRAFFIC}_b - \mathrm{CROSSTRAFFIC}_{a,b}$? • If node U is the leaf node: can we infer U based on measurements at V (and other sibling nodes)?
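The slide poses the root-node formula as an open question. Under one possible reading, where the cross traffic is the set of flows observed at both a and b and each such flow should be counted once at U, the arithmetic is plain inclusion-exclusion. The sketch below only illustrates that reading; the function name and the flow-id-to-bytes representation are hypothetical.

```python
def infer_parent_bytes(flows_a, flows_b):
    """Estimate bytes at U from per-flow byte counts seen at a and b.

    flows_a and flows_b map a flow id to the bytes observed at that node.
    Flows seen at both nodes are treated as cross traffic and counted once.
    """
    cross = sum(flows_a[f] for f in flows_a.keys() & flows_b.keys())
    return sum(flows_a.values()) + sum(flows_b.values()) - cross
```

For example, with `{"f1": 100, "f2": 50}` at a and `{"f2": 50, "f3": 30}` at b, the estimate is 150 + 80 - 50 = 180 bytes.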