Traffic Matrix Estimation: Existing Techniques and New Directions

A. Medina (Sprint Labs, Boston University), N. Taft (Sprint Labs), K. Salamatian (University of Paris VI), S. Bhattacharyya, C. Diot (Sprint Labs). Presented by Matthew Caesar.

Presentation Transcript


  1. Traffic Matrix Estimation: Existing Techniques and New Directions
     A. Medina (Sprint Labs, Boston University), N. Taft (Sprint Labs), K. Salamatian (University of Paris VI), S. Bhattacharyya, C. Diot (Sprint Labs)
     Presented by Matthew Caesar

  2. Problem scope
     • Environment:
        • A single ISP that provides SLAs to its customers
     • Goal: estimate the traffic matrix (TM)
        • The amount of traffic flowing between each (origin, destination) pair
        • Hard to measure exactly (requires extensive logging and/or offline parsing)
     • Why would we want to know the traffic matrix?
        • Helps with load balancing, routing protocol configuration, dimensioning, provisioning, and failover strategies
        • Allows quantifying the cost of providing QoS vs. overprovisioning

  3. Solution idea
     • Main idea:
        • Measure the utilization ("link count") on each network link
           • Can easily be done in the router fast path
           • Collected via SNMP queries
        • Find a set of origin-destination (OD) flows that would produce the measured link counts
     • Sticky issue: how do we find that set of OD flows?
     • Three techniques:
        • Linear programming (LP)
        • Bayesian estimation
        • Expectation maximization (EM)
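When link counts are collected by SNMP polling, a link count is typically the difference between two successive readings of an interface octet counter. The sketch below assumes 32-bit counters (such as IF-MIB's ifInOctets) and at most one counter wrap between polls; the function names are hypothetical.

```python
# Hypothetical helpers: turn two SNMP octet-counter readings into a link count.
COUNTER32_MODULUS = 2 ** 32  # 32-bit counters wrap back to 0 after 2**32 - 1


def link_count_bytes(prev_octets: int, curr_octets: int,
                     modulus: int = COUNTER32_MODULUS) -> int:
    """Bytes carried on a link between two polls, allowing for one counter wrap."""
    # After a wrap, curr - prev is negative; taking it modulo 2**32 corrects it.
    # This is only right if the counter wrapped at most once between polls.
    return (curr_octets - prev_octets) % modulus


def link_rate_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average link load in bits per second over the polling interval."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return link_count_bytes(prev_octets, curr_octets) * 8 / interval_s


# 1.5 MB carried during a 300 s polling interval.
print(link_rate_bps(1_000_000, 2_500_000, 300.0))  # 40000.0
# The counter passed 2**32 - 1 and restarted from a small value.
print(link_count_bytes(2 ** 32 - 100, 400))  # 500
```

Where 64-bit counters are available, passing `modulus=2 ** 64` makes wraps far less of a concern on fast links.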

  4. Traffic estimation
     • Assumptions: can encode the operator's knowledge (e.g., some OD pairs are always zero)
     • Prior TM: some methods need a seed TM to start from
     • Routing matrix
     • Link counts (link utilizations)
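Two of the inputs listed above, the routing matrix and the link counts, can be illustrated on a toy network. This sketch assumes a hypothetical three-node chain a->b->c with single-path routing (with load balancing, entries of the routing matrix could be fractions instead of 0/1). It also shows the core difficulty: different traffic matrices can produce exactly the same link counts.

```python
# Toy three-node chain a -> b -> c (hypothetical topology, not from the slides).
LINKS = ["a->b", "b->c"]
OD_PAIRS = ["a->b", "b->c", "a->c"]
# Each OD pair's path as a list of links (single-path routing assumed).
PATHS = {"a->b": ["a->b"], "b->c": ["b->c"], "a->c": ["a->b", "b->c"]}


def routing_matrix(links, od_pairs, paths):
    """A[i][j] = 1 if OD pair j is routed over link i, else 0."""
    return [[1 if link in paths[od] else 0 for od in od_pairs] for link in links]


def link_counts(A, x):
    """Link counts y = A x for an OD traffic vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


A = routing_matrix(LINKS, OD_PAIRS, PATHS)
print(A)  # [[1, 0, 1], [0, 1, 1]]

# Two different traffic matrices that yield identical link counts.
print(link_counts(A, [5, 5, 5]))    # [10, 10]
print(link_counts(A, [10, 10, 0]))  # [10, 10]
```

With more OD pairs (3) than links (2), the link counts alone cannot pin down the TM, which is why the estimation methods also rely on priors and assumptions.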

  5. Problem setup
     • See whiteboard
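The setup itself was presented on the whiteboard. For reference, here is a sketch of the standard linear formulation used in the traffic matrix estimation literature, with symbols chosen here: $L$ links, $P$ OD pairs, link-count vector $\mathbf{y}$, OD traffic vector $\mathbf{x}$, and routing matrix $A$.

```latex
\mathbf{y} = A\,\mathbf{x}, \qquad
y_i = \sum_{j=1}^{P} A_{ij}\, x_j \quad (i = 1, \dots, L), \qquad
x_j \ge 0, \qquad
A_{ij} =
\begin{cases}
1 & \text{if OD pair } j \text{ is routed over link } i,\\
0 & \text{otherwise.}
\end{cases}
```

In a network with $N$ nodes there are $P = N(N-1)$ OD pairs, usually far more than the number of links $L$, so the system is underdetermined: many vectors $\mathbf{x}$ reproduce the same $\mathbf{y}$.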

  6. Scheme #1: Linear programming (LP)
     • Linear program: objective function + constraints
     • Main idea:
        • Maximize the total amount of traffic routed through the network
        • Subject to the constraints:
           • Total traffic on each link must not exceed its measured link count
           • Flow conservation
     • Observations:
        • Leads to solutions in which OD pairs separated by few hops are assigned large amounts of bandwidth, while more distant pairs get much less
        • Solution: put more weight on pairs separated by greater distances
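The LP can be written as: maximize the sum of w_j x_j subject to the sum over j of A_ij x_j being at most y_i for every link i, and x_j >= 0. The sketch below solves this with a small textbook simplex (Bland's rule, exact fractions) on the toy chain a->b->c, where the true TM is (5, 5, 5) and both link counts are 10. It omits the flow-conservation constraints, and the weights are hypothetical. With equal weights, the 2-hop pair a->c gets zero; weighting it heavily (here, hop count squared) flips the answer instead of recovering the truth.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x s.t. A x <= b, x >= 0, with b >= 0 (so x = 0 is feasible).

    Textbook tableau simplex with Bland's rule; exact arithmetic via Fraction.
    """
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]; slack variable n + i starts in the basis of row i.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(1 if k == i else 0) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row: reduced costs (initially -c), current objective value last.
    obj = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        # Bland's rule: lowest-index column with a negative reduced cost enters.
        enter = next((j for j in range(n + m) if obj[j] < 0), None)
        if enter is None:
            break  # no improving column: optimal
        rows = [i for i in range(m) if T[i][enter] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        # Ratio test; ties broken by the lowest basic-variable index (Bland).
        leave = min(rows, key=lambda i: (T[i][-1] / T[i][enter], basis[i]))
        pivot = T[leave][enter]
        T[leave] = [v / pivot for v in T[leave]]
        for i in range(m):
            if i != leave and T[i][enter] != 0:
                f = T[i][enter]
                T[i] = [u - f * v for u, v in zip(T[i], T[leave])]
        f = obj[enter]
        obj = [u - f * v for u, v in zip(obj, T[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, obj[-1]


# Toy chain a->b->c; OD order (a->b, b->c, a->c); measured link counts y = (10, 10).
A = [[1, 0, 1],   # link a->b carries OD a->b and a->c
     [0, 1, 1]]   # link b->c carries OD b->c and a->c
y = [10, 10]

x_u, z_u = simplex_max([1, 1, 1], A, y)  # maximize total traffic, equal weights
x_w, z_w = simplex_max([1, 1, 4], A, y)  # hypothetical weights: hop count squared
print([float(v) for v in x_u])  # [10.0, 10.0, 0.0]
print([float(v) for v in x_w])  # [0.0, 0.0, 10.0]
```

Both answers are vertices of the feasible region, so some OD pairs are forced to zero; this matches the LP weakness reported in the results.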

  7. Scheme #2: Bayesian inference
     • See whiteboard
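The Bayesian derivation was done on the whiteboard. As a minimal sketch, assume independent Poisson priors on the OD flows (means taken from the prior TM) and exact link counts. Bayes' rule then gives p(x | y) proportional to p(y | x) p(x), where p(y | x) is 1 if y = Ax and 0 otherwise. On the hypothetical chain a->b->c only x_ac is free, so the posterior can be enumerated exactly; realistic networks need sampling methods such as MCMC instead.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def posterior_mean_chain(y1: int, y2: int, prior):
    """Posterior mean of (x_ab, x_bc, x_ac) on the toy chain a->b->c.

    Assumes independent priors x_j ~ Poisson(prior[j]) and noise-free link
    counts y1 = x_ab + x_ac and y2 = x_bc + x_ac, so the only free unknown is
    x_ac = k with 0 <= k <= min(y1, y2).
    """
    lam_ab, lam_bc, lam_ac = prior
    ks = range(min(y1, y2) + 1)
    # The likelihood is an indicator, so p(k | y) is proportional to the
    # prior probability of the full TM implied by k.
    weights = [poisson_pmf(y1 - k, lam_ab) * poisson_pmf(y2 - k, lam_bc)
               * poisson_pmf(k, lam_ac) for k in ks]
    total = sum(weights)
    mean_ac = sum(k * w for k, w in zip(ks, weights)) / total
    return (y1 - mean_ac, y2 - mean_ac, mean_ac)


# Same link counts (10, 10), two different prior ("seed") traffic matrices.
print(posterior_mean_chain(10, 10, prior=(5.0, 5.0, 1.0)))
print(posterior_mean_chain(10, 10, prior=(5.0, 5.0, 20.0)))
```

Changing only the prior moves the estimate of x_ac substantially, a small-scale view of the prior sensitivity reported in the results.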

  8. Scheme #3: Expectation maximization (EM)
     • See whiteboard
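The EM details were also on the whiteboard. As a stand-in, this sketch uses the classical multiplicative EM update for a Poisson linear model y ~ Poisson(A lam) (the Shepp-Vardi iteration), which is not necessarily the EM variant evaluated in the talk. The seed plays the role of the prior TM; the toy chain network is hypothetical.

```python
def em_poisson(A, y, seed, iters=200):
    """EM for link counts y ~ Poisson(A lam) with lam >= 0.

    Each step applies the multiplicative update
        lam_j <- lam_j * (sum_i A_ij * y_i / (A lam)_i) / (sum_i A_ij),
    starting from the prior ("seed") TM, whose entries must be positive.
    """
    m, n = len(A), len(seed)
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
    lam = [float(v) for v in seed]
    for _ in range(iters):
        # Link counts predicted by the current estimate.
        predicted = [sum(A[i][j] * lam[j] for j in range(n)) for i in range(m)]
        ratio = [y[i] / predicted[i] if predicted[i] > 0 else 0.0 for i in range(m)]
        # Rescale each OD flow by the average observed/predicted ratio on its path.
        lam = [lam[j] * sum(A[i][j] * ratio[i] for i in range(m)) / col_sums[j]
               if col_sums[j] > 0 else lam[j]
               for j in range(n)]
    return lam


A = [[1, 0, 1],   # link a->b carries OD a->b and a->c
     [0, 1, 1]]   # link b->c carries OD b->c and a->c
y = [10, 10]
print(em_poisson(A, y, seed=[1, 1, 1]))  # converges to about [5, 5, 5]
print(em_poisson(A, y, seed=[1, 1, 2]))  # about [3.33, 3.33, 6.67]
```

Both results reproduce the link counts exactly, yet they differ; the seed decides which consistent TM EM settles on, in line with the observation that EM is also sensitive to the prior.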

  9. Evaluation method
     • The "real" traffic matrix cannot be obtained by direct measurement
        • Therefore, use simulations
     • How should the flow between OD pairs be characterized?
        • Tried constant, Poisson, Gaussian, uniform, and bimodal (flash crowd) TMs
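The slides do not say how each OD-flow distribution was parameterized. This sketch generates synthetic traffic for one OD pair under each of the five models using arbitrary, hypothetical parameters (a 20% standard deviation and a 5% chance per interval of a flash-crowd burst at 5x the mean).

```python
import math
import random

KINDS = ("constant", "poisson", "gaussian", "uniform", "bimodal")


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (small lam only)."""
    if lam > 500:
        raise ValueError("use a different sampler for large lam")
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def od_series(kind: str, mean: float, steps: int, rng: random.Random) -> list:
    """Synthetic traffic for one OD pair over `steps` intervals (hypothetical parameters)."""
    sd = 0.2 * mean  # assumed spread: 20% of the mean
    if kind == "constant":
        return [mean] * steps
    if kind == "poisson":
        return [float(poisson_sample(mean, rng)) for _ in range(steps)]
    if kind == "gaussian":
        return [max(0.0, rng.gauss(mean, sd)) for _ in range(steps)]  # clip: traffic >= 0
    if kind == "uniform":
        return [rng.uniform(0.0, 2.0 * mean) for _ in range(steps)]
    if kind == "bimodal":
        # Flash crowd: about 5% of intervals come from a second mode at 5x the mean.
        return [max(0.0, rng.gauss(5.0 * mean if rng.random() < 0.05 else mean, sd))
                for _ in range(steps)]
    raise ValueError(f"unknown kind: {kind}")


rng = random.Random(7)  # fixed seed for reproducibility
series = {kind: od_series(kind, 20.0, 6, rng) for kind in KINDS}
for kind, values in series.items():
    print(f"{kind:>8}: {[round(v, 1) for v in values]}")
```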

  10. Results: Linear programming vs. statistical methods
     • The linear programming method performs poorly
        • Assigns zero to many OD pairs, increasing error
        • Problem: it tries to match OD pairs to link counts
        • Different objective functions give similar results
        • Consequently, the error is too high for use in practical networks
     • Bayesian and EM:
        • EM beats Bayesian in both average error and worst-case error
        • Estimation errors are correlated with heavily shared links (OD flows crossing links that carry many other OD flows are more likely to be mis-estimated)
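The slides compare methods by average and worst-case error without defining them. One common choice, assumed in this sketch, is the per-OD-pair relative error, skipping pairs whose true value is zero; the example vectors are hypothetical.

```python
def relative_errors(true_tm, est_tm):
    """Per-OD-pair relative error |est - true| / true (pairs with true == 0 skipped)."""
    return [abs(e - t) / t for t, e in zip(true_tm, est_tm) if t > 0]


def error_summary(true_tm, est_tm):
    """Average and worst-case relative error over all OD pairs."""
    errs = relative_errors(true_tm, est_tm)
    return {"mean": sum(errs) / len(errs), "worst": max(errs)}


# Toy chain (a->b, b->c, a->c): truth vs. an LP-style estimate that zeroes a->c.
print(error_summary([5, 5, 5], [10, 10, 0]))  # {'mean': 1.0, 'worst': 1.0}
# A closer estimate: small errors spread over two pairs.
print(error_summary([5, 5, 5], [4, 6, 5]))
```

Zeroing an OD pair contributes a relative error of exactly 1.0 for that pair, which is one way the LP's many zero assignments inflate its error.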

  11. Results: Goodness of prior
     • Goodness of the prior matrix (seed values)
        • Bayesian is much more sensitive to the prior matrix than EM
        • However, EM is also quite sensitive
        • Possible reason: EM has deterministic convergence behavior (which can be analyzed), while Bayesian has stochastic convergence (it oscillates)
     • Beyond a certain point, additional measurements provide no further gain
        • Measuring over long periods of time gives only a small additional improvement

  12. Results: Marginal gains
     • What improvement could be gained if some components of the traffic matrix could be measured directly?
        • A carrier may have the option to deploy a certain amount of monitoring equipment
     • 3 ways to choose which rows to add:
        • Randomly, by row sum (traffic volume), and by error magnitude
     • Results:
        • Error rate drops roughly linearly with each additional row added
        • Bayesian is not sensitive to the order in which rows are added
        • EM does better when the rows with the largest error are added first
        • Error reduction from adding a row is 2% for 13 OD pairs
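The three row-ordering strategies above can be sketched as follows. This assumes a TM "row" is the set of OD pairs sharing one origin and that, as in a simulation, the true TM is known; the error-magnitude ordering needs the truth, so it is only available in simulation. The function name and example matrices are hypothetical.

```python
import random


def row_order(strategy, true_tm, est_tm, rng=None):
    """Order in which TM rows (origins) would be measured directly.

    true_tm / est_tm: {origin: {destination: volume}}.
    strategy: "random", "row-sum" (largest true volume first), or
    "error" (largest absolute estimation error first).
    """
    origins = sorted(true_tm)
    if strategy == "random":
        rng = rng if rng is not None else random.Random()
        rng.shuffle(origins)
        return origins
    if strategy == "row-sum":
        return sorted(origins, key=lambda o: sum(true_tm[o].values()), reverse=True)
    if strategy == "error":
        def row_error(o):
            # Total absolute error over all OD pairs starting at origin o.
            return sum(abs(est_tm[o].get(d, 0.0) - v) for d, v in true_tm[o].items())
        return sorted(origins, key=row_error, reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")


true_tm = {"a": {"b": 30, "c": 10}, "b": {"a": 5, "c": 5}, "c": {"a": 50, "b": 20}}
est_tm = {"a": {"b": 20, "c": 20}, "b": {"a": 5, "c": 6}, "c": {"a": 45, "b": 25}}
print(row_order("row-sum", true_tm, est_tm))  # ['c', 'a', 'b']
print(row_order("error", true_tm, est_tm))    # ['a', 'c', 'b']
print(row_order("random", true_tm, est_tm, random.Random(1)))
```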

  13. Other results
     • Which OD pairs are most difficult to estimate?
        • Error increases as the link-sharing factor increases, and also as the path length increases
     • How should OD flows be characterized?
        • Poisson and Gaussian assumptions hold well, but only during certain hours of the day
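Both difficulty indicators named above can be computed from the routing alone. The slides do not define the link-sharing factor precisely; this sketch assumes a hypothetical definition (the mean number of OD pairs using each link on a pair's path) and a hypothetical chain topology a-b-c-d.

```python
from collections import Counter

# Hypothetical topology: chain a-b-c-d with single-path routing.
PATHS = {
    ("a", "b"): ["a-b"],
    ("a", "c"): ["a-b", "b-c"],
    ("a", "d"): ["a-b", "b-c", "c-d"],
    ("b", "c"): ["b-c"],
    ("c", "d"): ["c-d"],
}


def od_difficulty(paths):
    """Map each OD pair to (path length in hops, link-sharing factor).

    Link-sharing factor here (a hypothetical definition) is the mean number
    of OD pairs that traverse each link on the pair's path.
    """
    flows_per_link = Counter(link for path in paths.values() for link in path)
    return {od: (len(path), sum(flows_per_link[link] for link in path) / len(path))
            for od, path in paths.items()}


# Rank by sharing factor, then path length (both are tied to larger error).
ranked = sorted(od_difficulty(PATHS).items(),
                key=lambda kv: (kv[1][1], kv[1][0]), reverse=True)
for od, (hops, sharing) in ranked:
    print(od, hops, round(sharing, 2))
```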

  14. Recommendations
     • Network operators know a lot about their networks; we need methods that incorporate network-specific information into the estimation scheme
     • We need a better model of OD flows through an ISP
        • Possible solution: "gravity models" based on a utility factor (see whiteboard)
     • We need a good way to generate prior TMs
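The gravity model "based on utility factor" was sketched on the whiteboard, so its exact form is not recorded here. As one way to produce a prior TM from per-node totals, this sketch uses only the simplest textbook gravity model: traffic from o to d is proportional to the traffic entering at o times the traffic leaving at d. The node totals are hypothetical.

```python
def gravity_prior(ingress, egress):
    """Simple gravity model: x_od = ingress[o] * egress[d] / total traffic.

    ingress[o]: total traffic entering the network at node o.
    egress[d]:  total traffic leaving the network at node d.
    Includes o == d pairs so that row and column sums match the totals.
    """
    total = sum(egress.values())
    if abs(sum(ingress.values()) - total) > 1e-9 * max(total, 1.0):
        raise ValueError("total ingress must equal total egress")
    return {(o, d): ingress[o] * egress[d] / total
            for o in ingress for d in egress}


ingress = {"a": 60.0, "b": 40.0}
egress = {"a": 30.0, "b": 70.0}
prior = gravity_prior(ingress, egress)
print(prior)  # {('a', 'a'): 18.0, ('a', 'b'): 42.0, ('b', 'a'): 12.0, ('b', 'b'): 28.0}
```

The resulting matrix agrees with the measured per-node totals by construction, which makes it a plausible seed for the EM or Bayesian schemes; whether it is a good prior depends on how well the gravity assumption fits the network.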
