The Evolution of Traffic Matrix Techniques and Applications: Past, Present and Future Fan Tongliang 20081201005 College of Communication Engineering
Outline • Problem Statement • Summary: traffic measurement • How have traffic matrix estimation techniques evolved? • Applications of traffic matrices • References
Internet Evolution Grows over time…
Internet Evolution Say the network doubles in size. Key: where to add capacity?
Internet Evolution Uniformly scale all capacities? Is Moore’s-law-like scaling sufficient? If so, good scaling!
Internet Evolution Scale some links faster? Is Moore’s-law-like scaling insufficient?
Internet Evolution Scale some links faster? Congested hot-spots appear. If so, poor scaling!
Internet Evolution • How does the worst congestion grow? • Ideal: O(n) • How much of this is due to… • Topology? “Power-law” structure • Routing algorithm? BGP policy routing • Traffic demand matrix? Uniform vs. non-uniform • What can be done? Redesign the network?
Why Measurements? Optimizing the Internet performance of well-connected Internet end-points • Wide-area bottlenecks • Identify and characterize bottlenecks • Multihoming route control • Quantify benefits and compare against alternatives • Will these techniques work in the future? • Current best performing BGP path • Smart selection
Why Measurements are Difficult • Effectively measuring the global Internet requires wide cooperation, but ISPs are reluctant to coordinate their efforts • Statistics collection is viewed as a luxury (OC48mon = $100,000): only large ISPs can afford statistics collection and analysis, and demand is still dormant • Best-effort service and low ISP profit margins make operational support difficult, so data collection is a low priority • Traffic volume, high trunk capacity, and the diversity of protocols, technologies and applications make traffic monitoring and analysis a challenging endeavor • Results become obsolete very rapidly: the Internet is under very active development, and traffic, technology and topology change fast • The tremendous growth of the Internet makes measurements difficult to scale • Overprovisioning is a widely practiced answer to network congestion
How to Measure? • Measurement is data collection, analysis and visualization • Traffic data: • Network topology and mapping (connectivity) • Workload (passive or non-intrusive) • Performance (active) • Routing (BGP routing tables) • Active approach: inject traffic and wait for it to reach the destination or for a reply • Passive approach: no traffic is injected; measurements are taken by a collection of network monitors
Measurement Tools • Can be classified into hardware and software measurement tools • Hardware: specialized equipment • Examples: HP 4972 LAN Analyzer, DataGeneral Network Sniffer, others... • Software: special software tools • Examples: tcpdump, xtr, SNMP, others...
Measurement Tools (Cont’d) • Measurement tools can also be classified as real-time or non-real-time • Real-time: collects traffic data as it happens, and may even be able to display traffic info as it happens • Non-real-time: collected traffic data may only be a subset (sample) of the total traffic, and is analyzed off-line (later)
Measurement Tools • Link Based Tools • CoralReef • Tcpdump • Router Based Tools • SNMP Based • MRTG • NetFlow Based • FlowScan • Cflowd • MADAS • Flowtools • CISCO NetFlow FlowCollector, NetFlow Data Analyzer
Detecting Performance Problems • High utilization or loss statistics for the link • High delay or low throughput for probes • Angry customers complaining (via phone?) about an overloaded link
Network Operations: Excess Traffic • Two large flows of traffic • New egress point for first flow • Multi-homed customer
Network Operations: Denial-of-Service Attack • Web server at its knees… • Install packet filter • Web server back to life…
Network Operations: Link Failure • Link failure • New route overloads a link • Routing change alleviates congestion
What’s a traffic matrix? • X_j: traffic volume from an ingress PoP (Point of Presence) to an egress PoP • Y_i: measured load on link i • Y = AX (also written Y = RX), the “traffic matrix” equation: Y is the link measurement vector, A (or R) is the routing matrix, X is the traffic matrix (written out element-wise below)
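Written out element by element, the relation on this slide is just a sum over OD pairs; treating a split flow as a fractional routing entry is an assumption beyond what the slide states, noted in the comment.

```latex
% Y = AX written out element-wise: y_i is the measured load on link i,
% x_j the volume of OD pair j, and the routing matrix A records which
% OD pairs traverse which links.
\begin{equation*}
  y_i \;=\; \sum_{j} A_{ij}\, x_j,
  \qquad
  A_{ij} =
  \begin{cases}
    1 & \text{if OD pair } j \text{ traverses link } i,\\
    0 & \text{otherwise}
  \end{cases}
\end{equation*}
% (A_{ij} may be fractional if an OD pair's traffic is split across paths).
```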
Example Problem • Four PoPs: A and B (sources), C and D (destinations); SNMP byte counts per link are 5, 3, 4, 4 • How much traffic flows between the origin-destination pairs A->C, A->D, B->C, B->D?
Example: One Solution • A->C: 1, A->D: 4, B->C: 3, B->D: 0 is consistent with the observed link counts (5, 3, 4, 4)
Example: Another Solution • A->C: 3, A->D: 2, B->C: 1, B->D: 2 is equally consistent with the same link counts • Each link yields one equation of this type, e.g. Link1 = X_AD + X_BD
Inference: Network Tomography • From link counts (5, 3, 4, 4 Mbps in the example) to the traffic matrix between sources and destinations (a numeric sketch of the ambiguity follows)
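A minimal numpy sketch of the four-node example above; the routing matrix below is an assumption (each OD pair is taken to cross its ingress access link and its egress link), chosen because it reproduces the link counts shown on the slides.

```python
import numpy as np

# OD pair order: A->C, A->D, B->C, B->D
A = np.array([
    [1, 1, 0, 0],   # link out of A  (count 5)
    [0, 0, 1, 1],   # link out of B  (count 3)
    [1, 0, 1, 0],   # link into C    (count 4)
    [0, 1, 0, 1],   # link into D    (count 4)
])
y = np.array([5, 3, 4, 4])

x1 = np.array([1, 4, 3, 0])   # "one solution" from the slides
x2 = np.array([3, 2, 1, 2])   # "another solution" from the slides

print(A @ x1)                       # [5 3 4 4] -- matches the link counts
print(A @ x2)                       # [5 3 4 4] -- so does this one
print(np.linalg.matrix_rank(A))     # 3 < 4 unknowns: the system is ill-posed
```

Both candidate traffic matrices reproduce the measured link loads exactly, and the rank check makes the ill-posedness explicit: four unknowns but only three independent equations.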
1st Generation Approaches • Linear Programming (LP) approach • O. Goldschmidt, ISMA Workshop 2000 • Bayesian estimation • C. Tebaldi, M. West, J. of the American Statistical Association, June 1998 • Expectation Maximization (EM) approach • J. Cao, D. Davis, S. Vander Wiel, B. Yu, J. of the American Statistical Association, 2000
Linear Programming • Objective: a linear function of the OD flows, e.g. a weighted sum of the X_j • Constraints: the link counts, Y = AX, and non-negativity, X_j ≥ 0 (a sketch follows)
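The exact objective used in the LP approach is not shown on the slide, so the sketch below is a hedged, generic instantiation: maximize a weighted sum of OD flows subject to the link-count and non-negativity constraints, using scipy.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
y = np.array([5, 3, 4, 4], dtype=float)
w = np.ones(A.shape[1])          # weights on OD pairs (an assumption)

# linprog minimizes, so negate the weights to maximize w.x
res = linprog(c=-w, A_eq=A, b_eq=y,
              bounds=[(0, None)] * A.shape[1], method="highs")
print(res.x)   # one vertex of the feasible region; with equal weights the
               # objective is flat here, so the solution is not unique
```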
Bayesian Approach • Assumes the X_j are independent and each follows a Poisson distribution with mean λ_j • The rate vector Λ = (λ_1, …, λ_J) needs to be estimated (a prior is needed) • Conditioning on the link counts gives P(X, Λ | Y); Markov Chain Monte Carlo (MCMC) simulation is used to obtain posterior distributions • Ultimate goal: compute P(X | Y) (the factorization is sketched below)
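A standard way to write the hierarchy this slide describes; the noise-free likelihood P(Y | X) = 1{Y = AX} and the integration over Λ are assumptions consistent with the slide rather than quotations of it.

```latex
% Hierarchical factorization behind the Bayesian approach: independent
% Poisson OD counts with rates lambda_j, a prior on Lambda, and link
% counts Y determined by X.
\begin{align*}
  P(X_j \mid \lambda_j) &= \frac{\lambda_j^{X_j} e^{-\lambda_j}}{X_j!}
      \qquad\text{(independent across OD pairs } j\text{)} \\
  P(X, \Lambda \mid Y) &\propto P(Y \mid X)\, P(X \mid \Lambda)\, P(\Lambda),
      \qquad P(Y \mid X) = \mathbf{1}\{\, Y = A X \,\} \\
  P(X \mid Y) &= \int P(X, \Lambda \mid Y)\, d\Lambda
      \qquad\text{(approximated from MCMC samples)}
\end{align*}
```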
Expectation Maximization (EM) • Assumes the X_j are independently distributed Gaussians • Y = AX implies the link counts are also Gaussian (see the model below) • Requires a prior for initialization • Incorporates multiple sets of link measurements • Uses the EM algorithm to compute the MLE
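For concreteness, one commonly cited form of this Gaussian model is sketched below; the power-law variance scaling (scale φ, exponent c) should be read as an assumption about the model family rather than something stated on the slide.

```latex
% Gaussian OD-flow model and its implication for the observed link counts:
\begin{align*}
  X_t &\sim \mathcal{N}\!\big(\lambda,\ \phi\,\Sigma(\lambda)\big),
      \qquad \Sigma(\lambda) = \operatorname{diag}\!\big(\lambda_1^{\,c}, \ldots, \lambda_J^{\,c}\big) \\
  Y_t &= A X_t \ \Rightarrow\
      Y_t \sim \mathcal{N}\!\big(A\lambda,\ \phi\, A\,\Sigma(\lambda)\,A^{\mathsf T}\big)
\end{align*}
```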
2nd generation methods • MOTIVATION: the fundamental problem is an under-constrained, or ill-posed, system; some sort of side information or assumptions must be added to make the estimation problem well-posed. • What options do we have for getting more data into the problem? • Approach 1: MLE estimation methods require a “starting point” (initial condition/prior/etc.); can we find “intelligent starting points” based on network properties? • Approach 2: what can we do to increase the rank of the routing matrix?
Directions • Lessons learned: • Model assumptions do not reflect the true nature of traffic (multimodal behavior) • Dependence on priors • Link counts alone are not sufficient (generally more data is available to network operators) • Proposed solutions: • Use choice models to incorporate additional information • Generate a good prior solution: the gravity model • Information-theoretic approach • Assignment model
Choice Models • Let R_i be the total amount of traffic entering the network that is sourced at POP i • Traffic(i->j) = R_i · a_ij • What is a_ij? The proportion of traffic at ingress node i headed to egress node j • {a_ij for all j} is called the “fanout” of node i • Problem: estimate the fanouts a_ij (a small sketch follows)
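A tiny sketch of the fanout definition, with a hypothetical 4-POP traffic matrix just to show the arithmetic.

```python
import numpy as np

# Hypothetical POP-to-POP traffic matrix (rows = ingress POP, columns = egress POP).
tm = np.array([[0.0, 2.0, 1.0, 1.0],
               [3.0, 0.0, 3.0, 0.0],
               [1.0, 1.0, 0.0, 2.0],
               [2.0, 2.0, 2.0, 0.0]])

R = tm.sum(axis=1)              # R_i: total traffic entering at POP i
fanouts = tm / R[:, None]       # a_ij = X_ij / R_i
print(fanouts.sum(axis=1))      # each row of the fanout matrix sums to 1
print(R[:, None] * fanouts)     # reconstructs the traffic matrix: X_ij = R_i * a_ij
```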
Gravity model • Router-to-router gravity model [Zhang, Roughan, et al., Sigcomm 2004] • Use it as a smart initial condition for optimization • Solve min ||X – Xg|| s.t. ||AX – Y|| is minimized • Use a least-squares type solution (sketched below)
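A hedged sketch of the gravity prior plus least-squares refinement described above. The ingress/egress totals and the minimum-norm correction step are illustrative choices, not the exact tomogravity implementation.

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
y = np.array([5, 3, 4, 4], dtype=float)

# Gravity prior: x_ij proportional to (traffic entering at i) * (traffic leaving at j).
# The totals below are hypothetical access-link SNMP values (deliberately not
# identical to the inter-router counts in y, as the next slide notes).
inbound = {"A": 5.2, "B": 2.9}
outbound = {"C": 4.1, "D": 4.0}
total = sum(outbound.values())
xg = np.array([inbound[i] * outbound[j] / total
               for i in ("A", "B") for j in ("C", "D")])   # order: AC, AD, BC, BD

# Refinement: smallest correction to xg that reproduces the measured link counts.
correction, *_ = np.linalg.lstsq(A, y - A @ xg, rcond=None)
x_hat = xg + correction
print(xg)        # gravity prior
print(x_hat)     # tomogravity-style estimate
print(A @ x_hat) # ~ [5 3 4 4], i.e. consistent with A x = y
```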
Gravity-based OD Flow Model • What does the gravity model say about OD flows? • Assume nodes are independent • The gravity model is a spatial model among OD flows • Gravity model is calibrated using SNMP from access and peering links entering/exiting router nodes • this is not the same SNMP data as the inter-router links used in estimation
Route Change Method • Idea: change the link weights; the new shortest paths lead to new routes between some OD pairs [Soule, Nucci, Cruz, et al., Sigmetrics 2004] • Each routing induces a different Y = A(r)·X, where A(r) is the routing matrix for weight setting r • Hope: by combining all the linear constraints into one big system, we increase the rank of A relative to the original system. It works! (see the rank sketch below) • Caveat: the SNMP link counts from different routing configurations must be collected over many hours or even days, so we are in the non-stationary regime of OD traffic flows
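A small sketch of the rank argument: stack the routing matrices from two weight settings and check the rank of the combined system. The second routing matrix is hypothetical.

```python
import numpy as np

A1 = np.array([[1, 1, 0, 0],
               [0, 0, 1, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1]], dtype=float)   # original routing (rank 3)

A2 = np.array([[1, 1, 0, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 0],                  # hypothetical: after the weight change,
               [0, 1, 1, 1]], dtype=float)   # B->C is re-routed onto the other link

A_big = np.vstack([A1, A2])                   # one big linear system Y_big = A_big X
print(np.linalg.matrix_rank(A1),
      np.linalg.matrix_rank(A_big))           # 3 -> 4: the OD flows become identifiable
```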
An Information-Theoretic Approach • Maximum entropy: entropy is a measure of uncertainty, and more information = less entropy • To include measurements, maximize entropy subject to the constraints imposed by the data • This imposes the fewest assumptions on the results • Instantiation: maximize “relative entropy”, i.e. minimum mutual information (one formulation is given below)
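One common way to write this instantiation down; using a gravity-model estimate g as the reference distribution is an assumption consistent with the tomogravity line of work, not something stated on the slide.

```latex
% Minimize the Kullback-Leibler divergence of the estimate x to a prior g
% (equivalently, maximize entropy relative to g), subject to the link counts:
\begin{equation*}
  \hat{x} \;=\; \arg\min_{x \ge 0} \ \sum_{j} x_j \log\frac{x_j}{g_j}
  \qquad \text{subject to} \quad A x = y
\end{equation*}
```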
Assignment model • We may view the problem as assigning each node's ingress traffic across egress nodes via proportions that sum to 1 • The value of each OD pair is then described by the node's total traffic and its assignment proportions, which again sum to 1
3rd generation models • Carriers set a 10% average error rate as a general target • 2nd generation methods achieve average errors roughly in the range of 15–20% • Can we further reduce errors? • What other kinds of information/measurements can be brought into the picture?
Two-step Statistical Approach • Step 1: Mlogit and linear choice models • Step 2: Expectation Maximization algorithm • Dividing the TM estimation process into two steps offers great flexibility for combining and evaluating different strategies that could be applied to solve the inference problem
Tomogravity • Two-step modeling • Gravity model: initial solution obtained using edge-link load data and ISP routing policy • Tomographic estimation: the initial solution is refined by quadratic programming, minimizing the distance to the initial solution subject to the tomographic constraints (link counts)
Genetic-Assignment algorithm • The key links: C = R·R^T • The “troublesome” OD pairs: Q = R^T·R
PCA Method • Apply PCA to the measured time series of all the OD flows • Output of PCA: eigenflows, a new set of time series • Some are cyclical, some bursty, some noisy • Each OD flow can be represented by a weighted sum of a small number (< 10) of eigenflows
PCA Solution • Rather than estimating the traffic matrix directly, estimate the eigenflows (the elements of the low-dimensional representation); this problem is well-posed • Rebuild the traffic matrix using the appropriate weighted sum of the eigenflows (see the sketch below)
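A hedged sketch of the eigenflow idea on synthetic data (a shared diurnal cycle plus noise, purely illustrative): PCA via the SVD, keep the top few temporal components, and rebuild the flows from them.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_od, k = 288, 20, 5                 # 5-minute bins over a day, 20 OD flows, keep 5 eigenflows
t = np.arange(T)
diurnal = np.sin(2 * np.pi * t / T)
X = (rng.uniform(1, 10, n_od)[None, :] * diurnal[:, None]   # shared daily cycle per flow
     + rng.normal(0, 0.5, (T, n_od)))                       # per-flow noise

Xc = X - X.mean(axis=0)                           # center each OD flow
U, S, Vt = np.linalg.svd(Xc, full_matrices=False) # PCA via the SVD
eigenflows = U[:, :k] * S[:k]                     # top-k temporal components ("eigenflows")
weights = Vt[:k, :]                               # how much each OD flow uses each eigenflow

X_lowdim = eigenflows @ weights + X.mean(axis=0)  # rebuild flows from k << n_od components
print(np.linalg.norm(X - X_lowdim) / np.linalg.norm(X))   # relative reconstruction error stays small
```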
Issues in 3rd gen methods • Model recalibration: models must be kept up to date as traffic evolves • For models based on 24 hours of measurements: need a scheme for detecting change and deciding when to launch a new measurement-collection episode • For models that handle one flow at a time, no change detection is needed; the model is essentially self-updating on an ongoing basis • Overheads: a tradeoff is induced between measurement overhead and the gain in error reduction
Areas of Application • Route selection • how to choose link weights for shortest path routing • Evaluating the impact of policy changes on traffic • Anomaly detection
Application Area #1: Selecting Link Weights for Routing • Link-weight selection algorithms use a traffic matrix as input; the goal is to balance traffic well across links • Suppose the input TM has errors: how does this affect our ability to choose routes? • We want a set of routes to last many days without requiring changes, but the TM is a dynamic, fluctuating thing • Can a single set of weights be good for a long time, i.e., over a variety of TMs?
Application Area #1: Some Findings • [Roughan et al., IMC 2003] • Yes, there is some sensitivity, but it is not too bad • Except: “optimal” routing (MPLS) is more sensitive than near-optimal algorithms (OSPF) • A routing can be found that is robust to daily fluctuations • [Applegate/Cohen, Sigcomm 2003] • A theoretical result, using oblivious routing • Showed that a single routing can be found that works well under a wide variety of traffic matrices
Application Area #2: Impact of Routing Policy Change • Using a TM, we can get a broad view of policy changes • Questions: • What kinds of fluctuations do we see in the TM due to changes in internal routing (IGP)? [Agarwal et al., Sigmetrics 2004] • What kinds of fluctuations do we see in the TM due to changes in inter-domain routing (BGP)? [Teixeira et al., PAM 2005] • Answer: such shifts are not frequent, but when they happen they are big (they affect a lot of traffic)
Application Area #3: Anomaly Detection • A set of traffic matrices over time can be used to describe “normal” traffic • We now have many models for OD flows; can we then identify abnormalities? • Subspace method [Lakhina et al., SIGCOMM 2004]: builds on the PCA idea, projecting traffic onto a low-dimensional representation and extracting outliers • There is much more that can be done here…
Application Area #3: Anomaly Detection • Advantage of using TMs for security: a network-wide perspective • If we see an attack on a set of links, it may all belong to one OD flow, i.e., the same attack; this permits easy identification of the point of entry • If one attacker attacks multiple victims, the anomalies show up in one row of the TM • If multiple zombies attack a single victim, the anomalies show up in one column of the TM (see the sketch below)
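A hedged sketch of the subspace idea: model “normal” link traffic with the top principal components and flag time bins whose residual energy is unusually large. The synthetic data and the simple quantile threshold are illustrative choices; the original method uses a statistical (Q-statistic) threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_links, k = 500, 30, 4
normal = rng.normal(0, 1, (T, k)) @ rng.normal(0, 1, (k, n_links))  # low-rank "normal" traffic
traffic = normal + rng.normal(0, 0.1, (T, n_links))                 # plus measurement noise
traffic[250] += 8 * rng.random(n_links)                             # inject one anomaly

mean = traffic.mean(axis=0)
_, _, Vt = np.linalg.svd(traffic - mean, full_matrices=False)
P = Vt[:k].T @ Vt[:k]                               # projector onto the "normal" subspace
residual = (traffic - mean) @ (np.eye(n_links) - P) # residual-subspace component
spe = (residual ** 2).sum(axis=1)                   # squared prediction error per time bin
threshold = np.quantile(spe, 0.999)                 # simple threshold for the sketch
print(np.where(spe > threshold)[0])                 # should flag bin 250
```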
Traffic Matrix: Operational Uses • Short-term congestion and performance problems • Problem: predicting link loads after a routing change • Map the traffic matrix onto the new set of routes (a small mapping sketch follows) • Long-term congestion and performance problems • Problem: predicting link loads after topology changes • Map the traffic matrix onto the routes of the new topology • Reliability despite equipment failures • Problem: allocating spare capacity for failover • Find link weights such that no failure causes overload
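As a closing illustration, “map the traffic matrix onto the new set of routes” amounts to multiplying the estimated OD vector by the new routing matrix; the changed routing below is hypothetical.

```python
import numpy as np

x = np.array([3.0, 2.0, 1.0, 2.0])            # estimated OD traffic: A->C, A->D, B->C, B->D

A_old = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
A_new = A_old.copy()
A_new[2] = [1, 0, 0, 0]                        # suppose B->C is re-routed off link 3...
A_new[3] = [0, 1, 1, 1]                        # ...and onto link 4

print(A_old @ x)   # current link loads
print(A_new @ x)   # predicted loads after the change; compare against link capacities
```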