Traffic Engineering for ISP Networks Jennifer Rexford Computer Science Department Princeton University http://www.cs.princeton.edu/~jrex
Outline • Internet routing • Overview of the Internet routing architecture • Shortest-path link-state routing between edge routers • Optimization: Tune routing to the traffic • Optimizing routing given a topology and traffic matrix • Local search to select the integer link weights • Design for optimizability: Design routing protocol • Optimal traffic engineering with link-state routing • Tomography: Infer the traffic matrix • Estimating traffic matrix from routing and link load
Autonomous Systems (ASes) • Internet is divided into Autonomous Systems • Distinct regions of administrative control • Routers/links managed by a single “institution” • Service provider, company, university, … • Hierarchy of Autonomous Systems • Large, tier-1 provider with a nationwide backbone • Medium-sized regional provider with smaller backbone • Small network run by a single company or university • Cooperate to ensure end-to-end reachability
Interdomain Routing • AS-level topology • Destinations are IP prefixes (e.g., 12.0.0.0/8) • Nodes are Autonomous Systems (ASes) • Edges are links and business relationships [Figure: AS-level graph of seven numbered ASes connecting a client to a web server]
Points-of-Presence (PoPs) • Inter-PoP links • Long distances • High bandwidth • Intra-PoP links • Short cables between racks or floors • Aggregated bandwidth • Links to other networks • Wide range of media and bandwidth [Figure: backbone with inter-PoP links, intra-PoP links, and links to other networks labeled]
Intradomain Routing: Shortest-Path Routing • Path-selection model • Destination-based • Load-insensitive (e.g., static link weights) • Minimum hop count or sum of link weights [Figure: example network with integer link weights]
Computing Shortest Paths: Link-State Routing • Topology discovery • Routers flood information to learn the topology • Each router constructs a link-state database [Figure: example network with integer link weights]
Computing Shortest Paths: Link-State Routing • Shortest-path computation • Each router runs Dijkstra’s shortest-path algorithm • Computes the “next hop” to reach other routers [Figure: example network with integer link weights]
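As a minimal sketch of the shortest-path step, the following Python runs Dijkstra's algorithm from one router and records, for each destination router, the first hop on a shortest path. The dict-of-dicts `graph` is a hypothetical stand-in for a router's link-state database; weights are assumed positive, and ties are broken arbitrarily (a real router may keep several equal-cost next hops).

```python
import heapq

def shortest_paths_with_next_hops(graph, source):
    """Dijkstra's algorithm from `source`.

    graph: {router: {neighbor: link_weight}}, positive weights.
    Returns (dist, next_hop): dist[t] is the shortest distance to t and
    next_hop[t] is the neighbor of `source` on one shortest path to t.
    """
    dist = {source: 0}
    next_hop = {}
    visited = set()
    heap = [(0, source, None)]  # (distance, router, first hop from source)
    while heap:
        d, u, first = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry; u was already settled at a smaller distance
        visited.add(u)
        if first is not None:
            next_hop[u] = first
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # Neighbors of the source are their own first hop;
                # every other router inherits the first hop of u.
                heapq.heappush(heap, (nd, v, v if u == source else first))
    return dist, next_hop

# Hypothetical four-router example.
EXAMPLE = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, next_hop = shortest_paths_with_next_hops(EXAMPLE, "A")
print(dist)      # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(next_hop)  # {'B': 'B', 'C': 'B', 'D': 'B'}
```

The resulting `next_hop` map is exactly what a forwarding table needs: for each destination router, the outgoing link to use.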
Computing Shortest Paths: Link-State Routing • Packet forwarding • Each router maintains a forwarding table • To forward incoming packets to the right next-hop link [Figure: example network with integer link weights]
Our Focus: Traffic Engineering (TE) • Adjusting routing to the flow of traffic • How should network administrators run their networks? • Specifically, how should they set the link weights? • Designing protocols for better traffic engineering • How should future routing protocols be designed? • Specifically, how to make TE efficient and easy? • Collecting measurements of the offered traffic • How should administrators learn the traffic matrix? • Specifically, how to infer the matrix from link loads?
Optimization: Tuning Routing to the Traffic Joint work with Bernard Fortz and Mikkel Thorup http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf http://www.cs.princeton.edu/~jrex/papers/opthand04.pdf
Link Weights Control the Flow of Traffic • Routers compute paths • Shortest paths as sum of link weights • Operators set the link weights • To control where the traffic goes [Figure: example network with integer link weights]
Heuristics for Setting the Link Weights • Proportional to physical distance • Cross-country links have higher weights than local ones • Minimizes end-to-end propagation delay • Inversely proportional to link capacity • Smaller weights for higher-bandwidth links • Attracts more traffic to links with more capacity • Tuned based on the offered traffic • Network-wide optimization of weights based on traffic • Directly minimizes key metrics like max link utilization
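The first two heuristics reduce to simple per-link formulas. This Python sketch assumes capacities in Mb/s and lengths in km; the scale constants `reference_bw` and `km_per_unit` are hypothetical, chosen in the spirit of the reference-bandwidth convention found in some OSPF implementations. Results are rounded and clamped to positive integers because link weights are integers.

```python
def inverse_capacity_weights(capacity_mbps, reference_bw=100_000):
    """Weight = reference_bw / capacity, so faster links get smaller weights."""
    return {link: max(1, round(reference_bw / c))
            for link, c in capacity_mbps.items()}

def distance_weights(length_km, km_per_unit=100):
    """Weight grows with physical length, approximating propagation delay."""
    return {link: max(1, round(km / km_per_unit))
            for link, km in length_km.items()}

# Hypothetical links.
caps = {("A", "B"): 10_000, ("B", "C"): 2_500, ("C", "D"): 50_000}
lengths = {("A", "B"): 4_000, ("B", "C"): 30, ("C", "D"): 1_200}
print(inverse_capacity_weights(caps))  # {('A', 'B'): 10, ('B', 'C'): 40, ('C', 'D'): 2}
print(distance_weights(lengths))       # {('A', 'B'): 40, ('B', 'C'): 1, ('C', 'D'): 12}
```

Neither formula looks at the offered traffic, which is why the third heuristic needs a network-wide search instead.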
Why Are the Link Weights Static? • Strawman alternative: load-sensitive routing • Link metrics based on traffic load • Flood dynamic metrics as they change • Adapt automatically to changes in offered load • Reasons why this is typically not done • Delay-based routing unsuccessful in the early days • Oscillation as routers adapt to out-of-date information • Most Internet transfers are very short-lived • Research and standards work continues… • … but operators have to do what they can today
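Big Picture: Measure, Model, and Control [Figure: control loop in which the operational network is measured (topology/configuration and offered traffic), a network-wide “what if” model evaluates changes to the network, and the chosen changes are applied to control the operational network]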
Traffic Engineering in an ISP Backbone • Topology • Connectivity and capacity of routers and links • Traffic matrix • Offered load between points in the network • Link weights • Configurable parameters for Interior Gateway Protocol • Performance objective • Balanced load, low latency, service level agreements … • Question: Given the topology and traffic matrix in an IP network, which link weights should be used?
Key Ingredients of Our Approach • Measurement • Topology: monitoring of the routing protocols • Traffic matrix: widely deployed traffic measurement • Network-wide models • Representations of topology and traffic • “What-if” models of shortest-path routing • Network optimization • Efficient algorithms to find good configurations • Operational experience to identify key constraints
Formalizing the Optimization Problem • Input: graph $G(R,L)$ • $R$ is the set of routers • $L$ is the set of unidirectional links • $c_l$ is the capacity of link $l$ • Input: traffic matrix • $M_{i,j}$ is the traffic load from router $i$ to router $j$ • Output: setting of the link weights • $w_l$ is the weight on unidirectional link $l$ • $P_{i,j,l}$ is the fraction of traffic from $i$ to $j$ traversing link $l$
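Multiple Shortest Paths With Even Splitting: Values of $P_{i,j,l}$ [Figure: example network annotated with the fraction of one demand carried on each link (1.0, 0.5, or 0.25), with traffic split evenly wherever a router has several shortest-path next hops]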
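A minimal sketch of how the even-splitting values of $P_{i,j,l}$ could be computed for one source-destination pair: find each router's distance to the destination, keep only links that lie on some shortest path, and push one unit of traffic from the source, splitting it evenly at every router over its shortest-path next hops. The graph representation and function names are assumptions for illustration; link weights are assumed positive.

```python
import heapq
from collections import defaultdict

def distances_to(graph, dest):
    """Shortest distance from every router to `dest` (Dijkstra on reversed links)."""
    reverse = defaultdict(dict)
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse[v][u] = w
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in reverse[v].items():
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

def even_split_fractions(graph, src, dest):
    """P[(u, v)]: fraction of src->dest traffic on link (u, v) when every
    router splits evenly over all of its shortest-path next hops."""
    dist = distances_to(graph, dest)
    inflow = defaultdict(float)
    inflow[src] = 1.0
    P = {}
    # With positive weights, a shortest-path link always leads to a router
    # closer to dest, so visiting routers from farthest to nearest ensures
    # each router's inflow is complete before it is split.
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == dest or inflow[u] == 0.0:
            continue
        hops = [v for v, w in graph.get(u, {}).items()
                if v in dist and dist[u] == w + dist[v]]
        share = inflow[u] / len(hops)
        for v in hops:
            P[(u, v)] = P.get((u, v), 0.0) + share
            inflow[v] += share
    return P

# Hypothetical network with unit weights and several equal-cost paths A -> F.
G = {"A": {"B": 1, "C": 1}, "B": {"D": 1, "E": 1}, "C": {"E": 1},
     "D": {"F": 1}, "E": {"F": 1}}
print(even_split_fractions(G, "A", "F"))
```

Here A sends half to B and half to C; B splits its half evenly between D and E, so link (E, F) carries 0.25 + 0.5 = 0.75 of the demand. The split happens hop by hop, not per path, which is why the three A-to-F paths do not each get one third.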
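Defining the Objective Function • Computing the link utilization • Link load: $u_l = \sum_{i,j} M_{i,j} P_{i,j,l}$ • Utilization: $u_l / c_l$ • Objective functions • $\min \max_l (u_l / c_l)$ • $\min \sum_l f(u_l / c_l)$ [Figure: plot of the penalty $f(x)$ against link utilization $x$, with $x = 1$ marked]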
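To make the two objectives concrete, this Python sketch computes link loads $u_l$ from a traffic matrix and routing fractions, then evaluates both the maximum utilization and a sum of per-link penalties. The breakpoints and slopes in `SEGMENTS` are hypothetical; any convex, increasing $f$ that penalizes utilizations near or above 1 fits the formulation.

```python
import math
from collections import defaultdict

def link_loads(M, P):
    """u_l = sum over (i, j) of M[i, j] * P[i, j, l]; P[(i, j)] maps link -> fraction."""
    u = defaultdict(float)
    for pair, demand in M.items():
        for link, frac in P.get(pair, {}).items():
            u[link] += demand * frac
    return dict(u)

def max_utilization(u, c):
    """First objective: max over links of u_l / c_l."""
    return max(u.get(l, 0.0) / c[l] for l in c)

# Hypothetical convex, piecewise-linear penalty: (start of segment, slope).
SEGMENTS = [(0.0, 1), (1/3, 3), (2/3, 10), (0.9, 70), (1.0, 500), (1.1, 5000)]

def penalty(x):
    """f(x): cost grows ever more steeply as utilization x approaches and exceeds 1."""
    total = 0.0
    for k, (start, slope) in enumerate(SEGMENTS):
        if x <= start:
            break
        end = SEGMENTS[k + 1][0] if k + 1 < len(SEGMENTS) else math.inf
        total += slope * (min(x, end) - start)
    return total

def total_penalty(u, c, f=penalty):
    """Second objective: sum over links of f(u_l / c_l)."""
    return sum(f(u.get(l, 0.0) / c[l]) for l in c)

# Hypothetical demand of 4 units from A to F, split over two paths.
M = {("A", "F"): 4.0}
P = {("A", "F"): {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "F"): 0.5, ("C", "F"): 0.5}}
c = {("A", "B"): 4.0, ("A", "C"): 8.0, ("B", "F"): 8.0, ("C", "F"): 8.0}
u = link_loads(M, P)
print(u)                       # every link carries 2.0 units
print(max_utilization(u, c))   # 0.5
```

The max-utilization objective only sees the single worst link, while the sum of convex penalties also rewards relieving the other links, which gives a local search a smoother landscape to work with.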
Complexity of the Optimization Problem • NP-hard optimization problem • No efficient algorithm to find the link weights • Even for the simple convex objective functions • Why can’t we just do multi-commodity flow? • E.g., solve the multi-commodity flow problem… • … and the link weights pop out as the dual • Because IP routers cannot split arbitrarily over ties • What are the implications? • Have to resort to searching through weight settings
Optimization Based on Local Search • Start with an initial setting of the link weights • E.g., the same integer weight on every link • E.g., weights inversely proportional to link capacity • E.g., the existing weights in the operational network • Compute the objective function • Compute the all-pairs shortest paths to get $P_{i,j,l}$ • Apply the traffic matrix $M_{i,j}$ to get the link loads $u_l$ • Evaluate the objective function from the $u_l/c_l$ • Generate a new setting of the link weights, and repeat
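The loop above can be sketched in Python on a toy network. This is a simplified random local search under stated assumptions, not the authors' implementation: it perturbs one randomly chosen link weight per step (values 1 to `max_weight`), recomputes all-pairs shortest paths from scratch with Floyd-Warshall, routes demands with even splitting, and keeps a change only if the maximum utilization strictly improves. The topology, demand, and parameters are hypothetical, and every destination is assumed reachable.

```python
import random

def all_pairs_dist(nodes, w):
    """Floyd-Warshall over directed links; w[(u, v)] is the weight of link u->v."""
    inf = float("inf")
    d = {(a, b): 0 if a == b else w.get((a, b), inf) for a in nodes for b in nodes}
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if d[a, k] + d[k, b] < d[a, b]:
                    d[a, b] = d[a, k] + d[k, b]
    return d

def loads_under_weights(nodes, w, M):
    """Link loads when every router splits evenly over its shortest-path next hops."""
    d = all_pairs_dist(nodes, w)
    u = {link: 0.0 for link in w}
    for (s, t), demand in M.items():
        flow = {n: 0.0 for n in nodes}
        flow[s] = demand
        # Farthest-from-t first, so each router's inflow is complete before splitting.
        for x in sorted(nodes, key=lambda n: d[n, t], reverse=True):
            if x == t or flow[x] == 0.0:
                continue
            hops = [v for (a, v) in w if a == x and w[a, v] + d[v, t] == d[x, t]]
            for v in hops:
                u[x, v] += flow[x] / len(hops)
                flow[v] += flow[x] / len(hops)
    return u

def max_util(nodes, w, M, cap):
    u = loads_under_weights(nodes, w, M)
    return max(u[link] / cap[link] for link in cap)

def local_search(nodes, cap, M, w0, iters=200, max_weight=20, seed=0):
    """Change one weight at a time; keep the change only if it lowers max utilization."""
    rng = random.Random(seed)
    best_w, best = dict(w0), max_util(nodes, w0, M, cap)
    for _ in range(iters):
        cand = dict(best_w)
        cand[rng.choice(sorted(cand))] = rng.randint(1, max_weight)
        score = max_util(nodes, cand, M, cap)
        if score < best:
            best_w, best = cand, score
    return best_w, best

# Hypothetical network: a direct link A->D (capacity 5) plus two detours via B and C.
NODES = ["A", "B", "C", "D"]
CAP = {("A", "D"): 5, ("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 10, ("C", "D"): 10}
DEMANDS = {("A", "D"): 10}
start = {link: 1 for link in CAP}
print(max_util(NODES, start, DEMANDS, CAP))  # 2.0: all traffic on the direct link
best_w, best = local_search(NODES, CAP, DEMANDS, start)
print(best, best_w[("A", "D")])
```

With unit weights the whole demand takes the direct link (200% utilization). Raising that one weight to 3 or more makes the two detours the equal-cost shortest paths, so the search should settle on a maximum utilization of 0.5, in line with the observation that changing one or two weights is often enough.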
Making the Search Efficient • Avoid repeating the same weight setting • Keep track of past values of the weight setting • … or keep a small signature (e.g., a hash) of past values • Do not evaluate a weight setting if signatures match • Avoid computing the shortest paths from scratch • Explore weight settings that change just one weight • Apply fast incremental shortest-path algorithms • Limit the number of unique values of link weights • Do not explore all $2^{16}$ possible values for each weight • Stop early, before exploring the whole search space
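One way to implement the signature idea, sketched in Python under the assumption that a weight setting is a dict from links to integers: serialize the setting in a canonical order, hash it, and keep only a short prefix of the digest. A hash collision could occasionally skip an unseen setting, which is harmless in a heuristic search. The range check assumes 16-bit weights, and the class and function names are hypothetical.

```python
import hashlib

def weight_signature(weights):
    """Short, order-independent fingerprint of a weight setting {(u, v): weight}."""
    for w in weights.values():
        if not 1 <= w < 2**16:
            raise ValueError(f"weight {w} outside the assumed 16-bit range")
    canonical = ";".join(f"{u}>{v}={w}" for (u, v), w in sorted(weights.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

class SeenSettings:
    """Remembers signatures of weight settings that were already evaluated."""

    def __init__(self):
        self._seen = set()

    def already_seen(self, weights):
        """True if this setting was evaluated before; otherwise record it and return False."""
        sig = weight_signature(weights)
        if sig in self._seen:
            return True
        self._seen.add(sig)
        return False

seen = SeenSettings()
print(seen.already_seen({("A", "B"): 1, ("B", "C"): 2}))  # False: new setting
print(seen.already_seen({("B", "C"): 2, ("A", "B"): 1}))  # True: same setting, other order
```

Storing 16 hex characters per setting keeps memory small even after many thousands of evaluations, at the cost of that small collision risk.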
Incorporating Operational Realities • Minimize number of changes to the network • Changing just 1 or 2 link weights is often enough • Tolerate failure of network equipment • Weight settings usually remain good after a failure • … or can be fixed by changing one or two weights • Limit dependence on measurement accuracy • Good weights remain good, despite random noise • Limit frequency of changes to the weights • Joint optimization for day and night traffic matrices
Application to AT&T’s Backbone Network • Performance of the optimized weights • Search finds a good solution within a few minutes • Much better than link capacity or physical distance • Competitive with multi-commodity flow solution • How AT&T changes the link weights • Maintenance done every night from midnight to 6am • Predict effects of removing link(s) from the network • Reoptimize the link weights to avoid congestion • Configure new weights before disabling equipment
Example from My Visit to AT&T’s Operations Center • Amtrak repairing/moving part of the train track • Need to move some of the fiber optic cables • Or, heightened risk of the cables being cut • Amtrak notifies us of the time the work will be done • AT&T engineers model the effects • Determine which IP links go over the affected fiber • Pretend the network no longer has these links • Evaluate the new shortest paths and traffic flow • Identify whether link loads will be too high
Example Continued • If load will be too high • Reoptimize the weights on the remaining links • Schedule the time for the new weights to be configured • Roll back to the old weight setting after Amtrak is done • Same process applied to other cases • Assessing the network’s risk to possible failures • Planning for maintenance of existing equipment • Adapting the link weights to installation of new links • Adapting the link weights in response to traffic shifts
Conclusions on Traffic Engineering • IP networks do not adapt on their own • Routers compute shortest paths based on static weights • Service providers need to adapt the weights • Due to failures, congestion, or planned maintenance • Leads to an interesting optimization problem • Optimize link weights based on topology and traffic • Optimization problem is computationally difficult • Forces the use of efficient local-search techniques • Results of the local search are pretty good • Near-optimal solutions that minimize disruptions
Ongoing Work • Robust link-weight assignments • Link/node failures • Range of traffic matrices • More complex routing models • Hot-potato routing • BGP routing policies • Interaction between ASes • Inter-AS negotiation for joint optimization • Grappling with scalability and trust issues
Design for Optimizability: Optimal Link-State Routing Protocol Joint work with Dahai Xu and Mung Chiang http://www.cs.princeton.edu/~jrex/papers/pefti.pdf
Revisiting TE With Link-State Routing Protocols • Advantages of link weights • One parameter for each unidirectional link • Hop-by-hop forwarding (no tunneling, no per-flow state) • New routes computed automatically after failure • Changing just a few weights can alleviate congestion • Disadvantages of link weights • Computationally expensive optimization • Suboptimal distribution of traffic • (Disruptions when changing the link weights)
Example of Inefficient TE • Simple topology: source s and destination t joined by a top path with capacity c1 = 100 and a bottom path with capacity c2 = 200 • Demand of 300 units: • All on top path: 300% utilization of top path • All on bottom path: 150% utilization of bottom path • Even splitting: 150% on top path, 75% on bottom
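The utilizations on this slide follow from dividing each path's share of the 300 units by its capacity. This short Python check reproduces them with exact fractions and adds one case the slide implies but does not list: splitting in proportion to capacity (one third on top, two thirds on bottom) loads both paths to exactly 100%, the best possible here because total capacity equals the demand.

```python
from fractions import Fraction

C_TOP, C_BOTTOM, DEMAND = 100, 200, 300

def utilizations(top_share):
    """Utilization of (top, bottom) paths when top_share of DEMAND goes on top."""
    top = DEMAND * top_share
    return top / C_TOP, (DEMAND - top) / C_BOTTOM

for name, share in [("all top", Fraction(1)), ("all bottom", Fraction(0)),
                    ("even split", Fraction(1, 2)), ("by capacity", Fraction(1, 3))]:
    top_u, bottom_u = utilizations(share)
    print(f"{name:12s} top={float(top_u):.0%} bottom={float(bottom_u):.0%}")
```

Shortest-path routing can only produce the first three rows, since it either picks one path or splits evenly over ties; the capacity-proportional split requires unequal splitting.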
Stepping Back: Design for Optimizability • Two research approaches • Bottom up: do the best with what you have • Top down: design systems that are easier to manage • Design for manageability • “If you are both the professor and the student, you create exam questions that are easy to answer.” – Mung Chiang • Knowing what we know now… • How should intradomain routing protocols work… • … to make TE more efficient and hopefully easier?
Optimal TE With Multicommodity Flow • Problem with shortest-path routing • Inflexible even splitting over shortest paths • Optimal distribution of traffic • Send traffic over any paths in any proportions • Using tunneling to force traffic on the paths • Realizable with MultiProtocol Label Switching (MPLS) • Disadvantage of MPLS: high overhead • Large number of paths between pairs of routers • Must adapt the splitting ratios after each failure
Can We Have Link-State Routing and Optimal TE? • Link-state routing and hop-by-hop forwarding • Single weight on each link • Local rule to compute splitting over paths • Each router forwards based only on the destination • Link-state routing != shortest-path routing • Routers could use other traffic-splitting rules • … as long as they are locally computable • … only from the link weights
Forward Packet Based on Link Weights • Available information at router $u$ • $w_{u,v}$: weights for all links • $d_u^t$: shortest distance from $u$ to $t$ • $h_{u,v}^t$: distance gap, $h_{u,v}^t = d_v^t + w_{u,v} - d_u^t$ [Figure: example network with link weights, marking one outgoing link with a distance gap of 0 and another with a distance gap of 1]
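A small Python sketch of the distance gap, assuming router $u$ already knows every router's shortest distance to $t$ (for example, from running Dijkstra on its link-state database). The dict layout and names are hypothetical.

```python
def distance_gaps(u, weights, dist_to_t):
    """h[v] = d_v^t + w_{u,v} - d_u^t for every outgoing link (u, v).

    weights:   {(a, b): link weight}
    dist_to_t: {router: shortest distance from that router to t}
    A gap of 0 means (u, v) lies on a shortest path to t; a positive gap
    is the extra distance incurred by detouring through v.
    """
    return {v: dist_to_t[v] + w - dist_to_t[u]
            for (a, v), w in weights.items() if a == u}

# Hypothetical example: u reaches t via a (total 3) or via b (total 4).
weights = {("u", "a"): 1, ("u", "b"): 2, ("a", "t"): 2, ("b", "t"): 2}
dist_to_t = {"t": 0, "a": 2, "b": 2, "u": 3}
print(distance_gaps("u", weights, dist_to_t))  # {'a': 0, 'b': 1}
```

Because every quantity comes from the link weights alone, each router can compute its gaps locally, which is what makes the gap a usable input to a splitting rule.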
Traffic-Splitting Function • Relative flow distributed on outgoing links • $G(h_{u,v}^t)$: relative amount sent out link $(u,v)$ toward $t$ • Split traffic to $t$ in proportion: fraction on link $(u,v)$ is $G(h_{u,v}^t) / \sum_j G(h_{u,j}^t)$ • Even splitting • $G(h_{u,v}^t) = 1$ if $h_{u,v}^t = 0$ (all traffic on shortest paths) • $G(h_{u,v}^t) = 0$ if $h_{u,v}^t > 0$ (no traffic on longer paths)
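The splitting rule can be written as a function that takes the distance gaps of a router's outgoing links and any choice of $G$. This Python sketch is an illustration with hypothetical names; `even_G` reproduces the even-splitting special case on this slide.

```python
def split_ratios(gaps, G):
    """Fraction of traffic toward t sent on each outgoing link:
    G(h_v) / sum over j of G(h_j), where gaps maps next hop v -> h_{u,v}^t."""
    rel = {v: G(h) for v, h in gaps.items()}
    total = sum(rel.values())  # > 0 whenever some next hop has gap 0 and G(0) > 0
    return {v: g / total for v, g in rel.items()}

def even_G(h):
    """Even splitting: weight 1 on shortest-path links, 0 on longer ones."""
    return 1.0 if h == 0 else 0.0

print(split_ratios({"a": 0, "b": 1, "c": 0}, even_G))  # {'a': 0.5, 'b': 0.0, 'c': 0.5}
```

Every router has at least one next hop with gap 0 (the first hop of a shortest path), so the denominator is positive for any $G$ with $G(0) > 0$.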
Exponential Splitting • Exponentially diminishing traffic on longer paths • Proportion on path $i$ proportional to $e^{-p_i}$ • … where $p_i$ is the cost of path $i$ [Figure: example network with integer link weights]
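At the level of whole paths, exponential splitting gives each path a share proportional to $e^{-p_i}$. The Python sketch below takes a list of path costs (a hypothetical input; enumerating paths is only for illustration) and normalizes. Subtracting the smallest cost before exponentiating leaves the ratios unchanged but avoids underflow when costs are large.

```python
import math

def exponential_path_split(path_costs):
    """Share of traffic on path i proportional to exp(-p_i)."""
    base = min(path_costs)  # shifting every cost by a constant keeps the ratios
    rel = [math.exp(-(p - base)) for p in path_costs]
    total = sum(rel)
    return [r / total for r in rel]

shares = exponential_path_split([3, 3, 4])
print([round(s, 3) for s in shares])  # [0.422, 0.422, 0.155]
```

A path whose cost is one unit higher gets $e^{-1} \approx 0.37$ times as much traffic, so longer paths are used, but sparingly.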
Optimal TE • A surprising result • This kind of link-state routing can achieve optimal TE • Optimality • Can realize the multicommodity flow traffic distribution • Expressible in terms of settings of link weights • Efficient algorithm • Computationally tractable to compute optimal weights • … for a given traffic matrix and capacitated topology
Intuition Behind the Theory [Figure: diagram relating three sets of traffic distributions: feasible flow routing, optimal flow routing, and flow routing realizable with link-state routing]
Finding Link-State Protocols That Achieve Optimal TE • Need an additional objective function • To find solutions expressible in terms of link weights • But we already have an objective function • So, how can we add another one? • First, solve the original optimization problem • To determine the load on each link at optimality • … i.e., the “necessary capacity” of each link • Then, solve a second optimization problem • On this new topology, with our new objective
TE Optimization Problem: Compute Necessary Capacity • Convex objective • Minimize $\sum_l f(u_l / c_l)$ over all links • Constraints • Flow conservation: must carry the traffic matrix • Capacity constraint: cannot exceed link capacity • Variables • Flow along each path • Given • Traffic matrix and link capacities
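One way to write this problem compactly, assuming path-based variables $x_k^{s,t}$ (the fraction of traffic from $s$ to $t$ carried on path $k$), the link loads $u_l$ and capacities $c_l$ used earlier, and writing $k \ni l$ for the paths $k$ that use link $l$. With path variables, flow conservation reduces to requiring each pair's fractions to sum to 1:

$$
\begin{aligned}
\min_{x \ge 0}\quad & \sum_{l \in L} f\!\left(\frac{u_l}{c_l}\right) \\
\text{s.t.}\quad & \sum_{k} x_k^{s,t} = 1 \quad \text{for every pair } (s,t) \\
& u_l = \sum_{s,t} M_{s,t} \sum_{k \ni l} x_k^{s,t} \le c_l \quad \text{for every link } l \in L
\end{aligned}
$$

The optimal values of $u_l$ from this problem are the "necessary capacities" used in the next step.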
New Optimization Problem • Necessary link capacity • Flow on link u,v in the multicommodity-flow solution • … becomes the capacity of the link in the new problem • In the new optimization problem • Any feasible solution is “optimal” • … relative to the original optimization problem • So, now we can pick a new objective • Key intuition: maximizing “entropy”
Entropy Maximization • Assume we could enumerate all paths from $s$ to $t$ • (In practice, enumerating all paths would be impractical) • Entropy • $x_k^{s,t}$: fraction of traffic from $s$ to $t$ put on path $k$ • $z(x) = -x \log x$: entropy function • New objective: maximize entropy • $\max \sum_{s,t} M_{s,t} \sum_k z(x_k^{s,t})$
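A small Python sketch of the entropy objective, assuming the path fractions for each source-destination pair are given as a list; the names are hypothetical. It shows why maximizing entropy favors spreading traffic: putting all traffic on one path scores 0, while an even split over two paths scores $M_{s,t} \log 2$.

```python
import math

def z(x):
    """Entropy term z(x) = -x log x, with z(0) = 0 (its limit as x -> 0)."""
    return 0.0 if x == 0 else -x * math.log(x)

def network_entropy(M, splits):
    """sum over (s, t) of M[s, t] * sum over k of z(x_k^{s,t});
    splits[(s, t)] lists the path fractions x_k^{s,t}."""
    return sum(M[pair] * sum(z(x) for x in splits[pair]) for pair in M)

M = {("s", "t"): 2.0}
print(network_entropy(M, {("s", "t"): [1.0]}))       # single path
print(network_entropy(M, {("s", "t"): [0.5, 0.5]}))  # even split: 2 * log 2
```

Weighting each pair's entropy by $M_{s,t}$ means that spreading large demands counts for more than spreading small ones.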
High-Level Overview of the Details • The network entropy maximization (NEM) problem always has a solution • The earlier multicommodity flow solution is a feasible point • Solving directly is not efficient • Need to avoid enumerating all the paths • Solving with dual decomposition • Derivation leads to the exponential function • … for splitting traffic over the multiple paths • Derivation also leads to a weight-setting algorithm • Computationally efficient, better than local search
Conclusions • Protocols induce optimization problems • E.g., setting link weights to do traffic engineering • Complexity of the optimization problems • A symptom that the protocol is not quite right • E.g., NP-hard problem and suboptimal traffic flow • Design for optimizability • Design the protocol to be easy to optimize • … using optimization theory as a protocol design tool
Tomography: Inferring the Traffic Matrix Work by Yin Zhang, Matthew Roughan, Nick Duffield, and Albert Greenberg http://www.cs.utexas.edu/~yzhang/papers/tomogravity-sigm03.pdf
Computing the Traffic Matrix $M_{i,j}$ • Hard to measure the traffic matrix • IP networks transmit data as individual packets • Routers do not keep traffic statistics, except link utilization on (say) a five-minute time scale • Need to infer the traffic matrix $M_{i,j}$ from • Current topology $G(R,L)$ • Current routing $P_{i,j,l}$ • Current link load $u_l$ • Link capacity $c_l$
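A toy Python example of why the inference is hard, assuming a three-router line A-B-C with single-path routing (all names and numbers hypothetical): there are three unknown demands but only two link loads, so two different traffic matrices produce identical measurements. Recovering $M_{i,j}$ therefore needs extra assumptions or measurements beyond the link loads.

```python
def link_loads(M, P):
    """u_l = sum over (i, j) of M[i, j] * P[i, j, l]; P[(i, j)] maps link -> fraction."""
    loads = {}
    for pair, demand in M.items():
        for link, frac in P[pair].items():
            loads[link] = loads.get(link, 0) + demand * frac
    return loads

# Routing on the line A -> B -> C: each demand uses every link on its path.
P = {("A", "B"): {("A", "B"): 1},
     ("A", "C"): {("A", "B"): 1, ("B", "C"): 1},
     ("B", "C"): {("B", "C"): 1}}

M1 = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3}
M2 = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 4}
print(link_loads(M1, P))  # {('A', 'B'): 3, ('B', 'C'): 5}
print(link_loads(M2, P))  # {('A', 'B'): 3, ('B', 'C'): 5}
```

In general the number of router pairs grows quadratically with the number of routers while the number of links grows far more slowly, so the linear system $u_l = \sum_{i,j} M_{i,j} P_{i,j,l}$ is typically underdetermined.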