Randomization and Heavy Traffic Theory: New approaches to the design and analysis of switch algorithms Devavrat Shah Stanford University
Network algorithms • Algorithms implemented in networks, e.g. in • switches/routers: scheduling algorithms, route lookup, packet classification • memory/buffer managers: maintaining statistics, active queue management, bandwidth partitioning • load balancers: randomized load balancing • web caches: eviction schemes, placement of caches in a network
Network algorithms: challenges • Time constraint: need to make complicated decisions very quickly • e.g. line speeds in the Internet core are 10Gbps (soon to be 40Gbps) • packets arrive roughly every 50ns (roughly every 10ns at 40Gbps) • Limited computational resources • due to rigid space and heat-dissipation constraints • Algorithms need to be very simple so as to be implementable • but simple algorithms may perform poorly if not well designed
An illustration • Consider the following simple system [Figure: N queues sharing a single server of capacity 1] • Algorithms • serve a randomly chosen queue: very easy to implement • longest queue first (LQF): complicated to implement • Performance • serve at random: does not deliver maximum throughput • LQF: delivers maximum throughput and minimizes the maximum backlog (see the simulation sketch below)
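A minimal simulation sketch of this comparison (not from the talk; the function and policy names are illustrative): with asymmetric loads, serving a uniformly random queue starves the heavily loaded queue even though total load is below capacity, while LQF keeps all backlogs small.

```python
import random

def simulate(policy, arrival_rates, steps=100_000, seed=0):
    """Run `steps` time slots: in each slot a packet arrives at queue i
    with probability arrival_rates[i], then the server drains one packet
    from the queue chosen by `policy`. Returns the time-averaged backlog."""
    rng = random.Random(seed)
    queues = [0] * len(arrival_rates)
    total_backlog = 0
    for _ in range(steps):
        for i, rate in enumerate(arrival_rates):
            if rng.random() < rate:
                queues[i] += 1
        chosen = policy(queues, rng)
        if queues[chosen] > 0:
            queues[chosen] -= 1
        total_backlog += sum(queues)
    return total_backlog / steps

def serve_random(queues, rng):
    # pick any queue uniformly at random, even an empty one
    return rng.randrange(len(queues))

def serve_lqf(queues, rng):
    # LQF: always serve the currently longest queue
    return max(range(len(queues)), key=lambda i: queues[i])

# Total load 0.6 + 0.1 + 0.1 = 0.8 < 1, yet random serving starves
# queue 0 (served w.p. 1/3 < 0.6), so its backlog grows without bound.
rates = [0.6, 0.1, 0.1]
print("random:", simulate(serve_random, rates))
print("LQF:   ", simulate(serve_lqf, rates))
```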
Input queued switch • Multiple queues and servers
Input queued switch [Figure: 3×3 crossbar fabric; configurations connecting two inputs to the same output are not allowed] • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input
Switch scheduling [Figure: successive crossbar configurations of the 3×3 switch, each satisfying the constraints below] • Crossbar constraints • each input can connect to at most one output • each output can connect to at most one input
Notation and definitions [Figure: 3×3 input-queued switch; Qij(t) denotes the backlog of the virtual output queue at input i holding packets for output j]
Useful performance metrics • Throughput: an algorithm is stable (or delivers 100% throughput) if, for any admissible arrival process, the average backlog remains bounded; i.e. limsup_{t→∞} E[Σij Qij(t)] < ∞ • Average delay or average backlog (queue size)
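For concreteness, these notions can be written out as follows (standard definitions for an N-port input-queued switch, restating the slide in symbols): an arrival rate matrix λ = (λij) is admissible if no input or output is oversubscribed, and stability asks that expected total backlog stay bounded.

```latex
\[
\text{admissible: } \sum_{j} \lambda_{ij} < 1 \;\; \forall i,
\qquad \sum_{i} \lambda_{ij} < 1 \;\; \forall j,
\]
\[
\text{stable (100\% throughput): } \limsup_{t \to \infty}\;
\mathbb{E}\Big[\sum_{i,j} Q_{ij}(t)\Big] \;<\; \infty .
\]
```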
Schedule ≡ Bipartite graph matching [Figure: queue sizes 19, 3, 4, 21, 1, 18, 7 drawn as edge weights of a bipartite graph between inputs and outputs; a schedule is a matching in this graph]
Scheduling algorithms: practical maximal matchings • iSLIP (McKeown): used in Cisco's GSR 12000 series • Wave Front Arbiter (Tamir and Chi) • Parallel Iterative Matching (Anderson, Owicki, Saxe, Thacker) • These practical maximal matchings are not stable
Scheduling algorithms [Figure: three schedules on the example graph with queue sizes 19, 3, 4, 21, 1, 18, 7] • Practical maximal matchings: not stable • Max Size Matching: not stable (McKeown-Ananthram-Walrand 96) • Max Wt Matching: stable (Tassiulas-Ephremides 92, McKeown et al. 96, Dai-Prabhakar 00)
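As a sketch of what a maximal matching is algorithmically (a generic greedy pass, not iSLIP/WFA/PIM themselves, which use parallel hardware iterations): keep adding edges until no more fit.

```python
def greedy_maximal_matching(Q):
    """Greedy maximal matching on an N x N queue-size matrix Q:
    scan non-empty VOQs in decreasing queue size and add an edge whenever
    both its input and output are still free. The result is maximal
    (no further edge can be added) but need not be maximum size or weight."""
    N = len(Q)
    edges = sorted(((Q[i][j], i, j) for i in range(N) for j in range(N)
                    if Q[i][j] > 0), reverse=True)
    used_in, used_out, match = set(), set(), []
    for _, i, j in edges:
        if i not in used_in and j not in used_out:
            match.append((i, j))
            used_in.add(i)
            used_out.add(j)
    return match
```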
The Maximum Weight Matching Algorithm • MWM: performance • throughput: stable (Tassiulas-Ephremides 92; McKeown et al. 96; Dai-Prabhakar 00) • backlogs: very low on average (Leonardi et al. 01; Shah-Kopikare 02) • MWM: implementation • has cubic worst-case complexity (approx. 27,000 iterations for a 30-port switch) • MWM algorithms involve backtracking: edges laid down in one iteration may be removed in a subsequent iteration • hence the algorithm is not amenable to pipelining
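For reference, a direct software computation of the MWM schedule using SciPy's assignment solver (an illustrative sketch, not the talk's hardware algorithm); this is exactly the computation the slide says is too heavy for line-rate hardware.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_weight_matching(Q):
    """Return the schedule (list of (input, output) pairs) maximizing
    total queue weight, i.e. the MWM schedule for queue matrix Q."""
    Q = np.asarray(Q)
    rows, cols = linear_sum_assignment(Q, maximize=True)
    return [(i, j) for i, j in zip(rows, cols) if Q[i, j] > 0]

# Example using the queue sizes from the figure (zeros fill the empty VOQs):
Q = [[19, 3, 0],
     [4, 21, 1],
     [0, 18, 7]]
print(max_weight_matching(Q))   # [(0, 0), (1, 1), (2, 2)], weight 47
```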
Switch algorithms [Figure: spectrum from "easier to implement" to "better performance": Maximal matching (not stable) → Max Size Matching (not stable) → Max Wt Matching (stable and low backlogs)]
Summarizing… • Need algorithms that are simple and perform well • can process packets economically at line speeds • deliver a high throughput • and low latencies/backlogs • Need to develop methods for analyzing the performance of these algorithms • traditional methods aren’t good enough
Rest of the talk • A simple, high performance scheduling algorithm: randomized approximation of Max Wt Matching (joint work with P. Giaccone and B. Prabhakar) • A method for analyzing backlogs based on heavy traffic theory (joint work with D. Wischik) • A new approach to switch scheduling: randomized edge coloring (joint work with G. Aggarwal, R. Motwani and A. Zhu)
Algorithm I: Randomized approximation to the Max Wt Matching algorithm
Randomized approximation to MWM • Consider the following randomized approximation: at every time, sample d matchings independently and uniformly, and use the heaviest of these d matchings to schedule packets • Ideally we would like to use a small value of d. However,… • Theorem (Giaccone-Prabhakar-Shah 02). Even with d = N, this algorithm is not stable. In fact, when d = N, the throughput is at most 1 − e⁻¹ ≈ 0.63.
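A minimal sketch of this sampling scheme (helper names are illustrative): matchings are represented as permutations, so sampling a uniform matching is sampling a uniform permutation.

```python
import random

def matching_weight(perm, Q):
    # total queue size along the matching input i -> output perm[i]
    return sum(Q[i][j] for i, j in enumerate(perm))

def sample_d_matchings(Q, d, rng=random):
    """Sample d matchings uniformly at random and return the heaviest.
    As the theorem above notes, even d = N is not enough for stability."""
    N = len(Q)
    best = None
    for _ in range(d):
        perm = list(range(N))
        rng.shuffle(perm)
        if best is None or matching_weight(perm, Q) > matching_weight(best, Q):
            best = perm
    return best
```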
Tassiulas' algorithm [Figure: block diagram — the previous schedule S(t-1) and a random matching R(t) feed a MAX (heavier-weight) comparator that produces the current schedule S(t), which is fed back for the next time slot]
Tassiulas' algorithm [Figure: example — W(S(t-1)) = 160, W(R(t)) = 150; MAX keeps the heavier matching, so S(t) = S(t-1)]
Performance of Tassiulas’ algorithm Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.
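In code, Tassiulas' scheme is one comparison on top of the sampler above (a sketch; schedules are permutations and `matching_weight` is the helper defined earlier):

```python
import random

def matching_weight(perm, Q):
    return sum(Q[i][j] for i, j in enumerate(perm))

def tassiulas_step(S_prev, Q, rng=random):
    """One step of Tassiulas' scheme: draw one uniformly random matching
    R(t) and keep whichever of R(t) and S(t-1) is heavier under the
    current queue sizes Q."""
    R = list(range(len(Q)))
    rng.shuffle(R)
    return R if matching_weight(R, Q) > matching_weight(S_prev, Q) else S_prev
```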
Reducing backlogs: the Merge operation [Figure: the union of S(t-1), with W(S(t-1)) = 160, and R(t), with W(R(t)) = 150, decomposes into alternating cycles; comparing cycle weights (130 vs. 30 on one cycle, 30 vs. 120 on the other) and keeping the heavier edges of each cycle yields S(t) with W(S(t)) = 250, heavier than both original matchings]
Performance of Merge algorithm Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.
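A sketch of the Merge operation itself (names illustrative): the union of two matchings splits into alternating cycles, and in each cycle we keep the heavier set of edges, so the result weighs at least max(W(S), W(R)).

```python
def merge(S, R, Q):
    """Merge two matchings (permutations S and R of outputs): decompose
    their union into alternating cycles and, per cycle, keep the heavier
    set of edges. Within a cycle, S's and R's edges cover the same
    outputs, so the result is again a valid matching."""
    N = len(Q)
    r_inv = [0] * N
    for i, j in enumerate(R):
        r_inv[j] = i
    merged, seen = [0] * N, [False] * N
    for start in range(N):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:              # walk the alternating cycle:
            seen[i] = True              # S-edge out of input i,
            cycle.append(i)             # R-edge back to the next input
            i = r_inv[S[i]]
        wS = sum(Q[i][S[i]] for i in cycle)
        wR = sum(Q[i][R[i]] for i in cycle)
        src = S if wS >= wR else R
        for i in cycle:
            merged[i] = src[i]
    return merged
```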
Use arrival information: Serena [Figure: example — the arrival graph is pruned to a matching of weight W = 121, which is then merged with S(t-1), with W(S(t-1)) = 209, to give S(t) with W(S(t)) = 243]
Performance of Serena algorithm Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.
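A simplified sketch of the Serena idea (my reading of the figures above, not the exact GPS pseudocode): turn the arrival graph into a matching, then Merge it with the previous schedule using `merge()` from the earlier sketch.

```python
import random

def serena_step(S_prev, arrivals, Q, rng=random):
    """arrivals[i] is the output for which input i received a packet this
    slot, or None. Prune the arrival graph to a matching, complete it,
    and Merge with S(t-1)."""
    N = len(Q)
    # 1. Prune: if several inputs have arrivals for the same output,
    #    keep the one with the largest backlog.
    owner = [None] * N                  # owner[j] = input matched to output j
    for i, j in enumerate(arrivals):
        if j is not None and (owner[j] is None or Q[i][j] > Q[owner[j]][j]):
            owner[j] = i
    # 2. Complete the partial matching over the free inputs/outputs
    #    (arbitrarily; these extra edges may have weight 0).
    matched = {i for i in owner if i is not None}
    free_in = [i for i in range(N) if i not in matched]
    free_out = [j for j in range(N) if owner[j] is None]
    for i, j in zip(free_in, free_out):
        owner[j] = i
    M = [0] * N
    for j, i in enumerate(owner):
        M[i] = j
    # 3. Merge with the previous schedule, as in the Merge sketch above.
    return merge(S_prev, M, Q)
```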
Summary of main ideas and results • Randomization alone is not enough • Memory (carrying the past schedule forward) is very powerful • Exploiting problem structure: the Merge operation • Using arrival information • We obtained an algorithm whose performance is close to that of MWM and which is very simple to implement
Some context… • We have seen the design of a simple, high throughput switch scheduling algorithm • The backlog/delay induced by an algorithm is another very important measure of its performance • For example, recall Tassiulas’ scheme
Delay or backlog analysis • Delay analysis is harder than throughput analysis because • throughput measures the long-term efficiency of an algorithm • whereas the delay (or backlog) induced by an algorithm is a measure of its instantaneous efficiency • It is also harder to design low-delay schedulers
Current delay analysis methods are not suitable • Queueing theory • requires knowledge of the service time distribution • but this depends on the scheduling algorithm • and the service distribution is too complicated to determine • Large deviations: very successful for the output-queued switch • an N-port OQ switch can be "decomposed" into N independent 1-port switches • but an N-port input-queued switch is not decomposable • Stochastic coupling arguments • effective for establishing Delay(Alg A) < Delay(Alg B) (without saying how much better) • but very hard to construct for input-queued switches
Our approach • Use the machinery of heavy traffic theory (developed by J. M. Harrison, M. Bramson, R. Williams and many others) • Focus on the following intriguing observation • consider the MWM-α algorithm: edge (i,j) has weight Qij^α • Keslassy-McKeown 01 observed that the average delay increases with α
Why this is worth focusing on • The MWM class of algorithms is the most analyzed • huge body of literature on throughput analysis • hardly anything known about their delay properties • in particular, no explanation of the Keslassy-McKeown observation • Note the singularity at α = 0 in the K-M observation • "continuity" implies that delay should be smallest when α = 0 • however, MWM-0 is the Maximum Size Matching algorithm • and MSM is known to be unstable, i.e. delays are infinite at α = 0! (McKeown-Ananthram-Walrand 96) • Question: if MSM is not the optimum, which algorithm is? • Heavy traffic theory helps answer these questions
Heavy traffic theory • Consider a network with N queues • as some parameter r → ∞ • let the load on the system approach its capacity (hence the name HT) • speed up time by a factor r² and scale down space by a factor r; i.e. the quantity of interest is Qi(r²t)/r = Qi^r(t), say • as r → ∞, we are looking at the system zoomed out in both time and space [Figure: the original system on timescale t vs. the scaled system on timescale r²t with space shrunk by a factor 1/r] • this type of scaling occurs in the Central Limit Theorem • Main result: as r → ∞, (Q1^r(t),…,QN^r(t)) → (Q1^∞(t),…,QN^∞(t)) = Q^∞(t), a multidimensional Reflected Brownian Motion (RBM); such limiting results are independent of the details of the arrival distribution
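Written out, the diffusion scaling just described is (restating the slide in standard notation; ρ^r is shorthand introduced here for the load at scale r):

```latex
\[
Q^r_i(t) \;=\; \frac{Q_i(r^2 t)}{r}, \qquad i = 1,\dots,N,
\qquad \text{with load } \rho^r \to 1 \text{ as } r \to \infty,
\]
\[
\big(Q^r_1(t),\dots,Q^r_N(t)\big) \;\Longrightarrow\;
\big(Q^\infty_1(t),\dots,Q^\infty_N(t)\big) \;=\; Q^\infty(t),
\]
```

where ⟹ denotes convergence in distribution and the limit Q^∞ is a multidimensional reflected Brownian motion.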
State space collapse: dimension reduction • Q^∞(t) can roam in a lower-dimensional space, say of dimension K < N • The HT limit procedure for the LQF system [Figure: N queues Q1^r(t),…,QN^r(t) with arrival rates λ1^r,…,λN^r sharing a single server of capacity 1] • load: λ^r = λ1^r + ⋯ + λN^r = 1 − r⁻¹ → 1 • state: Q^r(t) = (Q1^r(t),…,QN^r(t)) → Q^∞(t) = (Q1^∞(t),…,QN^∞(t)) • State space collapse • because of the LQF policy, Q1^∞(t) = ⋯ = QN^∞(t) • that is, the dimension of the state space is just 1, not N (see the simulation sketch below)
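A quick numerical illustration of this collapse (a simulation sketch reusing the single-server LQF system from earlier; names illustrative): near capacity, the N queue sizes move in lockstep, so a single number essentially describes the whole state.

```python
import random

def lqf_snapshot(arrival_rates, steps=200_000, seed=1):
    """Simulate the LQF system near capacity and return the final queue
    vector; under heavy load the components stay (nearly) equal, which is
    the one-dimensional state space collapse described above."""
    rng = random.Random(seed)
    queues = [0] * len(arrival_rates)
    for _ in range(steps):
        for i, rate in enumerate(arrival_rates):
            if rng.random() < rate:
                queues[i] += 1
        longest = max(range(len(queues)), key=lambda k: queues[k])
        if queues[longest] > 0:
            queues[longest] -= 1
    return queues

# Load 0.99: the three components differ by at most a few packets.
print(lqf_snapshot([0.33, 0.33, 0.33]))
```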
State space collapse: performance • More importantly, the state space induced by an IQ switch scheduling algorithm gives an indication of its performance; i.e. State space(Alg A) ⊂ State space(Alg B) ⇒ Alg B is better than Alg A • So, to resolve the Keslassy-McKeown observation, we need to • determine the state space of MWM-α, and • show State space(MWM-α₁) ⊂ State space(MWM-α₂) if α₁ > α₂
Switch: state space collapse • The dimension of the state space is at most N² • what is the dimension after state space collapse under MWM? • Given that MWM tries to equalize the weights of all matchings, can we determine the minimum number of queue sizes we need to know in order to work out the sizes of all N² queues? • Lemma (Shah-Wischik): Knowing the sizes of any 2N−2 queues is not sufficient. But there exist 2N−1 queues whose sizes are sufficient to know. • [Figure: worked example on a 3-port switch — starting from five known queue sizes and MWM weight 28, the remaining entries (7, 5, 18, …) of the 3×3 queue matrix are deduced one by one]
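One way to see where 2N−1 comes from (a standard linear-algebra argument consistent with the lemma, not the talk's proof): if MWM has equalized the weights of all matchings, then every permutation sum of the queue matrix is the same, which forces the additive form below; such matrices have exactly 2N−1 degrees of freedom, so e.g. the first row and first column (2N−1 queues) determine everything.

```latex
\[
Q_{ij} \;=\; u_i + v_j \quad (1 \le i, j \le N)
\qquad\Longrightarrow\qquad
Q_{ij} \;=\; Q_{i1} + Q_{1j} - Q_{11},
\]
```

while any 2N−2 queue sizes leave at least one degree of freedom undetermined.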