
Presentation Transcript


  1. Randomization and Heavy Traffic Theory: New approaches to the design and analysis of switch algorithms
Devavrat Shah, Stanford University

  2. Network algorithms
• Algorithms implemented in networks, e.g. in:
  • switches/routers: scheduling algorithms, routing lookup, packet classification
  • memory/buffer managers: maintaining statistics, active queue management, bandwidth partitioning
  • load balancers: randomized load balancing
  • web caches: eviction schemes, placement of caches in a network

  3. Network algorithms: challenges
• Time constraint: need to make complicated decisions very quickly
  • e.g. line speeds in the Internet core are 10 Gbps (soon to be 40 Gbps)
  • minimum-size packets arrive roughly every 50 ns (roughly every 10 ns at 40 Gbps)
• Limited computational resources
  • due to rigid space and heat dissipation constraints
• Algorithms need to be very simple so as to be implementable
  • but simple algorithms may perform poorly, if not well-designed

  4. An illustration
• Consider the following simple system: several queues sharing a single server of capacity 1
• Algorithms
  • serve a randomly chosen queue: very easy to implement
  • longest queue first (LQF): complicated to implement
• Performance
  • serve at random: does not deliver maximum throughput
  • LQF: maximum throughput, and minimizes the maximum backlog
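The trade-off can be made concrete with a small simulation. Below is a minimal sketch (Python; the function names and arrival rates are illustrative choices, not from the talk) of N parallel queues sharing one unit-rate server under the two policies:

```python
import random

def simulate(policy, lambdas, T=100_000, seed=0):
    """Average total backlog of N parallel queues sharing one unit-rate server."""
    rng = random.Random(seed)
    N = len(lambdas)
    q = [0] * N
    total = 0
    for _ in range(T):
        for i in range(N):                 # Bernoulli arrivals, one per queue per slot
            if rng.random() < lambdas[i]:
                q[i] += 1
        if policy == 'random':
            i = rng.randrange(N)           # may pick an empty queue: a wasted slot
        else:
            i = max(range(N), key=lambda j: q[j])   # longest queue first
        if q[i] > 0:
            q[i] -= 1                      # serve one packet
        total += sum(q)
    return total / T

# Total load 0.9 < 1. LQF keeps backlogs bounded; random service gives queue 0
# only 1/3 of the slots, less than its arrival rate 0.8, so it grows without bound.
print(simulate('random', [0.8, 0.05, 0.05]))
print(simulate('lqf',    [0.8, 0.05, 0.05]))
```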

  5. Input queued switch • Multiple queues and servers

  6. Input queued switch
[Figure: a 3×3 crossbar fabric; a configuration connecting one input to multiple outputs is marked "Not allowed"]
• Crossbar constraints
  • each input can connect to at most one output
  • each output can connect to at most one input

  7–9. Switch scheduling
[Figures: three animation frames showing successive matchings on the 3×3 crossbar]
• Crossbar constraints
  • each input can connect to at most one output
  • each output can connect to at most one input

  10. Notation and definitions
[Figure: a 3×3 input-queued switch with inputs 1–3 and outputs 1–3]
• Qij(t): backlog at input i of packets destined to output j (a virtual output queue)
• λij: arrival rate of packets at input i destined to output j
• an arrival rate matrix is admissible if Σj λij < 1 for every input i and Σi λij < 1 for every output j

  11. Useful performance metrics
• Throughput
  • an algorithm is stable (or delivers 100% throughput) if for any admissible arrival process the average backlog is bounded; i.e. limsup_{T→∞} (1/T) Σ_{t=1}^{T} E[ Σij Qij(t) ] < ∞
• Average delay or average backlog (queue-size)

  12. Schedule ≡ bipartite graph matching
[Figure: queue sizes (19, 3, 4, 21, 1, 18, 7) shown on a 3×3 switch; a schedule corresponds exactly to a matching in the bipartite graph of inputs and outputs]

  13. Scheduling algorithms
Practical algorithms compute maximal matchings:
• iSLIP (Nick McKeown): used in Cisco's GSR 12000 series
• Wave Front Arbiter (Tamir and Chi)
• Parallel Iterative Matching (Anderson, Owicki, Saxe, Thacker)
Practical maximal matchings: not stable

  14. Scheduling algorithms
[Figure: the example queue sizes with the Max Size and Max Wt matchings highlighted]
• Practical maximal matchings: not stable
• Max Size Matching: not stable (McKeown-Ananthram-Walrand 96)
• Max Wt Matching: stable (Tassiulas-Ephremides 92, McKeown et al. 96, Dai-Prabhakar 00)

  15. The Maximum Weight Matching algorithm
• MWM: performance
  • throughput: stable (Tassiulas-Ephremides 92; McKeown et al. 96; Dai-Prabhakar 00)
  • backlogs: very low on average (Leonardi et al. 01; Shah-Kopikare 02)
• MWM: implementation
  • has cubic worst-case complexity (approx. 27,000 iterations for a 30-port switch)
  • MWM algorithms involve backtracking: edges laid down in one iteration may be removed in a subsequent iteration
  • the algorithm is not amenable to pipelining
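For concreteness, here is a hedged sketch of computing an MWM schedule with an off-the-shelf assignment solver; `scipy.optimize.linear_sum_assignment` is a Hungarian-style O(N³) method, consistent with the cubic complexity noted above. The queue matrix reuses the numbers from slide 12, but their placement in the matrix is my assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mwm_schedule(Q):
    """Max Weight Matching schedule: Q[i, j] is the backlog at input i for
    output j; returns match[i] = output assigned to input i."""
    rows, cols = linear_sum_assignment(Q, maximize=True)
    return cols                            # rows is simply 0..N-1

Q = np.array([[19,  3,  4],
              [21,  1, 18],
              [ 0,  7,  0]])
match = mwm_schedule(Q)
print(match, Q[np.arange(3), match].sum())   # heaviest matching and its weight
```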

  16. Switch algorithms
Ranked from easiest to implement to best performance:
• Max Size Matching: not stable
• Maximal matching: easy to implement, but not stable
• Max Wt Matching: stable with low backlogs, but hard to implement

  17. Summarizing…
• Need algorithms that are simple and perform well
  • can process packets economically at line speeds
  • deliver high throughput
  • and low latencies/backlogs
• Need to develop methods for analyzing the performance of these algorithms
  • traditional methods aren't good enough

  18. Rest of the talk
• A simple, high-performance scheduling algorithm: randomized approximation of Max Wt Matching (joint work with P. Giaccone and B. Prabhakar)
• A method for analyzing backlogs based on heavy traffic theory (joint work with D. Wischik)
• A new approach to switch scheduling: randomized edge coloring (joint work with G. Aggarwal, R. Motwani and A. Zhu)

  19. Algorithm I: Randomized approximation to the Max Wt Matching algorithm

  20. Randomized approximation to MWM
• Consider the following randomized approximation: at every time
  • sample d matchings independently and uniformly
  • use the heaviest of these d matchings to schedule packets
• Ideally we would like to use a small value of d. However,…
Theorem (Giaccone-Prabhakar-Shah 02). Even with d = N, this algorithm is not stable. In fact, when d = N, the throughput ≤ 1 − e⁻¹ ≈ 0.63.
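A minimal sketch of the pure sampling scheme (matchings represented as permutations of outputs; names are mine):

```python
import numpy as np

def sampled_matching(Q, d, rng):
    """Heaviest of d independent, uniformly random matchings (as permutations)."""
    N = len(Q)
    best, best_w = None, -np.inf
    for _ in range(d):
        m = rng.permutation(N)             # a uniform random matching
        w = Q[np.arange(N), m].sum()
        if w > best_w:
            best, best_w = m, w
    return best

rng = np.random.default_rng(0)
Q = np.array([[19, 3, 4], [21, 1, 18], [0, 7, 0]])
print(sampled_matching(Q, d=3, rng=rng))
```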

  21. Tassiulas' algorithm
[Diagram: the previous schedule S(t−1) and a fresh uniform random matching R(t) enter a MAX box; the heavier of the two becomes the current schedule S(t) and is carried to the next time step]

  22. Tassiulas' algorithm
[Figure: worked example with queue sizes 10, 50, 10, 40, 30, 10, 70, 10, 60, 20. Here W(R(t)) = 150 and W(S(t−1)) = 160, so MAX keeps the old schedule: S(t) = S(t−1)]
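A sketch of one time step of Tassiulas' scheme, under the same permutation representation (the function name is mine):

```python
import numpy as np

def tassiulas_step(Q, S_prev, rng):
    """One step of Tassiulas' scheme: draw one fresh random matching R(t) and
    keep whichever of R(t) and the previous schedule S(t-1) is heavier."""
    N = len(Q)
    w = lambda m: Q[np.arange(N), m].sum()
    R = rng.permutation(N)                 # random matching R(t)
    return R if w(R) > w(S_prev) else S_prev

# In the slide's example W(R(t)) = 150 < W(S(t-1)) = 160, so S(t) = S(t-1).
```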

  23. Performance of Tassiulas’ algorithm Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.

  24. Backlogs under Tassiulas’ algorithm

  25. Reducing backlogs: the Merge operation
[Figure: S(t−1) and R(t) from the previous example are superimposed; their union splits into alternating cycles, and the two matchings are compared cycle by cycle: 30 v/s 120 on one cycle, 130 v/s 30 on the other. W(S(t−1)) = 160, W(R(t)) = 150]

  26. Reducing backlogs: the Merge operation
[Figure: Merge keeps the heavier side of each cycle (120 and 130), so W(S(t)) = 250 even though W(S(t−1)) = 160 and W(R(t)) = 150; a code sketch follows below]
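A sketch of the Merge operation, assuming matchings are stored as permutations mapping inputs to outputs: the union of two matchings decomposes into disjoint alternating cycles, and Merge keeps the heavier side of each cycle, which is exactly why the result (250 above) can beat both inputs:

```python
def merge(S, R, Q):
    """Merge two matchings S and R (permutations: input -> output). Their
    union splits into disjoint alternating cycles; keep whichever matching
    contributes more weight on each cycle, so W(merge) >= max(W(S), W(R))."""
    N = len(S)
    Rinv = [0] * N
    for i, j in enumerate(R):
        Rinv[j] = i
    visited, out = [False] * N, [None] * N
    for start in range(N):
        cycle, i = [], start
        while not visited[i]:              # walk the alternating cycle through start
            visited[i] = True
            cycle.append(i)
            i = Rinv[S[i]]                 # S-edge to output S[i], R-edge back to an input
        if not cycle:
            continue
        wS = sum(Q[i][S[i]] for i in cycle)
        wR = sum(Q[i][R[i]] for i in cycle)
        src = S if wS >= wR else R         # heavier side of this cycle wins
        for i in cycle:
            out[i] = src[i]
    return out

# Two cycles: S wins the first (120 v/s 20), R wins the second (10 v/s 120),
# so the merged weight 240 exceeds both W(S) = 130 and W(R) = 140.
S, R = [0, 1, 2, 3], [1, 0, 3, 2]
Q = [[100, 10, 0, 0], [10, 20, 0, 0], [0, 0, 5, 60], [0, 0, 60, 5]]
print(merge(S, R, Q))
```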

  27. Performance of Merge algorithm Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.

  28. Merge v/s Max

  29–31. Use arrival information: Serena
[Figures: three animation frames. The previous schedule S(t−1) has weight W(S(t−1)) = 209. The arrival graph (one edge per queue that just received a packet, with queue sizes such as 23, 7, 47, 89, 3, 11, 31, 2, 97, 5) is pruned into a matching of weight W = 121. Merging this matching with S(t−1) yields S(t) with W(S(t)) = 243]
• Serena: turn the current arrival graph into a matching, then Merge it with S(t−1) (a sketch follows below)
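A sketch of one Serena step under the same conventions. The conflict-resolution rule (inputs with heavier queues pick first) is one plausible reading of the arrival-graph pruning, not necessarily the paper's exact rule, and `merge` is the cycle-merge function from the previous sketch, passed in as a parameter:

```python
def serena_step(S_prev, arrivals, Q, merge):
    """One Serena step (sketch): prune the arrival graph into a matching,
    then Merge it with the previous schedule S(t-1). arrivals[i] is the
    output that just received a packet from input i, or None."""
    N = len(S_prev)
    A, taken = [None] * N, set()
    # resolve arrival-graph conflicts on an output: heavier queues pick first
    with_arrival = [i for i in range(N) if arrivals[i] is not None]
    for i in sorted(with_arrival, key=lambda i: -Q[i][arrivals[i]]):
        if arrivals[i] not in taken:
            A[i] = arrivals[i]
            taken.add(arrivals[i])
    free = iter(j for j in range(N) if j not in taken)
    for i in range(N):                     # complete A into a full matching
        if A[i] is None:
            A[i] = next(free)
    return merge(S_prev, A, Q)             # cycle-merge from the sketch above
```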

  32. Performance of Serena algorithm Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.

  33. Backlogs under Serena

  34. Summary of main ideas and results
• Randomization alone is not enough
• Memory (reusing the past schedule as a sample) is very powerful
• Exploiting problem structure: the Merge operation
• Using arrival information
• We obtained an algorithm
  • whose performance is close to that of MWM
  • and which is very simple to implement

  35. A new method for analyzing backlogs via Heavy Traffic Theory

  36. Some context… • We have seen the design of a simple, high throughput switch scheduling algorithm • The backlog/delay induced by an algorithm is another very important measure of its performance • For example, recall Tassiulas’ scheme

  37. Delay or backlog analysis • Delay analysis is harder than throughput analysis because • throughput measures the long-term efficiency of an algorithm • whereas, the delay (or backlog) induced by an algorithm is a measure of its instantaneous efficiency • It is also harder to design low delay schedulers

  38. Current delay analysis methods are not suitable • Queuing theory • requires knowledge of service time distribution • but this depends on the scheduling algorithm • service distribution is too complicated to determine • Large deviations: very successful for output-queued switch • N-port OQ switch can be “decomposed” into N independent 1-port switches • but an N-port input-queued switch is not decomposable • Stochastic coupling arguments • effective for establishing: Delay(Alg A) < Delay(Alg B) (without saying how much better) • very hard to make for input-queued switches

  39. Our approach
• Use the machinery of heavy traffic theory (developed by J. M. Harrison, M. Bramson, R. Williams and many others)
• Focus on the following intriguing observation
  • consider the MWM-α algorithm: edge (i,j) has weight Qij^α
  • Keslassy-McKeown 01 observed that the average delay increases with α
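For concreteness, MWM-α differs from the earlier MWM sketch only in its edge weights. A minimal sketch (masking empty queues so that the α → 0 limit matches the Maximum Size Matching behavior described on the next slide):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mwm_alpha_schedule(Q, alpha):
    """MWM-alpha: same assignment problem as MWM, but edge (i, j) weighs
    Q[i,j]**alpha. alpha = 1 is MWM; as alpha -> 0, every nonempty queue's
    weight tends to 1, approaching Maximum Size Matching."""
    Q = np.asarray(Q, dtype=float)
    W = np.where(Q > 0, Q ** alpha, 0.0)   # empty queues carry no weight
    rows, cols = linear_sum_assignment(W, maximize=True)
    return cols
```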

  40. Why this is worth focusing on
• The MWM class of algorithms is the most analyzed
  • huge body of literature on throughput analysis
  • hardly anything known about their delay properties
  • in particular, no explanation of the Keslassy-McKeown observation
• Note the singularity at α = 0 in the K-M observation
  • "continuity" implies that delay should be smallest when α = 0
  • however, MWM-0 is the Maximum Size Matching algorithm
  • and MSM is known to be unstable; i.e. delays are infinite when α = 0! (McKeown-Ananthram-Walrand 96)
• Question: if MSM is not the optimum, who is?
• Heavy traffic theory helps answer these questions

  41–42. Heavy traffic theory
• Consider a network with N queues
  • as some parameter r → ∞
  • let the load on the system approach its capacity (hence the name HT)
  • speed up time by a factor r², and scale down space by a factor r; i.e. the quantity of interest is Qi(r²t)/r = Qi^r(t), say
  • as r → ∞, we are looking at the system zoomed out in both time and space
  • this type of scaling occurs in the Central Limit Theorem
[Figure: the original system on timescale t becomes the scaled system on timescale r²t and spatial scale 1/r]
• Main result: as r → ∞,
  • (Q1^r(t), …, QN^r(t)) → (Q1^∞(t), …, QN^∞(t)) = Q^∞(t), a multidimensional Reflected Brownian Motion (RBM)
  • such limiting results are independent of the details of the arrival distributions
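A toy illustration of the scaling on a single queue (not the switch; all parameters are mine): run the queue at load 1 − 1/r for r²t slots and divide the backlog by r:

```python
import numpy as np

def scaled_backlog(r, t_end=1.0, seed=0):
    """Run a single queue at load 1 - 1/r for r^2 * t_end slots (time sped up
    by r^2) and return Q(r^2 t_end) / r (space scaled down by r)."""
    rng = np.random.default_rng(seed)
    lam, q = 1.0 - 1.0 / r, 0
    for _ in range(int(r * r * t_end)):
        q = max(q + rng.poisson(lam) - 1, 0)   # batch arrivals, unit service
    return q / r

# The scaled backlog stays O(1) as r grows, neither vanishing nor blowing
# up: the reflected-Brownian-motion limit asserting itself.
for r in (10, 30, 100):
    print(r, scaled_backlog(r))
```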

  43–44. State space collapse: dimension reduction
• Q^∞(t) can roam in a lower dimensional space, say of dim K < N
• The HT limit procedure for the LQF system [Figure: N queues Q1^r(t), …, QN^r(t) with arrival rates λ1^r, …, λN^r sharing a server of capacity 1]
  • load: λ^r = λ1^r + ⋯ + λN^r = 1 − r⁻¹ → 1
  • state: Q^r(t) = (Q1^r(t), ⋯, QN^r(t)) → Q^∞(t) = (Q1^∞(t), ⋯, QN^∞(t))
• State space collapse
  • because of the LQF policy, Q1^∞(t) = ⋯ = QN^∞(t)
  • i.e. the dimension of the state space is just 1, not N
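A toy sketch of the collapse for LQF (parameters mine): at total load 1 − 1/r, the scaled queue-size vector comes out with nearly equal components:

```python
import numpy as np

def lqf_scaled_queues(N=4, r=50, seed=0):
    """LQF on N parallel queues at total load 1 - 1/r, run for r^2 slots;
    returns the scaled queue-size vector Q(r^2)/r."""
    rng = np.random.default_rng(seed)
    lam = (1.0 - 1.0 / r) / N              # equal per-queue loads
    q = np.zeros(N, dtype=int)
    for _ in range(r * r):
        q += rng.random(N) < lam           # Bernoulli arrivals
        i = int(np.argmax(q))              # serve the longest queue
        if q[i] > 0:
            q[i] -= 1
    return q / r

# LQF keeps trimming the maximum, so the components come out (nearly) equal:
# an N-dimensional state that is effectively one-dimensional after scaling.
print(lqf_scaled_queues())
```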

  45. State space collapse: performance
• More importantly, the state space induced by an IQ switch scheduling algorithm gives an indication of its performance; i.e.
  • State space(Alg A) ⊃ State space(Alg B) ⇒ Alg B is better than Alg A (a smaller post-collapse state space means better performance)
• So, to resolve the Keslassy-McKeown observation, we need to
  • determine the state space of MWM-α, and
  • show: State space(MWM-α1) ⊃ State space(MWM-α2) if α1 > α2

  46–50. Switch: state space collapse
• The dimension of the state space is at most N²
  • what is the dimension after state space collapse under MWM?
• Given that MWM tries to equalize the weights of all matchings,
  • can we determine the minimum number of queue sizes we need to know in order to work out the sizes of all N² queues?
• Lemma (Shah-Wischik): Knowing the sizes of any 2N−2 queues is not sufficient, but there exist 2N−1 queues whose sizes are sufficient to know.
• Slides 46–50 step through a 3-port example: starting from five known queue sizes (19, 17, 3, 4, 6) and the fact that MWM = 28, the remaining sizes (7, 5, 18, …) are deduced one at a time.
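One way to see where the count 2N − 1 comes from (my illustration of the dimension count, not the Shah-Wischik proof): if MWM succeeds in equalizing the weights of all perfect matchings, an edge-exchange argument forces Q to have the additive form Q[i,j] = r_i + c_j, a family with exactly 2N − 1 degrees of freedom, so the first row plus first column determine everything:

```python
import numpy as np

def reconstruct(first_row, first_col):
    """If every perfect matching of Q has equal weight, exchanging two edges
    gives Q[i,j] + Q[k,l] == Q[i,l] + Q[k,j], i.e. Q[i,j] = r_i + c_j. The
    2N-1 entries in the first row and column then pin down all N^2 entries:
    Q[i,j] = Q[i,0] + Q[0,j] - Q[0,0]."""
    row = np.asarray(first_row, dtype=float)   # Q[0, :]
    col = np.asarray(first_col, dtype=float)   # Q[:, 0]
    assert row[0] == col[0]                    # both contain Q[0,0]
    return col[:, None] + row[None, :] - row[0]

# All six 3x3 matchings of the reconstructed matrix weigh exactly 16.
print(reconstruct([10, 8, 6], [10, 7, 5]))
```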
