1 / 29

Surprising results on task assignment for high-variability workloads

Surprising results on task assignment for high-variability workloads. Mor Harchol-Balter, CMU, Computer Sci. Alan Scheller-Wolf, CMU, Tepper Business Andrew Young, Morgan Stanley. Goal: Minimize mean response time: E[T]. FIFO. Server farm model. Assignment Policy. incoming jobs. FIFO.

bowersgary
Download Presentation

Surprising results on task assignment for high-variability workloads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Surprising results on task assignment for high-variability workloads Mor Harchol-Balter, CMU, Computer Sci. Alan Scheller-Wolf, CMU, Tepper Business Andrew Young, Morgan Stanley

  2. Goal: Minimize mean response time: E[T] FIFO Server farm model Assignment Policy incoming jobs FIFO Poisson(l) FIFO POLICY MATTERS! Q: What is a good Assignment Policy? (high-variability jobs) Mor Harchol-Balter, CMU

  3. SITA (Size Interval) Split jobs based on size cutoffs. LWL (Least Work Left) Send job to host with least remaining work. +Protects against high variability +High throughput +Isolation for smalls Good Answers SITA LWL M/G/2 Mor Harchol-Balter, CMU

  4. SITA in Practice • Supercomputing Centers • [Hotovy, Schneider, O’Donnell 96] • [Schroeder, Harchol-Balter 00] • Manufacturing Centers • [Buzacott, Shanthikumar 93] • File Server Farms • [Cardellini, Colajanni, Yu 01] • Supermarkets Prior Work on SITA SITA variants Optimizing SITA cutoffs SITA vs. LWL SITA LWL • [Broberg, Tari, Zeephongsekul 06] • [Harchol-Balter 02] • [Ciardo, Riska, Smirni 01] • [El-Taha, Maddah 06] • [Fu, Broberg, Tari 03] • [Harchol-Balter, Crovella, Murta 99] • [Oida, Shinjo 99] • [ Tari, Broberg, Zomaya, Baldoni 05] • [Thomas 08] • [Harchol-Balter 00] • [Harchol-Balter 02] • [Thomas 08] • [Tari,Broberg,Zomaya, Baldoni 05] • [Fu, Broberg, Tari 03] All conclude SITA far superior for high variability • [Harchol-Balter,Crovella,Murta 98] • [Bachmat, Sarfati 08] • [Sarfati 08] • [Harchol-Balter, Vesilo 10]

  5. Should at least beat all commonly used policies when variability is high enough. In search of a proof of SITA’s total dominance. OK, so not optimal, but definite win for high variability. Years later Months later SITA Can’t prove anything because it’s not true! Mor Harchol-Balter, CMU

  6. Alternative Title: The TRUTH about SITA,under very high job size variability Mor Harchol-Balter, CMU

  7. as C2∞ Q: In this talk we will show ... • SITA diverges & LWL diverges? • SITA converges & LWL diverges ? • SITA diverges & LWL converges? • SITA converges & LWL converges? SITA A: All of the above LWL

  8. as C2∞ Q: In this talk we will show ... Convergent LWL Divergent LWL Convergent SITA Looking for simple job size distributions to illustrate each. Divergent SITA Mor Harchol-Balter, CMU

  9. Results (2 server system) a p Bimodal Conv. LWL Diverg. LWL 1-p b Conv. SITA or depends on pa & (1-p)b Diverg. SITA Exp(ma) p H2 1-p Exp(mb) Mor Harchol-Balter, CMU

  10. Results (2 server system) a b Trimodal Conv. LWL Diverg. LWL r < 1 c=bm > 1 Conv. SITA or depends on m Exp(ma) Diverg. SITA Exp(mb) H3 r < 1 Exp(mc) Mor Harchol-Balter, CMU

  11. Results (2 server system) Conv. LWL Diverg. LWL Conv. SITA depends on a Diverg. SITA Bounded Pareto(a) 1< a < 2 Mor Harchol-Balter, CMU

  12. Diverg LWL Conv. LWL Bimodal Results Conv. SITA Diverg SITA a pa = QE[X] p depends ra & rb 1-p Lemma: As C2 ∞, but E[X], Q: const, a’s get little smaller  QE[X] b’s get much bigger  ∞ p  1 X ~ b (1-p)b = (1-Q)E[X] THM: LWL always diverges. THM: If ra < 1 & rb < 1  Convergent SITA Mor Harchol-Balter, CMU

  13. Understanding LWL Isn’t LWL always bad for high C2? It depends … But shorts stuck behind longs, so E[T] ∞ Need 2 longs for this to be a problem! So we need: Pr{ 2 longs } * E[T| 2 longs] ? LWL Suffices to just look at E[X3/2]. Mor Harchol-Balter, CMU

  14. Understanding LWL Thm: [Scheller-Wolf, Sigman 97], [Scheller-Wolf, Vesilo 06] (2 SERVERS) 1 spare server C2∞ Thm: I can make both happen! LWL Mor Harchol-Balter, CMU

  15. Diverg LWL Conv. LWL Bimodal Results Conv. SITA Diverg SITA a pa = QE[X] p Lemma: As C2 ∞, but E[X], Q: const, a QE[X], b  ∞, p  1 depends ra & rb 1-p X ~ b (1-p)b = (1-Q)E[X] THM: LWL always diverges. THM: If ra < 1 & rb < 1  Convergent SITA Mor Harchol-Balter, CMU

  16. Diverg LWL Conv. LWL Trimodal Results Conv. SITA Diverg SITA pa a pb=b-3/2 b X~ depends on m Lemma: As C2 ∞, but E[X]: const, a E[X] b ∞, c ∞ pa  1 pc=c-3/2 c=bm THM: LWL always converges for r<1 THM: If m≤3, SITA converges If m>3, SITA diverges Mor Harchol-Balter, CMU

  17. Results (2 server system) a a p b Trimodal Bimodal Conv. LWL Diverg. LWL r < 1 1-p c=bm > 1 b Conv. SITA Exp(ma) Diverg. SITA Exp(ma) p Exp(mb) H3 H2 Way more complex, because job types overlap! “Separation in the limit” r < 1 1-p Exp(mc) Exp(mb) Mor Harchol-Balter, CMU

  18. E[T] LWL E[T] SITA SITA LWL C2 C2 Conv. LWL Diverg. LWL LWL Conv. SITA SITA Diverg. SITA E[T] E[T] SITA LWL C2 C2 Mor Harchol-Balter, CMU

  19. Diverg LWL Conv. LWL Bounded Pareto (2 server system) Conv. SITA Diverg SITA X~ Bounded Pareto (k,p,a) 1 < a < 2 Lemma: As C2 ∞, but E[X], a: const, k  (a -1)/a  E[X] p  ∞ k p depends on a THM: If a>3/2 and r<1, then LWL converges. Else LWL diverges. THM: SITA always diverges. Extends to n>2 servers when r < n-1

  20. Bounded Pareto Results Why was this not noticed? Conv. LWL Diverg. LWL a = 1.4 a = 1.6 Conv. SITA SITA E[T] E[T] SITA LWL LWL Diverg. SITA C2 C2 Mor Harchol-Balter, CMU

  21. Summary a a p b Trimodal Bimodal Conv. LWL Diverg. LWL r < 1 1-p c=bn > 1 b Conv. SITA or or Exp(ma) Diverg. SITA Exp(ma) p Exp(mb) H3 H2 r < 1 Bounded Pareto(a) 1-p Exp(mc) Exp(mb) 1< a < 2 Mor Harchol-Balter, CMU

  22. “There once was a girl, who had a little curl right in the middle of her forehead. When she was good, she was very very good. But when she was bad, she was horrid.” Old Nursery Rhyme Conv. LWL Diverg. LWL Conv. SITA Diverg. SITA When SITA is good, it is very, very good But when it is bad, it is horrid. Mor Harchol-Balter, CMU

  23. Epilogue … Mor Harchol-Balter, CMU

  24. Where did SITA go wrong? SITA designed to keep shorts from getting stuck behind longs. Isn’t that good? But stringent segregation of shorts & longs can lead to underutilization of servers. Also, for some distributions, can’t subdivide to avoid infinite variability. Mor Harchol-Balter, CMU

  25. SITA Split jobs based on size cutoffs. LWL Send job to host with least remaining work. +Protects against high variability +High throughput +Isolation for smalls Must be a better way… LWL SITA CS Take next small Take next job M/G/2 Take next job Take next job

  26. CS Take next small Must be a better way… Take next job WIN/WIN ! Shorts have isolation from longs And server utilization is high WRONG! Thm: Whenever SITA diverges, CS diverges too. Mor Harchol-Balter, CMU

  27. SITA Split jobs by size. SITA CS Take next small Thm: Whenever SITA diverges, CS diverges too. Take next job e.g., Pareto(a) e.g., Bimodal y y • PROOF: There are 2 reasons why SITA diverges under given y: • Any way of slicing leads to • (2) There is a way of slicing • away variability, but it forces S’s Smalls Larges L’s CS can’t help. CS should help, but doesn’t.

  28. SITA Split jobs by size. SITA CS Take next small Thm: Whenever SITA diverges, CS diverges too. Take next job Pareto(a) Bimodal y y • PROOF: There are 2 reasons why SITA diverges under given y: • Any way of slicing leads to • (2) There is a way of slicing • away variability, but it forces • Small job sees L of age ~Le. • Small server has been in overload for ~Le time • Small sees ~Le work  experiences Le delay. • Delay of small  ∞ as C2  ∞ S’s Smalls Larges L’s CS can’t help. CS should help, but doesn’t.

  29. Maybe isolating short jobs is not the panacea for high-variability workloads after all … CS Take next small Conclusion . Take next job SITA LWL Take next job M/G/2 Take next job Mor Harchol-Balter, CMU

More Related