
Mor Harchol-Balter Carnegie Mellon University School of Computer Science

Presentation Transcript


  1. Scheduling Your Network Connections. Mor Harchol-Balter, Carnegie Mellon University, School of Computer Science

  2. [Diagram: three queues of jobs, scheduled under FCFS, PS, and SRPT.] Q: Which minimizes mean response time? (“size” = service requirement; load $\rho < 1$)

  3. Q: Which best represents scheduling in web servers? [Diagram: three queues of jobs, scheduled under FCFS, PS, and SRPT.] (“size” = service requirement; load $\rho < 1$)
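To make the question on slides 2 and 3 concrete, here is a minimal single-server sketch (not from the talk) that computes mean response time under FCFS, PS, and SRPT for a given list of jobs. The function name `mean_response_time`, the unit-speed server, and the two-job example are illustrative assumptions.

```python
import math


def mean_response_time(jobs, policy):
    """Mean response time on one unit-speed server.

    jobs   -- list of (arrival_time, size), sorted by arrival_time (non-empty)
    policy -- "FCFS", "PS", or "SRPT"
    """
    active = []        # jobs in system, each stored as [arrival_time, remaining_size]
    next_arrival = 0   # index of the next job that has not arrived yet
    now = 0.0
    total_response = 0.0

    while next_arrival < len(jobs) or active:
        # Choose which jobs are served, and at what rate each one is served.
        if not active:
            served, rate = [], 0.0
        elif policy == "PS":      # everyone shares the server equally
            served, rate = active, 1.0 / len(active)
        elif policy == "FCFS":    # earliest arrival gets the whole server
            served, rate = [min(active, key=lambda j: j[0])], 1.0
        elif policy == "SRPT":    # smallest remaining size gets the whole server
            served, rate = [min(active, key=lambda j: j[1])], 1.0
        else:
            raise ValueError("unknown policy: " + policy)

        to_completion = min(j[1] for j in served) / rate if served else math.inf
        to_arrival = (jobs[next_arrival][0] - now
                      if next_arrival < len(jobs) else math.inf)

        # Advance time to the next event and account for the work done.
        step = min(to_completion, to_arrival)
        for job in served:
            job[1] -= step * rate
        now += step

        # Retire finished jobs (small tolerance for floating-point error).
        for job in [j for j in active if j[1] <= 1e-12]:
            total_response += now - job[0]
            active.remove(job)

        # Admit the arriving job, if the event was an arrival.
        if to_arrival <= to_completion:
            arrival_time, size = jobs[next_arrival]
            active.append([arrival_time, size])
            next_arrival += 1

    return total_response / len(jobs)


# Illustrative workload (an assumption): a size-3 job and a size-1 job, both at time 0.
example_jobs = [(0.0, 3.0), (0.0, 1.0)]
for name in ("FCFS", "PS", "SRPT"):
    print(name, mean_response_time(example_jobs, name))
```

On this tiny example the means come out as 3.5 (FCFS), 3.0 (PS), and 2.5 (SRPT). It is one hand-picked instance, not evidence about real workloads.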

  4. IDEA: How about using SRPT instead of PS in web servers? [Diagram: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” over the Internet to a web server (Apache) running on Linux O.S.]

  5. Immediate Objections. 1) Can’t assume known job size. Many servers receive mostly static web requests (“GET FILE”). For static web requests, the file size is known, so the service requirement of the request is approximately known. 2) But the big jobs will starve ...

  6. Outline of Talk. THEORY (M/G/1): [BH – Sigmetrics 01] “Analysis of SRPT: Investigating Unfairness”; [HSW – Performance 02] “Asymptotic Convergence of Scheduling Policies…”; [WH – Sigmetrics 03*] “Classifying Scheduling Policies wrt Unfairness …”. IMPLEMENT: [HSBA – TOCS 03] “Size-based Scheduling to Improve Web Performance”; [SH – ITC 03*] “Web servers under overload: How scheduling can help”; [MSAH – ICDE 03] “Priority Mechanisms for OLTP and Web Applications”. www.cs.cmu.edu/~harchol/ (Names shown on slide: Wierman, Schroeder.)

  7. THEORY: SRPT has a long history ... 1966: Schrage & Miller derive M/G/1/SRPT response time. 1968: Schrage proves optimality. 1979: Pechinkin, Solovyev & Yashkov generalize. 1990: Schassberger derives distribution on queue length. BUT WHAT DOES IT ALL MEAN?

  8. THEORY: SRPT has a long history (cont.) 1990–97: 7-year-long study at Univ. of Aachen under Schreiber: SRPT WINS BIG ON MEAN! 1998, 1999: Slowdown for SRPT under adversary (Rajmohan, Gehrke, Muthukrishnan, Rajaraman, Shaheen, Bender, Chakrabarti, etc.): SRPT STARVES BIG JOBS! Various O.S. books (Silberschatz, Stallings, Tanenbaum) warn about starvation of big jobs ... Kleinrock’s Conservation Law: “Preferential treatment given to one class of customers is afforded at the expense of other customers.”

  9. Unfairness Question. Let $\rho = 0.9$. Let G: Bounded Pareto($\alpha = 1.1$, max $= 10^{10}$). Question: Which queue does the biggest job prefer, M/G/1/PS or M/G/1/SRPT?

  10. Results on Unfairness. Let $\rho = 0.9$. Let G: Bounded Pareto($\alpha = 1.1$, max $= 10^{10}$). [Plot: PS and SRPT curves.]

  11. Unfairness – General Distribution. All-can-win theorem: For all distributions, if $\rho < \frac{1}{2}$, then $E[T(x)]^{SRPT} < E[T(x)]^{PS}$ for all $x$.

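  12. All-can-win theorem: For all distributions, if $\rho < \frac{1}{2}$, then $E[T(x)]^{SRPT} \le E[T(x)]^{PS}$ for all $x$. Proof idea: compare waiting time (SRPT) + residence time (SRPT) with the total (PS), where the SRPT side is $\frac{\lambda \int_0^x t^2 f(t)\,dt + \lambda x^2 \overline{F}(x)}{2\,(1-\rho(x))^2} + \int_0^x \frac{dt}{1-\rho(t)}$, with $\overline{F}(x) = 1 - F(x)$.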
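As a rough numerical illustration of the all-can-win statement (not the proof, and not from the talk), the sketch below evaluates the standard M/G/1 formulas: the SRPT waiting and residence terms shown on slide 12, with $\rho(x) = \lambda \int_0^x t f(t)\,dt$, against the PS value $x/(1-\rho)$. It assumes exponential job sizes with mean 1 (so $\rho = \lambda$) rather than the Bounded Pareto used in the slides, only because the inner integrals then have closed forms. All function names are illustrative.

```python
import math


def rho_up_to(x, lam):
    """Load from jobs of size <= x: lam * integral_0^x t e^{-t} dt (exponential sizes, mean 1)."""
    return lam * (1.0 - math.exp(-x) * (1.0 + x))


def srpt_mean_response(x, lam, steps=2000):
    """E[T(x)] under M/G/1/SRPT = waiting time + residence time."""
    # integral_0^x t^2 e^{-t} dt in closed form
    truncated_second_moment = 2.0 - math.exp(-x) * (x * x + 2.0 * x + 2.0)
    tail = math.exp(-x)  # 1 - F(x)
    waiting = (lam * (truncated_second_moment + x * x * tail)
               / (2.0 * (1.0 - rho_up_to(x, lam)) ** 2))
    # Residence time: integral_0^x dt / (1 - rho(t)), midpoint rule.
    h = x / steps
    residence = sum(h / (1.0 - rho_up_to((k + 0.5) * h, lam)) for k in range(steps))
    return waiting + residence


def ps_mean_response(x, lam):
    """E[T(x)] under M/G/1/PS is x / (1 - rho); here rho = lam because mean size is 1."""
    return x / (1.0 - lam)


lam = 0.4  # load below 1/2, the regime covered by the theorem
for x in (0.1, 1.0, 2.0, 5.0, 20.0):
    print(x, srpt_mean_response(x, lam), ps_mean_response(x, lam))
```

With load 0.4 the SRPT value should stay below the PS value at every job size tried, which is what the theorem predicts for loads under one half. The sketch makes no claim about higher loads.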

  13. Classification of Scheduling Policies (for a policy $P$). ALWAYS FAIR: for all loads, for all service distributions, $E[T(x)]^{P} \le E[T(x)]^{PS}$ for all $x$. ALWAYS UNFAIR: for all loads, for all service distributions, $\exists x$ such that $E[T(x)]^{P} > E[T(x)]^{PS}$. SOMETIMES UNFAIR: for some loads, $E[T(x)]^{P} \le E[T(x)]^{PS}$ for all $x$; for other loads, $\exists x$ such that $E[T(x)]^{P} > E[T(x)]^{PS}$.

  14. Classification of Scheduling Policies. [Diagram: the policies FSP, PLCFS, PS, FB, PSJF, FCFS, LJF, LRPT, SJF, and SRPT, together with the families preemptive size-based, age-based, non-preemptive, and remaining-size-based policies, placed in the classes Always Fair, Always Unfair, and Sometimes Unfair.] Lots of open problems…

  15. IMPLEMENT: From theory to practice: What does SRPT mean within a Web server? • Many devices: Where to do the scheduling? • No longer one job at a time.

  16. IMPLEMENT: Server’s Performance Bottleneck. Site buys limited fraction of ISP’s bandwidth. [Diagram: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” through the rest of the Internet and the ISP to the web server (Apache) on Linux O.S.] We model the bottleneck by limiting bandwidth on the server’s uplink.

  17. IMPLEMENT: Network/O.S. insides of traditional Web server. [Diagram: sockets 1, 2, and 3, one per client, feed the network card, which is the BOTTLENECK.] Sockets take turns draining: FAIR = PS.

  18. IMPLEMENT: Network/O.S. insides of our improved Web server. [Diagram: sockets 1, 2, and 3, one per client, feed priority queues (1st, 2nd, 3rd; labeled S, M, L) in front of the network card, which is the BOTTLENECK.] Socket corresponding to file with smallest remaining data gets to feed first.
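The rule on slide 18 (the socket whose file has the least remaining data feeds the network card first) can be sketched as a loop over a min-heap keyed on remaining bytes. This is only an illustration of the idea, not the Linux/Apache implementation from the talk. The function name `drain_srpt`, the connection labels, and the 1500-byte chunk size are assumptions.

```python
import heapq


def drain_srpt(remaining_bytes, chunk=1500):
    """Return the sequence of (connection, bytes_sent) when the connection
    with the smallest remaining data always sends the next chunk.

    remaining_bytes -- dict mapping a connection label to bytes left to send
    chunk           -- maximum bytes sent per turn (illustrative packet size)
    """
    heap = [(left, conn) for conn, left in remaining_bytes.items() if left > 0]
    heapq.heapify(heap)
    sends = []
    while heap:
        left, conn = heapq.heappop(heap)  # smallest remaining data goes first
        sent = min(chunk, left)
        sends.append((conn, sent))
        if left > sent:
            heapq.heappush(heap, (left - sent, conn))
    return sends


# Hypothetical connections with 3000, 1000, and 2000 bytes left to send.
print(drain_srpt({"client1": 3000, "client2": 1000, "client3": 2000}))
```

In this sketch no new requests arrive while draining. In a live server a newly arrived short request would simply be pushed onto the same heap and would preempt longer transfers at the next chunk boundary.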

  19. Experimental Setup. [Diagram: three Linux client machines (each labeled 200), each behind a WAN emulator (WAN EMU), connected through a switch to the Apache web server on Linux O.S.] Implementation of SRPT-based scheduling: 1) Modifications to Linux O.S.: 6 priority levels. 2) Modifications to Apache Web server. 3) Priority algorithm design.
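With only 6 priority levels available, exact SRPT has to be approximated by mapping each connection's remaining bytes to a priority band. The sketch below shows one way such a mapping could look. The cutoff values are purely hypothetical (the slide does not give them), and `priority_band` is an illustrative name, not a function from the modified Linux or Apache code.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in bytes) for bands 0..4; anything larger falls in band 5.
# These numbers are assumptions for illustration, not the cutoffs used in the talk.
CUTOFFS = [1_000, 5_000, 20_000, 100_000, 500_000]


def priority_band(remaining_bytes):
    """Map remaining bytes to one of 6 bands; band 0 is served first."""
    return bisect_left(CUTOFFS, remaining_bytes)


for size in (41, 1_000, 1_001, 50_000, 2_000_000):
    print(size, priority_band(size))
```

Because the band is computed from the remaining bytes rather than the original file size, a large transfer moves to higher-priority bands as it nears completion, which is how a banded scheme approximates SRPT.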

  20. Experimental Setup. [Diagram: the testbed of Linux clients, WAN emulators, switch, and Apache web server on Linux O.S., annotated with the labels: Flash, Apache; 10 Mbps uplink, 100 Mbps uplink; Surge, trace-based; open system, partly-open; WAN EMU, geographically dispersed clients; load < 1, transient overload.] Trace-based workload: number of requests made: 1,000,000; size of file requested: 41 B to 2 MB; distribution of file sizes requested has the heavy-tailed (HT) property. Plus other effects: initial RTO; user abort/reload; persistent connections, etc.

  21. Preliminary Comments. [Diagram: the same testbed.] • Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling. • Same set of requests complete. • No additional CPU overhead under SRPT scheduling. • Network was bottleneck in all experiments.

  22. Results: Mean Response Time (LAN). [Plot: mean response time (sec) vs. load, for FAIR and SRPT.]

  23. Mean Response Time vs. Size Percentile (LAN). [Plot at load = 0.8: mean response time (ms) vs. percentile of request size, for FAIR and SRPT.]

  24. Transient Overload. [Diagram: load over time, with periods of $\rho > 1$ and periods of $\rho < 1$.]

  25. Transient Overload - Baseline. [Plot: mean response time, FAIR vs. SRPT.]

  26. Transient overload: response time as function of job size. [Plot: FAIR vs. SRPT.] Small jobs win big! Big jobs aren’t hurt! WHY?

  27. FACTORS. • Baseline Case. • WAN propagation delays: RTT: 0 – 150 ms. • WAN loss: Loss: 0 – 15%. • WAN loss + delay: RTT: 0 – 150 ms, Loss: 0 – 15%. • Persistent Connections: 0 – 10 requests/conn. • Initial RTO value: RTO = 0.5 sec – 3 sec. • SYN Cookies: ON/OFF. • User Abort/Reload: abort after 3 – 15 sec, with 2, 4, 6, 8 retries. • Packet Length: packet length = 536 – 1500 bytes. • Realistic Scenario: RTT = 100 ms; Loss = 5%; 5 requests/conn.; RTO = 3 sec; pkt len = 1500 B; user aborts after 7 sec and retries up to 3 times.

  28. Transient Overload - Realistic. [Plot: mean response time, FAIR vs. SRPT.]

  29. Conclusion so far … • SRPT scheduling is a promising solution for reducing mean response time seen by clients, particularly when the load at the server bottleneck is high, or under transient overload conditions. • SRPT results in negligible or zero unfairness to large requests. • SRPT is easy to implement and efficient. No CPU overhead. No drop in throughput. • Results corroborated via implementation and analysis.

  30. More questions … STATIC web requests: everything so far in talk … DYNAMIC web requests: current work … (Names shown on slide: Schroeder, McWherter, Wierman.)

  31. Online Shopping. [Diagram: clients 1, 2, and 3 send “buy” requests over the Internet to a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).] • Dynamic responses take much longer (10 sec). • Database is bottleneck.

  32. Online Shopping. [Diagram: client 1 sends “$$$buy$$$” and clients 2 and 3 send “buy” over the Internet to a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).] Goal: Prioritize requests.

  33. Isn’t the “prioritizing requests” problem already solved? [Diagram: “$$$buy$$$” and “buy” requests arrive over the Internet at a Web server (e.g., Apache/Linux) backed by a database (e.g., DB2, Oracle, PostgreSQL).] No. Prior work is mostly simulation or RTDBMS.

  34. Which resource to prioritize? [Diagram: a high-priority client (“$$$buy$$$”) and low-priority clients (“buy”) reach the Web server (e.g., Apache/Linux) over the Internet; inside the database the candidate resources are locks, CPU(s), and disks.]

  35. Q: Which resource to prioritize? [Diagram: a high-priority client (“$$$buy$$$”) and low-priority clients (“buy”) reach the Web server (e.g., Apache/Linux) over the Internet; inside the database the candidate resources are locks, CPU(s), and disks.] A: Locks. 2PL → lock queues.

  36. What is the bottleneck resource? [Chart: fixed at 10 warehouses; #clients = 10 × #warehouses.] • IBM DB2: lock waiting time (yellow) is bottleneck. • Therefore, need to schedule lock queues to have impact.

  37. Why lock scheduling is hard. [Diagram: low-priority (L) and high-priority (H) transactions queued on lock resource 1 and lock resource 2.] • NP: H may wait a long time. • NPinherit: speeding up L may hurt H in the long run. • Pabort: rollback cost + wasted work + really hurts L’s.

  38. Results: Implementation study of NP, NPinherit, Pabort under TPC-C workload, Shore DBMS. Develop new policy POW (Preempt on Wait).

  39. Results: Response time. [Plots: response time vs. think time, for Pabort and NPinherit.]

  40. Results: Response time. [Plots: response time vs. think time, for Pabort and NPinherit.] POW: Best of both.

  41. More work in SYNC project… • QoS from outside the box. [Diagram: “$$$buy$$$” and “buy” requests arrive over the Internet at a Web server; a QoS layer sits in front of the DBMS (e.g., DB2, Oracle).] • Scheduling the TeraGrid (PSC, SDSC, NCSA). • Time-varying load in systems. • Impact of closed versus open system models.

  42. Conclusion. Scheduling is a very cheap solution… • No need to buy new hardware. • No need to buy more memory. • Small software modifications. …with a potentially very big win in some situations. Thank you!
