Mor Harchol-Balter Carnegie Mellon University School of Computer Science

Scheduling Your Network Connections Mor Harchol-Balter Carnegie Mellon University School of Computer Science

Q: Which minimizes mean response time? FCFS, PS, or SRPT? [Figure: three single-server queues, labeled FCFS, PS, and SRPT, each fed by a stream of jobs.] “size” = service requirement; load ρ < 1.
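One way to explore this question is a small single-server simulation. The sketch below compares only FCFS and SRPT (PS is left out for brevity). The Poisson arrivals, the exponential job sizes, and all function names are assumptions made for illustration; they are not taken from the talk.

```python
import heapq
import random


def make_workload(n, load, seed=0):
    """Poisson arrivals and Exp(1) job sizes (assumed), so load equals the arrival rate."""
    rng = random.Random(seed)
    arrivals, sizes, t = [], [], 0.0
    for _ in range(n):
        t += rng.expovariate(load)
        arrivals.append(t)
        sizes.append(rng.expovariate(1.0))
    return arrivals, sizes


def simulate_fcfs(arrivals, sizes):
    """Mean response time when jobs run to completion in arrival order."""
    t_free, total = 0.0, 0.0
    for a, s in zip(arrivals, sizes):
        start = max(a, t_free)
        t_free = start + s
        total += t_free - a
    return total / len(arrivals)


def simulate_srpt(arrivals, sizes):
    """Mean response time when the job with least remaining work always runs."""
    heap = []  # entries are (remaining_work, arrival_time)
    now, total, i, n = 0.0, 0.0, 0, len(arrivals)
    while i < n or heap:
        if not heap:
            now = max(now, arrivals[i])  # server idles until the next arrival
        while i < n and arrivals[i] <= now:
            heapq.heappush(heap, (sizes[i], arrivals[i]))
            i += 1
        remaining, arrived = heapq.heappop(heap)
        next_arrival = arrivals[i] if i < n else float("inf")
        if now + remaining <= next_arrival:
            now += remaining  # job completes before anyone else arrives
            total += now - arrived
        else:
            remaining -= next_arrival - now  # run until the arrival, then re-decide
            now = next_arrival
            heapq.heappush(heap, (remaining, arrived))
    return total / n


if __name__ == "__main__":
    arrivals, sizes = make_workload(100_000, load=0.8, seed=1)
    print("FCFS mean response time:", simulate_fcfs(arrivals, sizes))
    print("SRPT mean response time:", simulate_srpt(arrivals, sizes))
```

Both policies are run on the same arrival sequence, so any difference in the means comes from the scheduling order alone.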

Q: Which best represents scheduling in web servers? FCFS, PS, or SRPT? [Figure: three single-server queues, labeled FCFS, PS, and SRPT, each fed by a stream of jobs.] “size” = service requirement; load ρ < 1.

IDEA: How about using SRPT instead of PS in web servers? [Figure: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” over the Internet to the WEB SERVER (Apache) running on Linux O.S.]

Immediate Objections
1) Can’t assume known job size. Many servers receive mostly static web requests (“GET FILE”). For static web requests, we know the file size, so we approximately know the service requirement of the request.
2) But the big jobs will starve ...

Outline of Talk
THEORY (M/G/1)
[Sigmetrics 01] “Analysis of SRPT: Investigating Unfairness”
[Performance 02] “Asymptotic Convergence of Scheduling Policies…”
[Sigmetrics 03*] “Classifying Scheduling Policies wrt Unfairness …”
IMPLEMENT
[TOCS 03] “Size-based Scheduling to Improve Web Performance”
[ITC 03*, TOIT 06] “Web servers under overload: How scheduling helps”
[ICDE 04,05,06] “Priority Mechanisms for OLTP and Web Apps”
IBM/CMU Patent
Collaborators: Wierman, Schroeder
www.cs.cmu.edu/~harchol/

THEORY: SRPT has a long history ...
1966 Schrage & Miller derive M/G/1/SRPT response time
1968 Schrage proves optimality
1979 Pechinkin & Solovyev & Yashkov generalize
1990 Schassberger derives distribution on queue length
BUT WHAT DOES IT ALL MEAN?

THEORY: SRPT has a long history (cont.)
1990 - 97 7-year long study at Univ. of Aachen under Schreiber: SRPT WINS BIG ON MEAN!
1998, 1999 Slowdown for SRPT under adversary (Rajmohan, Gehrke, Muthukrishnan, Rajaraman, Shaheen, Bender, Chakrabarti, etc.): SRPT STARVES BIG JOBS!
Various O.S. books (Silberschatz, Stallings, Tanenbaum) warn about starvation of big jobs ...
Kleinrock’s Conservation Law: “Preferential treatment given to one class of customers is afforded at the expense of other customers.”

THEORY: Unfairness Question. Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10). Question: Which queue does the biggest job prefer: M/G/1/PS or M/G/1/SRPT?

Results on Unfairness. Let ρ = 0.9. Let G: Bounded Pareto(α = 1.1, max = 10^10). [Figure: PS vs. SRPT.]
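To experiment with this workload, job sizes can be drawn from a Bounded Pareto distribution by inverting its CDF. The sketch below is illustrative only: the slide gives the shape 1.1 and the maximum 10^10 but not the lower bound, so the lower bound `k` used in the example is an assumed value, and the function names are hypothetical.

```python
import random


def sample_bounded_pareto(alpha, k, p, rng):
    """Draw one sample from Bounded Pareto(alpha) on [k, p] by inverse-CDF sampling.

    CDF: F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha) for k <= x <= p.
    """
    u = rng.random()
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)


def bounded_pareto_mean(alpha, k, p):
    """Analytic mean of Bounded Pareto(alpha) on [k, p], valid for alpha != 1."""
    norm = 1.0 - (k / p) ** alpha
    return (alpha * k ** alpha / norm) * (k ** (1.0 - alpha) - p ** (1.0 - alpha)) / (alpha - 1.0)


if __name__ == "__main__":
    alpha, p = 1.1, 1e10  # values from the slide
    k = 1.0               # assumed lower bound; not given on the slide
    mean_size = bounded_pareto_mean(alpha, k, p)
    arrival_rate = 0.9 / mean_size  # arrival rate that yields load 0.9
    rng = random.Random(0)
    print("mean job size:", mean_size)
    print("arrival rate for load 0.9:", arrival_rate)
    print("five samples:", [sample_bounded_pareto(alpha, k, p, rng) for _ in range(5)])
```

Because the tail is so heavy, sample means converge slowly at max = 10^10, which is why the arrival rate above is derived from the analytic mean rather than from samples.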

Unfairness – General Distribution. All-can-win theorem: For all distributions, if ρ < ½, E[T(x)]^SRPT < E[T(x)]^PS for all x.

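All-can-win theorem: For all distributions, if ρ < ½, E[T(x)]^SRPT ≤ E[T(x)]^PS for all x.
Proof idea:
\[
\underbrace{\frac{\lambda \int_0^x t^2 f(t)\,dt + \lambda x^2 \bar{F}(x)}{2\,(1-\rho(x))^2}}_{\text{Waiting time (SRPT)}}
\;+\;
\underbrace{\int_0^x \frac{dt}{1-\rho(t)}}_{\text{Residence (SRPT)}}
\;\le\;
\underbrace{E[T(x)]^{PS}}_{\text{Total (PS)}}
\]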
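The inequality in the proof idea can be spot-checked numerically. The sketch below assumes the standard M/G/1 expressions: SRPT waiting time λ(∫_0^x t² f(t) dt + x²(1 − F(x))) / (2(1 − ρ(x))²) with ρ(x) = λ∫_0^x t f(t) dt, SRPT residence time ∫_0^x dt/(1 − ρ(t)), and PS total time x/(1 − ρ). The exponential job-size distribution with mean 1 (so that ρ = λ) is chosen here only as a convenient example and is not from the talk.

```python
import math


def srpt_mean_response(x, lam, steps=2000):
    """E[T(x)] under M/G/1/SRPT for Exp(1) job sizes (an assumed example distribution)."""

    def rho_upto(t):
        # rho(t) = lam * integral_0^t s * exp(-s) ds, in closed form
        return lam * (1.0 - math.exp(-t) * (1.0 + t))

    # integral_0^x t^2 * exp(-t) dt, in closed form
    second_moment_part = 2.0 - math.exp(-x) * (x * x + 2.0 * x + 2.0)
    tail = math.exp(-x)  # 1 - F(x)
    waiting = lam * (second_moment_part + x * x * tail) / (2.0 * (1.0 - rho_upto(x)) ** 2)

    # residence time: midpoint-rule approximation of integral_0^x dt / (1 - rho(t))
    h = x / steps
    residence = sum(h / (1.0 - rho_upto((i + 0.5) * h)) for i in range(steps))
    return waiting + residence


def ps_mean_response(x, rho):
    """E[T(x)] under M/G/1/PS."""
    return x / (1.0 - rho)


if __name__ == "__main__":
    rho = 0.4  # below 1/2, the regime covered by the theorem
    for x in (0.1, 1.0, 3.0, 10.0, 30.0):
        print(x, srpt_mean_response(x, rho), ps_mean_response(x, rho))
```

This checks a single distribution at a handful of sizes; it illustrates the theorem but does not replace the proof.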

Classification of Scheduling Policies [Sigmetrics 01, 03]
[Figure: policies grouped into three classes: Always Fair, Sometimes Unfair, Always Unfair. Policy families shown: preemptive size-based, age-based, remaining size-based, non-preemptive. Policies shown: PS, PLCFS, FSP, FB, PSJF, SRPT, LRPT, FCFS, SJF, LJF.]
• [Sigmetrics 04]
• Henderson: FSP (Cornell) (both FAIR & efficient)
• Levy’s RAQFM (Tel Aviv) (size + temporal fairness)
• Biersack’s, Bonald’s flow fairness (France)
• Nunez, Borst: TCP/DPS fairness (EURANDOM)

IMPLEMENT From theory to practice: What does SRPT mean within a Web server? • Many devices: Where to do the scheduling? • No longer one job at a time.

IMPLEMENT: Server’s Performance Bottleneck. Site buys limited fraction of ISP’s bandwidth. We model the bottleneck by limiting bandwidth on the server’s uplink. [Figure: clients 1, 2, and 3 send “Get File 1”, “Get File 2”, and “Get File 3” through the rest of the Internet and the ISP to the WEB SERVER (Apache) on Linux O.S.]

IMPLEMENT: Network/O.S. insides of traditional Web server. [Figure: inside the Web server, Socket 1, Socket 2, and Socket 3 (for Client1, Client2, Client3) drain into the network card, which is the BOTTLENECK.] Sockets take turns draining --- FAIR = PS.

IMPLEMENT: Network/O.S. insides of our improved Web server. [Figure: inside the Web server, Socket 1, Socket 2, and Socket 3 (for Client1, Client2, Client3) feed priority queues S, M, and L, served 1st, 2nd, and 3rd, ahead of the network card, which is the BOTTLENECK.] Socket corresponding to file with smallest remaining data gets to feed first.
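The difference between the two draining orders can be sketched in a few lines. This is a toy model under stated assumptions, not the talk's kernel implementation: each connection sends one fixed-size chunk per turn, and the function names, connection ids, and chunk size are hypothetical.

```python
import heapq
from collections import deque


def fair_send_order(files, chunk):
    """Sockets take turns: each sends one chunk, then goes to the back of the line."""
    queue = deque(files.items())
    order = []
    while queue:
        conn, remaining = queue.popleft()
        order.append(conn)
        remaining -= chunk
        if remaining > 0:
            queue.append((conn, remaining))
    return order


def srpt_send_order(files, chunk):
    """The socket whose file has the least remaining data always sends next."""
    heap = [(size, conn) for conn, size in files.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        remaining, conn = heapq.heappop(heap)
        order.append(conn)
        remaining -= chunk
        if remaining > 0:
            heapq.heappush(heap, (remaining, conn))
    return order


if __name__ == "__main__":
    files = {"c1": 3000, "c2": 1000, "c3": 2000}  # bytes left per connection (made up)
    print("FAIR:", fair_send_order(files, chunk=1000))
    print("SRPT:", srpt_send_order(files, chunk=1000))
```

In both orders the last chunk leaves at the same time, so total throughput is unchanged; only the completion times of the smaller files move earlier under SRPT.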

Experimental Setup [Figure: Linux client machines, each generating clients 1 to 200 through a WAN EMU, connect via a switch to the APACHE WEB SERVER on Linux O.S.]
Implementation of SRPT-based scheduling:
1) Modifications to Linux O.S.: 6 priority levels
2) Modifications to Apache Web server
3) Priority algorithm design.
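With only 6 priority levels available, a priority algorithm has to bucket remaining sizes rather than sort them exactly. The sketch below shows one possible shape for such a mapping. The byte cutoffs are hypothetical values picked for illustration and are not the cutoffs used in the talk's implementation.

```python
import bisect

# Hypothetical cutoffs in bytes; five cutoffs split sizes into six levels.
CUTOFFS = [1_000, 5_000, 20_000, 100_000, 500_000]


def priority_level(remaining_bytes):
    """Map remaining bytes to a level from 0 (served first) to 5 (served last)."""
    return bisect.bisect_left(CUTOFFS, remaining_bytes)


if __name__ == "__main__":
    # A connection's level improves as its transfer progresses.
    remaining = 600_000
    while remaining > 0:
        print(remaining, "bytes left -> level", priority_level(remaining))
        remaining -= 150_000
```

Re-evaluating the level as data is sent approximates "smallest remaining data first" with a small, fixed number of queues.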

Experimental Setup [Figure: Linux client machines, each generating clients 1 to 200 through a WAN EMU, connect via a switch to the APACHE WEB SERVER on Linux O.S.]
Configurations: Flash / Apache; 10Mbps uplink / 100Mbps uplink; Surge / Trace-based; Open system / Partly-open; WAN EMU / Geographically-dispersed clients; Load < 1 / Transient overload.
Trace-based workload:
• Number requests made: 1,000,000
• Size of file requested: 41B -- 2 MB
• Distribution of file sizes requested has HT property.
+ Other effects: initial RTO; user abort/reload; persistent connections, etc.

Preliminary Comments
• Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling.
• Same set of requests complete.
• No additional CPU overhead under SRPT scheduling.
• Network was bottleneck in all experiments.

Results: Mean Response Time (LAN). [Plot: mean response time (sec) vs. load, for FAIR and SRPT.]

Mean Response Time vs. Size Percentile (LAN), Load = 0.8. [Plot: mean response time (ms) vs. percentile of request size, for FAIR and SRPT.]

Transient Overload. [Figure: timeline alternating between overload periods (ρ > 1) and periods with ρ < 1.]

Transient Overload - Baseline. [Plot: mean response time, FAIR vs. SRPT.]

Transient overload: Response time as function of job size. [Plot: FAIR vs. SRPT.] Small jobs win big! Big jobs aren’t hurt! WHY?

FACTORS
Baseline Case
WAN propagation delays: RTT: 0 – 150 ms
WAN loss: Loss: 0 – 15%
WAN loss + delay: RTT: 0 – 150 ms, Loss: 0 – 15%
Persistent Connections: 0 – 10 requests/conn.
Initial RTO value: RTO = 0.5 sec – 3 sec
SYN Cookies: ON/OFF
User Abort/Reload: Abort after 3 – 15 sec, with 2,4,6,8 retries.
Packet Length: Packet length = 536 – 1500 Bytes
Realistic Scenario: RTT = 100 ms; Loss = 5%; 5 requests/conn.; RTO = 3 sec; pkt len = 1500B; User aborts after 7 sec and retries up to 3 times.

Transient Overload - Realistic. [Plot: mean response time, FAIR vs. SRPT.]

More questions … STATIC web requests: Everything so far in talk … DYNAMIC web requests: Current work… (ICDE 04,05,06). Collaborators: Schroeder, McWherter, Wierman.

Online Shopping. [Figure: clients 1, 2, and 3 each send “buy” over the Internet to a Web Server (eg: Apache/Linux) backed by a Database (eg: DB2, Oracle, PostgreSQL).] • Dynamic responses take much longer – 10sec • Database is bottleneck.

Online Shopping. [Figure: client 1 sends “$$$buy$$$” while clients 2 and 3 send “buy” over the Internet to a Web Server (eg: Apache/Linux) backed by a Database (eg: DB2, Oracle, PostgreSQL).] Goal: Prioritize requests

Isn’t the “prioritizing requests” problem already solved? [Figure: “$$$buy$$$” and “buy” requests travel over the Internet to a Web Server (eg: Apache/Linux) backed by a Database (eg: DB2, Oracle, PostgreSQL).] No. Prior work is simulation or RTDBMS.

Which resource to prioritize? Locks, CPU(s), or Disks? [Figure: a High-Priority client (“$$$buy$$$”) and Low-Priority clients (“buy”) reach the Web Server (eg: Apache/Linux) over the Internet; the Database behind it contains Locks, CPU(s), and Disks.]

Q: Which resource to prioritize? Locks, CPU(s), or Disks? [Figure: a High-Priority client (“$$$buy$$$”) and Low-Priority clients (“buy”) reach the Web Server (eg: Apache/Linux) over the Internet; the Database behind it contains Locks, CPU(s), and Disks.] A: 2PL → Lock Queues

What is the bottleneck resource? Fix at 10 warehouses; #clients = 10 x #warehouses. • IBM DB2 -- Lock waiting time (yellow) is bottleneck. • Therefore, need to schedule lock queues to have impact.

Existing Lock scheduling policies [Figure: high-priority (H) and low-priority (L) transactions queued at Lock resource 1 and Lock resource 2.]
NP: Non-preemptive. Can’t kick out lock holder.
NPinherit: NP + Inheritance.
Pabort: Preemptively abort. But suffer rollback cost + wasted work.

Results [Plots: response time (sec) vs. think time for Low and High priority, one panel for non-preemptive policies and one for the preemptive-abort policy.] New idea: POW (Preempt-on-Wait). Preempt selectively: only preempt those waiting.
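The selective-preemption rule can be written as a single predicate. The sketch below is one reading of “only preempt those waiting”: a high-priority requester preempts a low-priority lock holder only when that holder is itself blocked waiting in another lock queue. That interpretation, the `Txn` type, and the function names are assumptions for illustration, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    high_priority: bool
    waiting_on_lock: bool = False  # True if blocked in some other lock queue


def pabort_should_preempt(requester: Txn, holder: Txn) -> bool:
    """Pabort: a high-priority requester always aborts a low-priority holder."""
    return requester.high_priority and not holder.high_priority


def pow_should_preempt(requester: Txn, holder: Txn) -> bool:
    """POW (assumed reading): preempt a low-priority holder only if it is waiting."""
    return pabort_should_preempt(requester, holder) and holder.waiting_on_lock


if __name__ == "__main__":
    high = Txn("H1", high_priority=True)
    running_low = Txn("L1", high_priority=False, waiting_on_lock=False)
    blocked_low = Txn("L2", high_priority=False, waiting_on_lock=True)
    print(pow_should_preempt(high, running_low))  # holder is making progress
    print(pow_should_preempt(high, blocked_low))  # holder is stuck behind another lock
```

A holder that is running will release its lock soon on its own, so under this reading only holders that are stuck get aborted, which limits rollback cost compared with Pabort.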

Results: POW: Best of both. [Plots: response time (sec) vs. think time (sec), with curves labeled Pabort and NPinherit.] IBM/CMU patent

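External DBMS scheduling. [Figure: “$$$buy$$$” and “buy” requests travel over the Internet to the Web Server; a QoS scheduling layer queues high-priority (H) and low-priority (L) requests in front of the DBMS (eg: DB2, Oracle).]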

Conclusion
Scheduling is a very cheap solution…
• No need to buy new hardware
• No need to buy more memory
• Small software modifications
…with a potentially very big win.
Thank you!
