This presentation examines Web server scheduling policies, emphasizing their impact on user-perceived response time and unfairness. Background, methodology, results, and implications are discussed, with a focus on SRPT.
Quantifying the Properties of SRPT Scheduling Mingwei Gong and Carey Williamson Department of Computer Science University of Calgary
Outline • Introduction • Background • Web Server Scheduling Policies • Related Work • Research Methodology • Simulation Results • Defining/Refining Unfairness • Quantifying Unfairness • Summary, Conclusions, and Future Work
Introduction • Web: large-scale, client-server system • WWW: World Wide Wait! • User-perceived Web response time is composed of several components: • Transmission delay and propagation delay in the network • Queueing delays at busy routers • Delays caused by TCP protocol effects (e.g., handshaking, slow start, packet loss, retransmissions) • Queueing delays at the Web server itself, which may be servicing hundreds or thousands of concurrent requests • Our focus in this work: Web request scheduling
Example Scheduling Policies • FCFS: First Come First Serve • typical policy for a single shared resource (“unfair”) • e.g., drive-thru restaurant; Sens playoff tickets • PS: Processor Sharing • time-sharing a resource amongst M jobs • each job gets 1/M of the resources (equal, “fair”) • e.g., CPU; VM; multi-tasking; Apache Web server • SRPT: Shortest Remaining Processing Time • pre-emptive version of Shortest Job First (SJF) • give resources to the job that will complete quickest • e.g., ??? (express lanes in a grocery store, almost)
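To make the three policies concrete, the following is a minimal Python sketch (not from the paper or its simulator) that serves a tiny hand-made job set under FCFS, PS, and SRPT in small fluid time steps and prints the mean slowdown; the job set, step size, and unit service rate are illustrative assumptions.

def simulate(jobs, policy, dt=0.001):
    # jobs: list of (arrival_time, size); the server works at unit rate (fluid model)
    remaining = {i: size for i, (_, size) in enumerate(jobs)}
    finish = {}
    t = 0.0
    while remaining:
        active = [i for i in remaining if jobs[i][0] <= t]
        if not active:
            t += dt
            continue
        if policy == "FCFS":
            served = [min(active, key=lambda i: jobs[i][0])]    # earliest arrival runs to completion
        elif policy == "SRPT":
            served = [min(active, key=lambda i: remaining[i])]  # least remaining work preempts
        else:  # "PS": all active jobs share the server equally
            served = active
        for i in served:
            remaining[i] -= dt / len(served)
            if remaining[i] <= 0:
                finish[i] = t + dt
                del remaining[i]
        t += dt
    # slowdown = observed response time / response time if the job had the server to itself
    return [(finish[i] - a) / s for i, (a, s) in enumerate(jobs)]

jobs = [(0.0, 5.0), (0.1, 1.0), (0.2, 1.0)]  # illustrative (arrival time, size) pairs
for p in ("FCFS", "PS", "SRPT"):
    sd = simulate(jobs, p)
    print(p, round(sum(sd) / len(sd), 2))

On this toy input, the printed mean slowdown is lowest for SRPT, consistent with the classical optimality result noted under Related Work.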
Related Work • Theoretical work: • SRPT is provably optimal in terms of mean response time and mean slowdown (“classical” results) • Practical work: • CMU: prototype implementation in Apache Web server. The results are consistent with theoretical work. • Concern: unfairness problem (“starvation”) • large jobs may be penalized (but not always true!)
Related Work (Cont’d) • Harchol-Balter et al. show theoretical results: • For the largest jobs, the slowdown asymptotically converges to the same value for any preemptive work-conserving scheduling policy (i.e., for these jobs, SRPT, or even LRPT, is no worse than PS) • For sufficiently large jobs, the slowdown under SRPT is only marginally worse than under PS, by at most a factor of 1 + ε, for small ε > 0. [M. Harchol-Balter, K. Sigman, and A. Wierman 2002], “Asymptotic Convergence of Scheduling Policies w.r.t. Slowdown”, Proceedings of IFIP Performance 2002, Rome, Italy, September 2002
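Restated compactly (a hedged restatement assuming the M/GI/1 setting of the cited paper, not a formula taken from the slide), with E[S(x)] the expected slowdown of a job of size x and ρ the system load:

\[
\mathbb{E}\bigl[S(x)\bigr]^{\mathrm{PS}} \;=\; \frac{1}{1-\rho}\ \text{for all } x,
\qquad
\lim_{x \to \infty} \mathbb{E}\bigl[S(x)\bigr]^{P} \;=\; \frac{1}{1-\rho}\ \text{for any preemptive, work-conserving policy } P .
\]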
Related Work (Cont’d) • [Wierman and Harchol-Balter 2003]: [Figure: classification of scheduling policies (PS, PLCFS, FSP, FCFS, SJF, LAS, SRPT, LRPT) into Always Fair, Sometimes Unfair, and Always Unfair categories] [A. Wierman and M. Harchol-Balter 2003] (Best Paper), “Classifying Scheduling Policies w.r.t. Unfairness in an M/GI/1”, Proceedings of ACM SIGMETRICS, San Diego, CA, June 2003
A Pictorial View • [Figure: slowdown vs. job size for PS and SRPT, with the “crossover region” (mystery hump) at intermediate job sizes and “asymptotic convergence” of the two curves for the largest jobs]
Research Questions • Do these properties hold in practice for empirical Web server workloads? (e.g., general arrival processes, service time distributions) • What does “sufficiently large” mean? • Is the crossover effect observable? • If so, for what range of job sizes? • Does it depend on the arrival process and the service time distribution? If so, how? • Is PS (the “gold standard”) really “fair”? • Can we do better? If so, how?
Overview of Research Methodology • Trace-driven simulation of a simple Web server • Empirical Web server workload trace (1M requests from WorldCup’98) for the main experiments • Synthetic Web server workloads for the sensitivity-study experiments • Probe-based sampling methodology • Estimate job response time distributions for different job sizes, load levels, and scheduling policies • Graphical comparisons of results • Statistical tests of results (t-test, F-test)
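As one concrete illustration of the statistical step above, here is a minimal Python sketch (placeholder data and hypothetical setup, not the authors' analysis code) that compares two slowdown samples with a two-sample t-test on the means and an F-test on the variances:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
slowdown_ps = rng.lognormal(mean=1.0, sigma=0.5, size=1000)    # placeholder PS samples
slowdown_srpt = rng.lognormal(mean=0.8, sigma=0.6, size=1000)  # placeholder SRPT samples

# Welch's two-sample t-test on mean slowdown (no equal-variance assumption)
t_stat, t_p = stats.ttest_ind(slowdown_ps, slowdown_srpt, equal_var=False)

# F-test: ratio of sample variances, two-sided p-value from the F distribution
f_stat = np.var(slowdown_ps, ddof=1) / np.var(slowdown_srpt, ddof=1)
df1, df2 = len(slowdown_ps) - 1, len(slowdown_srpt) - 1
f_p = 2 * min(stats.f.sf(f_stat, df1, df2), stats.f.cdf(f_stat, df1, df2))

print(f"t = {t_stat:.2f} (p = {t_p:.3g}),  F = {f_stat:.2f} (p = {f_p:.3g})")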
Simulation Assumptions • User requests are for static Web content • Server knows response size in advance • Network bandwidth is the bottleneck • All clients are in the same LAN environment • Ignores variations in network bandwidth and propagation delay • Fluid flow approximation: service time = response size • Ignores packetization issues • Ignores TCP protocol effects • Ignores network effects • (These are consistent with SRPT literature)
Performance Metrics • Number of jobs in the system • Number of bytes in the system • Normalized slowdown: • The slowdown of a job is its observed response time divided by the ideal response time if it were the only job in the system • Ranges between 1 and ∞ • Lower is better
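In symbols, the definition above is simply (with T_observed the job's measured response time and T_alone its response time if it were the only job in the system):

\[
\text{slowdown} \;=\; \frac{T_{\text{observed}}}{T_{\text{alone}}} \;\ge\; 1 .
\]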
Preliminaries: An Example • [Figure: the number of jobs in the system and the number of bytes in the system plotted over time for an example trace segment]
Observations: • The “byte backlog” is the same for each scheduling policy • The busy periods are the same for each policy • The distribution of the number of jobs in the system is different
General Observations (Empirical trace) • [Figure: marginal distribution of the number of jobs in the system for PS and SRPT at loads of 50%, 80%, and 95%] • Differences between PS and SRPT are more pronounced at higher loads
Objectives (Restated) • Compare PS policy with SRPT policy • Confirm theoretical results in previous work (Harchol-Balter et al.) • For the largest jobs • For sufficiently large jobs • Quantify unfairness properties
Probe-Based Sampling Algorithm • The algorithm is based on the PASTA (Poisson Arrivals See Time Averages) principle • [Figure: a probe job is inserted into the PS job stream, its slowdown is recorded (1 sample), and the process is repeated N times]
Probe-based Sampling Algorithm
For scheduling policy S = (PS, SRPT, FCFS, LRPT, …) do
  For load level U = (0.50, 0.80, 0.95) do
    For probe job size J = (1 B, 1 KB, 10 KB, 1 MB, …) do
      For trial I = (1, 2, 3, … N) do
        Insert probe job at a randomly chosen point;
        Simulate Web server scheduling policy;
        Compute and record slowdown value observed;
      end of I;
      Plot marginal distribution of slowdown results;
    end of J;
  end of U;
end of S;
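A minimal Python rendering of the same loop (hypothetical names; simulate_probe stands in for the trace-driven server simulation and is assumed to return the probe job's slowdown):

import random

def probe_sampling(trace, simulate_probe, policies, loads, probe_sizes, n_trials):
    # trace: list of (arrival_time, size) requests from the empirical workload
    results = {}
    for policy in policies:                    # e.g. "PS", "SRPT", "FCFS", "LRPT"
        for load in loads:                     # e.g. 0.50, 0.80, 0.95
            for size in probe_sizes:           # e.g. 1, 1_000, 10_000, 1_000_000 bytes
                samples = []
                for _ in range(n_trials):
                    # insert the probe at a randomly chosen point in the trace
                    t = random.uniform(trace[0][0], trace[-1][0])
                    samples.append(simulate_probe(trace, policy, load, probe=(t, size)))
                # one marginal slowdown distribution per (policy, load, probe size)
                results[(policy, load, size)] = samples
    return results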
Example Results for 3 KB Probe Job • [Figure: slowdown distributions at loads of 50%, 80%, and 95%]
Example Results for 100 KB Probe Job • [Figure: slowdown distributions at loads of 50%, 80%, and 95%]
Example Results for 10 MB Probe Job • [Figure: slowdown distributions at loads of 50%, 80%, and 95%]
Two Aspects of Unfairness • Endogenous unfairness: (SRPT) • Caused by an intrinsic property of a job, such as its size. This aspect of unfairness is invariant • Exogenous unfairness: (PS) • Caused by external conditions, such as the number of other jobs in the system, their sizes, and their arrival times. • Analogy: showing up at a restaurant without a reservation, wanting a table for k people
Observations for PS • PS is “fair” … sort of! • Exogenous unfairness is dominant
Observations for SRPT • Endogenous unfairness is dominant
Illustrating the crossover effect (load = 95%) • [Figure: slowdown vs. job size on linear and log scales, zooming in on job sizes of roughly 3 MB to 4 MB]
Crossover Effect? Yes!
Summary and Conclusions • Trace-driven simulation of Web server scheduling strategies, using a probe-based sampling methodology (probe jobs) to estimate response time (slowdown) distributions • Confirms asymptotic convergence of the slowdown metric for the largest jobs • Confirms the existence of the “cross-over effect” for some job sizes under SRPT • Provides new insights into SRPT and PS • Two types of unfairness: endogenous vs. exogenous • PS is not really a “gold standard” for fairness!
Ongoing Work • Synthetic Web workloads • Sensitivity to arrival process (self-similar traffic) • Sensitivity to heavy-tailed job size distributions • Evaluate novel scheduling policies that may improve upon PS (e.g., FSP, k-SRPT, …)
Sensitivity to Arrival Process • A bursty arrival process (e.g., self-similar traffic, with Hurst parameter H > 0.5) makes things worse for both PS and SRPT policies • A bursty arrival process has greater impact on the performance of PS than on SRPT • PS exhibits higher exogenous unfairness than SRPT for all Hurst parameters and system loads tested
Sensitivity to Job Size Distribution • SRPT loves heavy-tailed distributions: the heavier the tail the better! • For all Pareto parameter values and all system loads considered, SRPT provides better performance than PS with respect to mean slowdown and standard deviation of slowdown • At high system load (U = 0.95), SRPT has more pronounced endogenous unfairness than PS
Thank You! Questions? For more information: M. Gong and C. Williamson, “Quantifying the Properties of SRPT Scheduling”, to appear, Proceedings of IEEE MASCOTS, Orlando, FL, October 2003 Email: {gongm,carey}@cpsc.ucalgary.ca