THE RESEARCH PROCESS

OPEN VERSUS CLOSED: A CAUTIONARY TALE
Bianca Schroeder, Adam Wierman, Mor Harchol-Balter
Computer Science Department, Carnegie Mellon University
To appear at NSDI 2006
Presenter: 吳泰廷

THE RESEARCH PROCESS
[Figure: the standard (old) system is compared with the new system.]
New system has smaller response time!
This comparison requires testing the two systems on realistic workloads.

INTRODUCTION
• We need system models that "accurately represent" the real system.
• Representing a system accurately involves many things: bottleneck resource behavior, the scheduling of requests at that bottleneck, and workload parameters such as the distribution of service request demands.
• One factor that researchers typically pay little attention to is whether the job arrivals obey a closed or an open system model.

• We show that closed and open system models yield significantly different results, even when both models are run with the same load and service demands.
• We conclude with guidelines for choosing a system model.

MANY WAYS TO GENERATE REALISTIC WORKLOADS: CLOSED SYSTEM MODEL
[Figure: N users cycle through think, send, server, receive.]
A user requests a web page, receives the page, reads the page, and clicks on a new link.
N = MPL (multiprogramming level)

MANY WAYS TO GENERATE REALISTIC WORKLOADS: OPEN SYSTEM MODEL
Trace driven
[Figure: new arrivals enter the server queue.]
1:01.12 ip1 GET a.gif HTTP/1.0
1:01.20 ip2 GET b.htm HTTP/1.0
1:01.25 ip1 GET c.jpg HTTP/1.0
1:01.27 ip1 GET d.txt HTTP/1.0
1:01.28 ip3 GET a.htm HTTP/1.0
1:01.35 ip4 GET d.gif HTTP/1.0
1:01.45 ip2 GET e.htm HTTP/1.0
: :
• Arrival times: the next arrival time is read from the trace.
• Service demands: file sizes are read from the trace.

MANY WAYS TO GENERATE REALISTIC WORKLOADS: OPEN SYSTEM MODEL
Distribution driven
[Figure: new arrivals enter the server queue.]
Use distributions of interarrival times and service demands (typically using trace info):
• Sample the next arrival from the interarrival time distribution.
• Sample the job size from the service demand distribution.
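A minimal sketch of a distribution-driven open workload. It assumes exponential interarrival times and exponential service demands purely for illustration; the slide leaves both distributions open, and in practice they would be fit to a trace. The names `open_workload` and `fcfs_mean_response` are hypothetical. Arrival times are drawn without looking at completions, which is what makes the model open.

```python
import random


def open_workload(n, mean_interarrival, mean_service, seed=0):
    """Return n jobs as (arrival_time, service_demand) pairs.

    Assumption: both distributions are exponential; swap in fitted ones as needed.
    """
    rng = random.Random(seed)
    now = 0.0
    jobs = []
    for _ in range(n):
        now += rng.expovariate(1.0 / mean_interarrival)  # next arrival time
        jobs.append((now, rng.expovariate(1.0 / mean_service)))
    return jobs


def fcfs_mean_response(jobs):
    """Mean response time of a single FCFS server fed with jobs in arrival order."""
    free_at = 0.0  # time at which the server finishes its current backlog
    total = 0.0
    for arrival, demand in jobs:
        start = max(arrival, free_at)
        free_at = start + demand
        total += free_at - arrival
    return total / len(jobs)
```

For example, `fcfs_mean_response(open_workload(100000, 2.0, 1.0))` estimates the mean response time at load 0.5, since the load is the mean service demand divided by the mean interarrival time.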

OPEN MODEL
• Arrivals are independent of completions.
• There is no max number of simultaneous users.
CLOSED MODEL
• Arrivals are completely dependent on departures.
• There is a fixed population of users, called the multiprogramming level (MPL).
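A minimal sketch of the closed model, assuming a single FCFS server with exponential think times and exponential service demands (an illustrative choice, not the talk's testbed). The function name `simulate_closed` is hypothetical. Each of the `mpl` users submits a new request only after its previous request has completed and a think time has passed, so arrivals depend entirely on departures.

```python
import heapq
import random


def simulate_closed(mpl, mean_service, mean_think, completions, seed=0):
    """Mean response time over `completions` requests in a closed FCFS system.

    Assumptions: exponential think and service times, mean_think > 0.
    """
    rng = random.Random(seed)
    # Every user starts by thinking; the heap holds (next submission time, user).
    pending = [(rng.expovariate(1.0 / mean_think), user) for user in range(mpl)]
    heapq.heapify(pending)
    free_at = 0.0
    total = 0.0
    for _ in range(completions):
        arrival, user = heapq.heappop(pending)  # earliest pending request
        start = max(arrival, free_at)
        free_at = start + rng.expovariate(1.0 / mean_service)
        total += free_at - arrival
        # The same user returns only after this completion plus a think time.
        next_submit = free_at + rng.expovariate(1.0 / mean_think)
        heapq.heappush(pending, (next_submit, user))
    return total / completions
```

At most `mpl` requests can ever be in the system, which is the bounded population the slide refers to.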

WEB WORKLOAD GENERATORS
Do you use an open or closed model?
[Figure: workload generators sorted under OPEN MODEL and CLOSED MODEL: Surge, SPECWeb, TPC-W, Sclient, RUBiS, WebBench, Webjamma.]
• Workload generators for the same purpose use different system models!
• It's often not clear which model workload generators use!

NEITHER THE OPEN NOR THE CLOSED MODEL IS COMPLETELY REALISTIC

PARTLY-OPEN SYSTEM
[Figure: PARTLY-OPEN MODEL. New arrivals enter the server queue. After each request (send, receive), the user returns to the system after a think time with probability q; otherwise the user leaves the system.]

OUR GOAL What is the impact of the choice of an open or closed model?

HOW DO WE COMPARE OPEN AND CLOSED SYSTEMS?
• Fix the service distribution across the systems.
• Fix the load across the systems.
OPEN: load depends only on mean arrival rate and mean service demands; adjust load using the arrival rate.
CLOSED: load depends on MPL, think times, mean of service demands, variability of service demands, …; adjust load using the think time.

How do open and closed response times compare?
FCFS scheduling
• Open: Poisson arrival process
• Closed: exponential think times

[Figure: mean response time (ticks at 10, 100, 1000) versus load (0 to 1) under FCFS scheduling; open with a Poisson arrival process, closed with exponential think times. Curves: Open, Closed (MPL=10).]
CLOSED << OPEN

[Figure: mean response time (ticks at 10, 100, 1000) versus load (0 to 1) under FCFS scheduling; open with a Poisson arrival process, closed with exponential think times. Curves: Open, Closed (MPL=1000), Closed (MPL=100), Closed (MPL=10).]
CLOSED → OPEN

CLOSED → OPEN AS MPL GROWS
OPEN MODEL VS CLOSED MODEL
As MPL grows, the arrival rate becomes independent of the completion rate.
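The convergence can be checked analytically for the FCFS case. The sketch below is not from the talk: it assumes exponential service demands, uses the standard M/M/1 formula E[T] = E[S] / (1 - load) for the open system, and uses Mean Value Analysis for a closed system made of one FCFS server plus a think stage. The load of the closed system is matched by bisecting on the think time, mirroring "adjust load using the think time". All function names are hypothetical.

```python
def closed_fcfs(mpl, mean_service, mean_think):
    """Mean Value Analysis for one exponential FCFS server plus a think stage.

    Returns (mean response time, server load) for a population of mpl >= 1 users.
    """
    queue = 0.0  # mean number of jobs at the server with one user fewer
    for n in range(1, mpl + 1):
        response = mean_service * (1.0 + queue)
        throughput = n / (response + mean_think)
        queue = throughput * response
    return response, throughput * mean_service


def think_time_for_load(mpl, mean_service, load, iterations=100):
    """Bisect on the mean think time until the closed system runs at `load` < 1."""
    # At think time 0 the server is always busy; at `high` the load is <= `load`.
    low, high = 0.0, mpl * mean_service / load
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if closed_fcfs(mpl, mean_service, mid)[1] > load:
            low = mid
        else:
            high = mid
    return high


def open_fcfs(mean_service, load):
    """M/M/1 mean response time."""
    return mean_service / (1.0 - load)


def compare(load=0.9, mean_service=1.0, mpls=(10, 100, 1000)):
    """Mean response time of the open system and of closed systems at equal load."""
    results = {"open": open_fcfs(mean_service, load)}
    for mpl in mpls:
        think = think_time_for_load(mpl, mean_service, load)
        results[mpl] = closed_fcfs(mpl, mean_service, think)[0]
    return results
```

Calling `compare()` returns one mean response time per MPL plus the open value; under these assumptions the closed values stay below the open one and move toward it as MPL grows.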

How quickly does Closed → Open?
[Figure: mean response time (ticks at 500, 1000, 1500) versus service demand variability (low to high). Curves: Open, Closed (MPL=1000), Closed (MPL=100), Closed (MPL=10). A "Web Workloads" label marks part of the variability axis.]

Three principles
1. For a given load, mean response times are significantly lower in closed systems than in open systems.
2. As the MPL grows, closed systems become open, but convergence is slow for practical purposes.
3. While variability has a large effect in open systems, the effect is much smaller in closed systems.

OUR GOAL
What is the impact of the choice of an open or closed model? It matters a lot!
• What is the impact on the effectiveness of scheduling?
• What is the impact in practice?

• FCFS (First-Come-First-Served): Jobs are processed in the same order as they arrive.
• PS (Processor-Sharing): The server is shared evenly among all jobs in the system.
• PESJF (Preemptive-Expected-Shortest-Job-First): The job with the smallest expected duration (size) is given preemptive priority.
• SRPT (Shortest-Remaining-Processing-Time-First): At every moment the request with the smallest remaining processing requirement is given priority.
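A minimal sketch contrasting two of these policies, FCFS and SRPT, on a single server. It assumes job sizes are known exactly when a job arrives, which SRPT requires; the function names are hypothetical. Jobs are `(arrival_time, size)` pairs.

```python
import heapq


def fcfs_mean_response(jobs):
    """Serve jobs to completion in arrival order."""
    free_at = 0.0
    total = 0.0
    for arrival, size in sorted(jobs):
        free_at = max(arrival, free_at) + size
        total += free_at - arrival
    return total / len(jobs)


def srpt_mean_response(jobs):
    """Always run the job with the smallest remaining size, preempting on arrivals."""
    jobs = sorted(jobs)
    ready = []  # heap of (remaining size, arrival time)
    now = 0.0
    nxt = 0  # index of the next job that has not arrived yet
    total = 0.0
    while nxt < len(jobs) or ready:
        if not ready:  # server idle: jump to the next arrival
            now = max(now, jobs[nxt][0])
        while nxt < len(jobs) and jobs[nxt][0] <= now:
            heapq.heappush(ready, (jobs[nxt][1], jobs[nxt][0]))
            nxt += 1
        remaining, arrival = heapq.heappop(ready)
        next_arrival = jobs[nxt][0] if nxt < len(jobs) else float("inf")
        if now + remaining <= next_arrival:
            now += remaining  # job finishes before anyone else arrives
            total += now - arrival
        else:
            remaining -= next_arrival - now  # run until the arrival, then re-decide
            now = next_arrival
            heapq.heappush(ready, (remaining, arrival))
    return total / len(jobs)
```

With `jobs = [(0.0, 10.0), (1.0, 1.0)]`, FCFS makes the short job wait behind the long one (mean response time 10.0), while SRPT preempts the long job (mean response time 6.0).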

SCHEDULING IS A KEY COMPONENT OF SYSTEM DESIGN
WEB SERVERS
• Standard design: Processor Sharing (PS)
• Improved design: Shortest Remaining Processing Time (SRPT)
Compare using a workload generator.
Does the effectiveness of scheduling depend on the system model (open vs. closed)?

SCHEDULING IN OPEN SYSTEMS
[Figure: OPEN. Mean response time (0 to 1000) versus load (0 to 1) for PLJF, FCFS, PS, and SRPT.]
How do the closed results compare?

CONTRASTING THE IMPACT OF SCHEDULING
[Figure: two panels, OPEN and CLOSED. Mean response time (0 to 1000) versus load (0 to 1) for PLJF, FCFS, PS, and SRPT.]
Why?
• Limited impact of variability in closed system
• Bounded number of jobs in closed system
• Dependencies between completions and arrivals in closed system reduce burstiness

Three principles
1. While open systems benefit significantly from scheduling with respect to response time, closed systems improve much less.
2. Scheduling only significantly improves response time in closed systems under very specific parameter settings: moderate load (think times) and high MPL.
3. Scheduling can limit the effect of variability in both open and closed systems.

OUR GOAL What is the impact of the choice of an open or closed model? It matters a lot! Especially when evaluating scheduling policies What is the impact in practice?

OPEN VS CLOSED IN PRACTICE: 4 CASE STUDIES
1. Serving static web content
2. Database backend of an e-commerce site
3. Auctioning web site
Methods: testbed implementation, trace-based simulation

OPEN VS CLOSED IN PRACTICE
[Figure: STATIC WEB SERVER. Mean response time (ticks at 100, 200, 300) versus load (0 to 1) for PS and SRPT, in two panels: OPEN and CLOSED (MPL=50).]
Different models give different conclusions about the benefits of SRPT.

[Figure: E-COMMERCE SITE. Mean response time (0 to 10) versus load for PS and PESJF, in two panels: OPEN and CLOSED (MPL=50).]
[Figure: AUCTION SITE. Mean response time (0 to 20) versus load (0 to 1) for PS and SRPT, in two panels: OPEN and CLOSED (MPL=50).]

OUR GOAL TODAY
What is the impact of the choice of an open or closed model? It matters a lot in practice! Especially when evaluating scheduling policies.
How can we identify whether to use an open or closed model?

A MORE REALISTIC ALTERNATIVE
[Figure: PARTLY-OPEN MODEL. New arrivals enter the server queue. After each request (send, receive), the user returns to the system after a think time with probability q; otherwise the user leaves the system.]
• What parameters affect the load?
• Does think time affect the load?
• How do think times affect response times?

FITTING A PARTLY-OPEN MODEL
Trace
12 ip1 GET a.gif HTTP/1.0
20 ip2 GET b.htm HTTP/1.0
25 ip1 GET c.jpg HTTP/1.0
27 ip1 GET d.txt HTTP/1.0
28 ip3 GET a.htm HTTP/1.0
35 ip4 GET d.gif HTTP/1.0
45 ip2 GET e.htm HTTP/1.0
: :
PARTLY-OPEN service demands: file sizes from trace

FITTING A PARTLY-OPEN MODEL
Trace
12 ip1 GET a.gif HTTP/1.0
20 ip2 GET b.htm HTTP/1.0
25 ip1 GET c.jpg HTTP/1.0
27 ip1 GET d.txt HTTP/1.0
28 ip3 GET a.htm HTTP/1.0
35 ip4 GET d.gif HTTP/1.0
45 ip2 GET e.htm HTTP/1.0
: :
Fitting the interarrival times
• Distinguish users, e.g. use the IP address in a web trace.
• Identify user session boundaries: use periods of inactivity of length > timeout.
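The two fitting steps can be sketched as follows, using only the seven trace lines visible on the slide. The timestamp unit is not given on the slide and is treated as an abstract time unit; the function names and the timeout values in the usage note are hypothetical.

```python
from collections import defaultdict

TRACE = """\
12 ip1 GET a.gif HTTP/1.0
20 ip2 GET b.htm HTTP/1.0
25 ip1 GET c.jpg HTTP/1.0
27 ip1 GET d.txt HTTP/1.0
28 ip3 GET a.htm HTTP/1.0
35 ip4 GET d.gif HTTP/1.0
45 ip2 GET e.htm HTTP/1.0
"""


def parse(trace_text):
    """Return (timestamp, user) pairs; the user is identified by the IP field."""
    records = []
    for line in trace_text.splitlines():
        timestamp, user = line.split()[:2]
        records.append((float(timestamp), user))
    return records


def split_sessions(records, timeout):
    """Start a new session whenever a user has been inactive for more than timeout."""
    per_user = defaultdict(list)  # user -> list of sessions (lists of timestamps)
    for timestamp, user in sorted(records):
        sessions = per_user[user]
        if sessions and timestamp - sessions[-1][-1] <= timeout:
            sessions[-1].append(timestamp)
        else:
            sessions.append([timestamp])
    return [session for sessions in per_user.values() for session in sessions]


def mean_requests_per_session(sessions):
    return sum(len(session) for session in sessions) / len(sessions)
```

On this excerpt, a timeout of 10 yields 6 sessions and a timeout of 30 yields 4 sessions with 1.75 requests per session on average, which illustrates why the timeout has to be chosen with care.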

CHOOSING A TIMEOUT VALUE
[Figure: number of sessions (0 to 2e5) versus timeout length (0 to 30 min) for the financial, world cup, and dept store traces.]

THE EFFECT OF THINK TIME
[Figure: STATIC WEB SERVER. Mean response time (0 to 300) versus mean think time (ticks at 1, 10, 100, 1000) for PS and SRPT.]

A MORE REALISTIC ALTERNATIVE
[Figure: PARTLY-OPEN MODEL. New arrivals enter the server queue. After each request (send, receive), the user returns to the system after a think time with probability q; otherwise the user leaves the system.]
• q → 0: OPEN (number of requests per visit ↓)
• q → 1: CLOSED (number of requests per visit ↑)
Workload generators are only Open/Closed!
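A minimal sketch of how q controls the number of requests per visit. It assumes, as the diagram suggests, that every completed request is followed by an independent decision to return with probability q; under that assumption the number of requests per visit is geometric with mean 1 / (1 - q). The function names are hypothetical.

```python
import random


def requests_per_visit(q, rng):
    """Sample how many requests one visit issues before the user leaves."""
    count = 1  # every visit issues at least one request
    while rng.random() < q:  # return to the system with probability q
        count += 1
    return count


def expected_requests_per_visit(q):
    """Mean of the geometric distribution above, valid for 0 <= q < 1."""
    return 1.0 / (1.0 - q)
```

For example, q = 0.8 gives 5 requests per visit on average, while q = 0 gives exactly one request per visit, which is the open model.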

THE TRANSITION FROM OPEN → CLOSED
[Figure: STATIC WEB SERVER. Mean response time (0 to 300) versus mean number of requests per visit (0 to 20) for PS and SRPT, moving from OPEN to CLOSED. Reference levels: PS open, PS closed.]

THE PARTLY-OPEN SYSTEM IN PRACTICE
[Figure: three panels of mean response time versus mean number of requests per visit (0 to 20): STATIC WEB (PS, SRPT; 0 to 200), E-COMMERCE SITE (PS, PESJF; 0 to 9), AUCTIONING (PS, SRPT; 0 to 15).]

THESE DIFFERENCES ARE IMPORTANT IN PRACTICE
[Figure: PS versus SRPT under the OPEN, PARTLY-OPEN, and CLOSED models.]

CHOOSING A SYSTEM MODEL
Web workloads
1. Large corporate web
2. CMU web server
3. Online department store
4. Science institute (USGS)
5. Online gaming site
6. Financial service provider
7. Supercomputing web site
8. Kasparov-DeepBlue match
9. Site seeing “slashdot effect”
10. Soccer world cup
Open or closed? Use a partly-open model to decide which is more accurate.

HOW TO CHOOSE A SYSTEM MODEL
1. Gather a trace.
2. How many simultaneous users are there?
   • >>1000: OPEN ≈ CLOSED
   • else: fit a partly-open model to the trace.
3. What is the expected number of visits?
   • <5: OPEN
   • 5-10: ???
   • >10: CLOSED
[Figure: mean number of visits (0 to 15) versus timeout length (0 to 30 min) for the world cup, dept store, and financial traces.]

CHOOSING A SYSTEM MODEL
• <5 expected visits: OPEN
• 5-10 expected visits: PARTLY OPEN
• >10 expected visits: CLOSED
Web workloads
1. Large corporate web
2. CMU web server
3. Online department store
4. Science institute (USGS)
5. Online gaming site
6. Financial service provider
7. Supercomputing web site
8. Kasparov-DeepBlue match
9. Site seeing “slashdot effect”
10. Soccer world cup

CHOOSING A SYSTEM MODEL
Web workloads
OPEN (<5 expected visits):
1. Large corporate web
2. CMU web server
4. Science institute (USGS)
6. Financial service provider
8. Kasparov-DeepBlue match
9. Site seeing “slashdot effect”
PARTLY OPEN (5-10 expected visits):
3. Online department store
7. Supercomputing web site
CLOSED (>10 expected visits):
5. Online gaming site
10. Soccer world cup
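The thresholds on this slide can be written as a small decision rule. This is a sketch, not the authors' tool: the function name `choose_model` is hypothetical, the input is the value the slide calls "expected visits" as estimated from a fitted partly-open model, and treating the boundary values 5 and 10 as partly-open is an assumption.

```python
def choose_model(expected_visits):
    """Map the slide's thresholds to a system model name."""
    if expected_visits < 5:
        return "open"
    if expected_visits > 10:
        return "closed"
    return "partly-open"  # 5 to 10, boundaries included by assumption
```

For example, `choose_model(3)` returns `"open"` and `choose_model(12)` returns `"closed"`.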

CONCLUSION
• Closed, open, and partly-open systems differ significantly in behavior; the principles above summarize these differences.
• These principles underscore the importance of choosing the appropriate system model.
• Our findings provide guidelines for choosing whether an open or closed model is the better approximation, based on characteristics of the workload.
• Understanding the appropriate system model is essential to understanding the impact of scheduling.
