Advanced Systems Lab G. Alonso, D. Kossmann Systems Group http://www.systems.ethz.ch
Reading • Read Chapters 4, 5, and 6 in the textbook
Understanding Performance • Response Time • critical path analysis in a task dependency graph • "partition" expensive tasks into smaller tasks • Throughput • queueing network model analysis • "replicate" resources at the bottleneck
Response Times • [Chart: response time (msecs) vs. #servers]
Why are response times long? • Because operations take long • cannot travel faster than light • delays even in "single-user" mode • possibly, "parallelize" long-running operations • "intra-request parallelism" • Because there is a bottleneck • contention of concurrent requests on a resource • requests wait in a queue before the resource is available • add resources to parallelize requests at the bottleneck • "inter-request parallelism"
Critical Path • Directed graph of tasks and dependencies • response time = max { length of path } • assumptions: no resource contention, no pipelining, ... • Which tasks would you try to optimize here? • [Diagram: task graph with nodes Start, A (3 ms), B (1 ms), C (9 ms), End]
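The rule above (response time = max path length) can be checked with a small script. A minimal sketch in Python, not part of the slides: the 3 ms / 1 ms / 9 ms durations come from the figure, but the edge structure (A and B both feeding into C) is an assumption made only for illustration.

```python
# Minimal sketch (not from the slides): critical-path length of a task DAG
# computed in topological order. Task names, durations, and edges are
# illustrative; the edges are an assumption.

from collections import defaultdict

def critical_path_length(durations, edges):
    """durations: {task: ms}, edges: list of (predecessor, successor) pairs."""
    succs = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    finish = {}  # earliest finish time of each task, assuming unlimited resources
    ready = [t for t in durations if indeg[t] == 0]
    while ready:
        t = ready.pop()
        start = max((finish[p] for p, s in edges if s == t), default=0)
        finish[t] = start + durations[t]
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return max(finish.values())  # response time = length of the longest path

# Hypothetical graph: A and B can run in parallel, C depends on both.
print(critical_path_length({"A": 3, "B": 1, "C": 9},
                           [("A", "C"), ("B", "C")]))  # -> 12
```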
Queueing Network Models • Graph of resources and flow of requests • Bottleneck = the resource that defines the throughput of the whole system • (analysis techniques described later in the course) • [Diagram: request flow from Start through Server A (3 req/ms), Server B (5 req/ms), Server C (20 req/ms) to End]
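Under the (assumed) simplification that every request visits each server in the figure exactly once, the bottleneck is simply the slowest resource, and it caps the throughput of the whole system. A minimal sketch:

```python
# Minimal sketch (assumption: the servers form a serial chain and every
# request visits each server once). The slowest resource is the bottleneck
# and bounds the throughput of the whole system.

service_rates = {"Server A": 3, "Server B": 5, "Server C": 20}  # req/ms

bottleneck = min(service_rates, key=service_rates.get)
system_tput = service_rates[bottleneck]

print(f"bottleneck: {bottleneck}, max throughput ~ {system_tput} req/ms")
# -> bottleneck: Server A, max throughput ~ 3 req/ms
```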
Forms of Parallelism • Inter-request parallelism • several requests handled at the same time • principle: replicate resources • e.g., ATMs • (Independent) Intra-request parallelism • principle: divide & conquer • e.g., print pieces of a document on several printers • Pipelining • each "item" is processed by several resources • process "items" at different resources in parallel • can lead to both inter- & intra-request parallelism
Inter-request Parallelism • [Diagram: several requests served concurrently by replicated resources]
Intra-request Parallelism • [Diagram: Req 1 is split into Req 1.1-1.3, processed in parallel, and the partial results Res 1.1-1.3 are merged into Response 1]
Pipelining (Intra-request) • [Diagram: Req 1 is split, the pieces flow through a pipeline of resources, and the results are merged into Response 1]
Speed-up • Metric for intra-request parallelization • Goal: test the ability of the SUT to reduce response time • measure response time with 1 resource • measure response time with N resources • SpeedUp(N) = RT(1) / RT(N) • Ideal • SpeedUp(N) is a linear function • can you imagine super-linear speed-ups?
Speed Up • [Chart: speed-up vs. #servers]
Scale-up • Test how the SUT scales with the size of the problem • measure response time with 1 server, unit problem • measure response time with N servers, N units problem • ScaleUp(N) = RT(1) / RT(N) • Ideal • ScaleUp(N) is a constant function • Can you imagine super scale-up?
Scale Up Experiment: Response Time • [Chart: response time (msecs) vs. #servers]
Scale Out • Test how the SUT behaves with increasing load • measure throughput: 1 server, 1 user • measure throughput: N servers, N users • ScaleOut(N) = Tput(1) / Tput(N) • Ideal • scale-out should behave like scale-up • (often the terms are used interchangeably, but it is worthwhile to notice the differences) • Scale-out and scale-down in cloud computing • the ability of a system to adapt to changes in load • often measured in $ (or at least involving cost)
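The three definitions can be turned into a few helper functions. A minimal sketch, not part of the slides; the measured response times and throughputs are invented numbers used only to illustrate the formulas:

```python
# Minimal sketch of the three scalability metrics defined on the previous
# slides. All measurements below are made-up numbers.

def speed_up(rt_1, rt_n):
    """Same problem, 1 vs. N resources: SpeedUp(N) = RT(1) / RT(N)."""
    return rt_1 / rt_n

def scale_up(rt_1_unit, rt_n_units):
    """Problem grows with the servers: ScaleUp(N) = RT(1) / RT(N)."""
    return rt_1_unit / rt_n_units

def scale_out(tput_1, tput_n):
    """1 server/1 user vs. N servers/N users: ScaleOut(N) = Tput(1) / Tput(N)."""
    return tput_1 / tput_n

print(speed_up(rt_1=8.0, rt_n=2.5))             # ideal with N=4 would be 4.0
print(scale_up(rt_1_unit=8.0, rt_n_units=9.0))  # ideal would stay at 1.0
print(scale_out(tput_1=120.0, tput_n=430.0))    # per the slide's definition
```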
Why is speed-up sub-linear? • [Diagram: the split / parallel processing / merge picture from the intra-request parallelism slide]
Why is speed-up sub-linear? • Cost of the "split" and "merge" operations • those can be expensive operations • try to parallelize them, too • Interference: servers need to synchronize • e.g., CPUs access data from the same disk at the same time • shared-nothing architecture • Skew: work not "split" into equal-sized chunks • e.g., some pieces are much bigger than others • keep statistics and plan better
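A toy model (not from the slides) makes the first point concrete: if "split" and "merge" stay serial, the speed-up remains below N no matter how many workers are added. All numbers are hypothetical:

```python
# Toy model: response time with N workers when split and merge are serial
# overheads and the parallel part is perfectly balanced (no skew, no
# interference). All numbers are made up.

def rt_parallel(work_ms, n, split_ms=2.0, merge_ms=2.0):
    return split_ms + work_ms / n + merge_ms

work = 100.0                                              # ms of parallelizable work
rt1 = rt_parallel(work, 1, split_ms=0.0, merge_ms=0.0)    # single-resource baseline
for n in (2, 4, 8, 16):
    print(n, round(rt1 / rt_parallel(work, n), 2))        # speed-up stays below n
```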
Summary • Improve response times by "partitioning" • divide & conquer approach • works well in many systems • Improve throughput by relaxing the "bottleneck" • add resources at the bottleneck • Fundamental limitations to scalability • resource contention (e.g., lock conflicts in a DB) • skew and poor load balancing • Special kinds of experiments for scalability • speed-up and scale-up experiments
Metrics and Workloads • Defining more terms • Workload • Parameters • ... • Example benchmarks • TPC-H, etc. • Learn more metrics and traps
Ingredients of an Experiment (rev.) • System(s) Under Test • the (real) systems we would like to explore • Workload(s) = user model • typical behavior of users / clients of the system • Parameters • the "it depends" part of the answer to a performance question • system parameters vs. workload parameters • Test database(s) • for database workloads • Metrics • defining what "better" means: speed, cost, availability, ...
System under Test • Characterized by its API (services) • set of functions with parameters and result types • Characterized by a set of parameters • hardware characteristics • e.g., network bandwidth, number of cores, ... • software characteristics • e.g., consistency level for a database system • Observable outcomes • dropped requests, latency, system utilization, ... • (results of requests / API calls)
Workload • A sequence of requests (i.e., API/service calls) • including parameter settings of the calls • possibly, correlation between requests (e.g., sessions) • possibly, requests from different geographic locations • Workload generators • simulate a client which issues a sequence of requests • specify a "think time" or arrival rate of requests • specify a distribution for the parameter settings of requests • Open vs. closed system • number of "active" requests is constant or bounded • closed system = fixed #clients, each client has 0 or 1 pending requests • Warning: people often model a closed system without knowing it!
Closed system • Load comes from a limited set of clients • Clients wait for response before sending next request • Load is self-adjusting • System tends to stability • Example: database with local clients
Open system • Load comes from a potentially unlimited set of clients • Load is not limited by clients waiting • Load is not self-adjusting (load keeps coming even if SUT stops) • Tests system’s stability • Example: web server
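A minimal sketch of the two generator models, assuming exponentially distributed think times and Poisson arrivals; all rates and durations are made up. It shows why the offered load of a closed system adapts to the SUT's service time, while an open system keeps the load coming regardless:

```python
# Minimal sketch of closed vs. open workload generation in simulated time.
# Think times, arrival rate, and service time are hypothetical; queueing
# inside the SUT is ignored.

import random

def closed_system(n_clients=5, think_time=1.0, service_time=0.2, duration=60.0):
    """Each client has at most one pending request: send, wait, think, repeat."""
    completed = 0
    for _ in range(n_clients):
        t = 0.0
        while t < duration:
            t += service_time                          # wait for the response
            completed += 1
            t += random.expovariate(1.0 / think_time)  # think before the next request
    return completed / duration  # offered load adapts to the service time

def open_system(arrival_rate=10.0, duration=60.0):
    """Requests keep arriving regardless of how many are still pending."""
    t, arrivals = 0.0, 0
    while t < duration:
        t += random.expovariate(arrival_rate)          # Poisson arrivals
        arrivals += 1
    return arrivals / duration  # offered load is independent of the SUT

print("closed system, offered load:", round(closed_system(), 2), "req/s")
print("open system,   offered load:", round(open_system(), 2), "req/s")
```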
Parameters • Many system and workload parameters • e.g., size of the cache, locality of requests, ... • Challenge is to find the ones that matter • understanding the system + common sense • compute the standard deviation of the metric(s) when varying a parameter • if low, the parameter is not significant • if high, the parameter is significant • important are parameters which generate "cross-over points" between System A and System B when varied • Careful about correlations: vary combinations of parameters
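The standard-deviation heuristic can be sketched in a few lines; the measurements below are hypothetical and the parameter names are placeholders:

```python
# Minimal sketch of the significance heuristic above: vary one parameter,
# measure the metric at each setting, and look at the standard deviation.
# All measurements are made-up numbers.

from statistics import pstdev

# throughput (req/s) at different cache sizes (MB), hypothetical
tput_by_cache_size = {64: 410, 128: 690, 256: 930, 512: 1010}
# throughput at different log-buffer sizes (MB), hypothetical
tput_by_log_buffer = {4: 880, 8: 900, 16: 890, 32: 895}

print(pstdev(tput_by_cache_size.values()))  # large -> cache size matters
print(pstdev(tput_by_log_buffer.values()))  # small -> probably insignificant
```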
Test Database • Many systems involve "state" • behavior depends on the state of the database • e.g., long response times for big databases • Database is a "workload parameter" • but a very complex one • and with complex implications • Critical decisions • distribution of values in the database • size of the database (performance when generating the DB) • Ref.: J. Gray et al., SIGMOD 1994
Popular Distributions • Uniform • choose a range of values • each value of the range is chosen with the same probability • Zipf (self-similarity) • frequency of a value is inversely proportional to its rank • F(V[1]) ~ 2 x F(V[2]) ~ 4 x F(V[4]) ... • skew can be controlled by a parameter z • default: z = 1; uniform: z = 0; high z corresponds to high skew • Independent vs. correlated • in reality, the values of 2 (or more) dimensions are correlated • e.g., people who are good at math are good at physics • e.g., a car which is good in speed is bad in price
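A Zipf generator with skew parameter z is easy to sketch in pure Python (no external library assumed); z = 0 gives the uniform case, larger z gives more skew:

```python
# Minimal sketch of a Zipf-distributed value generator with skew parameter z,
# as used for skewed test databases. P(rank r) ~ 1 / r^z.

import random

def zipf_generator(n_values, z=1.0):
    """Return a function that draws ranks 1..n_values with P(r) ~ 1 / r^z."""
    weights = [1.0 / (rank ** z) for rank in range(1, n_values + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    def draw():
        u = random.random()
        # linear scan is fine for a sketch; use bisect for large n_values
        for rank, c in enumerate(cdf, start=1):
            if u <= c:
                return rank
        return n_values
    return draw

draw = zipf_generator(n_values=1000, z=1.0)
sample = [draw() for _ in range(100_000)]
print(sample.count(1) / sample.count(2))  # roughly 2, as on the slide
```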
Multi-dimensional Distributions • [Figure: independent, correlated, and anti-correlated two-dimensional value distributions] • Ref.: Börzsönyi et al., "The Skyline Operator", ICDE 2001
Metrics • Performance; e.g., • throughput (successful requests per second) • bandwidth (bits per second) • latency / response time • Cost; e.g., • cost per request • investment • fixed cost • Availability; e.g., • yearly downtime of a single client vs. the whole system • % dropped requests (or packets)
Metrics • How to aggregate millions of measurements • classic: median + standard deviation • why is the median better than the average? • why is the standard deviation so important? • Percentiles (quantiles) • V = Xth percentile if X% of the measurements are < V • max ~ 100th percentile; min ~ 0th percentile • median ~ 50th percentile • percentiles are a good fit for service level agreements • Mode: most frequent (probable) value • when is the mode the best metric? (give an example)
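A minimal sketch of these aggregates on a hypothetical latency sample; note how a single outlier distorts the mean far more than the median, while a high percentile still exposes the tail:

```python
# Minimal sketch: aggregating latency measurements with the metrics above.
# The latencies are made-up numbers with one stray outlier.

from statistics import mean, median

def percentile(values, p):
    """Value below which roughly p% of the measurements fall (nearest rank)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(round(p / 100.0 * len(ordered))) - 1))
    return ordered[k]

latencies_ms = [12, 11, 13, 12, 14, 11, 12, 13, 12, 950]

print("mean:  ", mean(latencies_ms))            # pulled up by the outlier
print("median:", median(latencies_ms))          # robust against the outlier
print("p95:   ", percentile(latencies_ms, 95))  # exposes the tail the median hides
```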
Amazon Example (~2004) • Amazon lost about 1% of shopping baskets • acceptable because the incremental cost of IT infrastructure to secure all shopping baskets was much higher than 1% of the revenue • One day, somebody discovered that they lost the *largest* 1% of the shopping baskets • not okay because those are the premium customers and they never come back • results in much more than 1% of the revenue • Be careful with correlations within results!
Where does all this come from? • Real workloads • use traces from an existing (production) system • use real databases from the production system • Synthetic workloads • use a standard benchmark • invent something yourself • Tradeoffs • a real workload is always relevant • a synthetic workload is good to study "corner cases" • makes it possible to vary all "workload parameters" • If possible, use both!
Benchmarks • Specify the whole experiment except the SUT • sometimes specify settings of "system parameters" • e.g., configure the DBMS to run at isolation level 3 • Designed for "is System A better than B" questions • report only one or two numbers as metrics • use a complex formula to compute these numbers • zero or one workload parameters only • standardization and notaries to publish results • Misused by research and industry • implement only a subset • invent new metrics and workload parameters • violation of "system parameter" settings and fine print
Benchmarks: Good, Bad, and Ugly • Good • help define a field: give engineers a goal • great for marketing and sales people • even if misused, a great tool for research and teaching • Bad • benchmark wars are not productive • misleading results: huge damage if irrelevant • Ugly • expensive to be compliant (legal fine print) • irreproducible results due to complex configurations • vendors have complex license agreements (DeWitt clause) • a single-number result favors the "elephants" • difficult to demonstrate advantages in the "niche"
Benchmarks • Conjecture: "Benchmarks are a series of tests in order to obtain prearranged results not available on competitive systems." (S. Kelly-Bootle) • Corollary: "I only trust statistics that I have invented myself." (folklore)
Example Benchmarks • CPU • e.g., "g++", Ackermann, SPECint • Databases (www.tpc.org) • e.g., TPC-C, TPC-E, TPC-H, TPC-W, ... • Parallel systems • e.g., NAS Parallel Benchmark, Splash-2 • Other • e.g., CloudStone, Linear Road • Microbenchmarks • e.g., LMBench
SPECint • Goal: study the CPU speed of different hardware • SPEC = Standard Performance Evaluation Corporation • www.spec.org • Long history of CPU benchmarks • first version: CPU92 • current version: SPECint2006 • SPECint2006 involves 12 tests (all in C/C++) • perlbench, gcc, bzip2, ..., xalancbmk • Metrics • compare the running time to a "reference machine" • e.g., 2000 secs vs. 8000 secs for gcc gives a score of 4 • overall score = geometric mean of all 12 scores
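The scoring rule can be sketched as follows; the reference and measured times below are invented and cover only three of the 12 tests, purely to illustrate the ratio-plus-geometric-mean computation:

```python
# Minimal sketch of the SPECint scoring rule described above: each test's
# score is reference_time / measured_time, and the overall score is the
# geometric mean of the per-test scores. The timings are made-up numbers.

from math import prod

def spec_score(reference_secs, measured_secs):
    ratios = [ref / meas for ref, meas in zip(reference_secs, measured_secs)]
    return prod(ratios) ** (1.0 / len(ratios))  # geometric mean

# hypothetical reference vs. measured times for three of the 12 tests
reference = [8000.0, 6000.0, 9000.0]
measured = [2000.0, 3000.0, 1500.0]
print(round(spec_score(reference, measured), 2))
# first test mirrors the slide's gcc example: 8000 / 2000 -> per-test score of 4
```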
SPECint Results • Visit: http://www.spec.org/cpu2006/results/cint2006.html
TPC-H Benchmark • Goal: evaluate DBMS + hardware for OLAP • find the "fastest" system for a given DB size • find the best "speed / $" system for a given size • see to which DB sizes the systems scale • TPC-H models a company • orders, lineitems, customers, products, regions, ... • TPC-H specifies the following components • dbgen: DB generator with different scaling factors • the scaling factor of the DB is the only workload parameter • mix of 22 queries and 2 update functions • execution instructions and metrics
TPC-H Fine Print • Physical design • e.g., you must not vertically partition the DB • (many results violate that, i.e., all column stores) • Execution rules • specify exactly how to execute queries and updates • specify exactly which SQL variants are allowed • Results • specifies exactly how to compute metrics and how to publish results • The specification is about 150 pages long (!)
TPC-H Results • Visit: http://www.tpc.org/tpch/results/tpch_price_perf_results.asp
Microbenchmarks • Goal: "understand the full behavior of a system" • not good for "System A vs. System B" decisions • good for component tests and unit tests • Design principles • many small and simple experiments, many workload parameters • report all results (rather than one big number) • each experiment tests a different feature (service) • e.g., table scan, index scan, join for a DB • e.g., specific function calls, representative parameter settings • isolate this feature as much as possible • design requires knowledge of the internals of the SUT • designed for a specific study, benchmark not reusable
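A minimal microbenchmark harness along these lines (the "features" below are trivial stand-ins, not real database operations):

```python
# Minimal sketch of a microbenchmark harness in the spirit of the design
# principles above: one small experiment per feature, each repeated many
# times, all results reported (median and tail, not one big number).

import time
from statistics import median

def run_microbenchmark(name, operation, repetitions=1_000):
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1e6)  # microseconds
    samples.sort()
    p95 = samples[int(0.95 * len(samples)) - 1]
    print(f"{name:>8}: median={median(samples):8.2f} us  p95={p95:8.2f} us")

data = list(range(100_000))
run_microbenchmark("scan", lambda: sum(data))             # stand-in for a table scan
run_microbenchmark("lookup", lambda: data[50_000])        # stand-in for an index lookup
run_microbenchmark("sort", lambda: sorted(data[:1_000]))  # stand-in for a sort operator
```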
How to improve performance? • Find the bottleneck • throw additional resources at the bottleneck • find the new bottleneck • throw additional resources at the bottleneck • ...