CSC407: Software Architecture, Summer 2006: Performance Greg Wilson, BA 3230, gvwilson@cs.utoronto.ca
Introduction • Getting the right answer is important • Getting the right answer quickly is also important • If we didn’t care about speed, we’d do things by hand • Choosing the right algorithm is part of the battle • Choosing a good architecture is the other part • Only way to tell good from bad is to analyze and measure actual performance
Example: File Server • Dedicated server handing out PDF and ZIP files • One CPU • 4 disks: PDFs on #1 and #2, ZIPs on #3 and #4 • Have to know the question to get the right answer • How heavy a load can it handle? • Would it make more sense to spread all files across all disks?
We Call It Computer Science… • …because it’s experimental • Collect info on 1000 files downloaded in 200 sec
Summary Statistics • Analyze all 1000 downloads in a spreadsheet • Yes, computer scientists use spreadsheets… • We’re justified in treating each type of file as a single class
Modeling Requests • The concurrency level is the number of things of a particular class going on at once • Estimate by adding up total download time for PDF and ZIP files separately, and dividing by the actual elapsed time • NPDF = 731.5/200 = 3.7 • NZIP = 3207.7/200 = 16.0 • Round off: roughly 4 PDF and 16 ZIP downloads in progress at any moment, i.e., a 4:1 ZIP:PDF ratio
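The estimate above takes only a few lines of Python (the summed download times and the 200-second window are the slide's numbers):

```python
# Concurrency level per class: total download time accumulated by that
# class, divided by the elapsed observation time.
ELAPSED = 200.0                                       # seconds observed

total_download_time = {"pdf": 731.5, "zip": 3207.7}   # seconds, summed per class

concurrency = {cls: t / ELAPSED for cls, t in total_download_time.items()}
# roughly 4 PDF and 16 ZIP downloads in progress at any moment
```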
Measuring Service Demands • What load does each request put on the disk and CPU? • Create N files of various sizes: 10KB, 100KB, 200KB, …, 1GB • Put them on a single-CPU, single-disk machine • That's doing nothing else • Measure download times, then fit straight lines of time against file size σ • TCPU = 0.1046σ − 0.0604 • Hm… a negative intercept, so the fit can't be trusted for very small files • Tdisk = 0.4078σ + 0.2919
Back To The Data • Use Mean Value Analysis to calculate service demands • Remember to divide disk requirements by 2
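Exact single-class Mean Value Analysis is short enough to sketch in full; the service demands below are placeholders, not the measured values from the slides:

```python
def mva(demands, customers, think_time=0.0):
    """Exact single-class Mean Value Analysis for a closed network of
    load-independent queueing resources; demands[i] = service demand D_i."""
    queue = [0.0] * len(demands)       # mean queue length at each resource
    throughput = response = 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving customer sees the queues left by n-1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)                     # total response time
        throughput = n / (response + think_time)      # system throughput
        queue = [throughput * r for r in residence]   # Little's Law per resource
    return throughput, response

# Placeholder demands: CPU 0.05 s, disk 0.2 s; 20 concurrent customers.
x, r = mva([0.05, 0.2], 20)   # x approaches the bottleneck bound 1/0.2 = 5/s
```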
Observations • After ~20 users, the server saturates • Maximum throughput for PDF files: • 12 files/sec in original configuration • 5 files/sec in balanced configuration • Maximum throughput for ZIP files: • 4.2 files/sec in original configuration • 6.6 files/sec in balanced configuration
Service Level Agreements • SLA requires average download times of 20 sec (ZIP files) and 7 sec (PDF files) • Original configuration: ZIP threshold reached at approximately 100 users, when PDF download time still only ~3 sec • Balanced configuration: ZIP threshold reached at ~165 users, and PDF download time is 6.5 sec • Balanced configuration is strictly superior
How Did We Do That? • Key concern is quality of service (QoS) • Throughput: transactions/second, pages/second, etc. • Response time • And variation in response time • People would rather wait 10 minutes every day than 1 minute on nine days and 20 minutes on the tenth • Availability • 99.99% available = 4.5 minutes lost every 30 days • That's not good enough for 911
A Simple Database Server • Circles show resources • Boxes show queues • Throughput and response times depend on: • Service demand: how much time do requests need from resources? • System load: how many requests are arriving per second?
[Diagram: queues feeding the CPU and disk resources]
Classes of Model • An open class is specified by the rate at which requests arrive • Throughput is an input parameter • A closed class is specified by the size of the customer population • E.g., total number of queries to be processed, or total number of system users • Throughput is an output • Can also have load-dependent and load-independent resources, mixed models, etc.
Values We Can Measure • T: length of observation period • K: number of resources in the system • Bi: total busy time of resource i in observation period • Ai: number of request arrivals for resource i • A0 is total number of request arrivals for whole system • Ci: number of service completions for resource i • C0 is completions for whole system • In steady state for large T, Ai = Ci
Values We Can Calculate • Si: mean service time at resource i (Bi/Ci) • Ui: utilization of resource i (Bi/T) • Xi: throughput of resource i (Ci/T) • In steady state, Xi = Ai = Ci = λ • Vi: average visit count for resource i (Ci/C0)
Utilization Law • Utilization Ui = Bi/T = (Bi/Ci)/(T/Ci) • But Bi/Ci is Si, and T/Ci is just 1/λ • So Ui = λSi • I.e., utilization is the throughput times the service time, which makes sense
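These operational quantities, and the Utilization Law that follows from them, are easy to check numerically (the measurements below are made up):

```python
# Made-up measurements for one resource over one observation period:
T = 200.0    # observation period, seconds
B = 120.0    # busy time of the resource, seconds
C = 600      # completions at the resource

S = B / C    # mean service time per visit: 0.2 s
U = B / T    # utilization: 0.6
X = C / T    # throughput (= arrival rate in steady state): 3.0/s

# Utilization Law: U = X * S
```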
Service Demand Law • Service demand Di is the total average time required per request from resource i • Di = UiT/C0 • I.e., fraction of time busy, times total time, over number of requests • But UiT/C0 = Ui/(C0/T) = Ui/λ • I.e., service demand is utilization over throughput • Ui/X0 = (Bi/T)/(C0/T) = Bi/C0 = ViSi • So service demand is average number of visits times mean service time per visit
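Both forms of the Service Demand Law give the same answer, as a quick numeric check with made-up measurements:

```python
# The system completed C0 requests while resource i completed Ci visits
# and was busy B seconds out of the T-second observation period.
T, B = 200.0, 120.0
C0, Ci = 300, 600

U = B / T       # resource utilization: 0.6
X0 = C0 / T     # system throughput: 1.5 requests/second
V = Ci / C0     # visits per request: 2.0
S = B / Ci      # service time per visit: 0.2 s

D_from_util = U / X0      # D_i = U_i / X_0
D_from_visits = V * S     # D_i = V_i * S_i  (same answer: 0.4 s)
```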
Little’s Law • Average number of requests being processed at any time = throughput × average time each request stays in the system • So: • 0.5 requests per second (= throughput) • 10 second response time (= time each request stays in system) • There must be 5 requests in the system on average, so at one request per server, 5 servers
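The arithmetic, as a one-line check of Little's Law N = X·R:

```python
X = 0.5      # throughput: requests per second
R = 10.0     # average time each request spends in the system, seconds

N = X * R    # average number of requests in the system: 5.0
```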
Interactive Response Time Law • S clients accessing a database • Each client thinks for Z seconds between requests • Average database response time is R seconds • If M is the average number of clients thinking, and N is the average number of requests at the database, then S = M+N • Little's Law applied to clients: M = λZ • Little's Law applied to database: N = λR • So M+N = S = λ(Z+R) • Or R = S/λ − Z
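A worked example of R = S/λ − Z, with hypothetical numbers:

```python
S = 40        # number of clients
Z = 2.0       # think time, seconds
lam = 5.0     # measured throughput, requests/second

R = S / lam - Z   # average response time: 6.0 s
```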
The Weakest Link • X0 = Ui/Di ≤ 1/Di for all resources • So X0 ≤ 1/max{Di} • Remember Little's Law: N = R·X0 • I.e., number of concurrent transactions is response time × throughput • But R is at least the sum of the service demands • So N ≥ (ΣDi)·X0 • Or X0 ≤ N/(ΣDi) • So X0 ≤ min[1/max{Di}, N/(ΣDi)]
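The combined bound X0 ≤ min[1/max{Di}, N/(ΣDi)] can be evaluated directly; the demands below are hypothetical:

```python
def throughput_bound(demands, n):
    # X0 <= 1/max(D) (bottleneck) and X0 <= n/sum(D) (light load)
    return min(1.0 / max(demands), n / sum(demands))

demands = [0.05, 0.2, 0.1]   # per-resource service demands, seconds

light = throughput_bound(demands, 1)    # limited by n/sum(D): about 2.86/s
heavy = throughput_bound(demands, 50)   # limited by the 0.2 s bottleneck: 5.0/s
```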
Amdahl's Law • Let: • t1 be a program's runtime on one CPU • tp be its runtime on p CPUs • ß be the fraction of the program that can run in parallel • Then the speedup is sp = t1/tp = 1/((1−ß) + ß/p)
…Amdahl's Law • Example: • Want 32× speedup on a 64-processor machine • So ß must be 0.984 • I.e., 98% of the code must run in parallel • Ouch • What if only half the code can run in parallel? • s64 is only 1.97 • Ouch again
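Both numbers on this slide follow from Amdahl's formula with ß taken as the parallel fraction (which is what makes the 0.984 work out); a sketch:

```python
def speedup(beta, p):
    # Amdahl's Law with beta = parallel fraction
    return 1.0 / ((1.0 - beta) + beta / p)

def beta_needed(target, p):
    # Invert speedup(beta, p) = target for beta
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / p)

b = beta_needed(32, 64)   # about 0.984: 98% of the code must be parallel
s = speedup(0.5, 64)      # about 1.97: half-parallel code barely doubles
```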
Hockney's Measures • Every pipeline has some startup latency • So characterize pipelines with two measures: • r∞ is the rate on an infinitely long data stream • n1/2 is the data volume at which half that rate is achieved • Improve real-world performance by: • Increasing throughput • Decreasing latency
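Hockney's model says the time to process n items is (n + n1/2)/r∞, so the achieved rate is r∞·n/(n + n1/2); a sketch with hypothetical pipeline parameters:

```python
def rate(n, r_inf, n_half):
    # t(n) = (n + n_half) / r_inf, so achieved rate = n / t(n)
    return r_inf * n / (n + n_half)

r_inf, n_half = 100.0, 50.0               # hypothetical pipeline parameters

at_half = rate(50.0, r_inf, n_half)       # exactly r_inf/2 when n == n_half
long_run = rate(50_000.0, r_inf, n_half)  # approaches r_inf on long streams
```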
Some Quotations • Philosophers have only interpreted the world in various ways; the point, however, is to change it. • Karl Marx • You cannot manage what you do not measure. • Bill Hewlett • Measure twice, tune once. • Greg Wilson
A Simple CGI
[Timing diagram: a request flows browser → /var/apache/httpd → /local/bin/python → /site/cgi-bin/app.cgi → /usr/bin/psql → disk I/O; each stage is annotated with its measured time in seconds (5.1, 5.3, 3.3, 2.7, 1.8, 0.7, 0.3, 0.2)]
How Did I Get These Numbers? • Shut down everything else on the test machine • Use ps and truss on Unix • sysinternals.org has lots of tools to help you find things • Use a script instead of a browser • Insert timers in Python and recompile • Could wrap in a timing script, but that distorts things • Measure import times in my own script • Rely on PostgreSQL's built-in monitors • Use a profiler
Profiling • A profiler is a tool that can build a histogram showing how much time a program spent where • Can either instrument or sample the program • Both affect the program's performance • The more information you collect, the more distortion there is • Heisenberg's Law • Most can accumulate data over many program runs • Often want to distinguish the first run(s) from later ones • Caching, precompilation, etc.
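As a concrete example, Python's standard-library instrumenting profiler cProfile can be driven programmatically; the workload function here is a stand-in:

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in for the code being profiled.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()    # instrumenting (deterministic) profiler
profiler.enable()
busy()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()          # top five entries by cumulative time
```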
A Simple CGI Revisited
[The same timing diagram, annotated with observations: we can't do much about the browser/httpd time (0.2/5.1); starting /local/bin/python (1.8) is expensive because of fork/exec; imports account for 0.6; app.cgi spends 0.9 on something unexplained ("what's going on here?"); time at /usr/bin/psql is spent waiting our turn at the DB; open questions: how many transactions are there, and are they one class?]
Room for Improvement • Forking a new Python interpreter for each request is expensive • So keep an instance of Python running permanently beside the web server, and re-initialize it for each request • FCGI/SCGI • Tomcat is usually run this way • The ability to do this is one of the reasons VM-based languages won the server wars
…Room for Improvement • Reimporting the libraries is expensive, too • Rely on cached .pyc files • Or rewrite application around a request-handling loop • Modularity is your friend • Tightly-coupled components cannot be tuned independently • On the other hand, machine-independent code has machine-independent performance
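The request-handling-loop idea, as a minimal sketch (make_handler and serve are hypothetical names; a real server would speak FCGI/SCGI or run inside a persistent interpreter):

```python
def make_handler():
    # One-time startup cost (imports, DB connections, caches) paid here.
    state = {"served": 0}
    def handle(request):
        # Per-request work only: no fork/exec, no re-import.
        state["served"] += 1
        return f"{request}:{state['served']}"
    return handle

def serve(requests):
    handler = make_handler()               # initialize exactly once
    return [handler(r) for r in requests]  # the request-handling loop
```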
After Our Changes
[Updated timing diagram: total time in /var/apache/httpd is now 2.8 (was 5.3); the remaining stage times in seconds are 0.2, 2.6, 0.1, 2.5, 0.6, 1.9, 0.1, 1.8, 0.7, 0.2, 0.3; the 1.9 inside /site/cgi-bin/app.cgi "has to be the next target"]
When Do You Stop? • An optimization problem on its own • Time invested vs. likely performance improvements • Plan A: stop when you satisfy SLAs • Or beat them—always nice to have some slack • Plan B: stop when there are no obvious targets • Flat performance profiles are hard to improve • Plan C: stop when you run out of time • Plan D: stop when performance is "good enough"
Five Timescales • Human activities fall into natural cognitive categories: • Continuous • Sip of coffee • Fresh pot • Buy some more beans • Harvest time • Tuning a well-written application usually just improves its performance within its category • Revolutions happen when things are moved from one category to another