Evaluating Performance COSC 201
Administrivia • I’ve graded through lab 4 • will email your grades when I finish lab 5 • homework due tomorrow • I’ll post the HW answers once they’re turned in • Lab on Tuesday: Datapath • ALU and other Logisim examples from class are on the webpage • Appendix B is on the web page
Some questions: • How long does it take Google to look up “Tostitos”? • How long does it take to factor a large number? • How long does it take to sort an array?
Hard to isolate things • For all of these questions, we’re measuring other things • network • memory • hard disk • compiler • algorithms
We’re measuring time • suppose a task takes 30 seconds • 29 seconds to download a file on a 100 Mbps network, and 1 second to process it • now we go to a 1 Gbps network connection • about a 4 second task • 2.9 seconds to download, 1 second to process
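The arithmetic on this slide can be checked with a short sketch. The function name `task_time` and the file size are assumptions: the size is simply what a 29 second download at 100 Mbps implies.

```python
def task_time(file_megabits, link_mbps, process_seconds):
    """Total time for the task: download time plus a fixed processing time."""
    return file_megabits / link_mbps + process_seconds

# Assumption: a 29 s download at 100 Mbps implies a 2900 megabit file.
FILE_MEGABITS = 29 * 100

slow = task_time(FILE_MEGABITS, 100, 1.0)    # 100 Mbps link
fast = task_time(FILE_MEGABITS, 1000, 1.0)   # 1 Gbps link

print(f"100 Mbps: {slow:.1f} s, 1 Gbps: {fast:.1f} s, speedup: {slow / fast:.2f}x")
```

The network got 10 times faster, but the whole task did not, because the 1 second of processing is unchanged.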
Wall-clock time • Simple way to measure • Lots of other variables affect wall-clock time • speed of the disk • network • how many users are on the machine • many Linux workstations are multi-user
Multi-user systems are shared • Examples: • mail server • blackboard / Marmoset • Each job gets a timeslice of about 10 ms • This is not a precise count • Processing resources are shared, but not perfectly evenly
Measure CPU time • User time: time your job actually spends using the CPU • System time: time the OS spends doing things that aren’t your job, or your job spends waiting for events, like data from disk
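One way to see the difference between wall-clock time and CPU time is to measure both around the same call. This is a minimal sketch using Python's standard `os.times()` and `time.perf_counter()`. The helper `measure` and the workload `busy_work` are hypothetical names made up for illustration.

```python
import os
import time

def measure(func, *args):
    """Run func(*args) and return (wall, user, system) times in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (wall_end - wall_start,
            cpu_end.user - cpu_start.user,
            cpu_end.system - cpu_start.system)

def busy_work(n):
    # Hypothetical CPU-bound workload: sum of squares.
    return sum(i * i for i in range(n))

# A CPU-bound job: most of the wall-clock time should be CPU time.
wall, user, system = measure(busy_work, 200_000)
print(f"busy:  wall={wall:.3f}s user={user:.3f}s system={system:.3f}s")

# A sleeping job: wall-clock time passes, but almost no CPU time is used.
sleep_wall, sleep_user, sleep_system = measure(time.sleep, 0.2)
print(f"sleep: wall={sleep_wall:.3f}s user={sleep_user:.3f}s system={sleep_system:.3f}s")
```

The exact numbers depend on the machine and on what else is running, which is the point of the slides above.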
Clocks • We’ve already seen that clocks regulate how things happen in a CPU • ticks, clocks, cycles, clock periods, clock ticks • 2 GHz means 2 * 10^9 cycles per second • Many instructions take 1 cycle to complete • Some instructions take multiple cycles to complete
Tradeoffs • Increasing the clock speed often means that some instructions that “barely fit” in a clock cycle will now require multiple cycles to complete • Sometimes this is good
Measuring clock time • CPU time = (CPU clock cycles) / (clock rate) • A 4 GHz computer takes 10 seconds to perform a task • We want to drop this down to 6 seconds on a new computer we’re designing • Clock rate can be sped up, but the task will then require 1.2 times as many clock cycles
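10 seconds = (cycles) / (4 GHz) • 10 = X / (4 * 10^9) • X = 40 * 10^9 cycles on the old machine • 40 * 10^9 * 1.2 = 48 * 10^9 cycles on the new machine • 6 seconds = (48 * 10^9) / R, where R is the new clock rate • R = 8 * 10^9 cycles per second = 8 GHz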
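The same calculation as a short sketch, using CPU time = cycles / clock rate. The function and variable names are made up for illustration.

```python
def required_clock_rate(cycles, target_seconds):
    """Rearranged from CPU time = cycles / clock rate."""
    return cycles / target_seconds

OLD_RATE_HZ = 4e9      # 4 GHz machine
OLD_SECONDS = 10       # time for the task on the old machine
TARGET_SECONDS = 6     # goal for the new machine
CYCLE_GROWTH = 1.2     # new design needs 1.2 times as many cycles

old_cycles = OLD_SECONDS * OLD_RATE_HZ     # cycles = time * rate
new_cycles = CYCLE_GROWTH * old_cycles
new_rate_hz = required_clock_rate(new_cycles, TARGET_SECONDS)

print(f"old cycles: {old_cycles:.3g}, new cycles: {new_cycles:.3g}")
print(f"required clock rate: {new_rate_hz / 1e9:.1f} GHz")
```

The new machine has to run at twice the old clock rate to finish in 6 seconds, because it also has 20% more cycles to get through.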
Clock Cycles Per Instruction • abbreviated CPI • average number of cycles required for each instruction • Estimate for a particular workload • differs for each architecture • may differ for streams of instructions for different programs on the same architecture
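As a rough illustration of why CPI depends on the workload, here is a sketch that averages cycles over an instruction mix. The instruction classes, counts, and cycle costs are invented numbers, not measurements from any real architecture.

```python
def average_cpi(mix):
    """mix maps an instruction class to (instruction count, cycles per instruction)."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    return total_cycles / total_instructions

# Hypothetical workloads on the same hypothetical architecture.
mostly_alu = {"alu": (500, 1), "load_store": (300, 2), "branch": (200, 3)}
mostly_memory = {"alu": (200, 1), "load_store": (700, 2), "branch": (100, 3)}

print(f"mostly ALU:    CPI = {average_cpi(mostly_alu):.2f}")
print(f"mostly memory: CPI = {average_cpi(mostly_memory):.2f}")
```

Two programs on the same machine get different CPIs simply because they use different mixes of instructions.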
What components affect performance? • Algorithm: instruction count, CPI • Programming language: instruction count, CPI • Compiler: instruction count, CPI • Instruction set architecture: instruction count, CPI, clock rate
We can’t just measure instruction counts • Some instructions take multiple cycles • It may be more efficient to execute more instructions if those instructions each take fewer cycles
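A sketch of this point, using CPU time = instruction count * CPI / clock rate. The two program versions and all of their numbers are hypothetical.

```python
def cpu_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz

CLOCK_HZ = 2e9  # assumed 2 GHz machine

# Version A: fewer instructions, but each takes more cycles on average.
version_a = cpu_seconds(10e9, 2.0, CLOCK_HZ)
# Version B: 20% more instructions, but cheaper ones.
version_b = cpu_seconds(12e9, 1.5, CLOCK_HZ)

print(f"A: {version_a:.1f} s, B: {version_b:.1f} s")
```

Version B executes more instructions and still finishes first, so instruction count alone would have picked the wrong winner.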
“performance” means different things in different contexts • The performance metric that matters for a server is throughput • we don’t care if a couple of clients are slow, so long as on average everyone is fast enough • Performance of an operating system should incorporate response time • Even if Windows hangs for 5 minutes, I’d better be able to move the mouse!
“performance” in context, cont. • The performance metric that matters for an air-traffic control system is the worst case • doesn’t matter if on average everything is great, we can’t have anything run slowly
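To make the average vs worst case distinction concrete, here is a small sketch. The latency samples and the 100 ms deadline are invented for illustration.

```python
from statistics import mean

def summarize(latencies_ms):
    """Return the average and the worst-case latency of a set of samples."""
    return {"mean": mean(latencies_ms), "worst": max(latencies_ms)}

# Hypothetical samples: twenty fast responses and one very slow one.
samples_ms = [10] * 20 + [950]
DEADLINE_MS = 100  # assumed deadline

stats = summarize(samples_ms)
print(f"mean:  {stats['mean']:.1f} ms, under deadline: {stats['mean'] <= DEADLINE_MS}")
print(f"worst: {stats['worst']} ms, under deadline: {stats['worst'] <= DEADLINE_MS}")
```

The average looks fine, but one response blew the deadline. A server might accept that; an air-traffic control system cannot.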
Many ways to measure performance • instruction counts • CPI • wall-clock time
Throughput vs Response Time • Faster CPUs vs more CPUs • A faster CPU usually decreases your response time • you can handle more instructions per unit of time • great for video games • Adding more CPUs increases throughput • Can perform multiple tasks at once • great for servers • like the late Marmoset…
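A deliberately idealized sketch of this tradeoff. It assumes independent, single-threaded jobs, no queueing, and perfect scaling, none of which hold exactly on a real machine. The function names and numbers are made up.

```python
def faster_cpu(job_seconds, speedup):
    """One CPU that is `speedup` times faster: each job finishes sooner."""
    response = job_seconds / speedup
    throughput = 1 / response            # jobs per second
    return response, throughput

def more_cpus(job_seconds, cpu_count):
    """`cpu_count` CPUs at the original speed: each job takes just as long."""
    response = job_seconds
    throughput = cpu_count / job_seconds  # jobs per second across all CPUs
    return response, throughput

JOB_SECONDS = 2.0  # assumed time for one job on the original CPU

fast_response, fast_throughput = faster_cpu(JOB_SECONDS, speedup=2)
multi_response, multi_throughput = more_cpus(JOB_SECONDS, cpu_count=2)

print(f"2x faster CPU: response {fast_response} s, throughput {fast_throughput} jobs/s")
print(f"2 CPUs:        response {multi_response} s, throughput {multi_throughput} jobs/s")
```

In this model both options double throughput, but only the faster CPU makes an individual job finish sooner.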
Beware of Benchmarks • small code segments that are easy to run and report results for • Advantages and disadvantages?
Benchmarks aren’t real programs • Useful when designing an architecture where there’s no existing compiler • easy to code up • easy to debug • Can give extremely misleading performance results
Should measure performance for real applications • Harder to get misleading results • Harder to tweak your compiler/architecture/whatever to get artificially good results • I.e. harder to cheat!
How would we measure the performance of: • Queries to Google? • Factoring large numbers? • Sorting an array? • Accounting software?