INTRODUCTION Jehan-François Pâris jparis@uh.edu
An evolving field • Computer architectures keep changing • Building faster computers • Supercomputers and data centers • Building cheaper, smaller computers • Laptops, notebooks, netbooks, smartbooks • Putting computer systems everywhere • Cars, cell phones, HDTV: embedded computers
An analogy • Electrical motors • Replaced the single steam engine powering many machines through transmission belts and pulleys • One electrical motor per machine • Domestic appliances, car starters, … • Power tools • Power windows, electrical toothbrushes, …
The coming revolution • Cannot increase CPU clock frequency above 2 GHz without running into unsolvable heat dissipation problems • Switch to multicore architectures • Two, four, eight, … CPUs per chip • Creates new problems • Hardware: cache synchronization • Software: programming these beasts Ouch!
Other challenges • Reducing power consumption of data centers • Often contain archival data that are very rarely accessed • Finding new ways to keep increasing magnetic disk capacity • Dealing with physical limits to SDRAM density • Will never get 8 TB SODIMM modules • Finding a replacement for hard drives
Classical computer components • Input • Output • Memory • Datapath • Control • Datapath + Control = Processor • Storage subsystem is missing!
The course philosophy • Showing you how computers work is fine • Showing you how to make them faster is better!
PERFORMANCE ISSUES • Defining performance • Measuring it • Not an easy task • Evaluating the impact of • Amount of work done by each instruction • Time they take to run • CPU clock speed
Measuring Performance • Inverse of execution time of a benchmark: $\text{Performance} = 1 / \text{Execution Time}$ • If computers A and B are such that $\text{Execution Time}_A < \text{Execution Time}_B$ for the same benchmark, then $\text{Performance}_A > \text{Performance}_B$
SPEC CPU Benchmark • SPEC CPU2006 • Set of 12 integer and 17 floating-point benchmarks • Results are normalized: Execution time on a reference processor / Execution time on benchmarked processor • Single value is geometric mean of these ratios
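How it is computed (I) • Two new processors P and Q compared to a reference processor R • Execution times for n benchmarks • $P_1, P_2, \ldots, P_n$ • $Q_1, Q_2, \ldots, Q_n$ • $R_1, R_2, \ldots, R_n$
How it is computed (II) • SPEC value for processor P is $\text{SPEC}(P) = \left( \prod_{i=1}^{n} R_i / P_i \right)^{1/n}$ • Observe that $\text{SPEC}(P) / \text{SPEC}(Q) = \left( \prod_{i=1}^{n} Q_i / P_i \right)^{1/n}$, which does not depend on R • (property of geometric mean)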
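The SPEC-style value described above (a geometric mean of reference-time to measured-time ratios) can be sketched in a few lines of Python. This is only an illustration: the function name `spec_ratio` and the three execution times per processor are hypothetical and are not SPEC data.

```python
import math

def spec_ratio(ref_times, times):
    """Geometric mean of the per-benchmark ratios ref_time / time."""
    if len(ref_times) != len(times):
        raise ValueError("need one time per benchmark for both processors")
    # Averaging logarithms is equivalent to taking the n-th root of the product.
    log_sum = sum(math.log(r / t) for r, t in zip(ref_times, times))
    return math.exp(log_sum / len(times))

# Hypothetical execution times in seconds for three benchmarks.
R = [100.0, 200.0, 400.0]  # reference processor
P = [50.0, 100.0, 100.0]   # processor P
Q = [100.0, 50.0, 200.0]   # processor Q

spec_p = spec_ratio(R, P)
spec_q = spec_ratio(R, Q)

# Comparing P directly against Q gives the same ratio as dividing their
# SPEC values, so the choice of reference processor cancels out.
direct = spec_ratio(Q, P)
print(spec_p / spec_q, direct)
```

The last two printed values agree up to rounding, which is the geometric-mean property the slide points out.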
Impact of Instruction Set • Execution Time = Number of Instructions × Mean Instruction Execution Time • Gave birth to the idea of more complex instruction sets • Each instruction does more • Fewer instructions
Impact of Clock Speed • Execution Time = Number of Clock Cycles × Clock Cycle Time, which is the same as Execution Time = Number of Clock Cycles / Clock Frequency
Putting everything together • Execution Time = Number of Instructions × Number of Clock Cycles per Instruction × Clock Cycle Time • Gives us three ways to reduce program execution time
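As a rough illustration of the three-factor formula, the sketch below evaluates it for a baseline and for one change to each factor. The function name `execution_time` and all the numbers (instruction counts, cycles per instruction, clock frequencies) are made up for the example.

```python
def execution_time(instructions, cpi, clock_hz):
    """Execution time in seconds.

    instructions x cycles per instruction gives the number of clock cycles;
    dividing by the clock frequency is the same as multiplying by the
    clock cycle time.
    """
    return instructions * cpi / clock_hz

# Hypothetical baseline: 1e9 instructions, 2 cycles each, 2 GHz clock.
base = execution_time(1_000_000_000, 2.0, 2e9)

# Three independent ways to shorten the run (hypothetical values):
fewer_instructions = execution_time(800_000_000, 2.0, 2e9)
faster_clock = execution_time(1_000_000_000, 2.0, 2.5e9)
lower_cpi = execution_time(1_000_000_000, 1.6, 2e9)

print(base, fewer_instructions, faster_clock, lower_cpi)
```

With these numbers each change alone cuts the run time by the same 20 percent, which shows that the three factors contribute symmetrically to the product.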
1. Using fewer instructions • VAX • Super minicomputer designed in late 70’s • Had a complicated instruction set (CISC) • Idea was to use more powerful instructions in order to reduce the number of instructions used to perform most frequent tasks • Poor pipelining performance
2. Using a faster clock • Major reason for explosion of CPU performance in the 80’s and 90’s • IBM PC (1981): Intel 8088 @ 4.77 MHz • IBM PC AT (1984): Intel 80286 @ 6 and 8 MHz • Nowadays up to 3 GHz • Cannot get much higher!
3. Using better instructions • Best strategy is to reduce the average number of clock cycles per instruction • Favoring fast instructions • Using fixed-size instructions to allow pipelining • Trying to execute as many tasks as possible in parallel
Amdahl’s Law (I) • Examples: • Supersonic jet • Could fly from Houston to Washington in thirty minutes • Total travel time would be dominated by travel time to airport and check-in procedures • Today's laptops: • Disk access times are the bottleneck
Amdahl’s Law (II) • Assume that we have a technique for improving the performance of some part of a system. • Let • $T_o$ be the time originally spent in the part of the system that can be improved • $T_i$ be the time spent in that part once the improvement has been applied • $T_n$ be the time spent in the part of the system that remains unaffected
Amdahl’s Law (III) • The total speedup for the whole system will be $(T_o + T_n) / (T_i + T_n)$ • The maximum possible speedup, reached when $T_i \to 0$, is $(T_o + T_n) / T_n$
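The definitions above translate directly into code. The following is a minimal sketch: the function names `total_speedup` and `max_speedup` and the sample times are hypothetical, and the formula is the usual statement of Amdahl's Law in terms of the original, improved, and unaffected times.

```python
def total_speedup(t_o, t_i, t_n):
    """Old total time divided by new total time.

    t_o: time originally spent in the improvable part
    t_i: time spent in that part after the improvement
    t_n: time spent in the part that is unaffected
    """
    return (t_o + t_n) / (t_i + t_n)

def max_speedup(t_o, t_n):
    """Limit of total_speedup as t_i goes to zero."""
    if t_n == 0:
        return float("inf")  # nothing is left to bound the speedup
    return (t_o + t_n) / t_n

# Hypothetical workload: 6 time units improvable, 4 units unaffected.
print(total_speedup(6.0, 2.0, 4.0))  # improvable part made 3x faster
print(max_speedup(6.0, 4.0))         # improvable part made free
```

Even when the improvable part costs nothing, the unaffected part caps the gain: here the cap is 10 / 4 = 2.5.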
An example • Flying to Washington National Airport takes three hours • Going to the airport and waiting for the flight takes a minimum of two hours • Going from the airport to Washington downtown takes a minimum of 30 minutes • What is the maximum speedup that could be achieved using much faster planes? 5h30 / 2h30 = 2.2
Answer • Current travel time: • To airport and wait: 2 hours • Plane: 3 hours • To downtown by DC metro: 30 minutes • Total: 5 hours 30 minutes
Answer • Assume plane travels at speed of light: • To airport and wait: 2 hours • Plane: negligible • To downtown by DC metro: 30 minutes • Total: 2 hours 30 minutes • Maximum speedup would be 5h30 / 2h30 = 2.2
Trains and buses • Commuter trains and city buses spend a significant amount of trip time debarking and embarking travelers • Have wide doors • Not true for Amtrak trains and intercity buses • Fewer, narrower doors
A problem • Assume we have a technique that reduces the execution time of floating-point operations by 20 percent • What will be the overall CPU speedup if we expect it to spend 10 percent of its time executing floating-point operations? • How would that speedup be affected if the CPU spends 30 percent of its time executing floating-point operations?
Solution (I) • First case: • Baseline time = 0.9 × 1 + 0.1 × 1 = 1 • After improvement = 0.9 × 1 + 0.1 × 0.8 = 0.98 • Speedup = 1/0.98 = 1.02 • A 2 percent improvement!
Solution (II) • Second case: • Baseline time = 0.7 × 1 + 0.3 × 1 = 1 • After improvement = 0.7 × 1 + 0.3 × 0.8 = 0.94 • Speedup = 1/0.94 = 1.064 • A 6.4 percent improvement!
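Both cases follow the same pattern, so they can be checked with one small function. This is a sketch only: the name `overall_speedup` and its parameters are illustrative, and the time factor of 0.8 is the value used in the two solutions above.

```python
def overall_speedup(fraction, time_factor):
    """Speedup of the whole program.

    fraction:    share of the baseline time spent in the improved part
    time_factor: new time / old time for that part (0.8 means 20% less time)
    """
    new_time = (1.0 - fraction) + fraction * time_factor
    return 1.0 / new_time

first_case = overall_speedup(0.10, 0.8)   # 10% of time in floating point
second_case = overall_speedup(0.30, 0.8)  # 30% of time in floating point

print(round(first_case, 3), round(second_case, 3))
```

Tripling the share of time spent in floating-point code roughly triples the benefit, but the overall gain stays far below the 20 percent applied to the floating-point part itself.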
Problem • Consider a huge program that consists of a purely sequential part that takes two hours and another part that takes eight hours. What is the maximum speedup we can achieve by parallelizing the second part of the program?
Answer • Current run time: • Sequential part: 2 hours • Other part: 8 hours • Total: 10 hours • Minimum run time: • Sequential part: 2 hours • Other part: negligible • Total: 2 hours • Maximum speedup: 10 / 2 = 5
Problem • Server motherboard A has a SPEC CPU2006 rating of 31.4 while server motherboard B has a rating of 29.7. Which one of the two motherboards is faster?
Answer • Motherboard A, because a higher SPEC value is better
Fun problem • Shanghai maglev train runs at 268 mph • How does it compare to airplane for going between Houston and Washington, DC?
Fun answer • Current travel time: • To airport and wait: 2 hours • Plane: 3 hours • To downtown by DC metro: 30 minutes • Total: 5 hours 30 minutes • With maglev: • To station: 1 hour • Train to downtown DC: 6 hours 30 minutes • Total: 7 hours 30 minutes • Plane is still faster for very long trips