Excess-k notation
• A method for representing signed integers
  • add an excess amount k to every number to obtain a non-negative integer (the smallest number that can be represented is -k)
  • the most significant bit represents the sign (1 = positive, 0 = negative)
• Examples (excess-128; bit weights 128 64 32 16 8 4 2 1):
  1 0 1 0 0 0 0 1 = +33
  0 1 0 1 1 1 1 1 = -33
  1 0 0 0 0 0 0 0 = 0 (unique representation of 0!)
• If the bias is k = 2^(n-1), then excess-k is almost identical to two’s complement
  • sign bit reversed (1 means positive)
  • comparing integers is identical to the unsigned case
• Big endian (MSB first) vs. little endian (LSB first)
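As a concrete illustration, here is a minimal C sketch of excess-128 encoding and decoding (the function names are hypothetical, chosen for this example; the bit patterns match the examples above):

    #include <stdint.h>
    #include <stdio.h>

    /* Excess-128: store value + 128 in an unsigned byte.
       Representable range: -128 .. +127. */
    static uint8_t encode_excess128(int value) {
        return (uint8_t)(value + 128);
    }

    static int decode_excess128(uint8_t stored) {
        return (int)stored - 128;
    }

    int main(void) {
        printf("+33 -> 0x%02X\n", encode_excess128(33));  /* 0xA1 = 1010 0001 */
        printf("-33 -> 0x%02X\n", encode_excess128(-33)); /* 0x5F = 0101 1111 */
        printf("  0 -> 0x%02X\n", encode_excess128(0));   /* 0x80 = 1000 0000 */
        printf("0xA1 -> %d\n", decode_excess128(0xA1));   /* back to +33 */
        return 0;
    }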
Floating Point Numbers
• How can we represent 3.14?
• What’s wrong with (int_part, frac_part)?
  • 3.14 and 3.014 have the same representation!
• The leading-zeroes problem can be solved if numbers are normalized
  • write the number in the form d.f × 10^e, where d is a single non-zero digit
  • normalized(3.14) = 3.14 × 10^0, normalized(0.314) = 3.14 × 10^-1
• In binary, the “d” part will always be 1 (zero is a special case)
  • this implicit 1 can be ignored
• An ideal representation scheme has these features:
  • can represent positive and negative, low and high magnitude
  • it is easy to compare two numbers
  • it is easy to do basic math
IEEE 754 standard
• Format for single-precision (32-bit) and double-precision (64-bit) reals
• The normalized (non-zero) binary number 1.f × 2^e is stored as:

                    sign bit                  exponent e              fraction f
  single (float)    1 bit (1 = negative,      8 bits, excess-127      23 bits
                    0 = positive)
  double (double)   1 bit (1 = negative,      11 bits, excess-1023    52 bits
                    0 = positive)

• Comparison of floats is almost identical to comparison of ints!
• MIPS has separate floating point registers and instructions
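To make the field layout concrete, here is a small C sketch that pulls the sign, exponent, and fraction out of a single-precision float (a sketch only; it ignores special cases such as zero, NaN, infinity, and denormals):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float x = -3.14f;
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);       /* reinterpret the 32-bit pattern */

        uint32_t sign = bits >> 31;           /* 1 bit: 1 = negative            */
        uint32_t exp  = (bits >> 23) & 0xFF;  /* 8 bits, excess-127 notation    */
        uint32_t frac = bits & 0x7FFFFF;      /* 23-bit fraction f of 1.f       */

        printf("sign=%u  stored exponent=%u  e=%d  f=0x%06X\n",
               sign, exp, (int)exp - 127, frac);
        return 0;
    }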
Two notions of performance
• Which has higher performance?
• From a passenger’s viewpoint: latency (time to do the task)
  • hours per flight, execution time, response time
• From an airline’s viewpoint: throughput (tasks per unit time)
  • passengers per hour, bandwidth
• Latency and throughput are often in opposition
Some Definitions
• Latency is time per task (e.g. hours per flight)
  • If we are primarily concerned with latency:
    Performance(x) = 1 / execution_time(x)
    Bigger is better
• Throughput is the number of tasks per unit time (e.g. passengers per hour)
    Performance(x) = throughput(x)
    Again, bigger is better
• Relative performance: “x is N times faster than y”
    N = Performance(x) / Performance(y)
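A quick worked example of relative performance (the times are made up for illustration): if a program takes 10 seconds on machine x and 15 seconds on machine y, then N = Performance(x) / Performance(y) = (1/10) / (1/15) = 1.5, so x is 1.5 times faster than y.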
CPU performance
• The obvious metric: how long does it take to run a test program?
• Aircraft analogy: how long does it take to transport 1000 passengers?

  Our vocabulary              Aircraft analogy
  N instructions              N passengers
  c cycles per instruction    (1/c) passengers per flight
  t seconds per cycle         t hours per flight
  Time = N × c × t seconds    Time = N × c × t hours

CPU time(X,P) = Instructions executed(P) × CPI(X,P) × Clock cycle time(X)
(CPI = cycles per instruction, for machine X running program P)
The three components of CPU performance
• Instructions executed:
  • the dynamic instruction count (the number of instructions actually executed)
  • not the (static) number of lines of code
• Cycles per instruction:
  • the average number of clock cycles per instruction
  • a function of both the machine and the program
  • CPI(floating-point operations) > CPI(integer operations)
  • a Pentium executes the same instructions as an 80486, but faster
  • single-cycle machine: each instruction takes 1 cycle (CPI = 1)
  • CPI can be > 1 due to memory stalls and slow instructions
  • CPI can be < 1 on superscalar machines
• Clock cycle time: 1 cycle = the minimum time it takes the CPU to do any work
  • clock cycle time = 1 / clock frequency
  • a 500 MHz processor has a cycle time of 2 ns (nanoseconds)
  • a 2 GHz (2000 MHz) CPU has a cycle time of just 0.5 ns
  • higher frequency is usually better
Execution time, again

CPU time(X,P) = Instructions executed(P) × CPI(X,P) × Clock cycle time(X)

• The easiest way to remember this is to match up the units:

  seconds/program = (instructions/program) × (cycles/instruction) × (seconds/cycle)

• Make things faster by making any component smaller!
• Often it is easy to reduce one component only by increasing another
Example: ISA-compatible processors
• Let’s compare the performance of two x86-based processors
  • an 800 MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor
  • a 1 GHz Pentium III, with a CPI of 1.5 for the same program
• Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions
• But they implement the ISA differently, which leads to different CPIs

CPU time(AMD,P) = Instructions(P) × CPI(AMD,P) × Cycle time(AMD)
                = N × 1.2 × 1.25 ns = 1.5 × N ns
CPU time(P3,P)  = Instructions(P) × CPI(P3,P) × Cycle time(P3)
                = N × 1.5 × 1.00 ns = 1.5 × N ns

• The two processors have identical performance on this program!
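The same comparison as a runnable C sketch (the instruction count N is arbitrary here; it is identical on both machines and cancels out of the comparison):

    #include <stdio.h>

    int main(void) {
        double n = 1e9;                        /* same instruction count on both */
        double amd = n * 1.2 * (1.0 / 800e6);  /* CPI 1.2, 800 MHz -> 1.25 ns    */
        double p3  = n * 1.5 * (1.0 / 1e9);    /* CPI 1.5, 1 GHz   -> 1.00 ns    */
        printf("AMD Duron:   %.3f s\n", amd);  /* prints 1.500 s */
        printf("Pentium III: %.3f s\n", p3);   /* prints 1.500 s */
        return 0;
    }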
Another Example: Comparing across ISAs
• Intel’s Itanium (IA-64) ISA is designed to facilitate executing multiple instructions per cycle. If it achieves an average of 3 instructions per cycle, how much faster is it than a Pentium 4 (which uses the x86 ISA) with an average CPI of 1?
  • Itanium is three times faster
  • Itanium is one third as fast
  • Not enough information
Improving CPI
• Some processor design techniques improve CPI
• Often they only improve CPI for certain types of instructions:

  CPI = Σ_i (CPI_i × F_i),  where F_i = fraction of instructions of type i

First Law of Performance:
  Make the common case fast
Second Law of Performance:
  Make the fast case common
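A minimal C sketch of this weighted-average CPI calculation (the instruction mix and per-type CPIs below are made up for illustration):

    #include <stdio.h>

    int main(void) {
        /* Hypothetical instruction mix: ALU, load, store, branch */
        double cpi[]  = {4.0, 5.0, 4.0, 2.0};  /* CPI_i for each type       */
        double frac[] = {0.5, 0.2, 0.1, 0.2};  /* fraction F_i of the mix   */
        double avg = 0.0;
        for (int i = 0; i < 4; i++)
            avg += cpi[i] * frac[i];           /* CPI = sum of CPI_i * F_i  */
        printf("average CPI = %.2f\n", avg);   /* 2.0 + 1.0 + 0.4 + 0.4 = 3.8 */
        return 0;
    }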
Example: CPI improvements
• Base Machine:
• How much faster would the machine be if:
  • we added a cache to reduce the average load time to 3 cycles?
  • we added a branch predictor to reduce branch time by 1 cycle?
  • we could do two ALU operations in parallel?
Amdahl’s Law
• Amdahl’s Law states that optimizations are limited in their effectiveness
• Example: suppose we double the speed of floating-point operations
  • if only 10% of the program execution time T involves floating-point code, then the overall performance improves by just ~5%
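The arithmetic behind that 5% figure, written out in the standard Amdahl’s Law form (f = fraction of time affected by the optimization, s = speedup of that fraction; here f = 0.1 and s = 2):

  T_new   = (1 − f) × T + (f / s) × T = 0.9 × T + 0.05 × T = 0.95 × T
  Speedup = T / T_new = 1 / 0.95 ≈ 1.05, i.e. about 5% faster overall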