Review
Aspects of Performance • Clock (clk) cycle • Time of one clock period • Generally constant for a processor • Instruction count (IC) • Number of instructions executed for a program • Different instructions may consume different numbers of cycles • Different for different programs, compilers and compilation settings • Cycles per instruction (CPI) • Average number of CPU clock cycles per instruction for a program • Execution time • $\text{Execution time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$
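A minimal sketch of the execution-time relation above. The program size, CPI and clock rate are hypothetical values chosen only for illustration.

```python
def execution_time(instruction_count, cpi, clock_cycle_time):
    """Execution time = IC x CPI x clock cycle time (in seconds if cycle time is in seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 1 billion instructions, average CPI of 2.0,
# on a 2 GHz processor (clock cycle time = 1 / 2e9 s = 0.5 ns).
t = execution_time(1_000_000_000, 2.0, 1 / 2e9)
print(f"{t:.3f} s")
```

Because the three factors multiply, halving any one of them (with the others fixed) halves the execution time; in practice a change to one factor often affects the others.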
CPI • CPI = cycles per instruction • CPI provides one way of comparing two different implementations of the same instruction set architecture • CPI is tricky! • Different instructions require different numbers of cycles depending on what they do • CPI depends on the program (its instruction mix) • Memory behavior (e.g., cache misses) affects CPI
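CPI for a program may not be available directly • Given • CPI for each instruction class ($\text{CPI}_i$) • IC for each instruction class ($\text{IC}_i$), obtained by • Profiling a program • Simulation of the architecture • $\text{CPI} = \frac{\sum_i \text{CPI}_i \times \text{IC}_i}{\text{IC}}$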
Example • Which one is faster? • C1 = 1×2 + 2×1 + 3×2 = 10 clock cycles • C2 = 1×4 + 2×1 + 3×1 = 9 clock cycles • C2 < C1 (fewer clock cycles), hence C2 is faster, assuming the same clock cycle time.
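The comparison above can be reproduced with a small weighted-CPI calculation. This sketch assumes that in each product the first factor is the CPI of an instruction class and the second is that class's instruction count; the class names A, B and C are hypothetical labels, since the original table is not shown.

```python
# CPI of each instruction class; the class names A, B, C are hypothetical labels.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """Sum of CPI_i x IC_i over all instruction classes."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Weighted-average CPI = total cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


c1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
c2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

for name, counts in (("C1", c1), ("C2", c2)):
    print(name, total_cycles(counts), "cycles, CPI =", average_cpi(counts))
```

Under this reading, C2 executes more instructions (6 vs 5) yet needs fewer cycles, because its average CPI is lower (1.5 vs 2.0). Instruction count alone is therefore not a reliable measure of performance.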
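Amdahl’s Law • Basic idea: improve the common case • The improvement gained from a faster mode of execution is limited by the fraction of time the faster mode can be used • $\text{Speedup}_{\text{overall}} = \frac{1}{(1 - F) + F / S}$, where $F$ is the fraction of execution time that can use the faster mode and $S$ is the speedup of that mode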
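A direct transcription of Amdahl's Law, with hypothetical inputs showing how the unenhanced fraction caps the overall gain.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: half of the execution time can use a mode that is 10x faster.
print(amdahl_speedup(0.5, 10))    # ~1.82, far below 10
# Even an extremely fast mode cannot beat 1 / (1 - F):
print(amdahl_speedup(0.5, 1e12))  # approaches 2
```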
If we make division run 3 times faster and multiplication run 8 times faster, what is the overall speedup? • We want to make the machine run 4 times faster. Can we achieve this goal by making just one change, either to multiplication or to division?
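The fraction of time spent on multiplication and division is not given here, so this sketch uses hypothetical fractions (20% each) for the first question. The second question can be answered without them: solving Amdahl's Law for the fraction F gives $F = (1 - 1/T) / (1 - 1/S)$ for a target speedup $T$, and any result above 1 means the target is unreachable.

```python
def overall_speedup(enhancements):
    """Amdahl's Law for several non-overlapping enhancements.

    enhancements: list of (fraction_of_time, speedup) pairs.
    """
    unaffected = 1.0 - sum(f for f, _ in enhancements)
    return 1.0 / (unaffected + sum(f / s for f, s in enhancements))


def required_fraction(target_speedup, speedup_enhanced):
    """Fraction of time one enhancement must cover to reach target_speedup.

    A result greater than 1 means the target cannot be reached with that enhancement alone.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / speedup_enhanced)


# Hypothetical time fractions (not given on the slide): 20% division, 20% multiplication.
print(overall_speedup([(0.20, 3), (0.20, 8)]))  # ~1.45
print(required_fraction(4, 8))  # ~0.857: multiplication must take at least 6/7 of the time
print(required_fraction(4, 3))  # ~1.125 > 1: a 3x faster division alone can never give 4x
```

The division result holds for any workload: even if division took 100% of the time, a 3x faster divider would give at most a 3x overall speedup.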
Instruction Set • The repertoire of instructions of a computer • Different computers have different instruction sets • But with many aspects in common • Early computers had very simple instruction sets • Simplified implementation • Many modern computers also have simple instruction sets
R-Type Instruction • Register type: operates on 3 registers • op (6 bits): operation (opcode) • rs (5 bits): first source operand • rt (5 bits): second source operand • rd (5 bits): destination operand • shamt (5 bits): shift amount, used only in shift instructions • funct (6 bits): function code, selects the specific variant of the operation • Syntax: <op> $rd, $rs, $rt • add $t0, $s1, $s2
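A minimal sketch of packing the R-type fields into a 32-bit word. The register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and funct code 0x20 for add come from the standard MIPS conventions; the helper name `encode_r` is hypothetical.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack MIPS R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2  ->  op = 0, rs = $s1 (17), rt = $s2 (18), rd = $t0 (8), funct = 0x20
word = encode_r(0, 17, 18, 8, 0, 0x20)
print(f"{word:#010x}")  # 0x02324020
```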
I-Type Instruction • Immediate type: operates on 2 registers and a 16-bit immediate • rs and imm are always sources • rt is flexible: destination for addi and lw, source for sw • Syntax • <op> $rt, $rs, imm • <op> $rt, offset($rs) • addi $s0, $s1, 5 • sw $s1, 4($t1)
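The same idea for I-type instructions, sketched with a hypothetical helper `encode_i`. The opcodes (addi = 8, sw = 43) and register numbers follow the standard MIPS encoding. Masking the immediate with 0xFFFF keeps negative offsets in 16-bit two's complement form.

```python
def encode_i(op, rs, rt, imm):
    """Pack MIPS I-type fields: op[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)  # 16-bit two's complement


# addi $s0, $s1, 5   -> op = 8,  rs = $s1 (17), rt = $s0 (16), imm = 5
print(f"{encode_i(8, 17, 16, 5):#010x}")  # 0x22300005
# sw $s1, 4($t1)     -> op = 43, rs = $t1 (9),  rt = $s1 (17), offset = 4
print(f"{encode_i(43, 9, 17, 4):#010x}")  # 0xad310004
```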
Exercise • Reverse-engineer the instruction 0xAD310004: what is the corresponding assembly code?
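Solution • 0xAD310004 = 101011 01001 10001 0000000000000100 (binary) • op = 43 (sw), rs = 9 ($t1), rt = 17 ($s1), offset = 4 • sw $s1, 4($t1)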
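The decoding steps of the solution can be automated with a small disassembler. This is a sketch that covers only a handful of opcodes and funct codes (addi, lw, sw, add, sub); any other instruction raises a `KeyError`. The function and table names are hypothetical.

```python
REG_NAMES = [
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra",
]
# Only a few opcodes / funct codes are covered in this sketch.
I_OPCODES = {8: "addi", 35: "lw", 43: "sw"}
R_FUNCTS = {0x20: "add", 0x22: "sub"}


def disassemble(word):
    op = (word >> 26) & 0x3F
    rs = REG_NAMES[(word >> 21) & 0x1F]
    rt = REG_NAMES[(word >> 16) & 0x1F]
    if op == 0:  # R-type: the operation is selected by funct
        rd = REG_NAMES[(word >> 11) & 0x1F]
        return f"{R_FUNCTS[word & 0x3F]} {rd}, {rs}, {rt}"
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    name = I_OPCODES[op]
    if name in ("lw", "sw"):
        return f"{name} {rt}, {imm}({rs})"
    return f"{name} {rt}, {rs}, {imm}"


print(disassemble(0xAD310004))  # sw $s1, 4($t1)
```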
IEEE 754 FP Format • Fields: S | Exponent | Fraction • S: sign bit (0 non-negative, 1 negative) • Exponent: single: 8 bits; double: 11 bits • Fraction: single: 23 bits; double: 52 bits • Normalized significand • Always has a leading pre-binary-point 1 bit, so there is no need to represent it explicitly (hidden bit) • Exponent: actual exponent + Bias • Ensures the stored exponent is unsigned • Single: Bias = 127; Double: Bias = 1023 • Value: $x = (-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
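A sketch of reading a single-precision bit pattern back into a value using the field layout, the hidden bit and the bias of 127. It assumes a normalized number: denormals (stored exponent 0) and infinities/NaNs (stored exponent 255) are rejected rather than handled. The function name is hypothetical.

```python
def decode_single(bits):
    """Value of a normalized IEEE 754 single-precision bit pattern."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("not a normalized number (denormal, infinity or NaN)")
    significand = 1 + fraction / 2**23  # restore the hidden leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


print(decode_single(0x3F800000))  # 1.0
print(decode_single(0xBF400000))  # -0.75
```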
Decimal to FP Conversion • Represent −0.75 • $-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$ • S = 1 • Fraction = $1000\ldots00_2$ • Exponent = −1 + Bias • Single: $-1 + 127 = 126 = 01111110_2$ • Double: $-1 + 1023 = 1022 = 01111111110_2$ • Single: 1 01111110 1000…00 • Double: 1 01111111110 1000…00
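The hand conversion can be checked with Python's standard `struct` module, which packs a float into its IEEE 754 bytes. The helper names `single_fields` and `double_fields` are hypothetical.

```python
import struct


def single_fields(x):
    """Bit pattern and (S, Exponent, Fraction) of x stored as IEEE 754 single precision."""
    bits = int.from_bytes(struct.pack(">f", x), "big")
    return bits, bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x):
    """Bit pattern and (S, Exponent, Fraction) of x stored as IEEE 754 double precision."""
    bits = int.from_bytes(struct.pack(">d", x), "big")
    return bits, bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


bits, s, e, f = single_fields(-0.75)
print(f"{bits:#010x} S={s} Exponent={e} Fraction={f:023b}")
bits, s, e, f = double_fields(-0.75)
print(f"{bits:#018x} S={s} Exponent={e} Fraction={f:052b}")
```

For −0.75 this gives 0xbf400000 (single) and 0xbfe8000000000000 (double), with stored exponents 126 and 1022, matching the fields derived by hand.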
Control Signals • RegWrite: whether a register is to be written • ALUSrc: selects the second ALU operand (register or sign-extended immediate) • ALUOp: determines which operation the ALU performs • PCSrc: selects the next PC value (PC + 4 or the branch target) • MemWrite: whether memory is to be written • MemRead: whether memory is to be read • MemToReg: selects whether the register write data comes from memory or from the ALU
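Instruction Format Determines the Control Unit • Control signals are derived from the instruction fields
• R-type: op = 0 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
• Load/Store: op = 35 or 43 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
• Branch: op = 4 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
• opcode: always in bits 31:26 • rs: always read • rt: read, except for load (where it is written) • Destination register: written for R-type (rd) and load (rt) • address: sign-extended and added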
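The mapping from opcode to control signals can be written as a lookup table. This sketch assumes the standard single-cycle MIPS datapath, where `None` marks a don't-care value, ALUOp is the 2-bit code passed to the ALU control unit, and PCSrc is formed as Branch AND the ALU's Zero output (so it is not stored in the table). The table and function names are hypothetical.

```python
# Main control outputs per opcode. None = don't care; ALUOp is a 2-bit code.
CONTROL = {
    0:  {"RegDst": 1, "ALUSrc": 0, "MemToReg": 0, "RegWrite": 1,        # R-type
         "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},
    35: {"RegDst": 0, "ALUSrc": 1, "MemToReg": 1, "RegWrite": 1,        # lw
         "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},
    43: {"RegDst": None, "ALUSrc": 1, "MemToReg": None, "RegWrite": 0,  # sw
         "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},
    4:  {"RegDst": None, "ALUSrc": 0, "MemToReg": None, "RegWrite": 0,  # beq
         "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},
}


def control_signals(word):
    """Look up the control signals for the opcode in bits 31:26 of a 32-bit instruction."""
    return CONTROL[(word >> 26) & 0x3F]


print(control_signals(0xAD310004))  # sw: MemWrite = 1, RegWrite = 0
```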
Pipeline: MIPS Instructions • Steps • IF: Fetch the instruction from memory • ID: Decode the instruction and read the registers • EX: Execute the operation or calculate an address • MEM: Access an operand in data memory • WB: Write the result back into a register
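A sketch of the ideal overlap of these five steps, assuming one instruction enters the pipeline per clock cycle and there are no hazards. Each row shows which stage one instruction occupies in each cycle. The function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """One row per instruction; each cell is the stage occupied in that clock cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    chart = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters IF in cycle i
        chart.append(row)
    return chart


for i, row in enumerate(pipeline_chart(3)):
    print(f"instr {i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Three instructions finish in 7 cycles instead of 15, since after the first 5 cycles one instruction completes per cycle.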
Pipeline Speedup • The execution time (latency) of an individual instruction is not reduced • Speedup comes from throughput • Ideal case • All stages take equal time • Speedup = number of stages in the pipeline • Pipelining increases throughput by overlapping instructions: at any moment, different instructions use different resources, so more instructions complete per unit of time • Not always achieved: some stages may take longer than others, and the slowest stage sets the clock cycle
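A rough timing model of the points above. It assumes a non-pipelined machine takes the sum of the stage latencies per instruction, that a k-stage pipeline needs (k + n − 1) cycles for n instructions, and that pipeline-register overhead is negligible. The stage latencies are hypothetical values in picoseconds.

```python
def nonpipelined_time(n, stage_latencies):
    """Each instruction runs start to finish before the next one begins."""
    return n * sum(stage_latencies)


def pipelined_time(n, stage_latencies):
    """One instruction enters per cycle; the slowest stage sets the clock cycle."""
    cycle = max(stage_latencies)
    return (len(stage_latencies) + n - 1) * cycle


unbalanced = [200, 100, 200, 200, 100]  # hypothetical IF, ID, EX, MEM, WB latencies (ps)
balanced = [160] * 5                    # same 800 ps total, split evenly

n = 1_000_000
for name, lat in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, nonpipelined_time(n, lat) / pipelined_time(n, lat))
```

With balanced stages the speedup approaches the number of stages (5); with unbalanced stages it approaches only 800 / 200 = 4. A single instruction actually takes longer when pipelined here (5 × 200 = 1000 ps vs 800 ps), which matches the point that pipelining does not reduce the latency of an individual instruction.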
Pipelined Control Stages • Control lines must be set for each stage • IF: the control signals to read instruction memory and to write the PC are always asserted • ID: the same thing happens every clock cycle, so there are no optional control lines to set • EX: RegDst, ALUOp, ALUSrc • MEM: Branch, MemRead, MemWrite • WB: MemToReg, RegWrite
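A sketch of how these stage-grouped control signals could travel with an instruction. It assumes the usual 5-stage MIPS design: all signals are generated in ID, stored in the ID/EX pipeline register, and each later stage consumes its own group so only the groups still needed are passed on. The function name and the lw control values are illustrative assumptions.

```python
# Control signals grouped by the stage that uses them (as listed on the slide).
STAGE_SIGNALS = {
    "EX": ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("MemToReg", "RegWrite"),
}


def pipeline_register_contents(control_word):
    """Control bits held in ID/EX, EX/MEM and MEM/WB for one instruction."""
    contents = {}
    carried = dict(control_word)
    for register, consumer in (("ID/EX", "EX"), ("EX/MEM", "MEM"), ("MEM/WB", "WB")):
        contents[register] = dict(carried)
        for name in STAGE_SIGNALS[consumer]:
            carried.pop(name, None)  # used by this stage, not passed further
    return contents


# Assumed control word for lw.
lw_control = {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1,
              "Branch": 0, "MemRead": 1, "MemWrite": 0,
              "MemToReg": 1, "RegWrite": 1}

for register, bits in pipeline_register_contents(lw_control).items():
    print(register, sorted(bits))
```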
Pipeline Hazards • Hazard: the next instruction cannot execute in the following clock cycle • Classification: • Structural: two instructions need the same resource at the same time • Resolved by separate instruction and data memories • Data: an instruction uses as a source the destination register of a preceding instruction that has not yet written its result • Resolved by stalling, forwarding, or instruction reordering • Control: caused by branch instructions (the next instruction to fetch is not yet known) • Resolved by stalling, branch prediction, or a branch delay slot
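A sketch of detecting data (read-after-write) hazards in a straight-line instruction sequence. It assumes the classic 5-stage pipeline without forwarding, where the register file is written in the first half of a cycle and read in the second half, so only the next two instructions are affected (`window=2`). With forwarding, it assumes only a load followed immediately by a use of its result still needs a stall. The `Instr` type and function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]    # register written, or None (e.g. sw)
    srcs: Tuple[str, ...]  # registers read
    is_load: bool = False


def raw_hazards(program, window=2):
    """(producer, consumer) index pairs where a consumer within `window`
    instructions reads a register the producer writes."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest in (None, "$zero"):  # writes to $zero have no effect
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j))
            if program[j].dest == producer.dest:  # a newer value replaces this one
                break
    return hazards


def load_use_stalls(program):
    """With forwarding, a load followed directly by a use of its result still stalls."""
    return [(i, i + 1) for i in range(len(program) - 1)
            if program[i].is_load and program[i].dest in program[i + 1].srcs]


program = [
    Instr("lw  $t0, 0($s0)", "$t0", ("$s0",), is_load=True),
    Instr("add $t1, $t0, $s1", "$t1", ("$t0", "$s1")),
    Instr("sub $t2, $t1, $t0", "$t2", ("$t1", "$t0")),
    Instr("sw  $t2, 4($s0)", None, ("$t2", "$s0")),
]
print(raw_hazards(program))      # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(load_use_stalls(program))  # [(0, 1)]
```

Without forwarding, every listed pair would need stall cycles. With forwarding, only the lw/add pair still stalls (for one cycle), which is where instruction reordering can help.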