560 likes | 692 Views
CS154B Review of MIPS ISA and Performance. Lecture 2. Review: Organization. All computers consist of five components Processor: (1) datapath and (2) control (3) Memory I/O: (4) Input devices and (5) Output devices Not all “memory” is created equally
E N D
CS154BReview of MIPS ISA and Performance Lecture 2
Review: Organization • All computers consist of five components • Processor: (1) datapath and (2) control • (3) Memory • I/O: (4) Input devices and (5) Output devices • Not all “memory” is created equally • Cache: fast (expensive) memory are placed closer to the processor • Main memory: less expensive memory--we can have more • Input and output (I/O) devices have the messiest organization • Wide range of speed: graphics vs. keyboard • Wide range of requirements: speed, standard, cost ... • Least amount of research (so far)
Summary: Computer System Components Proc • All have interfaces & organizations Caches Busses adapters Memory Controllers Disks Displays Keyboards I/O Devices: Networks
software instruction set hardware Review: Instruction Set Design Which is easier to change/design???
Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction Instruction Set Architecture: What Must be Specified? • Instruction Format or Encoding • how is it decoded? • Location of operands and result • where other than memory? • how many explicit operands? • how are memory operands located? • which can or cannot be in memory? • Data type and Size • Operations • what are supported • Successor instruction • jumps, conditions, branches • fetch-decode-execute is implicit!
Comparison: Bytes per instruction? Number of Instructions? Cycles per instruction? Basic ISA Classes Most real machines are hybrids of these: Accumulator (1 register): 1 address add Aacc ¬ acc + mem[A] 1+x address addx A acc ¬ acc + mem[A + x] Stack: 0 address addtos ¬ tos + next General Purpose Register (can be memory/memory): 2 address add A BEA[A] ¬ EA[A] + EA[B] 3 address add A B CEA[A] ¬ EA[B] + EA[C] Load/Store: 3 address add Ra Rb RcRa ¬ Rb + Rc load Ra RbRa ¬ mem[Rb] store Ra Rbmem[Rb] ¬ Ra
Register Register (register-memory) (load-store) Load R1,A Add R1,B Store C, R1 Comparing Number of Instructions Code sequence for (C = A + B) for four classes of instruction sets: Stack Accumulator Push A Load A Load R1,A Push B Add B Load R2,B Add Store C Add R3,R1,R2 Pop C Store C,R3
° 1975-2000 all machines use general purpose registers ° Advantages of registers • registers are faster than memory • registers are easier for a compiler to use e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order - vs. stack • registers can hold variables - memory traffic is reduced, so program is sped up (since registers are faster than memory) code density improves (since register named with fewer bits - than memory location) General Purpose Registers Dominate
MIPS I Registers • Programmable storage • 2^32 x bytes of memory • 31 x 32-bit GPRs (R0 = 0) • 32 x 32-bit FP regs (paired DP) • HI, LO, PC
Memory Addressing ° Since 1980 almost every machine uses addresses to level of 8-bits (byte) ° 2 questions for design of ISA: • Since could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address, How do byte addresses map onto words? • Can a word be placed on any byte boundary?
Addressing Objects: Endianess and Alignment • Big Endian: address of most significant byte = word address (xx00 = Big End of word) • IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA • Little Endian: address of least significant byte = word address(xx00 = Little End of word) • Intel 80x86, DEC Vax, DEC Alpha (Windows NT) little endian byte 0 3 2 1 0 lsb msb 0 1 2 3 0 1 2 3 Aligned big endian byte 0 Alignment: require that objects fall on address that is multiple of their size. Not Aligned
Addressing Modes Meaning Addressing mode Example R4 ¬ R4+R3 Register Add R4,R3 R4 ¬ R4+3 Immediate Add R4,#3 R4 ¬ R4+Mem[100+R1] Displacement Add R4,100(R1) R4 ¬ R4+Mem[R1] Register indirect Add R4,(R1) R3 ¬ R3+Mem[R1+R2] Indexed / Base Add R3,(R1+R2) R1 ¬ R1+Mem[1001] Direct or absolute Add R1,(1001) R1 ¬ R1+Mem[Mem[R3]] Memory indirect Add R1,@(R3) R1 ¬ R1+Mem[R2]; R2 ¬ R2+d Post-increment Add R1,(R2)+ R2 ¬ R2–d; R1 ¬ R1+Mem[R2] Pre-decrement Add R1,–(R2) R1 ¬ R1+Mem[100+R2+R3*d] Scaled Add R1,100(R2)[R3]
Addressing Mode Usage? (ignore register mode) 3 programs measured on machine with all address modes (VAX) --- Displacement: 42% avg, 32% to 55% 75% --- Immediate: 33% avg, 17% to 43% 85% --- Register deferred (indirect): 13% avg, 3% to 24% --- Scaled: 7% avg, 0% to 16% --- Memory indirect: 3% avg, 1% to 6% --- Misc: 2% avg, 0% to 3% 75% displacement & immediate 85% displacement, immediate & register indirect
Generic Examples of Instruction Format Widths … Variable: Fixed: Hybrid: …
Instruction Formats • • If code size is most important, use variable length instructions • If performance is most important, use fixed length instructions • Recent embedded machines (ARM, MIPS) added optional mode to execute subset of 16-bit wide instructions (Thumb, MIPS16); per procedure decide performance or density • Some architectures actually exploring on-the-fly decompression for more density.
register register MIPS Addressing Modes/Instruction Formats • All instructions 32 bits wide Register (direct) op rs rt rd Immediate immed op rs rt Base+index immed op rs rt Memory + PC-relative immed op rs rt Memory + PC • Register Indirect?
Typical Operations (little change since 1960) Load (from memory) Store (to memory) memory-to-memory move register-to-register move input (from I/O device) output (to I/O device) push, pop (to/from stack) Data Movement integer (binary + decimal) or FP Add, Subtract, Multiply, Divide Arithmetic shift left/right, rotate left/right Shift not, and, or, set, clear Logical unconditional, conditional Control (Jump/Branch) call, return Subroutine Linkage trap, return Interrupt test & set (atomic r-m-w) Synchronization search, translate String parallel subword ops (4 16bit add) Graphics (MMX)
Use of Registers • Ex: • a = ( b + c) - ( d + e) ; // C statement • add $t0, $s1, $s2 # $s0 - $s4 : a - e • add $t1, $s3, 4s4 • sub $s0, $t0, $t1 • a = b + A[4]; // add an array element to a var // $s3 has address A • lw $t0, 16($s3) • add $s1, $s2, $t0
Operation Summary Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return;
Use of Registers : load and store • Ex • A[8]= a + A[6] ; // A is in $s3, a is in $s2 lw $t0, 24($s3) # $t0 gets A[6] contents add $t0, $s2, $s0 # $t0 gets the sum sw $t0, 32($s3) # sum is put in A[8]
load and store Ex: a = b + A[i]; // A is in $s3, a,b, i in // $s1, $s2, $s4 add $t1, $t1, $s3 # $t1 = 2 * i add $t1, $t1, $t1 # $t1 = 4 * i add $t1, $t1, $s3 # $t1 =addr. of A[i] ($s3+(4*i)) lw $t0, 0($t1) # $t0 = A[i] add $s1, $s2, $t0 # a = b + A[i]
Making Decisions • Ex if ( a==b) goto L1 // x,y,z,a,b map to $s0-$s4 x = y + z; L1 : x = x – a; beq $s3, $s4, L1 # goto L1 if a=b add $s0, $s1, $s2 # x = y + z (ignore ifa=b) L1:sub $s0, $s0, $s3 # x = x – a (always ex)
if-then-else • Ex: if ( a==b) x = y + z; else x = y – z ; bne $s3, $s4, Else # goto Else if a!=b add $s0, $s1, $s2 # x = y + z j Exit # goto Exit Else : sub $s0,$s1,$s2 # x = y – z Exit :
Loops • Ex: while ( A[i] == k ) // i,j,k in $s3. $s4, $s5 i = i + j; // A is in $s6 Loop : add $t1, $s3, $s3 # $t1 = 2 * i add $t1, $t1, $t1 # $t1 = 4 * I add $t1, $t1, $s6 # $t1 = addr. Of A[i] lw $t0, 0($t1) # $t0 = A[i] bne $t0, $s5, Exit # goto Exit if A[i]!=k add $s3, $s3, $s4 # i = i + j j Loop # goto Loop Exit :
Compiling switch switch(a) { • case 0 : f = I + j; break; • case 1 : f = g + h; break; • case 2 : f = g – h; break; • case 3 : f = i – j; break; } slt $t3, $s5, $zero # is k < 0 bne $t3, $zero, Exit # if k < 0, goto Exit slt $t3, $s5, $t2 # is k < 4 beq $t3, $zero, Exit # if k >=4 goto Exit
Switch cont. add $t1, $s5, $s5 # $t1 = 2 * i add $t1, $t1, $t1 # $t1 = 4 * i add $t1, $t1, $t4 # $t1 = addr. Of table[k] lw $t0, 0($t1) # $t0 = table[k] jr $t0 # jump to addr. In $t0 L0 : add $s0, $s3, $s4 # f = i + j j Exit L1 : add $s0, $s1, $s2 # f = g + h j Exit L2 : sub $s0, $s1, $s2 # f = g – h j Exit L3 : sub $s0, $s1, $s2 # f = i – j Exit :
Calling conventions int func(int g , int h, int i, int j) { int f; f = ( g + h ) – ( i + j ) ; return ( f ); } // g,h,i,j - $a0,$a1,$a2,$a3, f in $s0 func : sub $sp, $sp, 12 # make room in stack for 3 words sw $t1, 8($sp) # save the regs we want to use sw $t0, 4($sp) sw $s0, 0($sp) add $t0, $a0, $a1 # $t0 = g + h add $t1, $a2, $a3 # $t1 = i + j sub $s0, $t0, $t1 # $s0 has the result add $v0, $s0, $zero # return reg $v0 has f
Calling (cont.) lw $s0, 0($sp) # restore $s0 lw $t0, 4($sp) # restore $t0 lw $t1, 8($sp) # restore $t1 add $sp, $sp, 12 # restore sp • we did not have to restore $t0-$t9 (caller does not expect) • we do need to restore $s0-$s7 (must be preserved)
Compiling a String Copy Proc. void strcpy ( char x[ ], y[ ]) { int i=0; while ( (x[ i ] = y [ i ] != 0) i ++ ; } // x and y base addr. are in $a0 and $a1 strcpy : sub $sp, $sp, 4 # reserve 1 word space in stack sw $s0, 0($sp) # save $s0 add $s0, $zer0, $zer0 # i = 0 L1 : add $t1, $a1, $s0 # addr. of y[ i ] in $t1 lb $t2, 0($t1) # $t2 = y[ i ] add $t3, $a0, $s0 # addr. Of x[ i ] in $t3 sb $t2, 0($t3) # x[ i ] = y [ i ] beq $t2, $zero, L2 # if y [ i ] = 0 goto L1
string copy (cont.) add $s0, $s0, 1 # i ++ j L1 # go to L1 L2 : lw $s0, 0($sp) # restore $s0 add $sp, $sp, 4 # restore $sp jr $ra # return • it may be handier to use immediate operands sometimes • To add an integer in location 2000 to $sp lw $t0, 2000($zero) # $t0 = (2000) Add $sp, $sp, $t0 # $sp = $sp + (2000) Use : addi $sp, $sp, 4 # same effect # avoids memory access
Array vs. Pointers Set1 ( int array [ ] , int size) { int i ; for ( i = 0; i < size; i ++) array [ i ] = i; } Set2 ( int *array , int size) { int *p, i=0 ; for ( p = &array[0]; p < &array[size]; p ++) *p = i ++; }
Arrays vs. Pointers : MIPS for array version // $a0 = addr. of array, $a1 has size, $t0 = i move $t0, $zero # i = 0 Loop1: add $t1, $t0, $t0 # $t1 = 2 * i add $t1, $t1, $t1 # $t1 = 4 * i add $t2, $a0, $t1 # $t2 = addr. of array sw $t0, 0($t2) # array[i] = i addi $t0, $t0,1 # i ++ slt $t3, $t0, $a1 # check end of loop (i <size) bne $t3, $zero, Loop1 # if i < size goto Loop1
Arrays vs. Pointers : MIPS for pointer version // $a0 = addr. of array, $a1 has size, $t0 = p move $t0, $a0 # p = addr. of array[0] move $t4, $zero # $t4 = i add $t1, $a1, $a1 # $t1 = 2 * size add $t1, $t1, $t1 # $t1 = 4 * size add $t2, $a0, $t1 # $t2 = addr. of array[size] Loop2: sw $t4, 0($t0) # store i in *p addi $t0, $t0, 4 # p = p + 4 addi $t4, $t4,1 # i ++ slt $t3, $t0, $t2 # $t3=(p<&array[size]) bne $t3, $zero, Loop2 # if p < last addr. Go Loop1
MIPS I Operation Overview • Arithmetic Logical: • Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU • AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI • SLL, SRL, SRA, SLLV, SRLV, SRAV • Memory Access: • LB, LBU, LH, LHU, LW, LWL,LWR • SB, SH, SW, SWL, SWR
Multiply / Divide • Start multiply, divide • MULT rs, rt • MULTU rs, rt • DIV rs, rt • DIVU rs, rt • Move result from multiply, divide • MFHI rd • MFLO rd • Move to HI or LO • MTHI rd • MTLO rd Registers HI LO
Data Types Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word Character: ASCII 7 bit code UNICODE 16 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: 2's Complement Floating Point: Single Precision Double Precision Extended Precision How many +/- #'s? Where is decimal pt? How are +/- exponents represented? exponent E M x R base mantissa
Operand Size Usage • Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
MIPS arithmetic instructions Instruction Example Meaning Comments add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no exceptions multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder Hi = $2 mod $3 divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder Hi = $2 mod $3 Move from Hi mfhi $1 $1 = Hi Used to get copy of Hi Move from Lo mflo $1 $1 = Lo Used to get copy of Lo
MIPS logical instructions Instruction Example Meaning Comment and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR xor xor $1,$2,$3 $1 = $2 Å $3 3 reg. operands; Logical XOR nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend) shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable
MIPS data transfer instructions Instruction Comment SW 500(R4), R3 Store word SH 502(R2), R3 Store half SB 41(R3), R2 Store byte LW R1, 30(R2) Load word LH R1, 40(R3) Load halfword LHU R1, 40(R3) Load halfword unsigned LB R1, 40(R3) Load byte LBU R1, 40(R3) Load byte unsigned LUI R1, 40 Load Upper Immediate (16 bits shifted left by 16) LUI R5 R5 0000 … 0000
When does MIPS sign extend? • When value is sign extended, copy upper bit to full value:Examples of sign extending 8 bits to 16 bits: 00001010 00000000 00001010 10001100 11111111 10001100 • When is an immediate value sign extended? • Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions! • Logical instructions do not sign extend • Load/Store half or byte do sign extend, but unsigned versions do not.
Methods of Testing Condition • Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. ex: add r1, r2, r3 bz label • Condition Register Ex: cmp r1, r2, r3 bgt r1, label • Compare and Branch Ex: bgt r1, r2, label
Conditional Branch Distance • 25% of integer branches are 2 to 4 instructions
Conditional Branch Addressing • PC-relative since most branches are relatively close to the current PC • At least 8 bits suggested (128 instructions) • Compare Equal/Not Equal most important for integer programs (86%)
MIPS Compare and Branch • Compare and Branch • BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch • BNE rs, rt, offset <> • Compare to zero and Branch • BLEZ rs, offset if R[rs] <= 0 then PC-relative branch • BGTZ rs, offset > • BLT < • BGEZ >= • BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31) • BGEZAL >=! • Remaining set of compare and branch ops take two instructions • Almost all comparisons are against zero!
MIPS jump, branch, compare instructions Instruction Example Meaning branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2’s comp. set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2’s comp. set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; natural numbers set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; natural numbers jump j 10000 go to 10000 Jump to target address jump register jr $31 go to $31 For switch, procedure return jump and link jal 10000 $31 = PC + 4; go to 10000 For procedure call
R1= 0…00 0000 0000 0000 0001 R2= 0…00 0000 0000 0000 0010 R3= 1…11 1111 1111 1111 1111 After executing these instructions: slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0 What are values of registers r4 - r7? Why? r4 = ; r5 = ; r6 = ; r7 = ; Signed vs. Unsigned Comparison two two two
Calls: Why Are Stacks So Great? Stacking of Subroutine Calls & Returns and Environments: A A: CALL B CALL C C: RET RET A B B: A B C A B A Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS)
Memory Stacks Useful for stacked environments/subroutine call & return even if operand stack not part of architecture Stacks that Grow Up vs. Stacks that Grow Down: 0 Little inf. Big Next Empty? Memory Addresses grows up grows down c b Last Full? a SP inf. Big 0 Little How is empty stack represented? Little --> Big/Next Empty POP: Decrement SP Read from Mem(SP) PUSH: Write to Mem(SP) Increment SP Little --> Big/Last Full POP: Read from Mem(SP) Decrement SP PUSH: Increment SP Write to Mem(SP)