CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Chapter 2. Instructions: Language of the Computer
Zhao Zhang, Iowa State University
Revised from original slides provided by MKP

Review of Week 4
• MIPS procedure/function call convention
• Leaf and non-leaf examples
• Clearing array example
• String copy example
• Other issues:
  • Load 32-bit immediate
  • Assembler, loader, and compiler effects
§2.8 Supporting Procedures in Computer Hardware

Announcements
• Exam 1 on Friday Oct. 4
• Course review on Wednesday Oct. 2
• HW4 is due on Sep. 27
• HW5 will be due on Oct. 11
  • Do HW5 as exercise before Exam 1
• No HW and quizzes next week
• Lab 2 demo is due this week and Lab 3 demo is due next week
• Lab 4 starts next week, due in one week

Exam 1
• Open book and open notes; calculators are allowed
• E-book readers are allowed
  • Must be put in airplane mode
• Coverage
  • Chapter 1, Computer Abstractions and Technology
  • Chapter 2, Instructions: Language of the Computer
  • Some contents from Appendix B
  • MIPS floating-point instructions

Exam Question Types
• Short conceptual questions
• Calculation: speedup, power saving, CPI, etc.
• MIPS assembly programming
  • Translate C statements to MIPS (arithmetic, load/store, branch and jump, others)
  • Translate C functions to MIPS (call convention)
• Among others
Suggestions:
• Review slides and textbook
• Review homework and quizzes

Overview for Week 5, Sep. 23 - 27
• Bubble sorting example
  • It will be used in Mini-Projects
• Floating point instructions
• ARM and x86 instruction set overview

Classic Bubble Sorting
• Bubble sort: swap two adjacent elements if they are out of order
• Pass over the array n times; each time the largest remaining element floats to the top
• Look at the first pass over five elements
    1st try: 5 3 8 2 7 => 3 5 8 2 7
    2nd try: 3 5 8 2 7 => 3 5 8 2 7
    3rd try: 3 5 8 2 7 => 3 5 2 8 7
    4th try: 3 5 2 8 7 => 3 5 2 7 8

Classic Bubble Sorting
• Pass i only has to check (n-i) adjacent pairs for swaps
• In each pass, an element may float up until it meets a larger element
• The sorted sub-array grows by one element per pass
    1st pass: 5 3 8 2 7 => 3 5 2 7 8
    2nd pass: 3 5 2 7 8 => 3 2 5 7 8
    3rd pass: 3 2 5 7 8 => 2 3 5 7 8
    4th pass: 2 3 5 7 8 => 2 3 5 7 8

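The pass-by-pass trace above can be reproduced with a short C sketch. The slides give no C code for the classic version, so the function names `bubble_pass` and `bubble_sort_classic` and the 1-based pass numbering are assumptions made for illustration only.

```c
#include <stddef.h>

/* One pass of classic bubble sort (pass numbers start at 1, an
   assumption of this sketch). It checks n - pass adjacent pairs and
   swaps each pair that is out of order, so the largest element of the
   unsorted part ends up at index n - pass. */
void bubble_pass(int v[], size_t n, size_t pass)
{
    for (size_t j = 0; j + pass < n; j++) {
        if (v[j] > v[j + 1]) {
            int temp = v[j];
            v[j] = v[j + 1];
            v[j + 1] = temp;
        }
    }
}

/* Full sort: n - 1 passes are enough, because the last remaining
   element is already in place. */
void bubble_sort_classic(int v[], size_t n)
{
    for (size_t pass = 1; pass < n; pass++)
        bubble_pass(v, n, pass);
}
```

Calling `bubble_pass(a, 5, 1)` on the array 5 3 8 2 7 leaves 3 5 2 7 8, matching the first pass shown on the slide.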
Revised Bubble Sorting
• The textbook bubble sort is optimized to reduce comparisons
    void sort(int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i++) {
            for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
                swap(v, j);
        }
    }

Revised Bubble Sorting
• The classic version lets the largest element float to the top of the unsorted sub-array
• The revised version lets an element sink to its correct place in the sorted sub-array
    1st pass: 5 3 8 2 7 => 3 5 8 2 7
    2nd pass: 3 5 8 2 7 => 3 5 8 2 7
    3rd pass: 3 5 8 2 7 => 2 3 5 8 7
    4th pass: 2 3 5 8 7 => 2 3 5 7 8

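To check the revised trace, the outer-loop body of the textbook sort can be pulled out into its own function. This is only a sketch: the helper name `sink_pass` and the name `sort_revised` are hypothetical, and pass k on the slide is assumed to correspond to outer-loop index i = k.

```c
/* Exchange v[k] and v[k+1], as the slides' swap() does. */
static void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* One outer-loop iteration of the revised sort: element v[i] moves
   left through the already sorted prefix v[0..i-1] until the element
   before it is not larger. */
void sink_pass(int v[], int i)
{
    for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j--)
        swap(v, j);
}

/* The revised sort is then one sink_pass per index. */
void sort_revised(int v[], int n)
{
    for (int i = 0; i < n; i++)
        sink_pass(v, i);
}
```

The inner loop stops at the first pair that is already in order, which is where the saving in comparisons comes from.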
The Swap Function
• The swap function is a leaf function
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
• v in $a0, k in $a1, temp in $t0
§2.13 A C Sort Example to Put It All Together

The Swap Function
    swap:
        sll  $t1, $a1, 2      # $t1 = k * 4
        add  $t1, $a0, $t1    # $t1 = v + (k * 4) (address of v[k])
        lw   $t0, 0($t1)      # $t0 (temp) = v[k]
        lw   $t2, 4($t1)      # $t2 = v[k+1]
        sw   $t2, 0($t1)      # v[k] = $t2 (v[k+1])
        sw   $t0, 4($t1)      # v[k+1] = $t0 (temp)
        jr   $ra              # return to calling routine

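The first two instructions of swap compute a byte address from an element index. A minimal C sketch of that arithmetic follows; the function name `element_address` is hypothetical, and 4-byte array elements are assumed, as in the MIPS code.

```c
#include <stdint.h>

/* Byte address of v[k] for 4-byte elements, mirroring
       sll $t1, $a1, 2      (k * 4)
       add $t1, $a0, $t1    (v + k * 4)
   base is the address of v[0]. */
uintptr_t element_address(uintptr_t base, uint32_t k)
{
    return base + ((uintptr_t)k << 2);
}
```

With a base address of 0x10010000 and k = 3, the result is 0x1001000C, so v[k] sits at offset 0 and v[k+1] at offset 4 from the computed address.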
The Sort Function
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
• Save $ra to the stack, as sort is a non-leaf function
• Assign i and j to $s0 and $s1
  • They must be preserved when calling swap()
• Move v and n from $a0 and $a1 to $s2 and $s3
  • They must be preserved, too
  • $a0 and $a1 are used when calling swap()
• We need a stack frame of 5 words, or 20 bytes

Sort Prologue and Epilogue
    sort:
        addi $sp, $sp, -20    # make room on stack for 5 registers
        sw   $ra, 16($sp)     # save $ra on stack
        sw   $s3, 12($sp)     # save $s3 on stack
        sw   $s2, 8($sp)      # save $s2 on stack
        sw   $s1, 4($sp)      # save $s1 on stack
        sw   $s0, 0($sp)      # save $s0 on stack
        …                     # procedure body
    exit1:
        lw   $s0, 0($sp)      # restore $s0 from stack
        lw   $s1, 4($sp)      # restore $s1 from stack
        lw   $s2, 8($sp)      # restore $s2 from stack
        lw   $s3, 12($sp)     # restore $s3 from stack
        lw   $ra, 16($sp)     # restore $ra from stack
        addi $sp, $sp, 20     # restore stack pointer
        jr   $ra              # return to calling routine
• Entry: get a frame, save $ra and $s3-$s0
• Exit: restore $s0-$s3 and $ra, free the frame

Sort Function Body
A new pseudo-instruction
    move rd, rs
is equivalent to
    add rd, rs, $zero
Example
    move $s2, $a0    # $s2 = $a0
    move $s3, $a1    # $s3 = $a1
Pseudo-instructions may not be used in Exam 1

Sort Function Body
        # Move params
        move $s2, $a0           # save $a0 into $s2
        move $s3, $a1           # save $a1 into $s3
        # Outer loop
        move $s0, $zero         # i = 0
    for1tst:
        slt  $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
        beq  $t0, $zero, exit1  # go to exit1 if $s0 ≥ $s3 (i ≥ n)
        # Inner loop
        addi $s1, $s0, -1       # j = i - 1
    for2tst:
        slti $t0, $s1, 0        # $t0 = 1 if $s1 < 0 (j < 0)
        bne  $t0, $zero, exit2  # go to exit2 if $s1 < 0 (j < 0)
        sll  $t1, $s1, 2        # $t1 = j * 4
        add  $t2, $s2, $t1      # $t2 = v + (j * 4)
        lw   $t3, 0($t2)        # $t3 = v[j]
        lw   $t4, 4($t2)        # $t4 = v[j + 1]
        slt  $t0, $t4, $t3      # $t0 = 0 if $t4 ≥ $t3
        beq  $t0, $zero, exit2  # go to exit2 if $t4 ≥ $t3
        # Pass params & call
        move $a0, $s2           # 1st param of swap is v (old $a0)
        move $a1, $s1           # 2nd param of swap is j
        jal  swap               # call swap procedure
        # Inner loop
        addi $s1, $s1, -1       # j -= 1
        j    for2tst            # jump to test of inner loop
        # Outer loop
    exit2:
        addi $s0, $s0, 1        # i += 1
        j    for1tst            # jump to test of outer loop

Sort Function Optimized
Old version:
    void sort(int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i++) {
            for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
                swap(v, j);
        }
    }
New version:
    void sort(int v[], int n)
    {
        int *pi, *pj;
        for (pi = v; pi < &v[n]; pi++)
            for (pj = pi - 1; pj >= v && swap(pj); pj--) {}
    }

New Swap Function
• A more efficient swap function that reduces memory loads
    // Swap two adjacent elements if they are out of order.
    // Return 1 if swapped, 0 otherwise.
    int swap(int *p)
    {
        if (p[0] > p[1]) {
            int tmp = p[0];
            p[0] = p[1];
            p[1] = tmp;
            return 1;
        } else
            return 0;
    }

New Swap Function
• A new swap function
    swap:
        lw   $t0, 0($a0)       # load p[0]
        lw   $t1, 4($a0)       # load p[1]
        slt  $t2, $t1, $t0     # p[1] < p[0]?
        beq  $t2, $zero, else  # no: nothing to swap
        sw   $t1, 0($a0)       # swap
        sw   $t0, 4($a0)       # swap
        addi $v0, $zero, 1     # $v0 = 1
        jr   $ra
    else:
        addi $v0, $zero, 0     # $v0 = 0
        jr   $ra

New Sort Function
The sort() function optimized
• Register usage
  • $s0: v
  • $s1: &v[n]
  • $s2: pi
  • $s3: pj
• Need a frame of 5 words to save $ra and $s0-$s3

Sort Prologue and Epilogue
    sort:
        addi $sp, $sp, -20    # frame of 5 words
        sw   $ra, 16($sp)
        sw   $s3, 12($sp)
        sw   $s2, 8($sp)
        sw   $s1, 4($sp)
        sw   $s0, 0($sp)
        …                     # MIPS code for sort function body
        lw   $s0, 0($sp)
        lw   $s1, 4($sp)
        lw   $s2, 8($sp)
        lw   $s3, 12($sp)
        lw   $ra, 16($sp)
        addi $sp, $sp, 20     # release frame
        jr   $ra

New Sort: Outer Loop
    for (pi = v; pi < &v[n]; pi++)
        for (pj = pi - 1; pj >= v && swap(pj); pj--) {}    // inner loop

        add  $s0, $a0, $zero        # $s0 = v
        sll  $a1, $a1, 2            # $a1 = 4*n
        add  $s1, $s0, $a1          # $s1 = &v[n]
        add  $s2, $s0, $zero        # pi = v
        j    for1_tst
    for1_loop:
        …                           # MIPS code for the inner loop
        addi $s2, $s2, 4            # pi++
    for1_tst:
        slt  $t0, $s2, $s1          # pi < &v[n]?
        bne  $t0, $zero, for1_loop  # yes? repeat

New Sort: Inner Loop
    for (pj = pi - 1; pj >= v && swap(pj); pj--) {}

        addi $s3, $s2, -4           # pj = pi - 1
        j    for2_tst
    for2_loop:
        addi $s3, $s3, -4           # pj--
    for2_tst:
        slt  $t0, $s3, $s0          # pj < v?
        bne  $t0, $zero, for2_exit  # yes? exit
        add  $a0, $s3, $zero        # $a0 = pj
        jal  swap                   # swap(pj)
        bne  $v0, $zero, for2_loop  # returned 1? continue
    for2_exit:

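The pointer-based sort and the conditional swap can be tried out together in C. The sketch below is a variant, not the slide code verbatim: the inner pointer tracks the upper element of the pair so that no pointer before `v` is ever formed, which ISO C leaves undefined. The names `swap_if_unordered` and `sort_ptr` are hypothetical.

```c
/* Swap p[0] and p[1] if they are out of order.
   Returns 1 if a swap happened, 0 otherwise (same contract as the
   slides' new swap function). */
int swap_if_unordered(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    return 0;
}

/* Pointer version of the revised sort. pj points at the upper element
   of the pair being checked, so the pair is (pj[-1], pj[0]); this
   visits the same pairs as the slide's pj = pi - 1 loop. */
void sort_ptr(int v[], int n)
{
    int *end = v + n;   /* plays the role of &v[n] in $s1 */
    for (int *pi = v; pi < end; pi++)
        for (int *pj = pi; pj > v && swap_if_unordered(pj - 1); pj--) {}
}
```

Because the swap function reports whether it swapped, the inner loop needs no separate loads of v[j] and v[j+1] for its test, which is the saving the slides refer to.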
Lab Mini-Projects
• You will use the sorting code to test your CPU design in the lab mini-projects
• Use the new sorting code
  • The new code is more optimized
  • It will simplify the debugging

FP Instructions in MIPS
Reading: Textbook Ch. 3.5 and B-71 to B-80
• FP hardware is coprocessor 1
  • Adjunct processor that extends the ISA
• Separate FP registers
  • 32 single-precision: $f0, $f1, … $f31
  • Paired for double-precision: $f0/$f1, $f2/$f3, …
  • Release 2 of the MIPS ISA supports 32 × 64-bit FP registers

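The register pairing can be pictured by splitting one 64-bit double into two 32-bit words. The C sketch below is an illustration only: the type `fp_pair` and both function names are hypothetical, an IEEE 754 64-bit `double` is assumed, and the low word is assumed to live in the even-numbered register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an even/odd FP register pair such as $f0/$f1. */
typedef struct {
    uint32_t even;   /* assumed to hold the low 32 bits  */
    uint32_t odd;    /* assumed to hold the high 32 bits */
} fp_pair;

/* Split the bit pattern of a double across the two registers. */
fp_pair split_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy the raw bit pattern */
    fp_pair p = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return p;
}

/* Reassemble the double from the pair. */
double join_double(fp_pair p)
{
    uint64_t bits = ((uint64_t)p.odd << 32) | p.even;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For 1.0 the high word is 0x3FF00000 and the low word is 0, and joining the two halves gives back the original value.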
FP Instructions in MIPS
• FP instructions operate only on FP registers
  • Programs generally don't do integer ops on FP data, or vice versa
  • More registers with minimal code-size impact

FP Instructions in MIPS
• FP load and store instructions
  • lwc1, ldc1, swc1, sdc1
  • e.g., ldc1 $f8, 32($sp)
• lwc1, swc1: load/store single-precision
• ldc1, sdc1: load/store double-precision

FP Instructions in MIPS
• Single-precision arithmetic
  • add.s, sub.s, mul.s, div.s
  • e.g., add.s $f0, $f1, $f6
• Double-precision arithmetic
  • add.d, sub.d, mul.d, div.d
  • e.g., mul.d $f4, $f4, $f6

FP Instructions in MIPS
• Single- and double-precision comparison
  • c.xx.s, c.xx.d (xx is eq, lt, le, …)
  • Sets or clears the FP condition-code bit
  • e.g., c.lt.s $f3, $f4
• Branch on FP condition code true or false
  • bc1t, bc1f
  • e.g., bc1t TargetLabel

MIPS Call Convention: FP
• The first two FP parameters are passed in registers
  • 1st parameter in $f12 or $f12:$f13
    • A double-precision parameter takes two registers
  • 2nd FP parameter in $f14 or $f14:$f15
• Extra parameters go on the stack
• $f0 stores a single-precision FP return value
• $f0:$f1 stores a double-precision FP return value
• $f0-$f19 are FP temporary registers
• $f20-$f31 are FP saved temporary registers

FP Example: °F to °C
• C code:
    float f2c (float fahr)
    {
        return ((5.0/9.0) * (fahr - 32.0));
    }
• fahr in $f12, result in $f0
• Assume literals in global memory space, e.g. const5 for 5.0 and const9 for 9.0
  • Can an FP immediate be encoded in MIPS instructions?

FP Example: °F to °C
• Compiled MIPS code:
    f2c:
        lwc1  $f16, const5($gp)
        lwc1  $f18, const9($gp)
        div.s $f16, $f16, $f18
        lwc1  $f18, const32($gp)
        sub.s $f18, $f12, $f18
        mul.s $f0, $f16, $f18
        jr    $ra

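The compiled code evaluates the expression in three single-precision steps. A C sketch that follows the same order is given below; the local names `ratio` and `diff` are assumptions, and `float` arithmetic is used throughout to mirror the `.s` instructions (the C source on the previous slide uses double literals instead).

```c
/* Fahrenheit to Celsius, one C statement per FP instruction.
   The three constants stand in for const5, const9 and const32,
   which the MIPS code loads from memory. */
float f2c(float fahr)
{
    const float const5 = 5.0f;
    const float const9 = 9.0f;
    const float const32 = 32.0f;

    float ratio = const5 / const9;   /* div.s $f16, $f16, $f18 */
    float diff = fahr - const32;     /* sub.s $f18, $f12, $f18 */
    return ratio * diff;             /* mul.s $f0, $f16, $f18  */
}
```

For example, `f2c(32.0f)` is exactly 0, and `f2c(212.0f)` comes out at 100 up to single-precision rounding.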
FP Example: Function Call
    extern float fahr, cel;
    cel = f2c(fahr);
Assume fahr is at 100($gp) and cel is at 104($gp)
        lwc1 $f12, 100($gp)    # load 1st parameter
        jal  f2c
        swc1 $f0, 104($gp)     # save return value

FP Example: Max
    double max(double x, double y)
    {
        return (x > y) ? x : y;
    }

    max:
        c.lt.d $f14, $f12    # y < x?
        bc1f   else          # if false, do else
        mov.d  $f0, $f12     # $f0:$f1 = x
        jr     $ra
    else:
        mov.d  $f0, $f14     # $f0:$f1 = y
        jr     $ra

FP Example: Max
• How to call max?
• Assume a, b, c are at 100($gp), 108($gp), and 116($gp)
    extern double a, b, c;
    c = max(a, b);

        ldc1 $f12, 100($gp)    # $f12:$f13 = a
        ldc1 $f14, 108($gp)    # $f14:$f15 = b
        jal  max
        sdc1 $f0, 116($gp)     # c = $f0:$f1

FP Example: Search Value
    int search(double X[], int size, double value)
    {
        for (int i = 0; i < size; i++)
            if (X[i] == value)
                return 1;
        return 0;
    }
Note 1: There are integer and FP parameters, and the return value is an integer
Note 2: A real program may search for a value in a range, e.g. [value - delta, value + delta]

FP Example: Search Value
    search:
        add    $t0, $zero, $zero     # i = 0
        j      for_cond
    for_loop:
        sll    $t1, $t0, 3           # $t1 = 8*i
        add    $t1, $a0, $t1         # $t1 = &X[i]
        ldc1   $f2, 0($t1)           # $f2:$f3 = X[i] (double)
        c.eq.d $f2, $f12             # X[i] == value?
        bc1f   endif                 # if false, skip
        addi   $v0, $zero, 1         # $v0 = 1
        jr     $ra                   # return
    endif:
        addi   $t0, $t0, 1           # i++
    for_cond:
        slt    $t1, $t0, $a1         # i < size?
        bne    $t1, $zero, for_loop  # repeat if true
        add    $v0, $zero, $zero     # to return 0
        jr     $ra

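The range search mentioned in Note 2 of the C version could look like the sketch below. The function name `search_near` and the extra `delta` parameter are assumptions; the slides only describe the idea.

```c
/* Return 1 if some X[i] lies in [value - delta, value + delta],
   0 otherwise. With delta = 0 this behaves like the exact-match
   search on the slide. */
int search_near(const double X[], int size, double value, double delta)
{
    for (int i = 0; i < size; i++)
        if (X[i] >= value - delta && X[i] <= value + delta)
            return 1;
    return 0;
}
```

A call such as `search_near(X, n, 0.3, 1e-9)` finds an element computed as 0.1 + 0.2, which an exact `==` comparison may miss because of rounding.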
FP Example: Array Multiplication
• X = X + Y × Z
• All 32 × 32 matrices, 64-bit double-precision elements
• C code:
    void mm (double x[][32], double y[][32], double z[][32])
    {
        int i, j, k;
        for (i = 0; i != 32; i = i + 1)
            for (j = 0; j != 32; j = j + 1)
                for (k = 0; k != 32; k = k + 1)
                    x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }
• Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication
• MIPS code:
        li   $t1, 32         # $t1 = 32 (row size/loop end)
        li   $s0, 0          # i = 0; initialize 1st for loop
    L1: li   $s1, 0          # j = 0; restart 2nd for loop
    L2: li   $s2, 0          # k = 0; restart 3rd for loop
        sll  $t2, $s0, 5     # $t2 = i * 32 (size of row of x)
        addu $t2, $t2, $s1   # $t2 = i * size(row) + j
        sll  $t2, $t2, 3     # $t2 = byte offset of [i][j]
        addu $t2, $a0, $t2   # $t2 = byte address of x[i][j]
        l.d  $f4, 0($t2)     # $f4 = 8 bytes of x[i][j]
    L3: sll  $t0, $s2, 5     # $t0 = k * 32 (size of row of z)
        addu $t0, $t0, $s1   # $t0 = k * size(row) + j
        sll  $t0, $t0, 3     # $t0 = byte offset of [k][j]
        addu $t0, $a2, $t0   # $t0 = byte address of z[k][j]
        l.d  $f16, 0($t0)    # $f16 = 8 bytes of z[k][j]
        …

FP Example: Array Multiplication
        …
        sll   $t0, $s0, 5        # $t0 = i * 32 (size of row of y)
        addu  $t0, $t0, $s2      # $t0 = i * size(row) + k
        sll   $t0, $t0, 3        # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0      # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)       # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16   # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16     # $f4 = x[i][j] + y[i][k] * z[k][j]
        addiu $s2, $s2, 1        # k = k + 1
        bne   $s2, $t1, L3       # if (k != 32) go to L3
        s.d   $f4, 0($t2)        # x[i][j] = $f4
        addiu $s1, $s1, 1        # j = j + 1
        bne   $s1, $t1, L2       # if (j != 32) go to L2
        addiu $s0, $s0, 1        # i = i + 1
        bne   $s0, $t1, L1       # if (i != 32) go to L1

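The address arithmetic in the MIPS code (shift by 5 for the row, shift by 3 for the element size) can be checked against a compilable C version. In this sketch the constant `N` and the helper `byte_offset` are assumptions added for illustration; the running sum is kept in a local variable, much as the MIPS code keeps it in $f4.

```c
#include <stddef.h>

enum { N = 32 };   /* matrix dimension, as stated on the slide */

/* Byte offset of element [row][col] in an N x N array of doubles,
   mirroring: sll by 5 (row * 32), addu col, sll by 3 (8-byte doubles). */
size_t byte_offset(size_t row, size_t col)
{
    return ((row << 5) + col) << 3;
}

/* x = x + y * z for N x N matrices of doubles. */
void mm(double x[][N], double y[][N], double z[][N])
{
    for (int i = 0; i != N; i = i + 1)
        for (int j = 0; j != N; j = j + 1) {
            double sum = x[i][j];            /* l.d $f4, 0($t2) */
            for (int k = 0; k != N; k = k + 1)
                sum = sum + y[i][k] * z[k][j];
            x[i][j] = sum;                   /* s.d $f4, 0($t2) */
        }
}
```

As in the assembly, x[i][j] is loaded once before the innermost loop and stored once after it, rather than on every iteration.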
ARM & MIPS Similarities
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS
§2.16 Real Stuff: ARM Instructions

Compare and Branch in ARM
• Uses condition codes for the result of an arithmetic/logical instruction
  • Negative, zero, carry, overflow
  • Compare instructions set condition codes without keeping the result
• Each instruction can be conditional
  • Top 4 bits of instruction word: condition value
  • Can avoid branches over single instructions

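How a compare instruction sets the four condition codes can be modelled in a few lines of C. This is a simplified sketch, not a description of any particular ARM core: the type `flags_t` and the function names are hypothetical, and the carry flag follows the ARM convention for subtraction, where it means "no borrow".

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition codes named on the slide. */
typedef struct {
    bool n;   /* negative: result bit 31 is set    */
    bool z;   /* zero: result is 0                 */
    bool c;   /* carry: no borrow in a - b         */
    bool v;   /* overflow: signed result wrapped   */
} flags_t;

/* Model of a compare: compute a - b, keep only the flags. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a >= b);
    /* Overflow when a and b differ in sign and r's sign differs from a's. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* Conditions a later conditional instruction could test. */
bool cond_eq(flags_t f) { return f.z; }          /* equal            */
bool cond_lt(flags_t f) { return f.n != f.v; }   /* signed less than */
```

A conditional instruction then executes only if its 4-bit condition field matches the flags, which is how a short branch over a single instruction can be avoided.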
Instruction Encoding
[Figure not preserved in this transcript]

The Intel x86 ISA
• Evolution with backward compatibility
  • 8080 (1974): 8-bit microprocessor
    • Accumulator, plus 3 index-register pairs
  • 8086 (1978): 16-bit extension to 8080
    • Complex instruction set (CISC)
  • 8087 (1980): floating-point coprocessor
    • Adds FP instructions and register stack
  • 80286 (1982): 24-bit addresses, MMU
    • Segmented memory mapping and protection
  • 80386 (1985): 32-bit extension (now IA-32)
    • Additional addressing modes and operations
    • Paged memory mapping as well as segments
§2.17 Real Stuff: x86 Instructions

The Intel x86 ISA
• Further evolution…
  • i486 (1989): pipelined, on-chip caches and FPU
    • Compatible competitors: AMD, Cyrix, …
  • Pentium (1993): superscalar, 64-bit datapath
    • Later versions added MMX (Multi-Media eXtension) instructions
    • The infamous FDIV bug
  • Pentium Pro (1995), Pentium II (1997)
    • New microarchitecture (see Colwell, The Pentium Chronicles)
  • Pentium III (1999)
    • Added SSE (Streaming SIMD Extensions) and associated registers
  • Pentium 4 (2001)
    • New microarchitecture
    • Added SSE2 instructions

The Intel x86 ISA
• And further…
  • AMD64 (2003): extended architecture to 64 bits
  • EM64T, Extended Memory 64 Technology (2004)
    • AMD64 adopted by Intel (with refinements)
    • Added SSE3 instructions
  • Intel Core (2006)
    • Added SSE4 instructions, virtual machine support
  • AMD64 (announced 2007): SSE5 instructions
    • Intel declined to follow, instead…
  • Advanced Vector Extension (announced 2008)
    • Longer SSE registers, more instructions
• If Intel didn't extend with compatibility, its competitors would!
  • Technical elegance ≠ market success

Basic x86 Registers
[Figure not preserved in this transcript]

Basic x86 Addressing Modes
• Two operands per instruction
• Memory addressing modes
  • Address in register
  • Address = Rbase + displacement
  • Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
  • Address = Rbase + 2^scale × Rindex + displacement

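The most general addressing mode above covers the others as special cases, which a small C sketch makes concrete. The function name `effective_address` is hypothetical, and 32-bit addresses are assumed.

```c
#include <stdint.h>

/* Address = Rbase + 2^scale * Rindex + displacement.
   scale must be 0, 1, 2, or 3 (element sizes 1, 2, 4, or 8 bytes).
   Setting index and/or disp to 0 gives the simpler modes. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t disp)
{
    return base + (index << scale) + (uint32_t)disp;
}
```

For instance, with base 0x1000, index 3, scale 2 and displacement 8, the address is 0x1000 + 12 + 8 = 0x1014, which is how element 3 of an array of 4-byte values stored 8 bytes into a structure would be reached.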
x86 Instruction Encoding
• Variable length encoding
  • Postfix bytes specify addressing mode
  • Prefix bytes modify operation
    • Operand length, repetition, locking, …

Implementing IA-32
• Complex instruction set makes implementation difficult
  • Hardware translates instructions to simpler microoperations
    • Simple instructions: 1-to-1
    • Complex instructions: 1-to-many
  • Microengine similar to RISC
  • Market share makes this economically viable
• Comparable performance to RISC
  • Compilers avoid complex instructions