CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Exam 1 Review Dr. Zhao Zhang Iowa State University
What We Have Learned
• Ch. 1: Computer Abstraction and Technology
  • Technology Trends
  • CPU Performance
    • Instruction count, CPI, and cycle time
  • Processor power efficiency
  • Processor manufacturing and cost
Chapter 1 — Computer Abstractions and Technology
Question Styles and Coverage
• Short conceptual questions
• Calculation questions
  • Performance improvement (speedup)
  • Power rate and energy saving
  • CPU time, CPI, Instruction Count (IC), Cycle Time (CT)
      CPU time = # Cycles × CT = IC × CPI × CT
      Speedup = Old Time / New Time
• The coverage excludes
  • Manufacturing and cost
Question 1
• A MIPS processor runs at 1.0 GHz, and for a given benchmark program its CPI is 1.5. A design optimization will improve the clock rate to 1.5 GHz and increase the CPI to 1.8. What is the speedup from the optimization?
  • Instruction count remains the same
  • Clock rate change: 1.5/1.0 = 1.5x, so the cycle time improvement factor is 1.50x
  • CPI change: 1.8/1.5 = 1.2x, so the improvement factor is 1/1.2 = 0.83x (degradation)
  • Overall performance improvement is 1.50 × (1/1.2) = 1.25x
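The CPU time formula and the Question 1 calculation can be checked with a small C sketch. The function names and the instruction count used to try it out are illustrative assumptions, not part of the course material.

```c
/* CPU time in seconds: IC x CPI x CT, where CT = 1 / clock rate.
   Names are hypothetical, chosen only for this sketch. */
double cpu_time(double instr_count, double cpi, double clock_hz)
{
    double cycle_time = 1.0 / clock_hz;     /* CT, in seconds */
    return instr_count * cpi * cycle_time;  /* IC x CPI x CT */
}

/* Speedup = Old Time / New Time; a result below 1 is a slowdown. */
double speedup(double old_time, double new_time)
{
    return old_time / new_time;
}
```

With any fixed instruction count (say 1e9), cpu_time(1e9, 1.5, 1.0e9) against cpu_time(1e9, 1.8, 1.5e9) reproduces the 1.25x answer, because the instruction count cancels in the ratio.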
Question 2
A processor spends 60% of its time on load/store instructions. A new design improves load/store performance by 2.0 times. What is the overall performance improvement?
Amdahl’s Law: Speedup = 1/((1-f) + f/s)
  f: Fraction of time that the optimization applies to
  s: The improvement factor of the optimization
Speedup = 1/(0.4 + 0.6/2.0) = 1/0.7 = 1.43
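Amdahl's Law as stated above is a one-line function. This is a minimal sketch; the function name is an assumption made for illustration.

```c
/* Amdahl's Law: overall speedup when a fraction f of the original
   execution time is improved by a factor s. */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

amdahl_speedup(0.6, 2.0) gives about 1.43, matching Question 2. Letting s grow very large shows the ceiling: with f = 0.6 the speedup can never exceed 1/(1 - 0.6) = 2.5, no matter how fast loads and stores become.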
What We Have Learned
• Ch. 2, Instructions: Language of the Computer
  • Instruction set architecture
  • MIPS binary instruction format
  • Plus floating-point instructions
Question 3
Translate the following C statement into MIPS. Variables f, g, h are global and located at 100($gp), 104($gp) and 108($gp), respectively.
    extern int f, g, h;
    f = g + 4 * h;
Try to predict how many instructions you will need.
Question 3
    # Load g, load h, multiply, add, store
    lw   $t0, 104($gp)     # load g
    lw   $t1, 108($gp)     # load h
    sll  $t1, $t1, 2       # 4*h
    add  $t0, $t0, $t1     # g+4*h
    sw   $t0, 100($gp)     # store f
Exam Strategy
In your exam, write comments with the MIPS code
• It helps you write the code
• It helps the grader understand your code
• You may get more partial credit
  • In case your code is not 100% correct
Load and Store
• Three factors: address, size and extension
• Load/store word: lw, sw
• Half word: lh, lhu, sh
• Byte: lb, lbu, sb
• Choose sign extension or zero extension when loading a half word or a byte
• Floating-point load and store
  • Single precision: lwc1, swc1
  • Double precision: ldc1, sdc1
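The difference between sign extension (lb) and zero extension (lbu) can be sketched in C. The helper names are hypothetical, and the code models only the value that ends up in the 32-bit register, not the memory access itself.

```c
#include <stdint.h>

/* Models lb: read one byte and sign-extend it to 32 bits. */
int32_t load_byte_signed(const uint8_t *addr)
{
    int32_t v = *addr;                   /* 0..255 */
    return (v & 0x80) ? v - 0x100 : v;   /* bit 7 set: value is negative */
}

/* Models lbu: read one byte and zero-extend it to 32 bits. */
uint32_t load_byte_unsigned(const uint8_t *addr)
{
    return (uint32_t)*addr;              /* upper 24 bits are zero */
}
```

For a byte holding 0xF0, the signed load yields -16 and the unsigned load yields 240. For 0x7F both yield 127, since bit 7 is clear. The same choice applies to lh versus lhu with bit 15.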
Array Access
• Load from an array element
    extern unsigned short X[];
    h = X[i];
Assume h in $s2, X in $s0, i in $s1.
    sll  $t0, $s1, 1       # $t0=i*2
    add  $t0, $s0, $t0     # $t0=&X[i]
    lhu  $s2, 0($t0)       # h=X[i]
Array Access
• Store to an array element
    extern int Y[];
    Y[j] = g;
Assume g in $s2, Y in $s0, j in $s1.
    sll  $t0, $s1, 2       # $t0=j*4
    add  $t0, $s0, $t0     # $t0=&Y[j]
    sw   $s2, 0($t0)       # Y[j]=g
Array Access
• Load and store floating point numbers
    extern double X[], Y[];
    Y[i] = X[i];
Assume i in $s0, X in $a0, Y in $a1
    sll  $t0, $s0, 3       # $t0=8*i (byte offset, kept for both arrays)
    add  $t1, $a0, $t0     # $t1=&X[i]
    ldc1 $f0, 0($t1)       # $f0:$f1=X[i]
    add  $t1, $a1, $t0     # $t1=&Y[i]
    sdc1 $f0, 0($t1)       # Y[i]=$f0:$f1
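All three array examples follow the same address rule: element address = base + index × element size, with the multiply done by a left shift. The sketch below restates that rule in C. The function name is an assumption, and it presumes the usual sizes of 2, 4, and 8 bytes for short, int, and double.

```c
#include <stddef.h>

/* Byte address of element i: base + (i << shift), where shift is
   log2(element size): 1 for short, 2 for int, 3 for double.
   Mirrors the sll/add pair in the MIPS code. */
const void *element_address(const void *base, size_t i, unsigned shift)
{
    return (const char *)base + (i << shift);
}
```

For a double array X, element_address(X, 3, 3) is the same address as &X[3]. Because the shifted offset depends only on i and the element size, one offset can be reused for two arrays of the same type.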
16-bit and 32-bit Constants
• Load a 16-bit immediate
    f = 0x1000;            // f in $s0
    addi $s0, $zero, 0x1000
• Load a 32-bit immediate
    f = 0xFFFF1000;
    lui  $s0, 0xFFFF
    ori  $s0, $s0, 0x1000
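The lui/ori pair can be modeled in C to see how the two 16-bit halves combine. This is a sketch of the register values only; the function names simply reuse the instruction mnemonics, and load_const32 is a hypothetical helper.

```c
#include <stdint.h>

/* lui: put a 16-bit immediate in the upper half, lower half zero. */
uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* ori: OR a register with a zero-extended 16-bit immediate. */
uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* Build any 32-bit constant from its upper and lower halves. */
uint32_t load_const32(uint32_t value)
{
    uint32_t reg = lui((uint16_t)(value >> 16));
    return ori(reg, (uint16_t)(value & 0xFFFFu));
}
```

ori(lui(0xFFFF), 0x1000) yields 0xFFFF1000. ori is used for the lower half because it zero-extends its immediate, whereas addi sign-extends and would disturb the upper half whenever bit 15 of the lower half is set.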
Pointer Access
• Pointer access
    int h, *p;
Assume h in $t0, p in $s0.
    h = *p;
    lw   $t0, 0($s0)       # h = *p
    *p = h;
    sw   $t0, 0($s0)       # *p = h
Branches
• Only two branches in the original MIPS
    beq  rs, rt, label
    bne  rs, rt, label
• Branch if true/non-zero
    bne  rs, $zero, label
• Branch if false/zero
    beq  rs, $zero, label
If-else Statement
• Evaluate condition, branch if false
    if (a < 0) a = -a;
Assume a in $s0
    slt  $t0, $s0, $zero       # a < 0?
    beq  $t0, $zero, endif     # false? skip
    sub  $s0, $zero, $s0       # a = -a
endif:
If-else Structure
• Evaluate condition, branch if false
    if (a > b) max = a; else max = b;
Assume max in $s2, a in $s0, b in $s1
    slt  $t0, $s1, $s0         # b < a
    beq  $t0, $zero, else      # false?
    add  $s2, $s0, $zero       # max = a
    j    endif
else:
    add  $s2, $s1, $zero       # max = b
endif:
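The "evaluate condition, branch if false" pattern can be written in C with goto so that each statement lines up with one MIPS instruction. This is only an illustrative lowering; max_lowered and its labels are hypothetical names.

```c
/* if (a > b) max = a; else max = b;
   lowered as in the MIPS code: compute the condition into a
   temporary, then branch to the else part when it is false. */
int max_lowered(int a, int b)
{
    int max;
    int t0 = (b < a);             /* slt $t0, $s1, $s0 */
    if (t0 == 0) goto else_part;  /* beq $t0, $zero, else */
    max = a;                      /* add $s2, $s0, $zero */
    goto endif;                   /* j endif */
else_part:
    max = b;                      /* add $s2, $s1, $zero */
endif:
    return max;
}
```

The unconditional jump after the "then" part is what keeps execution from falling through into the "else" part.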
FOR Loop
[Figure, two panels]
• Control and Data Flow Graph: Init-expr, Test cond (T: For-body, then Incr-expr, then back to Test cond; F: exit)
• Linear Code Layout: Init-expr, Jump, For-body, Incr-expr, Cond, Branch if true
(Optional: prologue and epilogue)
Function with For-loop
Translate the following C function into MIPS
    short checksum(short X[], int N)
    {
        int i;
        short checksum = 0;
        for (i = 0; i < N; i++)
            checksum = checksum ^ X[i];
        return checksum;
    }
Function with For-loop
checksum:
    # X=>$a0, N=>$a1, i=>$t0,
    # checksum=>$v0
    addi $v0, $zero, 0         # checksum = 0
    addi $t0, $zero, 0         # i = 0
    j    loop_cond
loop:
    sll  $t1, $t0, 1           # i*2
    add  $t1, $a0, $t1         # &X[i]
    lh   $t1, 0($t1)           # load X[i]
    xor  $v0, $v0, $t1         # checksum ^= X[i]
    addi $t0, $t0, 1           # i++
loop_cond:
    slt  $t1, $t0, $a1         # i < N
    bne  $t1, $zero, loop      # loop
    jr   $ra
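The linear layout used above (init, jump to the test, body, increment, test, branch back if true) can be mirrored in C with goto. The sketch below is a hand-written equivalent for comparison, not compiler output, and checksum_goto is a hypothetical name.

```c
/* Same computation as checksum(), with labels and jumps placed
   where the MIPS code has them. */
short checksum_goto(const short X[], int N)
{
    short checksum = 0;                    /* addi $v0, $zero, 0 */
    int i = 0;                             /* addi $t0, $zero, 0 */
    goto loop_cond;                        /* j loop_cond */
loop:
    checksum = (short)(checksum ^ X[i]);   /* sll, add, lh, xor */
    i++;                                   /* addi $t0, $t0, 1 */
loop_cond:
    if (i < N) goto loop;                  /* slt + bne */
    return checksum;                       /* jr $ra */
}
```

Placing the test at the bottom means each iteration executes one conditional branch and no extra jump. The single jump before the loop handles the case N = 0, where the body must not run at all.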
Leaf and Non-Leaf Functions
• A leaf function doesn’t call another function
  • Stack frame is not necessary
  • Prefer to use temp registers (t-registers)
• A non-leaf function calls some other function(s)
  • Must use a stack frame, has to save $ra
  • Usually has to use saved registers (s-registers)
Non-Leaf Function
What is the size of the frame?
    extern short xor(short, short);
    short checksum(short X[], int N)
    {
        int i;
        short checksum = 0;
        for (i = 0; i < N; i++)
            checksum = xor(checksum, X[i]);
        return checksum;
    }
Non-Leaf Function
• X, N, i, and $ra must be preserved
• Need a stack frame of 16 bytes
    addi $sp, $sp, -16
    sw   $ra, 12($sp)          # for return address
    sw   $s2, 8($sp)
    sw   $s1, 4($sp)
    sw   $s0, 0($sp)
    add  $s0, $a0, $zero       # $s0 = X
    add  $s1, $a1, $zero       # $s1 = N
    addi $s2, $zero, 0         # i = 0
Non-Leaf Function
    …                          # function body
    lw   $s0, 0($sp)
    lw   $s1, 4($sp)
    lw   $s2, 8($sp)
    lw   $ra, 12($sp)
    addi $sp, $sp, 16
    jr   $ra
Register Name and Call Convention
[Table not preserved]
MIPS Call Convention: FP
• The first two FP parameters in registers
  • 1st parameter in $f12 or $f12:$f13
    • A double-precision parameter takes two registers
  • 2nd FP parameter in $f14 or $f14:$f15
  • Extra parameters in stack
• $f0 stores single-precision FP return value
• $f0:$f1 stores double-precision FP return value
• $f0-$f19 are FP temporary registers
• $f20-$f31 are FP saved temporary registers
FP Example: Call a Function
    extern double a, b, c;
    extern double max(double, double);
    c = max(a, b);

    ldc1 $f12, 100($gp)        # $f12:$f13 = a
    ldc1 $f14, 108($gp)        # $f14:$f15 = b
    jal  max
    sdc1 $f0, 116($gp)         # c = $f0:$f1
• Assume a, b, c assigned to 100($gp), 108($gp), and 116($gp)
FP Instructions in MIPS
• Single-precision arithmetic
  • add.s, sub.s, mul.s, div.s
  • e.g., add.s $f0, $f1, $f6
• Double-precision arithmetic
  • add.d, sub.d, mul.d, div.d
  • e.g., mul.d $f4, $f4, $f6
FP Instructions in MIPS
• Single- and double-precision comparison
  • c.xx.s, c.xx.d (xx is eq, lt, le, …)
  • Sets or clears FP condition-code bit
  • e.g., c.lt.s $f3, $f4
• Branch on FP condition code true or false
  • bc1t, bc1f
  • e.g., bc1t TargetLabel