CPEG323 Homework Review II
Long Chen
November 30th, 2005
Homework 4
• Problem 1: Terminology
  • Performance: response time and throughput
  • Latency
  • Wall-clock time
  • Weighted CPI
  • System time
These terms may show up in other courses.
Homework 4 - cont.
• Problem 2: Can a lower instruction count increase the instruction clock cycle time?
  • It depends.
  • Simple instructions do less work than complex instructions.
  • Simple instructions execute faster than complex instructions.
  • Simple instructions -> larger code size -> possibly more cache misses
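The trade-off in Problem 2 can be made concrete with the usual relation CPU time = instruction count × CPI × clock cycle time. The sketch below is only an illustration: the function name `cpu_time` and both sets of numbers are hypothetical, chosen to show that a lower instruction count does not by itself guarantee a shorter run time.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """Return execution time in nanoseconds: IC x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical design with simple instructions: more of them, each one cheap.
simple = cpu_time(instruction_count=1_000_000, cpi=1.2, cycle_time_ns=0.25)

# Hypothetical design with complex instructions: fewer of them, but a higher
# CPI and a longer clock cycle.
complex_ = cpu_time(instruction_count=700_000, cpi=1.5, cycle_time_ns=0.35)

print(f"simple:  {simple:.0f} ns")
print(f"complex: {complex_:.0f} ns")
```

With these made-up values the design that executes fewer instructions is the slower one, which is why the answer to Problem 2 is "it depends".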
Homework 4 - cont.
• Problem 4: Two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions.
• P1’s clock rate = 4 GHz, P2’s clock rate = 6 GHz
• Peak performance = the fastest rate at which a computer can execute any instruction sequence
• What are the peak performances of P1 and P2?
[Table: the average number of cycles for each instruction class]
Homework 4 - cont.
• Clearly, for P1, the ideal instruction sequence is one composed entirely of class A instructions.
• The peak performance of P1 is then
  (4G cycles/sec) / (1 cycle/instruction) = 4000 MIPS
• Similarly, the peak performance of P2 is
  (6G cycles/sec) / (2 cycles/instruction) = 3000 MIPS,
  with an instruction sequence composed of class A, B, and/or C instructions.
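The peak-rate calculation can be written as a one-line helper. This is a minimal sketch: `peak_mips` is a hypothetical name, and the minimum CPI values (1 for P1, 2 for P2) are the ones implied by the results above, since the full CPI table is not reproduced here.

```python
def peak_mips(clock_rate_hz, min_cpi):
    """Peak rate in MIPS when every instruction takes min_cpi cycles."""
    return clock_rate_hz / min_cpi / 1e6

# Lowest CPI of any instruction class: 1 for P1 (class A) and
# 2 for P2 (classes A, B, C), as implied by the stated answers.
print(peak_mips(4e9, 1))  # P1
print(peak_mips(6e9, 2))  # P2
```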
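Homework 5
• Problem 1
  • Speedup = (execution time before improvement) / (execution time after improvement)
  • It takes 100 seconds to complete program P1. Of this time, 15% is used for division and 40% for memory access. If you improve only division, what is the maximum possible speedup?
  • Even if division took no time at all, the other 85% of the run time would remain:
    1 / (1 - 15%) = 1 / 0.85 ≈ 1.1765, i.e., a speedup of about 117.65%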
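The bound above is an instance of Amdahl's law. The sketch below uses hypothetical function names, and the 10x improvement factor in the second call is an invented value for comparison only.

```python
def speedup(fraction_improved, improvement_factor):
    """Amdahl's law: overall speedup when only part of the run time improves."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / improvement_factor)

def max_speedup(fraction_improved):
    """Limit of the speedup as the improved part becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_improved)

print(round(max_speedup(0.15), 4))   # division takes no time at all
print(round(speedup(0.15, 10), 4))   # division made 10x faster (hypothetical)
```

Any finite improvement to division gives a speedup below the 1/0.85 limit.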
Homework 5 - cont.
• Problem 2:
  • Explain how the instruction “add $t1, $t2, $t3” is executed in the single-cycle datapath, using Figure 5.19 in your textbook.
  • Four steps:
    • Instruction fetch (IF)
    • Instruction decode and register read (ID)
    • ALU operation (EX)
    • Write the result into the register file (WB)
  • You should also be familiar with the multicycle and pipelined cases.
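The four steps can be traced in a small software model. This is a rough sketch and not the textbook's datapath: `execute_add`, the dictionary used as instruction memory, and the register contents are all assumptions made for illustration. The instruction word 0x014B4820 is the standard MIPS encoding of add $t1, $t2, $t3.

```python
def execute_add(pc, instr_mem, regs):
    """Run one R-type add through the four steps; returns the next PC."""
    # IF: fetch the 32-bit instruction word at the current PC.
    instr = instr_mem[pc]

    # ID: split the R-format fields and read the two source registers.
    opcode = instr >> 26
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    funct = instr & 0x3F
    if opcode != 0 or funct != 0x20:
        raise ValueError("this sketch handles only the add instruction")
    a, b = regs[rs], regs[rt]

    # EX: the ALU adds the operands (wrapped to 32 bits).
    result = (a + b) & 0xFFFFFFFF

    # WB: write the ALU result into the destination register.
    regs[rd] = result
    return pc + 4

# add $t1, $t2, $t3  ->  rs = $t2 (10), rt = $t3 (11), rd = $t1 (9)
instr_mem = {0: 0x014B4820}
regs = [0] * 32
regs[10], regs[11] = 7, 5          # hypothetical register contents
next_pc = execute_add(0, instr_mem, regs)
print(regs[9], next_pc)
```

In the real single-cycle datapath all four steps complete within one clock cycle; the sequential code only mirrors the order in which the signals flow.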
Homework 5 - cont.
• Problem 3
  • Add the necessary datapaths to the single-cycle datapath shown in Figure 5.17 of the textbook for a new instruction, jr (jump register).
  • Modifications:
    • A datapath that allows the new PC to come from a register (the Read data 1 port)
    • A new control signal (e.g., JumpReg) that selects the new PC through a multiplexor
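The effect of the extra multiplexor can be modelled as a two-way selection. This is a minimal sketch: the function name, the parameter `existing_next_pc` (standing for whatever the unmodified datapath would have chosen), and the example addresses are hypothetical.

```python
def next_pc(existing_next_pc, read_data_1, jump_reg):
    """2-to-1 multiplexor: pick the register value when JumpReg is asserted."""
    return read_data_1 if jump_reg else existing_next_pc

# jr $ra with $ra = 0x00400020 (hypothetical value)
print(hex(next_pc(0x00400008, 0x00400020, jump_reg=True)))
print(hex(next_pc(0x00400008, 0x00400020, jump_reg=False)))
```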
Homework 5 - cont.
• Problem 4
  • Find the hazard and reorder the instructions to avoid a pipeline stall.
      S1: lw $t0, 0($t1)
      S2: lw $t2, 4($t1)
      S3: sw $t2, 0($t1)
      S4: sw $t0, 4($t1)
  • RAW hazard between S2 and S3: the content of $t2 is not yet available when S3 tries to read it. We have to stall the pipeline, even with the help of forwarding.
  • However, we can reorder the code to avoid the stall.
Homework 5 - cont.
• The reordered code:
      S1: lw $t0, 0($t1)
      S2: lw $t2, 4($t1)
      S4: sw $t0, 4($t1)
      S3: sw $t2, 0($t1)
• The hazard is resolved by rearranging the instructions. The result is still correct because the two stores write to different addresses and do not depend on each other.
• Instruction reordering (scheduling) is a common compiler technique.
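A scheduler needs a way to spot this kind of hazard. The sketch below checks only for a load followed immediately by an instruction that reads the loaded register; the tuple representation of an instruction and the name `load_use_hazards` are assumptions made for this example.

```python
def load_use_hazards(program):
    """Return (producer, consumer) label pairs where an instruction reads
    the register loaded by the lw immediately before it."""
    hazards = []
    for prev, curr in zip(program, program[1:]):
        label_p, op_p, dest_p, _ = prev
        label_c, _, _, sources_c = curr
        if op_p == "lw" and dest_p in sources_c:
            hazards.append((label_p, label_c))
    return hazards

# Each entry: (label, opcode, register written or None, registers read).
S1 = ("S1", "lw", "$t0", ("$t1",))
S2 = ("S2", "lw", "$t2", ("$t1",))
S3 = ("S3", "sw", None, ("$t2", "$t1"))
S4 = ("S4", "sw", None, ("$t0", "$t1"))

print(load_use_hazards([S1, S2, S3, S4]))  # original order
print(load_use_hazards([S1, S2, S4, S3]))  # reordered
```

The original order reports the S2/S3 pair, and the reordered sequence reports none.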
Homework 5 - cont.
• Problem 5:
  • When the following code executes on the pipelined datapath, which registers are being read and written at the end of the fifth cycle of execution?
      S1: add $2, $3, $1
      S2: sub $4, $3, $5
      S3: add $5, $3, $7
      S4: add $7, $6, $1
      S5: add $8, $2, $6
Homework 5 - cont.
      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
S1    IF   ID   EX   MEM  WB
S2         IF   ID   EX   MEM  WB
S3              IF   ID   EX   MEM  WB
S4                   IF   ID   EX   MEM  WB
S5                        IF   ID   EX   MEM  WB
      S1: add $2, $3, $1
      S2: sub $4, $3, $5
      S3: add $5, $3, $7
      S4: add $7, $6, $1
      S5: add $8, $2, $6
So, at the end of the fifth cycle of execution, registers $6 and $1 (of S4) are being read and register $2 (of S1) will be written.
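The diagram above follows a simple rule: with no stalls, instruction i (counting from 0) is in stage number (cycle - 1 - i). A minimal sketch of that rule, with hypothetical names:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a given clock
    cycle (1-based), assuming one instruction issues per cycle, no stalls."""
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None

labels = ["S1", "S2", "S3", "S4", "S5"]
snapshot = {label: stage_in_cycle(i, 5) for i, label in enumerate(labels)}
print(snapshot)
```

In cycle 5, S4 is in ID (reading $6 and $1) and S1 is in WB (writing $2), which matches the answer.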
Homework 5 - cont.
• Problem 6:
  • How many cycles will it take to execute the code below on the pipelined datapath?
      S1: lw $4, 100($2)
      S2: sub $6, $4, $3
      S3: add $2, $3, $5
  • S2 tries to read register $4 immediately after S1, a load instruction that writes the same register.
  • Forwarding alone cannot remove this hazard: the loaded value is not available until S1 finishes its MEM stage, so the pipeline must stall.
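One way to count the cycles is sketched below. It assumes a five-stage pipeline with forwarding, so that the lw/sub dependence costs exactly one stall cycle; the slide does not state the final count, so treat the number as the result of these assumptions. The name `total_cycles` is hypothetical.

```python
def total_cycles(num_instructions, stall_cycles, num_stages=5):
    """Cycles to finish a sequence: the first instruction takes num_stages
    cycles, each later one adds a cycle, and each bubble adds a cycle."""
    return num_stages + (num_instructions - 1) + stall_cycles

# Assumption: forwarding is present, so the lw -> sub dependence costs one bubble.
print(total_cycles(num_instructions=3, stall_cycles=1))
```

Under these assumptions the three instructions take 8 cycles. The same formula with five instructions and no stalls gives 9, matching the CC1 to CC9 diagram of Problem 5.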
Homework 6
• Problem 1:
  • How many bits are required to implement a direct-mapped cache with 64 KB of data and 4-byte blocks, assuming a 32-bit address?
  • Cache size = 2^16 bytes (64 KB)
  • Block size = 2^2 bytes (4 bytes)
  • Number of cache blocks = 2^(16-2) = 2^14
  • Each block has 32 bits of data plus a tag of (32 - 14 - 2) = 16 bits, plus a valid bit. Thus, the total cache size is
    2^14 * (32 + 16 + 1) = 2^14 * 49 = 784 Kbits
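The same arithmetic can be packaged as a function. This is a sketch with a hypothetical name, and it assumes that the data size and block size are powers of two.

```python
def cache_total_bits(data_bytes, block_bytes, address_bits=32):
    """Total storage (data + tag + valid) of a direct-mapped cache, in bits.
    Sizes are assumed to be powers of two."""
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1      # log2(num_blocks)
    offset_bits = block_bytes.bit_length() - 1    # log2(block_bytes)
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1   # +1 for the valid bit
    return num_blocks * bits_per_block

bits = cache_total_bits(64 * 1024, 4)
print(bits, bits // 1024)
```

For the homework parameters this gives 802816 bits, i.e. 784 Kbits.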
Homework 6 - cont.
• Problem 3:
  • Given a direct-mapped cache with 16 words of data and 4-word blocks, what are the cache hits and misses for a series of memory accesses to the addresses 2, 4, 8, 20, 18, 11, 43, 17?
  • First, construct the cache. This is a 4-block cache.
Homework 6 - cont.
• Read memory at address 2. Which block do we look at?
  • Memory block number = word address DIV words per block: 2 / 4 = 0
  • This maps to cache block number = memory block number MOD number of cache blocks: 0 modulo 4 = 0
• Here, we assume the memory space is 2^8 bytes.
Homework 6 - cont.
• Repeat this step until all memory references have been processed. We then have the final cache contents.
[Figure: final cache contents]
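The step-by-step procedure can be replayed in a few lines. This sketch treats the addresses as word addresses and assumes the cache starts empty; the function name and the dictionary representation of the cache are hypothetical.

```python
def simulate_direct_mapped(word_addresses, num_blocks=4, words_per_block=4):
    """Replay word addresses through a direct-mapped cache.
    Returns (per-access results, final cache: index -> memory block number)."""
    cache = {}      # cache block index -> memory block number currently stored
    results = []
    for addr in word_addresses:
        mem_block = addr // words_per_block     # word address DIV words per block
        index = mem_block % num_blocks          # memory block MOD number of cache blocks
        if cache.get(index) == mem_block:
            results.append((addr, index, "hit"))
        else:
            results.append((addr, index, "miss"))
            cache[index] = mem_block            # fetch the block on a miss
    return results, cache

results, cache = simulate_direct_mapped([2, 4, 8, 20, 18, 11, 43, 17])
for addr, index, outcome in results:
    print(f"address {addr:2d} -> cache block {index}: {outcome}")
print("hits:", sum(1 for r in results if r[2] == "hit"))
```

Under these assumptions the sequence produces two hits (addresses 11 and 17) and six misses. Cache blocks 0, 1, and 2 end up holding memory blocks 4, 5, and 10, and cache block 3 is never used.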
Homework 6 - cont.
• Problem 4
  • Find the hazards in the code and reorder it to avoid pipeline stalls.
      lw $t0, 0($t1)
      addi $t3, $t0, 4
      sw $t3, 0($t1)
      lw $t2, 4($t1)
      addi $t4, $t2, 4
      sw $t4, 4($t1)
  • The approach is the same as in Problem 4 of Homework 5.
Homework 6 - cont.
• Problem 6
  • Explain how the instruction “lw $t1, 8($t2)” is executed in the pipelined datapath, using Figure 6.17 in the textbook.
  • Five stages:
    • Instruction fetch (IF)
    • Instruction decode and register file read (ID)
    • Address calculation (EX)
    • Memory access (MEM)
    • Write back (WB)
  • It is important that you be able to explain the details of the execution of a given instruction. For example, for a load, what is stored in each pipeline register?
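As a study aid, the values a load carries from stage to stage can be traced in software. The sketch below is a simplification and not the textbook figure: it omits control signals and the second register read, and the function name, dictionary keys, base address, and memory contents are all hypothetical.

```python
def run_lw(pc, rt, base_reg, offset, regs, data_mem):
    """Carry 'lw rt, offset(base_reg)' through the five stages and return
    the contents of each pipeline register (simplified: no control signals)."""
    # IF: fetch; IF/ID keeps the instruction fields and PC + 4.
    if_id = {"pc_plus_4": pc + 4, "rs": base_reg, "rt": rt, "imm": offset}

    # ID: read the base register and pass on the sign-extended offset.
    id_ex = {"pc_plus_4": if_id["pc_plus_4"],
             "read_data_1": regs[if_id["rs"]],
             "sign_ext_imm": if_id["imm"],
             "dest_reg": if_id["rt"]}

    # EX: the ALU computes the effective address.
    ex_mem = {"alu_result": id_ex["read_data_1"] + id_ex["sign_ext_imm"],
              "dest_reg": id_ex["dest_reg"]}

    # MEM: read data memory at that address.
    mem_wb = {"mem_data": data_mem[ex_mem["alu_result"]],
              "dest_reg": ex_mem["dest_reg"]}

    # WB: write the loaded word into the destination register.
    regs[mem_wb["dest_reg"]] = mem_wb["mem_data"]
    return {"IF/ID": if_id, "ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}

# lw $t1, 8($t2): $t1 is register 9, $t2 is register 10.
regs = [0] * 32
regs[10] = 0x1000                     # hypothetical base address
data_mem = {0x1008: 42}               # hypothetical memory contents
pipeline_regs = run_lw(pc=0, rt=9, base_reg=10, offset=8, regs=regs, data_mem=data_mem)
print(hex(pipeline_regs["EX/MEM"]["alu_result"]), regs[9])
```

Note that the destination register number travels through every pipeline register, so that it is still available when the load reaches WB.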