COSC 3330/6308 Solutions to Second Problem Set

COSC 3330/6308 Solutions to Second Problem Set. Jehan-François Pâris October 2012. First problem. Detail for each of the four following MIPS instructions, which actions are being taken at each of their five steps.


  1. COSC 3330/6308 Solutions to Second Problem Set • Jehan-François Pâris • October 2012

  2. First problem • Detail for each of the four following MIPS instructions, which actions are being taken at each of their five steps. • Do not forget to mention how and during which steps each instruction updates the program counter. (4×10 points).

  3. jalr $s0, $s1 • Fetch instruction and add 4 to PC • Read $s0 and save "somewhere" current value of PC • Transmit value of $s0 to PC either directly or through adder • This data path does not exist in the toy MIPS architecture we studied • Write saved PC value into register $s1 • This data path does not exist in the toy MIPS architecture we studied

  4. jalr $s0, $s1 (other good answer) • This instruction cannot be implemented on the toy MIPS architecture because • There is no data path going from a read register line to the PC • Cannot set new value of PC to contents of $s0 • There is no data path going from the PC to the write register line • Cannot save "old" value of PC into register $s1

  5. sw $s1, 24($t0) • Fetch instruction and add 4 to PC • Read registers $s1 and $t0 and sign extend contents of displacement field of instruction • Compute memory address a by adding contents of register $t0 to sign-extended displacement • Store contents of register $s1 into memory address a
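
The last three steps of sw can be sketched in a few lines of Python. This is only an illustration: the register contents and the dictionary standing in for memory are made-up values, and the helper names `sign_extend16` and `store_word` are hypothetical.

```python
def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits of field as a signed two's-complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def store_word(regs: dict, memory: dict, src: str, base: str, disp: int) -> int:
    """Mimic sw after the fetch: read registers, compute address, store."""
    value = regs[src]                        # read the register to be stored
    base_addr = regs[base]                   # read the base register
    # The ALU adds the base to the sign-extended displacement (32-bit wrap).
    address = (base_addr + sign_extend16(disp)) & 0xFFFFFFFF
    memory[address] = value                  # memory write
    return address


# Hypothetical register contents, chosen only for the example.
regs = {"$s1": 42, "$t0": 0x10010000}
memory = {}
addr = store_word(regs, memory, "$s1", "$t0", 24)
print(hex(addr), memory[addr])
```

A negative displacement such as 0xFFFC (that is, -4) goes through the same path, which is why the sign extension is needed before the addition.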

  6. slt $t0, $s3, $s4 • Fetch instruction and add 4 to PC • Read registers $s3 and $s4 • Compare values of $s3 and $s4 using ALU • Store comparison result into register $t0

  7. jal 1048576 • Fetch instruction and add 4 to PC • Sign extend contents of displacement field of instruction • Save "somewhere" contents of PC • Multiply by four sign-extended contents of displacement field of instruction and replace 28 LSB of PC with new value • Write saved PC value into register $31 • This data path does not exist in the toy MIPS architecture we studied

  8. jal 1048576 (other good answer) • This instruction cannot be implemented on the toy MIPS architecture because • There is no data path going from the PC to the write register line • Cannot save "old" value of PC into register $31
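
The target computation in the first jal answer can be checked with a small Python sketch. It assumes the 26-bit field of the instruction holds 1048576 and that the jal sits at a made-up address; `jal_target` is a hypothetical helper. Because only the 28 least significant bits of the PC are replaced, the field is simply masked to 26 bits here.

```python
def jal_target(pc: int, field: int):
    """Return (new_pc, link) for a jal fetched at address pc.

    pc is the address of the jal itself; field is the 26-bit value
    encoded in the instruction.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF          # fetch and add 4 to PC
    link = pc_plus_4                            # value to be saved into $31
    low28 = (field & 0x03FFFFFF) << 2           # multiply the field by four
    new_pc = (pc_plus_4 & 0xF0000000) | low28   # replace the 28 LSB of the PC
    return new_pc, link


# Hypothetical address for the jal instruction itself.
new_pc, link = jal_target(0x00400020, 1048576)
print(hex(new_pc), hex(link))
```

With these inputs the jump lands at 0x400000 (1048576 × 4) and the saved return address is 0x400024.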

  9. Second problem • Consider these two potential additions to the MIPS instruction set and explain how they would restrict pipelining. (2×5 easy points) • cp d1(r1), d2(r2) • incr d2(r2)

  10. cp d1(r1), d2(r2) • Copy contents of word at address contents of r2 plus offset d2 into address contents of r1 plus displacement d1.

  11. Answer (I) • Let us look at the steps the instruction will have to take: • Instruction fetch • Instruction decode and read register r1 • Use arithmetic unit to compute d1+[r1] • Access memory to read word at address d1+[r1]

  12. Answer (II) • And it continues: • Save somewhere the value v that was just read • Read register r2 • Use arithmetic unit to compute d2+[r2] • Access memory to write value v at address d2+[r2] • The instruction reads a register twice, uses the ALU twice and accesses memory twice

  13. incr d2(r2) • Adds one to the word stored at address d2+[r2], that is, contents of r2 plus offset d2

  14. Answer (I) • Let us look at the steps the instruction will have to take: • Instruction fetch • Instruction decode and read register r2 • Use arithmetic unit to compute d2+[r2] • Store the address somewhere • Access memory to read word at address d2+[r2]

  15. Answer (II) • And it continues: • Use arithmetic unit to increment by 1 the value that was just read • Access memory to write the incremented value at address d2+[r2] that was previously saved • The instruction uses the ALU twice and accesses memory twice
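
One way to see why both instructions restrict pipelining is to count how often each one needs the same hardware resource. The sketch below is an illustration only: the step lists paraphrase the answers above (leaving out the "save somewhere" steps), the resource names are invented labels, and lw is included as a reference instruction that uses each resource once.

```python
from collections import Counter

# Resources needed by each step, in order. The five-step pipeline assumed
# here gives an instruction each resource at most once.
LW = ["fetch", "reg_read", "alu", "mem", "reg_write"]
CP = ["fetch", "reg_read", "alu", "mem", "reg_read", "alu", "mem"]
INCR = ["fetch", "reg_read", "alu", "mem", "alu", "mem"]


def reused_resources(steps):
    """Return the resources an instruction uses more than once."""
    counts = Counter(steps)
    return {name: n for name, n in counts.items() if n > 1}


for name, steps in [("lw", LW), ("cp", CP), ("incr", INCR)]:
    print(name, len(steps), reused_resources(steps))
```

Any resource reported for cp or incr would be wanted in the same cycle by a later instruction in the pipeline, which is the conflict the answers point at.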

  16. Third problem • Explain how you would pipeline the four following pairs of statements. (4×5 points)

  17. Part A • add $t0, $s0, $s1 • beq $s1, $s2, 300 • No data hazard!

  18. Part A (with special unit) • It can be done, as this step uses a different path than the previous instruction • Both solutions will get full credit

  19. Part B • add $t2, $t0, $t1 • sw $t3, 36($t2) • Data hazard is avoided thanks to forwarding unit

  20. Part B (without forwarding unit) • add $t2, $t0, $t1 • sw $t3, 36($t2) • Two cycles are lost (60% CREDIT)

  21. Part C • add $t0, $s0, $s1 • beq $t0, $s2, 300 • Data hazard is avoided thanks to forwarding unit

  22. Part C (with special unit) • add $t0, $s0, $s1 • beq $t0, $s2, 300 • It can be done, as this step uses a different path than the previous instruction

  23. Part C (without forwarding unit) • add $t0, $s0, $s1 • beq $t0, $s2, 300 • Two cycles are lost (60% CREDIT)

  24. Part D • lw $t0, 24($t1) • sub $s2, $t0, $t1 • Data hazard is reduced by forwarding unit

  25. Part D (without forwarding unit) • lw $t0, 24($t1) • sub $s2, $t0, $t1 • Three cycles are lost (STILL 60% CREDIT)
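
Which of the four pairs contain a data hazard can be checked mechanically: a hazard exists when the second instruction reads the register the first one writes. The Python sketch below does only that check. It does not model the forwarding unit or count lost cycles, and `raw_hazard` is a hypothetical helper name.

```python
def raw_hazard(first_dest, second_sources):
    """Return the register causing a read-after-write hazard, or None."""
    return first_dest if first_dest in second_sources else None


# (label, register written by 1st instruction, registers read by 2nd)
pairs = [
    ("A: add $t0,$s0,$s1 / beq $s1,$s2,300", "$t0", {"$s1", "$s2"}),
    ("B: add $t2,$t0,$t1 / sw $t3,36($t2)", "$t2", {"$t3", "$t2"}),
    ("C: add $t0,$s0,$s1 / beq $t0,$s2,300", "$t0", {"$t0", "$s2"}),
    ("D: lw $t0,24($t1) / sub $s2,$t0,$t1", "$t0", {"$t0", "$t1"}),
]

results = [raw_hazard(dest, sources) for _, dest, sources in pairs]
for (label, _, _), reg in zip(pairs, results):
    print(label, "->", reg)
```

Only pair A reports no dependence, which matches the "No data hazard!" remark; pairs B, C and D each depend on the register written by their first instruction.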

  26. Fourth problem • A computer system has a two-level memory cache hierarchy. • L1 cache has a zero hit penalty, a miss penalty of 5 ns and a hit rate of 95 percent • L2 cache has a miss penalty of 100 ns and a hit rate of 90 percent.

  27. Part A • How many cycles are lost by each instruction accessing the memory if the CPU clock rate is 2 GHz?

  28. Answer (I) • Let P1 and P2 be respectively the miss penalties of caches L1 and L2 • Duration of clock cycle: 1/(2 GHz) = 0.5×10^-9 s = 0.5 ns • Cache miss penalties: P1 = 2×5 = 10 cycles and P2 = 2×100 = 200 cycles

  29. Answer (II) • Recall P1 = 10 cycles and P2 = 200 cycles • Let M1 and M2 be respectively the miss rates of caches L1 and L2 • We have M1 = 0.05 and M2 = 0.10 • Number of lost cycles/instruction: M1×P1 + M1×M2×P2 = 0.05×10 + 0.05×0.10×200 = 0.5 + 1 = 1.5

  30. Hint • Use fractions to reduce risk of error
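
Following the hint, the Part A computation can be redone with exact fractions. This is a minimal Python sketch using the standard `fractions` module; the function name `lost_cycles` and the variable names are illustrative choices.

```python
from fractions import Fraction


def lost_cycles(m1, p1, m2, p2):
    """Average cycles lost per memory access: M1*P1 + M1*M2*P2."""
    return m1 * p1 + m1 * m2 * p2


clock_ghz = 2
cycle_ns = Fraction(1, clock_ghz)        # 1 / (2 GHz) = 1/2 ns
p1 = Fraction(5) / cycle_ns              # 5 ns   -> 10 cycles
p2 = Fraction(100) / cycle_ns            # 100 ns -> 200 cycles
m1 = Fraction(5, 100)                    # L1 miss rate = 1 - 0.95
m2 = Fraction(10, 100)                   # L2 miss rate = 1 - 0.90

part_a = lost_cycles(m1, p1, m2, p2)
print(p1, p2, part_a)                    # prints: 10 200 3/2
```

Keeping every quantity as a fraction avoids misplaced decimal points, such as writing 0.25 ns for a 0.5 ns cycle.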

  31. Part B • We can either increase the hit rate of the topmost cache to 98 percent or increase the hit rate of the second cache to 95 percent. • Which improvement would have more impact? (10 points)

  32. Better L1 cache • Recall P1 = 10 cycles and P2 = 200 cycles • We now have M1 = 0.02 and M2 = 0.10 • Number of lost cycles/instruction: M1×P1 + M1×M2×P2 = 0.02×10 + 0.02×0.10×200 = 0.2 + 0.4 = 0.6

  33. Better L2 cache • Recall P1 = 10 cycles and P2 = 200 cycles • We now have M1 = 0.05 and M2 = 0.05 • Number of lost cycles/instruction: M1×P1 + M1×M2×P2 = 0.05×10 + 0.05×0.05×200 = 0.5 + 0.5 = 1

  34. Answer • For the values of M1, P1, M2 and P2 we considered • Improving the hit ratio of the L1 cache provides the best speedup
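
Factoring the expression for lost cycles gives one way to read this result: M1 multiplies both terms, while M2 only affects the second one. The block below restates the three cases with the same symbols and numbers as the slides.

```latex
\begin{align*}
\text{Lost cycles} &= M_1 P_1 + M_1 M_2 P_2 = M_1\,(P_1 + M_2 P_2) \\
\text{Baseline:}\quad  & 0.05 \times (10 + 0.10 \times 200) = 0.05 \times 30 = 1.5 \\
\text{Better L1:}\quad & 0.02 \times (10 + 0.10 \times 200) = 0.02 \times 30 = 0.6 \\
\text{Better L2:}\quad & 0.05 \times (10 + 0.05 \times 200) = 0.05 \times 20 = 1.0
\end{align*}
```

The better L1 cache saves 0.9 cycles per access, against 0.5 cycles for the better L2 cache.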

  35. Fifth problem • A virtual memory system has • A virtual address space of 4 Gigabytes • a page size of 8 Kilobytes. • Each page table entry occupies 4 bytes.

  36. Part A • How many bits remain unchanged during the address translation? (5 points)

  37. Answer • How many bits remain unchanged during the address translation? (5 points) • Page size is 8 KB = 2^3 × 2^10 = 2^13 bytes • The last 13 bits of each address will remain unchanged during the address translation

  38. Part B • How many bits are used for the page number? (5 points)

  39. Answer • How many bits are used for the page number? (5 points) • Address space is 4 GB = 2^2 × 2^30 = 2^32 bytes • Will have 32-bit addresses • The page number will occupy the 32 – 13 = 19 most significant bits of the address

  40. Reminder • Virtual address (32 or 64 bits): page number, then offset • The page number is used to find the right page frame number • The offset is copied unmodified • Translated address: page frame number, then offset

  41. Part C • What is the maximum number of page table entries in a page table? (5 points) • Page number occupies 19 bits • Can have 2^19 pages in a process • Page tables will have 2^19 entries
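
The three answers of the fifth problem follow from two base-2 logarithms, which a short Python sketch can confirm. The `split` helper and the example address are illustrative assumptions, not part of the problem set.

```python
VIRTUAL_SPACE = 4 * 2**30   # 4 GB virtual address space
PAGE_SIZE = 8 * 2**10       # 8 KB pages

# For a power of two n, (n - 1).bit_length() equals log2(n).
address_bits = (VIRTUAL_SPACE - 1).bit_length()   # bits in a virtual address
offset_bits = (PAGE_SIZE - 1).bit_length()        # bits left unchanged
page_number_bits = address_bits - offset_bits     # bits used as page number
max_entries = 2**page_number_bits                 # one entry per page


def split(vaddr):
    """Split a virtual address into (page number, offset)."""
    return vaddr >> offset_bits, vaddr & (PAGE_SIZE - 1)


print(address_bits, offset_bits, page_number_bits, max_entries)
print(split(0x12345678))    # arbitrary example address
```

The first line printed is `32 13 19 524288`, matching Parts A, B and C.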
