Test 1 Postmortem. CSCE 513 Computer Architecture. Readings: Chapter 1 Appendix C Appendix B Chapter 2. September 30, 2013. Test 1 Fall 2012 -. Short Answer Performance (Amdahl’s) Classic 5 Stage pipeline AMAT Forwarding Unroll loop TakeHome Tomasulo’s. What might be covered!
Test 1 Postmortem CSCE 513 Computer Architecture • Readings: • Chapter 1 • Appendix C • Appendix B • Chapter 2 September 30, 2013
Test 1 Fall 2012 - • Short Answer • Performance (Amdahl’s) • Classic 5 Stage pipeline • AMAT • Forwarding • Unroll loop TakeHome • Tomasulo’s • What might be covered! • IEEE 754 • Branch handling • Moving adder to ID stage • Energy/Power • Power Wall • RAW, WAR, WAW • …
1. (a) What does CPI mean, and what is the best possible CPI for a simple scalar processor? • (b) How does making cache blocks larger affect cache performance? Be specific about what improves and what might be negatively affected. • (c) How does making L1 caches small and simple affect cache performance?
(d) What is meant by write merging? • (e) What is the misquote of Moore's law that held during the 1980s and 90s? • (f) How does a tournament branch predictor work? • (g) Explain critical word first and early restart. • (h) What is stored in an entry of the TLB?
2. Performance (a) Suppose the percentage of time an enhancement can be applied is 25% and suppose that the enhancement improves the performance by a factor of 2. What is the overall Speedup? • (b) Suppose there are two improvements, A and B, that are applicable 10% and 40% of the time, with speedups of 10 and 4 respectively. Assume that A and B do not overlap. What is ExecTime_new expressed in terms of ExecTime_orig?
(c) During what percentage of the new (improved) execution time is neither improvement in use? • (d) Assuming only one of A and B can be done, which is better?
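A minimal sketch of how Question 2 can be checked with Amdahl's law, assuming the fractions are measured against the original execution time. The helper names `amdahl_time_fraction` and `speedup` are illustrative and do not come from the course material.

```python
def amdahl_time_fraction(enhancements):
    """Return ExecTime_new / ExecTime_orig for non-overlapping enhancements.

    `enhancements` is a list of (fraction_of_original_time, speedup) pairs.
    """
    unenhanced = 1.0 - sum(f for f, _ in enhancements)
    return unenhanced + sum(f / s for f, s in enhancements)


def speedup(enhancements):
    """Overall speedup = ExecTime_orig / ExecTime_new."""
    return 1.0 / amdahl_time_fraction(enhancements)


# (a) one enhancement usable 25% of the time, 2x faster
part_a = speedup([(0.25, 2)])

# (b) A: 10% of the time, 10x; B: 40% of the time, 4x; no overlap
A, B = (0.10, 10), (0.40, 4)
part_b = amdahl_time_fraction([A, B])   # multiple of ExecTime_orig

# (c) share of the NEW time spent with neither improvement in use
part_c = (1.0 - A[0] - B[0]) / part_b

# (d) each improvement applied alone
part_d = {"A": speedup([A]), "B": speedup([B])}
```

Under these assumptions the speedup in (a) is 1/0.875, about 1.14; ExecTime_new in (b) is 0.61 × ExecTime_orig; the unenhanced share in (c) is 0.5/0.61, about 82%; and B alone (about 1.43) beats A alone (about 1.10).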
3. Classical 5-stage pipeline: Assume the classical 5-stage pipeline with no forwarding at all, not even through the registers. • Assume all of these instructions execute in 1 cycle. Given the code below: • loop: • DADDIU R2, R2, +8 • LD.D F4, 0(R2) • MULT F8, F4, F4 • ADD.D F5, F5, F8 • SUB R1, R3, R2 • BNEZ R1, loop • (a) Show how the first iteration of the loop would proceed through the pipeline. Stop with the fetch of the first instruction of the loop on the second iteration or when you fill the table. Assume you predict branch taken.
(b) If the loop executes 1000 times, how many cycles does it take? • (c) If you predict branch not taken, how many cycles do 1000 iterations take?
(d) If you do full forwarding and predict branch taken, how many cycles do 1000 iterations take? Figure out how many stalls you eliminate in one iteration and then proceed.
4. (Average memory access time) • (a) Assume • the HitTime to the L1 cache is 1 ns, • the MissRate to the L1 is 10%, • the HitTime to the L2 cache is 10 ns, • the MissRate to the L2 is 25%, • the MissPenalty for the L2 cache is 100 ns. • Then what is the average memory access time (AMAT)?
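A short sketch of the two-level AMAT calculation for part (a). It assumes the 25% L2 MissRate is a local miss rate (misses per access that reaches L2), which the question does not state explicitly; the function name `amat_two_level` is illustrative.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, local_miss_rate_l2, miss_penalty_l2):
    """AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2).

    Assumption: local_miss_rate_l2 is relative to accesses that reach L2.
    All times are in nanoseconds.
    """
    l1_miss_penalty = hit_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * l1_miss_penalty


amat_ns = amat_two_level(hit_l1=1, miss_rate_l1=0.10,
                         hit_l2=10, local_miss_rate_l2=0.25,
                         miss_penalty_l2=100)
```

With these numbers the L1 miss penalty is 10 + 0.25 × 100 = 35 ns, so the AMAT is 1 + 0.10 × 35 = 4.5 ns. If the 25% were instead a global miss rate, the formula would have to be adjusted.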
(b) Given the table below, which shows only a portion of a cache satisfying • memory is byte addressable • memory addresses are to 1-byte words • 4-way associativity • block size 2 B (ridiculously small) • total cache size 1024 B = 1 KB • physical addresses 14 bits wide
i. How many lines are there? • ii. How many sets? • iii. How big are the block offset, set index and tag fields? • iv. Is 0x3A0F a hit or miss? • v. If it is a hit, what data is returned?
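The geometry in parts i to iii follows directly from the listed parameters, and the address in part iv can be split into its fields the same way. The sketch below assumes power-of-two sizes; `cache_geometry` and `split_address` are illustrative names. Whether 0x3A0F actually hits (parts iv and v) depends on the cache table, which is not reproduced here, so the code only locates the set and tag to compare against.

```python
def cache_geometry(cache_bytes, block_bytes, ways, addr_bits):
    """Return (lines, sets, offset_bits, index_bits, tag_bits).

    Assumes cache_bytes, block_bytes and the resulting set count are powers of two.
    """
    lines = cache_bytes // block_bytes
    sets = lines // ways
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (sets - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    return lines, sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, set_index, block_offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


lines, sets, offset_bits, index_bits, tag_bits = cache_geometry(1024, 2, 4, 14)
tag, index, offset = split_address(0x3A0F, offset_bits, index_bits)
```

For this cache that gives 512 lines, 128 sets, a 1-bit block offset, a 7-bit set index and a 6-bit tag. Address 0x3A0F maps to set 0x07 with tag 0x3A and block offset 1; it is a hit only if one of the four ways in that set is valid and holds tag 0x3A.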
5. For the forwarding shown in the diagram: • (a) Explain how the need for it would be detected. • (b) Give an example of code that would cause this type of forwarding to occur.
Assuming latencies: integer operations and branches require 1 cycle for execution. • loop: • LD F0, 0(R1) • MUL.D F4, F0, F0 • ADD.D F8, F8, F4 • S.D F8, 0(R1) • ADD.D F4, F0, F0 • ADD.D F0, F4, F0 • S.D F0, 1024(R1) • DADDIU R1, R1, +8 • BNE R1, R2, loop • (b) Unroll this loop once and schedule the code to eliminate as many stalls as possible. • (c) What do you need to do (or, more accurately, what does the compiler need to do) to allow your unrolled loop to work if the original loop executes an odd number of times?
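One common answer to part (c) is to peel off a single iteration before (or after) the unrolled loop when the trip count is odd. The sketch below models that idea in Python rather than MIPS and makes no attempt at scheduling. It assumes 8-byte doubles, so `1024(R1)` is 128 elements past `0(R1)`; the names `body`, `rolled` and `unrolled_by_two` are illustrative.

```python
OFFSET = 1024 // 8  # assumption: 8-byte doubles, so 1024(R1) is 128 elements ahead


def body(mem, i, acc):
    """One iteration of the original loop; acc plays the role of F8."""
    x = mem[i]                     # LD    F0, 0(R1)
    acc += x * x                   # MUL.D F4, F0, F0 ; ADD.D F8, F8, F4
    mem[i] = acc                   # S.D   F8, 0(R1)
    mem[i + OFFSET] = (x + x) + x  # ADD.D F4, F0, F0 ; ADD.D F0, F4, F0 ; S.D F0, 1024(R1)
    return acc


def rolled(mem, n, acc=0.0):
    """Reference version: n iterations of the original loop."""
    for i in range(n):
        acc = body(mem, i, acc)
    return acc


def unrolled_by_two(mem, n, acc=0.0):
    """Loop unrolled once (two copies of the body per trip)."""
    start = 0
    if n % 2 == 1:
        # Peeled iteration: handle one element so the rest is an even count.
        acc = body(mem, 0, acc)
        start = 1
    for i in range(start, n, 2):
        acc = body(mem, i, acc)
        acc = body(mem, i + 1, acc)
    return acc
```

Because the peeled version executes the same operations in the same order as the original loop, both functions leave memory and the accumulator in identical states for any trip count, odd or even.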