1 / 18

Super-Scalar MIPS Processor

Super-Scalar MIPS Processor. With predicted branches/jumps and write back cache. Outline. Enhanced Super Scalar datapath 256-bit Memory bus and write back cache Branch/Jump Prediction Performance Testing. Super-Scalar Datapath. Duplicated original 5-stage pipeline

wray
Download Presentation

Super-Scalar MIPS Processor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Super-Scalar MIPS Processor With predicted branches/jumps and write back cache CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  2. Outline • Enhanced Super Scalar datapath • 256-bit Memory bus and write back cache • Branch/Jump Prediction • Performance • Testing CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  3. Super-Scalar Datapath • Duplicated original 5-stage pipeline • Dual ported instruction cache • Dual ported register file • 4 read ports and 2 write ports • 4 times the forwarding logic • 4 times the stall logic CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  4. Enhanced Super-Scalar • Allow memory operations in both pipelines • Requires dual ported write buffer and data cache • Also must protect against special memory hazards CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  5. Upgrades to memory system • Allow two writes to write buffer and two reads from data cache • Rest of hierarchy stays same • Data cache uses dual ported SRAMs • Write buffer made of registers • Easily extensible CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  6. New Memory Hazards • Lw/sw in parallel to same address • Stall sw via write buffer until lw completes • Cache miss is possible • Sw/lw in parallel to same address • No problem because sw goes to WB, lw checks there first • Sw/sw in parallel to same address • Sw in pipe A is ignored unless… CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  7. Queuing writes in MMIO • If sw/sw in parallel to same hex-LED • Accept both into queue in MMIO module • Pipe A’s sw ahead of pipe B’s • Queue is emptied at rate of 1 store/cycle • Keep track of which is most recent write • In case we read from LEDs • Each LED has own queue CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  8. Write back cache • Product of complete rewrite • Expanded all memory buses to 8 words • Allows for burst reads and writes • No need for asynchronous FIFO • Burst writes works well w/ write back • Write buffer no longer talks to arbiter, but only to cache CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  9. Diagram of Write back cache writeKick readKick writeMiss readMiss Idle writeReady readDone writeHit readRetry writeKickBack CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  10. Branch/Jump prediction • Local prediction of target based on instruction’s address • Same hardware for branches and jumps • Cache all jumps (not just jr) • Predictor replaces signal coming from ID in old 5-stage pipeline (single pipe) CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  11. Branch Target Table • 256 entries, 58 bits wide: • Valid bit [57] • Tag [56:35] • Target [34:3] • Branch Confidence [2:1] • Jump Confidence [0] CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  12. Using Branch Target Table • Combination of Translation buffer and history table in one SRAM • Table is always written following prediction • Data determined by outputs of branch hardware in ID CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  13. Overall Prediction Logic • Look up PC in parallel with IF • If found then predict based on stored target and confidence bits • Else return PC+8 • On misprediction • Determined by comparing values from ID with values last predicted • Update table with new confidence bits/tag CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  14. Prediction Logic/State • Standard 2 bit branch prediction • 2 states predict taken, 2 not taken • 1 bit jump prediction • Always taken, so bit is only confidence • Used only for updating table CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  15. Branch in pipe B? • Branch hardware (prediction and actual) only works for pipe A • If branch decoded in pipe B, IF is flushed and branch refetched in pipe A • Handled by hazard/issue unit • Cheaper hardware, but takes more cycles compared to moving instructions via MUX CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  16. Branch/Jump performance • Pipe A, Predicted correctly: no penalty • Pipe A, mispredicted: 2 NOPs • Pipe B, Predicted correctly: 3 NOPs • Pipe B, mispredicted: 5 NOPs • Expensive, but result is less MUXes in critical path CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  17. Overall Performance • Current Status • Passed test0/Infinite Loop, test1/basic.v • Passed test3 with cycle count of X190 • Hardware Statistics • Number of Block RAMS: 67% • Number of Slices: 51% • Timing Statistics • Critical Path ~125ns • Clock Frequency 8MHz CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

  18. Testing • Integrated Testing • Ability to run instructions from boot ROM • Gradually expand set of instructions as new features are added/supported • Eventually use proper boot ROM to load instructions To DRAM • Assembly test code • Basic programs to test functionality • Simple branch, arithmetic, and memory operations • Complex programs to test overall operation • Fibonacci, Radix Sort • Used to dig up more corner cases CS152 Fall 2003; N. Sadhal, I. Koulchine, V. Venkatesh, A. Lee

More Related