
Timing Model of a Superscalar O-o-O processor in HAsim Framework


Presentation Transcript


  1. Timing Model of a Superscalar O-o-O processor in HAsim Framework Murali Vijayaraghavan

  2. What is HAsim • Framework to write software-like timing models and run them on FPGAs • Software timing models are inherently sequential – hence slow • Parallelism is achieved by implementing the timing model on FPGAs

  3. HAsim contd. • Functional partition == ISA • Timing partition == micro-architecture [Diagram: the Functional Partition handles correct execution (multiply, divide, etc.), the Timing Partition models time (stalls, mispredicts, etc.), and the two communicate through requests and responses.]
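As a rough illustration of this split, here is a minimal sketch of how a timing partition might drive a functional partition through requests and responses. The class and method names below (FunctionalPartition, new_token, fetch, and so on) are hypothetical stand-ins for the purposes of this sketch, not HAsim's actual API:

```python
# Hypothetical interface sketch, not HAsim's real API.  The functional
# partition does ISA-level work on one instruction ("token") per request;
# the timing partition decides when each request is made and what it costs
# in model cycles.

class FunctionalPartition:
    """Correct execution only (multiply, divide, loads/stores, ...)."""

    def __init__(self):
        self._next_token = 0

    def new_token(self):            # TOK GEN: name a new in-flight instruction
        tok = self._next_token
        self._next_token += 1
        return tok

    def fetch(self, tok, pc):       # FET: return the instruction bits at pc
        ...

    def decode(self, tok):          # DEC: return source/destination registers
        ...

    def execute(self, tok):         # EXE: return result / branch outcome
        ...

    def do_memory(self, tok):       # MEM: perform the load or store
        ...

    def local_commit(self, tok):    # LCO: retire locally
        ...

    def global_commit(self, tok):   # GCO: make effects architecturally visible
        ...
```

In this picture the timing partition owns all micro-architectural state (instruction buffer, ROB, queues) and calls these operations in whatever order, and at whatever rate, the modelled machine allows.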

  4. Functional Partition [Diagram: a pipeline of functional operations – TOK GEN, FET, DEC, EXE, MEM, LCO, GCO – implemented by FetAlg, DecAlg, ExeAlg, MemAlg, LCOAlg, and GCOAlg modules over RegState and MemState.]

  5. Model cycle vs FPGA cycle • Functional simulator can take any number of FPGA cycles for an operation • So there must be an explicit mechanism to monitor the ticks of the processor being modelled

  6. APorts – monitoring ticks • Modules in the timing partition are connected to each other using APorts • A clock tick conceptually begins when a module has read from every one of its input APorts and ends when it has written to every one of its output APorts • But the tick is localized to each port – there is no global model clock
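A minimal sketch of the APort idea, assuming an APort behaves like a latency-1 FIFO pre-filled with bubble messages; the classes below are my own illustration, not the HAsim library:

```python
from collections import deque

class APort:
    """Point-to-point connection between two timing modules."""

    def __init__(self, latency=1):
        # Pre-fill with bubbles so a consumer can read its model cycle's
        # input before the producer has written, avoiding startup deadlock.
        self._fifo = deque([None] * latency)

    def write(self, msg):   # producer: exactly one write per model cycle
        self._fifo.append(msg)

    def read(self):         # consumer: exactly one read per model cycle
        return self._fifo.popleft()

class TimingModule:
    def __init__(self, in_ports, out_ports):
        self.in_ports, self.out_ports = in_ports, out_ports

    def model_cycle(self):
        inputs = [p.read() for p in self.in_ports]      # tick begins
        outputs = self.compute(inputs)                   # may take many FPGA cycles
        for port, msg in zip(self.out_ports, outputs):   # tick ends
            port.write(msg)

    def compute(self, inputs):
        raise NotImplementedError
```

Because each module waits only on its own ports, neighbouring modules can be working on different model cycles at the same FPGA time, which is where the FPGA parallelism comes from.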

  7. MIPS R10K specs • 64-bit processor • Out-of-order execution • Superscalar • FetchWidth – 4 • CommitWidth – 4 • 2 ALUs • 1 Load/Store unit • 1 FPU

  8. Timing model design • The functional partition operates on only one instruction at a time • But the timing model time-multiplexes its operations so that it can work on more than one instruction at a time
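As a sketch of what this looks like over one model cycle (reusing the hypothetical FunctionalPartition interface from the slide 3 sketch; the state fields here are likewise illustrative):

```python
# Illustrative only: each functional-partition request still covers exactly
# one token, but within one model cycle the timing model makes several such
# requests for different tokens at different pipeline stages.

def model_cycle(func, state):
    # Fetch: up to 4 new tokens this model cycle (FetchWidth = 4).
    for _ in range(4):
        tok = func.new_token()
        state.inst_buffer.append((tok, func.fetch(tok, state.pc)))
        state.pc += 4

    # Execute: tokens that the issue stage selected on earlier model cycles.
    for tok in state.issued:
        state.done.append((tok, func.execute(tok)))
    state.issued.clear()

    # Commit: up to 4 finished tokens in program order (CommitWidth = 4).
    for tok, _result in state.done[:4]:
        func.global_commit(tok)
    del state.done[:4]
```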

  9. Timing Model top-level design [Diagram: Fetch, Decode/Dispatch, Issue, Execute, and Commit stages connected by APorts carrying 4 Tokens, 4 fetched instructions, the Predicted PC, the PC at Mispredict, IntQ and AddrQ buffer-left counts, FU Ops, Exec Results, Free Buffer entries, 4 Issue, and 4 Commit; the stages make Token, Fetch, Decode, Exec, Mem, LCO, and GCO requests to the functional partition.]

  10. Decode/Dispatch Module [Diagram: 4 Inst enter an Inst Buffer (8) and Decode; a Branch/JR Pred unit supplies the Predicted PC and the PC at Mispredict; the ROB handles Insert, Update from exec, Update Busy, and 4 Commit; a RegFile is maintained; up to 4 issue per cycle, gated by the IntQ Free Count and AddrQ Free Count.]

  11. Issue Module [Diagram: 4 Inst enter the IntQ (O-o-O), which issues to the 2 ALUs, and the AddrQ (In Order), which issues to the Load/Store unit, alongside a ScoreBoard.]
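A rough sketch of the issue policy this diagram suggests (the data structures and the ready check are my simplification, not the actual implementation): the IntQ may issue any ready entry out of order to the two ALUs, while the AddrQ issues only from its head, in order, to the Load/Store unit.

```python
# Illustrative issue-selection sketch; scoreboard.ready(inst) stands for a
# check that all of an instruction's source operands have been produced.

def issue(int_q, addr_q, scoreboard):
    issued = []

    # IntQ: out of order -- pick up to 2 ready entries (2 ALUs), oldest first.
    ready_ints = [inst for inst in int_q if scoreboard.ready(inst)]
    for inst in ready_ints[:2]:
        int_q.remove(inst)
        issued.append(("ALU", inst))

    # AddrQ: in order -- only the head may issue, to the one Load/Store unit.
    if addr_q and scoreboard.ready(addr_q[0]):
        issued.append(("LSU", addr_q.pop(0)))

    return issued
```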

  12. Differences of my timing model from the R10K • SMIPS ISA – no floating-point ops • 32-bit registers and addressing • No delay slot • One extra cycle on a branch mispredict • JR and JALR have to go through the Integer Q

  13. Reasons for timing differences • Currently the functional partition gives information only about branches, so the target address of a JR or JALR can be obtained only after the JR or JALR executes • I didn't implement the branch cache, which would eliminate the extra cycle on a branch mispredict

  14. Simulation results • Simulated the SMIPS v2 ADDIU test case • Took 239 FPGA cycles to simulate 7 model cycles (about 34 FPGA cycles per model cycle) – this number needs investigation, since the expected bottleneck, the instruction queue, accounts for only 7 * 21 = 147 FPGA cycles

  15. Miscellaneous • Lines of code for timing model ~ 1300 • Compared to ~1200 for a simple SMIPS processor in Lab2, excluding caches
