
Closely-Coupled Timing-Directed Partitioning in HAsim


Presentation Transcript


  1. Closely-Coupled Timing-Directed Partitioning in HAsim Michael Pellauer† pellauer@csail.mit.edu Murali Vijayaraghavan†, Michael Adler‡, Arvind†, Joel Emer†‡ †MIT CS and AI Lab Computation Structures Group ‡Intel Corporation VSSAD Group To Appear In: ISPASS 2008

  2. Motivation
  • We want to simulate target platforms quickly
  • We also want to construct simulators quickly
  • Partitioned simulators are a known technique from traditional performance models:
    • Timing Partition: micro-architecture, resource contention, dependencies
    • Functional Partition: ISA, off-chip communication
    • The two partitions interact to simulate the target
  • Benefits:
    • Simplifies the timing model
    • Amortizes functional-model design effort over many timing models
    • The functional partition can be extremely FPGA-optimized

  3. Different Partitioning Schemes
  • As categorized by Mauer, Hill, and Wood [MAUER 2002], ACM SIGMETRICS
  • We believe that a timing-directed solution will ultimately lead to the best performance
  • Both partitions reside on the FPGA

  5. Functional Partition in Software Asim
  • Get Instruction (at a given address)
  • Get Dependencies
  • Get Instruction Results
  • Read Memory*
  • Speculatively Write Memory* (locally visible)
  • Commit or Abort Instruction
  • Write Memory* (globally visible)
  * Optional, depending on instruction type
  (A sketch of this operation interface appears below.)
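  As an illustration, the following is a minimal C++ sketch of what such an operation interface might look like. All names, types, and signatures here are hypothetical and chosen only to mirror the list above; they are not the actual Asim or HAsim API.

    #include <cstdint>
    #include <vector>

    // Hypothetical handle for one in-flight instruction; the timing partition
    // uses it to refer to the instruction in every later operation.
    using Token   = uint32_t;
    using Addr    = uint64_t;
    using RegName = uint8_t;

    struct DependencyInfo {
        std::vector<RegName> sources;      // architectural registers read
        std::vector<RegName> destinations; // architectural registers written
    };

    // Sketch of the functional-partition operations listed above.
    class FunctionalPartition {
    public:
        // Fetch the instruction bits at a given address.
        virtual uint32_t getInstruction(Token t, Addr pc) = 0;
        // Report which registers the instruction reads and writes.
        virtual DependencyInfo getDependencies(Token t) = 0;
        // Run the datapath and return the result (ALU value, branch target, ...).
        virtual uint64_t getInstructionResults(Token t) = 0;
        // Optional, depending on instruction type:
        virtual uint64_t readMemory(Token t) = 0;            // loads
        virtual void speculativelyWriteMemory(Token t) = 0;  // stores, locally visible
        virtual void writeMemory(Token t) = 0;               // stores, globally visible
        // Retire or squash the instruction.
        virtual void commit(Token t) = 0;
        virtual void abort(Token t) = 0;
        virtual ~FunctionalPartition() = default;
    };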

  5. Execution in Phases
  [Figure: phase sequences for different instruction types, built from Fetch (F), Decode (D), Execute (X), Memory Read (R), Memory Write (W), Commit (C), and Abort (A); e.g. F D X C for an ALU op, F D X R C for a load, F D X W C for a store.]
  • The Emer Assertion: all data dependencies can be represented via these phases
  (An illustrative walk-through appears below.)
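  For illustration only, using the hypothetical interface sketched above rather than HAsim's real code, a timing model might walk a load instruction through these phases in order:

    // Illustrative only: stepping a load through the phases, using the
    // hypothetical FunctionalPartition interface sketched earlier.
    void simulateLoad(FunctionalPartition& fp, Token t, Addr pc) {
        fp.getInstruction(t, pc);      // F: fetch the instruction bits
        fp.getDependencies(t);         // D: decode source and destination registers
        fp.getInstructionResults(t);   // X: execute (compute the effective address)
        fp.readMemory(t);              // R: memory read (loads only)
        fp.commit(t);                  // C: commit the instruction
    }
    // An ALU op would skip the memory phase (F D X C), and a squashed
    // instruction would end with fp.abort(t) instead of fp.commit(t).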

  6. Detailed Example: 3 Different Timing Models • Executing the same instruction sequence:

  7. Functional Partition in Hardware?
  • Requirements:
    • Support these operations in hardware
    • Allow for out-of-order execution, speculation, and rollback
  • Challenges:
    • Minimize operation execution times
    • Pipeline wherever possible
    • Trade off between BRAMs and multiported RAMs
    • Race conditions due to extreme parallelism

  8. Functional Partition as Pipeline
  • Conveys the concept well, but poor performance
  [Figure: the timing model drives a functional pipeline of Token Gen, Fetch, Decode, Execute, Mem, Local Commit, and Global Commit stages, backed by register state (RegFile) and memory state.]

  9. Implementation: Large Scoreboards in BRAM
  • Series of tables in BRAM
  • Store information about each in-flight instruction
  • Tables are indexed by a “token”
    • Also used by the timing partition to refer to each instruction
  • New operation “getToken” allocates a slot in the tables
  (A software sketch of such token-indexed tables appears below.)
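  A rough software stand-in for these token-indexed tables might look as follows. The table depth, field names, and the simplistic linear-scan allocator are assumptions for illustration; on the FPGA each table is a BRAM and allocation is done in hardware.

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <optional>

    using Token = uint32_t;                       // index into the scoreboard tables
    constexpr std::size_t MAX_IN_FLIGHT = 256;    // hypothetical table depth

    // Hypothetical software model of the BRAM scoreboard tables:
    // one entry per in-flight instruction, all indexed by the token.
    struct ScoreboardTables {
        std::array<uint64_t, MAX_IN_FLIGHT> fetchPC{};          // written at fetch
        std::array<uint32_t, MAX_IN_FLIGHT> instructionBits{};  // written at fetch
        std::array<uint64_t, MAX_IN_FLIGHT> executeResult{};    // written at execute
        std::array<bool,     MAX_IN_FLIGHT> valid{};            // token allocated?

        // "getToken": hand the timing partition a free slot to name the
        // instruction by.  A linear scan stands in for the real allocator.
        std::optional<Token> getToken() {
            for (Token t = 0; t < MAX_IN_FLIGHT; ++t) {
                if (!valid[t]) { valid[t] = true; return t; }
            }
            return std::nullopt;   // all tokens in flight: the timing model must stall
        }
    };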

  10. Implementing the Operations • See paper for details (also extra slides)

  11. Assessment: Three Timing Models
  • Unpipelined target
  • 5-stage pipeline
  • MIPS R10K-like out-of-order superscalar

  12. Assessment: Target Performance • Targets have an idealized memory hierarchy

  13. Assessment: Simulator Performance • Some correspondence between the target and the functional partition is very helpful

  14. Assessment: Reuse and Physical Stats
  • Where functionality is implemented
  • FPGA usage: Virtex-II Pro 70, synthesized with Xilinx ISE 8.1i

  15. Future Work: Simulating Multicores
  • Scheme 1: Duplicate both partitions
    • Each timing model (A, B, C, D) is paired with its own functional register state + datapath
    • Interaction occurs between each timing model and its own functional partition
  • Scheme 2: Cluster the timing partitions
    • The timing models share one functional register state + datapath and one functional memory state
    • Use a context ID to reference all state lookups
    • Interaction still occurs at the shared functional partition
  (A sketch of context-ID-keyed lookup appears below.)
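  To make the context-ID idea concrete, here is a hedged C++ sketch of a register state shared by several timing models, with every access keyed by a context ID. The class and method names are invented for illustration and do not correspond to HAsim code.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch of context-ID-keyed state lookup (Scheme 2):
    // several timing models share one functional register state, and every
    // request carries a context ID selecting that core's slice of the state.
    using ContextId = uint8_t;
    using RegIndex  = uint8_t;

    class SharedRegisterState {
    public:
        SharedRegisterState(std::size_t numContexts, std::size_t regsPerContext)
            : regsPerContext_(regsPerContext),
              regs_(numContexts * regsPerContext, 0) {}

        // One physical structure serves all clustered timing models:
        // each access is keyed by (context ID, register index).
        uint64_t read(ContextId ctx, RegIndex r) const {
            return regs_[ctx * regsPerContext_ + r];
        }
        void write(ContextId ctx, RegIndex r, uint64_t value) {
            regs_[ctx * regsPerContext_ + r] = value;
        }

    private:
        std::size_t regsPerContext_;
        std::vector<uint64_t> regs_;
    };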

  16. Future Work: Simulating Multicores
  • Scheme 3: Perform multiplexing of the timing models themselves
    • Leverage HAsim A-Ports in the timing model
    • Out of scope for today's talk
  [Figure: timing models A–D multiplexed over the shared functional register state + datapath and functional memory state; a context ID references all state lookups; interaction still occurs at the functional partition.]

  17. Future Work: Unifying with the UT-FAST Model
  • UT-FAST is functional-first
  • This can be unified into timing-directed
  • Just do “execute-at-fetch”
  [Figure: in UT-FAST, a functional emulator running in software feeds an execution stream to the timing partition on the FPGA, which can resteer the emulator; in the unified scheme the functional and timing partitions interact directly.]
  (A small sketch of execute-at-fetch appears below.)
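  One way to picture “execute-at-fetch”, again using the hypothetical interface sketched earlier rather than the real UT-FAST or HAsim code:

    // Illustrative only: "execute-at-fetch" makes a functional-first scheme
    // behave like timing-directed by running the whole datapath as soon as
    // the timing model fetches (hypothetical interface from the earlier sketch).
    void fetchAndExecute(FunctionalPartition& fp, Token t, Addr pc) {
        fp.getInstruction(t, pc);      // fetch the instruction bits...
        fp.getDependencies(t);         // ...decode them...
        fp.getInstructionResults(t);   // ...and execute immediately, so the
                                       // results are already known at fetch time
    }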

  18. Summary
  • Described a scheme for closely-coupled timing-directed partitioning
    • Both partitions are suitable for on-FPGA implementation
  • Demonstrated such a scheme's benefits:
    • Very good reuse, very good area/clock speed
    • Good FPGA-to-Model Cycle Ratio
      • Caveat: assuming some correspondence between the timing model and the functional partition (recall the unpipelined target)
  • We plan to extend this using contexts for hardware multiplexing [Chung 07]
  • Future: rare complex operations (such as syscalls) could be done in software using virtual channels

  19. Questions? pellauer@csail.mit.edu

  20. Extra Slides pellauer@csail.mit.edu

  21. Functional Partition Fetch

  22. Functional Partition Decode

  23. Functional Partition Execute

  24. Functional Partition Back End

  25. Timing Model: Unpipelined

  26. 5-Stage Pipeline Timing Model

  27. Out-Of-Order Superscalar Timing Model
