
Hasim

Michael Adler†, Artur Klauser†, Angshuman Parashar†, Michael Pellauer‡, Murali Vijayaraghavan‡, Joel Emer†‡ († VSSAD, Intel; ‡ CSAIL, MIT). Overview. Goal: produce compelling evidence for architecture ideas. Requirements: cycle-accurate simulation.


Presentation Transcript


  1. Hasim
     Michael Adler†, Artur Klauser†, Angshuman Parashar†, Michael Pellauer‡, Murali Vijayaraghavan‡, Joel Emer†‡
     † VSSAD, Intel; ‡ CSAIL, MIT

  2. Overview
     • Goal
       • Produce compelling evidence for architecture ideas
     • Requirements
       • Cycle-accurate simulation
       • Representative simulation length
       • Software development (often)
     • Current approach
       • Mostly software simulation (10 kHz to 1 kHz)
     • New approach
       • Build a performance model in an FPGA

  3. FPGA-based approaches
     • Prototyping
       • Build a logically isomorphic representation of the design
     • Modeling
       • Build a performance simulation in gates
     • Hybrids
       • Build something that is partially a prototype and partially a model

  4. Recreate Asim in hardware
     • Modularity
     • Inter-module communication
     • Functional/Timing Partitioning
     • Modeling Utilities

  5. Why modularity?
     • Speed of model development
       • Shared components between products
       • Reuse across generations
     • Encourages isomorphism to design
       • Improved fidelity
       • Facilitates speed/fidelity trade-offs
     • Architectural experimentation
       • Factorial development and evaluations
     • Sharing

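  6. ASIM Module Hierarchy
     [Diagram: module tree; node labels S, C, M, N, F, D, R, X, C, W, B]

  7. ASIM Module Selection
     [Diagram: module tree with alternative implementations; node labels S, C, M, N, F, D, R, X, C, W, B]

  8. Module Selection
     [Diagram: alternative modules (B) selected into the tree S, C, M, N, F, D, R, X, C, W]

  9. Module Replacement
     [Diagram: one module of the tree replaced; node labels S, C, M, N, F, D, R, X, C, W, B]

  10. (H)ASIM Module Hierarchy

  11. Communication
     [Diagram: modules F, D, R, X, C, W, N, N, C connected to one another]

  12. Named connections
     [Diagram: S and D joined by a named connection with endpoints A-out and A-in]

  13. Model and FPGA Cycles
     [Diagram: Module A and Module B, each with ports on both sides]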
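The communication slides show modules talking only through named ports, with model cycles kept distinct from FPGA cycles. One way to picture this is a fixed-latency port, sketched below in Python. This is an illustrative assumption and not HAsim's implementation: the `Port` class, its `latency` parameter, and the two driver functions are hypothetical names. The point is that a message sent in model cycle t is received in model cycle t + latency, no matter how the host schedules the two modules.

```python
from collections import deque


class Port:
    """Hypothetical fixed-latency port between two model modules.

    A message sent in model cycle t becomes visible to the receiver in
    model cycle t + latency, however the host schedules the two modules.
    """

    def __init__(self, latency):
        # Pre-fill with `latency` empty slots (None means "no message").
        self.fifo = deque([None] * latency)

    def send(self, msg):
        self.fifo.append(msg)

    def recv(self):
        return self.fifo.popleft()


def run_lockstep(latency, cycles):
    """Producer and consumer each advance one model cycle per host step."""
    port, seen = Port(latency), []
    for t in range(cycles):
        port.send(f"req{t}")      # producer's model cycle t
        seen.append(port.recv())  # consumer's model cycle t
    return seen


def run_producer_ahead(latency, cycles):
    """Producer finishes all its model cycles before the consumer starts."""
    port, seen = Port(latency), []
    for t in range(cycles):
        port.send(f"req{t}")
    for _ in range(cycles):
        seen.append(port.recv())
    return seen
```

Both drivers return the same sequence for the same latency, which is the property that lets a module take a variable number of host (FPGA) cycles per model cycle without changing simulated timing.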

  14. Functional/Timing Decomposition
     • ISA semantics
     • Platform semantics
     • Micro-architecture
     [Diagram: Timing Partition and Functional Partition, linked by Fetch(PC) … and Instruction]
     • Simplifies timing model
     • Amortize functional model design effort over many models
     • Can be pipelined for performance
     • Can be FPGA-friendly design
     • Can be split across hardware and software

  15. Execute@execute phases
     • Fetch instruction
     • Speculatively execute instruction
     • Read memory*
     • Speculatively write memory* (locally visible)
     • Commit or Abort instruction
     • Write memory* (globally visible)
     * Optional depending on instruction type
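The phase list above can be sketched as a toy functional partition in Python. Everything here is a hypothetical illustration, not HAsim code: the class name, the token numbering, and the method names are assumptions. It models only the memory-related phases, keeping speculative stores in a local buffer until commit and global write.

```python
class FunctionalPartition:
    """Toy functional partition; names and structure are illustrative."""

    def __init__(self, memory):
        self.memory = dict(memory)  # globally visible state
        self.pending = {}           # token -> (addr, value), locally visible
        self.committed = set()
        self.next_token = 0

    def fetch(self):
        """Start a new in-flight instruction and return its token."""
        token = self.next_token
        self.next_token += 1
        return token

    def read_memory(self, token, addr):
        """A load sees the youngest older speculative store, else memory."""
        older = [t for t, (a, _) in self.pending.items()
                 if a == addr and t < token]
        if older:
            return self.pending[max(older)][1]
        return self.memory.get(addr, 0)

    def spec_write_memory(self, token, addr, value):
        """Speculative store: visible to younger tokens only."""
        self.pending[token] = (addr, value)

    def abort(self, token):
        """Rewind: drop this token's store and every younger one."""
        for t in [t for t in self.pending if t >= token]:
            del self.pending[t]

    def commit(self, token):
        self.committed.add(token)

    def global_write(self, token):
        """Make a committed store globally visible."""
        if token not in self.committed:
            raise ValueError("token must be committed first")
        if token in self.pending:
            addr, value = self.pending.pop(token)
            self.memory[addr] = value
```

A timing model would call these methods in the order its pipeline dictates; an aborted store never reaches `memory`, while a committed one reaches it only at `global_write`.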

  16. Execution in phases
     [Diagram: example instruction phase sequences: F D X C F D X R C F D X W C W F D X R A F D X X C W]
     Assertion: All data dependencies can be represented in these phases

  17. HAsim: Partitioning Overview
     [Diagram: Timing Partition and Functional Partition; stages Token Gen, Fet, Dec, Exe, Mem, LCom, GCom; Register State, RegFile, Memory State]

  18. Common Infrastructure
     • Modules
     • Inter-module communication
     • Statistics gathering
     • Event logging
     • Debug Tracing
     • Simulation control
     • …

  19. Bluespec (Asim-style) module
        module [HAsim_module] mkCache#() (Empty);
          // Requests arrive on "a2cache"; responses leave on "cache2a".
          Port#(Addr) req_port  <- mkRecvPort("a2cache");
          Port#(Bool) resp_port <- mkSendPort("cache2a");
          TagArray    tagarray  <- mkTagArray();

          // One model cycle: answer a pending request with hit or miss.
          rule cycle (True);
            Maybe#(Addr) mx = req_port.get();
            if (isValid(mx))
              resp_port.put(tagarray.lookup(validValue(mx)));
          endrule
        endmodule

  20. Bluespec (Asim-style) submodule
        module mkTagArray(TagArray);
          RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

          // Helpers are declared before the method that uses them.
          function Bit#(4) getTag(Address x);
            return x[15:12];
          endfunction

          function Bit#(12) getIndex(Address x);
            return x[11:0];
          endfunction

          // Hit when the stored tag matches the address tag.
          method Bool lookup(Bit#(16) a);
            return (tagArray.sub(getIndex(a)) == getTag(a));
          endmethod
        endmodule
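The tag/index split used by the tag array can be checked with a short Python sketch. It mirrors the slide's bit ranges (tag = bits 15:12, index = bits 11:0 of a 16-bit address); the `fill` method and the use of `None` for an empty entry are assumptions added for illustration, since the slide shows only `lookup`.

```python
INDEX_BITS = 12  # address bits 11:0 select the entry
TAG_BITS = 4     # address bits 15:12 are stored as the tag


def get_index(addr):
    return addr & ((1 << INDEX_BITS) - 1)


def get_tag(addr):
    return (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)


class TagArray:
    """Direct-mapped tag store with one 4-bit tag per 12-bit index."""

    def __init__(self):
        # None marks an entry that has never been filled (assumption).
        self.tags = [None] * (1 << INDEX_BITS)

    def fill(self, addr):
        """Hypothetical helper: record addr's tag at its index."""
        self.tags[get_index(addr)] = get_tag(addr)

    def lookup(self, addr):
        """Hit when the stored tag equals the address tag."""
        return self.tags[get_index(addr)] == get_tag(addr)
```

For example, addresses 0xA123 and 0xB123 share index 0x123 but differ in tag, so filling one makes the other miss.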

  21. Support functions - stats
        module mkCache#(...) (Empty);
          ...
          cache_hits <- mkStat(...);
          ...
          hit = tagarray.lookup(...);
          if (hit) cache_hits.increment();
          ...
        endmodule
     [Diagram: three modules, each with a Stat Counter, and one Stat Dumper]
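The stats pattern on this slide, where each module owns a counter and a single dumper collects them, might look like the following Python sketch. All names (`StatDumper`, `make_stat`, the toy `Cache`) are hypothetical; the slide elides the real `mkStat` arguments, and how the hardware counters reach the dumper is not specified there.

```python
class StatCounter:
    """A named counter owned by one module."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def increment(self):
        self.value += 1


class StatDumper:
    """Creates counters and reports all of them in one place."""

    def __init__(self):
        self.counters = []

    def make_stat(self, name):
        counter = StatCounter(name)
        self.counters.append(counter)
        return counter

    def dump(self):
        return {c.name: c.value for c in self.counters}


class Cache:
    """Toy module: counts hits on addresses it has seen before."""

    def __init__(self, name, dumper):
        self.cache_hits = dumper.make_stat(f"{name}.cache_hits")
        self.resident = set()

    def access(self, addr):
        hit = addr in self.resident
        if hit:
            self.cache_hits.increment()
        else:
            self.resident.add(addr)
        return hit
```

The module code only touches its own counter; collection and output stay in the dumper, which keeps the modeling code free of reporting logic.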

  22. 2Dreams

  23. Support functions - events
        module mkCache#(...) (Empty);
          ...
          cache_event <- mkEvent(...);
          ...
          hit = tagarray.lookup(...);
          cache_event.report(hit);
          ...
        endmodule
     [Diagram: three modules, each with an Event Reg, and one Event Dumper]

  24. Support functions - global controller
        module mkCache#(...) (Empty);
          ...
          ctrl <- mkCntrlr(...);
          ...
          rule (ctrl.run())... endrule
        endmodule
     [Diagram: three modules, each with a Controller, and one Global Controller]

  25. FPGA-based prototype
     Prototyping Catch-22…

  26. Module Instantiation
     [Diagram: module tree with replicated F, D, R, X, C, W modules; node labels U, C, M, N]

  27. Factorial Coding/Experiments
     [Diagram: module trees; labels S, C, M, N, RC, RM, SC, SM]

  28. HAsim: Current status - models
     • Simple RISC functional model operating
       • Simple RISC ISA
       • Pipelined multi-phase instruction execution
       • Supports speculative OOO design
       • Physical Reg File and ROB
       • Small physically addressed memory
       • Fast speculative rewinds
     • Instruction-per-cycle (APE) model
       • Runs simple benchmarks on FPGA
     • Five stage pipeline
       • Supports branch mis-speculation
       • Runs simple benchmarks (in software simulation)
     • X86 functional model architecture under development

  29. Connections Implement Ports
     [Diagram: PM (Module Tree w. Connections) and PM (Hardware Modules w. Wrappers), with connections foo, bar, baz]
     Implemented via connections.

  30. Timing Model Resources (Fast)
     • OOO, branch prediction, three functional units, 32KB 2-way set-associative ICache and DCache, iTLB, dTLB
       • 2142 slices (15% of a 2VP30)
       • 21 block RAMs (15% of a 2VP30)
     • Configurable cache model
       • 32KB 4-way set-associative cache with 16B cache lines
         • 165 slices (1% of a 2VP30)
         • 17 block RAMs (12% of a 2VP30)
       • 2MB 4-way set-associative cache with 64B cache lines
         • 140 slices (1% of a 2VP30)
         • 40 block RAMs (29% of a 2VP30)
     • Current FPGAs (4VFX140)
       • 142,128 logic cells
       • 552 block RAMs
       • 2 PowerPCs
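The utilization percentages on this slide can be reproduced with a few lines of Python. The device totals below (13,696 slices and 136 block RAMs for the XC2VP30) are not stated on the slide; they are an assumption taken from the Xilinx Virtex-II Pro data sheet, and the dictionary keys are illustrative labels.

```python
# Assumed XC2VP30 totals (Xilinx Virtex-II Pro data sheet, not the slide).
XC2VP30 = {"slices": 13_696, "block_rams": 136}

# Resource counts as quoted on the slide.
USAGE = {
    "OOO timing model": {"slices": 2142, "block_rams": 21},
    "32KB 4-way cache model": {"slices": 165, "block_rams": 17},
    "2MB 4-way cache model": {"slices": 140, "block_rams": 40},
}


def utilization(used, device=XC2VP30):
    """Return percent of each device resource consumed by `used`."""
    return {kind: 100.0 * used[kind] / device[kind] for kind in used}


if __name__ == "__main__":
    for name, used in USAGE.items():
        pct = utilization(used)
        print(f"{name}: {pct['slices']:.1f}% slices, "
              f"{pct['block_rams']:.1f}% block RAMs")
```

Under these assumed totals, truncating each result to a whole percent gives the figures quoted on the slide.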
