Building a Synthesizable x86
Eriko Nurvitadhi, James C. Hoe, Babak Falsafi
{enurvita,jhoe,babak}@ece.cmu.edu
SIMFLEX/PROTOFLEX
http://www.ece.cmu.edu/~simflex/
Motivation
• Build a synthesizable x86 functional model for prototyping
  • most widely used ISA
  • Intel won’t give out theirs
• Problem: a very complicated ISA
  • many instructions
    • 482 instructions total (ADD alone has 14 variations)
  • many individually complicated instructions
    • PUSHAD: push all GP registers to the stack
  • many under-specified instructions
    • LOADALL instruction; BCD operation flag updates
• Must also be maintainable & extensible, for return on investment
Overcoming Complexity
• 4 key ingredients in our approach
  • working SW simulator as design spec
  • simplified multi-cycle datapath
  • high-level HDL
  • HW-SW co-simulation for validation & evaluation
• What we have today
  • an x86 functional model in Bluespec
  • all real-mode general-purpose insts
    • includes I/O instructions!
  • boots FreeDOS in a co-simulation testbench
  • synthesizes to 85% of a Virtex II Pro 70 FPGA
  • max 10 MIPS (based on synthesis + simulation)
Outline
• Introduction
• Our Approach
• Status and Results
• Discussions and Future Work
Functional View of an ISA
[Figure: functional model; instructions Inst_1, Inst_2, ..., Inst_n; behaviors beh_1, beh_2, ..., beh_m; actions (ACT)]
• ISA = architectural states + instructions
• instruction = set of alternate behaviors
  • e.g., due to different addressing modes
  • x86 has 482 insts but ~1000 behaviors
• behavior = sequence of actions that read & alter states
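The instruction / behavior / action hierarchy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flat state dictionary, the `tmp` scratch field, the `mem[...]` key naming, and the two ADD variants are hypothetical simplifications, not the structure used in the actual model.

```python
from typing import Callable, Dict, List

# Hypothetical, heavily simplified architectural state: a flat name -> value map.
State = Dict[str, int]
Action = Callable[[State], None]   # one step that reads and/or alters state
Behavior = List[Action]            # a behavior is a sequence of actions


def add_reg_reg(dst: str, src: str) -> Behavior:
    """ADD with a register source: a single action."""
    def add(s: State) -> None:
        s[dst] = (s[dst] + s[src]) & 0xFFFF  # 16-bit wraparound
    return [add]


def add_reg_mem(dst: str, addr: int) -> Behavior:
    """ADD with a memory source: a load action followed by an add action."""
    def load(s: State) -> None:
        s["tmp"] = s.get(f"mem[{addr}]", 0)
    def add(s: State) -> None:
        s[dst] = (s[dst] + s["tmp"]) & 0xFFFF
    return [load, add]


# One instruction = a set of alternate behaviors, here keyed by addressing mode.
ADD = {"reg,reg": add_reg_reg, "reg,mem": add_reg_mem}


def run(behavior: Behavior, state: State) -> State:
    """Execute the actions of one behavior in order."""
    for action in behavior:
        action(state)
    return state
```

For example, `run(ADD["reg,mem"]("AX", 16), {"AX": 1, "mem[16]": 5})` executes two actions, while the register form executes one, which is why the behavior count exceeds the instruction count.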
SW x86 Sim as ISA Spec
• Simulator source code = precise and executable design spec
• We use Bochs (http://bochs.sourceforge.net/)
  • open-source
  • code structure fits our high-level ISA view
    • i.e., explicit architecture state declaration
    • one instruction behavior maps to one C++ function
  • (essentially) complete x86 functionality
    • simulates a complete PC system
    • runs various OSs (e.g., Linux, Win XP)
    • supports 386 through Pentium Pro
Multi-cycle Implementation
• Sequential, multi-cycle execution
  [Figure: stages labeled Start, Fetch, Decode, Execute, Commit, Finish]
• Top-level view
  [Figure: x86 functional model with arch and aux states, decoder, functional units (FU), memory accesses, and I/O operations]
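A rough way to picture sequential, multi-cycle execution is that each instruction owns the datapath for several cycles before the next one starts. The sketch below assumes one cycle each for Fetch, Decode, and Commit and a variable number of Execute cycles; those cycle counts are hypothetical and are not taken from the design.

```python
from typing import List


def stage_schedule(execute_cycles: int) -> List[str]:
    """Stages visited by one instruction, one list entry per clock cycle.

    Assumption: Fetch, Decode, and Commit take one cycle each, and Execute
    takes `execute_cycles` cycles (at least one).
    """
    if execute_cycles < 1:
        raise ValueError("execute_cycles must be >= 1")
    return ["Fetch", "Decode"] + ["Execute"] * execute_cycles + ["Commit"]


def average_cpi(execute_cycles_per_inst: List[int]) -> float:
    """Average cycles per instruction when instructions never overlap."""
    total = sum(len(stage_schedule(n)) for n in execute_cycles_per_inst)
    return total / len(execute_cycles_per_inst)
```

Because nothing overlaps, the CPI of a trace is just the mean schedule length, e.g. `average_cpi([1, 3])` averages a 4-cycle and a 6-cycle instruction.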
Bluespec Design Capture
• Explicit state declaration
  • x86 architectural states
  • auxiliary simulation states used by Bochs
• Predicated atomic rules
  • one rule per action in our ISA view
• Maintainability & extensibility
  • new behavior: add rules
  • changing behavior: add/modify rules
• Optimizations (low-level)
  • reduce logic: reuse + combine rules
  • reduce critical path delay: split rules
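The idea of predicated atomic rules can be mimicked in Python as (guard, action) pairs over an explicit state. This is an emulation for illustration, not Bluespec code: the `Rule` class, the `phase` field, and the toy increment behavior are assumptions, and the scheduler fires only one enabled rule per step, whereas a Bluespec compiler may schedule several non-conflicting rules in one clock cycle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]      # predicate: may the rule fire?
    action: Callable[[State], State]    # returns the complete next state


def step(state: State, rules: List[Rule]) -> Tuple[State, Optional[str]]:
    """Fire the first enabled rule atomically; return (new_state, rule_name)."""
    for rule in rules:
        if rule.guard(state):
            return rule.action(dict(state)), rule.name
    return state, None


def run(state: State, rules: List[Rule], max_steps: int) -> Tuple[State, List[str]]:
    """Step until no rule is enabled or max_steps is reached."""
    fired: List[str] = []
    for _ in range(max_steps):
        state, name = step(state, rules)
        if name is None:
            break
        fired.append(name)
    return state, fired


# Toy behavior "increment AX", split into two actions, hence two rules.
RULES = [
    Rule("fetch", lambda s: s["phase"] == 0, lambda s: {**s, "phase": 1}),
    Rule("exec_inc_ax", lambda s: s["phase"] == 1,
         lambda s: {**s, "AX": (s["AX"] + 1) & 0xFFFF, "phase": 0}),
]
```

Adding a new behavior in this style means appending rules to the list; existing rules stay untouched as long as their guards do not overlap with the new ones.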
HW-SW Co-simulation for Validation and Evaluation
• Virtually “plug in” our model into a PC
  • execute Bochs to provide reference behavior
  • simulate RTL alongside the simulated Bochs PC
• For validation and performance (CPI) evaluation
[Figure: two panels, Validation and Performance Evaluation, showing Bochs and RTL CPUs with MEM and I/Os; in Validation the CPUs are compared (==)]
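Validation by co-simulation amounts to stepping a reference model and the model under test in lockstep and comparing state after each instruction. The sketch below shows that comparison loop only; the step functions, the dictionary state, and the wraparound bug are hypothetical stand-ins for Bochs and the RTL model.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, int]
StepFn = Callable[[State], State]
Mismatch = Tuple[int, Dict[str, Tuple[Optional[int], Optional[int]]]]


def cosimulate(reference_step: StepFn, dut_step: StepFn,
               initial: State, n_instructions: int) -> Optional[Mismatch]:
    """Run both models in lockstep; return (index, diffs) at the first mismatch.

    diffs maps each differing state element to (reference_value, dut_value).
    Returns None if the two models agree for all n_instructions.
    """
    ref, dut = dict(initial), dict(initial)
    for i in range(n_instructions):
        ref, dut = reference_step(ref), dut_step(dut)
        if ref != dut:
            keys = ref.keys() | dut.keys()
            diffs = {k: (ref.get(k), dut.get(k))
                     for k in keys if ref.get(k) != dut.get(k)}
            return i, diffs
    return None


def good_inc(s: State) -> State:
    """Reference: 16-bit increment with wraparound."""
    return {**s, "AX": (s["AX"] + 1) & 0xFFFF}


def buggy_inc(s: State) -> State:
    """Model under test with a deliberate bug: no 16-bit wraparound."""
    return {**s, "AX": s["AX"] + 1}
```

Starting from `AX = 0xFFFE`, the two models agree on the first instruction and diverge on the second, so the loop reports index 1 together with the differing register values.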
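Co-Simulation Testbench
[Figure: testbench flow]
• Bochs src code → (manual coding) → Bluespec x86
• Bluespec x86 → (Bluespec compilation) → Verilog x86
• Verilog x86 → (C++ conversion, Verilator) → C++ x86
• Workloads on Bochs → (Bochs simulation) → traces
• C++ x86 + traces → (co-simulation) → validation and performance evaluation results
• Steps after the manual coding are labeled “Automated”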
Outline
• Introduction
• Our Approach
• Status and Results
• Discussions and Future Work
Implementation Progress
• Implemented ISA subset
  • all real-mode general-purpose instructions
    • 166 insts, 369 inst behaviors
  • compared to complete x86
    • 482 insts, ~1000 inst behaviors
• Synthesis
  • convert Bluespec to synthesizable Verilog
  • Xilinx ISE 7.1, Virtex II Pro 70 (FPGA on BEE2)
  • results: 98 MHz, 28K slices (85% utilization)
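A quick back-of-the-envelope check of these figures, using only numbers quoted in the deck. Two things here are assumptions rather than reported results: the implied CPI, which holds only if the 10 MIPS ceiling quoted earlier equals clock rate divided by average CPI, and the implied device capacity, which treats "28K" and "85%" as exact.

```python
# Figures quoted in the slides.
insts_done, insts_total = 166, 482
behaviors_done, behaviors_total = 369, 1000   # "~1000" in the slides, so approximate
clock_mhz = 98
peak_mips = 10            # "Max 10 MIPS" from the earlier status slide
slices_used = 28_000      # "28K" taken as exactly 28,000
utilization = 0.85

# Fraction of the ISA implemented so far.
inst_coverage = insts_done / insts_total               # about 0.34
behavior_coverage = behaviors_done / behaviors_total   # about 0.37

# Assumption: MIPS = clock (MHz) / average CPI, so CPI = clock / MIPS.
implied_cpi = clock_mhz / peak_mips

# Assumption: utilization = used / available, so available = used / utilization.
implied_slice_capacity = slices_used / utilization

print(f"instruction coverage: {inst_coverage:.1%}")
print(f"behavior coverage:    {behavior_coverage:.1%}")
print(f"implied average CPI:  {implied_cpi:.1f}")
print(f"implied slice count:  {implied_slice_capacity:.0f}")
```

Under those assumptions, roughly a third of the instructions already fill most of the FPGA, which is the tension the later "A Complete x86?" slide raises.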
Co-simulation Results
• Validation
  • validated our model w/ FreeDOS bootup traces
  • tested first 140M dynamic instructions
  • exercised 183 inst behaviors
• Performance Evaluation
  • also with FreeDOS bootup traces
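The "exercised behaviors" figure is a coverage count: how many distinct implemented behaviors appear at least once in the trace. A minimal sketch of such a count follows; the trace format (a sequence of behavior identifiers) and the function name are assumptions, not the testbench's actual interface.

```python
from typing import Iterable, Tuple


def behavior_coverage(trace: Iterable[str],
                      implemented: Iterable[str]) -> Tuple[int, float]:
    """Return (number of distinct implemented behaviors seen, fraction covered)."""
    implemented_set = set(implemented)
    exercised = set(trace) & implemented_set
    return len(exercised), len(exercised) / len(implemented_set)


# With the slide's numbers: 183 of the 369 implemented behaviors were exercised.
reported_fraction = 183 / 369   # just under one half
```

By this measure the FreeDOS boot trace touches about half of the implemented behaviors, so the remaining half has not been validated by this workload.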
A Complete x86?
• To finish the x86 model
  • can be done, but takes effort
  • consumes a lot of FPGA resources
• Do we really need all of it?
  • a workload uses only a subset of the ISA
  • some insts are used more often than others
  • parts of the ISA are never or rarely used
• PROTOFLEX migration
  • combine FPGA & simulation
  • model necessary subset in HW, the rest in SW
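The hybrid idea of modeling the necessary subset in hardware and the rest in software can be sketched as a dispatcher with a fallback path. This is an illustration of the partitioning only; the handler table, the opcode names, and the `make_hybrid` helper are hypothetical and do not describe the actual PROTOFLEX migration mechanism.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
HwHandler = Callable[[State], State]
SwFallback = Callable[[str, State], State]


def make_hybrid(hw_handlers: Dict[str, HwHandler],
                sw_fallback: SwFallback) -> Tuple[Callable[[str, State], State],
                                                  Dict[str, int]]:
    """Build an executor that prefers the HW subset and falls back to SW."""
    stats = {"hw": 0, "sw": 0}

    def execute(opcode: str, state: State) -> State:
        handler = hw_handlers.get(opcode)
        if handler is not None:
            stats["hw"] += 1          # instruction is in the HW-modeled subset
            return handler(state)
        stats["sw"] += 1              # rare instruction: hand off to the simulator
        return sw_fallback(opcode, state)

    return execute, stats


# Toy example: INC is "in hardware"; everything else goes to the SW simulator.
def hw_inc(s: State) -> State:
    return {**s, "AX": (s["AX"] + 1) & 0xFFFF}


def sw_simulator(opcode: str, s: State) -> State:
    if opcode == "DEC":
        return {**s, "AX": (s["AX"] - 1) & 0xFFFF}
    raise NotImplementedError(opcode)
```

The `stats` counters show how often the fallback path is taken; if rarely used instructions dominate the SW count, the HW subset is well chosen for that workload.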
Future Work
• Short-term (Fall’06)
  • implement protected-mode support
  • validate/evaluate w/ more workloads
    • Linux, SPEC-CPU, commercial apps (DB2)
  • deployment on the BEE2 board
• Long-term
  • full-system prototype execution
  • architectural exploration