A Review of Processor Design Flow \course\cpeg323-08F\Topics1b.ppt
How to design a CPU?
• Instruction-set architecture (ISA) design
• Function-level (RTL) design
• Component-level design
• Gate-level/switch-level design
• Circuit-level design
Design Method
• Gate level/circuit level: toward full CAD
• Register level: CAD + heuristics/intuition
• ISA level: mainly a heuristic process with simulation validation
Figure: Processor Architecture Design Flow Diagram (Arch./Compiler Design Toolset)
• Design stages: Instruction Set Architecture Design (Microarchitecture Design I); System-Level Design; Compiler Design (Code Optimizer, Code Generator); Hardware Design: RTL Level Design (Microarchitecture Design II), Switch Level Design, Circuit Level Design; HDL (VHDL or Verilog)
• Simulators: ISA Simulator, System Level Simulator, RTL Level Simulator, Switch Level Simulator, Circuit Level Simulator
Figure: Design Levels of Abstraction, from abstract to concrete
• Architecture: registers eax, ebx, ecx, edx; code such as mov eax, [edi] / cmp eax, 4 / jne label10
• Microarchitecture (CPU): I-Cache, D-Cache, Branch Unit, Instruction Decode, Register Mapping, FP Regs, Int Regs, ALU, Address Calculation, FPU
• Logic: RenIfsSetWb2H := vOR3(RenCoverUpdtIFMWb2H, vAND2(RenCrab_Data_Hi_Cx5B[31], RenCrabIfsWrEnCx5H), vAND2(RenIfsValidWb3H, vNOT(RenCrabIfsWrEnCx5H)))
• Switch
• Circuit
• Layout
Design Levels and Component Types
Classical ISA Level Design Method
1. Select a prototype structure A.
2. Modify A to accommodate new performance demands and new technology.
3. Evaluate the result (ISA simulation).
4. Repeat until the design is satisfactory.
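The loop above can be sketched in a few lines. This is a toy illustration under stated assumptions: the two "knobs" (`num_registers`, `cache_kb`) and the CPI formula in `evaluate` are hypothetical stand-ins for a real ISA simulation, and `modify` is just one possible heuristic (grow whichever resource lowers CPI more).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Design:
    # Hypothetical knobs of the prototype structure "A".
    num_registers: int
    cache_kb: int


def evaluate(design: Design) -> float:
    """Stand-in for ISA simulation: returns an estimated CPI.

    Made-up model for illustration only: more registers mean fewer
    spill loads, and a larger cache means fewer misses.
    """
    spill_penalty = 8.0 / design.num_registers
    miss_penalty = 16.0 / design.cache_kb
    return 1.0 + spill_penalty + miss_penalty


def modify(design: Design) -> Design:
    """One heuristic modification: double whichever resource helps more."""
    more_regs = replace(design, num_registers=design.num_registers * 2)
    more_cache = replace(design, cache_kb=design.cache_kb * 2)
    return min(more_regs, more_cache, key=evaluate)


def design_loop(prototype: Design, target_cpi: float, max_iters: int = 20) -> Design:
    """Select, modify, evaluate, and repeat until the target is met."""
    design = prototype
    for _ in range(max_iters):
        if evaluate(design) <= target_cpi:  # "satisfaction" reached
            break
        design = modify(design)
    return design


if __name__ == "__main__":
    final = design_loop(Design(num_registers=8, cache_kb=4), target_cpi=1.5)
    print(final, evaluate(final))
```

Starting from `Design(8, 4)` with a target CPI of 1.5, this toy model settles on `Design(num_registers=32, cache_kb=64)`. In practice the evaluation step is the expensive part, which is why the speed of the simulator matters.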
Overall Simulation Strategy
1. Instruction-level simulation: used for performance evaluation at the instruction-set level, as well as for more detailed modeling, e.g. of the pipeline and memory system. This level is also used to generate test vectors for the lower-level simulators.
2. System-level simulation: models the details of the system environment, including interrupts and memory management. (Virtual machine level ..)
Overall Simulation Strategy (Cont'd)
3. RTL level: models the RTL description of the design.
4. Switch level with delays: used to simulate the design, mostly component by component; test vectors are generated from the RTL level.
5. Circuit simulation: used for detailed modeling of the critical paths and for verifying circuits under variations in temperature, power supply, etc.
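The idea of generating test vectors at a higher level and replaying them at a lower level can be illustrated with a small, hypothetical example. Here the higher-level model is a plain 8-bit modular add, and the lower-level model is a ripple-carry adder built only from AND/OR/XOR operations. The width, function names, and random vector generation are assumptions made for this sketch.

```python
import random

WIDTH = 8                   # assumption: 8-bit datapath
MASK = (1 << WIDTH) - 1


def reference_add(a: int, b: int) -> int:
    """Higher-level (RTL-like) behaviour: modular addition."""
    return (a + b) & MASK


def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder expressed with AND/OR/XOR gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def gate_level_add(a: int, b: int) -> int:
    """Lower-level model: ripple the carry through WIDTH full adders."""
    carry = 0
    result = 0
    for i in range(WIDTH):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result  # final carry-out dropped, matching modular addition


def generate_vectors(n: int, seed: int = 0) -> list:
    """Test vectors (inputs plus expected output) from the higher level."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        a, b = rng.randrange(1 << WIDTH), rng.randrange(1 << WIDTH)
        vectors.append((a, b, reference_add(a, b)))
    return vectors


def check(vectors: list) -> list:
    """Return the vectors on which the lower-level model disagrees."""
    return [v for v in vectors if gate_level_add(v[0], v[1]) != v[2]]
```

`check(generate_vectors(500))` returns an empty list when the gate-level model matches the reference. Any returned vector pinpoints an input on which the lower-level design is wrong.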
Performance of Simulators
Metric: number of cycles simulated per second on a host machine
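As a rough illustration of this metric, the sketch below times two hypothetical toy "simulators" of the same 32-bit counter. One models each cycle as a single integer add. The other ripples the increment through 32 one-bit half-adder steps, loosely mimicking a lower-level model. Both classes are invented for this example.

```python
import time
from typing import Callable


class WordLevelCounter:
    def __init__(self) -> None:
        self.value = 0

    def step(self) -> None:
        # One simulated cycle: increment the 32-bit counter as a word.
        self.value = (self.value + 1) & 0xFFFFFFFF


class BitLevelCounter:
    def __init__(self) -> None:
        self.value = 0

    def step(self) -> None:
        # Same cycle, evaluated bit by bit with gate-like operations.
        carry, result = 1, 0
        for i in range(32):
            a = (self.value >> i) & 1
            result |= (a ^ carry) << i   # half-adder sum
            carry = a & carry            # half-adder carry
        self.value = result


def cycles_per_second(step: Callable[[], None], n_cycles: int) -> float:
    """Simulated cycles per second of host wall-clock time."""
    start = time.perf_counter()
    for _ in range(n_cycles):
        step()
    elapsed = time.perf_counter() - start
    return n_cycles / max(elapsed, 1e-9)  # guard against a zero timer delta
```

Calling `cycles_per_second(WordLevelCounter().step, 100_000)` and the same call with `BitLevelCounter().step` typically shows the more detailed model running noticeably slower. The exact numbers depend on the host.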
Figure: Instruction Set Architecture Simulation
• Execution-driven simulator: takes an object file and produces runtime statistics (frequencies, cycle counts, etc.), profile information, and traces (e.g. memory accesses, branch trace, etc.)
• Trace-driven simulator (cache simulator, branch prediction simulator, etc.): takes the traces together with architecture models and produces statistics (e.g. cache behavior, branch behavior, etc.)
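A trace-driven simulator can be very small. The sketch below assumes a direct-mapped cache with allocate-on-miss and hypothetical default geometry (64 lines of 32 bytes). It consumes a memory-address trace, such as one emitted by an execution-driven simulator, and reports hit/miss statistics.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    @property
    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def simulate_direct_mapped(trace: Iterable[int], num_lines: int = 64,
                           line_bytes: int = 32) -> CacheStats:
    """Replay an address trace through a direct-mapped cache model."""
    tags = [None] * num_lines  # one tag per cache line; None means invalid
    stats = CacheStats()
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        tag = block // num_lines
        if tags[index] == tag:
            stats.hits += 1
        else:
            stats.misses += 1
            tags[index] = tag  # allocate the block on a miss
    return stats
```

For example, a sequential trace of 4-byte words `range(0, 1024, 4)` touches 32 distinct lines, giving 32 misses and 224 hits (miss rate 0.125). Alternating between addresses 0 and 2048 maps both to line 0, so every access misses. Because the trace is fixed, many cache configurations can be evaluated without re-running the benchmark.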
Performance Study by Simulation
• Develop a performance model that is:
  • Flexible
  • Parameterized (via knobs)
  • 95% clock-accurate compared to RTL
  • Significantly smaller than RTL
• Models consist of two parts:
  • Instruction-set simulator -> executes the benchmark
  • Pipeline simulator -> “accountant” for clock cycles
• Run benchmarks, update the microarchitecture accordingly
• Cycle of: code -> simulate -> characterize -> tune
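The "accountant" role of the pipeline simulator can be sketched as follows. It computes no results, because the instruction-set simulator does that. It only counts clock cycles for an instruction trace. The 5-stage depth, the load-use stall, and the taken-branch penalty are hypothetical knobs chosen for illustration, not values from any real design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceEntry:
    op: str                 # e.g. "alu", "load", "branch"
    dest: Optional[int]     # destination register, if any
    srcs: tuple             # source registers
    taken: bool = False     # for branches: was the branch taken?


@dataclass
class Knobs:
    depth: int = 5                  # pipeline stages (fill cost = depth - 1)
    load_use_stall: int = 1         # bubbles when a load result is used next
    taken_branch_penalty: int = 2   # bubbles after a taken branch


def count_cycles(trace: List[TraceEntry], k: Knobs) -> int:
    """Ideal one-instruction-per-cycle issue plus stall penalties."""
    if not trace:
        return 0
    cycles = (k.depth - 1) + len(trace)
    for prev, cur in zip(trace, trace[1:]):
        if prev.op == "load" and prev.dest is not None and prev.dest in cur.srcs:
            cycles += k.load_use_stall
    for entry in trace:
        if entry.op == "branch" and entry.taken:
            cycles += k.taken_branch_penalty
    return cycles
```

With the trace `load r1`, `alu r3 <- r1, r4`, `alu r5 <- r3`, `branch (taken) on r5` and the default knobs, the count is 4 fill cycles + 4 instructions + 1 load-use bubble + 2 branch bubbles = 11 cycles. Setting both penalties to 0 gives 8. Turning knobs like these and re-running benchmarks is the code -> simulate -> characterize -> tune cycle in miniature.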
Revisit: How to design a CPU?
1. Instruction-set architecture (ISA) design
2. Function-level (RTL) design
3. Component-level design
4. Gate-level/switch-level design
5. Circuit-level design
Monty Denneau: I work on everything down to and including 4. Cyclops skips (2) and goes directly to 3/4. A lot of time was spent restructuring the design to make 4 meet timing. I probably spent thousands of hours on 4. We have no 5: ASICS provides a library of gates, latches, memory, etc.
August 28, 2007