
sim-alpha: A Validated, Execution-Driven Alpha 21264 Simulator

Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin


Presentation Transcript


  1. sim-alpha: A Validated, Execution-Driven Alpha 21264 Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin

  2. Introduction • sim-alpha: an execution-driven simulator of the Alpha 21264 • Execution-driven simulation is the most accurate simulation technique, because it actually executes instructions (including those on mis-speculated paths) rather than replaying a trace • The memory system and the processor pipeline are both simulated in detail, simultaneously. • It models the implementation constraints and the low-level performance features of the Alpha 21264.

  3. The sim-alpha goals • Extend the SimpleScalar tool set to model an existing microprocessor (the Alpha 21264, EV6 microarchitecture) • Validate the simulator against actual hardware to ensure accurate modeling • Release the simulator for use by researchers studying extensions to existing implementations

  4. Code overview

  5. Code structure • The code for each pipeline stage is in a separate .c file • Each .c file has a corresponding .h file containing function prototypes, constants, and extern declarations for global variables.

  6. Microprocessor features • Issue width of 6 instructions per cycle (4 integer and 2 floating point), drawn from a 20-entry integer issue queue and a 15-entry floating-point issue queue. • 80-entry reorder buffer. • 4 integer units with an 80-entry register file. These units are called sub-clusters, and each executes specific classes of instructions. • 2 floating-point units with a 72-entry register file. • 32-entry load queue • 32-entry store queue • Alpha 21264 tournament predictor with local, global, and choice predictors
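The tournament predictor listed above can be sketched in C as follows. This is a minimal sketch, not sim-alpha's code: the table sizes (1024 ten-bit local histories indexed by PC, 1024 three-bit local counters indexed by that history, and 4096-entry two-bit global and choice tables indexed by a 12-bit global history) follow common descriptions of the 21264 and are assumptions here, and the names `tournament_t`, `tp_predict`, and `tp_update` are hypothetical. In this sketch the chooser is trained only when the local and global components disagree.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed table sizes, as commonly described for the 21264. */
#define LOCAL_HIST_ENTRIES 1024   /* indexed by PC */
#define LOCAL_HIST_BITS    10     /* per-branch history length */
#define LOCAL_PRED_ENTRIES 1024   /* 3-bit counters, indexed by local history */
#define GLOBAL_ENTRIES     4096   /* 2-bit counters, indexed by 12-bit global history */

typedef struct {
    uint16_t local_hist[LOCAL_HIST_ENTRIES];
    uint8_t  local_pred[LOCAL_PRED_ENTRIES];  /* 0..7, >= 4 predicts taken */
    uint8_t  global_pred[GLOBAL_ENTRIES];     /* 0..3, >= 2 predicts taken */
    uint8_t  choice[GLOBAL_ENTRIES];          /* 0..3, >= 2 means "trust global" */
    uint16_t ghist;                           /* 12-bit global history */
} tournament_t;

static void tp_init(tournament_t *tp) {
    memset(tp, 0, sizeof *tp);
}

/* Alpha instructions are 4 bytes, so drop the two alignment bits. */
static unsigned pc_index(uint64_t pc) {
    return (unsigned)((pc >> 2) & (LOCAL_HIST_ENTRIES - 1));
}

static bool tp_predict(const tournament_t *tp, uint64_t pc) {
    unsigned lh = tp->local_hist[pc_index(pc)];
    bool local_taken  = tp->local_pred[lh] >= 4;
    bool global_taken = tp->global_pred[tp->ghist] >= 2;
    return tp->choice[tp->ghist] >= 2 ? global_taken : local_taken;
}

/* Saturating counter step toward 'max' (up) or toward 0 (down). */
static uint8_t sat_update(uint8_t c, bool up, uint8_t max) {
    if (up) return c < max ? c + 1 : c;
    return c > 0 ? c - 1 : c;
}

static void tp_update(tournament_t *tp, uint64_t pc, bool taken) {
    unsigned idx = pc_index(pc);
    unsigned lh = tp->local_hist[idx];
    bool local_ok  = (tp->local_pred[lh] >= 4) == taken;
    bool global_ok = (tp->global_pred[tp->ghist] >= 2) == taken;

    /* Move the chooser toward whichever component was right, if only one was. */
    if (local_ok != global_ok)
        tp->choice[tp->ghist] = sat_update(tp->choice[tp->ghist], global_ok, 3);

    tp->local_pred[lh] = sat_update(tp->local_pred[lh], taken, 7);
    tp->global_pred[tp->ghist] = sat_update(tp->global_pred[tp->ghist], taken, 3);

    /* Shift the outcome into both histories. */
    tp->local_hist[idx] = (uint16_t)(((lh << 1) | taken) & ((1u << LOCAL_HIST_BITS) - 1));
    tp->ghist = (uint16_t)(((tp->ghist << 1) | taken) & (GLOBAL_ENTRIES - 1));
}
```

Feeding the same always-taken branch through `tp_update` repeatedly saturates its local counter, after which `tp_predict` returns taken for that PC.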

  7. Microarchitectural features • Line predictor • Predicts the I-cache line to be accessed in the next cycle • Way predictor • Predicts which way of the 2-way set-associative I-cache will be accessed • Partitioned integer execution core • 2 clusters: each one has a copy of the integer register file and 2 subclusters (lower and upper) • Static slotting • Instructions are statically assigned to one of the 2 subclusters (in the slot stage), and the 21264 then dynamically chooses the cluster at issue.
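Line and way prediction can be thought of as a small table that stores, next to each cached line, a guess of where the next fetch will come from, so the next fetch can start before the branch predictor and tag comparison finish. The sketch below assumes a 64 KB, 2-way set-associative I-cache with 64-byte lines (512 sets) and one prediction per line; all names are hypothetical, and retraining immediately on every miss (with no hysteresis) is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed I-cache geometry: 64 KB, 2-way, 64-byte lines -> 512 sets. */
#define IC_SETS 512
#define IC_WAYS 2

/* Prediction stored with each cached line. */
typedef struct {
    uint16_t next_set;  /* predicted I-cache index of the next fetch */
    uint8_t  next_way;  /* predicted way within that set */
} next_line_pred;

typedef struct {
    next_line_pred entry[IC_SETS][IC_WAYS];
} line_predictor;

/* Read the prediction attached to the line that was just fetched. */
static next_line_pred lp_predict(const line_predictor *lp, unsigned set, unsigned way) {
    return lp->entry[set][way];
}

/* Once the real next line is known, check the prediction and retrain on a miss.
 * Returns true if the line and way were predicted correctly. */
static bool lp_verify(line_predictor *lp, unsigned set, unsigned way,
                      unsigned actual_set, unsigned actual_way) {
    next_line_pred *e = &lp->entry[set][way];
    bool hit = e->next_set == actual_set && e->next_way == actual_way;
    if (!hit) {
        e->next_set = (uint16_t)actual_set;
        e->next_way = (uint8_t)actual_way;
    }
    return hit;
}
```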

  8. Microarchitectural features (cont.) • Load-use speculation • Instructions dependent on a load are issued assuming the load hits. If the load misses, they are squashed and re-issued. • Memory-order traps • Load-load trap: a newer load issues before an older load to the same address • Load-store trap: a newer load issues before an older store to the same address • stWait table • A 1024-entry, one-bit table, indexed by PC, used to stall the issue of loads that have caused order traps. The processor does not issue a load whose stWait bit is set until all older stores have issued.
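The stWait mechanism described above amounts to a PC-indexed bit vector consulted when a load is about to issue. In this minimal sketch the index function (PC bits above the 4-byte instruction alignment), the byte-per-bit storage, and the helper names are assumptions; `stwait_clear` is a hypothetical periodic reset that keeps stale bits from stalling loads indefinitely, with its trigger interval left to the caller. Because the table is untagged, unrelated loads whose PCs alias to the same entry are stalled too.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STWAIT_ENTRIES 1024  /* 1024 one-bit entries */

typedef struct {
    uint8_t bit[STWAIT_ENTRIES];  /* one byte per bit, for clarity */
} stwait_table;

/* Assumed indexing: drop the 2 alignment bits, keep 10 index bits. */
static unsigned stwait_index(uint64_t pc) {
    return (unsigned)((pc >> 2) & (STWAIT_ENTRIES - 1));
}

/* Called when the load at load_pc causes a load-store order trap. */
static void stwait_mark(stwait_table *t, uint64_t load_pc) {
    t->bit[stwait_index(load_pc)] = 1;
}

/* A load may issue if its bit is clear, or once all older stores have issued. */
static bool stwait_can_issue(const stwait_table *t, uint64_t load_pc,
                             bool older_stores_issued) {
    return !t->bit[stwait_index(load_pc)] || older_stores_issued;
}

/* Hypothetical periodic reset of the whole table. */
static void stwait_clear(stwait_table *t) {
    memset(t->bit, 0, sizeof t->bit);
}
```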

  9. Basic structures • Fetch queue • Slot queue • Mapping table (logical to physical register) • Reorder buffer • Issue queue • Load queue • Store queue • Ready queue • Event queue (events that free an issue queue entry 2 cycles after issue and that signal the completion of an instruction's execution)
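The event queue can be kept as a min-heap ordered by the cycle at which each event fires, so that each cycle the simulator pops only the events that are due. The sketch below is hypothetical (fixed capacity, invented names such as `schedule_issue_events`, and an instruction identified by a plain integer); it schedules the two events mentioned above: releasing the issue queue entry 2 cycles after issue, and signaling completion after an assumed functional-unit latency.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EV_FREE_IQ_ENTRY, EV_COMPLETE } event_kind;

typedef struct {
    uint64_t   cycle;    /* cycle at which the event fires */
    event_kind kind;
    int        inst_id;  /* hypothetical instruction handle */
} event;

#define EVQ_CAP 256

typedef struct {
    event heap[EVQ_CAP];  /* binary min-heap keyed on cycle */
    int   n;
} event_queue;

static void swap_ev(event *a, event *b) { event t = *a; *a = *b; *b = t; }

static bool evq_push(event_queue *q, event e) {
    if (q->n == EVQ_CAP) return false;
    int i = q->n++;
    q->heap[i] = e;
    while (i > 0) {                      /* sift up */
        int p = (i - 1) / 2;
        if (q->heap[p].cycle <= q->heap[i].cycle) break;
        swap_ev(&q->heap[p], &q->heap[i]);
        i = p;
    }
    return true;
}

/* Pop the earliest event if it is due at or before 'now'. */
static bool evq_pop_due(event_queue *q, uint64_t now, event *out) {
    if (q->n == 0 || q->heap[0].cycle > now) return false;
    *out = q->heap[0];
    q->heap[0] = q->heap[--q->n];
    int i = 0;
    for (;;) {                           /* sift down */
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < q->n && q->heap[l].cycle < q->heap[m].cycle) m = l;
        if (r < q->n && q->heap[r].cycle < q->heap[m].cycle) m = r;
        if (m == i) break;
        swap_ev(&q->heap[i], &q->heap[m]);
        i = m;
    }
    return true;
}

/* At issue: free the issue queue entry 2 cycles later; complete after 'latency'. */
static void schedule_issue_events(event_queue *q, uint64_t now, int inst_id,
                                  unsigned latency) {
    evq_push(q, (event){ now + 2, EV_FREE_IQ_ENTRY, inst_id });
    evq_push(q, (event){ now + latency, EV_COMPLETE, inst_id });
}
```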

  10. Sim-alpha internals • Sim-alpha is execution-driven, so it executes instructions down the mis-speculated path just as an actual processor would. • This captures the behavior of mis-speculated instructions, but • the correct path is known only at commit time, which makes it harder to simulate.

  11. EV6 Pipeline

  12. Pipeline stages • Fetch stage: • Instruction cache access • Fetch_width instructions per cycle (default: 4) • Slot stage: • Static assignment of instructions to either the upper or the lower subcluster • Control instructions access the branch predictor.
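Static slotting followed by a dynamic cluster choice at issue can be illustrated with the sketch below. The three-way instruction classification (`CLS_ANY`, `CLS_UPPER_ONLY`, `CLS_LOWER_ONLY`) and the balancing rule are hypothetical stand-ins for the 21264's actual slotting rules, which are not given here; `pick_cluster` simply takes the first cluster whose unit in the slotted subcluster is free.

```c
#include <stdbool.h>

typedef enum { SUB_LOWER = 0, SUB_UPPER = 1 } subcluster;

/* Hypothetical classification: some instruction classes may run in only one subcluster. */
typedef enum { CLS_ANY, CLS_UPPER_ONLY, CLS_LOWER_ONLY } slot_class;

/* Slot stage: statically assign each instruction of a fetch block to a subcluster.
 * Constrained instructions are placed first; the rest balance the two subclusters. */
static void slot_block(const slot_class *cls, subcluster *out, int n) {
    int upper = 0, lower = 0;
    for (int i = 0; i < n; i++) {
        if (cls[i] == CLS_UPPER_ONLY)      { out[i] = SUB_UPPER; upper++; }
        else if (cls[i] == CLS_LOWER_ONLY) { out[i] = SUB_LOWER; lower++; }
    }
    for (int i = 0; i < n; i++) {
        if (cls[i] != CLS_ANY) continue;
        if (upper <= lower) { out[i] = SUB_UPPER; upper++; }
        else                { out[i] = SUB_LOWER; lower++; }
    }
}

/* Issue stage: the subcluster is fixed, but the cluster (0 or 1) is chosen
 * dynamically: here, the first cluster whose unit in that subcluster is free.
 * Returns -1 if neither is free this cycle. */
static int pick_cluster(bool free_unit[2][2], subcluster s) {
    for (int c = 0; c < 2; c++)
        if (free_unit[c][s]) return c;
    return -1;
}
```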

  13. Pipeline stages (cont.) • Map stage: • Identifies the input and output registers • Checks for the availability of a reorder buffer entry, an integer or floating-point issue queue entry, a physical output register, and a load or store queue entry (if the instruction is a load or store). • If the input physical registers are ready, the instruction is placed in the ready queue. • Issue stage: • Picks instructions from the ready queues, checks the availability of functional units, and issues the instructions to the FUs. Register read latency is charged here. Events are set up for queue entry release and instruction completion.
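The map-stage checks can be sketched as an all-or-nothing allocation: if any required resource is unavailable, the instruction stalls and nothing is claimed. This is a simplified sketch with hypothetical names; it treats each instruction as wholly integer or wholly floating point, whereas real instructions (an FP load, for example) may mix the two. `sources_ready` shows the check that decides whether a renamed instruction goes straight to the ready queue.

```c
#include <stdbool.h>

/* Free-entry counters for the structures checked at map (hypothetical layout). */
typedef struct {
    int rob_free, int_iq_free, fp_iq_free;
    int free_phys_int, free_phys_fp;
    int lq_free, sq_free;
} resources;

typedef struct {
    bool is_fp;     /* simplification: wholly integer or wholly FP */
    bool is_load, is_store;
    bool has_dest;  /* needs a new physical output register */
} inst_info;

/* Map stage: claim every needed resource, or claim nothing and stall. */
static bool map_try_allocate(resources *r, const inst_info *in) {
    int *iq  = in->is_fp ? &r->fp_iq_free   : &r->int_iq_free;
    int *reg = in->is_fp ? &r->free_phys_fp : &r->free_phys_int;

    if (r->rob_free == 0 || *iq == 0) return false;
    if (in->has_dest && *reg == 0) return false;
    if (in->is_load && r->lq_free == 0) return false;
    if (in->is_store && r->sq_free == 0) return false;

    r->rob_free--;
    (*iq)--;
    if (in->has_dest) (*reg)--;
    if (in->is_load)  r->lq_free--;
    if (in->is_store) r->sq_free--;
    return true;
}

/* After renaming: the instruction enters the ready queue only if all of its
 * source physical registers already hold values. */
static bool sources_ready(const bool *phys_ready, const int *src, int nsrc) {
    for (int i = 0; i < nsrc; i++)
        if (!phys_ready[src[i]]) return false;
    return true;
}
```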

  14. Pipeline stages (cont.) • Writeback stage: • Wakes up dependent instructions when a producing instruction completes. • Load instructions access the D-cache • Mispredictions are flagged in the corresponding reorder buffer entry • Commit stage: • Retires instructions from the reorder buffer • Examines the head of the reorder buffer for mispredictions and traps, and flushes the pipeline when one is found.
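Cycle-level simulators commonly evaluate the pipeline stages in reverse order within each simulated cycle, so that an instruction moved forward by one stage is not processed again by the next stage in the same cycle. SimpleScalar's sim-outorder is organized this way; whether sim-alpha's main loop uses exactly this order is an assumption, and the `pipeline` struct of stage hooks below is hypothetical.

```c
#include <stdint.h>

typedef void (*stage_fn)(void *state);

/* Stage hooks, listed front to back as in the EV6 pipeline (hypothetical). */
typedef struct {
    stage_fn fetch, slot, map, issue, writeback, commit;
} pipeline;

/* Run one simulated cycle, calling the stages back to front so each
 * instruction advances at most one stage per cycle. */
static void run_cycle(const pipeline *p, void *state, uint64_t *cycle) {
    p->commit(state);
    p->writeback(state);
    p->issue(state);
    p->map(state);
    p->slot(state);
    p->fetch(state);
    (*cycle)++;
}
```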

  15. Simulator Performance • The simulator has been validated against a hardware 21264 implementation and achieved a 2% error across a suite of microbenchmarks designed to stress various microarchitectural features. • The error is 6.6% across the 10 SPECINT 2000 benchmarks and 21% across the 12 SPECFP 2000 benchmarks. • The greater error for the floating-point benchmarks is due to insufficient modeling of the floating-point pipeline and inaccuracies in the memory system implementation.

  16. Summary • sim-alpha provides a flexible, validated baseline for researchers to evaluate new architectural enhancements • Options to turn off constraints and to change parameters such as fetch width, issue queue sizes, and reorder buffer size let users study their influence.
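A configuration holding the defaults given on the earlier slides (fetch width 4, 20- and 15-entry issue queues, an 80-entry reorder buffer, and 32-entry load and store queues) might be represented as below. The option names, the `name=value` syntax, and the `enforce_constraints` switch are invented for illustration and are not sim-alpha's actual command-line flags.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int  fetch_width, int_iq_size, fp_iq_size, rob_size, lq_size, sq_size;
    bool enforce_constraints;  /* hypothetical switch for implementation constraints */
} sim_config;

static sim_config default_config(void) {
    return (sim_config){ .fetch_width = 4, .int_iq_size = 20, .fp_iq_size = 15,
                         .rob_size = 80, .lq_size = 32, .sq_size = 32,
                         .enforce_constraints = true };
}

/* Apply one hypothetical "name=value" override; returns false if it is rejected. */
static bool apply_option(sim_config *c, const char *opt) {
    const char *eq = strchr(opt, '=');
    if (eq == NULL) return false;
    size_t len = (size_t)(eq - opt);
    int value = atoi(eq + 1);

    if (len == strlen("enforce_constraints") &&
        strncmp(opt, "enforce_constraints", len) == 0) {
        c->enforce_constraints = value != 0;
        return true;
    }
    struct { const char *name; int *field; } sizes[] = {
        { "fetch_width", &c->fetch_width }, { "int_iq_size", &c->int_iq_size },
        { "fp_iq_size",  &c->fp_iq_size },  { "rob_size",    &c->rob_size },
        { "lq_size",     &c->lq_size },     { "sq_size",     &c->sq_size },
    };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        if (strlen(sizes[i].name) == len && strncmp(opt, sizes[i].name, len) == 0) {
            if (value <= 0) return false;  /* structure sizes must be positive */
            *sizes[i].field = value;
            return true;
        }
    }
    return false;  /* unknown option name */
}
```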
