sim-alpha: A Validated, Execution-Driven Alpha 21264 Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin
Introduction • sim-alpha: an execution-driven simulator • Execution-driven simulation is the most accurate simulation technique • Detailed simulation of the memory system and of the processor pipeline is done simultaneously. • It models the implementation constraints and the low-level performance features of the Alpha 21264.
The sim-alpha goals • Extend the SimpleScalar tool set to model an existing microprocessor (the EV6 microarchitecture) • Compare the simulator against actual hardware to ensure accurate modeling • Release the simulator for use by researchers studying extensions to existing implementations
Code structure • The code for each pipeline stage is in a separate .c file • Each .c file has a corresponding .h file containing function prototypes, constants, and extern declarations for global variables.
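As a rough illustration of that layout, the sketch below shows what one stage's header might declare and what its .c file would then define, collapsed into a single file so it compiles on its own. The names `fetch.h`, `FETCH_WIDTH`, `fetch_queue_count`, and `fetch_stage` are hypothetical and are not taken from the sim-alpha sources.

```c
/* ---- hypothetical fetch.h: prototypes, constants, extern globals ---- */
#ifndef FETCH_H
#define FETCH_H

#define FETCH_WIDTH 4          /* constant shared with other stages */

extern int fetch_queue_count;  /* global defined in fetch.c, visible to other stages */

int fetch_stage(int available_slots);  /* prototype for the stage entry point */

#endif /* FETCH_H */

/* ---- hypothetical fetch.c: would #include "fetch.h" in a real two-file layout ---- */

int fetch_queue_count = 0;     /* the single definition of the extern global */

/* Fetch up to FETCH_WIDTH instructions, limited by the free fetch-queue slots;
   returns how many were fetched this cycle. */
int fetch_stage(int available_slots)
{
    int n = available_slots < FETCH_WIDTH ? available_slots : FETCH_WIDTH;
    if (n < 0)
        n = 0;
    fetch_queue_count += n;
    return n;
}
```

Other stages would include `fetch.h` to see `FETCH_WIDTH` and `fetch_queue_count` without redefining them.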
Microprocessor features • Issue width of 6 instructions per CPU cycle (4 integer and 2 floating point), drawn from a 20-entry integer issue queue and a 15-entry floating-point issue queue • 80-entry reorder buffer • 4 integer units with an 80-entry register file. These units are called sub-clusters, and each operates on specific classes of instructions. • 2 floating-point units with a 72-entry register file • 32-entry load queue • 32-entry store queue • Alpha 21264 tournament predictor with local, global, and choice predictors
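The parameters above can be gathered into one configuration record, which is the kind of knob set a user would vary when changing queue or buffer sizes. This is a minimal sketch: the struct name `ev6_config`, its field names, and the validity check are hypothetical; only the numeric values come from the list above.

```c
#include <stdbool.h>

/* Hypothetical configuration record holding the 21264 parameters listed above. */
struct ev6_config {
    int int_issue_width;   /* integer instructions issued per cycle */
    int fp_issue_width;    /* floating-point instructions issued per cycle */
    int int_issue_queue;   /* integer issue queue entries */
    int fp_issue_queue;    /* floating-point issue queue entries */
    int rob_entries;       /* reorder buffer entries */
    int int_phys_regs;     /* integer register file entries */
    int fp_phys_regs;      /* floating-point register file entries */
    int load_queue;        /* load queue entries */
    int store_queue;       /* store queue entries */
};

static const struct ev6_config ev6_default = {
    .int_issue_width = 4,
    .fp_issue_width  = 2,
    .int_issue_queue = 20,
    .fp_issue_queue  = 15,
    .rob_entries     = 80,
    .int_phys_regs   = 80,
    .fp_phys_regs    = 72,
    .load_queue      = 32,
    .store_queue     = 32,
};

/* Total issue width per cycle: integer plus floating-point slots. */
static int total_issue_width(const struct ev6_config *c)
{
    return c->int_issue_width + c->fp_issue_width;
}

/* Basic sanity check a simulator might apply to user-supplied parameters. */
static bool config_is_valid(const struct ev6_config *c)
{
    return c->int_issue_width > 0 && c->fp_issue_width >= 0 &&
           c->int_issue_queue > 0 && c->fp_issue_queue > 0 &&
           c->rob_entries > 0 && c->int_phys_regs > 0 &&
           c->fp_phys_regs > 0 && c->load_queue > 0 && c->store_queue > 0;
}
```

A study of reorder buffer size, for example, would copy `ev6_default` and change only `rob_entries`.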
Microarchitectural features • Line predictor • Predicts the I-cache line to be accessed in the next cycle • Way predictor • Predicts which way of the two-way set-associative I-cache will be accessed (Alpha documentation calls the ways "sets") • Partitioned integer execution core • 2 clusters: each has a copy of the integer register file and 2 subclusters (lower and upper) • Static slotting • Instructions are statically assigned to one of the 2 subclusters (in the slot stage), and the 21264 then dynamically chooses the cluster during issue.
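A simplified sketch of combined line and way prediction, assuming a hypothetical direct-indexed table (`line_pred`, `LP_ENTRIES`) that maps the current I-cache line to a predicted next line and way. The training policy here (overwrite the entry on any mismatch) is an assumption for illustration, not the 21264's documented policy.

```c
#include <stdbool.h>
#include <stdint.h>

#define LP_ENTRIES 1024    /* hypothetical table size */

struct line_pred_entry {
    uint32_t next_line;    /* predicted I-cache line index for the next fetch */
    uint8_t  next_way;     /* predicted way within the two-way set */
};

static struct line_pred_entry line_pred[LP_ENTRIES];   /* zero-initialized */

/* Predict the line and way to fetch in the next cycle from the current line. */
static struct line_pred_entry predict_next(uint32_t cur_line)
{
    return line_pred[cur_line % LP_ENTRIES];
}

/* Once the correct next line and way are known, train the entry.
   Returns true if the stored prediction was already correct. */
static bool train_next(uint32_t cur_line, uint32_t actual_line, uint8_t actual_way)
{
    struct line_pred_entry *e = &line_pred[cur_line % LP_ENTRIES];
    bool hit = e->next_line == actual_line && e->next_way == actual_way;
    e->next_line = actual_line;   /* assumed policy: always overwrite */
    e->next_way  = actual_way;
    return hit;
}
```

A correct prediction lets fetch proceed every cycle without waiting for the full cache tag comparison; a wrong one costs a refetch.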
Microarchitectural features (cont.) • Load-use speculation • Instructions dependent on a load are issued assuming the load hits. If the load misses, these instructions are squashed and reissued. • Different memory traps • Load-load trap: a newer load issues before an older load to the same address • Load-store trap: a newer load issues before an older store to the same address • stWait table • A 1024-entry, one-bit table, indexed by PC, used to stall the issue of loads that have caused order traps. The processor does not issue a load whose stWait bit is set until all previous stores have issued.
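A minimal sketch of the stWait table as a packed 1024-bit vector. Indexing by `(pc >> 2) % 1024` assumes 4-byte instructions and simple modulo indexing, and the function names are hypothetical. A clearing function is included because a set bit would otherwise stay set forever; when to call it is left as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STWAIT_ENTRIES 1024   /* 1024 one-bit entries */

static uint8_t stwait_bits[STWAIT_ENTRIES / 8];   /* packed bit vector */

/* Index by PC, dropping the low 2 bits of a 4-byte-aligned address (assumption). */
static unsigned stwait_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) % STWAIT_ENTRIES);
}

/* Called when a load causes a load-store order trap, so it waits next time. */
static void stwait_set(uint64_t pc)
{
    unsigned i = stwait_index(pc);
    stwait_bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

static bool stwait_test(uint64_t pc)
{
    unsigned i = stwait_index(pc);
    return (stwait_bits[i / 8] >> (i % 8)) & 1u;
}

/* A load may issue unless its bit is set and older stores are still unissued. */
static bool load_may_issue(uint64_t pc, int older_unissued_stores)
{
    return !stwait_test(pc) || older_unissued_stores == 0;
}

/* Clear every bit; the clearing interval is an assumption left to the caller. */
static void stwait_clear(void)
{
    memset(stwait_bits, 0, sizeof stwait_bits);
}
```

Because the table is indexed by PC only, two loads whose PCs map to the same entry share a bit, so one load's trap can make an unrelated load wait.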
Basic structures • Fetch queue • Slot queue • Mapping table (logical to physical register) • Reorder buffer • Issue queue • Load queue • Store queue • Ready queue • Event queue (holds events that free an issue queue entry 2 cycles after issue and that signal the completion of an instruction's execution)
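The event queue can be sketched as a list of events kept sorted by the cycle at which they fire. This is a hypothetical illustration: the names, the fixed capacity, and the sorted-array design are assumptions (a heap would scale better for large queues). At issue, one event frees the issue queue entry 2 cycles later and another signals completion after an assumed functional-unit latency.

```c
#include <stdbool.h>

#define MAX_EVENTS 256   /* hypothetical capacity */

enum event_kind { EV_FREE_IQ_ENTRY, EV_COMPLETE };

struct event {
    long cycle;            /* cycle at which the event fires */
    enum event_kind kind;
    int inst_id;           /* instruction the event refers to */
};

static struct event events[MAX_EVENTS];
static int n_events = 0;

/* Insert an event, keeping the array sorted by cycle (earliest first);
   events with equal cycles stay in insertion order. */
static bool schedule_event(long cycle, enum event_kind kind, int inst_id)
{
    if (n_events == MAX_EVENTS)
        return false;
    int i = n_events;
    while (i > 0 && events[i - 1].cycle > cycle) {
        events[i] = events[i - 1];   /* shift later events right */
        i--;
    }
    events[i] = (struct event){ cycle, kind, inst_id };
    n_events++;
    return true;
}

/* Remove the earliest event due at or before `now`; false if none is due. */
static bool next_due_event(long now, struct event *out)
{
    if (n_events == 0 || events[0].cycle > now)
        return false;
    *out = events[0];
    for (int i = 1; i < n_events; i++)
        events[i - 1] = events[i];
    n_events--;
    return true;
}

/* At issue: free the issue queue entry 2 cycles later, complete after the FU latency. */
static void on_issue(long now, int inst_id, int fu_latency)
{
    schedule_event(now + 2, EV_FREE_IQ_ENTRY, inst_id);
    schedule_event(now + fu_latency, EV_COMPLETE, inst_id);
}
```

Each simulated cycle, the simulator would call `next_due_event` repeatedly until it returns false and handle each event it gets back.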
Sim-alpha internals • Sim-alpha is execution-driven, so it executes instructions down the mis-speculated path in the same way an actual processor would. • This captures the behavior of mis-speculated instructions, but the correct path is known only at commit time and cannot be simulated easily.
Pipeline stages • Fetch stage: • Instruction cache access • Fetch_width instructions fetched per cycle (default: 4) • Slot stage: • Statically assigns instructions to either the upper or the lower subclusters. • Control instructions access the branch predictor.
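A minimal sketch of one fetch cycle, assuming purely sequential fetch into a circular fetch queue. The queue capacity `FQ_SIZE` and the names `fetch_queue` and `fetch_cycle` are hypothetical, and branch redirection is left out.

```c
#include <stdint.h>

#define FETCH_WIDTH 4    /* default fetch width */
#define FQ_SIZE     8    /* hypothetical fetch queue capacity */

struct fetch_queue {
    uint64_t pc[FQ_SIZE];   /* PCs of fetched instructions, in program order */
    int head, count;        /* circular buffer state */
};

/* Fetch up to FETCH_WIDTH sequential instructions starting at *fetch_pc,
   stopping early if the fetch queue fills. Advances *fetch_pc and returns
   the number of instructions fetched this cycle. */
static int fetch_cycle(struct fetch_queue *fq, uint64_t *fetch_pc)
{
    int fetched = 0;
    while (fetched < FETCH_WIDTH && fq->count < FQ_SIZE) {
        int tail = (fq->head + fq->count) % FQ_SIZE;
        fq->pc[tail] = *fetch_pc;
        fq->count++;
        *fetch_pc += 4;     /* Alpha instructions are 4 bytes wide */
        fetched++;
    }
    return fetched;
}
```

When the slot stage stops draining the queue, `fetch_cycle` returns 0, which models fetch stalling on a full queue.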
Pipeline stages (cont.) • Map stage: • Identifies the input and output registers • Checks the availability of a reorder buffer entry, an integer or floating-point issue queue entry, a physical output register, and a load or store queue entry (if the instruction is a load or store). • If the input physical registers are ready, the instruction is placed in the ready queue. • Issue stage: • Picks instructions from the ready queues, checks the availability of functional units, and issues the instructions to the FUs. Register read latency is charged here. Events are set up for issue queue entry release and instruction completion.
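The map-stage checks can be sketched for integer instructions only. This hypothetical `map_instruction` translates sources through the mapping table before renaming the destination, so an instruction such as r1 = r1 + r2 reads the old mapping of r1. It stalls without side effects when the ROB, issue queue, or free list is exhausted, and marks the instruction ready when both source registers are. Load and store queue checks, floating-point registers, and freeing old physical registers at commit are omitted.

```c
#include <stdbool.h>

#define NUM_LOGICAL  32   /* Alpha integer architectural registers */
#define NUM_PHYSICAL 80   /* 80-entry integer register file */
#define ROB_SIZE     80
#define IQ_SIZE      20

struct map_state {
    int  map_table[NUM_LOGICAL];     /* logical -> physical register */
    bool phys_ready[NUM_PHYSICAL];   /* value available in physical register? */
    int  free_list[NUM_PHYSICAL];    /* stack of free physical registers */
    int  n_free;
    int  rob_used, iq_used;          /* occupied ROB and issue queue entries */
};

struct uop {
    int  src1, src2, dest;           /* logical registers; -1 if unused */
    int  psrc1, psrc2, pdest;        /* filled in by renaming */
    bool ready;                      /* all sources ready at map time? */
};

/* Identity mapping at reset; the remaining physical registers start free. */
static void map_init(struct map_state *s)
{
    for (int r = 0; r < NUM_LOGICAL; r++) {
        s->map_table[r] = r;
        s->phys_ready[r] = true;
    }
    s->n_free = 0;
    for (int p = NUM_LOGICAL; p < NUM_PHYSICAL; p++) {
        s->phys_ready[p] = false;
        s->free_list[s->n_free++] = p;
    }
    s->rob_used = s->iq_used = 0;
}

/* Try to map one instruction; returns false (stall) if a resource is missing. */
static bool map_instruction(struct map_state *s, struct uop *u)
{
    /* Check every resource before changing state, so a stall has no side effects. */
    if (s->rob_used == ROB_SIZE || s->iq_used == IQ_SIZE)
        return false;
    if (u->dest >= 0 && s->n_free == 0)
        return false;

    u->psrc1 = u->src1 >= 0 ? s->map_table[u->src1] : -1;
    u->psrc2 = u->src2 >= 0 ? s->map_table[u->src2] : -1;
    u->pdest = -1;
    if (u->dest >= 0) {
        u->pdest = s->free_list[--s->n_free];
        s->map_table[u->dest] = u->pdest;
        s->phys_ready[u->pdest] = false;   /* value not produced yet */
    }
    s->rob_used++;
    s->iq_used++;
    u->ready = (u->psrc1 < 0 || s->phys_ready[u->psrc1]) &&
               (u->psrc2 < 0 || s->phys_ready[u->psrc2]);
    return true;
}
```

An instruction whose `ready` flag is true would go straight to the ready queue. Otherwise it waits in the issue queue for a writeback wakeup.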
Pipeline stages (cont.) • Writeback stage: • Wakes up dependent instructions when a producing instruction completes. • Load instructions access the D-cache. • Mispredictions are recorded in the corresponding reorder buffer entry. • Commit stage: • Retires instructions from the reorder buffer. • Examines the head of the reorder buffer for mispredictions and traps, and flushes the pipeline when it finds one.
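A simplified sketch of writeback wakeup and in-order commit with a misprediction flush. All names are hypothetical. Traps, map-table recovery, and load/store queue cleanup are not modeled. On a mispredicted branch, the sketch retires the branch itself and squashes every younger (wrong-path) entry.

```c
#include <stdbool.h>

#define ROB_SIZE 80

struct iq_entry {
    bool valid;
    int  psrc1, psrc2;     /* physical source registers */
    bool rdy1, rdy2;       /* source operands available? */
};

struct rob_entry {
    int  inst_id;
    bool completed;        /* set at writeback */
    bool mispredicted;     /* set at writeback for a mispredicted branch */
};

struct rob {
    struct rob_entry e[ROB_SIZE];
    int head, count;       /* circular buffer; head is the oldest instruction */
};

/* Wakeup: mark operands ready in every waiting entry that reads register ptag. */
static void wakeup(struct iq_entry *iq, int n, int ptag)
{
    for (int i = 0; i < n; i++) {
        if (!iq[i].valid)
            continue;
        if (iq[i].psrc1 == ptag) iq[i].rdy1 = true;
        if (iq[i].psrc2 == ptag) iq[i].rdy2 = true;
    }
}

/* Writeback: mark the ROB entry complete and record a misprediction, if any. */
static void writeback(struct rob *r, int slot, bool mispredicted)
{
    r->e[slot].completed = true;
    r->e[slot].mispredicted = mispredicted;
}

/* Commit up to `width` completed instructions from the head, in order.
   A mispredicted head retires and flushes all younger entries.
   Returns the number retired; *flushed reports whether a flush happened. */
static int commit(struct rob *r, int width, bool *flushed)
{
    int retired = 0;
    *flushed = false;
    while (retired < width && r->count > 0 && r->e[r->head].completed) {
        bool mp = r->e[r->head].mispredicted;
        r->head = (r->head + 1) % ROB_SIZE;
        r->count--;
        retired++;
        if (mp) {
            r->count = 0;          /* squash all younger, wrong-path entries */
            *flushed = true;
            break;
        }
    }
    return retired;
}
```

Commit stops at the first incomplete instruction, so a single long-latency load at the head blocks retirement even when younger instructions have finished.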
Simulator Performance • The simulator has been validated against a hardware 21264 implementation and achieves 2% error across a suite of microbenchmarks designed to stress various microarchitectural features. • The error is 6.6% across the 10 SPECINT 2000 benchmarks and 21% across the 12 SPECFP 2000 benchmarks. • The greater error on the floating-point benchmarks is due to insufficient modeling of the floating-point pipeline and inaccuracies in the memory system implementation.
Summary • sim-alpha provides a flexible, validated baseline for researchers to evaluate new architectural enhancements • Options to turn off constraints and to change parameters such as fetch width, issue queue sizes, and reorder buffer size allow users to study their influence.