10 likes | 127 Views
Madhavi Valluri and Lizy John Laboratory for Computer Architecture University of Texas at Austin www.ece.utexas.edu/projects/ece/lca. Where does power go?. Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors. Rest. Rest. Fetch. Caches. IALU. dcache.
E N D
Madhavi Valluri and Lizy John Laboratory for Computer Architecture University of Texas at Austin www.ece.utexas.edu/projects/ece/lca Where does power go? Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors Rest Rest Fetch Caches IALU dcache OoO issue FPU OoO issue FPU • CASE 1 : Assume R15 & r24 depend • on input data; Potential address aliasing; • cannot be resolved statically • CASE 2: R15 and R24 are starting addresses of • two arrays a & b; no aliasing problem exists Alpha 21464 IALU MMU Fetch Compiler Schedule (case 1) Cycle 1 Instr (1) Cycle 2 Instr (2) Cycle 3 Instr (3) SIX CYCLES Cycle 4 Instr (4) PER ITERATION Cycle 5 nop Cycle 6 Instr (5) Exec Mem Motivating Example OoO issue Compiler Schedule (case 2) Cycle 1 Instr (1) Instr (4) Cycle 2 Instr (2) nop Cycle 3 Instr (3) Instr (5) THREE CYCLES PER ITERATION Out-of-order issue logic* used to identify parallel instructions to issue each cycle consumes ~25-50% of processor power OoO Issue Logic Main Components: Issue Queue & Reorder Buffer - Accessed multiple times each cycle - Highly Associative - Multiple locations accessed - Large number of ports - Complex Select/Wakeup logic Loop: { (1) add R1, R2, R5; (2) sub R5, R2, R1; (3) st R1, 0(R15); (4) ld R9, 8(R24); (5) add R4, R9, R10; } NO difference between dynamic schedule and compiler schedule! Dynamic Schedule Cycle 1 Instr (1) Instr (4) Cycle 2 Instr (2) nop Cycle 3 Instr (3) Instr (5) THREE CYCLES PER ITERATION • Dynamic scheduling hardware justified inirregular • regions of the program • In regularregions of programs (regions with • well-structured control-flow, regular memory access • patterns etc), hardware scheduling is redundant • SOLUTION: Hybrid-Scheduling Approach Assumed operation latencies ALU – 1 cycle LD/ST – 2 cycles Dynamic schedule is 2X better than compiler schedule a.k.a: Dynamic Issue Logic The Hybrid-Scheduling Approach Preliminary Results (ISLPED 2003) • Programs divided into low power static regions and high power dynamic region • Code in S-Regions scheduled by compiler alone into packets of independent instructions • Instruction packets issued to functional units without any dynamic dependence checks • 2 modes of execution – Superscalar mode for dynamic regions + Static mode for S-Regions • Large savings in static mode since OoO logic bypassed • Resource requirements of program varies with program phase • Dynamic resource adaptation schemes typically lower processor resources when program IPC is low and increase resources when IPC is high • Hybrid-Scheduling scheme eliminates power-hungry resource use for high IPC (or high ILP) regions Comparing with Dynamic Resource Adaptation Schemes • Benchmarks & S-Region Selection • Media & SpecFP benchmarks • Potential S-Regions - Loops • without function calls • Final S-Regions selected by • profiling • 30-99% time spent in S-Regions Future Research Directions Program • The real challenge: Adapting • Hybrid-scheduling for irregular, • SPECint-like applications • Minimal time spent in loops • without function calls (previously • explored type of S-Region) • Ongoing research – Investigating • alternate S-Regions – Hyperblocks, • Superblocks and Traces Rename O-o-O Issue Reorder Buffer alu Dynamic Region alu Decode alu Special instruction to switch mode Low Power Reorder Buffer alu • Experimental Results • 30% average energy savings • 3.6% average performance drop Static Region