Power and Frequency Analysis for Data and Control Independence in Embedded Processors. Farzad Samie, Amirali Baniasadi. Sharif University of Technology, University of Victoria.
This Work
Goal
• Power and frequency analysis of control-independent and data-independent instructions in embedded processors
Motivation
• Embedded processors are becoming increasingly complex
• Modern embedded processors use speculation
• Mis-speculation incurs performance and power penalties
• Power is a major concern in embedded processors
• Aim: save power and gain performance
This Work (cont.)
Our Approach
• Reduce the energy and time wasted on mispredictions
How?
• Identify and bypass Control Independent (CI) and Data Independent (DI) instructions
• CIs: instructions that execute regardless of the branch outcome
• CI-DI: CI instructions that execute with the same operands on the wrong path and on the correct path
Key Result
• 12% processor energy reduction
Background
[Figure: Branch Prediction. The branch predictor takes the branch history and the program counter as inputs and produces a predicted direction and a predicted target address.]
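As an illustration of the two inputs and two outputs above, here is a minimal sketch of one common predictor design, gshare (program counter XOR global history indexing a table of 2-bit counters), paired with a small branch target buffer. This is not necessarily the predictor used in this work; the class name, table size, and counter initialization are hypothetical choices.

```python
class GsharePredictor:
    """Sketch of a gshare direction predictor plus a branch target buffer (BTB)."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.history = 0  # global branch history register (1 = taken)
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, start weakly not-taken
        self.btb = {}  # branch PC -> most recent taken target

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits of the PC (word-aligned instructions), then XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int):
        """Return (predicted direction, predicted target address or None)."""
        taken = self.counters[self._index(pc)] >= 2
        return taken, (self.btb.get(pc) if taken else None)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train on the resolved outcome, then shift it into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


p = GsharePredictor()
for _ in range(20):  # a loop branch that is always taken
    p.update(0x400, True, 0x480)
print(p.predict(0x400))  # → (True, 1152)
```

When a prediction like this is wrong, the instructions fetched after the branch belong to the wrong path, which is where the CI instructions discussed next come from.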
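Background (cont.)
[Figure: I1, I2 and I3 precede a branch instruction. The not-taken path (I4, I5, I6) is the wrong path and is squashed when the misprediction is detected; the taken path (I10, I11, I12) is the right path. Both paths then continue with I7, I8 and I9, the Control Independent Instructions (CIs).]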
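The figure's idea can be expressed as a small trace-based sketch: treat the first wrong-path instruction that also appears on the right path as the reconvergence point, and take the run of instructions the two paths share from there as the CIs. This offline rule is an assumption for illustration; hardware schemes (for example, the reconvergence prediction of Collins et al.) work differently, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def control_independent(wrong_path: Sequence[str], right_path: Sequence[str]) -> List[str]:
    """Return the wrong-path instructions that are control independent.

    Reconvergence point: first wrong-path instruction also on the right path.
    CIs: the instructions both paths share, in order, from that point on.
    """
    first_pos: Dict[str, int] = {}
    for j, instr in enumerate(right_path):
        first_pos.setdefault(instr, j)  # keep the earliest occurrence

    for i, instr in enumerate(wrong_path):
        if instr not in first_pos:
            continue  # still in the control-dependent region
        j = first_pos[instr]
        shared = 0
        while (i + shared < len(wrong_path) and j + shared < len(right_path)
               and wrong_path[i + shared] == right_path[j + shared]):
            shared += 1
        return list(wrong_path[i:i + shared])
    return []  # the paths never reconverge within the window


wrong = ["I4", "I5", "I6", "I7", "I8", "I9"]
right = ["I10", "I11", "I12", "I7", "I8", "I9"]
print(control_independent(wrong, right))  # → ['I7', 'I8', 'I9']
```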
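Background (cont.)
[Figure: CI-DI vs. CI-DD example. Before the branch: R1←R1+R2; R4←R1; if (R4=0). The not-taken and taken paths contain instructions such as R5←R4+1, R2←R4-R1, R3←0, R5←R2-R3 and R1←R1-1.]
Control independent instructions after the paths reconverge:
• R4←R6+R4: Data Independent (CI-DI)
• R1←R4+R1: Data Dependent (CI-DD)
• R5←R5-2: Data Dependent (CI-DD)
• R3←R3-R4: Data Independent (CI-DI)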
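The CI-DI/CI-DD split can be checked mechanically by running the CI instructions once after the wrong path and once after the right path, then comparing the operand values each one reads. The sketch below does this for the four CI instructions from the example. The register values and the assignment of instructions to each path are hypothetical, chosen so that the labels match the slide; the value-comparison approach is an assumption, not the authors' mechanism.

```python
import operator
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]  # register name such as "R4", or an integer immediate
Instr = Tuple[str, str, Operand, Operand]  # (dest, op, src1, src2)

OPS = {"+": operator.add, "-": operator.sub}


def execute(regs: Dict[str, int], instr: Instr) -> Tuple[int, int]:
    """Execute one instruction in place and return the operand values it read."""
    dest, op, src1, src2 = instr
    a = regs[src1] if isinstance(src1, str) else src1
    b = regs[src2] if isinstance(src2, str) else src2
    regs[dest] = OPS[op](a, b)
    return (a, b)


def classify_ci(initial: Dict[str, int], wrong_path: List[Instr],
                right_path: List[Instr], ci: List[Instr]) -> List[str]:
    """Label each CI instruction CI-DI if it reads identical operands after
    either path, otherwise CI-DD."""

    def ci_operands(path: List[Instr]) -> List[Tuple[int, int]]:
        regs = dict(initial)  # both paths diverge from the same register state
        for instr in path:
            execute(regs, instr)
        return [execute(regs, instr) for instr in ci]

    wrong_ops = ci_operands(wrong_path)
    right_ops = ci_operands(right_path)
    return ["CI-DI" if w == r else "CI-DD" for w, r in zip(wrong_ops, right_ops)]


# Hypothetical register state and path contents, for illustration only.
initial = {"R1": 10, "R2": 8, "R3": 3, "R4": 2, "R5": 0, "R6": 4}
wrong_path = [("R5", "+", "R4", 1), ("R1", "-", "R1", 1)]
right_path = [("R5", "-", "R2", "R3")]
ci = [
    ("R4", "+", "R6", "R4"),
    ("R1", "+", "R4", "R1"),
    ("R5", "-", "R5", 2),
    ("R3", "-", "R3", "R4"),
]
print(classify_ci(initial, wrong_path, right_path, ci))
# → ['CI-DI', 'CI-DD', 'CI-DD', 'CI-DI']
```

Note that R3←R3-R4 stays CI-DI even though it reads R4: the R4 it sees was produced by a CI-DI instruction, so it has the same value after either path.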
CI-DI vs. CI-DD
[Figure: pipeline stages Fetch, Issue, Dispatch, Execute and Write Back, compared for CI-DD and CI-DI instructions]
• Bypassing CI-DIs saves more energy: their operands need not be read again and they need not be executed again
• Bypassing CI-DIs gives higher performance: no time is wasted reading operands or executing
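A toy energy model makes the comparison concrete. It assumes, for illustration, that any bypassed CI avoids being fetched again, while only a CI-DI also avoids the operand read and execution (the part the slide says CI-DIs need not repeat). The per-activity energies are hypothetical arbitrary units, not Wattch results.

```python
from typing import Dict

# Hypothetical per-instruction energy of each pipeline activity (arbitrary units).
STAGE_ENERGY: Dict[str, float] = {
    "fetch": 3.0,
    "operand_read": 1.0,
    "execute": 2.0,
    "write_back": 1.0,
}

# Assumption: a bypassed CI-DD is not refetched but must read operands and
# execute again; a bypassed CI-DI reuses its wrong-path result as well.
SKIPPED = {
    "CI-DD": {"fetch"},
    "CI-DI": {"fetch", "operand_read", "execute"},
}


def energy_saved(kind: str, stage_energy: Dict[str, float] = STAGE_ENERGY) -> float:
    """Energy avoided by bypassing one instruction of the given kind."""
    return sum(stage_energy[stage] for stage in SKIPPED[kind])


print(energy_saved("CI-DD"), energy_saved("CI-DI"))  # → 3.0 6.0
```

Whatever the actual per-stage numbers, a CI-DI skips a superset of the work a CI-DD skips, so under this model it always saves at least as much.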
Methodology
• Modified SimpleScalar
• Wattch for power measurement
• MiBench: embedded benchmark suite
Distribution
• Wrong path: 12%
• CI: 5%
• CI-DI: 2%
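A distribution like this can be computed from a per-instruction trace. The sketch assumes each fetched instruction carries one label (`right`, `wrong_cd` for control-dependent wrong-path instructions, `ci_dd`, `ci_di`), that CIs are a subset of the wrong path and CI-DIs a subset of the CIs, and that percentages are taken over all fetched instructions. The labels and the counts below are hypothetical, picked only to reproduce the figures above.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Percentages of wrong-path, CI and CI-DI instructions among all fetched ones."""
    counts = Counter(labels)
    total = sum(counts.values())

    def pct(n: int) -> float:
        return 100.0 * n / total

    ci_di = counts["ci_di"]
    ci = ci_di + counts["ci_dd"]        # every CI is either CI-DI or CI-DD
    wrong = ci + counts["wrong_cd"]     # CIs are fetched on the wrong path
    return {"wrong_path": pct(wrong), "ci": pct(ci), "ci_di": pct(ci_di)}


trace = ["right"] * 88 + ["wrong_cd"] * 7 + ["ci_dd"] * 3 + ["ci_di"] * 2
print(distribution(trace))  # → {'wrong_path': 12.0, 'ci': 5.0, 'ci_di': 2.0}
```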
CI Power Reduction in Different Units
• Maximum reduction: branch predictor unit
• Minimum reduction: instruction cache
CI Power Reduction in Stages
• Rijndael: low misprediction rate, hence few wrong-path instructions and few CIs
Power Sensitivity to RUU (Register Update Unit) Size
• CI and CI-DI: higher power dissipation for larger RUU sizes
Power Sensitivity to Execution Bandwidth
• CI and CI-DI: higher power dissipation for wider execution bandwidth
Power Sensitivity to Branch Predictor Size
• Little sensitivity to branch predictor size
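The three sensitivity studies amount to parameter sweeps over the simulator configuration. The sketch below only builds command lines for SimpleScalar's `sim-outorder`, using its `-ruu:size`, `-issue:width`, `-bpred` and `-bpred:bimod` options. Several things are assumptions: that the Wattch-enabled binary is named `sim-outorder`, that "execution bandwidth" maps to issue width, that a bimodal predictor is used, the swept values themselves, and the `./bench` placeholder for a MiBench command.

```python
from typing import List, Sequence

SIM = "sim-outorder"  # assumption: name of the Wattch-enabled simulator binary


def sweep_commands(benchmark: Sequence[str],
                   ruu_sizes: Sequence[int] = (16, 32, 64, 128),
                   issue_widths: Sequence[int] = (1, 2, 4, 8),
                   bimod_sizes: Sequence[int] = (512, 1024, 2048, 4096)) -> List[List[str]]:
    """Build one simulator command per configuration point (values are hypothetical)."""
    cmds: List[List[str]] = []
    for size in ruu_sizes:  # RUU size sweep
        cmds.append([SIM, "-ruu:size", str(size), *benchmark])
    for width in issue_widths:  # execution bandwidth, modeled here as issue width
        cmds.append([SIM, "-issue:width", str(width), *benchmark])
    for entries in bimod_sizes:  # branch predictor table size
        cmds.append([SIM, "-bpred", "bimod", "-bpred:bimod", str(entries), *benchmark])
    return cmds


for cmd in sweep_commands(["./bench"])[:2]:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with the simulator installed; all swept values are powers of two, as SimpleScalar requires for these sizes.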
Related Work
• Rotenberg et al.: studied control independence in superscalar processors, HPCA99.
• Collins et al.: proposed a mechanism to predict the reconvergence point, MICRO04.
• Lam and Wilson: studied the impact of CIs on instruction-level parallelism, ISCA92.
• Gandhi et al.: recovery from selected branch mispredictions, HPCA04.
Conclusion
• CIs categorized into CI-DI and CI-DD
• Potential power saving of up to 12% from bypassing CI and CI-DI instructions
• High sensitivity to RUU size
• High sensitivity to execution bandwidth
• Little sensitivity to branch predictor size
Questions?
Thank you