Combining Software and Hardware Monitoring for Improved Power and Performance Tuning
Eric Chi, A. Michael Salem, and R. Iris Bahar (Brown University, Division of Engineering)
Richard Weiss (Hampshire College, School of Cognitive Science)
Motivation
• Performance drives high-end processor design
• Include many complex architectural features
• Resources may not always be optimally utilized
• Resources dissipate some power regardless of utilization
• Dynamic schemes allow processor to reconfigure resources according to program’s needs
• Some means of monitoring program is needed to drive reconfiguration
BARC January 30, 2003
Monitoring Options
• Hardware monitoring
  • Relatively easy to implement
  • Can easily adjust to changing patterns
  • Must first recognize pattern before reacting
  • Restricted to fixed-sized sampling windows
• Software profiling
  • Reconfiguration occurs in anticipation of changing needs
  • Sampling ranges are adaptable
  • Requires instruction annotation and initial sampling overhead
  • Only applicable to instructions with very deterministic behavior
Why Not Combine?
• Each has its particular benefits
• If hardware and software techniques can be combined, can we improve the control policies driving processor reconfiguration?
• Could potentially lead to better energy savings and higher overall performance
Our Goal
• Have HW and SW profiling work together to better identify program behavior
• Allow processor to react more quickly to strongly deterministic behavior
• Allow HW monitoring to assist with hard-to-predict cases with hints from software profiling
Low Power Configurations
• We consider 2 different configurations separately:
  • Reducing issue width and ALUs
    • Save power in issue queue arbitration logic
    • Save power from underutilized ALUs
  • Fetch Halting
    • Triggered by a critical load missing to main memory
    • Fetching is disabled for the duration of the miss
    • Reduces occupancy rates in fetch and issue queues
    • Reduces number of wrong path instructions fetched
Pipeline Organization
[Figure: pipeline block diagram. Labeled blocks: Instruction Cache, Fetch Unit, Branch Predictor, Instruction Decoder, Annotation Decoder, Instruction Scheduler, Register File, Integer ALU Clusters 1 and 2, Floating Point ALU Clusters 1 and 2, three Load/Store Units, Data Cache, and Low-Power State Logic. Control labels: "Disable Fetch Unit" and "Disable auxiliary ALU cluster and reduce issue width".]
Adjusting Issue Width
• Adjust issue width between 8 and 4 and disable second integer ALU cluster
• SW approach profiles IPC from train dataset
  • Annotates blocks with low IPC
  • Decoding start of block triggers entry to LP mode
• HW approach uses built-in counters to monitor IPC
  • Use fixed 256-cycle window
  • If integer IPC < threshold, enter LP mode
• Combined approach
  • SW steers blocks with consistent behavior
  • HW handles remaining blocks
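The combined policy above can be sketched as a toy controller. This is a minimal illustration, not the authors' implementation: the class name, the threshold value, and the rule that an unannotated block hands control back to the HW window monitor are all assumptions. Only the 256-cycle window, the integer IPC test, and the "SW steers annotated blocks, HW handles the rest" split come from the slide.

```python
WINDOW_CYCLES = 256  # fixed HW sampling window, as stated on the slide


class IssueWidthController:
    """Toy model of the combined SW + HW low-power (LP) policy (hypothetical)."""

    def __init__(self, ipc_threshold=1.0):
        # ipc_threshold is an assumed value; the slides do not give one.
        self.ipc_threshold = ipc_threshold
        self.low_power = False   # True: issue width 4, second integer ALU cluster off
        self.sw_steered = False  # True while inside a block annotated by SW profiling
        self._cycles = 0
        self._int_insts = 0

    def on_block_start(self, annotated_low_ipc):
        """Called when the start of a block is decoded."""
        self.sw_steered = annotated_low_ipc
        if annotated_low_ipc:
            # SW path: the annotation triggers LP mode immediately.
            self.low_power = True

    def on_cycle(self, int_insts_committed):
        """Called once per cycle with the number of integer instructions committed."""
        self._cycles += 1
        self._int_insts += int_insts_committed
        if self._cycles == WINDOW_CYCLES:
            int_ipc = self._int_insts / WINDOW_CYCLES
            # HW path: only decides for blocks that SW did not annotate (assumption).
            if not self.sw_steered:
                self.low_power = int_ipc < self.ipc_threshold
            self._cycles = 0
            self._int_insts = 0
```

In this sketch the HW monitor can only react at the end of a window, while an annotated block switches mode as soon as it is decoded, which mirrors the trade-off listed under Monitoring Options.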
Results for Reduced Issue Width
• SW and HW results are comparable
• COMBined results show that SW + HW methods identify different opportunities for saving power
Results for Reduced Issue Width
• SW performance is more consistent because thresholds can be tuned on a per-application basis
Fetch Halting
• Requires a combination of SW and HW monitoring:
  • SW profiling:
    • Identify critical loads that miss to main memory
    • IPC, occupancy rates, dead cycles, “miss stride”
  • HW monitoring:
    • Using annotations from SW profiling, HW tracks miss behavior only for “promising” load instructions
    • Miss stride from annotations is compared to miss counter in HW to capture dynamic miss behavior
• For now we simulate a perfect miss-predictor
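The comparison of an annotated miss stride against a HW miss counter could look roughly like the sketch below. It rests on an assumed reading of "miss stride" as the number of executions of a load between consecutive misses; the slides do not define the term, and the results shown use a perfect miss-predictor instead. The class and method names are hypothetical.

```python
class MissStridePredictor:
    """Hypothetical per-load miss predictor driven by SW annotations."""

    def __init__(self):
        self.stride = {}  # load PC -> annotated miss stride from SW profiling
        self.count = {}   # load PC -> executions since the last observed miss

    def annotate(self, pc, miss_stride):
        """Register a "promising" load; unannotated loads are not tracked."""
        self.stride[pc] = miss_stride
        self.count[pc] = 0

    def predict_miss(self, pc):
        """Call when the load is decoded; True means fetch could be halted."""
        if pc not in self.stride:
            return False
        self.count[pc] += 1
        return self.count[pc] >= self.stride[pc]

    def update(self, pc, missed):
        """Call when the load resolves; an actual miss restarts the counter."""
        if pc in self.stride and missed:
            self.count[pc] = 0
```

A load annotated with stride 4 that really misses on every fourth execution would be predicted to miss on exactly those executions; any drift between the annotated stride and the observed misses is what the HW counter is meant to capture.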
Fetch Halting Potential
• Memory access rates show that the fetch halting potential varies by benchmark
Results for Fetch Halting
• Restricting fetch halting based on criticality information benefits performance
Fetch Halting and RUU Occupancy
• Perfect + crit results in an average 10% drop in RUU occupancy
Conclusions and Future Work
• HW and SW predict different low-power events and can be combined, offering greater power-saving potential
• Future work:
  • Improve HW/SW combination scheme
  • Improve criticality predictor
  • Currently working on HW miss predictor
  • Adjust the halt period