Power Management for Chip-level Multiprocessing Processors Kai Ma
Background • Two ways to get better performance: 1. Scale frequency (faster) 2. Replicate on-chip resources (parallelism) • Chip-MultiProcessing (CMP) vs. Simultaneous MultiThreading (SMT)
Other justifications for CMP • Memory wall, ILP wall, power wall • Cache-coherency circuitry can run at a higher rate on-chip • Signal integrity • Future: many cores (many specialized cores)
Power management for CMP • Reduce operating costs for energy and cooling • Prolong battery life for portable and embedded systems • Reduce cooling requirements • Meet scalable performance targets • Manage heat dissipation and hotspots
Outline 1. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget. Canturk Isci*, Alper Buyuktosunoglu*, Chen-Yong Cher*, Pradip Bose* and Margaret Martonosi. *IBM T.J. Watson Research Center, Yorktown Heights; Department of Electrical Engineering, Princeton University 2. Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors. Radu Teodorescu and Josep Torrellas. Department of Computer Science, University of Illinois at Urbana-Champaign
Outline • An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget 1. Contribution 2. Global Power Management 3. Global Power Management Policies: core modes, power and performance matrix 4. Experimental Results and Evaluation 5. Conclusion 6. Critique
Contribution • Introduce global (chip-level) power management for CMPs • Develop a static power management analysis tool • Evaluate different policies for CMP power management
Global Power Management • Monitor the power of each core and set each core's operating (power) mode
Global Power Management Policies • Priority: slow down the cores running low-priority tasks • PullHiPushLo: speed up the lowest-power core and slow down the highest-power core • MaxBIPS: predict the throughput (BIPS) of each power-mode combination and choose the combination that maximizes total BIPS within the power budget
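The MaxBIPS idea can be sketched as an exhaustive search over per-core mode combinations. This is a minimal illustration, assuming the predicted BIPS and power for every core in every mode are already available as tables; the names `max_bips`, `bips` and `power` are hypothetical, and how the predictions are produced is outside this sketch.

```python
from itertools import product


def max_bips(bips, power, budget):
    """Exhaustive MaxBIPS-style search (illustrative sketch).

    bips[c][m] and power[c][m] are hypothetical predicted BIPS and power
    for core c running in mode m. Returns (best_modes, total_bips), or
    (None, 0.0) if no mode combination fits within the power budget.
    """
    n_cores = len(bips)
    n_modes = len(bips[0])
    best_modes, best_bips = None, 0.0
    # Enumerate every combination: n_modes ** n_cores candidates.
    for modes in product(range(n_modes), repeat=n_cores):
        total_power = sum(power[c][m] for c, m in enumerate(modes))
        if total_power > budget:
            continue  # violates the chip-level power budget
        total_bips = sum(bips[c][m] for c, m in enumerate(modes))
        if best_modes is None or total_bips > best_bips:
            best_modes, best_bips = modes, total_bips
    return best_modes, best_bips


# Two cores, three modes each (mode 0 = fastest); numbers are made up.
bips = [[2.0, 1.9, 1.7], [1.0, 0.98, 0.95]]
power = [[30.0, 26.0, 20.0], [30.0, 26.0, 20.0]]
print(max_bips(bips, power, budget=52.0))
```

With these made-up numbers the search keeps the high-BIPS core at full speed and slows the low-BIPS core, which is the kind of trade-off a chip-wide (uniform) setting cannot make.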
Core Power Modes • Underlying mechanism: DVFS • Overhead: on the order of microseconds • Performance degradation: measured by the benchmark's elapsed execution time
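Why DVFS modes save power faster than they cost performance follows from the standard first-order model of dynamic power, P_dyn ∝ C·V²·f. The sketch below assumes voltage and frequency scale together and that execution time scales as 1/f (a fully core-bound workload, i.e. the worst case for slowdown); the mode names and scale factors are hypothetical, and static power is ignored.

```python
def dvfs_scaling(v_scale, f_scale):
    """Relative dynamic power and worst-case relative execution time.

    Uses P_dyn ~ C * V^2 * f (capacitance C is unchanged across modes) and
    assumes execution time ~ 1/f, which overstates slowdown for
    memory-bound workloads.
    """
    rel_power = v_scale ** 2 * f_scale
    rel_time = 1.0 / f_scale
    return rel_power, rel_time


# Hypothetical power modes: (voltage scale, frequency scale).
modes = {"full": (1.0, 1.0), "eff1": (0.95, 0.95), "eff2": (0.85, 0.85)}
for name, (v, f) in modes.items():
    p, t = dvfs_scaling(v, f)
    print(f"{name}: power x{p:.3f}, time up to x{t:.3f}")
```

Because power falls roughly with the cube of the common scale factor while time grows only linearly, small DVFS steps buy large power reductions.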
Experimental Methodology • SPEC CPU2000 benchmarks • A trace-based CMP analysis tool is combined with IBM's Turandot simulator • Overheads: mode switch (500 ns) and statistics collection (50 ns) • During a mode switch, no instructions execute but power is still consumed
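To see how these overheads compare with a management interval, a quick calculation helps. This sketch takes the 500 ns switch and 50 ns statistics costs from the slide and assumes a hypothetical interval length; the function name `overhead_fraction` is illustrative only.

```python
def overhead_fraction(interval_ns, switch_ns=500.0, stats_ns=50.0, switched=True):
    """Fraction of one management interval lost to power-management overhead.

    switch_ns and stats_ns default to the simulated costs above; interval_ns
    is a hypothetical interval length. Statistics are collected every
    interval; the switch cost applies only when the mode actually changes,
    and no instructions execute during it.
    """
    lost_ns = stats_ns + (switch_ns if switched else 0.0)
    return lost_ns / interval_ns


# Hypothetical 500-microsecond interval (500,000 ns).
print(overhead_fraction(500_000.0))                  # with a mode switch
print(overhead_fraction(500_000.0, switched=False))  # statistics only
```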
Trends under CMP Scaling • The gap between MaxBIPS and the oracle policy narrows as the number of cores increases • Increasing the core count has a smaller impact on MaxBIPS • CMP scaling favors static per-core management over chip-wide DVFS
Conclusion • Global management is preferred • Dynamic management is preferred • MaxBIPS is efficient
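Critique • MaxBIPS: prediction cost grows superlinearly with the number of modes and cores (an exhaustive search covers M^N combinations for M modes and N cores) • Power/performance estimation matrix: does not account for the mode-transition penalty • Does not consider temperature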
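The scalability concern is easy to quantify: an exhaustive search over M power modes on N cores evaluates M^N combinations. The sketch assumes a hypothetical three modes per core; `maxbips_candidates` is an illustrative name.

```python
def maxbips_candidates(n_modes: int, n_cores: int) -> int:
    """Number of mode combinations an exhaustive MaxBIPS search evaluates."""
    return n_modes ** n_cores


# Assuming three power modes per core.
for cores in (2, 4, 8, 16):
    print(cores, "cores:", maxbips_candidates(3, cores), "combinations")
```

Going from 8 to 16 cores squares the search space (6,561 to 43,046,721 combinations), which is why exhaustive prediction does not scale to many-core chips.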
Outline Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors 1. Background 2. Contribution 3. Algorithm 4. System Implementation 5. Evaluation 6. Conclusion 7. Critique
Background For CMP, within-die process variation impacts: • Static power consumption • Maximum frequency
Contribution • Propose variation-aware algorithms for application scheduling • Complement these algorithms with variation-aware DVFS
CMP Configuration • High-level frequency and DVFS policy
Linear Programming • A technique for optimizing a linear objective function subject to linear equality and inequality constraints • Canonical form: maximize $c^{T}x$ subject to $Ax \le b$, $x \ge 0$ • c and b are known vectors, A is a known matrix, and x is the vector of variables
Power Mode Selection: LinOpt • TP: average throughput • N: number of cores • i: core index, from 1 to N • a(i): constant that depends on the thread and the core • v(i): core voltage • b(i) and c(i): constants that approximate the power-voltage relation • Objective function: • Constraints:
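The slide's objective and constraints did not survive extraction. Assuming, from the definitions above, that LinOpt maximizes TP = (1/N) Σ a(i)·v(i) subject to a linear power budget Σ (b(i)·v(i) + c(i)) ≤ P_target and per-core bounds v_low ≤ v(i) ≤ v_high (P_target, v_low and v_high are assumed names), the problem has one budget constraint plus box bounds. For that shape a greedy fractional-knapsack pass finds an optimal solution without a general LP solver. This greedy method is swapped in for illustration and is not the authors' solver.

```python
def linopt(a, b, c, p_target, v_low, v_high):
    """Greedy solver for the ASSUMED LinOpt formulation (illustrative).

    maximize   (1/N) * sum(a[i] * v[i])
    subject to sum(b[i] * v[i] + c[i]) <= p_target,  v_low <= v[i] <= v_high
    Requires b[i] > 0. Returns (voltages, average throughput).
    """
    n = len(a)
    v = [v_low] * n
    # Power budget left after every core sits at the minimum voltage.
    budget = p_target - sum(b[i] * v_low + c[i] for i in range(n))
    if budget < 0:
        raise ValueError("power target infeasible even at v_low")
    # Raise voltages in order of throughput gained per watt spent.
    for i in sorted(range(n), key=lambda k: a[k] / b[k], reverse=True):
        step = min(v_high - v_low, budget / b[i])
        v[i] += step
        budget -= b[i] * step
        if budget <= 0:
            break
    tp = sum(a[i] * v[i] for i in range(n)) / n
    return v, tp


# Hypothetical two-core example: core 0 gains more throughput per volt.
print(linopt(a=[2.0, 1.0], b=[10.0, 10.0], c=[1.0, 1.0],
             p_target=25.0, v_low=0.8, v_high=1.2))
```

The core with the best a(i)/b(i) ratio is pushed to v_high first and the remaining budget goes to the next core, which is exactly where a single-constraint LP places its optimum.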
Power Mode Selection: SAnn • Uses simulated annealing to solve the power mode selection problem • SAnn searches the space of possible core-voltage combinations • Compared to LinOpt: more accurate but more costly
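A simulated-annealing search over discrete voltage levels might look like the sketch below. It assumes the same hypothetical linear throughput and power model (a, b, c, p_target) used to describe LinOpt, a hypothetical list of voltage `levels`, and a geometric cooling schedule; none of these parameters come from the slides.

```python
import math
import random


def sann(a, b, c, p_target, levels, iters=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing for per-core voltage selection (illustrative).

    Maximizes (1/N) * sum(a[i] * v[i]) over v[i] in `levels`, keeping
    sum(b[i] * v[i] + c[i]) <= p_target. Returns (best voltages, best TP).
    """
    rng = random.Random(seed)
    n = len(a)

    def power(v):
        return sum(b[i] * v[i] + c[i] for i in range(n))

    def throughput(v):
        return sum(a[i] * v[i] for i in range(n)) / n

    cur = [min(levels)] * n  # start with every core at the lowest level
    if power(cur) > p_target:
        raise ValueError("power target infeasible even at the lowest level")
    best, best_tp = list(cur), throughput(cur)
    temp = t0
    for _ in range(iters):
        cand = list(cur)
        cand[rng.randrange(n)] = rng.choice(levels)  # perturb one core
        if power(cand) <= p_target:  # never move to an infeasible state
            delta = throughput(cand) - throughput(cur)
            # Always accept improvements; accept worse moves with
            # Boltzmann probability exp(delta / temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                cur = cand
                if throughput(cur) > best_tp:
                    best, best_tp = list(cur), throughput(cur)
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
    return best, best_tp


levels = [0.8, 0.9, 1.0, 1.1, 1.2]
print(sann([2.0, 1.0, 1.5], [10.0] * 3, [1.0] * 3, 30.0, levels))
```

Unlike the single-pass LinOpt, this explores many candidate assignments, which is where the higher cost comes from; it can also handle nonlinear power models that a linear program cannot express.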
System Implementation • The algorithm runs on one of the cores or on a power management unit • At each OS scheduling interval, the OS assigns threads to cores using VarF&AppIPC • Every 10 ms, LinOpt runs and sets each core to the appropriate power mode
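One plausible reading of VarF&AppIPC (an assumption based on its name, not confirmed by the slides) is: sort cores by maximum frequency and applications by IPC, then give the highest-IPC application the fastest core. If throughput is roughly IPC × frequency and IPC does not change with frequency, this sorted pairing maximizes total throughput by the rearrangement inequality. The function name below is hypothetical.

```python
def varf_appipc_map(core_freqs, app_ipcs):
    """Map applications to cores: highest IPC -> highest frequency.

    Returns {app_index: core_index}. Assumes no more applications than
    cores; leftover (slowest) cores stay idle.
    """
    if len(app_ipcs) > len(core_freqs):
        raise ValueError("more applications than cores")
    cores = sorted(range(len(core_freqs)), key=lambda k: core_freqs[k], reverse=True)
    apps = sorted(range(len(app_ipcs)), key=lambda k: app_ipcs[k], reverse=True)
    return {app: core for app, core in zip(apps, cores)}


# Hypothetical per-core max frequencies (GHz) after variation, and app IPCs.
print(varf_appipc_map([3.6, 4.0, 3.2, 3.8], [0.7, 1.9, 1.2]))
```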
Evaluation Methodology • Variation: VARIUS model • Power: SESC + Wattch + HotLeakage • Temperature: HotSpot • Critical path model: 1. Logic path delay: multiplier-like unit 2. Memory: SRAM 3. Interconnect: CACTI 4. Gate delay: alpha-power law
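The alpha-power law (Sakurai and Newton) models gate delay as t ∝ V_dd / (V_dd − V_th)^α, so a core whose threshold voltage is pushed up by variation has a longer critical path and a lower maximum frequency. The sketch assumes α = 1.3 and made-up voltages; it covers only the gate-delay component, not the full critical-path model.

```python
def gate_delay(vdd, vth, alpha=1.3, k=1.0):
    """Alpha-power-law gate delay: t = k * Vdd / (Vdd - Vth) ** alpha.

    k is an arbitrary technology constant; alpha ~ 1.3 is an assumed value.
    """
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return k * vdd / (vdd - vth) ** alpha


def relative_fmax(vdd, vth, vth_nominal, alpha=1.3):
    """Max frequency relative to a nominal-Vth core (f ~ 1 / delay)."""
    return gate_delay(vdd, vth_nominal, alpha) / gate_delay(vdd, vth, alpha)


# Hypothetical: variation raises Vth from 0.25 V to 0.28 V at Vdd = 1.0 V.
print(relative_fmax(1.0, 0.28, 0.25))
```

Here a 30 mV threshold shift costs roughly 5% of maximum frequency, the kind of core-to-core spread that variation-aware scheduling exploits.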
Workload • SPEC benchmarks • Each core runs a different application • 12 billion instructions
Metrics • Total power • Average frequency of active cores • Throughput • Energy-delay-squared product (ED²; accounts for both time-to-solution and energy consumption) • Weighted throughput: each application's IPC normalized to that application's IPC under reference conditions
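The two composite metrics can be computed as below. ED² is energy × delay²; weighted throughput is taken here as the sum over applications of IPC / reference IPC, which is an assumption about how the per-application ratios are combined. Function names are illustrative.

```python
def ed2(energy_j, delay_s):
    """Energy-delay-squared product E * D^2 (lower is better)."""
    return energy_j * delay_s ** 2


def weighted_throughput(ipcs, ref_ipcs):
    """Sum of each application's IPC normalized to its reference IPC."""
    if len(ipcs) != len(ref_ipcs):
        raise ValueError("need one reference IPC per application")
    return sum(ipc / ref for ipc, ref in zip(ipcs, ref_ipcs))


# Doubling energy to halve delay halves ED^2: delay is weighted more heavily.
print(ed2(2.0, 3.0), ed2(4.0, 1.5))
print(weighted_throughput([1.0, 0.5], [2.0, 0.5]))
```

Normalizing each IPC to its own reference keeps a naturally high-IPC application from dominating the throughput figure.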
Evaluation • Power and frequency variation on one die
Uniform Frequency & No DVFS • As the number of threads increases, there are no longer any lightly used cores left for thread mapping
Non-Uniform Frequency & No DVFS • Cores run at different frequencies, so a scheduler that picks the least-used cores may end up placing threads on lower-frequency ones
NoUniFreq+DVFS • Throughput: VarF&AppIPC+LinOpt is effective • Power: throughput gains are high when power targets are low
LinOpt Granularity • The deviation between the power consumed and the power target decreases as the interval between LinOpt runs increases
Conclusion • Within-die variation substantially impacts static power consumption and maximum frequency • Variation-aware algorithms are proposed and analyzed; LinOpt is efficient
Critique • How should thread mapping and power mode selection be decoupled? • Static and dynamic power consumption should be discussed separately • Thread mapping takes place only once; thread migration should be considered