This lecture discusses the design and modeling challenges for next-generation microprocessors with a focus on power-aware microarchitecture. Topics include power/energy basics, processor breakdowns, power/performance trade-offs, and power-saving strategies.
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec 2000
Power/Energy Basics
• Energy = Power × time
• Dynamic power = $\alpha C V^2 f$, where
  • $\alpha$: switching activity factor
  • $C$: capacitance being charged
  • $V$: voltage swing
  • $f$: processor frequency
• Current trends: f and C are rising and V is dropping, so overall dynamic power is increasing
• Leakage energy is also increasing
Processor Breakdowns
Alpha 21264 power breakdown:
• Caches: 16%
• Out-of-order issue logic: 19%
• Memory management unit: 9%
• FP unit: 11%
• Integer unit: 11%
• Clock power: 34%
[Figure: Pentium Pro power breakdown]
Metrics
• Performance $\propto f \propto 1/D$ (D is delay or execution time)
• Circuit delay $\propto 1/(V - V_t)$; a lower frequency tolerates longer delays, so the voltage can be reduced
• Power $= \alpha C V^2 f$; since f is roughly proportional to V, $P \propto V^3 \propto f^3$
• Since V and f are variable, remove them from the expression: $P D^3$ = constant (regardless of V and f), because $P \propto f^3$ while $D \propto 1/f$
• This is the best metric for comparing processors; any other metric (say, perf/watt) can be "fudged" by changing voltage or frequency
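A minimal Python sketch of the invariance argument, assuming (as the slide does) that f scales linearly with V. The activity factor, capacitance, voltage, frequency, and cycle count are hypothetical values chosen only for illustration.

```python
# Sketch: check numerically that P * D^3 is invariant under joint V/f scaling.
# Assumption (from the slide): f is proportional to V.
# All constants below are hypothetical illustration values, not measured data.


def dynamic_power(alpha: float, c: float, v: float, f: float) -> float:
    """Dynamic power P = alpha * C * V^2 * f (watts)."""
    return alpha * c * v * v * f


def pd3(power: float, delay: float) -> float:
    """The P * D^3 metric (lower is better)."""
    return power * delay ** 3


ALPHA = 0.2        # hypothetical switching activity factor
C = 50e-9          # hypothetical switched capacitance (farads)
V0 = 1.25          # hypothetical nominal voltage (volts)
F0 = 1.0e9         # hypothetical nominal frequency (Hz)
CYCLES = 1.0e9     # hypothetical workload length (cycles)

for scale in (0.8, 1.0, 1.2):
    v, f = V0 * scale, F0 * scale          # V scaled linearly with f
    p = dynamic_power(ALPHA, C, v, f)      # grows as scale^3
    d = CYCLES / f                         # execution time shrinks as 1/scale
    print(f"scale={scale:.1f}  P={p:6.2f} W  D={d:.3f} s  PD^3={pd3(p, d):.4f}")
```

Power rises with the cube of the scale factor while the last column stays fixed, which is the sense in which $P D^3$ does not depend on the chosen V and f.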
Metric Example
Operating point          Metric    Proc-A       Proc-B
V = 1.25, f = 1 GHz      Perf      1000 MIPS    800 MIPS
                         Power     100 W        80 W
                         MIPS/W    10           10
V = 1.0, f = 0.8 GHz     Perf      800 MIPS     640 MIPS
                         Power     51.2 W       41 W
                         MIPS/W    15.6         15.6
V = 1.5, f = 1.2 GHz     Perf      1200 MIPS    960 MIPS
                         Power     172.8 W      138.2 W
                         MIPS/W    6.9          6.9
Power/f³ (f in GHz)                100          80
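The table can be regenerated with a short Python sketch, assuming (as the slide does) that performance scales linearly with f and power with f³ from the 1 GHz baseline. Taking D as 1/MIPS is an assumption made only for the relative $P D^3$ comparison.

```python
# Regenerate the Proc-A / Proc-B table from the 1 GHz baselines.
# Assumptions: performance scales with f, power scales with f^3 (V proportional
# to f), and delay D is taken as 1/MIPS for the relative P*D^3 comparison.

BASELINE = {                 # (MIPS, watts) at V = 1.25, f = 1 GHz
    "Proc-A": (1000.0, 100.0),
    "Proc-B": (800.0, 80.0),
}


def operating_point(mips0: float, watts0: float, f_ghz: float) -> tuple[float, float]:
    """Scale a 1 GHz baseline to f_ghz; returns (MIPS, watts)."""
    return mips0 * f_ghz, watts0 * f_ghz ** 3


def pd3(mips: float, watts: float) -> float:
    """P * D^3 with D = 1/MIPS (relative units, lower is better)."""
    return watts / mips ** 3


for f_ghz in (1.0, 0.8, 1.2):
    for name, (mips0, watts0) in BASELINE.items():
        mips, watts = operating_point(mips0, watts0, f_ghz)
        print(f"{name} @ {f_ghz} GHz: {mips:.0f} MIPS, {watts:.1f} W, "
              f"{mips / watts:.1f} MIPS/W")

ratio = pd3(*BASELINE["Proc-B"]) / pd3(*BASELINE["Proc-A"])
print(f"PD^3(Proc-B) / PD^3(Proc-A) = {ratio:.4f}")
```

MIPS/W ties the two processors at every operating point, but the $P D^3$ ratio is 1.5625, which equals 80 W / 51.2 W: the power each processor needs when both are tuned to the same 800 MIPS (Proc-B at 1 GHz, Proc-A at 0.8 GHz).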
Metrics
• The ratio of two processors' $P D^3$ values gives their power ratio if both were tuned* to yield the same performance
• The cube root of that ratio, $(P D^3)^{1/3}$, gives their performance ratio if both were tuned* to yield the same power
*Tuning is done through voltage and frequency scaling, assuming a linear relationship between V and f. In modern processors this is not true, and $P D^x$ is the right metric, where x > 3 (x can be 1 or 2 in markets where performance is not very critical)
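Both ratio rules can be checked in Python under the same linear V/f assumption, by comparing them against explicit frequency scaling. The helper names are hypothetical; the Proc-A/Proc-B numbers come from the example slide, with D taken as 1/MIPS.

```python
# Interpret P*D^3 ratios, assuming V is proportional to f
# (so P scales as s^3 and D as 1/s when frequency is scaled by s).
# Function names are hypothetical helpers for illustration.


def power_ratio_at_equal_perf(p1: float, d1: float, p2: float, d2: float) -> float:
    """P2/P1 once processor 2 is tuned to processor 1's delay."""
    return (p2 * d2 ** 3) / (p1 * d1 ** 3)


def perf_ratio_at_equal_power(p1: float, d1: float, p2: float, d2: float) -> float:
    """perf1/perf2 (= D2/D1) once processor 2 is tuned to processor 1's power."""
    return power_ratio_at_equal_perf(p1, d1, p2, d2) ** (1.0 / 3.0)


def tune(p: float, d: float, s: float) -> tuple[float, float]:
    """Scale frequency (and voltage) by s: power times s^3, delay divided by s."""
    return p * s ** 3, d / s


# Proc-A and Proc-B from the example slide, with D = 1/MIPS.
pa, da = 100.0, 1 / 1000
pb, db = 80.0, 1 / 800

print(power_ratio_at_equal_perf(pa, da, pb, db))    # PD^3 rule
print(tune(pb, db, db / da)[0] / pa)                # explicit tuning to equal delay
print(perf_ratio_at_equal_power(pa, da, pb, db))    # (PD^3)^(1/3) rule
print(tune(pb, db, (pa / pb) ** (1 / 3))[1] / da)   # explicit tuning to equal power
```

Each rule and its explicit-tuning counterpart print the same value, which is what lets $P D^3$ stand in for either comparison without picking an operating point.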
Global Power Saving Strategies
• Dynamic frequency scaling (DFS) trivially reduces power and worsens performance, but has no effect on (dynamic) energy
• If off-chip components (memory) dominate, DFS does yield an energy reduction
• Leakage power is unaffected by DFS, so if leakage dominates, overall energy increases (the same leakage power is drawn over a longer execution time)
• Montecito: 20 MHz changes in frequency can happen in a single cycle
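A minimal Python sketch of the first and third DFS bullets for a compute-bound workload, assuming voltage is held fixed (so dynamic energy per cycle does not change) and leakage power is constant. The cycle count, energy per cycle, and leakage power are hypothetical; the memory-dominated case is not modeled here.

```python
# DFS on a hypothetical compute-bound workload: voltage is held fixed,
# so dynamic energy per cycle is unchanged; leakage power is constant.


def dfs_point(f_hz: float,
              cycles: float = 2e9,             # hypothetical workload length
              e_dyn_per_cycle: float = 10e-9,  # hypothetical joules per cycle
              p_leak: float = 5.0):            # hypothetical leakage watts
    """Return (time s, dynamic power W, dynamic energy J, leakage energy J)."""
    t = cycles / f_hz                  # execution time grows as f drops
    e_dyn = cycles * e_dyn_per_cycle   # independent of f
    p_dyn = e_dyn / t                  # proportional to f
    e_leak = p_leak * t                # same leakage power over a longer time
    return t, p_dyn, e_dyn, e_leak


for f in (2e9, 1e9):
    t, p_dyn, e_dyn, e_leak = dfs_point(f)
    print(f"f={f / 1e9:.0f} GHz: time={t:.1f} s, P_dyn={p_dyn:.1f} W, "
          f"E_dyn={e_dyn:.1f} J, E_leak={e_leak:.1f} J")
```

Halving f halves dynamic power and doubles execution time, leaves dynamic energy at 20 J, and doubles leakage energy from 5 J to 10 J.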
Global Power Saving Strategies
• Dynamic voltage scaling (DVS): since frequency is being lowered, each circuit has more slack, so voltage can be scaled down as well. This has a more than quadratic effect on dynamic power, a linear effect on leakage power, and a more than linear effect on energy
• Intel XScale: roughly 50 ms to scale from 1.65 V to 0.75 V
• DVS opportunities are shrinking: voltage margins are lower, and error rates may increase
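The DVS scaling claims follow from $P_{dyn} = \alpha C V^2 f$ with V proportional to f, plus the linear leakage-versus-voltage dependence stated on the Leakage Power slide. A small Python sketch of the relative factors (the function name is hypothetical):

```python
# Relative effect of scaling both V and f by factor s (s < 1 slows down).
# Assumptions: V proportional to f; leakage power linear in V;
# alpha and C unchanged.


def dvs_factors(s: float) -> dict[str, float]:
    if s <= 0:
        raise ValueError("scale factor must be positive")
    return {
        "dynamic_power": s ** 3,   # V^2 * f -> s^2 * s
        "dynamic_energy": s ** 2,  # energy per operation ~ C * V^2
        "leakage_power": s,        # linear in V
        "exec_time": 1 / s,        # delay ~ 1/f
    }


for name, factor in dvs_factors(0.5).items():
    print(f"{name:15s} x{factor:.3f}")
```

At half frequency and half voltage, dynamic power falls to 1/8 and dynamic energy to 1/4, compared with 1/2 and no change under frequency scaling alone.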
Localized Power Saving Strategies
• When a processor structure is not used in a cycle, gate off its clock for that cycle; gating can take effect within a single cycle, at the cost of increased complexity
• Leakage energy can be reduced by gating off the supply voltage V during periods of inactivity; this takes longer to take effect
• Body biasing can also reduce leakage power
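A rough Python sketch of why clock gating and supply gating differ in reach, assuming a per-cycle usage trace for one structure. Clock gating is applied in every idle cycle; supply gating is modeled, as a hypothetical policy, as engaging only after `timeout` consecutive idle cycles, standing in for its slower transition. The trace and timeout are made-up illustration values.

```python
from itertools import groupby

# Per-cycle usage trace for one structure: True = used this cycle.
# Hypothetical policy: clock gating applies to every idle cycle, while
# supply (power) gating engages only after `timeout` consecutive idle
# cycles, reflecting its longer transition time.


def gated_cycles(trace: list[bool], timeout: int) -> tuple[int, int]:
    """Return (clock-gated cycles, supply-gated cycles)."""
    if timeout < 0:
        raise ValueError("timeout must be non-negative")
    clock_gated = sum(1 for used in trace if not used)
    supply_gated = 0
    for used, run in groupby(trace):
        if not used:
            idle_len = sum(1 for _ in run)
            supply_gated += max(0, idle_len - timeout)
    return clock_gated, supply_gated


trace = [True, False, False, False, False, True, False, True] + [False] * 10
clock, supply = gated_cycles(trace, timeout=3)
print(f"{len(trace)} cycles: clock gated {clock}, supply gated {supply}")
```

Short idle gaps are captured only by clock gating; supply gating pays off on long idle stretches, which matches its slower response.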
Localized Power Saving Strategies
Dynamically adjust frequency/voltage and size for each domain, based on throughput rates
Leakage Power
Leakage is a linear function of supply voltage, a linear function of the number of transistors, and an exponential function of threshold voltage (it falls exponentially as the threshold voltage rises). From Butts and Sohi, MICRO'00
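The three dependencies can be sketched in Python with a simplified static-power form in the spirit of Butts and Sohi, $P_{static} = V \cdot N \cdot k_{design} \cdot I_{leak}$, assuming a subthreshold current $I_{leak} = I_0 e^{-V_t/(n v_T)}$. The values of $I_0$, $k_{design}$, and $n$ are hypothetical; $v_T \approx 26$ mV is the thermal voltage near room temperature. The sketch ignores temperature and drain-induced effects that a more complete model would capture.

```python
import math

# Simplified static power: P = V * N * k_design * I_leak, with a
# subthreshold current I_leak = I0 * exp(-Vt / (n * vT)).
# I0, k_design, and n are hypothetical illustration values.

V_THERMAL = 0.026  # thermal voltage kT/q near room temperature (volts)


def leakage_power(v: float, n_transistors: float, vt: float,
                  k_design: float = 1.0, i0: float = 1e-6,
                  n: float = 1.5) -> float:
    i_leak = i0 * math.exp(-vt / (n * V_THERMAL))  # per-device current (A)
    return v * n_transistors * k_design * i_leak   # watts


base = leakage_power(1.0, 100e6, 0.3)
print(leakage_power(2.0, 100e6, 0.3) / base)   # linear in supply voltage
print(leakage_power(1.0, 200e6, 0.3) / base)   # linear in transistor count
print(leakage_power(1.0, 100e6, 0.4) / base)   # exponential in threshold voltage
```

Doubling the supply voltage or the transistor count doubles leakage, while raising $V_t$ by 100 mV cuts it to roughly 8% under these assumed parameters.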
Power-Performance Trade-Offs
Caches and branch predictors are doubled at each successive design point, while the x-axis represents the sizes of the issue queues, registers, ROB, etc. The results argue against going to wider/larger superscalars
Other Observations
• Clustered architectures have better power scalability (since the complexity of each cluster remains unchanged)
• CMP and SMT can employ complexity-effective designs: power consumption is low (little wasted work) and multi-threaded performance continues to be high
From ISPASS'06