MemScale: Active Low-Power Modes for Main Memory
Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini
Rutgers University, *University of Michigan
Server memory power challenges
[Figure: Power consumption of a Google server [Barroso & Hoelzle’07]; x-axis: compute load (%), y-axis: power (% of peak)]
• DRAM power varies little with load
• Memory power represents 30-40% of total power for typical loads
• The true fraction is larger, since memory controller power is not included
Improving memory energy efficiency
Observation: Memory bandwidth is rarely fully utilized [Meisner’11]; we can save energy during periods of light and moderate load
Previous approaches:
• Leveraging DRAM idle low-power states [Lebeck’00][Delaluz’01][Li’04][Diniz’07]…
• Rank sub-setting and DRAM reorganization [Ahn’09][Udipi’10][Zheng’10]…
• Memory controller power is typically not considered
Need active low-power modes to save energy when memory is underutilized
Frequency has a greater impact on bandwidth than on latency
MemScale: Active low-power modes for memory
Goal: Dynamically scale memory frequency to conserve energy
Hardware mechanism:
• Frequency scaling (DFS) of the channels, DIMMs, and DRAM devices
• Voltage & frequency scaling (DVFS) of the memory controller
Key challenge: Conserving significant energy while meeting performance constraints
Approach:
• Online profiling to estimate performance and bandwidth demand
• Epoch-based modeling and control to meet performance constraints
Main result: System energy savings of 18% with average performance loss of 4%
Outline
• Motivation and overview
• Background on memory systems
• MemScale: DVFS for the memory system
• Results
• Conclusions
Impact of frequency scaling on memory latency
[Figure: timing diagrams of one memory request (Req, ACT, CL, Burst, PRE, Reply) passing through the MC at 800 MHz and at 400 MHz]
• For DDR3 DRAM, scaling frequency from 800 MHz to 400 MHz: bandwidth down by 50%, latency up by only 10%
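A rough way to see why bandwidth halves while latency barely moves is to split an access into a part that is fixed in nanoseconds and a part that is counted in bus clock cycles. The sketch below is illustrative only: `fixed_ns=45.0` and `clocked_cycles=4` are hypothetical values chosen to reproduce the 10% figure on the slide, not numbers from the talk. The 8-byte transfer width assumes a standard 64-bit DDR3 channel.

```python
def peak_bandwidth_gbps(freq_mhz, bytes_per_transfer=8):
    """Peak channel bandwidth in GB/s; DDR moves two transfers per bus cycle."""
    return freq_mhz * 2 * bytes_per_transfer / 1000.0


def access_latency_ns(freq_mhz, fixed_ns=45.0, clocked_cycles=4):
    """Latency = frequency-independent part + part counted in bus cycles.

    fixed_ns and clocked_cycles are hypothetical illustration values.
    """
    cycle_ns = 1000.0 / freq_mhz  # duration of one bus cycle in ns
    return fixed_ns + clocked_cycles * cycle_ns


if __name__ == "__main__":
    for f in (800, 400):
        print(f, "MHz:", peak_bandwidth_gbps(f), "GB/s,",
              access_latency_ns(f), "ns")
```

Under these assumptions only the cycle-counted part doubles when the clock is halved, so latency grows far less than bandwidth shrinks.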
Opportunity for MemScale
Power components:
• Background: clock tree, I/O driver, register, PLL, DLL, refresh, others
• Dynamic: read, write, termination
• MC: memory controller
Effects of lower frequency on power:
• Lowers background power linearly (~f)
• Lowers MC power by cubic factor (~f^3)
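The two scaling rules above can be turned into a small estimate. This is a minimal sketch, not the model from the talk: the function name and the wattage inputs are hypothetical, and dynamic power is simply held constant here as a simplifying assumption, since the slide states no scaling rule for it.

```python
def scaled_memory_power(background_w, dynamic_w, mc_w, f_ratio):
    """Estimate per-component power after scaling frequency by f_ratio.

    f_ratio is new_frequency / max_frequency, in (0, 1].
    Background power scales ~f and MC power ~f^3, as on the slide.
    Dynamic power is left unchanged (assumption of this sketch).
    """
    if not 0.0 < f_ratio <= 1.0:
        raise ValueError("f_ratio must be in (0, 1]")
    return {
        "background": background_w * f_ratio,
        "dynamic": dynamic_w,
        "mc": mc_w * f_ratio ** 3,
    }


if __name__ == "__main__":
    # Hypothetical 10 W background, 4 W dynamic, 8 W MC at full speed.
    print(scaled_memory_power(10.0, 4.0, 8.0, 0.5))
```

Halving the frequency in this toy setup halves background power but cuts MC power to one eighth, which is why the memory controller benefits most from scaling.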
MemScale design
Goal: Minimize energy under user-specified slowdown bound
Approach: OS-managed, epoch-based memory frequency tuning
Each epoch (e.g., an OS quantum):
• Profile performance & bandwidth demand: new performance counters track memory latency and queue occupancies
• Estimate performance & energy at each frequency: models estimate queuing delays & system energy
• Re-lock to best frequency; continue tracking performance
• Slack: delta between estimated & observed performance
• Carry slack forward to performance target for next epoch
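The per-epoch steps above can be sketched as a small selection routine. This is an illustrative sketch under stated assumptions, not the MemScale implementation: all function names are hypothetical, `predict_time` and `predict_energy` stand in for the performance and energy models, and slack is tracked here as allowed time minus observed time, which is one plausible reading of the slide.

```python
def time_budget(max_speed_time, bound, slack):
    """Time allowed for the next epoch: bounded slowdown plus carried slack."""
    return max_speed_time * (1.0 + bound) + slack


def pick_frequency(freqs_mhz, predict_time, predict_energy, budget):
    """Pick the lowest-energy frequency whose predicted time fits the budget.

    Falls back to the highest frequency when no candidate fits.
    """
    feasible = [f for f in freqs_mhz if predict_time(f) <= budget]
    if not feasible:
        return max(freqs_mhz)
    return min(feasible, key=predict_energy)


def next_slack(slack, max_speed_time, observed_time, bound):
    """Carry forward the unused (or overdrawn) part of the allowed time."""
    return slack + max_speed_time * (1.0 + bound) - observed_time


if __name__ == "__main__":
    freqs = [200, 400, 800]
    # Toy models (hypothetical): time grows and energy shrinks as f drops.
    t = lambda f: 9.0 + 0.9 * 800.0 / f
    e = lambda f: f
    print(pick_frequency(freqs, t, e, time_budget(10.0, 0.10, 0.0)))
```

Positive slack loosens the budget for the next epoch and lets the policy pick a lower frequency; negative slack tightens it and pushes the choice back toward the maximum frequency.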
Frequency and slack management
[Figure: timeline across Epochs 1-4. Labels: CPU actual vs. target performance with positive and negative slack; profiling; calculate slack vs. target; estimate performance/energy via models; MC, bus + DRAM at high or low frequency]
Modeling of performance and energy
New performance counters enable estimates of:
• Level of contention (bank and bus)
• Energy consumption
• CPI of each application
• Avg memory latency
• Performance slack
• Estimate full system energy
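To make the link between average memory latency and per-application CPI concrete, here is a textbook first-order estimate. It is not the model from the paper: the function name and inputs are hypothetical, and it assumes every last-level cache miss stalls the core for the full memory latency, with no overlap between misses.

```python
def predict_cpi(cpi_core, misses_per_instr, mem_latency_ns, cpu_ghz):
    """First-order CPI: core CPI plus memory stall cycles per instruction.

    Assumes each miss stalls the core for the full memory latency
    (no memory-level parallelism), which overestimates the penalty.
    """
    stall_cycles_per_miss = mem_latency_ns * cpu_ghz  # ns * cycles per ns
    return cpi_core + misses_per_instr * stall_cycles_per_miss


if __name__ == "__main__":
    base = predict_cpi(1.0, 0.01, 50.0, 2.0)
    slow = predict_cpi(1.0, 0.01, 55.0, 2.0)
    print("slowdown:", slow / base - 1.0)
```

An estimate of this shape lets a policy predict the CPI at each candidate frequency from counters alone, then compare the predicted slowdown against the allowed bound.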
MemScale adjusts frequency dynamically
[Figure: timeline of workload mix MID3]
Methodology
Detailed simulation:
• 16 cores, 16MB LLC, 4 DDR3 channels, 8 DIMMs
• Multi-programmed workloads from SPEC suites
Power modes:
• 10 frequencies between 200 and 800 MHz
Power consumption:
• Micron’s DRAM power model
• Memory system power = 40% of total server power
Results – energy savings and performance
[Figure: average energy savings and performance overhead]
• Memory energy savings of 44%
• System energy savings of 18%
• Always within performance bound
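As a back-of-envelope check, the memory and system figures are roughly consistent with the 40% memory share stated in the methodology. The sketch below is an approximation only: the function name is hypothetical, and it ignores any extra energy the rest of the server spends when the workload runs slightly longer.

```python
def system_savings(memory_share, memory_savings):
    """Approximate system-level savings from memory-only savings.

    Ignores the energy cost of longer runtime in non-memory components.
    """
    return memory_share * memory_savings


if __name__ == "__main__":
    # 40% memory share and 44% memory energy savings, as on the slides.
    print(round(system_savings(0.40, 0.44) * 100, 1), "% of system energy")
```

The product, about 17.6%, lands close to the reported 18% system energy savings.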
Alternative approaches
• Fast power-down: transition ranks into fast power-down mode when idle
• Decoupled-DIMM [Zheng’09]: low-frequency DRAM + high-frequency DIMMs & channels
• Static: pre-selected active low-power mode w/o dynamic scaling; unrealistic, since it needs a priori knowledge of workload behavior
Results – comparison to alternative approaches
[Figure: full system energy savings (%) and performance overhead (CPI increase, %) for the MID workloads, comparing Static, Fast-PD, Decoupled-DIMM, MemScale, and MemScale+Fast-PD]
Conclusions
MemScale contributions:
• Active low-power modes for the memory subsystem
• New perf. counters to capture energy and contention
• OS policy to choose best power mode dynamically
• Avg 18% system energy savings, avg 4% performance loss
In the paper:
• Performance and energy models
• Sensitivity analyses (including lower performance bounds)
• Energy break-down comparison
THANKS!