MICRO-43. Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory. Jeffrey Stuecheli 1,2 , Dimitris Kaseridis 1 , Hillery C. Hunter 3 & Lizy K. John 1 1 ECE Department, The University of Texas at Austin 2 IBM Corp., Austin 3 IBM Thomas J. Watson Research Center.
MICRO-43
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
Jeffrey Stuecheli (1,2), Dimitris Kaseridis (1), Hillery C. Hunter (3) & Lizy K. John (1)
(1) ECE Department, The University of Texas at Austin
(2) IBM Corp., Austin
(3) IBM Thomas J. Watson Research Center
Laboratory for Computer Architecture
Overview/Summary
• Refresh overhead is increasing with device density
• Because of how this overhead grows, performance is suffering
• Current refresh scheduling methods are ineffective at hiding these delays
• We propose more sophisticated mitigation methods
  • Elastic Refresh Scheduling
Laboratory for Computer Architecture, 12/7/2010
Background: Basic DRAM/Refresh Info
• Each bit is stored on a capacitor
• A single transistor per cell controls access to the stored charge
• Leakage: the cell loses charge over time
• Refresh: rewrite each cell on a periodic basis
• DDR3
  • Refresh requirement depends on temperature: 64 ms at 85 °C, 32 ms at 95 °C
  • The DRAM device contains an internal address counter
  • JEDEC simply specifies the time interval (tREFI, REFresh Interval)
tREFI = 64 ms / 8192 = 7.8 µs (3.9 µs at 95 °C)
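The tREFI arithmetic on this slide can be checked with a few lines of Python. This is only a worked calculation; the function name `trefi_us` is made up for illustration, and the 8192 commands per retention period is the "8k" figure used in the deck.

```python
# Refresh interval arithmetic for DDR3, using the numbers on the slide.
REFRESH_COMMANDS = 8192  # "8k" refresh commands per retention period


def trefi_us(retention_ms: float, commands: int = REFRESH_COMMANDS) -> float:
    """Average spacing between refresh commands, in microseconds."""
    return retention_ms * 1000.0 / commands


print(trefi_us(64.0))  # 85 °C case: 7.8125
print(trefi_us(32.0))  # 95 °C case: 3.90625
```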
Background: Transition to Denser Devices
• The 7.8 µs interval is based on 8k rows per bank
• DRAM device density doubles roughly every 2 years
• With one refresh per row, tREFI would halve each generation
• Instead, multiple rows are refreshed with each command
• Current delivery constraints force tRFC to increase with denser devices
[Figure labels: 95 nm, 512 Mbit; 42 nm, 2 Gbit]
Background: “Stacked” Refresh Operations in a Single Command
[Figure] Example source: TN-47-16, Designing for High-Density DDR2 Memory (Introduction), Micron
Background: tRFC Growth with DRAM Density
• In the most basic terms, tRFC should scale linearly with density
• Based strictly on the current needed to charge the capacitance (roughly fixed charge per bit)
• This has been reflected in the DDR3 spec, with the exception of 8 Gbit
• Net: even if DRAM vendors can slow the growth, the delay is already large today
Motivation: Slowdown Effects Observed in Simulation
• Simics/GEMS
• 4 cores, 2 channels at 1333 MHz, 2 DDR3 ranks per channel
Motivation: Why It Is So Bad
Motivation: Postponing Refresh Operations
• Each cell needs to be refreshed every 64 ms
• Refresh command spacing is based on an average rate
• As such, a cell will not fail if no refresh is sent at the moment tREFI expires
• The current DDR3 spec allows the controller to fall eight tREFI intervals behind (the backlog count)
• In the worst case, the cell refresh interval is lengthened by about 0.1% (8 in 8k)
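The postponement rule above can be modeled as a small saturating counter. The sketch below is illustrative only: the class `RefreshBacklog` and its method names are hypothetical, and the only facts taken from the slide are the limit of eight postponed refreshes and the 8k commands per retention period.

```python
MAX_BACKLOG = 8             # DDR3 allows up to eight postponed refreshes
COMMANDS_PER_PERIOD = 8192  # "8k" refresh commands per retention period


class RefreshBacklog:
    """Hypothetical bookkeeping for refreshes that are owed but not yet sent."""

    def __init__(self) -> None:
        self.count = 0

    def on_trefi_expired(self) -> None:
        # One more refresh is owed. The count saturates here; a real
        # controller must have forced a refresh before exceeding the limit.
        self.count = min(self.count + 1, MAX_BACKLOG)

    def on_refresh_sent(self) -> None:
        if self.count > 0:
            self.count -= 1

    def must_refresh_now(self) -> bool:
        # No slack left: the next refresh can no longer be postponed.
        return self.count >= MAX_BACKLOG


# Worst-case lengthening of a cell's refresh interval: 8 in 8k, about 0.1%.
worst_case_stretch = MAX_BACKLOG / COMMANDS_PER_PERIOD
print(f"{worst_case_stretch:.4%}")
```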
Motivation: Current Approaches
• Demand Refresh (DR)
  • Most basic policy: sends refreshes as high-priority operations every tREFI period
• Defer Until Empty (DUE)
  • Uses the DRAM's ability to postpone refreshes
  • Refreshes are postponed until no reads are queued or the maximum backlog count has been reached
• Why these policies are ineffective
  • DR: does nothing to hide refreshes
  • DUE: too aggressive in sending refresh operations; in many cases it does not take advantage of the backlog
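The two baseline policies can be written as simple decision functions, which makes the weakness of each easier to see. This is a minimal sketch under stated assumptions: the function names and the `reads_queued` argument are invented for illustration, and a real controller would also account for bank state and command timing.

```python
def demand_refresh(backlog: int, reads_queued: int) -> bool:
    """DR: issue a refresh as soon as one is owed, ignoring pending reads."""
    return backlog > 0


def defer_until_empty(backlog: int, reads_queued: int, max_backlog: int = 8) -> bool:
    """DUE: postpone while reads are queued, until the backlog limit is hit."""
    if backlog == 0:
        return False
    return reads_queued == 0 or backlog >= max_backlog


# DR refreshes even with reads waiting; DUE fires at the first empty queue,
# even when the backlog is only 1 and seven more postponements are available.
print(demand_refresh(backlog=1, reads_queued=4))     # True
print(defer_until_empty(backlog=1, reads_queued=4))  # False
print(defer_until_empty(backlog=1, reads_queued=0))  # True
print(defer_until_empty(backlog=8, reads_queued=4))  # True
```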
Elastic Refresh
• Exploit
  • Non-uniform request distribution
  • Refresh overhead just has to fit in free cycles
• Initially not aggressive; converges with DUE as the refresh backlog grows
• Latency-sensitive workloads are often lower bandwidth
• Decreases the probability of reads conflicting with refreshes
Elastic Refresh: Idle Delay Function
• Introduce a refresh-backlog-dependent idle delay threshold
• With a low backlog, there is no reason to send a refresh command
• With a bursty request stream, the probability of a future request decreases with time
• As the backlog grows, decrease this delay threshold
[Figure axis label: Idle Delay Threshold]
Elastic Refresh: Tuning the Idle Delay Function
• The optimal shape of the idle delay function (IDF) is workload dependent
• The IDF can be controlled with the parameters listed in the figure
• Our system contains hardware to determine “good” parameters: Max Delay and Proportional Slope
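One way to picture the idle delay function is as a piecewise function of the backlog: a flat region at Max Delay, a region that falls off with the Proportional Slope, and zero delay once the backlog limit is reached. The sketch below assumes that shape. The parameter `prop_start` (where the sloped region begins) and the numeric values are hypothetical and are not taken from the slides.

```python
def idle_delay_threshold(backlog: int, max_delay: int, slope: int,
                         prop_start: int = 2, max_backlog: int = 8) -> int:
    """Idle cycles a rank must stay quiet before a refresh is sent."""
    if backlog >= max_backlog:
        return 0              # out of slack: refresh at high priority
    if backlog <= prop_start:
        return max_delay      # low backlog: wait the full Max Delay
    # Proportional region: the threshold shrinks as the backlog grows.
    return max(0, max_delay - slope * (backlog - prop_start))


def should_send_refresh(backlog: int, idle_cycles: int,
                        max_delay: int, slope: int) -> bool:
    """Send a refresh only when one is owed and the rank has idled long enough."""
    if backlog == 0:
        return False
    return idle_cycles >= idle_delay_threshold(backlog, max_delay, slope)


# Threshold for backlog 0..8 with illustrative values Max Delay = 60, slope = 10.
print([idle_delay_threshold(b, 60, 10) for b in range(9)])
# [60, 60, 60, 50, 40, 30, 20, 10, 0]
```

With a backlog of 8 the threshold is zero, so the policy behaves like DUE at its limit; with a small backlog it waits longer than DUE would before committing the rank to a refresh.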
Elastic Refresh: Max Delay Circuit
• Circuit used to collect the average rank idle period
• Conceptually, given an exponential-type distribution, the average can be used to find the tail
• The calculated average is used as Max Delay
• Circuit function
  • Accumulate idle delay over 1024 events
  • Average calculated by truncating the accumulator (keeping only its upper bits)
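Because 1024 is a power of two, the average over 1024 events needs no divider: dropping the low 10 bits of the accumulator is enough. The following is a software model of that idea, not the actual circuit; the class and method names are hypothetical.

```python
class MaxDelayEstimator:
    """Software model of an averaging circuit for rank idle periods."""

    EVENTS = 1024  # idle periods per averaging window (from the slide)
    SHIFT = 10     # log2(1024): dropping 10 low bits divides by 1024

    def __init__(self, initial_max_delay: int = 0) -> None:
        self.accumulator = 0
        self.events = 0
        self.max_delay = initial_max_delay

    def record_idle_period(self, idle_cycles: int) -> None:
        self.accumulator += idle_cycles
        self.events += 1
        if self.events == self.EVENTS:
            # Keep only the upper bits of the accumulator as the average,
            # then start a new window.
            self.max_delay = self.accumulator >> self.SHIFT
            self.accumulator = 0
            self.events = 0


est = MaxDelayEstimator()
for i in range(1024):
    est.record_idle_period(10 if i % 2 == 0 else 30)
print(est.max_delay)  # 20
```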
Elastic Refresh: Proportional Slope Circuit
• Conceptually, the proportional region acts to transition gracefully to high priority while utilizing the full postponed range
• The circuit works to balance utilization across the postponed range (High/Low counts)
• A PI-type controller adjusts the slope to balance the High/Low counts
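The slide gives no detail on the controller, so the following is only a generic proportional-integral (PI) sketch of how a slope could be nudged until the High and Low counts balance. Everything here is an assumption made for illustration: the meaning assigned to the counts (refreshes sent at the high end versus the low end of the postponed range), the sign convention, the gains, and the clamping limits.

```python
class SlopeController:
    """Generic PI controller that steers a slope toward balanced counts."""

    def __init__(self, base_slope: float = 8.0, kp: float = 0.5, ki: float = 0.125,
                 min_slope: float = 1.0, max_slope: float = 64.0) -> None:
        self.base_slope = base_slope
        self.kp = kp                # proportional gain (hypothetical value)
        self.ki = ki                # integral gain (hypothetical value)
        self.min_slope = min_slope
        self.max_slope = max_slope
        self.integral = 0.0
        self.slope = base_slope

    def update(self, high_count: int, low_count: int) -> float:
        # Assumed convention: more High than Low means refreshes are being
        # left too late, so the slope is steepened; the reverse flattens it.
        error = high_count - low_count
        self.integral += error
        raw = self.base_slope + self.kp * error + self.ki * self.integral
        self.slope = min(self.max_slope, max(self.min_slope, raw))
        return self.slope


ctrl = SlopeController()
print(ctrl.update(high_count=10, low_count=10))  # balanced: 8.0
print(ctrl.update(high_count=20, low_count=10))  # too many High: 14.25
```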
Elastic Refresh: Hardware Cost
• Trivial integration into DUE-based policies
  • Structure replaces the “empty” indication of DUE
• Logic size
  • ~100 latch bits for the static policy
  • ~80 additional latch bits for the dynamic policy
• Logic cycle time
  • Low frequency compared to ALU functions in the processor core
  • Infrequent updates could enable pipelined control
Simulation Methodology
• Simics extended with the GEMS model
• 1-, 4- and 8-core CMPs
• First-Ready, First-Come-First-Served memory controller policy
• DDR3 1333 MHz 8-8-8 memory, 2 MCs, 2 ranks per MC
• tRFC = 550 ns, tREFI = 3.9 µs at 95 °C (estimate for 16 Gbit)
• Refresh policies:
  • Demand Refresh (DR)
  • Defer Until Empty (DUE)
  • Elastic Refresh policies
• SPEC CPU2006 workloads
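The simulated tRFC and tREFI values imply how much of each rank's time is taken by refresh when nothing is hidden. A quick calculation, using only the two numbers on this slide (the function name is made up for illustration):

```python
def refresh_busy_fraction(trfc_ns: float, trefi_ns: float) -> float:
    """Fraction of time a rank is unavailable because it is refreshing."""
    return trfc_ns / trefi_ns


# tRFC = 550 ns, tREFI = 3.9 us (3900 ns) at 95 °C, as simulated.
fraction = refresh_busy_fraction(550.0, 3900.0)
print(f"{fraction:.1%}")  # 14.1%
```

Roughly one seventh of each rank's time goes to refresh in this configuration, which is why the scheduling of those refreshes matters.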
Results: Integer, 8 Cores
Related Work
• B. Bhat and F. Mueller, “Making DRAM refresh predictable,” Real-Time Systems, Euromicro Conference 2010
• M. Ghosh and H. S. Lee, “Smart Refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs,” in MICRO 40
• K. Toshiaki, P. Paul, H. David, K. Hoki, J. Golz, F. Gregory, R. Raj, G. John, R. Norman, C. Alberto, W. Matt, and I. Subramanian, “An 800 MHz embedded DRAM with a concurrent refresh mode,” in IEEE ISSCC Digest of Technical Papers, Feb. 2004
Conclusions
• The significant performance degradation caused by refresh can be mitigated with low-overhead mechanisms
• Commodity DRAM is cost driven
  • Elastic Refresh requires no DRAM changes
• Future work:
  • Coordinate refresh with other structures on the CMP
  • Investigate refresh for future DRAM devices (DDR4)
    • For example, dynamically select how many rows to refresh
Thank You, Questions?
Laboratory for Computer Architecture, University of Texas Austin
IBM Austin
IBM T. J. Watson Lab