Phase Change Memory: What to wear out today?
Chris Craik, Aapo Kyrola, Yoshihisa Abe

Memory Technologies
• Concerns
    • Density
    • Latency
    • Energy
• Off-chip technologies
    • DRAM: moderately dense, but not very fast
    • Flash: fairly dense, but near-disk slowness

Phase Change Memory
• Bit recorded in ‘Phase Change Material’
• SET to 1 by heating to crystallization point
• RESET to 0 by heating to melting point
• Resistance indicates state

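The SET / RESET / read behaviour above can be sketched as a toy cell model. This is only an illustration: the class name `PCMCell`, the resistance constants and the read threshold are assumptions chosen for readability, not measured device values.

```python
# Illustrative resistance values (assumptions, not measured data).
R_CRYSTALLINE_OHMS = 1e4   # low resistance, read as 1
R_AMORPHOUS_OHMS = 1e6     # high resistance, read as 0
READ_THRESHOLD_OHMS = 1e5  # hypothetical sensing threshold


class PCMCell:
    """Toy model of a single phase change memory cell."""

    def __init__(self):
        self.resistance = R_AMORPHOUS_OHMS  # start in the RESET (0) state
        self.writes = 0                     # each SET/RESET pulse wears the cell

    def set(self):
        """SET: heat to the crystallization point, leaving low resistance."""
        self.resistance = R_CRYSTALLINE_OHMS
        self.writes += 1

    def reset(self):
        """RESET: heat to the melting point, leaving high resistance."""
        self.resistance = R_AMORPHOUS_OHMS
        self.writes += 1

    def read(self):
        """Reading senses the resistance and does not add wear."""
        return 1 if self.resistance < READ_THRESHOLD_OHMS else 0


cell = PCMCell()
cell.set()
print(cell.read(), cell.writes)  # 1 1
```

Only `set()` and `reset()` increment the wear counter, which mirrors the point that writes, not reads, limit the lifetime of a cell.
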
Phase Change Memory
• Density
    • 4x increase over DRAM
• Latency
    • 4x increase over DRAM (i.e. slower)
• Energy
    • No leakage
    • Reads are worse (2x), writes much worse (40x)
• Wear-out
    • Limited number of writes (but better than Flash)
• Non-volatile
    • Data persists in memory without power

Solutions to wearing & energy
• Partial writes = write only bits that have changed
    • Caches keep track of written bytes/words per cache line (Lee et al.)
        • Storage overhead vs. accuracy
    • When writing a row to memory, first read the old row and compare, then write only the modified bits (Zhou et al.)
• Most written bits are redundant!
• Writes cause thermal expansion / contraction that wears the material, and they require a strong current. Unlike DRAM, however, PCM does not leak energy.

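The read-compare-write idea can be sketched in a few lines. This is a minimal software illustration under stated assumptions, not the hardware mechanism from the cited papers: the function name `partial_write` and the byte-string representation of a row are hypothetical.

```python
def partial_write(old_row, new_row):
    """Compare the old and new row contents and program only differing bits.

    Returns the resulting row and the number of cells that were programmed.
    """
    if len(old_row) != len(new_row):
        raise ValueError("rows must have the same length")
    bits_written = 0
    for old_byte, new_byte in zip(old_row, new_row):
        changed = old_byte ^ new_byte            # 1 where the bit differs
        bits_written += bin(changed).count("1")  # only those cells are written
    return bytes(new_row), bits_written


old = bytes([0b10101010, 0xFF])
new = bytes([0b10101011, 0xFF])
row, written = partial_write(old, new)
print(written, "of", len(new) * 8, "bits written")  # 1 of 16 bits written
```

A full-row write would have programmed all 16 cells here; the comparison reduces that to the single cell whose value changed, at the cost of one extra read.
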
Solutions to wearing & energy (cont.)
• Buffer organisation (Lee et al.)
    • DRAM uses one row buffer (2048B)
    • Propose using up to 32 * 64B narrow buffers, each with its own association
    • Capture coalescing writes: temporal locality more important than spatial locality
    • Find 4 * 512B most effective
    • Area-neutral
    • Also helps decrease latency
• Small DRAM buffer for PCM (Qureshi et al.)
    • Combine low latency of DRAM with high capacity of PCM
    • Similarly, use a Flash cache for disk

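How several narrow buffers coalesce writes can be illustrated with a small simulation. The class `NarrowBuffers`, its parameters and the least-recently-used eviction policy are assumptions made for this sketch; the slides do not state which replacement policy is used.

```python
from collections import OrderedDict


class NarrowBuffers:
    """Toy model of a few narrow buffers placed in front of a PCM array."""

    def __init__(self, num_buffers=4, buffer_bytes=512):
        self.num_buffers = num_buffers
        self.buffer_bytes = buffer_bytes
        self.buffers = OrderedDict()  # block index -> dirty flag, oldest first
        self.array_writes = 0         # write-backs that reach the PCM array

    def write(self, address):
        block = address // self.buffer_bytes
        if block in self.buffers:
            # Hit: the write is absorbed by the buffer (coalesced).
            self.buffers.move_to_end(block)
            self.buffers[block] = True
            return
        if len(self.buffers) == self.num_buffers:
            # Miss with all buffers in use: evict the least recently used one
            # (assumed policy) and write it back if it holds modified data.
            _, dirty = self.buffers.popitem(last=False)
            if dirty:
                self.array_writes += 1
        self.buffers[block] = True

    def flush(self):
        """Write back every dirty buffer."""
        self.array_writes += sum(1 for dirty in self.buffers.values() if dirty)
        self.buffers.clear()


buf = NarrowBuffers(num_buffers=2, buffer_bytes=64)
for addr in [0, 8, 16, 64, 0, 128]:
    buf.write(addr)
buf.flush()
print(buf.array_writes)  # 3
```

Six writes reach the array as three write-backs because repeated writes to the same block are merged while that block stays buffered, which is the temporal-locality effect the slide refers to.
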
Solutions to wearing & energy
• Wear leveling (Zhou et al.)
    • Row shifting: even out writes among cells in a row
        • Needs extra hardware
    • Segment swapping: even out writes between pages
        • Implemented in memory controller
• Spatial locality is now a problem!

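Row shifting can be sketched as a software model, assuming a fixed shift of one cell after a set number of writes. The class `ShiftedRow`, the `shift_interval` parameter and the one-cell step are illustrative assumptions; the real scheme is done in hardware, and segment swapping is not modelled here.

```python
class ShiftedRow:
    """Toy model of a PCM row whose contents rotate to spread wear."""

    def __init__(self, num_cells=8, shift_interval=4):
        self.num_cells = num_cells
        self.shift_interval = shift_interval  # writes between shifts (assumed)
        self.shift = 0
        self.writes_since_shift = 0
        self.cells = [0] * num_cells
        self.wear = [0] * num_cells           # programming count per cell

    def _physical(self, logical):
        return (logical + self.shift) % self.num_cells

    def _program(self, index, value):
        # Only program the cell (and count wear) when the value changes.
        if self.cells[index] != value:
            self.cells[index] = value
            self.wear[index] += 1

    def read_bit(self, logical):
        return self.cells[self._physical(logical)]

    def write_bit(self, logical, value):
        self._program(self._physical(logical), value)
        self.writes_since_shift += 1
        if self.writes_since_shift >= self.shift_interval:
            self._rotate()

    def _rotate(self):
        # Snapshot the logical contents, advance the shift by one cell,
        # then rewrite the data at its new physical positions.
        values = [self.read_bit(i) for i in range(self.num_cells)]
        self.shift = (self.shift + 1) % self.num_cells
        for logical, value in enumerate(values):
            self._program(self._physical(logical), value)
        self.writes_since_shift = 0


row = ShiftedRow(num_cells=8, shift_interval=4)
for i in range(32):
    row.write_bit(0, (i + 1) % 2)  # keep toggling the same logical bit
print(row.wear)  # [4, 4, 4, 4, 4, 4, 4, 4]
```

Without shifting, all 32 toggles would land on one cell; with the rotation they are spread evenly over the eight cells of the row. Note that the rotation itself can cost extra writes whenever moved bits differ from what the target cell already holds.
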
PCM as On-chip Cache
• Hybrid on-chip cache architecture consisting of multiple memory technologies
    • PCM, SRAM, embedded DRAM (eDRAM), and Magnetic RAM (MRAM)
• PCM is slow compared to SRAM etc.
    • But high density, non-volatility etc. help PCM
• Use as complement to faster memory technologies
    • As “slow” L2 cache, as L3 cache etc.

Cache Structure Example
• Use PCM as huge L3 cache
• SRAM and eDRAM both as L2
    • Faster and smaller SRAM region
    • Slower and larger eDRAM region
[Figure: two cache hierarchies with the same footprint. Baseline: core w/ L1, L2 SRAM (256KB), L3 SRAM (1MB). Hybrid: core w/ L1, L2 SRAM (fast: 256KB), L2 eDRAM (slow: <4MB), L3 PCM (32MB).]
• Compared to 3-level SRAM cache model:
    • 18% improvement in instructions per cycle
    • Comparable power consumption
        • Despite additional layer of PCM and its large capacity
• Various design possibilities
    • PCM as “third” L2 cache etc.

Summary
• PCM can be a viable approach towards next-generation memory architecture
    • High density, non-volatility
• Various techniques to overcome shortcomings
    • Short endurance, high-energy writes, latencies
• Could be used as main memory or in on-chip cache hierarchy

Questions
• How well do results obtained on benchmark apps translate to real usage?
• Variance of endurance of memory cells?
    • May some cells wear out very quickly?
• Possibilities of PCM non-volatility: instant wake-up from hibernation etc.