Phase Change Memory What to wear out today?

Chris Craik, Aapo Kyrola, Yoshihisa Abe

Memory Technologies
• Concerns
  • Density
  • Latency
  • Energy
• Off-chip technologies
  • DRAM: moderately dense, but not very fast
  • Flash: fairly dense, but nearly as slow as disk

Evaluation of Technologies

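Phase Change Memory
• Each bit is recorded in a 'phase change material'
• SET to 1: heat above the crystallization temperature (but below melting) so the material crystallizes
• RESET to 0: heat above the melting point, then cool quickly so the material is left amorphous
• Resistance indicates the state: crystalline = low resistance (1), amorphous = high resistance (0)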
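A minimal Python sketch of a single PCM cell, to make the SET/RESET/read cycle concrete. The resistance values and the read threshold are hypothetical placeholders, not device data; the point is only that writing means driving the material into one of two phases, and reading means sensing resistance against a threshold.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; real levels depend on the material and cell design.
LOW_RESISTANCE = 1e3    # crystalline phase (ohms)
HIGH_RESISTANCE = 1e6   # amorphous phase (ohms)
READ_THRESHOLD = 1e4    # hypothetical sense threshold (ohms)


@dataclass
class PCMCell:
    resistance: float = HIGH_RESISTANCE  # start in the amorphous (0) state
    writes: int = 0                      # SET/RESET pulses seen so far (a proxy for wear)

    def set(self) -> None:
        """SET: heat above the crystallization temperature -> crystalline, low resistance (1)."""
        self.resistance = LOW_RESISTANCE
        self.writes += 1

    def reset(self) -> None:
        """RESET: melt and quench quickly -> amorphous, high resistance (0)."""
        self.resistance = HIGH_RESISTANCE
        self.writes += 1

    def write(self, bit: int) -> None:
        """Store a bit by applying the matching pulse."""
        if bit:
            self.set()
        else:
            self.reset()

    def read(self) -> int:
        """Sense resistance; reading does not change the phase."""
        return 1 if self.resistance < READ_THRESHOLD else 0


cell = PCMCell()
cell.write(1)
print(cell.read(), cell.writes)  # 1 1
```

Because the state is held by the material's phase rather than by stored charge, nothing needs refreshing between writes; the `writes` counter is where wear-out enters the picture.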

Phase Change Memory
• Density
  • 4x higher than DRAM
• Latency
  • 4x higher than DRAM
• Energy
  • No leakage
  • Reads cost more (2x), writes cost much more (40x)
• Wear-out
  • Limited number of writes (but better endurance than Flash)
• Non-volatile
  • Data persists without power
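A quick back-of-the-envelope sketch of what the 2x read / 40x write energy factors mean for a mixed workload. It assumes, hypothetically, that a DRAM read and a DRAM write cost the same one unit each, and it ignores DRAM refresh and leakage, which PCM does not pay, so it overstates PCM's disadvantage.

```python
def energy_ratio(reads: int, writes: int,
                 dram_read: float = 1.0, dram_write: float = 1.0,
                 read_factor: float = 2.0, write_factor: float = 40.0) -> float:
    """PCM-to-DRAM access energy ratio for a given number of reads and writes.

    Hypothetical model: per-access DRAM energies default to 1 unit each, and
    leakage/refresh is ignored entirely.
    """
    dram = reads * dram_read + writes * dram_write
    pcm = reads * dram_read * read_factor + writes * dram_write * write_factor
    return pcm / dram


# A read-heavy mix: 90 reads, 10 writes.
print(energy_ratio(90, 10))  # roughly 5.8
```

Even with only 10% writes, the write term (400 of the 580 PCM units) dominates, which is why reducing the number of bits actually written is the first target for optimization.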

Evaluation of Technologies

Solutions to wearing & energy
• Partial writes: write only the bits that have changed
  • Caches track which bytes/words of each cache line were written (Lee et al.)
    • Trade-off: finer-grained tracking costs more storage but is more accurate
  • When writing a row to memory, first read the old row and compare, then write only the modified bits (Zhou et al.)
• Most written bits are redundant: they already hold the value being written
• Writes cause thermal expansion and contraction that wear the material, and they require a strong current
• Unlike DRAM, however, PCM does not leak energy
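A minimal sketch of the read-compare-write idea attributed to Zhou et al. above: XOR the old and new row to find the bits that actually differ, and program only those. Function names are hypothetical, and a real controller does this in hardware on whole rows rather than in software.

```python
def bits_to_write(old_row: bytes, new_row: bytes) -> list[int]:
    """Return the indices of the bits that differ between the old and new row."""
    if len(old_row) != len(new_row):
        raise ValueError("rows must have the same length")
    changed = []
    for byte_idx, (old, new) in enumerate(zip(old_row, new_row)):
        diff = old ^ new  # a 1 marks a bit whose value must change
        for bit in range(8):
            if (diff >> bit) & 1:
                changed.append(byte_idx * 8 + bit)
    return changed


def apply_partial_write(old_row: bytes, changed: list[int]) -> bytes:
    """Program only the differing bits; unchanged cells are never touched."""
    row = bytearray(old_row)
    for idx in changed:
        row[idx // 8] ^= 1 << (idx % 8)  # flip just this bit
    return bytes(row)


old = bytes([0xA0, 0xFF])
new = bytes([0xA1, 0xFF])
diff = bits_to_write(old, new)
print(diff)  # [0]: only 1 of 16 bits is written
assert apply_partial_write(old, diff) == new
```

The extra read costs some latency and energy, but since PCM writes are far more expensive than reads, skipping redundant bit writes saves both energy and wear.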

Solutions to wearing & energy (cont.)
• Buffer organization (Lee et al.)
  • DRAM uses a single row buffer (2048 B)
  • Proposal: up to 32 narrow 64 B buffers, each independently associated with its own row segment
  • Narrow buffers capture coalescing writes: temporal locality matters more than spatial locality
  • 4 × 512 B buffers found most effective
  • Area-neutral (same 2048 B total as the single DRAM row buffer)
  • Also helps reduce latency
• Small DRAM buffer in front of PCM (Qureshi et al.)
  • Combines the low latency of DRAM with the high capacity of PCM
  • Similar to using Flash as a cache for disk
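A toy simulation of why several narrow buffers can beat one wide row buffer for writes. Assumptions (hypothetical, not Lee et al.'s actual model): each buffer holds one aligned region, writes that hit an open buffer coalesce there, a miss evicts the least recently used buffer and writes it back to the PCM array, and the cost of a write-back is counted as 1 regardless of buffer width.

```python
from collections import OrderedDict


def array_writes(trace, num_buffers: int, buffer_bytes: int) -> int:
    """Count write-backs that reach the PCM array for a trace of write addresses."""
    buffers = OrderedDict()  # region id -> dirty flag; insertion order tracks LRU
    writes_to_array = 0
    for addr in trace:
        region = addr // buffer_bytes
        if region in buffers:
            buffers.move_to_end(region)   # hit: write coalesces, mark most recently used
            continue
        if len(buffers) == num_buffers:
            buffers.popitem(last=False)   # evict the least recently used buffer
            writes_to_array += 1          # it is dirty (trace is writes only), so write back
        buffers[region] = True
    return writes_to_array + len(buffers)  # flush the remaining dirty buffers


# Two far-apart hot addresses written alternately (temporal, not spatial, locality).
trace = [0, 100_000] * 10
print(array_writes(trace, num_buffers=1, buffer_bytes=2048))  # 20
print(array_writes(trace, num_buffers=4, buffer_bytes=512))   # 2
```

With a single wide buffer the two hot addresses keep evicting each other, while four narrow buffers of the same total size keep both open and absorb the repeated writes. The same reasoning, at a much larger scale, motivates putting a small DRAM buffer in front of PCM.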

Solutions to wearing & energy
• Spatial locality is now a problem: repeated writes to the same cells wear those cells out first
• Wear leveling (Zhou et al.)
  • Row shifting: even out writes among the cells within a row
    • Needs extra hardware
  • Segment swapping: even out writes between pages
    • Implemented in the memory controller
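A simplified sketch of row shifting: the logical-to-physical byte mapping within a row is rotated by one position every `shift_interval` row writes, so a hot logical byte lands on a different physical cell over time. The parameter names are hypothetical, and the sketch ignores the extra writes needed to relocate the row's data when the shift changes. Segment swapping applies the same idea at a coarser granularity, exchanging frequently and rarely written segments.

```python
def cell_wear(logical_writes, row_bytes: int, shift_interval: int = 0) -> list[int]:
    """Count writes per physical byte of one row.

    logical_writes: logical byte offsets written within the row, in order.
    shift_interval: rotate the mapping by one byte after this many writes (0 = no shifting).
    """
    wear = [0] * row_bytes
    shift = 0
    for n, offset in enumerate(logical_writes, start=1):
        physical = (offset + shift) % row_bytes  # rotated mapping
        wear[physical] += 1
        if shift_interval and n % shift_interval == 0:
            shift = (shift + 1) % row_bytes
    return wear


hot = [0] * 64  # every write targets logical byte 0
print(cell_wear(hot, row_bytes=8))                    # [64, 0, 0, 0, 0, 0, 0, 0]
print(cell_wear(hot, row_bytes=8, shift_interval=4))  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Without shifting, one cell absorbs all 64 writes and fails first; with shifting, the same traffic is spread evenly, so the row lasts until all cells are near their limit.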

PCM as On-chip Cache
• Hybrid on-chip cache architecture built from multiple memory technologies
  • PCM, SRAM, embedded DRAM (eDRAM), and Magnetic RAM (MRAM)
• PCM is slow compared to SRAM and the others
  • But its high density and non-volatility work in its favor
• Use PCM to complement faster memory technologies
  • e.g. as a "slow" L2 cache or as an L3 cache
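Why a slower level can still pay off follows from the standard average memory access time (AMAT) relation for a three-level cache, where $t_i$ is the hit latency of level $i$, $m_i$ its local miss rate, and $t_{\mathrm{mem}}$ the off-chip memory latency:

```latex
\mathrm{AMAT} = t_{1} + m_{1}\Bigl(t_{2} + m_{2}\bigl(t_{3} + m_{3}\, t_{\mathrm{mem}}\bigr)\Bigr)
```

A dense PCM level raises $t_{3}$, but if its much larger capacity lowers $m_{3}$ enough, the product $m_{3}\, t_{\mathrm{mem}}$ shrinks by more than $t_{3}$ grows, because $t_{\mathrm{mem}}$ is far larger than any on-chip latency.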

Cache Structure Example
• Use PCM as a large L3 cache
• Use both SRAM and eDRAM as L2
  • Faster, smaller SRAM region
  • Slower, larger eDRAM region
[Figure: baseline (core with L1, 256 KB SRAM L2, 1 MB SRAM L3) vs. a hybrid with the same footprint (core with L1, 256 KB fast SRAM L2 region, slow eDRAM L2 region under 4 MB, 32 MB PCM L3)]
• Compared to the 3-level SRAM cache model:
  • 18% improvement in instructions per cycle (IPC)
  • Comparable power consumption
    • despite the added PCM layer and its large capacity
• Various design possibilities
  • e.g. PCM as a "third" L2 region
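A minimal sketch for comparing cache configurations by average memory access time, where each level costs its hit latency plus its local miss rate times the cost of going one level further out. All latencies and miss rates below are made up purely to exercise the calculation; they do not reproduce the IPC or power results on this slide, and the hybrid SRAM/eDRAM L2 is collapsed into a single level for simplicity.

```python
def amat(levels, memory_latency: float) -> float:
    """AMAT for (hit_latency, local_miss_rate) levels ordered L1 first, outermost last."""
    time = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        time = hit_latency + miss_rate * time  # cost of reaching this level
    return time


# Hypothetical parameters (cycles, local miss rate), for illustration only.
BASELINE = [(2, 0.1), (10, 0.4), (20, 0.5)]  # L3: small, fast SRAM
PCM_L3 = [(2, 0.1), (10, 0.4), (60, 0.2)]    # L3: large, slow PCM with fewer misses

print(amat(BASELINE, 200))  # about 7.8
print(amat(PCM_L3, 200))    # about 7.0
```

With these invented numbers the PCM L3 is three times slower on a hit yet still wins, because it cuts the number of 200-cycle trips to off-chip memory; with a smaller miss-rate reduction the comparison would flip.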

Summary
• PCM can be a viable approach for next-generation memory architectures
  • High density, non-volatility
• Various techniques help overcome its shortcomings
  • Limited endurance, high-energy writes, long latencies
• Could be used as main memory or in the on-chip cache hierarchy

Questions
• How well do results obtained on benchmark applications translate to real usage?
• How much does endurance vary across memory cells?
  • Could some cells wear out very quickly?
• What new possibilities does PCM's non-volatility open up, e.g. instant wake-up from hibernation?

Phase Change Memory What to wear out today?