On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories
Rajagopalan Desikan, Charles R. Lefurgy, Stephen W. Keckler, and Doug Burger
Computer Architecture and Technology Lab, University of Texas at Austin
Motivation
• Latency to off-chip memory is hundreds of cycles
• Off-chip memory bandwidth is becoming a performance-limiting factor
• MRAM: an emerging memory technology with high bandwidth and low latency
• Goal of our work: to determine whether the performance advantage of MRAM in high-performance computing is worth more investment and research
MRAM has the potential to provide low latency and high bandwidth
Outline
• MRAM Memory Description
• MRAM Memory Hierarchy
• Results
• Conclusions
• Magnetoresistive random access memory (MRAM) uses the magnetic tunnel junction (MTJ) to store information
• An MRAM cell is composed of a diode and an MTJ stack
• The MTJ stack consists of two ferromagnetic layers separated by a thin dielectric barrier
• The polarization of one layer is fixed; the other layer is used for information storage
[Figure: MRAM cell. Labels: bit line, diode, MTJ stack (Pt, Co/Fe, Ni/Fe, Al2O3, Co/Fe, Ni/Fe, Mn/Fe, Pt, W), word line, read/write current]
MRAM Bank Design
• MRAM cells are located at the intersection of each word line and bit line
• Reads: current sources are connected to the bit lines and the selected word line is pulled low
• Writes: the polarity of the current in the bit lines determines the value stored
• MRAM banks are accessed using vias
MRAM Bank Modeling
• Modified CACTI-3.0 to develop an area and timing tool to model MRAM banks
• Independently accessible banks composed of sub-banks
• Important features
  • Active area consumed
  • Delay due to vertical wires
  • MRAM capacity for a given die size and cell size
  • Support for multiple layers with sharing
• SIA 2001 roadmap at 90 nm technology
MRAM Design Issues
• Number of Banks
  • More banks: lower latency, higher concurrency, higher network traversal time, higher miss rates
• Cache Line Size
  • Larger line size: more spatial locality, higher latency
• Page Placement Policy
  • Random
  • Round-robin
  • Least loaded
Methodology
• Simulated Processor
  • Alpha 21264 pipeline modified for 8-wide issue
  • 3.8 GHz (10 FO4 inverters per stage)
• Base SDRAM System
  • Distributed L2 cache
• Base MRAM System
  • Distributed MRAM banks and reduced-capacity distributed L2 cache
• Benchmarks
  • Memory-intensive SPEC CPU2000, Scientific, Speech
Page Placement Policy
[Figure: IPC for 100 banks with different page placement policies]
Cost_Least-Loaded = (L2 Hit Rate × L2 Hit Latency) + (L2 Miss Rate × MRAM Bank Latency) + Current Network Latency to Bank
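The three page placement policies and the least-loaded cost expression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator used in this work: the `BankState` fields and all function names are hypothetical, the L2 miss rate is assumed to be one minus the hit rate, all latencies are assumed to be in cycles, and round-robin is assumed to cycle through banks in allocation order.

```python
import random
from dataclasses import dataclass


@dataclass
class BankState:
    """Per-bank statistics (hypothetical field names, latencies in cycles)."""
    l2_hit_rate: float      # fraction of accesses that hit in the bank's L2
    l2_hit_latency: float   # latency of an L2 hit
    mram_latency: float     # latency of the MRAM bank on an L2 miss
    network_latency: float  # current network latency to this bank


def least_loaded_cost(bank: BankState) -> float:
    """Cost expression from the slide; assumes miss rate = 1 - hit rate."""
    l2_miss_rate = 1.0 - bank.l2_hit_rate
    return (bank.l2_hit_rate * bank.l2_hit_latency
            + l2_miss_rate * bank.mram_latency
            + bank.network_latency)


def place_random(banks, rng=random):
    """Random policy: pick any bank index uniformly."""
    return rng.randrange(len(banks))


def place_round_robin(banks, allocation_index):
    """Round-robin policy: the n-th allocated page goes to bank n mod #banks."""
    return allocation_index % len(banks)


def place_least_loaded(banks):
    """Least-loaded policy: pick the bank index with the lowest cost."""
    return min(range(len(banks)), key=lambda i: least_loaded_cost(banks[i]))
```

With two made-up banks, `BankState(0.9, 10, 100, 30)` and `BankState(0.8, 10, 100, 5)`, the second bank has the lower cost despite its lower hit rate, because its network latency is much smaller; this is the kind of trade-off the least-loaded policy is meant to capture.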
MRAM Sensitivity
[Figure: MRAM latency sensitivity. MRAM latency values plotted: 20, 30, 40, 60. SDRAM latency: 30 ns]
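To relate the 30 ns SDRAM latency to the "hundreds of cycles" mentioned in the motivation, the conversion at the simulated 3.8 GHz clock can be worked out as below. This is a back-of-the-envelope sketch: the function names are hypothetical, and dividing the cycle time evenly among 10 FO4 delays ignores any latch or clock-skew overhead, which is an assumption not stated on the slides.

```python
CLOCK_HZ = 3.8e9      # simulated processor clock from the methodology slide
FO4_PER_STAGE = 10    # FO4 inverter delays per pipeline stage


def cycle_time_ps(clock_hz: float) -> float:
    """Clock period in picoseconds."""
    return 1e12 / clock_hz


def ns_to_cycles(latency_ns: float, clock_hz: float) -> float:
    """Convert a latency in nanoseconds to processor cycles."""
    return latency_ns * 1e-9 * clock_hz


period = cycle_time_ps(CLOCK_HZ)
print(f"cycle time: {period:.1f} ps")
# Assumes the whole period is spent in the 10 FO4 delays (no overhead).
print(f"implied FO4 delay: {period / FO4_PER_STAGE:.1f} ps")
print(f"30 ns SDRAM latency: {ns_to_cycles(30, CLOCK_HZ):.0f} cycles")
```

Under these assumptions a 30 ns access costs roughly 114 processor cycles, before any queuing or network delay is added.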
Conclusions
• Developed an architectural model for exploiting an emerging memory technology, MRAM
• Analyzed the contribution to performance of the different components in our MRAM system
• The MRAM system performs 15% better than a conventional SDRAM system