MRAM as High-Bandwidth, Low-Latency DRAM Replacement

On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories
Rajagopalan Desikan, Charles R. Lefurgy, Stephen W. Keckler, and Doug Burger
Computer Architecture and Technology Lab
University of Texas at Austin

Motivation
• Latency to off-chip memory is hundreds of cycles
• Off-chip memory bandwidth is becoming a performance-limiting factor
• MRAM: an emerging memory technology with high bandwidth and low latency
• Goal of our work: to determine whether the performance advantage of MRAM in high-performance computing is worth more investment and research
MRAM has the potential to provide low latency and high bandwidth.

Outline
• MRAM Memory Description
• MRAM Memory Hierarchy
• Results
• Conclusions

Magnetoresistive random access memory (MRAM) uses the magnetic tunnel junction (MTJ) to store information
• An MRAM cell is composed of a diode and an MTJ stack
• The MTJ stack consists of two ferromagnetic layers separated by a thin dielectric barrier
• The polarization of one layer is fixed; the other is used for information storage
[Figure: MRAM Cell. Labels: Diode, Bit Line, Word Line, Read/Write Current, MTJ Stack (Pt, Co/Fe, Ni/Fe, Al2O3, Co/Fe, Ni/Fe, Mn/Fe, Pt, W)]

MRAM Bank Design
• MRAM cells are located at the intersection of each word line and bit line
• Read: current sources are connected to the bit lines and the selected word line is pulled low
• Write: the polarity of the current in the bit lines decides the value stored
• MRAM banks are accessed using vias

MRAM Bank Modeling
• Modified CACTI-3.0 to develop an area and timing tool to model MRAM banks
• Banks are independently accessible and composed of sub-banks
• Important features
  • Active area consumed
  • Delay due to vertical wires
  • MRAM capacity for a given die size and cell size
  • Support for multiple layers with sharing
• SIA 2001 roadmap at 90 nm technology

Chip-Level Architecture

MRAM Design Issues
• Number of Banks
  • More banks: low latency, higher concurrency, higher network traversal time, higher miss rates
• Cache Line Size
  • Larger line size: more spatial locality, higher latency
• Page Placement Policy
  • Random
  • Round-robin
  • Least loaded

Methodology
• Simulated Processor
  • Alpha 21264 pipeline modified for 8-wide issue
  • 3.8 GHz (10 FO4 inverters per stage)
• Base SDRAM System
  • Distributed L2 cache
• Base MRAM System
  • Distributed MRAM banks and reduced-capacity distributed L2 cache
• Benchmarks
  • Memory-intensive SPEC CPU2000, Scientific, Speech

Page Placement Policy
[Figure: IPC for 100 banks with different page placement policies]
Cost_Least-Loaded = (L2 Hit Rate × L2 Hit Latency) + (L2 Miss Rate × MRAM Bank Latency) + Current Network Latency to Bank
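The three page placement policies and the least-loaded cost formula can be sketched as follows. This is only an illustration, not the simulator used in the talk: the `Bank` record, the function names, and the sample numbers are hypothetical, latencies are in arbitrary cycle units, and the L2 miss rate is assumed to be 1 minus the L2 hit rate.

```python
import random
from dataclasses import dataclass


@dataclass
class Bank:
    """Hypothetical per-bank statistics (not taken from the slides)."""
    l2_hit_rate: float        # fraction of accesses that hit in this bank's L2
    l2_hit_latency: float     # latency of an L2 hit
    mram_latency: float       # latency of an access to the MRAM bank
    network_latency: float    # current network latency to reach the bank


def least_loaded_cost(bank: Bank) -> float:
    """Cost from the slide: hit term + miss term + current network latency."""
    l2_miss_rate = 1.0 - bank.l2_hit_rate  # assumption: miss rate = 1 - hit rate
    return (bank.l2_hit_rate * bank.l2_hit_latency
            + l2_miss_rate * bank.mram_latency
            + bank.network_latency)


def place_random(banks: list, rng: random.Random) -> int:
    """Random policy: pick any bank with equal probability."""
    return rng.randrange(len(banks))


class RoundRobin:
    """Round-robin policy: hand out bank indices 0, 1, ..., n-1, 0, ..."""

    def __init__(self, num_banks: int) -> None:
        self.num_banks = num_banks
        self.next_bank = 0

    def place(self) -> int:
        chosen = self.next_bank
        self.next_bank = (self.next_bank + 1) % self.num_banks
        return chosen


def place_least_loaded(banks: list) -> int:
    """Least-loaded policy: pick the bank with the lowest current cost."""
    return min(range(len(banks)), key=lambda i: least_loaded_cost(banks[i]))


# Made-up sample data: three banks with different hit rates and network load.
banks = [
    Bank(l2_hit_rate=0.9, l2_hit_latency=10, mram_latency=100, network_latency=5),
    Bank(l2_hit_rate=0.5, l2_hit_latency=10, mram_latency=100, network_latency=1),
    Bank(l2_hit_rate=0.9, l2_hit_latency=10, mram_latency=100, network_latency=2),
]
chosen = place_least_loaded(banks)
```

With these sample numbers the third bank has the lowest cost, since it combines a high L2 hit rate with a short network path. The random and round-robin policies ignore the cost entirely, which is what distinguishes them from least-loaded placement.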

MRAM Sensitivity
[Figure: MRAM Latency Sensitivity; plotted MRAM latency values 20, 30, 40, 60]
SDRAM Latency: 30 ns
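To relate the 30 ns SDRAM latency to the cycle counts mentioned in the motivation, the latency can be converted to processor cycles at the 3.8 GHz clock given in the methodology. The helper name below is hypothetical, and the result covers only the raw SDRAM access time, not any queuing or network delay.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> int:
    """Convert a latency in nanoseconds to processor cycles (rounded).

    A clock of f GHz completes f cycles per nanosecond, so
    cycles = latency_ns * clock_ghz.
    """
    return round(latency_ns * clock_ghz)


CLOCK_GHZ = 3.8          # simulated processor clock from the methodology slide
SDRAM_LATENCY_NS = 30    # SDRAM latency from the sensitivity slide

sdram_cycles = ns_to_cycles(SDRAM_LATENCY_NS, CLOCK_GHZ)
```

At 3.8 GHz, 30 ns corresponds to 114 cycles, which is in line with the "hundreds of cycles" order of magnitude cited for off-chip memory.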

Conclusions
• Developed an architectural model for exploiting an emerging memory technology, MRAM
• Analyzed the contribution to performance of the different components in our MRAM system
• MRAM system performs 15% better than conventional SDRAM
