DRAM background • Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07 • CS 8501, Mario D. Marino, 02/08
Typical Memory • Buses: address, command, data, DIMM (Dual In-Line Memory Module) selection
Protocol, timing • Operations (commands)
“The purpose of a row access command is to move data from the DRAM arrays to the sense amplifiers.” • tRCD and tRAS
“A column read command moves data from the array of sense amplifiers of a given bank to the memory controller.” • tCAS, tBurst
Precharge: a separate phase that is a prerequisite for a subsequent row access (bitlines restored to Vcc/2 or Vcc) • See the timing sketch below
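As a rough illustration of how these timing parameters compose, here is a minimal sketch (in Python) of the latency of a single read under an open-page policy. The cycle counts are placeholders chosen for illustration, not values from the paper or any specific DDR standard.

```python
# Hedged sketch: composing DRAM timing parameters for one read.
# The cycle counts are illustrative placeholders, not values from the paper.

T_RP    = 15   # precharge: close the open row, restore the bitlines
T_RCD   = 15   # row access: move the row from the DRAM array to the sense amps
T_CAS   = 15   # column read: move data from the sense amps toward the controller
T_BURST = 4    # cycles to stream the burst over the data bus

def read_latency(row_hit: bool, row_open: bool) -> int:
    """Approximate read latency in DRAM cycles."""
    if row_hit:                      # requested row already in the sense amps
        return T_CAS + T_BURST
    latency = T_RCD + T_CAS + T_BURST
    if row_open:                     # a different row is open: precharge it first
        latency += T_RP
    return latency

print(read_latency(row_hit=True,  row_open=True))   # row-buffer hit
print(read_latency(row_hit=False, row_open=True))   # row-buffer conflict
print(read_latency(row_hit=False, row_open=False))  # bank already precharged
```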
Logical Channels: set of physical channels connected to the same memory controller
Open vs. close page • Open-page: data access to and from the cells requires separate row and column commands • Favors accesses to the same row (sense amps stay open) • Typical for general-purpose computers (desktop/laptop) • Close-page: • Favors random accesses under an intense stream of requests • Large multiprocessor/multicore systems
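To make the trade-off concrete, the sketch below sums approximate read latencies for one bank under the two policies; locality-heavy streams favor open-page, random streams favor close-page. The policy model (immediate auto-precharge for close-page, precharge-on-conflict for open-page) and the cycle counts are simplifying assumptions.

```python
# Hedged sketch contrasting open-page and close-page row-buffer management
# for a single bank; cycle counts are illustrative placeholders.

T_RP, T_RCD, T_CAS, T_BURST = 15, 15, 15, 4

def total_latency(row_stream, policy):
    """Sum approximate read latencies for a sequence of row addresses to one bank."""
    open_row = None
    total = 0
    for row in row_stream:
        if policy == "open":
            if row == open_row:
                total += T_CAS + T_BURST                  # row-buffer hit
            elif open_row is None:
                total += T_RCD + T_CAS + T_BURST          # bank already precharged
            else:
                total += T_RP + T_RCD + T_CAS + T_BURST   # conflict: precharge first
            open_row = row
        else:  # close-page: auto-precharge after every access, so every access activates
            total += T_RCD + T_CAS + T_BURST
            open_row = None
    return total

print(total_latency([7, 7, 7, 7], "open"), total_latency([7, 7, 7, 7], "close"))  # locality: open wins
print(total_latency([7, 3, 9, 1], "open"), total_latency([7, 3, 9, 1], "close"))  # random: close wins
```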
Available Parallelism in DRAM System Organization • Channel: • Pros: performance • Different logical channels, independent memory controllers • Scheduling strategies • Cons: • Number of pins, power to deliver • Smart but not adaptive firmware
Available Parallelism in DRAM System Organization • Rank • Pros: • Accesses can proceed in parallel in different ranks (bus availability) • Cons: • Rank-to-rank switching penalties at high frequency • Globally synchronous DRAM (global clock)
Available Parallelism in DRAM System Organization • Bank • Different banks (bus availability) • Row • Only one row per bank can be active at any time • Column • Depends on management (close-page / open-page) • See the address-mapping sketch below
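To make these levels of parallelism concrete, here is a minimal sketch that splits a physical address into channel, rank, bank, row and column fields. The field widths and their order are assumptions chosen for illustration, not the mapping used in the paper or any particular controller.

```python
# Hedged sketch: decomposing a physical address into DRAM coordinates.
# Field widths and their order (column, channel, bank, rank, row from LSB up)
# are assumed for illustration; real controllers use many different mappings.

FIELDS = [("column", 10), ("channel", 1), ("bank", 3), ("rank", 1), ("row", 14)]

def map_address(addr: int) -> dict:
    """Return the channel/rank/bank/row/column that an address falls into."""
    coords = {}
    for name, bits in FIELDS:
        coords[name] = addr & ((1 << bits) - 1)
        addr >>= bits
    return coords

# Nearby addresses spread across channels and banks before reusing a row,
# which is what lets independent banks and ranks service requests in parallel.
for a in (0x0000, 0x0400, 0x0800, 0x1000):
    print(hex(a), map_address(a))
```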
Paper: Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07
Parallel bus scaling: frequency, width, length, depth (many hops => latency) • Number of memory controllers grows with the number of CPUs/GPUs • Number of DIMMs/channel (depth) decreases with frequency: 4 DIMMs/channel in DDR, 2 DIMMs/channel in DDR2, 1 DIMM/channel in DDR3 • Scheduling issues
Contributions • Applied DDR-based memory controller policies to FBDIMM memory • Performance evaluation • Exploit FBDIMM depth: rank (DIMM) parallelism • Latency and bandwidth for FBDIMM vs. DDR • High channel utilization, FBDIMM: • 7% in latency • 10% in bandwidth • Low channel utilization: • 25% in latency • 10% in bandwidth
Northbound channel: read data to the controller / Southbound channel: commands and write data • AMB (Advanced Memory Buffer): pass-through switch, buffer, serial/parallel converter
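A rough way to see the serialization cost of the daisy chain: each request passes through one AMB per DIMM between the controller and its target, so latency grows with channel depth. The sketch below assumes a fixed per-hop pass-through delay and a fixed DRAM core time; both numbers are placeholders, a simplification of the real frame timing.

```python
# Hedged sketch: extra latency from AMB pass-through hops on an FBDIMM channel.
# Per-hop and DRAM-core delays are illustrative placeholders, not paper values.

HOP_NS  = 2.0    # assumed AMB pass-through delay per DIMM, in each direction
CORE_NS = 30.0   # assumed DRAM core access time at the target DIMM

def fbdimm_read_latency(target_dimm: int) -> float:
    """Latency of a read to DIMM `target_dimm` (0 = nearest the controller)."""
    hops = target_dimm + 1
    southbound = hops * HOP_NS     # command travels down the southbound channel
    northbound = hops * HOP_NS     # read data returns on the northbound channel
    return southbound + CORE_NS + northbound

for depth in range(1, 9):
    print(f"{depth} DIMMs/channel -> worst-case read ~{fbdimm_read_latency(depth - 1):.0f} ns")
```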
Methodology • DRAMsim simulator • Execution-driven simulator • Detailed models of FBDIMM and DDR2 based on real standard configurations • Standalone or coupled with M5 / SimpleScalar / SESC • Benchmarks: bandwidth-bound • SVM from Bio-Parallel (r: 90%) • SPEC-mixed: 16 independent (r:w = 2:1) • UA from NAS (r:w = 3:2) • ART (SPEC 2000, OpenMP) (r:w = 2:1)
Methodology (cont.) • Different scheduling policies: greedy, OBF, most/least pending, and RIFF • 16-way CMP, 8 MB L2 • Multi-threaded traces gathered with CMP$im • SPEC traces using SimpleScalar with 1 MB L2, in-order core • 1 rank/DIMM
High bandwidth utilization: • FBDIMM achieves better bandwidth • but with larger latency
Low utilization: serialization cost dominates • Depth: the FBDIMM scheduler offsets serialization
Overheads: queueing, southbound channel and rank availability • Single rank: higher latency
Scheduling • Best: RIFF, which gives priority to reads over writes
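A minimal sketch in the spirit of a read-priority policy like RIFF: pending reads are issued before pending writes, oldest first. The queue representation and request fields are assumptions for illustration; real controllers also check bank and bus readiness, which is omitted here.

```python
# Hedged sketch of a read-priority scheduler in the spirit of RIFF:
# reads jump ahead of writes; ties are broken by arrival order (oldest first).
# The queue structure and request fields are assumptions for illustration.

from collections import namedtuple

Request = namedtuple("Request", "arrival is_read addr")

def pick_next(queue):
    """Pick the next request to issue: oldest pending read, else oldest write."""
    if not queue:
        return None
    reads = [r for r in queue if r.is_read]
    pool = reads if reads else queue
    choice = min(pool, key=lambda r: r.arrival)
    queue.remove(choice)
    return choice

q = [Request(0, False, 0x100), Request(1, True, 0x200),
     Request(2, False, 0x300), Request(3, True, 0x400)]
while (req := pick_next(q)) is not None:
    print(req)   # both reads issue before either write
```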
Bandwidth is less sensitive than latency to the row-buffer management policy • Higher latency in open-page mode • More channels => lower per-channel utilization