460 likes | 844 Views
DRAM background Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Garnesh, HPCA'07 CS 8501, Mario D. Marino, 02/08. DRAM Background. Typical Memory. Busses: address, command, data, DIMM (Dual In-Line Memory Module) selection . DRAM cell. DRAM array.
E N D
DRAM background • Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Garnesh, HPCA'07 CS 8501, Mario D. Marino, 02/08
Typical Memory • Busses: address, command, data, DIMM (Dual In-Line Memory Module) selection
protocol, timing Operations(commands)
“The purpose of a row access command is to move data from the DRAMarrays to the sense amplifiers.” • tRCD and tRAS
“ A column read command moves data from the array of sense amplifiers of a given bank to the memory controller.” • tCAS, tBurst
Precharge: separate phase that is a prerequisite for the subsequent phases of a row access operation (bitlines set to Vcc/2 or Vcc)
Logical Channels: set of physical channels connected to the same memory controller
Open x Close page • Open-page: data access to and from cells requires separate row and column commands • Favors accesses on the same row (sense aps open) • Typical general purpose computers (desktop/laptop) • Close-page: • Intense amount of requests, favors random accesses • Large multiprocessor/multicore systems
Available Parallelism in DRAM System Organization • Channel: • Pros: performance • different logical channels, independent memory controllers • schedulling strategies • cons • Number of pins, power to deliver • Smart but not adaptive firmware
Available Parallelism in DRAM System Organization • Rank • pros • accesses can proceed in parallel in different ranks (busses availability) • cons • Rank-to-rank switching penalties in high frequency • Globally synchronous DRAM (global clock)
Available Parallelism in DRAM System Organization • Bank • Different banks (busses availability) • Row • Only 1 row/bank can be active at any time period • Column • Depends on management (close-page / open-page)
Paper: Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Garnesh, HPCA'07
parallel bus scaling: frequency, widths, length, depth (man hops => latency ) #memory controllers increased CPUs, GPUs #DIMMs/channel (depth) decreases 4DIMMs/channel in DDRs 2 DIMMs/channel in DDR2 1 DIMM/channel in DDR3 scheduling Issues
Contributions • Applied DDR based memory controller policies in FBDIMM memory • Evaluation of Performance • Exploit FBDIMM depth: rank (DIMM) parallelism • latency and bandwidth for FBDIMM and DDR • high utilization of the channels, FBDIMM • 7% in latency • 10% • low utilization of the channels • 25% in latency • 10 % in bandwidth
Northbound channel: reads / Southbound-channel: writes • AMB: pass-through switch, buffer, serial/parallel converter
Methodology • DRAMsim simulator • Execution-driven simulator • Detailed models of FBDIMM and DDR2 based on real standard configurations • Standalone / coupled with M5/SS/Sesc • Benchmarks: bandwidth-bound • SVM from Bio-Parallel (r:90%) • SPEC-mixed: 16 independent (r:w = 2:1) • UA from NAS (r:w = 3:2) • ART (SPEC-2000, OpenMP) (r:w = 2:1)
Methodology: cont • Different scheduling policies: greedy, OBF, most/last pending and RIFF • 16-way CMP, 8MB L2 • Multi-threaded traces gathered with CMP$im • SPEC traces using Simplescalar with 1MB L2, in-order core • 1 rank/DIMM
High-bandwidth utilization: • Better bandwidth: FBDIMM • Larger latency
Low utilization: serialization cost • Depth: FBDIMM scheduler offsets serialization
Overhead: queue, south and rank availability • Single-rank: higher latency
Scheduling • Best: RIFF, priority on reads than writes
Bandwidth is less sensitive th Higher latency in open-page mode • More channels => decreases channel utilization