FB-DIMM technology
Dezső Sima
Spring 2008
(Ver. 1.0)
Sima Dezső, 2008
Motivations to introduce FB-DIMMs in servers/workstations
Shortcomings of the stub-bus topology used with conventional DRAM architectures [2]
Stub-bus topology
• The data lines of the memory controller are electrically connected to the data lines of every DRAM device on the bus (memory channel).
• Impedance discontinuities affect signal integrity [2].
• Memory channels may have 8 DIMMs with 9 DRAM devices/DIMM (8 for data + 1 for ECC), i.e. 72 devices/channel.
• Heavy signal loading, caused by the large number of devices and by the impedance discontinuities on the bus, limits the number of DRAM devices that can be connected to the channel: the higher the data rate, the fewer the devices.
Figure: Scaling number of channels with memory hubs [7]. Two ranks of DRAM devices per DIMM are assumed. With a single rank per DIMM, the number of DIMMs per channel may be doubled, but the declining trend shown in the figure remains the same.
For higher DRAM speeds, fewer DRAM devices can be connected per memory channel [2]. Stub-bus channel capacity (device density x number of devices) has hit its ceiling [2], while increasing server performance doubles the demand for memory capacity about every two years [2].
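The capacity ceiling can be made concrete with the formula above (device density x number of devices). The sketch below is only an illustration: the 1 Gb device density and the 4 GB starting demand are hypothetical example values, not figures from [2]; only the 72 devices/channel, the 64 data + 8 ECC bit split and the two-year doubling come from the text.

```python
def channel_capacity_gb(n_devices: int, density_gbit: float) -> float:
    """Usable data capacity of one stub-bus channel in GB.

    Raw capacity = device density x number of devices. With 72-bit ECC
    modules only 64 of every 72 bits hold data.
    """
    raw_gbit = n_devices * density_gbit
    return raw_gbit * 64 / 72 / 8  # keep data bits only, then Gbit -> GB


def capacity_demand_gb(initial_gb: float, years: float, doubling_years: float = 2.0) -> float:
    """Memory capacity demand that doubles every `doubling_years` years."""
    return initial_gb * 2 ** (years / doubling_years)


# Hypothetical example: 72 devices/channel built from 1 Gb devices.
ceiling = channel_capacity_gb(72, 1.0)
print(f"channel ceiling: {ceiling:.1f} GB")
for year in range(0, 7, 2):
    demand = capacity_demand_gb(4.0, year)  # hypothetical 4 GB demand in year 0
    print(f"year {year}: demand {demand:.0f} GB, exceeds ceiling: {demand > ceiling}")
```

With a fixed ceiling and an exponentially growing demand, the crossover is only a matter of time, which is why adding devices to the stub bus is not a long-term answer.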
Alternative: increasing the number of memory channels. However, each DDR2 memory channel requires 240 pins.
FB-DIMM technology (1)
Principle of operation
• introduce packet-based serial transmission (as in the PCI-E, SATA and SAS buses)
• introduce full buffering (registered DIMMs buffer only addresses)
• CRC error checking (cyclic redundancy check)
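To illustrate how CRC checking lets the receiver detect transmission errors in a frame, here is a generic bitwise CRC sketch. It is not the FB-DIMM CRC: the 16-bit polynomial 0x1021 (the CRC-16/XMODEM parameters) is a stand-in chosen for illustration, the 98-bit payload is random, and the actual FB-DIMM CRC polynomials and widths are not modeled here.

```python
import random


def crc_bits(bits: list[int], poly: int, width: int) -> int:
    """MSB-first bitwise CRC over a sequence of 0/1 bits (zero initial value, no reflection)."""
    mask = (1 << width) - 1
    reg = 0
    for bit in bits:
        feedback = ((reg >> (width - 1)) & 1) ^ bit  # top register bit XOR incoming bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= poly
    return reg


def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


POLY, WIDTH = 0x1021, 16  # stand-in CRC-16 parameters, NOT the FB-DIMM ones

# Sender: compute the CRC of a (random) 98-bit payload and transmit both.
rng = random.Random(0)
payload = [rng.randint(0, 1) for _ in range(98)]
sent_crc = crc_bits(payload, POLY, WIDTH)

# Receiver: recompute the CRC over the received payload and compare.
received = payload.copy()
received[17] ^= 1  # simulate a single bit error on one lane
print("error detected:", crc_bits(received, POLY, WIDTH) != sent_crc)
```

Any polynomial with more than one term detects every single-bit error, so flipping any one payload bit always changes the recomputed CRC.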
FB-DIMM technology (2) Figure: FB-DIMM memory architecture [4]
Figure: Maximum supported FB-DIMM configuration [6] (6 channels/8 DIMMs)
FB-DIMM technology (3)
Implementation details (1)
• Serial transmission between the North Bridge and the DIMMs
  • (each bit lane needs a pair of wires, i.e. differential signaling)
• Number of serial links
  • 14 read lanes (2 wires each)
  • 10 write lanes (2 wires each)
• Lanes are clocked at 6 x the (double-pumped) DRAM data rate
  • e.g. for DDR2-667 DRAM the lane rate is 6 x 667 MT/s ≈ 4 Gb/s
• Every 12 lane cycles (that is, every two memory cycles) constitute a packet.
• Read packets (frames, bursts): 168 bits (12 x 14 bits)
  • 144 data bits
    (equal to the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)
  • 24 CRC bits
• Write packets (frames, bursts): 120 bits (12 x 10 bits)
  • 98 payload bits
  • 22 CRC bits
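The frame arithmetic above can be checked with a few lines of code. This sketch simply encodes the numbers on the slide (14 read and 10 write lanes, 12 lane cycles per frame, lane rate = 6 x DRAM data rate); the function and constant names are illustrative.

```python
READ_LANES, WRITE_LANES = 14, 10  # read (northbound) and write (southbound) lanes
BITS_PER_LANE_PER_FRAME = 12      # one frame = 12 lane cycles on every lane
LINK_MULTIPLIER = 6               # lane bit rate = 6 x DRAM data rate


def lane_rate_gbps(dram_data_rate_mts: float) -> float:
    """Serial bit rate of one lane in Gb/s."""
    return LINK_MULTIPLIER * dram_data_rate_mts / 1000


def frame_time_ns(dram_data_rate_mts: float) -> float:
    """Duration of one 12-lane-cycle frame in nanoseconds."""
    return BITS_PER_LANE_PER_FRAME / lane_rate_gbps(dram_data_rate_mts)


read_frame_bits = READ_LANES * BITS_PER_LANE_PER_FRAME    # 168 = 144 data + 24 CRC
write_frame_bits = WRITE_LANES * BITS_PER_LANE_PER_FRAME  # 120 = 98 payload + 22 CRC

rate = 667  # DDR2-667
print(f"lane rate: {lane_rate_gbps(rate):.2f} Gb/s")
print(f"frame time: {frame_time_ns(rate):.2f} ns")
# Number of DRAM data transfers (memory cycles) that fit into one frame:
print(f"DRAM transfers per frame: {frame_time_ns(rate) * rate / 1000:.1f}")
```

Because the lane rate is exactly six times the data rate, a 12-cycle frame always spans two DRAM data transfers, independent of the speed grade.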
FB-DIMM technology (4)
Implementation details (2)
• 98 payload bits:
  • 2 frame type bits,
  • 24 bits of command,
  • 72 bits for data and/or commands, depending on the frame type,
    e.g. 72 bits of data, 36 bits of data + one command, or two commands.
• Commands
  • row select, precharge, refresh, read, write, etc.
  • all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
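A sketch of packing and unpacking the 98-bit write-frame payload into its three fields. The field widths (2 + 24 + 72 bits) and the 3-bit module address come from the slide; the field order and the position of the module address inside the 24-bit command are hypothetical, chosen only for illustration, and do not reproduce the real FB-DIMM bit layout.

```python
FRAME_TYPE_BITS = 2
COMMAND_BITS = 24
DATA_CMD_BITS = 72
PAYLOAD_BITS = FRAME_TYPE_BITS + COMMAND_BITS + DATA_CMD_BITS  # 98
MODULE_ADDR_BITS = 3  # selects one of 2**3 = 8 modules


def pack_payload(frame_type: int, command: int, data_or_cmds: int) -> int:
    """Pack the fields into one 98-bit integer (hypothetical order: type | command | data/commands)."""
    for value, width in ((frame_type, FRAME_TYPE_BITS), (command, COMMAND_BITS),
                         (data_or_cmds, DATA_CMD_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value:#x} does not fit in {width} bits")
    return (frame_type << (COMMAND_BITS + DATA_CMD_BITS)) | (command << DATA_CMD_BITS) | data_or_cmds


def unpack_payload(payload: int) -> tuple[int, int, int]:
    """Inverse of pack_payload: returns (frame_type, command, data_or_cmds)."""
    data_or_cmds = payload & ((1 << DATA_CMD_BITS) - 1)
    command = (payload >> DATA_CMD_BITS) & ((1 << COMMAND_BITS) - 1)
    frame_type = payload >> (COMMAND_BITS + DATA_CMD_BITS)
    return frame_type, command, data_or_cmds


def make_command(module: int, opcode_and_args: int) -> int:
    """Hypothetical command layout: module address in the top 3 bits, the rest below."""
    low_bits = COMMAND_BITS - MODULE_ADDR_BITS
    if not 0 <= module < (1 << MODULE_ADDR_BITS):
        raise ValueError("module address must be 0..7")
    if not 0 <= opcode_and_args < (1 << low_bits):
        raise ValueError("opcode/arguments must fit in 21 bits")
    return (module << low_bits) | opcode_and_args


def module_address(command: int) -> int:
    """Extract the 3-bit module address from a command (same hypothetical layout)."""
    return command >> (COMMAND_BITS - MODULE_ADDR_BITS)


cmd = make_command(module=5, opcode_and_args=0x1234)
p = pack_payload(frame_type=1, command=cmd, data_or_cmds=0xDEADBEEF)
print(unpack_payload(p) == (1, cmd, 0xDEADBEEF), module_address(cmd))
```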
FB-DIMM technology (5)
Implementation details (3)
Read bandwidth:
• One FB-DIMM channel transfers in one frame (that is, in 12 lane cycles) 128 data bits + 16 ECC bits.
• One frame lasts 2 memory cycles.
• One DDR2 DIMM channel transfers in 2 memory cycles: 2 x 72 bits (2 x 64 data bits + 2 x 8 ECC bits).
• Thus the read bandwidth of an FB-DIMM channel equals the bandwidth of a DDR2 channel.
Write bandwidth:
• The write bandwidth of an FB-DIMM channel is up to 0.5 x the read bandwidth.
• However, FB-DIMMs allow simultaneous read and write operations.
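The bandwidth comparison can be expressed as a short calculation. This is a sketch under one stated assumption: that the 72-bit data field of a write frame carries 64 data bits + 8 ECC bits, mirroring the read frame's 128 + 16 split. All other numbers (2 memory cycles per frame, 128 read data bits per frame, 64 data bits per DDR2 transfer) come from the slide.

```python
TRANSFERS_PER_FRAME = 2           # one frame lasts two memory (data transfer) cycles
READ_DATA_BITS_PER_FRAME = 128    # plus 16 ECC bits = 144 bits
WRITE_DATA_BITS_PER_FRAME = 64    # assumption: 72-bit data field = 64 data + 8 ECC bits
DDR2_DATA_BITS_PER_TRANSFER = 64  # 72-bit module: 64 data + 8 ECC bits


def fbdimm_bandwidth_mb_s(data_rate_mts: float) -> tuple[float, float]:
    """(read, write) data bandwidth of one FB-DIMM channel in MB/s."""
    frames_per_us = data_rate_mts / TRANSFERS_PER_FRAME    # MT/s = transfers per microsecond
    read = frames_per_us * READ_DATA_BITS_PER_FRAME / 8    # bytes per microsecond = MB/s
    write = frames_per_us * WRITE_DATA_BITS_PER_FRAME / 8
    return read, write


def ddr2_bandwidth_mb_s(data_rate_mts: float) -> float:
    """Data bandwidth of one conventional DDR2 channel in MB/s."""
    return data_rate_mts * DDR2_DATA_BITS_PER_TRANSFER / 8


read, write = fbdimm_bandwidth_mb_s(667)
print(f"FB-DIMM read {read:.0f} MB/s, write {write:.0f} MB/s, "
      f"simultaneous total {read + write:.0f} MB/s")
print(f"DDR2 channel {ddr2_bandwidth_mb_s(667):.0f} MB/s")
```

The read figure matches the DDR2 channel, and since reads and writes use separate lanes, the two can be added when both directions are busy.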
FB-DIMM technology (6)
FB-DIMM data buffer (Advanced Memory Buffer, AMB)
• Manages the read/write operations of the module
Source: PCStats
• FB-DIMM-4300 (DDR2-533 SDRAM): clock speed 133 MHz, data rate 533 MT/s, throughput 4300 MB/s
• PC2-5300 (DDR2-667 SDRAM): clock speed 167 MHz, data rate 667 MT/s, throughput 5300 MB/s
• PC2-6400 (DDR2-800 SDRAM): clock speed 200 MHz, data rate 800 MT/s, throughput 6400 MB/s
Figure: Different implementations of FB-DIMMs
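The three columns of the module list are related by simple arithmetic: the quoted clock speed is the memory cell (core) clock, a quarter of the data rate because of DDR2's 4-bit prefetch, and the throughput is the data rate times 8 bytes (64 data bits per transfer). The sketch below checks this; the observation that the quoted throughputs are rounded marketing figures is an inference from the numbers, not a statement from PCStats.

```python
# (name, DRAM data rate in MT/s, quoted throughput in MB/s) from the module list
MODULES = [
    ("FB-DIMM-4300", 533, 4300),
    ("PC2-5300", 667, 5300),
    ("PC2-6400", 800, 6400),
]
BYTES_PER_TRANSFER = 8  # 64 data bits per transfer (ECC bits excluded)
DDR2_PREFETCH = 4       # DDR2 moves 4 data words per memory-cell clock


def throughput_mb_s(data_rate_mts: int) -> int:
    """Peak data throughput of a 64-bit module in MB/s."""
    return data_rate_mts * BYTES_PER_TRANSFER


def core_clock_mhz(data_rate_mts: int) -> float:
    """Memory-cell clock implied by the data rate and the 4-bit prefetch."""
    return data_rate_mts / DDR2_PREFETCH


for name, rate, quoted in MODULES:
    computed = throughput_mb_s(rate)
    print(f"{name}: core {core_clock_mhz(rate):.0f} MHz, "
          f"computed {computed} MB/s, quoted {quoted} MB/s")
```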
Figure: Block diagram of the AMB [3] (The AMB drives two command/address (C/A) buses to limit the load presented by the 9 to 36 DRAM devices of the module.)
FB-DIMM technology (7)
Necessary routing to connect the north bridge to the DIMM socket
a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed
b) In case of an FB-DIMM (69 pins): a 2-layer PCB is sufficient (but a third layer is used for power lines)
Figure: PCB routing [4]
FB-DIMM technology (8)
Figure: Latency and bandwidth figures of different DRAM technologies for a mix of SPEC applications [5]
FB-DIMM technology (9)
Pros and cons of FB-DIMMs
Advantages of FB-DIMMs vs. DDR2 and DDR3 DIMMs
• more memory channels (up to 6) → higher total bandwidth
• more DIMM modules (up to 8) per channel → higher memory capacity (up to 192 GB)
• fewer wires → simplified PCB routing
• simultaneous read/write operation in a channel
Disadvantages of FB-DIMMs vs. DDR2 and DDR3 DIMMs
• higher latency and lower bandwidth figures for 4 to 8 DIMM modules
• higher cost
• higher dissipation (typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, DDR2 FB-DIMM: about 10 W)
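The capacity and dissipation figures follow from simple products and sums. In this sketch the 6 channels, 8 DIMMs per channel and the 5 W typical figures come from the slide; the 4 GB module size is an assumption, namely the size implied by the 192 GB maximum.

```python
CHANNELS = 6           # maximum FB-DIMM channels (from the slide)
DIMMS_PER_CHANNEL = 8  # maximum FB-DIMMs per channel (from the slide)
GB_PER_DIMM = 4        # assumption: module size implied by the 192 GB figure

max_capacity_gb = CHANNELS * DIMMS_PER_CHANNEL * GB_PER_DIMM

DRAM_POWER_W = 5.0     # typical DDR2 DRAM dissipation per module (slide figure)
AMB_POWER_W = 5.0      # typical AMB dissipation (slide figure)
fbdimm_power_w = DRAM_POWER_W + AMB_POWER_W

print(f"max capacity: {max_capacity_gb} GB")
print(f"per-module dissipation: {fbdimm_power_w:.0f} W, "
      f"AMB share: {AMB_POWER_W / fbdimm_power_w:.0%}")
```

With these typical figures the AMB roughly doubles the dissipation of a module, which is the main source of the higher power consumption listed among the disadvantages.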
Latency
Latency is the other, potentially more troubling, issue. Intel addressed it by not having the signals stored and then retransmitted in each buffer. Instead, the data travels along a special fast pass-through path in the buffer itself. This avoids much of the latency that a store-and-forward architecture would induce.
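Why the pass-through path matters can be seen from a simple daisy-chain model: a request to the k-th DIMM is forwarded by the k-1 AMBs in front of it, and the read data comes back through the same AMBs. This is a first-order sketch, not a timing model from the sources. The store-and-forward hop delay is taken as one frame time (about 3 ns at DDR2-667, 12 lane cycles at about 4 Gb/s), a lower bound since the whole frame must arrive before it can be resent; the 0.5 ns pass-through delay is a hypothetical value chosen only for illustration.

```python
STORE_AND_FORWARD_HOP_NS = 3.0  # >= one frame time at DDR2-667 (12 lane cycles at ~4 Gb/s)
PASS_THROUGH_HOP_NS = 0.5       # hypothetical pass-through delay, for illustration only


def daisy_chain_delay_ns(position: int, hop_delay_ns: float) -> float:
    """Extra round-trip delay for the DIMM at `position` (1 = closest to the controller).

    The request is forwarded by the position-1 AMBs in front of the target DIMM,
    and the read data is forwarded back through the same AMBs.
    """
    if position < 1:
        raise ValueError("position starts at 1")
    return 2 * (position - 1) * hop_delay_ns


for pos in (1, 4, 8):
    saf = daisy_chain_delay_ns(pos, STORE_AND_FORWARD_HOP_NS)
    pt = daisy_chain_delay_ns(pos, PASS_THROUGH_HOP_NS)
    print(f"DIMM {pos}: store-and-forward +{saf:.1f} ns, pass-through +{pt:.1f} ns")
```

In both cases the added delay grows linearly with the DIMM's position in the chain, which is consistent with FB-DIMMs showing higher latency as more modules are installed; the pass-through path only reduces the slope.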
FB-DIMM technology (10)
Market penetration of the FB-DIMM technology
• 5/2006 Intel adopts it in its Bensley platform (5000) for DP systems
• 9/2006 AMD removes it from its roadmap
• 8/2007 Sun introduces it in the Niagara II
• 9/2007 Intel uses it in the Caneland platform (7000) for MP systems
• 2007 Major memory manufacturers intend to develop DDR3 DIMMs instead of DDR3-based FB-DIMMs
Standardisation
• 1/2007 JESD206 FBDIMM Architecture and Protocol
• 3/2007 JESD205 DDR2 SDRAM Fully Buffered DIMM (FBDIMM) Design Specification: DDR2-533, DDR2-667, DDR2-800; x72 ECC; 240 pins; 256 Mb, 512 Mb, 1 Gb, 2 Gb and 4 Gb devices
FB-DIMM technology (11)
DDR2 vs. DDR (SDRAM)
The key difference between DDR and DDR2 is that the DDR2 data bus is clocked at twice the speed of the memory cells, so four data words can be transferred in each memory cell cycle without speeding up the memory cells themselves.
Figure: Clocking schemes of the SDR, DDR and DDR2 SDRAM technologies [1]
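The relation between memory cell clock, bus clock and data rate can be sketched as a first-order model: data rate = cell clock x prefetch depth, and DDR generations transfer twice per bus clock. The prefetch depths of 2, 4 and 8 for DDR, DDR2 and DDR3 are those given in the text; a depth of 1 for SDR is the standard value. The 200 MHz cell clock is just an example.

```python
PREFETCH = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}  # data words per memory-cell cycle


def data_rate_mts(generation: str, cell_clock_mhz: float) -> float:
    """Transfers per second on the data bus, in MT/s."""
    return cell_clock_mhz * PREFETCH[generation]


def bus_clock_mhz(generation: str, cell_clock_mhz: float) -> float:
    """I/O bus clock: SDR transfers once per clock, DDR generations twice."""
    transfers_per_bus_clock = 1 if generation == "SDR" else 2
    return data_rate_mts(generation, cell_clock_mhz) / transfers_per_bus_clock


for gen in PREFETCH:
    print(f"{gen}: {data_rate_mts(gen, 200):.0f} MT/s, "
          f"bus clock {bus_clock_mhz(gen, 200):.0f} MHz")
```

At the same 200 MHz cell clock, DDR runs its bus at 200 MHz while DDR2 runs it at 400 MHz (twice the cell speed), giving DDR2-800 instead of DDR-400.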
DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is considerably increased as a trade-off. The DDR2 prefetch buffer is 4 bits deep, whereas it is 2 bits deep for DDR (and 8 bits deep for DDR3). While DDR SDRAM has typical read latencies of 2 to 3 bus cycles, early DDR2 may have read latencies of 4 to 6 cycles. Although DDR2 was introduced in Q2 2003 at 200/266 MHz, it initially could not compete because of its high latency figures. DDR2 became widespread only after lower-latency parts became available by the end of 2004.
Table: Burst timing, latency and bandwidth figures of DDR and DDR2 DRAM technologies [1]
CAS latency (Column Address Strobe latency, CL): the delay, in clock cycles, between the moment a memory chip is accessed for data and the moment the first data bit becomes available. For instance, after accessing a 400 MHz CL3 device, the first bit arrives after 3 x 2.5 ns = 7.5 ns.
Early DDR2-533 SDRAM modules available when the i925 and i915 chipsets were announced (6/2004) had 4-4-4 timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time).
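Converting CAS latency from cycles to nanoseconds makes clear why more cycles do not automatically mean a longer wait: the absolute latency is CL times the bus clock period. The first call reproduces the 7.5 ns example above; DDR-400 CL3 (200 MHz bus clock) is an example configuration chosen for comparison, not a figure from [1], and DDR2-533 CL4 corresponds to the 4-4-4 modules mentioned above (bus clock about 266.7 MHz).

```python
def cas_latency_ns(cl_cycles: float, bus_clock_mhz: float) -> float:
    """Absolute CAS latency: CL cycles times the bus clock period (1000 / f_MHz ns)."""
    return cl_cycles * 1000.0 / bus_clock_mhz


print(cas_latency_ns(3, 400))      # the 400 MHz CL3 example: 7.5 ns
print(cas_latency_ns(3, 200))      # DDR-400 CL3 (200 MHz bus clock)
print(cas_latency_ns(4, 800 / 3))  # early DDR2-533 CL4 (about 266.7 MHz bus clock)
```

Both of the last two configurations come out at about 15 ns, so the higher cycle count of early DDR2 was offset by its faster bus clock; lower-latency DDR parts, however, remained faster in absolute terms.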
FB-DIMM technology
DDR2 achieves its power savings primarily through a drop in operating voltage (1.8 V compared to DDR's 2.5 V).
DDR2 DIMMs have 240 pins instead of the 184 pins used by DDR DIMMs.
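A first-order estimate shows why the voltage drop matters so much. It assumes the usual CMOS dynamic power relation and, for the sake of the estimate, unchanged switched capacitance and frequency; real DRAM power also depends on termination, refresh and other factors, so this only indicates the order of the saving.

```latex
P_{\mathrm{dyn}} \propto C\,V^{2} f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{DDR2}}}{P_{\mathrm{DDR}}}
  \approx \left(\frac{1.8\,\mathrm{V}}{2.5\,\mathrm{V}}\right)^{2}
  = \frac{3.24}{6.25} \approx 0.52
```

Under these assumptions, the voltage reduction alone cuts dynamic power roughly in half.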
DDR3
Source: AnandTech
• Appeared in mid-2007, e.g. with Intel's P35 (Bearlake) chipset
Source: Wikipedia
5.2. Speed gap between processor and memory (1a) Figure 5.1a: DRAM types
5.2. Speed gap between processor and memory (1b) Figure 5.1b: Latency of DRAM chips
5.2. Speed gap between processor and memory (1c) Figure 5.1c: System-level memory latency in x86-based PCs
5.2. Speed gap between processor and memory (1d) Figure 5.1d: Latency of DRAM chips (in clock cycles)
5.2. Speed gap between processor and memory (2) Figure 5.2: Relative transfer rate of memories (D: dual channel)
References
[1]: Gavrichenkov I., "DDR2 vs. DDR: Revenge Gained," X-bit Laboratories, Dec. 17, 2004, http://www.xbitlabs.com/articles/memory/display/ddr2-ddr.html
[2]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Intel Developer Forum, Feb. 18, 2004, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf
[3]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Intel Developer Forum, Feb. 18, 2004, http://www.idt.com/content/OSA-S009.pdf
[4]: Haas J. & Vogt P., "Fully Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, March 2005, pp. 1-7
[5]: Ganesh B., Jaleel A., Wang D. & Jacob B., "Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling," Proc. HPCA 2007
[6]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1
[7]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm