2. Challenges/limiters of parallel connected synchronous memories
Dezső Sima
September 2008 (Ver. 1.0)
Sima Dezső, 2008
Overview
1. Key challenges facing main memories
2. Main limiters of increasing the transfer rate of main memories - Overview
3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array
4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts
5. Challenges in increasing the rate of capturing data in the memory controller/DRAM parts
6. Main limiters of increasing the memory size
7. References
1. Key challenges facing main memories (1) Key challenges facing main memories • Increasing (single core) processor performance (the past)
1. Key challenges facing main memories (2) Figure 1.2: Integer performance growth of Intel’s x86 processors (SPECint92, logarithmic scale, 1979 to 2005, from the 8088/5 up to the Prescott (2M)). Integer performance grew by ~100× per 10 years, then began levelling off.
1. Key challenges facing main memories (3) Key challenges facing main memories • Increasing (single core) processor performance (the past) • Multicore/manycore processors, with the number of cores doubling about every two years (the present and near future)
1. Key challenges facing main memories (4) Evolution of Intel’s process technology: feature sizes shrink by ~0.7× every 2 years. Figure: Evolution of Intel’s process technology [1]
1. Key challenges facing main memories (5) The evolution of IC complexity (Moore’s law) Figure: The actual rise of IC complexity in DRAMs and microprocessors [2]
1. Key challenges facing main memories (6) Figure: Rapid spreading of Intel’s multicore processors Rapid spreading of multicore processors in Intel’s processor portfolio
1. Key challenges facing main memories (7) The Cell BE (2006)
SPE: Synergistic Processing Element
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
LS: Local Store of 256 KB
SMF: Synergistic Memory Flow Unit
EIB: Element Interconnect Bus
PPE: Power Processing Element
PPU: Power Processing Unit
PXU: POWER Execution Unit
MIC: Memory Interface Controller
BIC: Bus Interface Controller
XDR: Rambus DRAM
Figure: Block diagram of the Cell BE [3]
1. Key challenges facing main memories (8) Assuming that IC process technology continues to evolve in the near future at a similar rate as now (characteristic feature sizes shrinking by ~0.7× every 2 years), the number of cores will also double about every two years.
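The step from a ~0.7× linear shrink every two years to a doubling of cores is a matter of area scaling: the area per transistor scales with the square of the feature size, so density grows by 1/0.7² ≈ 2 per generation. The sketch below checks this; it assumes, purely for illustration, that die area and the transistor budget per core stay constant, so that the core count simply tracks density. The helper names and the starting core count are hypothetical.

```python
def density_gain(shrink: float = 0.7) -> float:
    """Transistor density gain per process generation for a linear shrink factor.

    Area per transistor scales with shrink**2, so density scales with 1 / shrink**2.
    """
    return 1.0 / shrink ** 2


def cores_after(years: float, cores_now: int, shrink: float = 0.7,
                years_per_node: float = 2.0) -> float:
    """Hypothetical projection: cores grow with density, assuming constant die
    area and constant transistors per core (an illustrative simplification)."""
    nodes = years / years_per_node
    return cores_now * density_gain(shrink) ** nodes


print(f"density gain per node: {density_gain():.2f}x")  # -> 2.04x
print(f"4 cores today -> {cores_after(6, 4):.1f} cores after 6 years")
```

Under these assumptions the core count roughly doubles per two-year node; real designs deviate because cache sizes, uncore logic and die area also change.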
1. Key challenges facing main memories (9) Higher processor performance/more cores → higher memory performance requirements in terms of
• larger memory size
• higher memory bandwidth
• lower memory latency
The extent of these requirements depends on (an interesting research area)
• the characteristics of the application
• the cache architecture
• ...
Limitations of recent implementations
2. Main limiters of increasing the transfer rate of main memories - Overview
2. The transfer rate of main memories (1) Main components of the main memory: the memory controller and the DRAM devices, each DRAM device consisting of a memory cell array and I/O buffers. Figure: Main components of the main memory
2. The transfer rate of main memories (2) Main limitations of recent commodity DRAMs (synchronous main memories) in increasing transfer rates
• The rate of sourcing/sinking data from/to the memory cell array (problem of reducing the column cycle time of the memory cell array),
• The rate of transferring data between the memory controller and the memory modules (transmission line termination problem),
• The rate of capturing data in the memory controller/memory module (signaling and synchronization problem).
The most serious limitation constrains the achievable transfer rate.
Figure: Schematic view of the structure of the main memory (sourcing/sinking between the memory cell array and the I/O buffers of the DRAM device, transferring between the DRAM device and the memory controller, capturing at both the memory controller and the DRAM device)
3. Challenges in increasing the rate of sourcing/sinking data from/to the memory cell array
3. The rate of sourcing/sinking data (1) Basic operation speed of recent synchronous DRAMs: the memory cell array sources/sinks data to/from the I/O buffers at a rate of T per data line (for device data widths of x4/x8/x16), where T = (1/tCCD) × FW, with
tCCD: min. column cycle time of the memory cell array
FW: fetch width of the memory cell array
3. The rate of sourcing/sinking data (2) The min. column cycle time (tCCD) of the memory cell array: tCCD (core column delay) is the min. time interval between consecutive Read or Write commands. Figure: The interpretation of tCCD [4] Remark: tCCD is also referred to as the Read/Write command to Read/Write command delay.
3. The rate of sourcing/sinking data (3) Figure: The evolution of the column cycle time (tCCD) in different SDRAM types (ns) [5] Note: The min. column cycle time (tCCD) of synchronous DRAMs is:
SDRAM: 7.5 ns
DDR/DDR2/DDR3: 5 ns
3. The rate of sourcing/sinking data (4) The fetch width (FW) of the memory cell array specifies how many times more bits the cell array fetches per column cycle than the data width of the device. E.g. an x4 DRAM chip with a fetch width of 4 (i.e. a DDR2 DRAM) fetches 4 × 4 = 16 bits from the memory cell array per column cycle. The fetch width (FW) of the memory cell array of synchronous DRAMs is typically:
DRAM type: FW
SDRAM: 1
DDR: 2
DDR2: 4
DDR3: 8
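A minimal sketch of the fetch-width arithmetic, using the typical FW values listed above; the dictionary and the helper `bits_per_column_cycle` are hypothetical names chosen for illustration.

```python
# Typical fetch widths (FW) of synchronous DRAM generations, as listed above.
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def bits_per_column_cycle(dram_type: str, device_width: int) -> int:
    """Bits fetched from the cell array per column cycle = device width x FW."""
    if device_width not in (4, 8, 16):
        raise ValueError("device width must be x4, x8 or x16")
    return device_width * FETCH_WIDTH[dram_type]


print(bits_per_column_cycle("DDR2", 4))   # 16, the x4 DDR2 example above
print(bits_per_column_cycle("DDR3", 16))  # 128
```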
3. The rate of sourcing/sinking data (5) Figure: Fetch width of synchronous DRAM generations (examples with a 100 MHz DRAM core clock; n: data width of the device, DQ0 to DQn-1: data lines)
SDRAM-100: core clock 100 MHz (= fCK), clock CK 100 MHz; data transferred on the rising edges of CK: 100 MT/s; n bits fetched from the memory cell array per core cycle
DDR-200: core clock 100 MHz (= fCK), clock CK/CK# 100 MHz, data strobe DQS 100 MHz; data transferred on both edges of DQS (2 × fCK): 200 MT/s; 2×n bits fetched
DDR2-400: core clock 100 MHz (= fCK/2), clock CK/CK# 200 MHz, DQS 200 MHz; data transferred on both edges of DQS (2 × fCK): 400 MT/s; 4×n bits fetched
DDR3-800: core clock 100 MHz (= fCK/4), clock CK/CK# 400 MHz, DQS 400 MHz; data transferred on both edges of DQS (2 × fCK): 800 MT/s; 8×n bits fetched
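The figure follows a simple rule: the data rate equals the DRAM core clock times FW, while the interface clock CK equals the data rate for SDRAM (one transfer per CK cycle) and half of it for DDR/DDR2/DDR3 (two transfers per cycle). The sketch encodes that rule, assuming the figure's 100 MHz core clock; `clock_plan` is a hypothetical helper, not part of any DRAM specification.

```python
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def clock_plan(dram_type: str, core_mhz: float) -> dict:
    """Derive the interface clock CK and the data rate from the DRAM core clock.

    Data rate (MT/s) = core clock x FW for every generation.
    SDRAM transfers once per CK cycle, the DDR generations twice per CK cycle.
    """
    data_rate = core_mhz * FETCH_WIDTH[dram_type]
    transfers_per_ck = 1 if dram_type == "SDRAM" else 2
    return {"core_MHz": core_mhz,
            "CK_MHz": data_rate / transfers_per_ck,
            "data_MTps": data_rate}


for dram_type in FETCH_WIDTH:
    print(dram_type, clock_plan(dram_type, 100))
```

With a fixed core clock, each new generation doubles the data rate by doubling FW, while only the I/O side (CK, DQS) has to run faster.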
3. The rate of sourcing/sinking data (6) According to Tmax = (1/tCCD) × FW, the peak rates of sourcing/sinking data to/from the I/O buffers are:
SDRAM: 1/7.5 ns × 1 ≈ 133 MT/s
DDR: 1/5 ns × 2 = 400 MT/s
DDR2: 1/5 ns × 4 = 800 MT/s
DDR3: 1/5 ns × 8 = 1600 MT/s (not yet achieved)
The main limitation in increasing the rate of sourcing/sinking data from/to the memory array is tCCD (the column cycle time). The column cycle time (tCCD) resulting from a DRAM design depends on a number of architectural choices, such as column decoder layout, array block size, array partitioning and decisions to share resources between array banks [32]. Reducing it below 5 ns is an intricate circuit design task that is beyond the scope of our discussion; for an insight into the subject see [32].
Remark: GDDR3 and GDDR4 devices, with peak transfer rates of 1.6 and 2.5 GT/s, respectively, achieve min. column cycle times (tCCD) of 2.5 and 3.2 ns, respectively [32].
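The peak rates follow directly from Tmax = (1/tCCD) × FW. The sketch below reproduces them; for the GDDR remark it assumes fetch widths of 4 (GDDR3) and 8 (GDDR4), values not stated on the slide but consistent with the quoted transfer rates.

```python
# Min. column cycle times (ns) and fetch widths as given on the slides.
T_CCD_NS = {"SDRAM": 7.5, "DDR": 5.0, "DDR2": 5.0, "DDR3": 5.0}
FETCH_WIDTH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}

# Assumption: GDDR fetch widths are not given on the slide.
GDDR = {"GDDR3": (2.5, 4), "GDDR4": (3.2, 8)}  # name: (tCCD in ns, FW)


def peak_rate_mtps(t_ccd_ns: float, fw: int) -> float:
    """Tmax = (1 / tCCD) x FW per data line, in MT/s.

    With tCCD in ns, 1000 / tCCD gives column cycles per microsecond,
    i.e. millions of column cycles per second.
    """
    return 1000.0 / t_ccd_ns * fw


for name, t_ccd in T_CCD_NS.items():
    print(f"{name}: {peak_rate_mtps(t_ccd, FETCH_WIDTH[name]):.0f} MT/s")
for name, (t_ccd, fw) in GDDR.items():
    print(f"{name}: {peak_rate_mtps(t_ccd, fw):.0f} MT/s")
```

It prints 133, 400, 800 and 1600 MT/s for SDRAM to DDR3, and 1600 and 2500 MT/s for GDDR3/GDDR4 under the assumed fetch widths.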
4. Challenges in increasing the transfer rate between the memory controller and the DRAM parts
4. The transfer rate between the MC and the DRAM parts (2) The dataway connecting the memory controller and the DRAM chips: the memory controller reaches the memory modules over motherboard traces. For higher data rates, PCB traces behave as transmission lines. Figure: The dataway connecting the memory controller and the DRAM chips (based on [6])
4. The transfer rate between the MC and the DRAM parts (3) Basic behaviour of transmission lines (TL) (figure: driver, TL, receiver) Principle of operation
• A signal wavefront applied at the input of the TL travels down the TL from the driver side to the receiver side.
• On arriving at the receiver side, the signal is reflected back toward the driver side, then
• at the driver side, the signal is reflected again toward the receiver side, etc.
(Reflections occur at each end that is not terminated by the characteristic impedance of the TL.)
4. The transfer rate between the MC and the DRAM parts (4) Transmission lines (TL) Above ~100 MT/s, PC board traces (microstrips) behave like transmission lines characterized by • a characteristic impedance (ZO) and • a trace velocity.
4. The transfer rate between the MC and the DRAM parts (5) Characteristic impedance of PCB traces (ZO) [7] Table: Typical characteristic impedance values of PCB traces [8]
4. The transfer rate between the MC and the DRAM parts (6) Trace velocity Table: Typical trace velocity values of PCB traces [8] Remark: With 1 ft = 30.48 cm, the delays given in ns/ft correspond to the following velocities:
1.6 ns/ft ≈ 19 cm/ns
2.0 ns/ft ≈ 15 cm/ns
2.2 ns/ft ≈ 14 cm/ns
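The conversion in the remark is a single division, since a delay in ns/ft is the inverse of a velocity. A minimal sketch, with a hypothetical helper name:

```python
CM_PER_FT = 30.48  # exact definition of the foot in cm


def delay_to_velocity(ns_per_ft: float) -> float:
    """Convert a propagation delay in ns/ft into a trace velocity in cm/ns."""
    return CM_PER_FT / ns_per_ft


for delay in (1.6, 2.0, 2.2):
    print(f"{delay} ns/ft -> {delay_to_velocity(delay):.2f} cm/ns")
```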
4. The transfer rate between the MC and the DRAM parts (7) Behaviour of an ideal TL. Ideal TL: no attenuation, no capacitive or inductive loading. A generator VO(t) with internal impedance ZD drives the TL (characteristic impedance ZO, flight time T), which is terminated at the receiver by ZT. With
VO(t): Generator voltage
VD(t): Voltage at the driver output
VrD(t): Reflected voltage at the driver
VR(t): Voltage at the receiver
VrR(t): Reflected voltage at the receiver
ZD: Internal impedance of the driver
ZO: Characteristic impedance of the TL
ZT: Impedance terminating the TL
T: Flight time over the TL
Figure: Equivalent circuit of an ideal transmission line (neglecting attenuation along the TL as well as capacitive and inductive loading of the TL)
4. The transfer rate between the MC and the DRAM parts (8) Characteristic equations describing the reflections and the driver/receiver side voltages (based on [9])
At t = 0, driver side:
$V_O(0) = V_O$, $V_D(0) = V_O \dfrac{Z_O}{Z_O + Z_D}$, $V_{rD}(0) = V_D(0)$
At t = T (T: propagation time across the TL), receiver side:
$V_R(T) = V_D(0)\,(1 + r_R)$, $V_{rR}(T) = V_D(0)\, r_R$, where $r_R = \dfrac{Z_T - Z_O}{Z_T + Z_O}$
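4. The transfer rate between the MC and the DRAM parts (9) Characteristic equations (cont.)
Driver side, for n = 1, 3, 5, ... (arrival at the driver at t = (n+1)T):
$V_D((n+1)T) = V_D((n-1)T) + V_{rR}(nT)\,(1 + r_D)$, $V_{rD}((n+1)T) = V_{rR}(nT)\, r_D$, where $r_D = \dfrac{Z_D - Z_O}{Z_D + Z_O}$
Receiver side, for n = 3, 5, 7, ... (arrival at the receiver at t = nT):
$V_R(nT) = V_R((n-2)T) + V_{rD}((n-1)T)\,(1 + r_R)$, $V_{rR}(nT) = V_{rD}((n-1)T)\, r_R$
At t → ∞ (steady state), receiver side:
$V_R(\infty) = V_O \dfrac{Z_T}{Z_T + Z_D}$, which equals $V_O$ for an open-ended TL ($Z_T \to \infty$)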
4. The transfer rate between the MC and the DRAM parts (10) Example 1: Open-ended ideal TL with ZO = 50 Ω, ZD = 25 Ω, VO(t=0) = 2 V (step input) and ZT >> ZO. Figure: Equivalent circuit of an open ended ideal TL
4. The transfer rate between the MC and the DRAM parts (11) Ladder diagram values of Example 1:
Driver side VD(t): 1.333 V at t = 0, 2.222 V at 2T, 1.926 V at 4T, 2.025 V at 6T
Receiver side VR(t): 2.667 V at T, 1.778 V at 3T, 2.074 V at 5T, 1.976 V at 7T
Reflected wavefronts: 1.333, -0.444, 0.148, -0.049 V
Both voltages settle toward 2.0 V.
Figure: Ladder diagram and VD(t), VR(t) waveforms of an open ended ideal TL (based on [6])
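The ladder values can be reproduced by iterating the characteristic equations of the ideal TL. The sketch below is a minimal lattice (bounce) diagram calculation, assuming a lossless line, a step input and purely resistive ZD and ZT; the function names are hypothetical.

```python
import math


def reflection_coeff(z_end: float, z0: float) -> float:
    """r = (Z_end - Z_O) / (Z_end + Z_O); an open end (Z_end = inf) gives r = 1."""
    if math.isinf(z_end):
        return 1.0
    return (z_end - z0) / (z_end + z0)


def bounce(v_o, z_d, z0, z_t, round_trips):
    """Bounce diagram of an ideal TL driven by a step V_O.

    Returns (t in units of T, side, voltage) events:
    even t -> driver side V_D, odd t -> receiver side V_R.
    """
    r_d = reflection_coeff(z_d, z0)
    r_r = reflection_coeff(z_t, z0)
    v_d = v_o * z0 / (z0 + z_d)   # V_D(0): divider between Z_D and Z_O
    v_r = 0.0
    wave = v_d                    # wavefront travelling toward the receiver
    events = [(0, "D", v_d)]
    for k in range(round_trips):
        v_r += wave * (1 + r_r)   # arrival at the receiver
        wave *= r_r               # reflected toward the driver
        events.append((2 * k + 1, "R", v_r))
        v_d += wave * (1 + r_d)   # arrival back at the driver
        wave *= r_d               # re-reflected toward the receiver
        events.append((2 * k + 2, "D", v_d))
    return events


# Example 1: Z_O = 50 ohm, Z_D = 25 ohm, V_O = 2 V, open-ended TL.
for t, side, v in bounce(2.0, 25.0, 50.0, math.inf, 4):
    print(f"{t}T  V_{side} = {v:.3f} V")
```

It prints the sequence 1.333, 2.667, 2.222, 1.778, 1.926, 2.074, 2.025, ... V, matching the ladder diagram (the last digit may differ slightly from the figure due to rounding).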
4. The transfer rate between the MC and the DRAM parts (12) D: Driver, R: Receiver, O: Output, I: Input Figure: Open-ended real TL (differential connection) [10] Reflections at both ends (R-end, D-end)
4. The transfer rate between the MC and the DRAM parts (13) Reflections Figure: Reflections shown on an eye diagram due to termination mismatch [11]
4. The transfer rate between the MC and the DRAM parts (14) Implications of the reflections on a TL
• When a data signal is applied at the driver side of the TL, a signal wavefront travels down the TL and is ping-ponged between both ends of the TL until the steady state is reached.
• Until the signal has at least nearly settled, no further wavefront can be applied to the TL, otherwise inter-symbol interference (ISI) arises.
Reflections limit the max. data transfer rate of a TL.
4. The transfer rate between the MC and the DRAM parts (15) The max. data transfer rate is limited primarily by the time it takes the signal to settle, which depends on both
• the number of signal round trips until the signal settles, and
• the length of the TL.
Example: an open-ended TL with a length of 10 cm. Assumptions:
• The signal velocity on the TL is 20 cm/ns, thus T = 10 cm / (20 cm/ns) = 0.5 ns.
• Reflections settle to an acceptable level after three round trips (6T).
Then the wavefront of a signal has nearly settled after 6 × 0.5 ns = 3 ns. If the signal has to settle within ½ of the min. cycle time, the min. cycle time is 6 ns, and the max. transfer rate of this open-ended TL is ~166 MHz.
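The estimate above can be parameterized to see how the length of the TL and the number of round trips trade off. A minimal sketch under the example's assumptions (settling within half of the cycle time); the function name and its parameters are hypothetical.

```python
def max_transfer_rate_mhz(length_cm: float, velocity_cm_per_ns: float,
                          round_trips: int, settle_fraction: float = 0.5) -> float:
    """Estimate the max. rate of an unterminated TL from its settling time.

    The one-way flight time is T = length / velocity; the signal is taken to
    settle after `round_trips` round trips (2T each) and has to settle within
    `settle_fraction` of the cycle time.
    """
    t_flight = length_cm / velocity_cm_per_ns   # T in ns
    t_settle = round_trips * 2 * t_flight       # 3 round trips = 6T
    t_cycle = t_settle / settle_fraction        # settle within 1/2 cycle
    return 1000.0 / t_cycle                     # 1/ns -> MHz


print(f"{max_transfer_rate_mhz(10, 20, 3):.1f} MHz")  # -> 166.7 MHz
print(f"{max_transfer_rate_mhz(20, 20, 3):.1f} MHz")  # twice the length
```

Doubling the length of the TL doubles the settling time and thus halves the achievable rate, which is why open-ended TLs are restricted to short distances.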
4. The transfer rate between the MC and the DRAM parts (16) Open-ended TLs may be used only for
• relatively low transfer rates (up to ~100 MHz), that is up to SDRAM devices, and
• short distances (up to ~10 cm).
For higher transfer rates or longer distances the TL needs to be terminated by its characteristic impedance ZO.
4. The transfer rate between the MC and the DRAM parts (17) Reducing reflections by a series resistor: a series resistor placed between the driver and the TL reduces reflections, resulting in improved signal integrity and higher transfer rates.
4. The transfer rate between the MC and the DRAM parts (18) Example 2: Using series resistors to reduce reflections Figure: Equivalent circuit of an open ended TL with a series resistor (R3 in the figure) included between the driver and the TL (Micro-Cap 9.0.5.0)
4. The transfer rate between the MC and the DRAM parts (19) Figure: Driver (Vout) and receiver (Vin) voltages of an open-ended TL with a series resistor R3, for R3 = 0 Ω and R3 = 25 Ω (the value of R3 is varied from 0 to 25 Ω)
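The effect of R3 can be estimated with the reflection arithmetic of the ideal TL: the series resistor adds to the driver impedance, so the driver-end reflection coefficient becomes rD = (ZD + R3 - ZO)/(ZD + R3 + ZO). The sketch assumes the Example 1 values (ZD = 25 Ω, ZO = 50 Ω, VO = 2 V, open end), since the driver impedance of the Micro-Cap circuit is not given here; with R3 = 25 Ω the source side is then matched (rD = 0) and the receiver voltage settles at the first arrival. The function names are hypothetical.

```python
def driver_reflection(z_d: float, r_series: float, z0: float) -> float:
    """Reflection coefficient at the driver end; the series resistor adds to Z_D."""
    z_src = z_d + r_series
    return (z_src - z0) / (z_src + z0)


def receiver_voltages(v_o, z_d, r_series, z0, round_trips):
    """Receiver-side voltages V_R(T), V_R(3T), ... of an open-ended ideal TL."""
    r_d = driver_reflection(z_d, r_series, z0)
    wave = v_o * z0 / (z0 + z_d + r_series)  # wavefront launched at t = 0
    v_r, out = 0.0, []
    for _ in range(round_trips):
        v_r += 2 * wave   # open end: r_R = 1, so each arrival adds twice the wave
        out.append(v_r)
        wave *= r_d       # fully reflected at the open end, scaled by r_D at the driver
    return out


# Assumed Example 1 values: Z_D = 25 ohm, Z_O = 50 ohm, V_O = 2 V.
for r3 in (0.0, 25.0):
    vs = receiver_voltages(2.0, 25.0, r3, 50.0, 4)
    print(f"R3 = {r3:>4} ohm: " + ", ".join(f"{v:.3f}" for v in vs))
```

Choosing R3 so that ZD + R3 = ZO (series or source termination) suppresses the re-reflection at the driver end, while the open receiver end still doubles the half-amplitude wavefront to the full step.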
4. The transfer rate between the MC and the DRAM parts (20) Figure: Series resistors (RS) on SDRAM modules, inserted into the DQ, DQS, DM lines (RS = 10 or 22 Ω). Figure labels: memory controller (LVTTL signaling), command, control and address lines, DQ, DQS, DM lines, SDR DIMMs in Slot 1 and Slot 2.