360 likes | 809 Views
Digital Integrated Circuits Chpt. 12. Decreasing Word Line Delay. Drive ... Digital Integrated Circuits Chpt. 12. Pulsed Word Line Feedback Signal ...
E N D
ECE 300Advanced VLSI DesignFall 2006 Lecture 19: Memories Yunsi Fei [Adapted from Jan Rabaey et al’s Digital Integrated Circuits©2002, PSU Irwin & Vijay © 2002, and Princeton Wayne Wolf’s Modern VLSI Design © 2002 ]
Review: Basic Building Blocks • Datapath • Execution units • Adder, multiplier, divider, shifter, etc. • Register file and pipeline registers • Multiplexers, decoders • Control • Finite state machines (PLA, ROM, random logic) • Interconnect • Switches, arbiters, buses • Memory • Caches (SRAMs), TLBs, DRAMs, buffers
A Typical Memory Hierarchy • By taking advantage of the principle of locality: • Present the user with as much memory as is available in the cheapest technology. • Provide access at the speed offered by the fastest technology. On-Chip Components Control eDRAM Secondary Memory (Disk) Instr Cache Second Level Cache (SRAM) ITLB Main Memory (DRAM) Datapath Data Cache RegFile DTLB Speed (ns): .1’s 1’s 10’s 100’s 1,000’s Size (bytes): 100’s K’s 10K’s M’s T’s Cost: highest lowest
1D Memory Architecture m bits m bits S0 S0 Word 0 Word 0 S1 S1 Word 1 Word 1 S2 S2 Word 2 Storage Cell A0 Word 2 Storage Cell S3 S3 A1 Decoder n words Ak-1 Sn-2 Sn-2 Word n-2 Word n-2 Sn-1 Sn-1 Word n-1 Word n-1 Input/Output Input/Output n words n select signals Decoder reduces # of inputs k = log2 n
2D Memory Architecture bit line 2k-j word line Aj Aj+1 Row Address storage (RAM) cell Row Decoder Ak-1 m∙2j Column Address A0 selects appropriate word from memory row A1 Column Decoder Aj-1 Sense Amplifiers amplifies bit line swing Read/Write Circuits Input/Output (m bits)
3D Memory Architecture Row Addr Column Addr Block Addr Input/Output (m bits) Advantages: 1. Shorter word and/or bit lines 2. Block addr activates only 1 block saving power
Precharged MOS NOR ROM Vdd 0 1 precharge WL(0) GND 0 1 WL(1) WL(2) GND WL(3) BL(0) BL(1) BL(2) BL(3) 1 1 1 1 0 1 1 0
MOS NOR ROM Layout Metal1 on top of diffusion WL(0) GND (diffusion) WL(1) Basic cell 10 x 7 Metal1 Polysilicon WL(2) GND (diffusion) WL(3) BL(0) BL(1) BL(2) BL(3) Only 1 layer (contact mask) is used to program memory array, so programming of the ROM can be delayed to one of the last process steps.
Transient Model for NOR ROM precharge metal1 rword BL poly Cbit WL cword Word line parasitics Resistance/cell: 35 Wire capacitance/cell: 0.65 fF Gate capacitance/cell: 5.10 fF Bit line parasitics Resistance/cell: 0.15 Wire capacitance/cell: 0.83 fF Drain capacitance/cell: 2.60 fF
Propagation Delay of NOR ROM • Word line delay • Delay of a distributed rc-line containing M cells tword = 0.38(rword x cword) M2 = 20 nsec for M = 512 • Bit line delay • Assuming min size pull-down and 3*min size pull-up with reduced swing bit lines (5V to 2.5V) Cbit = 1.7 pF and IavHL = 0.36 mA so tHL = tLH = 5.9 nsec
Read-Write Memories (RAMs) • Static – SRAM • data is stored as long as supply is applied • large cells (6 fets/cell) – so fewer bits/chip • fast – so used where speed is important (e.g., caches) • differential outputs (output BL and !BL) • use sense amps for performance • compatible with CMOS technology • Dynamic – DRAM • periodic refresh required • small cells (1 to 3 fets/cell) – so more bits/chip • slower – so used for main memories • single ended output (output BL only) • need sense amps for correct operation • not typically compatible with CMOS technology
Memory Timing Definitions Read Cycle Read Read Access Read Access Write Cycle Write Write Hold Write Setup Data Data Valid
4x4 SRAM Memory read precharge 2 bit words bit line precharge enable WL[0] !BL BL A1 WL[1] Row Decoder A2 WL[2] WL[3] Column Decoder A0 clocking and control sense amplifiers write circuitry BL[i] BL[I+1]
2D Memory Configuration Sense Amps Sense Amps Row Decoder
Decreasing Word Line Delay driver driver polysilicon word line WL metal word line polysilicon word line WL metal bypass • Drive the word line from both sides • Use a metal bypass • Use silicides
Decreasing Bit Line Delay (and Energy) • Reduce the bit line voltage swing • need sense amp for each column to sense/restore signal • Isolate memory cells from the bit lines after sensing (to prevent the cells from changing the bit line voltage further) - pulsed word line • generation of word line pulses very critical • too short - sense amp operation may fail • too long - power efficiency degraded (because bit line swing size depends on duration of the word line pulse) • use feedback signal from bit lines • Isolate sense amps from bit lines after sensing (to prevent bit lines from having large voltage swings) - bit line isolation
Pulsed Word Line Feedback Signal Read Word line • Dummy column • height set to 10% of a regular column and its cells are tied to a fixed value • capacitance is only 10% of a regular column Bit lines Dummy bit lines Complete 10% populated
Pulsed Word Line Timing • Dummy bit lines have reached full swing and trigger pulse shut off when regular bit lines reach 10% swing Read Complete Word line V = 0.1Vdd Bit line V = Vdd Dummy bit line
Bit Line Isolation bit lines V = 0.1Vdd isolate Read sense amplifier sense V = Vdd sense amplifier outputs
6-transistor SRAM Cell WL M2 M4 Q M6 M5 !Q M1 M3 !BL BL
SRAM Cell Analysis (Read) WL=1 M4 M6 !Q=0 M5 Q=1 M1 Cbit Cbit !BL=1 BL=1 Read-disturb (read-upset): must carefully limit the allowed voltage rise on !Q to a value that prevents the read-upset condition from occurring while simultaneously maintaining acceptable circuit speed and area constraints
SRAM Cell Analysis (Read) WL=1 M4 M6 !Q=0 M5 Q=1 M1 Cbit Cbit !BL=1 BL=1 Cell Ratio (CR) = (WM1/LM1)/(WM5/LM5) V!Q = [(Vdd - VTn)(1 + CR (CR(1 + CR))]/(1 + CR)
Read Voltages Ratios Vdd = 2.5V VTn = 0.5V
SRAM Cell Analysis (Write) WL=1 M4 M6 !Q=0 Q=1 M5 M1 !BL=1 BL=0 Pullup Ratio (PR) = (WM4/LM4)/(WM6/LM6) VQ = (Vdd - VTn) ((Vdd – VTn)2 – (p/n)(PR)((Vdd – VTn - VTp)2)
Write Voltages Ratios Vdd = 2.5V |VTp| = 0.5V p/n = 0.5
Cell Sizing • Keeping cell size minimized is critical for large caches • Minimum sized pull down fets (M1 and M3) • Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR • But sizing of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle • Minimum width and length pass transistors • Boost the width of the pull downs (M1 and M3) • Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger
6T-SRAM Layout VDD M2 M4 Q Q M1 M3 GND M5 M6 WL BL BL
Multiple Read/Write Port Cell WL2 WL1 M2 M4 M5 !Q Q M6 M7 M8 M1 M3 !BL2 !BL1 BL1 BL2
4x4 DRAM Memory read precharge 2 bit words bit line precharge enable WL[0] BL A1 WL[1] Row Decoder A2 WL[2] WL[3] sense amplifiers clocking, control, and refresh BL[0] BL[1] BL[2] BL[3] write circuitry A0 Column Decoder
3-Transistor DRAM Cell write WWL Vdd BL1 X Vdd-Vt RWL read Vdd-Vt BL2 V WWL RWL M3 X M1 M2 Cs BL2 BL1 No constraints on device sizes (ratioless) Reads are non-destructive Value stored at node X when writing a “1” is VWWL - Vtn
3T-DRAM Layout BL2 BL1 GND RWL M3 M2 WWL M1
1-Transistor DRAM Cell WL write “1” read “1” WL X M1 X Vdd-Vt Cs CBL Vdd BL Vdd/2 BL sensing Write: Cs is charged (or discharged) by asserting WL and BL Read: Charge redistribution occurs between CBL and Cs Read is destructive, so must refresh after read
DRAM Cell Observations • DRAM memory cells are single ended (complicates the design of the sense amp) • 1T cell requires a sense amp for each bit line due to charge redistribution read • 1T cell read is destructive; refresh must follow to restore data • 1T cell requires an extra capacitor that must be explicitly included in the design • A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)