Digital Integrated Circuits A Design Perspective

Digital Integrated CircuitsA Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Semiconductor Memories December 20, 2002

Chapter Overview • Memory Classification • Memory Architectures • The Memory Core • Periphery • Reliability • Case Studies

Memory Classification • Size • Speed (timing) • Technology (CMOS, bipolar) • Function (ROM, RWM) • Volatility • Access (random, serial) • Application

Semiconductor Memory Classification Non-Volatile Read-WriteMemory Read-Write Memory Read-Only Memory Random Non-Random EPROM Mask-Programmed Access Access 2 E PROM Programmable (PROM) FLASH FIFO SRAM LIFO DRAM Shift Register CAM

Memory Timing: Definitions

Memory Timing: Definitions • Read-access time: delay between the read request and the moment the data is available at the output. • Write-access time: delay between the write request and the writing of the input data into the memory. • Cycle time: the minimum time required between successive reads (or writes).

A Typical Memory Hierarchy • By taking advantage of the principle of locality: • Present the user with as much memory as is available in the cheapest technology. • Provide access at the speed offered by the fastest technology. On-Chip Components Control eDRAM Secondary Memory (Disk) Instr Cache Second Level Cache (SRAM) ITLB Main Memory (DRAM) Datapath Data Cache RegFile DTLB Speed (ns): .1’s 1’s 10’s 100’s 1,000’s Size (bytes): 100’s K’s 10K’s M’s T’s Cost: highest lowest

Memory Architecture • A memory of N words, each one M bits wide, needs one select bit to identify the word. • If the number of words is high (f.i. N=106 or N=220) we need too many select lines. • A memory word is selected from a decoder where encode addresses enter: N=2k (in the exemple k=20).

Decoder reduces the number of select signals K = log N 2 Memory Architecture: Decoders M bits M bits S S 0 0 Word 0 Word 0 S 1 Word 1 Word 1 A 0 S Storage Storage 2 Word 2 Word 2 A cell cell 1 words A N K 1 Decoder - S N 2 - - Word N 2 Word N 2 - S N 1 - - Word N 1 Word N 1 - K log N = 2 Input-Output Input-Output ( M bits) ( M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals

Array-Structured Memory Architecture Problem: ASPECT RATIO or HEIGHT >> WIDTH Amplify swing to rail-to-rail amplitude Selects appropriate word

Array-Structured Memory Architecture • The select line that enables a sigle row of cells is the word line. • The line that connects the cells in a single column to the I/O circuits is the bit line. • Reduced swing requires sense amplifiers.

Hierarchical Memory Architecture Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings

Clock -address -address Z X generator buffer buffer Predecoder and block selector Bit line load Transfer gate Column decoder Sense amplifier and write driver CS, WE I/O x1/x4 -address -address Y X buffer buffer controller buffer buffer Block Diagram of 4 Mbit SRAM 128 K Array Block 0 Subglobal row decoder Subglobal row decoder Global row decoder Block 30 Block 31 Block 1 Local row decoder [Hirose90]

Contents-Addressable Memory I/O Buffers I/O Buffers Commands Commands Validity Bits 9 2 Priority Encoder Validity Bits Address Decoder 9 2 Priority Encoder Address Decoder

DRAM Timing SRAM Timing Self-timed Multiplexed Adressing Memory Timing: Approaches

Read-Only Memory Cells BL BL BL VDD WL WL WL 1 BL BL BL WL WL WL 0 GND Diode ROM MOS ROM 1 MOS ROM 2

MOS OR ROM BL [0] BL [1] BL [2] BL [3] WL [0] V DD WL [1] WL [2] V DD WL [3] V bias Pull-down loads

MOS NOR ROM V DD Pull-up devices WL [0] GND WL [1] WL [2] GND WL [3] BL [0] BL [1] BL [2] BL [3]

MOS NOR ROM Layout Cell (9.5l x 7l) Programmming using the Active Layer Only Polysilicon Metal1 Diffusion Metal1 on Diffusion

MOS NOR ROM Layout Cell (11l x 7l) Programmming using the Contact Layer Only Polysilicon Metal1 Diffusion Metal1 on Diffusion

MOS NAND ROM V DD Pull-up devices BL [0] BL [1] BL [2] BL [3] WL [0] WL [1] WL [2] WL [3] All word lines high by default with exception of selected row

No contact to VDD or GND necessary; drastically reduced cell size Loss in performance compared to NOR ROM MOS NAND ROM Layout Cell (8l x 7l) Programmming using the Metal-1 Layer Only Polysilicon Diffusion Metal1 on Diffusion

NAND ROM Layout Cell (5l x 6l) Programmming using Implants Only Polysilicon Threshold-alteringimplant Metal1 on Diffusion

V DD BL r word WL C bit c word Equivalent Transient Model for MOS NOR ROM Model for NOR ROM • Word line parasitics • Wire capacitance and gate capacitance • Wire resistance (polysilicon) • Bit line parasitics • Resistance not dominant (metal) • Drain and Gate-Drain capacitance

Equivalent Transient Model for MOS NAND ROM V DD Model for NAND ROM BL C L r bit c bit r word WL c word • Word line parasitics • Similar to NOR ROM • Bit line parasitics • Resistance of cascaded transistors dominates • Drain/Source and complete gate capacitance

Decreasing Word Line Delay

Precharged MOS NOR ROM V f DD pre Precharge devices WL [0] GND WL [1] WL [2] GND WL [3] BL [0] BL [1] BL [2] BL [3] PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design.

D G S Non-Volatile MemoriesThe Floating-gate transistor (FAMOS) Floating gate Gate Source Drain t ox t ox + +_ n n p Substrate Schematic symbol Device cross-section

20 V 0 V 5 V 20 V 0 V 5 V 10 V 5 V 5 V 2.5 V 2 2 S D S D S D Avalanche injection Removing programming voltage leaves charge trapped Programming results in higher V . T Floating-Gate Transistor Programming

A “Programmable-Threshold” Transistor

FLOTOX EEPROM Gate Floating gate I Drain Source V 20 – 30 nm -10 V GD 10 V 1 1 n n Substrate p 10 nm Fowler-Nordheim I-V characteristic FLOTOX transistor

V DD EEPROM Cell BL WL Absolute threshold control is hard Unprogrammed transistor might be depletion  2 transistor cell

Flash EEPROM Control gate Floating gate erasure Thin tunneling oxide 1 1 n source n drain programming p- substrate Many other options …

Cross-sections of NVM cells Flash EPROM Courtesy Intel

Basic Operations in a NOR Flash Memory―Erase

Basic Operations in a NOR Flash Memory―Write

Basic Operations in a NOR Flash Memory―Read

NAND Flash Memory Word line(poly) Unit Cell Source line (Diff. Layer) Courtesy Toshiba

Select transistor Word lines Active area STI Bit line contact Source line contact NAND Flash Memory Courtesy Toshiba

Characteristics of State-of-the-art NVM

Read-Write Memories (RAM) • STATIC (SRAM) Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential • DYNAMIC (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended

Q 6-transistor CMOS SRAM Cell WL V DD M M 2 4 Q M M 6 5 M M 1 3 BL BL

CMOS SRAM Analysis (Read Q=1) • Both bit lines are precharged high. • The word line is actvated: M5 and M6 are ON. • The read operation consists of leaving BL=1 and of discharging BL through M1- M5. • The resistance of M5 must be larger than that of M1 to avoid the change of state of the cell (read disturb). • Alternatively, the bit lines can be precharged at VDD/2.

CMOS SRAM Analysis (Read) WL V DD M BL 4 BL Q 0 = M Q 1 = 6 M 5 V V V M DD DD DD 1 C C bit bit

CMOS SRAM Analysis (Read) 1.2 1 0.8 0.6 Voltage Rise (V) 0.4 0.2 Voltage rise [V] 0 0 0.5 1 1.2 1.5 2 2.5 3 Cell Ratio (CR)

CMOS SRAM Analysis (Write Q=0) • When Q=1, a 0 is written in the cell by setting BL=0 e BL=1. • Due to the read-disturb condition the data has to be written through M6. • Q must become lower than the threshold of M1.

WL V DD M 4 M Q 0 = 6 M 5 Q 1 = M 1 V DD BL 1 BL 0 = = CMOS SRAM Analysis (Write)

CMOS SRAM Analysis (Write) PR=W4/W6

Cell Sizing • Keeping cell size minimized is critical for large caches • Minimum sized pull down fets (M1 and M3) • Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR • But sizing of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle • Minimum width and length pass transistors • Boost the width of the pull downs (M1 and M3) • Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger

VDD M2 M4 Q Q M1 M3 GND M5 M6 WL BL BL 6T-SRAM — Layout

Digital Integrated Circuits A Design Perspective