
Array Structured Memories

Explore array-structured memories, with material drawn from STMicro and Intel, and compare array architectures and their design features. Topics include hierarchical memory organization, key array design issues, SRAM architecture, read and write timings, SRAM cell design and operating principles, static noise margins, and SRAM bitcell layouts, along with decoders, column circuitry, and sense amplifiers.

pridgen

Presentation Transcript


  1. Array Structured Memories • STMicro/Intel • UCSD CAD LAB • Weste Text

  2. Memory Arrays

  3. Feature Comparison Between Memory Types

  4. Array Architecture • 2^n words of 2^m bits each • If n >> m, fold by 2^k into fewer rows of more columns • Good regularity – easy to design • Very high density if good cells are used
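
The folding rule on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `fold_array` and the example sizes are assumptions.

```python
def fold_array(n: int, m: int, k: int) -> tuple[int, int]:
    """Fold 2**n words of 2**m bits by 2**k.

    Returns (rows, columns) of the physical array. Folding moves 2**k
    words into each row, so the total bit count is unchanged.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    rows = 2 ** (n - k)
    columns = 2 ** (m + k)
    return rows, columns


# 2048 words of 16 bits (n = 11, m = 4), folded by 2**3 = 8.
print(fold_array(11, 4, 3))  # (256, 128)
```

With k = 0 the same memory would be 2048 rows by 16 columns, which is far from square.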

  5. Memory - Real Organization • [Figure: array of N x K words; each of rows 0 to N-1 holds C words of M bits (K x M bits per row); row decoder with Log2R address lines driving select lines S0 to SR-1; column select with Log2C address lines producing one M bit data word]

  6. Hierarchical Memory Architecture

  7. Array Organization Design Issues • aspect ratio should be relatively square • Row / Column organisation (matrix) • R = log2(N_rows); C = log2(N_columns) • R + C = N (N_address_bits) • number of rows should be a power of 2 • number of bits in a row need not be… • sense amplifiers to speed voltage swing • 1 -> 2^R row decoder • 1 -> 2^C column decoder • M column decoders (M bits, one per bit) • M = output word width

  8. Simple 4x4 SRAM Memory • 2 bit width: M = 2 • R = 2 => N_rows = 2^R = 4 • C = 1 => N_columns = 2^C x M = 4 • N = R + C = 3 • Array size = N_rows x N_columns = 16 • [Figure: row decoder on A1, A2 driving WL[0] to WL[3]; column decoder on A0, A0!; bit lines BL, !BL with read precharge (bit line precharge enable); clocking and control -> sense amplifiers, write circuitry; WE!, OE!]
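
As a rough sketch of how the N = R + C address bits of this example divide between the two decoders, the Python below splits an address into a row index and a column index. The function name `split_address` is hypothetical, and placing the column bits in the low-order positions is an assumption based on the figure, where A0 feeds the column decoder.

```python
def split_address(addr: int, r_bits: int, c_bits: int) -> tuple[int, int]:
    """Split an (r_bits + c_bits)-bit address into (row, column).

    Assumption: the low c_bits select the column and the remaining
    high bits select the row (word line).
    """
    if not 0 <= addr < 2 ** (r_bits + c_bits):
        raise ValueError("address out of range")
    row = addr >> c_bits
    column = addr & ((1 << c_bits) - 1)
    return row, column


# 4x4 example: R = 2, C = 1, so 3 address bits and 8 words of M = 2 bits.
for addr in range(8):
    print(addr, split_address(addr, r_bits=2, c_bits=1))
```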

  9. SRAM Read Timing (typical) • tAA (access time for address): time for stable output after a change in address. • tACS (access time for chip select): time for stable output after CS is asserted. • tOE (output enable time): time for low impedance when OE and CS are both asserted. • tOZ (output-disable time): time to high-impedance state when OE or CS are negated. • tOH (output-hold time): time data remains valid after a change to the address inputs.
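
One way to read these definitions is that the output becomes valid only after every relevant access time has elapsed. The Python sketch below encodes that reading; the function name and all numeric values are hypothetical and do not come from any datasheet.

```python
def data_valid_time(t_addr, t_cs, t_oe, t_aa, t_acs, t_oe_delay):
    """Earliest time at which DOUT is valid.

    t_addr, t_cs, t_oe: times at which the address became stable and
    CS and OE were asserted. t_aa, t_acs, t_oe_delay: the tAA, tACS
    and tOE parameters. All values share one time unit.
    """
    return max(t_addr + t_aa, t_cs + t_acs, t_oe + t_oe_delay)


# Hypothetical values in ns: CS arrives 2 ns after the address, OE 5 ns after.
print(data_valid_time(t_addr=0, t_cs=2, t_oe=5, t_aa=10, t_acs=10, t_oe_delay=4))
```

In this example the late chip select, not the address, sets the access time.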

  10. SRAM Read Timing (typical) • [Timing diagram: ADDR, CS_L, OE_L and DOUT with WE_L = HIGH; intervals tAA, tACS, tOE, tOZ and tOH marked; first valid output after Max(tAA, tACS)]

  11. SRAM Architecture and Read Timings • [Figure annotated with tAA, tACS, tOE, tOZ, tOH]

  12. SRAM write cycle timing • ~WE controlled • ~CS controlled

  13. SRAM Architecture and Write Timings • [Figure: write driver; annotations tDH, Setup time = tDW, tWP-tDW]

  14. SRAM Cell Design • Memory arrays are large • Need to optimize cell design for area and performance • Peripheral circuits can be complex • 60-80% area in array, 20-40% in periphery • Classical Memory cell design • 6T cell full CMOS • 4T cell with high resistance poly load • TFT load cell

  15. Anatomy of the SRAM Cell • Write: • set bit lines to new data value • b’ = ~b • raise word line to “high” • sets cell to new state • Low impedance bit-lines • Read: • set bit lines high • set word line high • see which bit line goes low • High impedance bit lines

  16. SRAM Cell Operating Principle • Inverter Amplifies • Negative gain • Slope < –1 in middle • Saturates at ends • Inverter Pair Amplifies • Positive gain • Slope > 1 in middle • Saturates at ends

  17. Ball on Ramp Analogy • Bistable Element • Stability • Require Vin = V2 • Stable at endpoints: recover from perturbation • Metastable in middle: fall out when perturbed
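
The bistable behaviour described here can be illustrated with a toy numerical model. The sketch below is an assumption-laden illustration, not a circuit simulation: it models each inverter with a tanh transfer curve (an arbitrary choice), and the names `inverter` and `settle` and the gain value are hypothetical.

```python
import math


def inverter(vin: float, vdd: float = 1.0, gain: float = 10.0) -> float:
    """Idealized inverter: negative slope in the middle, saturating at the ends."""
    return vdd / 2 * (1 - math.tanh(gain * (vin - vdd / 2)))


def settle(v: float, steps: int = 20) -> float:
    """Pass the voltage repeatedly around the cross-coupled inverter pair."""
    for _ in range(steps):
        v = inverter(inverter(v))
    return v


print(settle(0.51))  # nudged above the midpoint: falls toward Vdd
print(settle(0.49))  # nudged below the midpoint: falls toward 0
print(settle(0.50))  # exactly at the midpoint: stays metastable
```

The midpoint is a valid solution of Vin = V2, but any perturbation grows because the loop gain there exceeds 1.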

  18. Cell Static Noise Margin • Cell state may be disturbed by • DC: layout pattern offset; process mismatches (non-uniformity of implantation, gate pattern size errors) • AC: alpha particles, crosstalk, voltage supply ripple, thermal noise • SNM (static noise margin) = maximum value of Vn that does not flip the cell state

  19. SNM: Butterfly Curves • [Figure: butterfly curves labeled 1 and 2, with the SNM marked in each lobe]

  20. SNM for Poly Load Cell

  21. 12T SRAM Cell • Basic building block: SRAM Cell • 1-bit/cell (noise margin again) • 12-transistor (12T) SRAM cell • Latch with TM-gate write • Separately buffered read

  22. 6T SRAM Cell • Cell size accounts for most of array size • Reduce cell size at cost of complexity/margins • 6T SRAM Cell • Read: • Precharge bit, bit_b • Raise wordline • Write: • Drive data onto bit, bit_b • Raise wordline

  23. SRAM Design • TI 65nm: 0.46 x 1.06 µm² • IBM 65nm: 0.41 x 1.25 µm² • Intel 65nm: 0.46 x 1.24 µm² • * Figures courtesy A. Chatterjee et al., P. Bai et al., and Z. Luo et al., Int. Electron Device Meeting Tech. Digest, 2004

  24. Vertical 6T Cell Layout • [Layout labels: B-, B+, N Well Connection, VDD, PMOS Pull Up, Q/, Q, NMOS Pull Down, GND, SEL, SEL MOSFET, Substrate Connection]

  25. SRAM Bitcell Design • Requirements of SRAM bitcell design • Stable read operation: do not disturb data when reading • Stable write operation: must write data within a specified time • Stable data retention: data should not be lost • Typical transistor sizing • Cell ratio (= I(PD) / I(PG)) = 1.5 ~ 2.5 • Pull-up ratio (= I(PU) / I(PG)) = 0.5 • [Figures: schematic, micrograph and layout of the bitcell, labeled with pull-ups PU1/PU2, pull-downs PD1/PD2, pass gates PG1/PG2, storage nodes NL/NR, WL, BL, BLB, VDD, VSS]
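
The two sizing ratios can be computed directly from the drive currents. The Python sketch below is illustrative only: the function names and the example currents are hypothetical, and the range check simply restates the typical cell-ratio window quoted on the slide.

```python
def bitcell_ratios(i_pd: float, i_pg: float, i_pu: float) -> tuple[float, float]:
    """Return (cell ratio, pull-up ratio) from pull-down, pass-gate and pull-up currents."""
    cell_ratio = i_pd / i_pg      # read stability: pull-down versus pass gate
    pull_up_ratio = i_pu / i_pg   # writability: pull-up versus pass gate
    return cell_ratio, pull_up_ratio


def cell_ratio_is_typical(cell_ratio: float) -> bool:
    """True if the cell ratio lies in the 1.5 to 2.5 window quoted on the slide."""
    return 1.5 <= cell_ratio <= 2.5


# Hypothetical drive currents in microamps, for illustration only.
cell, pull_up = bitcell_ratios(i_pd=40.0, i_pg=20.0, i_pu=10.0)
print(cell, pull_up, cell_ratio_is_typical(cell))
```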

  26. Detailed SRAM Bitcell Layout • Vertical: 2 poly pitch • Horizontal: 5 contact pitch • Poly-to-contact space > overlay (6.4nm*) + spacer (8nm**) + strain_layer (10nm**) + CD_control (2.6nm*) = 27nm • 1 poly pitch = 2 poly_to_contact + poly_width + contact_width = 54 + 32 + 45** = 131 nm • A pitch is a multiple of a drawing grid for fine-grain pattern placement • Ex.: 5 grid per pitch → drawing grid = (131/5) ≈ 26 nm • Ex.: 6 grid per pitch → drawing grid = (131/6) ≈ 22 nm • * From ITRS 32nm tech. • ** From S. Verhaegen et al., SPIE Adv. Litho., 2008 • [Figure labels: poly, CNT, spacer, strain layer]
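
The pitch arithmetic on this slide can be reproduced as a short Python check. The numbers are the ones quoted on the slide; the constant and function names are hypothetical, and pairing each value with its term follows the order in which they appear.

```python
# Values in nm, taken from the slide in the order listed there.
OVERLAY_NM = 6.4
SPACER_NM = 8.0
STRAIN_LAYER_NM = 10.0
CD_CONTROL_NM = 2.6
POLY_WIDTH_NM = 32.0
CONTACT_WIDTH_NM = 45.0


def poly_to_contact_space() -> float:
    """Minimum poly-to-contact space as the sum of the four margins."""
    return OVERLAY_NM + SPACER_NM + STRAIN_LAYER_NM + CD_CONTROL_NM


def poly_pitch(space_nm: float) -> float:
    """One poly pitch: two poly-to-contact spaces plus poly and contact widths."""
    return 2 * space_nm + POLY_WIDTH_NM + CONTACT_WIDTH_NM


def drawing_grid(pitch_nm: float, grids_per_pitch: int) -> float:
    """Drawing grid when a pitch is divided into a whole number of grids."""
    return pitch_nm / grids_per_pitch


space = poly_to_contact_space()
pitch = poly_pitch(space)
print(round(space, 1), round(pitch, 1))
print(round(drawing_grid(pitch, 5)), round(drawing_grid(pitch, 6)))
```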

  27. SRAM Read • Precharge both bitlines high • Then turn on wordline • One of the two bitlines will be pulled down by the cell • Ex: A = 0, A_b = 1 • bit discharges, bit_b stays high • But A bumps up slightly • Read stability • A must not flip • N1 >> N2

  28. SRAM Read, 0 is stored in the cell

  29. SRAM Write • Drive one bitline high, other low • Then turn on wordline • Bitlines overpower cell • Ex: A = 0, A_b = 1, bit = 1, bit_b = 0 • Force A_b low, then A rises high • Writability • Must overpower feedback • P2 << N4 to force A_b low, • N1 turns off, P1 turns on, • raise A high as desired

  30. SRAM Sizing • High bitlines must not overpower inverters during reads • But low bitlines must write new value into cell

  31. SRAM Column Example read write

  32. Decoders • n:2^n decoder consists of 2^n n-input AND gates • One needed for each row of memory • Build AND from NAND or NOR gate • Choose minimum size to reduce load on the address lines • [Figure labels: Pseudo-nMOS, static]
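
A behavioural sketch of the n:2^n decoder, written in Python rather than as a circuit, may help make the structure concrete. Each output is the AND of n literals (an address bit or its complement). The function name `decode` and the least-significant-bit-first ordering are assumptions made for this illustration.

```python
def decode(address_bits: list[int]) -> list[int]:
    """n:2^n decoder. address_bits[i] is address bit i (LSB first).

    Output `row` is an n-input AND of literals: the true bit where the
    row index has a 1, the complemented bit where it has a 0.
    """
    n = len(address_bits)
    outputs = []
    for row in range(2 ** n):
        term = 1
        for i, bit in enumerate(address_bits):
            wanted = (row >> i) & 1
            term &= bit if wanted else 1 - bit
        outputs.append(term)
    return outputs


print(decode([1, 0]))  # address 1 -> [0, 1, 0, 0]
```

Exactly one word line is high for any address, which is the one-hot property the row decoder needs.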

  33. Single Pass-Gate Mux • Bitlines propagate through 1 transistor

  34. Decoder Layout • Decoders must be pitch-matched to SRAM cell • Requires very skinny gates

  35. Large Decoders • For n > 4, NAND gates become slow • Break large gates into multiple smaller gates

  36. Predecoding • Many of these gates are redundant • Factor out common gates into predecoder • Saves area • Same path effort
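
To illustrate the idea of factoring out common gates, the Python sketch below builds a 4:16 decoder from two 2:4 predecoders followed by one 2-input AND per word line. The 2 + 2 split of a 4-bit address and the function names are assumptions made for the example; the slide does not specify a particular grouping.

```python
def one_hot(value: int, width: int) -> list[int]:
    """Reference decoder: 2**width outputs, only output `value` is 1."""
    return [1 if i == value else 0 for i in range(2 ** width)]


def predecoded_wordlines(addr: int) -> list[int]:
    """4:16 decode using two shared 2:4 predecoders and 16 two-input ANDs."""
    low = one_hot(addr & 0b11, 2)    # predecoder for address bits 1..0
    high = one_hot(addr >> 2, 2)     # predecoder for address bits 3..2
    return [high[row >> 2] & low[row & 0b11] for row in range(16)]


# The predecoded version matches a direct 4:16 decode for every address.
print(all(predecoded_wordlines(a) == one_hot(a, 4) for a in range(16)))
```

Each final gate now has 2 inputs instead of 4, and the 8 predecoded lines are shared by all 16 rows.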

  37. Column Circuitry • Some circuitry is required for each column • Bitline conditioning • Sense amplifiers • Column multiplexing • Each column must have write drivers and read sensing circuits

  38. Column Multiplexing • Recall that array may be folded for good aspect ratio • Ex: 2k word x 16 folded into 256 rows x 128 columns • Must select 16 output bits from the 128 columns • Requires 16 8:1 column multiplexers
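
A behavioural sketch of this column selection in Python follows. It assumes that the 8 columns belonging to one output bit sit next to each other, which the slide does not state, and the name `select_word` is hypothetical.

```python
def select_word(row_bits: list[int], word_width: int, column_address: int) -> list[int]:
    """Pick `word_width` output bits from one row using identical N:1 muxes.

    Assumption: columns j*ratio .. j*ratio + ratio - 1 feed the mux for
    output bit j, and every mux shares the same column address.
    """
    ratio = len(row_bits) // word_width   # 128 // 16 = 8, so 8:1 muxes
    if not 0 <= column_address < ratio:
        raise ValueError("column address out of range")
    return [row_bits[j * ratio + column_address] for j in range(word_width)]


# Label each of the 128 columns with its own index to see which are chosen.
row = list(range(128))
print(select_word(row, word_width=16, column_address=3))
```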

  39. Typical Column Access

  40. Pass Transistor Based Column Decoder • Advantage: speed, since there is only one extra transistor in the signal path • Disadvantage: large transistor count • [Figure: 2 input NOR decoder on A0, A1 generating S0 to S3, which gate bit-line pairs BL0/!BL0 to BL3/!BL3 onto Data/!Data]

  41. Tree Decoder Mux • Column MUX can use pass transistors • Use nMOS only, precharge outputs • One design is to use k series transistors for 2^k:1 mux • No external decoder logic needed
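
The tree structure can be modelled behaviourally in Python: each of the k levels uses one address bit to pass half of the remaining lines, so the selected bitline goes through k series devices. The function name `tree_mux` and the bit ordering are assumptions for this sketch.

```python
def tree_mux(inputs: list, select_bits: list[int]):
    """2^k:1 tree mux. select_bits[0] controls the level nearest the bitlines.

    Each level keeps one line out of every adjacent pair, so k levels
    reduce 2^k inputs to a single output with no separate decoder.
    """
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("need exactly 2**k inputs for k select bits")
    level = list(inputs)
    for s in select_bits:
        level = [level[2 * i + s] for i in range(len(level) // 2)]
    return level[0]


# 8:1 mux, k = 3. Select bits (LSB first) 1, 1, 0 pick input index 3.
print(tree_mux(list("abcdefgh"), [1, 1, 0]))  # d
```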

  42. Ex: 2-way Muxed SRAM • 2-to-1 mux: two bits from two cells, selected by A0

  43. Bitline Conditioning • Precharge bitlines high before reads • Equalize bitlines to minimize voltage difference when using sense amplifiers

  44. Sense Amplifier: Why? • Bit line cap significant for large array • If each cell contributes 2fF, for 256 cells, 512fF plus wire cap • Pull-down resistance is about 15K • RC ≈ 7.5ns! (assuming ΔV = Vdd) • Cannot easily change R, C, or Vdd, but can change ΔV, i.e. smallest sensed voltage • Can reliably sense ΔV smaller than 50mV • [Figure labels: cell pull-down Xtor resistance, cell current]
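
The numbers on this slide can be checked with a short Python sketch. The product of the quoted values is 7.68 ns, which the slide rounds to about 7.5 ns. The supply voltage of 2.5 V is an assumption (it is not stated on this slide), and the linear scaling with ΔV is only the small-swing approximation of an RC discharge; wire capacitance is ignored.

```python
CELL_CAP_F = 2e-15         # 2 fF per cell, from the slide
CELLS_PER_BITLINE = 256    # from the slide
R_PULLDOWN_OHM = 15e3      # about 15K, from the slide
VDD_V = 2.5                # assumption: not given on this slide


def bitline_rc() -> float:
    """RC time constant of the bitline, ignoring wire capacitance."""
    return CELLS_PER_BITLINE * CELL_CAP_F * R_PULLDOWN_OHM


def sense_delay(delta_v: float, vdd: float = VDD_V) -> float:
    """Approximate time to develop delta_v on the bitline.

    Uses the small-swing approximation t = RC * delta_v / vdd, valid
    when delta_v is much smaller than vdd.
    """
    return bitline_rc() * delta_v / vdd


print(bitline_rc())        # full-swing time constant in seconds
print(sense_delay(0.05))   # time to develop 50 mV in seconds
```

Under these assumptions, sensing 50 mV instead of waiting for a full swing cuts the bitline delay by a factor of 50.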

  45. Sense Amplifiers • Bitlines have many cells attached • Ex: 32-kbit SRAM has 128 rows x 256 cols • 128 cells on each bitline • tpd ∝ (C/I) ΔV • Even with shared diffusion contacts, 64C of diffusion capacitance (big C) • Discharged slowly through small transistors (small I) • Sense amplifiers are triggered on small voltage swing (reduce ΔV)

  46. Differential Pair Amp • Differential pair requires no clock • But always dissipates static power

  47. Clocked Sense Amp • Clocked sense amp saves power • Requires sense_clk after enough bitline swing • Isolation transistors cut off large bitline capacitance

  48. Sense Amp Waveforms • [Waveform plot, 1ns / div: bit and bit' separated by 200mV; BIT and BIT' swinging 2.5V; wordline; sense clk; annotation: begin precharging bit lines]

  49. Write Driver Circuits
