Content Addressable Memories: Cell Design and Peripheral Circuits
CAM: Introduction
• CAM vs. RAM
[Figure: example tables contrasting a CAM (Data In, Address Out) with a RAM (Address In, Data Out)]
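The CAM vs. RAM contrast can be sketched in a few lines of Python. This is a toy model only: the stored words and the helper names `ram_read` and `cam_search` are illustrative assumptions, not taken from the slides. A RAM returns the data stored at a given address; a CAM compares the input data against every stored word and returns the matching address(es).

```python
# Toy model: a RAM maps address -> data, a CAM maps data -> address.
# The stored words are illustrative values, not the table from the slide.
TABLE = ["10110001", "11010011", "10111000", "11100011"]

def ram_read(address):
    """RAM: address in, data out."""
    return TABLE[address]

def cam_search(data):
    """CAM: data in, address(es) out.

    Hardware compares all stored words in parallel; this loop only
    models the result of that comparison.
    """
    return [addr for addr, word in enumerate(TABLE) if word == data]

print(ram_read(2))             # data stored at address 2
print(cam_search("10111000"))  # addresses that hold this word
```

The list comprehension stands in for the parallel compare, which is the property that makes a CAM fast and also power-hungry.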
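CAM: Introduction
• Binary CAM Cell
• ML pre-charged to VDD
• Match: ML remains at VDD
• Mismatch: ML discharges
[Figure: binary CAM cell schematic; labels: transistors N1-N8, P1, P2; search lines SL1, SL1c; bit lines BL1, BL1c; cell nodes BL1_cell, BL1c_cell; word line WL; match line ML]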
CAM: Introduction
• Ternary CAM (TCAM)
[Figure: two example search tables in which input keywords and stored words contain don't-care bits (X); matching entries are marked "Match"]
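Ternary matching can be sketched as follows. This is a minimal model under stated assumptions: words are strings over '0', '1', and 'X'; an 'X' on either the stored word or the input keyword matches any bit; the table contents and the names `tcam_match` and `tcam_search` are hypothetical.

```python
def tcam_match(stored, key):
    """A bit position matches if both bits are equal or either one is 'X'."""
    return len(stored) == len(key) and all(
        s == k or s == "X" or k == "X" for s, k in zip(stored, key)
    )

def tcam_search(table, key):
    """Return the addresses of all stored words that match the key."""
    return [addr for addr, word in enumerate(table) if tcam_match(word, key)]

# Illustrative table (not the slide's data); 'X' marks a don't-care bit.
TABLE = ["1011XXXX", "110100XX", "10111XXX", "111XXXXX"]

print(tcam_search(TABLE, "10110001"))  # fully specified keyword
print(tcam_search(TABLE, "1011XXXX"))  # keyword with masked low bits
```

Because several words can match one keyword, a real TCAM follows the match lines with a priority encoder; that stage is omitted here.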
CAM: Introduction
• TCAM Cell
• Global Masking: SLs
• Local Masking: BLs
[Figure: TCAM cell block diagram; labels: two RAM cells, comparison logic, SL1, SL2, BL1, BL1c, BL2, BL2c, WL, ML]
CAM: Introduction
• DRAM based TCAM Cell
• Higher bit density
• Slower table update
• Expensive process
• Refreshing circuitry
• Scaling issues (Leakage)
[Figure: DRAM-based TCAM cell schematic; labels: N3-N8, BL1, BL2, BL1_cell, BL2_cell, SL1, SL2, WL, ML]
CAM: Introduction
• SRAM based TCAM Cell
• Standard CMOS process
• Fast table update
• Large area (16T)
[Figure: SRAM-based TCAM cell schematic; labels: BL1, BL1c, BL2, BL2c, BL1c_cell, BL2c_cell, SL1, SL2, WL, ML]
CAM: Introduction
• Block diagram of a 256 x 144 TCAM
[Figure: array of CAM cells (0) to (143) per word; SL drivers on search lines SL1(0), SL2(0) to SL1(143), SL2(143); match lines ML0 to ML255, each with an ML sense amplifier (MLSA) producing MLSO(0) to MLSO(255); bit lines BL1c, BL2c]
CAM: Introduction
• Why low-power TCAMs?
• Parallel search → very high power (2 Mb SiberCore TCAM, 66 MHz, 66 Msps: 3.4 W)
• IPv6, OC-768 → larger word size, larger number of entries → high power
• Embedded applications (SoC)
CAM: Introduction
• Why high-performance TCAMs?
• OC-768: 135M packets/s (7.4 ns/packet)
• Application complexity → multiple searches
• IPv6 → larger word size → larger search time
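The 7.4 ns figure follows directly from the packet rate. The short sketch below reproduces the arithmetic and shows how the per-search time budget shrinks when one packet needs several searches; the function name `search_budget_ns` and the searches-per-packet values are illustrative assumptions.

```python
def search_budget_ns(packets_per_second, searches_per_packet=1):
    """Time available per search, in nanoseconds."""
    time_per_packet_ns = 1e9 / packets_per_second
    return time_per_packet_ns / searches_per_packet

OC768_PACKET_RATE = 135e6  # packets per second, from the slide

print(f"{search_budget_ns(OC768_PACKET_RATE):.1f} ns per packet")
for n in (2, 4):  # hypothetical numbers of searches per packet
    print(f"{n} searches/packet: {search_budget_ns(OC768_PACKET_RATE, n):.2f} ns each")
```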
CAM: Design Techniques
• Cell Design: 12T Static TCAM cell*
• ‘0’ is retained by leakage (VWL ~ 200 mV)
• High density
• Leakage (3 orders of magnitude)
• Noise margin
• Soft errors (node S)
• Unsuitable for READ
* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
CAM: Design Techniques
• Cell Design: NAND vs. NOR Type CAM
• Low Power
• Charge-sharing
• Slow
[Figure: NOR-type CAM (ML_NOR) and NAND-type CAM (ML_NAND), each with CAM cells (0) to (N) and a sense amplifier (SA), plus the cell schematics; labels: SL1, SL1c, BL1, BL1c, WL, VDD]
CAM: Design Techniques
• MLSA Design: Conventional
• Pre-charge ML to VDD
• Match: VML = VDD
• Mismatch: VML = 0
[Figure: conventional MLSA schematic; labels: VDD, PRE, ML, MLSO, MM]
CAM: Design Techniques
• MLSA Design: Current Race Sensing*
[Figure: current-race sensing schematic; labels: VDD, RST, RSTc, ML, MLSO, MLOFF, MM, MATCH, Delay, Dummy ML]
* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
CAM: Design Techniques
• MLSA Design: Current Race Sensing
• No need to reset SLs in every clock cycle
• Lower ML voltage swing (Vth + ∆V) ≈ ½VDD
• Speed Current Voltage Margin
[Figure: waveforms of MLSO[0], ML[0], and ML[1] showing the voltage margin]
CAM: Design Techniques
• MLSA Design: Charge Redistribution*
• Fast pre-charge ML through MREF
• Mismatch: SP = ‘0’, MLSO = ‘1’
• IML > IREF > leakage
• ∆VML (VREF – Vth)
• FAST_PRE: high power
[Figure: charge-redistribution MLSA schematic; labels: VDD, FAST_PRE, IREF, VREF, MREF, SP, CSP, ML, CML, MLSO, RST]
* P. Vlasenko, D. Perry, MOSAID Technologies Inc., US Patent 6717876, April 6, 2004
CAM: Design Techniques
• MLSA Design: Charge Injection*
• Reset ML and pre-charge CINJ
• Charge share CINJ and CML
• Match: VML = CINJ x VDD / (CINJ + CML)
• Mismatch: VML = 0
• Small ∆VML
• Poor noise margin
• Area penalty (CINJ)
[Figure: charge-injection MLSA schematic; labels: VDD, CHARGE_IN, PRE, OFFSET, SA, ML, MLSO, CINJ, CML, RST]
* G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, SONY Corp., Proc. IEEE CICC, pp. 387-390, Sep. 2003
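The charge-sharing expression for the match case can be evaluated numerically to see why the swing is small. The sketch below only applies the slide's formula; the capacitance and supply values are assumed for illustration and are not taken from the cited paper.

```python
def match_voltage(c_inj, c_ml, vdd):
    """ML voltage on a match after CINJ (pre-charged to VDD) shares charge with CML."""
    return c_inj * vdd / (c_inj + c_ml)

# Assumed example values, chosen only to illustrate the formula.
C_INJ = 50e-15   # injection capacitor, farads
C_ML = 200e-15   # match-line capacitance, farads
VDD = 1.8        # supply voltage, volts

v_match = match_voltage(C_INJ, C_ML, VDD)
print(f"VML on match: {v_match:.2f} V (mismatch: 0 V)")
```

With these assumed values the match and mismatch levels are separated by only a fraction of VDD, which is consistent with the small ∆VML and poor noise margin listed on the slide; enlarging CINJ widens the swing at the cost of area.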
CAM: Design Techniques
• Low Power: Selective Pre-charge*
• MLs: two segments
• If MATCH in pre-search → main-search
• Number of bits in pre-search: depends on data statistics
[Figure: match line split into a pre-search segment (ML1, MLSA1, MLSO1) and a main-search segment (ML2, MLSA2, MLSO2)]
* C. Zukowski and S. Wang, Proc. IEEE ISCAS, pp. 745-770, Jun. 9-12, 1997
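The two-stage idea can be modelled in software to count how many main-search segments are actually evaluated. This is a behavioural sketch only, assuming binary words, a pre-search over the leading bits, and an illustrative table; the function name `selective_precharge_search` is hypothetical.

```python
def selective_precharge_search(table, key, pre_bits):
    """Return (matching addresses, number of main-search segments evaluated)."""
    matches = []
    main_evaluated = 0
    for addr, word in enumerate(table):
        if word[:pre_bits] != key[:pre_bits]:
            continue  # pre-search mismatch: the main segment stays idle
        main_evaluated += 1
        if word[pre_bits:] == key[pre_bits:]:
            matches.append(addr)
    return matches, main_evaluated

# Illustrative table, not the slide's data.
TABLE = ["10110001", "11010011", "10111000", "11100011"]

matches, main_evaluated = selective_precharge_search(TABLE, "10110001", pre_bits=3)
print(matches, f"{main_evaluated} of {len(TABLE)} main segments evaluated")
```

How many main segments are skipped depends on how well the pre-search bits discriminate between stored words, which is why the slide ties the pre-search width to data statistics.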
CAM: Design Techniques
• Low Power: Dual-ML TCAM*
• MLSA1 is enabled first
• MLSA2 is enabled if MLSO1 = ‘1’
[Figure: dual-ML TCAM word; CAM cells (0) to (N) connected to ML1 (MLSA1, MLSO1) and ML2 (MLSA2, MLSO2); labels: SL1(0), SL2(0) to SL1(N), SL2(N); BL1c, BL2c]
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques
• Low Power: Dual-ML TCAM
• Cap(ML1) = Cap(ML2) = ½ C(ML)
• Same speed, 50% less energy (ideally!)
• Parasitic interconnects degrade both speed and energy
• Additional ML increases coupling capacitance
CAM: Design Techniques
• Low Power: Dual-ML TCAM
• Simulation results (144 bits)*
• Interconnect cap. = 27 fF
• W/L = 0.6 µm / 0.18 µm
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques
• Low Power: Dual-ML TCAM*
• EAVG = PML1 x E1 + (1 – PML1) x E2
• SA1 cannot detect Type I
• For ‘M’ mismatches, PML1 = 1 – (0.5)^M
[Figure: cell detail; labels: ML1, SL1, BL1c]
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
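The average-energy expression can be evaluated for a few mismatch counts. The sketch below applies the two slide formulas as written. It assumes that PML1 is the probability that the mismatch is caught on ML1, that E1 is the energy of a search that stops after ML1, and that E2 is the energy of a search that also evaluates ML2. The normalized values E1 = 0.5 and E2 = 1.0 are an idealized assumption based on each ML having half the capacitance; they are not simulation results from the cited paper.

```python
def p_ml1(num_mismatches):
    """PML1 = 1 - (0.5)^M for M mismatching bits."""
    return 1 - 0.5 ** num_mismatches

def average_energy(p, e1, e2):
    """EAVG = PML1 * E1 + (1 - PML1) * E2."""
    return p * e1 + (1 - p) * e2

# Idealized, normalized energies (assumption): ML1 only vs. ML1 and ML2.
E1, E2 = 0.5, 1.0

for m in (1, 2, 3, 8):
    p = p_ml1(m)
    print(f"M={m}: PML1={p:.4f}, EAVG={average_energy(p, E1, E2):.4f}")
```

Under these assumptions the average energy approaches E1 as the number of mismatching bits grows, since a word with many mismatches is very likely to be rejected on ML1.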
CAM: Design Techniques
• Low Power: Dual-ML TCAM*
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques
• Low Power: Hierarchical SLs*
• 144 bits (5 segments: 8, 34, 34, 34, 34)
• SLs: multiple blocks (64 words each)
• ∆VGSL: 0.45 V (VDD = 1.8 V)
• Logic complexity
• Search time/latency
• 64-bit OR gates
* Pagiamtzis et al., Proc. IEEE CICC, pp. 383-386, Sep. 2003
CAM: Design Techniques
• Static Power Reduction
• 16T TCAM: Leakage Paths*
[Figure: 16T TCAM cell schematic annotated with stored ‘0’/‘1’ node values and leakage paths; labels: N1-N12, P1-P4, BL1, BL1c, BL2, BL2c, BL1c_cell, BL2c_cell, SL1, SL2, WL, ML]
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques
• Static Power Reduction
• Technology scaling [1]
• Dimensions: 30% smaller
• Dynamic power: 50% lower
• Leakage current: 5x higher
• Architectural-level techniques [2, 3]
• A small portion is enabled
[1] S. Borkar, IEEE Micro, pp. 23-29, Jul.-Aug. 1999
[2] K. Pagiamtzis, A. Sheikholeslami, Proc. IEEE CICC, pp. 383-386, Sep. 2003
[3] G. Kasai, Y. Takarabe, K. Furumi, M. Yoneda, Proc. IEEE CICC, pp. 387-390, Sep. 2003
CAM: Design Techniques
• Static Power Reduction
• Leakage current*
• Lower VDD → lower ISUB
* R. X. Gu, M. I. Elmasry, IEEE JSSC, vol. 31, no. 5, pp. 707-713, May 1996
CAM: Design Techniques
• Static Power Reduction
• Side Effects of VDD Reduction in TCAM Cells
• Speed: No change
• Dynamic power: No change
• Robustness
• VDD vs. voltage margin (current-race sensing)
[Figure: waveforms of MLSO[0], ML[0], and ML[1] showing the voltage margin]
CAM: Design Techniques
• Static Power Reduction
• Voltage Margin of 144-bit TCAM word in 0.18 µm CMOS*
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques
• Static Power Reduction
• Effects of Technology Scaling*
• Berkeley predictive technology model (BPTM)
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004