Seongguk Kim Advisor: Prof. Joungho Kim TeraByte Interconnection and Package Laboratory

Signal Integrity Design of Energy-Efficient Processing-In-Memory in High Bandwidth Memory (PIM-HBM) Architecture for AI Applications
Seongguk Kim
Advisor: Prof. Joungho Kim
TeraByte Interconnection and Package Laboratory, Department of Electrical Engineering, KAIST
Mar 4th, 2019

Proposed Processing-In-Memory in High Bandwidth Memory (PIM-HBM) Architecture for AI Applications
< Conceptual view of the proposed PIM-HBM Architecture on silicon interposer >
(Figure labels: Host Processor, Host Core, PHY, PIM-HBM, stacked DRAM dies, High-speed TSV channel, Parallel PIM Architecture, PIM, Compute-intensive applications, Memory-intensive applications, On-chip channel, On-interposer channel, Silicon Interposer)
• Low energy consumption & latency by reduced interconnection length
• Extremely high bandwidth by high-speed TSV channel
• High performance by offloading memory-intensive applications to PIM → improved system efficiency (performance/Watt)

Research Motivation & Objective
• Motivations
  • Processing-In-Memory (PIM) → Essential for energy-efficient, high-performance systems
  • Interconnection → An important factor limiting system performance scaling
  • High-speed TSV → A method for extending the bandwidth of PIM to offload memory-intensive applications
• Objectives
  • Propose a PIM-HBM architecture for high-bandwidth systems such as data centers and AI servers
  • Design and analyze on-chip & on-interposer channels to verify the energy consumption and interconnection delay of the PIM-HBM architecture
  • Design and analyze a high-speed TSV channel topology & TSV array designs to provide high bandwidth to PIM cores for high performance

Floorplan of the Proposed PIM-HBM Architecture for Reducing Data Movement Cost between DRAM and Processor
< Floorplan figure; labels: Host Processor, Host core, L2 Cache / Interconnection network, Memory Controller (three instances), PIM-HBM (four instances), PIM core, PIM channel, Host channel, On-chip channel, On-interposer channel; annotated dimensions: 7.75 mm, 6.0 mm, 11.87 mm, 28.2 mm, 26.5 mm, 44.0 mm, 34.5 mm >
• Embedded PIM cores → reduced interconnection length → lower energy consumption & latency

Proposed TSV Array Design with High-Speed Signaling to Provide Extremely High Bandwidth to PIM Cores
< Spiral P2P TSV channel topology > < Ground distributed TSV array >
(Figure labels: Base, DRAM1 to DRAM8, RDL, TSV, Driver, Load, Stable Impedance, Crosstalk Reduction)
• For high-speed signaling, signal integrity design of the TSV channel is essential.
• The spiral P2P TSV reduces energy consumption by decreasing the capacitive load.
• The ground distributed TSV array provides a stable return current path and reduces TSV-to-TSV crosstalk.
*RDL = Redistribution layer
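The link between capacitive load and energy in the second bullet can be illustrated with the standard dynamic-energy relation for a capacitive load, E = p · C · VDD², where p is the probability of a 0→1 transition per bit (1/4 for random NRZ data). The sketch below is illustrative only: the per-die load capacitance is a hypothetical value, the supply voltage is an assumption, and the model ignores the TSV and RDL parasitics that a real channel simulation would include.

```python
def energy_per_bit_pj(c_load_pf, vdd_v, transition_prob=0.25):
    """Average supply energy per bit in pJ for a purely capacitive load.

    pF * V^2 gives pJ. transition_prob is the chance of a 0->1 edge per bit
    (0.25 for random NRZ data).
    """
    return transition_prob * c_load_pf * vdd_v ** 2


# Hypothetical numbers, not taken from the slides.
C_PER_DIE_PF = 0.2   # assumed input load of one DRAM die
VDD_V = 1.0          # assumed supply voltage
N_DIES = 8           # 8-Hi stack

# Multidrop: the driver sees the load of every die on the shared TSV.
multidrop_pj = energy_per_bit_pj(N_DIES * C_PER_DIE_PF, VDD_V)
# Point-to-point: the driver sees the load of a single die.
p2p_pj = energy_per_bit_pj(C_PER_DIE_PF, VDD_V)

print(f"multidrop: {multidrop_pj:.3f} pJ/bit, P2P: {p2p_pj:.3f} pJ/bit")
```

Under these assumptions the energy scales linearly with the load the driver sees, which is the effect the P2P topology exploits.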

Design of On-chip & On-interposer Channels to Verify DRAM Access Cost of PIM-HBM Architecture
< Schematic of on-chip channel and on-interposer channel with I/O driver >
(Figure labels: Tx, Repeater, ESD, Pad, Rx, On-chip global channel, Interposer channel)
* On-chip channel: channel length 1.5 mm, data rate 3 Gbps
* On-interposer channel: channel length 4.8 mm, data rate 2 Gbps
< Designed stack-up of on-chip channel and on-interposer channel with material properties >
(Figure labels: signal lines (S), Power-grid, Ground, Meshed Ground, Passivation, SiO2 layers; layer dimensions between 0.25 µm and 4.5 µm; σCu = 5.8 × 10^7 S/m; relative permittivity: Si3N4 = 6.5, SiO2 = 4.1)
• The I/O driver and stack-up are designed with a 0.18 µm CMOS process and fabrication process.
• For the on-chip channel, a single stage is 1.5 mm long, and the channel is extended to several stages with repeaters depending on the total interconnection length.
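The repeater rule in the last bullet amounts to dividing the total on-chip length by the 1.5 mm stage length and rounding up. A minimal sketch follows; the function name and the example lengths are illustrative assumptions, not values the slides assign to a specific on-chip route.

```python
import math

STAGE_LENGTH_MM = 1.5  # single-stage on-chip channel length stated on the slide


def on_chip_stages(total_length_mm, stage_length_mm=STAGE_LENGTH_MM):
    """Number of 1.5 mm stages needed to cover the total on-chip length."""
    if total_length_mm <= 0:
        raise ValueError("total_length_mm must be positive")
    return math.ceil(total_length_mm / stage_length_mm)


def repeaters_needed(total_length_mm, stage_length_mm=STAGE_LENGTH_MM):
    """One repeater sits between each pair of consecutive stages."""
    return on_chip_stages(total_length_mm, stage_length_mm) - 1


# Hypothetical route lengths chosen only to exercise the helper.
for length_mm in (1.5, 6.0, 7.75):
    print(length_mm, on_chip_stages(length_mm), repeaters_needed(length_mm))
```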

Comparison of DRAM Access Cost between PIM and Host in Terms of Interconnection Delay and Energy Consumption
< Bar charts: Interconnection delay [ps] and Energy per bit [pJ/bit], each split into Driver and Channel contributions; annotated values: 11.71 and 2.47; annotated reductions: 77 % and 79 % >
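The percentage annotations are relative reductions of the form (host − PIM) / host. As a quick check, the two annotated values 11.71 and 2.47 give a reduction of about 79 %, which matches one of the annotations. The extracted text does not say which chart these two values belong to, so the pairing below is an inference from the arithmetic, not something the slide states.

```python
def relative_reduction_percent(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Values as annotated on the slide; the unit (ps or pJ/bit) is not
# recoverable from the transcript, so none is assumed here.
reduction = relative_reduction_percent(11.71, 2.47)
print(f"{reduction:.1f} %")
```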

Design of Multi-layer Spiral TSV for High-Speed Signaling
< Cross-section of Multi-layer TSV in PIM-HBM >
(Figure labels: PIM-HBM, PIM Base Die, PIM, DRAM 1 to DRAM 16, Load Effect)
< Transfer function of full path TSV channel >
(Axes: Transfer Function (dB), 0 to -20, vs. Frequency (GHz), 0.1 to 20; curves: Multidrop w/ TSV @ 8-Hi, P2P w/ Spiral TSV @ 8-Hi, Multidrop w/ TSV @ 16-Hi, P2P w/ Spiral TSV @ 16-Hi)
• As the number of layers increases, the additional RDL for the spiral TSV increases insertion loss.
• However, overall signal performance improves because the capacitive load on the drivers is reduced.
*RDL = Redistribution layer
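The trade-off in the two bullets can be pictured with a first-order lumped RC model, where the -3 dB bandwidth is f = 1 / (2πRC). This is a rough sketch, not the full-path channel model behind the plotted transfer functions: the driver resistance and per-die capacitance are hypothetical, and the extra RDL loss of the spiral route is deliberately left out.

```python
import math


def rc_bandwidth_ghz(r_ohm, c_pf):
    """-3 dB bandwidth in GHz of a first-order RC low-pass."""
    c_farad = c_pf * 1e-12
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad) / 1e9


# Hypothetical values, not taken from the slides.
R_DRIVER_OHM = 50.0
C_PER_DIE_PF = 0.2


def multidrop_bandwidth_ghz(n_dies):
    # All die loads hang on the shared TSV, so the capacitances add up.
    return rc_bandwidth_ghz(R_DRIVER_OHM, n_dies * C_PER_DIE_PF)


def p2p_bandwidth_ghz():
    # The driver sees one die load regardless of stack height.
    return rc_bandwidth_ghz(R_DRIVER_OHM, C_PER_DIE_PF)


for n in (8, 16):
    print(n, round(multidrop_bandwidth_ghz(n), 2), round(p2p_bandwidth_ghz(), 2))
```

In this simplified model, doubling the stack height halves the multidrop bandwidth, while the point-to-point load stays constant; the RDL loss noted in the first bullet would reduce the P2P advantage somewhat.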

Eye-Diagram Simulation Results of Conventional Multidrop TSV and Proposed Spiral TSV with Ground Distribution
< Conventional multidrop TSV > @ 8 Gbps: high driver load, high crosstalk, eye closed
< Proposed spiral TSV with ground distribution > @ 8 Gbps: low driver load, stable impedance, crosstalk reduction
(Other annotations: Impedance Control; 39.4 ps (31.5 % of UI); 0.387 V (38.7 % VDD); axes: Voltage (V) vs. Time (ps))
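The percentages on the eye diagrams can be cross-checked from the data rate: at 8 Gbps one unit interval (UI) is 125 ps, so 39.4 ps is about 31.5 % of a UI, and 0.387 V being 38.7 % of VDD implies a 1.0 V supply. The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
def unit_interval_ps(data_rate_gbps):
    """Duration of one bit (unit interval) in picoseconds."""
    return 1000.0 / data_rate_gbps


ui_ps = unit_interval_ps(8.0)            # 125 ps at 8 Gbps
timing_percent = 100.0 * 39.4 / ui_ps    # slide annotation: 31.5 % of UI
implied_vdd_v = 0.387 / 0.387            # 0.387 V is 38.7 % of VDD

print(ui_ps, round(timing_percent, 1), implied_vdd_v)
```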

Conclusion
• Proposed a PIM-HBM architecture for high-bandwidth systems that reduces DRAM access cost in terms of interconnection delay and energy consumption.
• Designed and analyzed on-chip and on-interposer channels, considering signal integrity, to verify the DRAM access cost of the PIM-HBM architecture.
• Proposed and verified a high-speed TSV channel topology & TSV array designs to provide high bandwidth to PIM cores for high performance.
• For high-bandwidth systems such as AI servers, the proposed PIM-HBM architecture is essential to reduce interconnection cost and to continue scaling performance.

