
Nature vs Nurture for Artificial Intelligence SoCs

Presented by Haopeng Liu on Sept. 12, 2019. Explores the impact of nature vs. nurture on Artificial Intelligence SoCs: what defines an AI SoC, market trends, key challenges, processing, memory, connectivity, security, and more.


Presentation Transcript


  1. Nature vs Nurture for Artificial Intelligence SoCs Sept 12th, 2019 Haopeng Liu/刘好朋

  2. Agenda • What’s an AI SoC? • Market Trends • Key Challenges • Processing • Memory • Connectivity • Security • Summary

  3. What’s an AI SoC?

  4. Defining Artificial Intelligence • Artificial intelligence mimics human behavior • Machine learning uses advanced statistical algorithms to find patterns and improve AI • Deep learning is a specialized subset of machine learning that uses neural networks to recognize patterns in data [Diagram: Artificial Intelligence ⊃ Machine Learning (regression, Bayesian, clustering, decision trees, vector machines) ⊃ Deep Learning / Neural Networks (recurrent, convolutional, capsule, spiking neural networks)]

  5. What Makes it an AI SoC? • Most investment is dedicated to CNN, RNN, and some SNN • Solutions include a Software Development Kit (SDK) for mapping AI graphs to hardware • The most competitive solutions include neural network hardware acceleration [Slide callouts: neural network hardware acceleration represents the vast majority of inference and training today and the vast majority of investment for AI SoCs; it is the power-per-performance leader and has enabled better-than-human error rates]

  6. AI SoC Market Trends

  7. AI’s Insatiable Need for Compute, Memory, & Electricity • COMPUTE: ResNeXt-101 - >30B operations; Google’s Voice Recognition - >19B operations • MEMORY: ResNet-152 - >60M parameters; Google’s Voice Recognition - >34M parameters • ELECTRICITY: “AI workloads could consume 80% of all compute cycles and 10% of global electricity use by 2030” [Chart: model sizes in millions of weights] Model innovation is increasing the number of weights & multiplications needed
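Operation counts like the billions quoted above come straight from layer shapes. A minimal sketch of the arithmetic (the layer dimensions below are illustrative, not taken from the slide):

```python
def conv2d_macs(out_h, out_w, out_c, k_h, k_w, in_c):
    """Multiply-accumulate count for one convolution layer:
    every output element needs k_h * k_w * in_c multiplies."""
    return out_h * out_w * out_c * k_h * k_w * in_c

# Example: a 7x7 conv from 3 to 64 channels producing a 112x112 map
# (typical of a ResNet-style first layer) is already ~118M MACs.
macs = conv2d_macs(112, 112, 64, 7, 7, 3)
print(f"{macs / 1e6:.0f}M MACs for a single layer")
```

Summing this over every layer of a deep network is how totals reach tens of billions of operations per inference.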

  8. Data Center AI is Moving to the Edge • NVIDIA, Intel, and Google dominate AI data center market share (2018 data center AI revenue: NVIDIA ~$3 billion, Intel ~$1 billion; source: companies) • Edge computing expected CAGR of 20% to 50% by 2022 • End-node inference optimizations with compression can be time-intensive and costly AI capabilities will drive edge computing; the market opportunity is at the edge

  9. Key AI SoC Challenges

  10. Deep Learning SoC Challenges Unique requirements for processing, memory, connectivity, and security [Slide graphics: Specialized Processing, Memory Performance, Real-Time Connectivity]

  11. Leading AI Processor Options • ASIP Designer: design Application-Specific Instruction-set Processors with unlimited design flexibility; the example shows an LSTM (Long Short-Term Memory), a form of RNN, built via ASIP Designer • EV6x Embedded Vision Processor IP: vision cores + CNN with standard software toolchains (MetaWare EV software: OpenCV libraries, OpenVX API, C/C++ and OpenCL C compilers/debuggers, fast nSIM and EV VDK simulators, CNN mapping tool) [Block diagram: EV6x with a vision CPU of 1/2/4 cores (each with 512-bit vector DSP, 32-bit scalar, VFPU and SFPU), a scalable CNN engine (880/1760/3520 MAC configurations with 1D/2D convolution and classification units), streaming transfer unit, shared memory, sync & debug, and AXI interconnect] Best processing solutions & expertise. Synopsys Confidential Information
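The LSTM named above is what the custom instruction set ultimately has to compute. As a reference for what that workload looks like, here is the textbook LSTM cell update in plain NumPy (the toy sizes and random weights are illustrative, not from the slide; a real ASIP would implement these same gate equations in hardware):

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM time step. W maps the input, U maps the previous hidden
    state; the stacked pre-activation is split into the four gates."""
    z = W @ x + U @ h + b                      # shape (4 * hidden,)
    i, f, g, o = np.split(z, 4)                # input, forget, cell, output gates
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sig(f) * c + sig(i) * np.tanh(g)   # cell state update
    h_new = sig(o) * np.tanh(c_new)            # new hidden state
    return h_new, c_new

# Toy example: input dim 3, hidden dim 2
rng = np.random.default_rng(0)
x, h, c = rng.standard_normal(3), np.zeros(2), np.zeros(2)
W = rng.standard_normal((8, 3))
U = rng.standard_normal((8, 2))
b = np.zeros(8)
h, c = lstm_cell(x, h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```

The two dense matrix-vector products per step are why RNN accelerators are dominated by MAC arrays.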

  12. ARC HS47 DSP for Control and Scalar DSP Low-cost controller for AI engines • Dual issue / 64-bit LD/ST • ARCv2DSP ISA: code compatible with EMxD, higher Fmax • Designed-in multicore support, scalable from 1 to 4 cores • 1 x 32x32 MAC/cycle • 2 x 16x16 MAC/cycle • Complete C/C++-based tool suite (MetaWare development tools: libraries, C/C++ compilers/debuggers, fast nSIM and VDK simulators) [Block diagram: HS47 DSP (32-bit scalar, scalar FPU, I$/D$) alongside vision CPUs with 512-bit vector DSPs, a customer-specific AI engine with 512-bit SIMD slices, crossbars, vector closely-coupled memories (CCM), APEX user-defined extensions, and a network-on-chip]

  13. Specialized Processing Challenges in AI NATURE: processor hardware challenges • Identify application targets (vision, voice, pattern recognition) • Support new and emerging AI algorithms (CNN: VGG, AlexNet, ResNet; RNN; SNN) • Support training, inference, and compression (32-bit FP, 16-bit int, 8-bit int) • Heterogeneous compute (scalar, vector, matrix multiplication, coherency) NURTURING: system design optimization challenges • Framework support (TensorFlow, Caffe2, ONNX) • Mapping tool optimizations • Benchmarking • Simulation / prototyping (architectural exploration, software development, power analysis) Traditional processing architectures are insufficient; Synopsys supports your processing innovation

  14. Deep Learning SoC Challenges Unique requirements for processing, memory, connectivity, and security [Slide graphics: Specialized Processing, Memory Performance, Real-Time Connectivity]

  15. Memory Options for AI

  16. Why is HBM2/2E on Many AI Accelerators? Providing best-in-class pJ/bit & the highest possible bandwidth • HBM2: up to 8Gb die, up to 2400Mbps • HBM2E: up to 16Gb die, up to 3200Mbps • HBM2 provides more bandwidth and better power efficiency than DDR4 & GDDR5/5X • 8 x 64b DDR4 channels @ 3.2Gb/s: 204.8 GB/s • 1 x 1024b HBM2 stack @ 2Gb/s: 256 GB/s • HBM technologies provide the best roadmap for expanding bandwidth and better pJ/bit access with HBM2E & HBM3 • Chiplet technologies are trending • Synopsys provides proven solutions with optimized PPA, DFT, and interposer design support
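The two bandwidth figures above follow from interface width times per-pin data rate. A quick check of the slide's arithmetic:

```python
def bandwidth_gbs(width_bits, rate_gbps, channels=1):
    """Peak bandwidth in GB/s: channels x interface width x per-pin
    data rate, divided by 8 bits per byte."""
    return channels * width_bits * rate_gbps / 8

ddr4 = bandwidth_gbs(64, 3.2, channels=8)  # 8 x 64-bit DDR4 @ 3.2 Gb/s
hbm2 = bandwidth_gbs(1024, 2.0)            # one 1024-bit HBM2 stack @ 2 Gb/s
print(ddr4, hbm2)  # 204.8 256.0
```

The HBM2 stack wins despite a slower per-pin rate because its interface is 1024 bits wide; that width (via 2.5D interposer packaging) is also what makes the pJ/bit so low.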

  17. Memory Bandwidth at the Edge An exercise in processor configuration & mapping tools • Mobile & automotive use LPDDR, with compression assumed • Available techniques: • Quantization: converting from 32-bit FP to 8-bit/12-bit INT or FP • Pruning: removing zero and near-zero coefficients • Compression: reducing the size of feature maps by removing statistical redundancy
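Of the three techniques listed, quantization is the simplest to sketch. Below is one common variant, symmetric linear INT8 quantization (the function names and the sample weights are illustrative; production mapping tools use more elaborate calibration):

```python
import numpy as np

def quantize_int8(w):
    """Map FP32 weights to INT8 with a single symmetric scale factor."""
    scale = np.max(np.abs(w)) / 127.0          # largest magnitude -> code 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 1.2], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                 # INT8 codes, 4x smaller than FP32 storage
print(dequantize(q, s))  # close to the originals, within half a step
```

Cutting weights from 32 to 8 bits reduces both memory footprint and bandwidth by 4x, which is exactly the lever edge LPDDR systems need.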

  18. Machine Learning Foundation IP Foundation IP for 7-nm processes • Foundation IP customized for machine learning • Special cells for low-power dot-product implementation • Near-threshold-voltage cells • Multi-ported memory (up to 10 ports) • Design-specific analysis for large SRAM content • HPC kit customized for ARC EV • Integrated test mode in Synopsys memory and SMS reduces area by 7% and dynamic power by 10% • Total system regression testing using Synopsys tools and test blocks; proven with other industry tools • One-stop shop: a silicon-validated solution of memories, libraries, and test delivering the best PPA [Table: High Speed (HS logic library, HS SP SRAM, HS 1P RF, cache, HPC design kit with cells + memory), High Density (HD logic library, HD SP SRAM, HD 1P RF, HD 2P RF (2 clocks), HD DP SRAM (2 clocks), HPC design kit with cells + memory), Ultra-High Density (UHD SP SRAM, UHD 1P RF, UHD 2P RF (1 clock), UHD 2P SRAM (1 clock), ViaROM), plus embedded memory test & repair] Customizations for power & density are critical to optimize memories for deep learning

  19. HPC Kit Enhanced for AI Applications • Special cells introduced to reduce the CNN engine’s power consumption by up to 39% • Tradeoff tuning enables a 7% frequency boost with 28% lower power

  20. Memory Performance Challenges in AI NATURE: IP selection addresses memory challenges (i.e. Synopsys EMLT) • Capacity (DDR5) • Bandwidth (HBM2E) • Power consumption (LPDDR4X/5) NURTURING: but system design optimizations are required • Memory & processing co-design • Large array yield challenges (EMLT) • SRAM customizations (EMLT) • Additional test vectors (SMS/SHS) • HBM2 packaging expertise & support [Chart: processing time vs. memory used as a function of number of blocks, comparing recalculating activations against storing activations in memory]
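The chart's recalculate-vs-store tradeoff is the classic activation-checkpointing idea: keep only every k-th block's activations and recompute the rest during the backward pass. A toy accounting sketch (the function and numbers are illustrative, not from the slide):

```python
import math

def checkpoint_cost(n_blocks, k):
    """Store every k-th activation, recompute the others on demand.
    Returns (activations held in memory, blocks needing recompute)."""
    stored = math.ceil(n_blocks / k)
    recompute = n_blocks - stored
    return stored, recompute

for k in (1, 4, 8):
    print(k, checkpoint_cost(32, k))
# k=1 stores everything (fastest, most memory);
# larger k trades extra forward passes for a smaller activation footprint
```

Choosing k is exactly the memory-vs-processing-time co-design decision the slide describes.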

  21. Deep Learning SoC Challenges Unique requirements for processing, memory, connectivity, and security [Slide graphics: Specialized Processing, Memory Performance, Real-Time Connectivity]

  22. DesignWare Die2Die SR-112 Support for ultra- and extra-short-reach standards • Die-to-die interconnect options: 56G NRZ (no FEC), 112G PAM4 (no FEC), 112G PAM4 with FEC • Key features / tradeoffs: bandwidth, power, latency, reach, quality
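The three link options differ in bits per symbol and coding overhead: NRZ carries 1 bit per symbol, PAM4 carries 2 at the same symbol rate, and FEC spends some of that raw rate on error correction. A back-of-the-envelope comparison (the 6% FEC overhead figure is an assumed illustrative value, not from the slide):

```python
def lane_gbps(baud_gbaud, bits_per_symbol, fec_overhead=0.0):
    """Payload rate per lane: symbol rate x bits per symbol,
    reduced by the fraction of bits spent on FEC parity."""
    return baud_gbaud * bits_per_symbol * (1.0 - fec_overhead)

nrz = lane_gbps(56, 1)                        # 56G NRZ
pam4 = lane_gbps(56, 2)                       # 112G PAM4, same 56 GBaud
pam4_fec = lane_gbps(56, 2, fec_overhead=0.06)  # assumed ~6% parity overhead
print(nrz, pam4, round(pam4_fec, 2))
```

This is the bandwidth/latency/quality tradeoff in miniature: PAM4 doubles bandwidth per lane, and FEC buys back its worse symbol error rate at the cost of payload rate and added latency.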

  23. Test Support for AI • AI SoC impact as the market moves to 7nm: soft defects increase, increasing the number of test modes needed; expertise in IP, test, and subsystems becomes more important • DesignWare STAR Hierarchical System: test integration, pattern porting & test scheduling of IP on the SoC • DesignWare STAR Memory System: test, repair, and diagnostics for FinFET and planar memories; supports eFlash and now eMRAM at 40nm and beyond

  24. Real-Time Connectivity Challenges in AI NATURE: great IP for AI connectivity challenges • Rapid 7nm development • Cache coherency • High-speed chip-to-chip • Latency NURTURING: system design optimizations • Area optimizations (i.e. DDR hardening) • Time to market (subsystems) • Industry expertise (standards expertise) • Early software development (simulation/prototyping) [Diagram: image sensors connect to AI SoCs over MIPI (CMOS image sensor interface); AI SoCs connect to a host SoC over PCIe, CXL, or CCIX high-speed SerDes]

  25. Securing Deep Learning SoCs DesignWare Security IP solutions: certified and standards-compliant secure authentication, data encryption, key management, platform security & content protection • AI models are expensive and require updates • AI applications use private data (facial recognition, biometrics) • Integrity of the model matters: models corrupted by nefarious agents behave poorly • A Trusted Execution Environment (TEE) with DesignWare IP secures neural network SoCs for AI applications

  26. Summary

  27. Nurturing Amazing AI SoCs Nurture • Expertise reduces risk, improves PPA, and improves time to market • Industry-leading tools enable more competitive designs • Customizations optimize system performance Expertise: AI frameworks, graphs & mapping tools; large SRAM array analysis; IP subsystems; near-threshold libraries Tools: ASIP Designer; HPC kit for EV processor; nSIM; MetaWare; Platform Architect; SMS/SHS; HAPS & ZeBu Customization: architectural analysis; benchmarking; security consulting; customized processors; custom memory cells; IP hardening
