1 / 49

GTC 2017: Powering the AI Revolution

At the 2017 GPU Technology Conference in Silicon Valley, NVIDIA CEO Jensen Huang introduced a lineup of new Volta-based AI supercomputers including a powerful new version of our DGX-1 deep learning appliance; announced the Isaac robot-training simulator; unveiled the NVIDIA GPU Cloud platform, giving developers access to the latest, optimized deep learning frameworks; and unveiled a partnership with Toyota to help build a new generation of autonomous vehicles.

nvidia
Download Presentation

GTC 2017: Powering the AI Revolution

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO | GTC 2017

  2. LIFE AFTER MOORE’S LAW 40 Years of Microprocessor Trend Data 107 106 Transistors (thousands) 1.1X per year 105 104 103 1.5X per year 102 Single-threaded perf 1980 1990 2000 2010 2020 Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2015 by K. Rupp 2

  3. RISE OF GPU COMPUTING 1000X by 2025 GPU-Computing perf 1.5X per year 107 APPLICATIONS 106 1.1X per year ALGORITHMS 105 104 SYSTEMS 103 CUDA 1.5X per year 102 Single-threaded perf ARCHITECTURE 1980 1990 2000 2010 2020 Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2015 by K. Rupp 3

  4. RISE OF GPU COMPUTING 7,000 511,000 1M+ 2012 2017 2012 2017 GTC Attendees 3X in 5 Years GPU Developers 11X in 5 Years CUDA Downloads in 2016 4

  5. ANNOUNCING PROJECT HOLODECK Photorealistic models Interactive physics Collaboration Early access in September 5

  6. ERA OF MACHINE LEARNING “The Master Algorithm” — Pedro Domingos “A Quest for Intelligence” — Fei-Fei Li 6

  7. BIG BANG OF MODERN AI Transfer Learning Reinforcement Learning Arterys FDA Approved Super Resolution Google Photo Deep Voice AlphaGo Stanford & NVIDIA Large-scale DNN on GPU Auto Encoders ImageNet NVIDIA BB8 Captioning BRETT Style Transfer NMT U Toronto AlexNet on GPU IDSIA LSTM GAN CNN on GPU Baidu DuLight Superhuman ASR 7

  8. DEEP LEARNING FOR RAY TRACING 8

  9. IRAY WITH DEEP LEARNING 9

  10. BIG BANG OF MODERN AI $5B 13,000 20,000 $5B 2014 2016 2015 2017 2012 2016 NIPS, ICML, CVPR, ICLR Attendees 2X in 2 Years Udacity Students in AI Programs 100X in 2 Years AI Startups Investments 9X in 4 Years 10

  11. POWERING THE AI REVOLUTION FRAMEWORKS GPU SYSTEMS GPU AAS INTERNET SERVICES AUTOMOTIVE NVAIL HEALTHCARE HGX-1 DRIVE PX NVIDIA NVIDIA DEEP LEARNING SDK DEEP LEARNING SDK DGX-1 NVIDIA RESEARCH ENTERPRISE PLACES INCEPTION ROBOTS TESLA JETSON TX 11

  12. NVIDIA INCEPTION — 1,300 DEEP LEARNING STARTUPS PLATFORMs & APIs HEALTHCARE RETAIL, ETAIL FINANCIAL SECURITY, IVA DATA MANAGEMENT IOT & MANUFACTURING AUTONOMOUS MACHINES CYBER AEC DEVELOPMENT PLATFORMS BUSINESS INTELLIGENCE & VISUALIZATION 12

  13. SAP AI FOR THE ENTERPRISE First commercial AI offerings from SAP Brand Impact, Service Ticketing, Invoice-to-Record applications Powered by NVIDIA GPUs on DGX-1 and AWS 13

  14. MODEL COMPLEXITY IS EXPLODING 105 ExaFLOPS 8.7 Billion Parameters 20 ExaFLOPS 300 Million Parameters 7 ExaFLOPS 60 Million Parameters 2015 — Microsoft ResNet 2016 — Baidu Deep Speech 2 2017 — Google NMT 14

  15. ANNOUNCING TESLA V100 GIANT LEAP FOR AI & HPC VOLTA WITH NEW TENSOR CORE 21B xtors | TSMC 12nm FFN | 815mm2 5,120 CUDA cores 7.5 FP64 TFLOPS | 15 FP32 TFLOPS NEW 120 Tensor TFLOPS 20MB SM RF | 16MB Cache 16GB HBM2 @ 900 GB/s 300 GB/s NVLink 15

  16. NEW TENSOR CORE New CUDA TensorOp instructions & data formats 4x4 matrix processing array D[FP32] = A[FP16] * B[FP16] + C[FP32] Optimized for deep learning Weights Inputs Output Results Activation Inputs 16

  17. ANNOUNCING TESLA V100 GIANT LEAP FOR AI & HPC VOLTA WITH NEW TENSOR CORE Compared to Pascal 1.5X General-purpose FLOPS for HPC 12X Tensor FLOPS for DL training 6X Tensor FLOPS for DL inferencing 17

  18. 18

  19. GALAXY 19

  20. DEEP LEARNING FOR STYLE TRANSFER 20

  21. ANNOUNCING NEW FRAMEWORK RELEASES FOR VOLTA CNN Training (ResNet-50) Multi-Node Training with NCCL 2.0 (ResNet-50) LSTM Training (Neural Machine Translation) 8x P100 K80 8x K80 8x V100 P100 8x P100 64x V100 V100 8x V100 0 10 20 30 40 50 0 10 20 30 40 50 0 5 10 15 20 25 Hours Hours Hours 21

  22. ANNOUNCING NVIDIA DGX-1 WITH TESLA V100 ESSENTIAL INSTRUMENT OF AI RESEARCH 960 Tensor TFLOPS | 8x Tesla V100 | NVLink Hybrid Cube From 8 days on TITAN X to 8 hours 400 servers in a box 22

  23. ANNOUNCING NVIDIA DGX-1 WITH TESLA V100 ESSENTIAL INSTRUMENT OF AI RESEARCH 960 Tensor TFLOPS | 8x Tesla V100 | NVLink Hybrid Cube From 8 days on TITAN X to 8 hours 400 servers in a box $149,000 Order today: nvidia.com/DGX-1 23

  24. ANNOUNCING NVIDIA DGX STATION PERSONAL DGX 480 Tensor TFLOPS | 4x Tesla V100 16GB NVLink Fully Connected | 3x DisplayPort 1500W | Water Cooled 24

  25. ANNOUNCING NVIDIA DGX STATION PERSONAL DGX 480 Tensor TFLOPS | 4x Tesla V100 16GB NVLink Fully Connected | 3x DisplayPort 1500W | Water Cooled $69,000 Order today: nvidia.com/DGX-Station 25

  26. ANNOUNCING HGX-1 WITH TESLA V100 VERSATILE GPU CLOUD COMPUTING 8x Tesla V100 with NVLINK Hybrid Cube 2C:8G | 2C:4G | 1C:2G Configurable NVIDIA Deep Learning, GRID graphics, CUDA HPC stacks 26

  27. ANNOUNCING TENSORRT FOR TENSORFLOW Compiler for Deep Learning inferencing next layer Relu Relu Relu Relu bias bias bias bias 1x1 conv 3x3 conv 5x5 conv 1x1 conv Relu Relu bias bias max pool 1x1 conv 1x1 conv previous layer Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations Import models from Caffe and TensorFlow 27

  28. ANNOUNCING TENSORRT FOR TENSORFLOW Compiler for Deep Learning Inferencing next layer next layer Relu Relu Relu Relu 1x1 CBR 3x3 CBR 5x5 CBR 1x1 CBR bias bias bias bias 1x1 conv 3x3 conv 5x5 conv 1x1 conv Relu Relu 1x1 CBR 1x1 CBR max pool bias bias max pool 1x1 conv 1x1 conv previous layer previous layer Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations Import models from Caffe and TensorFlow 28

  29. ANNOUNCING TENSORRT FOR TENSORFLOW Compiler for Deep Learning Inferencing concat next layer 1x1 CBR 3x3 CBR 5x5 CBR 1x1 CBR 3x3 CBR 5x5 CBR 1x1 CBR 1x1 CBR 1x1 CBR max pool 1x1 CBR max pool previous layer previous layer Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations Import models from Caffe and TensorFlow 29

  30. ANNOUNCING TENSORRT FOR TENSORFLOW Compiler for Deep Learning Inferencing next layer 3x3 CBR 5x5 CBR 1x1 CBR 1x1 CBR max pool previous layer Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations Import models from Caffe and TensorFlow 30

  31. ANNOUNCING TENSORRT FOR TENSORFLOW Compiler for Deep Learning Inferencing next layer 3x3 CBR 5x5 CBR 1x1 CBR 1x1 CBR max pool previous layer Graph optimizations for vertical and horizontal layer fusion | GPU-specific optimizations Import models from Caffe and TensorFlow 31

  32. ANNOUNCING TESLA V100 FOR HYPERSCALE INFERENCE 15-25X INFERENCE SPEED-UP VS SKYLAKE 150W | FHHL PCIE 32

  33. THE CASE FOR GPU ACCELERATED DATACENTERS 300K inf/s Datacenter @300 inf/s > 1K CPUs 1K CPUs > 500 Nodes @$3K = $1.5M @500W = 250KW Tesla V100 Reduce 15X 500 Nodes CPU Servers 33 Nodes GPU Accelerated Server 33

  34. NVIDIA DEEP LEARNING STACK DEEP LEARNING FRAMEWORKS DEEP LEARNING LIBRARIES NVIDIA cuDNN, NCCL, cuBLAS, TensorRT CUDA DRIVER OPERATING SYSTEM GPU SYSTEM 34

  35. ANNOUNCING NVIDIA GPU CLOUD GPU-accelerated Cloud Platform Optimized for Deep Learning Registry of Containers, Datasets, and Pre-trained models NVIDIA GPU CLOUD CSPs Containerized in NVDocker | Optimization across the full stack Always up-to-date | Fully tested and maintained by NVIDIA | Beta in July 35

  36. POWER OF GPU COMPUTING AMBER Performance (ns/day) GoogleNet Performance (i/s) cuDNN 7 CUDA 9 NCCL 2 12000 40 AMBER 16 CUDA 8 9600 32 7200 24 AMBER 14 CUDA 6 cuDNN 6 CUDA 8 NCCL 1.6 AMBER 14 CUDA 5 4800 16 AMBER 12 CUDA 4 cuDNN 4 CUDA 7 2400 8 cuDNN 2 CUDA 6 0 0 K20 K40 K80 P100 8x K80 8x Maxwell DGX-1V DGX-1 2013 2014 2015 2016 2014 2015 2017 2016 36

  37. AI REVOLUTIONIZING TRANSPORTATION 280B Miles per Year 800M Parking Spots for 250M Cars in U.S. Domino’s: 1M Pizzas Delivered per Day 37

  38. NVIDIA DRIVE — AI CAR PLATFORM 100 TOPS Localization Path Planning DRIVE PX Xavier Level 4/5 Perception AI 10 TOPS DRIVE PX 2 Parker Level 2/3 Computer Vision Libraries CUDA, cuDNN, TensorRT OS 1 TOPS 38

  39. NVIDIA DRIVE Mapping-to-Driving Co-Pilot Guardian Angel 39

  40. ANNOUNCING TOYOTA SELECTS NVIDIA DRIVE PX FOR AUTONOMOUS VEHICLES 40

  41. AI PROCESSOR FOR AUTONOMOUS MACHINES General Purpose Architectures CPU FPGA Pascal Domain Specific Accelerators CUDA GPU Volta DLA Energy Efficiency XAVIER 30 TOPS DL 30W Custom ARM64 CPU 512 Core Volta GPU 10 TOPS DL Accelerator 41

  42. AI PROCESSOR FOR AUTONOMOUS MACHINES General Purpose Architectures CPU + Domain Specific Accelerators CUDA GPU Volta DLA Energy Efficiency XAVIER 30 TOPS DL 30W Custom ARM64 CPU 512 Core Volta GPU 10 TOPS DL Accelerator 42

  43. ANNOUNCING XAVIER DLA NOW OPEN SOURCE Command Interface Tensor Execution Micro-controller Sparse Weight Decompression Output Postprocess or (Activation Function, Pooling etc.) Unified 512KB Input Buffer MAC Array Input DMA (Activations and Weights) Output DMA 2048 Int8 or 1024 Int16 or 1024 FP16 Output Accumulators Activations and Weights Native Winograd Input Transform Memory Interface Early Access July | General Release September 43

  44. ROBOTS SEE, HEAR, TOUCH LEARN & PLAN ACT 44

  45. ROBOTS Credit: Yevgen Chebotar, Karol Hausman, Marvin Zhang, Sergey Levine 45

  46. ANNOUNCING ISAAC ROBOT SIMULATOR Robot & Environment Definition ISAAC OpenAI GYM Robot Simulator NVIDIA GPU Computer Virtual Jetson 46

  47. ANNOUNCING ISAAC ROBOT SIMULATOR SEE, HEAR, TOUCH Robot & Environment Definition ISAAC OpenAI GYM Robot Simulator LEARN & PLAN ACT NVIDIA GPU Computer Virtual Jetson 47

  48. NVIDIA POWERING THE AI REVOLUTION NVIDIA GPU CLOUD CSPs NVIDIA GPU in Every Cloud NVIDIA GPU Cloud Tensor Core Xavier DLA Open Source TensorRT Tesla V100 DGX-1 and DGX Station ISAAC 48

More Related