NVIDIA CEO and co-founder Jen-Hsun Huang took the stage for the GPU Technology Conference in the San Jose Convention Center to present some major announcements on March 17, 2015. You'll find out how NVIDIA is innovating in the field of deep learning, what NVIDIA DRIVE PX can do for automakers, and where Pascal, the next-generation GPU architecture, fits in the new performance roadmap.
LEAPS IN VISUAL COMPUTING JEN-HSUN HUANG, CO-FOUNDER & CEO | GTC 2015
FOUR ANNOUNCEMENTS A Very Fast Box and Deep Learning Self-Driving Cars and Deep Learning A New GPU and Deep Learning Roadmap Reveal and Deep Learning
AMAZING YEAR IN VISUAL COMPUTING © 2015 Industrial Light & Magic. All Rights Reserved.
10X GROWTH IN GPU COMPUTING (2008 to 2015)
CUDA Downloads: 150,000 to 3 Million
CUDA Apps: 27 to 319
Universities Teaching: 60 to 800
Academic Papers: 4,000 to 60,000
Tesla GPUs: 6,000 to 450,000
Supercomputing Teraflops: 77 to 54,000
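The "10X" headline can be checked against the 2008 and 2015 figures on the slide. The short Python sketch below does that arithmetic; the variable and function names are illustrative only and do not come from the presentation.

```python
# 2008 and 2015 figures as listed on the "10X GROWTH IN GPU COMPUTING" slide.
# The dictionary layout and names are illustrative, not from the deck.
metrics = {
    "CUDA downloads": (150_000, 3_000_000),
    "CUDA apps": (27, 319),
    "Universities teaching": (60, 800),
    "Academic papers": (4_000, 60_000),
    "Tesla GPUs": (6_000, 450_000),
    "Supercomputing teraflops": (77, 54_000),
}


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


factors = {name: growth_factor(*pair) for name, pair in metrics.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x")
```

By this arithmetic every metric grew by more than 10x, so the headline reads as a lower bound rather than an average.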
TITAN X THE WORLD’S FASTEST GPU 8 Billion Transistors 3,072 CUDA Cores 7 TFLOPS SP / 0.2 TFLOPS DP 12GB Memory
TITAN X FOR DEEP LEARNING [Bar chart: days to train AlexNet on a 16-core Xeon CPU, TITAN, TITAN Black with cuDNN, and TITAN X with cuDNN; y-axis in days, 0 to 7 with a break up to 43]
TITAN X THE WORLD’S FASTEST GPU 8 Billion Transistors 3,072 CUDA Cores 7 TFLOPS SP / 0.2 TFLOPS DP 12GB Memory $999
A SHORT HISTORY OF DEEP LEARNING
1998: Convolutional Neural Networks for Handwritten Digit Recognition (LECUN, BOTTOU, BENGIO, HAFFNER)
2012: ImageNet Classification with NVIDIA GPUs (KRIZHEVSKY, HINTON, ET AL.)
[Chart: accuracy %, 2010 to 2014, with labels DNN 84% and CV 74%, 72%; timeline axis 1995 to 2015]
IMAGENET CHALLENGE
“Deep Image: Scaling up Image Recognition” (Baidu: 5.98%, Jan. 13, 2015)
“Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (Microsoft: 4.94%, Feb. 6, 2015)
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” (Google: 4.82%, Feb. 11, 2015)
[Chart: accuracy %, 2010 to 2014, with labels DNN 84% and CV 74%, 72%]
DEEP LEARNING VISUALIZED
GPU-ACCELERATED DEEP LEARNING START-UPS
DEEP LEARNING REVOLUTIONIZING MEDICAL RESEARCH
Predicting the Toxicity of New Drugs (Johannes Kepler University)
Detecting Mitosis in Breast Cancer Cells (IDSIA)
Understanding Gene Mutation to Prevent Disease (University of Toronto)
“Automated Image Captioning with ConvNets and Recurrent Nets” (Andrej Karpathy, Fei-Fei Li)
DIGITS: DEEP GPU TRAINING SYSTEM FOR DATA SCIENTISTS
Design DNNs. Visualize activations. Manage multiple trainings.
[Stack diagram, top to bottom: USER INTERFACE (Process Data, Configure DNN, Monitor Progress, Visualize Layers); Theano, Torch, Caffe; cuDNN, cuBLAS; CUDA; GPU HW (GPU, Multi-GPU, GPU Cluster, Cloud)]
DIGITS Process Data Configure DNN Monitor Progress Visualize Layers Test Image
DIGITS DEVBOX World’s fastest GPU Max GPU out of a plug Multi-GPU training & inference
DIGITS DEVBOX: EARLY RESULTS
“DIGITS makes it way easier to design the best network for the job” (Simon Osindero, A.I. Architect)
“I’ve never seen AlexNet run this fast…TitanX is a monster, Crazy Fast” (Soumith Chintala, Research Engineer)
[Chart: Multi-GPU scaling on Torch for AlexNet and VGG; y-axis 0x to 4x, x-axis 1, 2, and 4 GPUs]
DIGITS DEVBOX Available May 2015 $15,000
GPU ROADMAP: Pascal 2x SGEMM/W [Chart: SGEMM / W (0 to 72) by year (2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal features: Mixed Precision, 3D Memory, NVLink]
GPU ROADMAP: Pascal 2.7x Memory Capacity [Chart: Frame Buffer Capacity in GB (0 to 60) by year (2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal features: Mixed Precision, 3D Memory, NVLink]
GPU ROADMAP: Pascal 4x Mixed Precision [Chart: HGEMM / W (0 to 144) by year (2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal features: Mixed Precision, 3D Memory, NVLink]
GPU ROADMAP: Pascal 3x Bandwidth [Chart: STREAM GB/s (0 to 900) by year (2008 to 2018) for Tesla, Fermi, Kepler, Maxwell, Pascal, Volta; Pascal features: Mixed Precision, 3D Memory, NVLink]
PASCAL 10X MAXWELL [Diagram of training stages, forward then backward: CONVOLUTION (compute), FULLY CONNECTED (bandwidth), FULLY CONNECTED (bandwidth), CONVOLUTION (compute), WEIGHT UPDATE (interconnect). Speedup labels as extracted: 5x, 2x, 4x (FP16), 6x, 6x, 4x, 10x. Feature labels: Mixed Precision, 3D Memory, 3D Memory, Mixed Precision, NVLINK] * Very rough estimates
TODAY’S ADAS SENSE PLAN ACT WARN BRAKE FPGA CV ASIC CPU
NEXT-GENERATION ADAS SENSE PLAN ACT WARN BRAKE FPGA CV ASIC CPU STEER ACCELERATE
NVIDIA DRIVE PX SELF-DRIVING CAR COMPUTER SENSE PLAN ACT FPGA CV ASIC CPU DNN WARN BRAKE STEER ACCELERATE [Inset chart: IMAGENET CHALLENGE accuracy %, 2010 to 2014, with labels DNN 84% and CV 74%, 72%]
PROJECT DAVE: DARPA AUTONOMOUS VEHICLE
DNN-based self-driving robot
Training data by human driver
No hand-coded CV algorithms
PROJECT LEADS
Urs Muller: Chief Architect, Autonomous Driving, NVIDIA
Yann LeCun: Director, AI Research, Facebook
[Inset chart: IMAGENET CHALLENGE accuracy %, 2010 to 2014, with labels DNN 84% and CV 74%, 72%]
TRAINING DATA 225K Images
TEST DRIVE No Training
TEST DRIVE Partially Trained (52K images)
TEST DRIVE Fully Trained (225K images)
                         AlexNet on DRIVE PX    DAVE
Number of Connections    630 Million            3.1 Million
Frames / Second          184                    12
Connections / Second     116 Billion            38 Million
3,000x Faster
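The throughput comparison above appears to follow from multiplying the number of connections by the frame rate. The Python sketch below reproduces that arithmetic under that assumption; the function and variable names are illustrative and not taken from the presentation.

```python
# Figures from the slide comparing AlexNet on DRIVE PX with DAVE.
# Assumption: connections/second = number of connections * frames/second.
alexnet_connections = 630_000_000
alexnet_fps = 184
dave_connections = 3_100_000
dave_fps = 12


def connections_per_second(connections, fps):
    """Connections evaluated per second at a given frame rate."""
    return connections * fps


alexnet_rate = connections_per_second(alexnet_connections, alexnet_fps)
dave_rate = connections_per_second(dave_connections, dave_fps)
speedup = alexnet_rate / dave_rate

print(f"AlexNet on DRIVE PX: {alexnet_rate / 1e9:.1f} billion connections/s")
print(f"DAVE: {dave_rate / 1e6:.1f} million connections/s")
print(f"Speedup: about {speedup:.0f}x")
```

The AlexNet product rounds to the slide's 116 billion. The DAVE product comes out slightly under the slide's 38 million, presumably because the slide's inputs are rounded; either way the ratio is close to the quoted 3,000x.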