Tesla: Fastest Processor Adoption in HPC History http://www.nvidia.com/tesla
GPU Computing: CPU (4 cores) + GPU (240 cores) Co-Processing; Heterogeneous Computing
Computation Discontinuity Double Precision debut
50x – 150x: Medical Imaging, U of Utah (146X); Molecular Dynamics, U of Illinois, Urbana (36X); Video Transcoding, Elemental Tech (18X); Matlab Computing, AccelerEyes (50X); Astrophysics, RIKEN (100X); Financial simulation, Oxford (149X); Linear Algebra, Universidad Jaime (47X); 3D Ultrasound, Techniscan (20X); Quantum Chemistry, U of Illinois, Urbana (130X); Gene Sequencing, U of Maryland (30X)
NVIDIA Tesla 10-Series GPU: Massively parallel, many core architecture; 240 Processor Cores; 1 Teraflops – 1,000 times Cray X-MP; IEEE Compliant Double Precision Floating Point; Designed for Scientific Computing. Diagram labels: Processors, L1, Processor Communication Fabric, Memory & I/O, Fixed Function Acceleration.
Tesla GPU Computing Products: Tesla C1060 Computing Board; Tesla S1070 1U System
New Class of Hybrid CPU-GPU Servers: SuperMicro 1U GPU Server (2 Tesla M1060 GPUs); Bull Bullx Blade Enclosure (up to 18 Tesla M1060 GPUs)
Performance vs. cost chart. Performance axis: 1x, 100x, 10,000x; cost axis: K$ to M$. Labeled systems: CPU Workstation, Traditional CPU Cluster, Tesla Personal Supercomputer, Tesla Co-processing Cluster.
UPenn: Finding a Better Shampoo. Tesla PSC vs. 32 CPU Servers • Equal Performance • No Data Center Required • 13x Lower Cost (~$7 K vs. $128 K) • 9.6x Lower Power (1 kWatt vs. 19.2 kWatts)
Finance: Equity Pricing. 2 Tesla S1070s vs. 500 CPU Servers • Equal Performance • 16x Less Space • 10x Lower Cost ($24 K vs. $250 K) • 13x Lower Power (2.8 kWatts vs. 37.5 kWatts)
Oil & Gas: Seismic Processing. 32 Tesla S1070s vs. 2000 CPU Servers • Equal Performance • 31x Less Space • 20x Lower Cost (~$400 K vs. ~$8 M) • 27x Lower Power (45 kWatts vs. 1200 kWatts)
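The rounded ratios on the Finance and Oil & Gas slides can be checked by dividing the CPU-side figure by the Tesla-side figure. This is a small worked calculation using only the slides' own numbers, several of which are marked approximate:

```latex
\begin{align*}
\text{Finance, cost:}      \quad & \frac{\$250\ \text{K}}{\$24\ \text{K}} \approx 10.4 \quad (\text{quoted as } 10\times) \\
\text{Finance, power:}     \quad & \frac{37.5\ \text{kW}}{2.8\ \text{kW}} \approx 13.4 \quad (\text{quoted as } 13\times) \\
\text{Oil \& Gas, cost:}   \quad & \frac{\$8\ \text{M}}{\$400\ \text{K}} = 20 \quad (\text{quoted as } 20\times) \\
\text{Oil \& Gas, power:}  \quad & \frac{1200\ \text{kW}}{45\ \text{kW}} \approx 26.7 \quad (\text{quoted as } 27\times)
\end{align*}
```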
Workstation Supercomputing Tesla Personal Supercomputer ~5000 Customers
Tesla Cluster Installations 2008 2009
Supercomputing for the Masses: Large Clusters ($10M+, 100s of researchers); Tesla Preconfigured Clusters ($50K-$1M, 100,000s of researchers); Tesla Personal Supercomputer (< $5K, Millions of researchers)
CUDA Parallel Computing Architecture. GPU Computing Applications; languages and APIs: C, C++, Fortran, OpenCL™, DirectX Compute, Java, Python; NVIDIA GPU with the CUDA Parallel Computing Architecture. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
CUDA: Widely Adopted Parallel Programming Model • 120 Million CUDA GPUs • 60,000+ Active Developers • 1000+ Research Papers • 200+ universities teaching CUDA
CUDA Ecosystem. Over 200 Universities Teaching CUDA: IIT Delhi, Tsinghua, Dortmund, ETH Zurich, Moscow, NTU, …; UIUC, MIT, Harvard, Berkeley, Cambridge, Oxford, … Compilers: PGI Fortran, CAPS HMPP, MCUDA, MPI, NOAA Fortran2C, OpenMP. Languages: C, C++, DirectX, Fortran, Java, OpenCL, Python. Libraries: FFT, BLAS, LAPACK, Image processing, Video processing, Signal processing, Vision. Applications: Oil & Gas, Finance, Medical, Biophysics, Numerics, DSP, EDA, Imaging, CFD. Consultants: ANEO, GPU Tech. OEMs.
More Information: http://www.nvidia.com/tesla (Products, Vertical Solutions, CUDA GPU Programming Training) • GPU Developer Conference, Sept 30 – Oct 2, 2009 • San Jose, CA • http://www.nvidia.com/gtc
Compiling C for CUDA Applications
C CUDA Application:
    void serial_function(… ) { ... }
    void other_function(int ... ) { ... }
    void saxpy_serial(float ... ) {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }
    void main( ) { float x; saxpy_serial(..); ... }
Build flow: Key Kernels are modified into parallel CUDA code and compiled by NVCC (Open64) into CUDA object files; the Rest of the C Application goes through the CPU Compiler into CPU object files; the Linker combines both into a CPU-GPU Executable.
C for CUDA : C with a few keywords Standard C Code Parallel C Code
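The slide's side-by-side code did not survive extraction. A minimal sketch of the comparison, reusing the SAXPY loop shown earlier in the deck, might look like the following. The parameter lists, the kernel name `saxpy_parallel`, the host helper `saxpy_on_gpu`, and the choice of 256 threads per block are assumptions for illustration, not taken from the slides.

```cuda
#include <cuda_runtime.h>

// Standard C code: one CPU loop walks over all n elements.
void saxpy_serial(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Parallel C code: the loop disappears and each GPU thread updates one
// element. __global__, blockIdx, blockDim and threadIdx are the CUDA
// keywords and built-ins that replace the loop counter.
__global__ void saxpy_parallel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may contain threads past the end
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper (an assumption, not from the deck): copies x and
// y to the GPU, launches the kernel, and copies y back.
// Returns 0 on success and -1 on any CUDA error.
int saxpy_on_gpu(int n, float a, const float *h_x, float *h_y)
{
    if (n <= 0)
        return 0;  // nothing to do; a zero-block launch would be invalid

    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        return -1;
    }

    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy_parallel<<<blocks, threads>>>(n, a, d_x, d_y);
    cudaError_t launch_err = cudaGetLastError();

    // The blocking copy also waits for the kernel to finish.
    cudaError_t copy_err = cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);
    return (launch_err == cudaSuccess && copy_err == cudaSuccess) ? 0 : -1;
}
```

Calling `saxpy_on_gpu(n, 2.0f, x, y)` from ordinary host code should leave `y` holding the same values as `saxpy_serial(n, 2.0f, x, y)` would. The file is meant to be compiled with `nvcc`, which separates the kernel from the host code in the way the build-flow slide describes.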
CUDA Programming Effort / Performance Source : MIT CUDA Course
Computed Tomography (CT) Science Medical Source: Batenburg, Sijbers, et al Source: Ufimtsev, Martinez Manufacturing Finance Source: Tolke, Krafczyk Source: CUDA SDK, NAG
FFT Performance: CPU vs GPU. cuFFT 2.3: NVIDIA Tesla C1060 GPU; MKL 10.1r1: Quad-Core Intel Core i7 (Nehalem) 3.2GHz
BLAS Performance: CPU vs GPU. CUBLAS: CUDA 2.2, Tesla C1060; MKL 10.0.3: Intel Core2 Extreme, 3.00GHz
Heterogeneous Computing Domains. GPU (Parallel Computing): Graphics, Highly Parallel Computation, Data Intensive Application. CPU (Sequential Computing): Control and Communication, Productivity Application. Domains: Oil & Gas, Finance, Medical, Biophysics, Numerics, Audio, Video, Imaging