Breaking the I/O Bottleneck in GPUs with Fast Multisource RTM and FWI

Chaiwoot Boonyasiriwat, Ge Zhan, Madhu Srinivasan, Markus Hadwiger, and Gerard Schuster
Jan. 7, 2010

Outline
• Introduction to Graphics Processing Unit (GPU)
• Multisource RTM and FWI on GPU
• Numerical Results
  • RTM of 2D SEG/EAGE Salt Model
  • FWI of Marmousi II Model
• Summary
• Future Work
• Acknowledgment

Introduction to GPU
[Figure: Real-time volume rendering using GPU, by Markus Hadwiger]

Performance of GPU vs CPU
[Figure: Peak GFLOP/s of GPU vs CPU, vertical axis from 0 to 1000. Courtesy of NVIDIA]

Memory Bandwidth of GPU vs CPU
[Figure: Bandwidth in GB/s of GPU vs CPU, vertical axis from 0 to 120. Courtesy of NVIDIA]

Seismic Applications for GPUs
• Well Logging (Mendoza et al., 2009)
• Migration (Foltinek et al., 2007; Li et al., 2009; Wang et al., 2009)
• Visualization and Interpretation (Lin and Wei, 2007; Kadlec et al., 2009)

GPU Architecture: High-Level View
Multiprocessors: each contains 8 processors
High performance when thousands of threads execute concurrently
Image from Micikevicius, NVIDIA

CUDA Programming Model
[Figure: CUDA memory hierarchy. The device runs a grid of thread blocks, shown as Block (0, 0) and Block (1, 0). Each block has its own shared memory; each thread, shown as Thread (0, 0) and Thread (1, 0), has its own registers and local memory. Global, constant, and texture memory are shared by the whole grid and are accessible from the host.]

CUDA Programming Model
[Figure: Grid of thread blocks]

Heterogeneous Programming
[Figure: Serial code runs on the host; each parallel kernel runs on the device and is launched as Kernel<<<grid,block>>>]

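Parallel Kernel
block(BLOCK_X, BLOCK_Y)
grid(nx/BLOCK_X, nz/BLOCK_Y)
Kernel<<<grid,block>>>
[Figure: The nx-by-nz model grid tiled by thread blocks: Block(0,0), Block(1,0), Block(2,0) along nx in the first row, and Block(0,1), Block(1,1), Block(2,1) in the second row along nz]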
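The launch geometry on this slide can be emulated on the host to check that every model point is assigned to exactly one thread. The slide writes the grid size as nx/BLOCK_X by nz/BLOCK_Y, which is exact only when the block size divides the model size; the minimal Python sketch below assumes the common workaround of rounding up and skipping out-of-range threads. The function names and the example sizes are hypothetical and do not come from the talk.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return (a + b - 1) // b


def launch_geometry(nx, nz, block_x, block_y):
    """Return (grid_x, grid_y): thread blocks needed to cover an nx-by-nz model."""
    return ceil_div(nx, block_x), ceil_div(nz, block_y)


def covered_points(nx, nz, block_x, block_y):
    """Emulate the per-thread index computation of a 2D CUDA kernel.

    Each (block, thread) pair maps to ix = blockIdx.x * blockDim.x + threadIdx.x,
    and likewise for iz. Threads that fall outside the model are skipped,
    mirroring an `if (ix < nx && iz < nz)` guard inside a kernel.
    """
    grid_x, grid_y = launch_geometry(nx, nz, block_x, block_y)
    points = set()
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(block_y):
                for tx in range(block_x):
                    ix = bx * block_x + tx
                    iz = by * block_y + ty
                    if ix < nx and iz < nz:
                        points.add((ix, iz))
    return points


if __name__ == "__main__":
    # Hypothetical model size that the 16x16 block does not divide evenly.
    nx, nz = 50, 35
    print("grid:", launch_geometry(nx, nz, 16, 16))
    print("points covered:", len(covered_points(nx, nz, 16, 16)), "of", nx * nz)
```

With a 48-by-32 model and 16-by-16 blocks, the same function gives the 3-by-2 arrangement of blocks drawn on the slide.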

Multisource RTM/FWI
[Flowchart: Model and Encoded Data -> Evaluate misfit function and compute gradient -> Perturb Model -> Evaluate misfit function -> Search criterion (Yes/No) -> Convergence criterion (Yes/No) -> Done]
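One plausible reading of this flowchart is steepest descent with a backtracking line search: the model is perturbed along the negative gradient until the misfit decreases (the search criterion), and the outer loop stops once the misfit is small enough (the convergence criterion). The sketch below illustrates only that control flow. The quadratic misfit, the step-halving rule, and all names and tolerances are assumptions for illustration; in the talk the misfit and gradient come from encoded wavefield modeling.

```python
def misfit(model, data):
    """Toy least-squares misfit, a stand-in for the waveform misfit."""
    return 0.5 * sum((m - d) ** 2 for m, d in zip(model, data))


def gradient(model, data):
    """Gradient of the toy misfit with respect to the model."""
    return [m - d for m, d in zip(model, data)]


def invert(model, data, step=4.0, tol=1e-10, iter_max=100, max_halvings=50):
    """Steepest descent with a backtracking line search."""
    f = misfit(model, data)
    for _ in range(iter_max):
        if f < tol:  # convergence criterion
            break
        g = gradient(model, data)
        alpha = step
        for _ in range(max_halvings):
            # Perturb the model and re-evaluate the misfit.
            trial = [m - alpha * gi for m, gi in zip(model, g)]
            f_trial = misfit(trial, data)
            if f_trial < f:  # search criterion: accept the first decrease
                break
            alpha *= 0.5
        else:
            break  # no step length reduced the misfit; stop iterating
        model, f = trial, f_trial
    return model, f


if __name__ == "__main__":
    target = [1.5, 2.0, 3.5]  # hypothetical "true" model
    final_model, final_misfit = invert([0.0, 0.0, 0.0], target)
    print(final_model, final_misfit)
```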

Multisource RTM/FWI on GPU
Read input parameters, velocity model, etc.      (Serial)
For iter = 1, iter_max
    init_grad<<<grid,block>>>                    (Parallel)
    For is = 1, nssg
        init_pressure<<<grid,block>>>            (Parallel)
        For it = 1, nt
            modeling<<<grid,block>>>             (Parallel)
            save_boundary<<<grid,block>>>        (Parallel)
        End
    End
    …
End
(Label: Encoded Modeling)

Multisource RTM/FWI on GPU
Read input parameters, velocity model, etc.      (Serial)
For iter = 1, iter_max
    init_grad<<<grid,block>>>                    (Parallel)
    For is = 1, nssg
        Encoded modeling                         (Parallel)
        Encode observed data                     (Serial)
    End
    Compute the gradient                         (Parallel)
    Line Search                                  (Parallel)
End
(Annotations: Reduce I/O; Encoded data reused)
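The "Encode observed data" step can be sketched as blending groups of shot gathers into supergathers, so that fewer gathers have to be moved to the GPU. The slides do not say which encoding is used; this minimal sketch assumes random polarity codes (+1 or -1), one simple form of phase encoding. The function name, the nested-list data layout, and the synthetic sizes are hypothetical.

```python
import random


def encode_supergathers(shot_gathers, n_super, seed=0):
    """Blend shot gathers into n_super supergathers with random polarity codes.

    shot_gathers: list of gathers; each gather is a list of traces and each
    trace is a list of samples. All gathers must share the same shape.
    Returns (supergathers, codes), where codes[s] holds the signs of group s.
    """
    if len(shot_gathers) % n_super != 0:
        raise ValueError("number of gathers must be a multiple of n_super")
    rng = random.Random(seed)
    per_super = len(shot_gathers) // n_super
    supergathers, codes = [], []
    for s in range(n_super):
        group = shot_gathers[s * per_super:(s + 1) * per_super]
        signs = [rng.choice((-1.0, 1.0)) for _ in group]
        n_traces, n_samples = len(group[0]), len(group[0][0])
        # Sum the sign-weighted gathers sample by sample.
        blended = [
            [sum(sign * g[r][t] for sign, g in zip(signs, group))
             for t in range(n_samples)]
            for r in range(n_traces)
        ]
        supergathers.append(blended)
        codes.append(signs)
    return supergathers, codes


if __name__ == "__main__":
    # 200 tiny synthetic gathers (3 traces x 4 samples) blended into 20 supergathers.
    gathers = [[[float(i + r + t) for t in range(4)] for r in range(3)]
               for i in range(200)]
    ssgs, codes = encode_supergathers(gathers, n_super=20)
    print(len(gathers), "gathers ->", len(ssgs), "supergathers")
```

Because wave propagation is linear in the source, the same codes applied to the sources yield the matching encoded synthetic data in a single simulation per supergather.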

Numerical Results: RTM
[Figure: 2D SEG/EAGE Salt Model]

Numerical Results: RTM
[Figure: Conventional RTM Image using 200 CSGs]

Numerical Results: RTM
[Figure: Multisource RTM Image using 20 SSGs (10x speedup)]

Numerical Results: FWI
[Figure: Marmousi II Model]

Numerical Results: FWI
[Figure: Conventional RTM Image using 272 CSGs]

Numerical Results: FWI
[Figure: Multisource RTM Image using 17 SSGs (16x speedup)]

Numerical Results: FWI
[Figure: Multisource FWI Velocity Tomogram using 17 SSGs]

Outline
• Introduction to Graphics Processing Unit (GPU)
• Multisource RTM and FWI
• Numerical Results
  • RTM of 2D SEG/EAGE Salt Model
  • FWI of Marmousi II Model
• Summary
• Future Work
• Acknowledgment

Summary
• Multisource RTM and FWI are implemented on a GPU.
• I/O from the host machine to the GPU is reduced by phase encoding.
• Theoretical speedup is achieved for multisource RTM.
• CUDA code using 1 GPU is about 10x faster than MPI code using 8 processors for a 2D RTM experiment.
• A GPU is a cheap, high-performance computing machine for seismic migration and inversion.
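The theoretical speedup quoted for the two experiments follows from counting wavefield simulations. The short sketch below assumes that cost is dominated by the number of simulations, one per gather, and that a supergather costs the same to simulate as a single shot gather; the function name is hypothetical.

```python
def theoretical_speedup(n_csg, n_ssg):
    """Ratio of simulations needed: one per shot gather vs one per supergather."""
    return n_csg / n_ssg


# Gather counts taken from the numerical-results slides.
experiments = {
    "SEG/EAGE salt model": (200, 20),
    "Marmousi II model": (272, 17),
}

for name, (n_csg, n_ssg) in experiments.items():
    print(f"{name}: {theoretical_speedup(n_csg, n_ssg):.0f}x")
# prints "SEG/EAGE salt model: 10x" and "Marmousi II model: 16x"
```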

Outline
• Introduction to Multisource Technology
• Multisource Full-Waveform Inversion
• Numerical Results
  • 3D SEG/EAGE Overthrust Model
• Summary
• Future Work
• Acknowledgment

Future Work
• Implement 3D multisource RTM/FWI on a GPU.
• Develop CUDA codes for a GPU cluster.
• Develop real-time 2D multisource RTM/FWI with user interfaces (computational steering).
• Joint proposals with PSU and U of U.

GPU Crews

Acknowledgment
• Sponsors of 2009 UTAM consortium
• Workstation: Benoit Marchand
• Thank you for your attention
