Breaking the I/O Bottleneck in GPUs with Fast Multisource RTM and FWI
Chaiwoot Boonyasiriwat, Ge Zhan, Madhu Srinivasan, Markus Hadwiger, and Gerard Schuster
Jan. 7, 2010
Outline
• Introduction to Graphics Processing Unit (GPU)
• Multisource RTM and FWI on GPU
• Numerical Results
  • RTM of 2D SEG/EAGE Salt Model
  • FWI of Marmousi II Model
• Summary
• Future Work
• Acknowledgment
Introduction to GPU
[Figure: real-time volume rendering using GPU, by Markus Hadwiger]
Performance of GPU vs CPU
[Figure: peak GFLOP/s of GPU vs CPU, vertical axis from 0 to 1000. Courtesy of NVIDIA]
Memory Bandwidth of GPU vs CPU
[Figure: memory bandwidth in GB/s of GPU vs CPU, vertical axis from 0 to 120. Courtesy of NVIDIA]
Seismic Applications for GPUs
• Well Logging (Mendoza et al., 2009)
• Migration (Foltinek et al., 2007; Li et al., 2009; Wang et al., 2009)
• Visualization and Interpretation (Lin and Wei, 2007; Kadlec et al., 2009)
GPU Architecture: High-Level View
• Multiprocessors: each contains 8 processors
• High performance when thousands of threads execute concurrently
[Image from Micikevicius, NVIDIA]
CUDA Programming Model
[Figure: on the device, a grid contains threadblocks such as Block (0, 0) and Block (1, 0). Each block has its own shared memory. Each thread, such as Thread (0, 0) and Thread (1, 0), has its own registers and local memory. The host and all threads have access to the device's global, constant, and texture memory.]
CUDA Programming Model
[Figure: grid of threadblocks]
Heterogeneous Programming
• Serial code
• Parallel kernel: Kernel<<<grid,block>>>
Parallel Kernel
block(BLOCK_X,BLOCK_Y)
grid(nx/BLOCK_X, nz/BLOCK_Y)
Kernel<<<grid,block>>>
[Figure: an nx by nz model divided into threadblocks, Block(0,0), Block(1,0), Block(2,0) along nx and Block(0,1), Block(1,1), Block(2,1) along nz]
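The block-to-model mapping above can be emulated serially. The following is a minimal plain-Python sketch, not the authors' CUDA code: the values of BLOCK_X, BLOCK_Y, nx, and nz are hypothetical, and the helper names are illustrative. It rounds the grid size up, because nx/BLOCK_X as written on the slide covers the whole model only when the sizes divide evenly, and it guards against threads that fall outside the model.

```python
# Hypothetical block size; the slides do not give BLOCK_X or BLOCK_Y.
BLOCK_X, BLOCK_Y = 16, 16


def grid_dims(nx, nz, block_x=BLOCK_X, block_y=BLOCK_Y):
    """Number of threadblocks along x and z, rounded up to cover the model."""
    return (nx + block_x - 1) // block_x, (nz + block_y - 1) // block_y


def global_index(block_idx, thread_idx, block_dim):
    """Grid point owned by one thread: blockIdx * blockDim + threadIdx."""
    return block_idx * block_dim + thread_idx


def covered_points(nx, nz, block_x=BLOCK_X, block_y=BLOCK_Y):
    """Emulate the launch serially; return the grid points that pass the guard."""
    gx, gz = grid_dims(nx, nz, block_x, block_y)
    seen = set()
    for bz in range(gz):
        for bx in range(gx):
            for tz in range(block_y):
                for tx in range(block_x):
                    ix = global_index(bx, tx, block_x)
                    iz = global_index(bz, tz, block_y)
                    if ix < nx and iz < nz:  # skip threads in partial edge blocks
                        seen.add((ix, iz))
    return seen


print(grid_dims(645, 150))  # (41, 10)
```

In a real kernel every thread evaluates the same guard in parallel; the nested loops here only stand in for that launch.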
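Multisource RTM/FWI
[Flowchart: starting from the model and the encoded data, evaluate the misfit function and compute the gradient. Perturb the model and evaluate the misfit function, repeating until the search criterion is met. If the convergence criterion is not met, return to the gradient step; otherwise, done.]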
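The control flow of this flowchart can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' implementation: a small linear operator A stands in for wave-equation modeling, the update is steepest descent, and the search criterion is a simple step-halving test for misfit decrease. The slides do not say which optimizer or line search was used, and all names are hypothetical.

```python
# Toy linear operator standing in for wave-equation modeling (an assumption).
A = [[2.0, 1.0], [1.0, -1.0], [0.5, 3.0]]


def forward(m):
    """Predicted data for model m."""
    return [sum(a * x for a, x in zip(row, m)) for row in A]


def misfit(m, d_obs):
    """Least-squares misfit between predicted and observed data."""
    return 0.5 * sum((p - o) ** 2 for p, o in zip(forward(m), d_obs))


def gradient(m, d_obs):
    """Gradient of the misfit: A^T (A m - d_obs)."""
    r = [p - o for p, o in zip(forward(m), d_obs)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(m))]


def perturb(m, g, step):
    """Trial model obtained by stepping against the gradient."""
    return [x - step * gx for x, gx in zip(m, g)]


def invert(m0, d_obs, iter_max=200, tol=1e-12):
    m = list(m0)
    for _ in range(iter_max):
        f0 = misfit(m, d_obs)
        if f0 < tol:  # convergence criterion
            break
        g = gradient(m, d_obs)
        step = 1.0
        trial = perturb(m, g, step)
        # Search criterion: halve the step until the misfit decreases.
        while misfit(trial, d_obs) >= f0 and step > 1e-12:
            step *= 0.5
            trial = perturb(m, g, step)
        m = trial
    return m


d_obs = forward([1.5, -0.5])
print(invert([0.0, 0.0], d_obs))
```

In the multisource setting, d_obs would be the encoded supergather and forward() a wave-equation simulation of the encoded sources.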
Multisource RTM/FWI on GPU
Read input parameters, velocity model, etc. (serial)
For iter = 1, iter_max
  init_grad<<<grid,block>>> (parallel)
  For is = 1, nssg
    init_pressure<<<grid,block>>> (parallel)
    For it = 1, nt
      modeling<<<grid,block>>> (parallel)
      save_boundary<<<grid,block>>> (parallel)
    End
  End
  …
End
Slide annotation on the "For is" loop: Encoded Modeling
Multisource RTM/FWI on GPU
Read input parameters, velocity model, etc. (serial)
For iter = 1, iter_max
  init_grad<<<grid,block>>>
  For is = 1, nssg
    Encoded modeling
    Encode observed data
  End
  Compute the gradient
  Line Search
End
Slide annotations: Parallel, Serial, Reduce I/O, Encoded data reused
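The "Encode observed data" step can be illustrated with a small sketch. The slides say phase encoding is used but do not specify the code, so random polarity (+1 or -1) encoding is assumed here as one common choice. The function name, the flattened-list data layout, and the fixed seed are all hypothetical.

```python
import random


def encode_supergathers(shots, nssg, seed=0):
    """Sum consecutive groups of shot gathers, each scaled by a random +1/-1 code.

    shots: list of shot gathers, each flattened to a list of samples.
    nssg:  number of supergathers to form.
    Returns (supergathers, codes).
    """
    if len(shots) % nssg != 0:
        raise ValueError("number of shots must be a multiple of nssg")
    per_group = len(shots) // nssg
    nsamp = len(shots[0])
    rng = random.Random(seed)  # fixed seed so the same codes can be reproduced
    codes = [rng.choice((-1.0, 1.0)) for _ in shots]
    supergathers = []
    for g in range(nssg):
        group = range(g * per_group, (g + 1) * per_group)
        sg = [sum(codes[s] * shots[s][k] for s in group) for k in range(nsamp)]
        supergathers.append(sg)
    return supergathers, codes


# 200 synthetic gathers of 4 samples each, blended into 20 supergathers.
shots = [[float(s + k) for k in range(4)] for s in range(200)]
supergathers, codes = encode_supergathers(shots, nssg=20)
print(len(shots), "gathers ->", len(supergathers), "supergathers")
```

With this grouping only nssg arrays, rather than one per shot, have to be modeled and moved between host and GPU. The sources would need to be encoded with the same codes so that the simulated and observed supergathers stay comparable.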
Numerical Results: RTM
[Figure: 2D SEG/EAGE Salt Model]
Numerical Results: RTM
[Figure: conventional RTM image using 200 CSGs]
Numerical Results: RTM
[Figure: multisource RTM image using 20 SSGs; 10x speedup]
Numerical Results: FWI
[Figure: Marmousi II Model]
Numerical Results: FWI
[Figure: conventional RTM image using 272 CSGs]
Numerical Results: FWI
[Figure: multisource RTM image using 17 SSGs; 16x speedup]
Numerical Results: FWI
[Figure: multisource FWI velocity tomogram using 17 SSGs]
Summary
• Multisource RTM and FWI are implemented on a GPU.
• I/O from the host machine to the GPU is reduced by phase encoding.
• The theoretical speedup is achieved for multisource RTM.
• CUDA code using 1 GPU is about 10x faster than MPI code using 8 processors for a 2D RTM experiment.
• A GPU is a cheap, high-performance computing machine for seismic migration and inversion.
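One reading of the theoretical speedup is the ratio of gathers processed, assuming the cost of a run is proportional to the number of gathers that must be simulated and migrated. That cost model is an assumption, not a statement from the slides, and the function name is hypothetical. Under it, the gather counts from the numerical results reproduce the quoted figures:

```python
def ideal_speedup(n_csg, n_ssg):
    """Ratio of conventional shot gathers (CSGs) to supergathers (SSGs)."""
    if n_ssg <= 0:
        raise ValueError("n_ssg must be positive")
    return n_csg / n_ssg


print(ideal_speedup(200, 20))  # 10.0, SEG/EAGE salt model
print(ideal_speedup(272, 17))  # 16.0, Marmousi II model
```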
Outline
• Introduction to Multisource Technology
• Multisource Full-Waveform Inversion
• Numerical Results
  • 3D SEG/EAGE Overthrust Model
• Summary
• Future Work
• Acknowledgment
Future Work
• Implement 3D multisource RTM/FWI on a GPU.
• Develop CUDA codes for a GPU cluster.
• Develop real-time 2D multisource RTM/FWI with user interfaces (computational steering).
• Joint proposals with PSU and U of U.
GPU Crews
Acknowledgment
• Sponsors of 2009 UTAM consortium
• Workstation: Benoit Marchand
• Thank you for your attention