Multiscale Waveform Inversion and High-Performance Computing using Graphics Processing Units (GPU). Chaiwoot Boonyasiriwat, Feb. 6, 2009
Part I. Multiscale Waveform Inversion: A Blind Test on a Synthetic Dataset
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Gulf of Mexico Data: 480 hydrophones, 515 shots, dt = 2 ms, Tmax = 10 s, 12.5 m spacing.
Comparing CIGs: CIG from the waveform tomogram vs. CIG from the traveltime tomogram.
Saudi Arabia Land Survey
1. 1279 CSGs, 240 traces/gather
2. 30 m station interval, max. offset = 3.6 km
3. Line length = 46 km
4. 246,000 traveltimes picked
5. Traveltime tomography -> V(x,y,z)
Brute Stack Section (time 0-2.0 s, CDP 3920-5070).
Traveltime Tomostatics + Stacking (time 0-2.0 s, CDP 3920-5070).
Waveform Tomostatics + Stacking (time 0-2.0 s, CDP 3920-5070).
Goals
• Blind Test
• Sensitivity Test
  • unknown source wavelet
  • unknown forward modeling
Methods and Data Processing
• Low-pass filtering at 2 Hz and 5 Hz (a sketch of this step follows below)
• Source estimation
• Waveform inversion
• Traveltime tomography
Time picking: Shengdong
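A minimal sketch of the low-pass step, purely for illustration; the talk does not show which filter was used (a zero-phase Butterworth-style filter would be more typical in practice). The routine below applies a first-order recursive low-pass forward and then backward over one trace, with the cutoff fc (e.g. 2 or 5 Hz) and the sample interval dt as assumed parameters.

/* Illustrative sketch only, not the author's code. Forward pass followed by a
   backward pass approximately cancels the phase shift, giving a zero-phase
   low-pass suitable for preparing low-frequency data in multiscale inversion. */
void lowpass_trace(float *trace, int n, float fc, float dt)
{
    const float pi = 3.14159265358979f;
    float rc = 1.0f / (2.0f * pi * fc);   /* time constant for cutoff fc */
    float a  = dt / (rc + dt);            /* smoothing coefficient */
    int i;

    for (i = 1; i < n; i++)               /* forward pass */
        trace[i] = trace[i - 1] + a * (trace[i] - trace[i - 1]);

    for (i = n - 2; i >= 0; i--)          /* backward pass (zero phase) */
        trace[i] = trace[i + 1] + a * (trace[i] - trace[i + 1]);
}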
Original CSG (time 0-5 s, offset 0-5 km).
Numerical Results: Kirchhoff migration image overlaid with the traveltime tomogram (depth 0-1 km, location 0-10 km).
Numerical Results: Kirchhoff migration image overlaid with the waveform tomogram (depth 0-1 km, location 0-10 km).
Results: Common image gathers obtained using the waveform tomogram (depth 0-1 km, offset 0-0.5 km, location 0-10 km).
Waveform Tomogram vs. True Velocity (depth 0-1 km, location 0-10 km).
Investigation I: True model vs. waveform tomogram using my data (depth 0-0.5 km, location 0-10 km, velocity 1000-3000 m/s).
Investigation II: True velocity vs. migration image using the original data (depth 0-1 km, location 0-10 km).
Investigation III: True velocity vs. migration image using my data (depth 0-1 km, location 0-10 km).
Summary
• Blind test on a synthetic dataset.
• Waveform inversion failed.
• Need to investigate why waveform inversion failed.
• Factors: source wavelet, forward modeling, velocity structure, incorrect information.
Future Work
• Redo the inversion with correct information.
• Speed up waveform inversion.
Part II. High-Performance Computing using Graphics Processing Units (GPU)

Outline
• Motivation
• Introduction to Computing on GPUs
• Preliminary Results
• Summary
Motivation: Peak Performance (peak GFLOP/s; chart courtesy of NVIDIA).
Motivation: Memory Bandwidth (GB/s; chart courtesy of NVIDIA).
CPU vs. GPU: the GPU devotes more transistors to data processing (diagram courtesy of NVIDIA).
CPU vs. GPU: Storage Hierarchies (source: Mary Hall, U of Utah; NVIDIA)
• Conventional (host) hierarchy: processor registers, L2 and L3 caches, host memory.
• GPU (device) hierarchy: a grid of thread blocks; each block has its own shared memory, each thread its own registers and local memory, and all blocks access global, constant, and texture memory.
• Large memories are slow, fast memories are small.
• Thread synchronization does not work across different thread blocks (see the shared-memory sketch below).
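The shared-memory and synchronization point can be made concrete with a small kernel. This is a sketch of my own, not code from the talk: each block stages a tile of its input, plus a one-point halo, in fast on-chip shared memory, and __syncthreads() acts as a barrier only for the threads of that one block. The names (avg3, TILE), the 3-point average, and the block size are illustrative assumptions.

#define TILE 256   // launch with blockDim.x == TILE (assumed)

__global__ void avg3(const float *in, float *out, int n)
{
    __shared__ float s[TILE + 2];                      // tile plus a halo point on each side
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // global index
    int t = threadIdx.x + 1;                           // index inside the shared tile

    s[t] = (i < n) ? in[i] : 0.0f;                     // global -> shared (slow -> fast)
    if (threadIdx.x == 0)                              // left halo
        s[0] = (i > 0) ? in[i - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)                 // right halo
        s[t + 1] = (i + 1 < n) ? in[i + 1] : 0.0f;
    __syncthreads();                                   // barrier for this block's threads only

    if (i < n)
        out[i] = (s[t - 1] + s[t] + s[t + 1]) / 3.0f;  // 3-point average read from shared memory
}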
General-Purpose Computation on GPUs (GPGPU) (source: Mary Hall, U of Utah; GPGPU.org)
• GPUs were originally designed for graphics.
• High speed: useful for a variety of applications.
• Potential for very high performance at low cost.
• Architecture well suited for certain kinds of parallel (data-parallel) applications.
• Demonstrations of 20-100x speedups over the CPU.
Programming Model: CUDA (Compute Unified Device Architecture) (source: Mary Hall, CUDA Programming Guide)
• Minimal extensions to C/C++.
• Kernel functions are executed N times in parallel by N different CUDA threads (a minimal sketch follows below).
• Each thread performs roughly the same computation on a different partition of the data.
• Data-parallel interface to GPUs.
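A minimal sketch of this model, not code from the talk: one kernel body is executed by N threads in parallel, each thread computing one element of the result. The names saxpy and run_saxpy, the block size of 256, and the cudaMalloc/cudaMemcpy plumbing are my assumptions.

#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        y[i] = a * x[i] + y[i];                      // each thread handles one element
}

void run_saxpy(int n, float a, const float *hx, float *hy)
{
    float *dx, *dy;
    size_t bytes = (size_t)n * sizeof(float);

    cudaMalloc(&dx, bytes);                          // allocate device buffers
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    int block = 256;
    int grid  = (n + block - 1) / block;             // enough blocks to cover n threads
    saxpy<<<grid, block>>>(n, a, dx, dy);            // N CUDA threads in parallel

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}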
Preliminary Results
• Modeling test: speedup factor of 20x using 1536 threads.
• Migration test: N/A (thread synchronization problem).
• Inversion test: N/A.
Forward Modeling Test: velocity model (depth 0-3.5 km, horizontal location 0-15 km, velocity 1500-4000 m/s); NX = 1536 = Nthreads, NZ = 373.
Forward Modeling Test: CSG from CPU vs. CSG from GPU (time 0-6 s, offset 0-15 km).
Conventional C Code

/* Finite-difference update of the pressure field (second order in time,
   fourth order in space): P0, P1, P2 hold the wavefield at time steps n-1, n,
   and n+1; C1, C2, C3 are the Laplacian stencil coefficients and alpha the
   squared Courant-type factor (their values are not shown on the slide). */
for (iz = 2; iz < nz - 2; iz++) {
    for (ix = 2; ix < nx - 2; ix++) {
        indx = ix + iz * nx;
        P2[indx] = (2.0 + 2.0 * C1 * alpha) * P1[indx] - P0[indx]
                 + alpha * (C2 * (P1[indx - 1] + P1[indx + 1]
                               + P1[indx - nx] + P1[indx + nx])
                         + C3 * (P1[indx - 2] + P1[indx + 2]
                               + P1[indx - 2 * nx] + P1[indx + 2 * nx]));
    }
}
CUDA Code

// GPU version of the same update: the inner loop over x is removed and each
// CUDA thread handles one x position (taken here directly from threadIdx.x;
// the launch configuration is not shown on the slide). Each thread then
// marches down its own column in z.
ix = threadIdx.x;
for (iz = 2; iz < nz - 2; iz++) {
    indx = ix + iz * nx;
    P2[indx] = (2.0 + 2.0 * C1 * alpha) * P1[indx] - P0[indx]
             + alpha * (C2 * (P1[indx - 1] + P1[indx + 1]
                           + P1[indx - nx] + P1[indx + nx])
                     + C3 * (P1[indx - 2] + P1[indx + 2]
                           + P1[indx - 2 * nx] + P1[indx + 2 * nx]));
}
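The slide shows only the kernel body, so the host side below is a sketch of my own, not the author's code. It assumes the fragment above is wrapped in a kernel (here called fd_step) with C1, C2, C3 made available to the device, and that with more than one block ix is formed from blockIdx.x and threadIdx.x rather than threadIdx.x alone. One kernel launch per time step is issued; kernels in the same stream run in order, so each step sees the finished wavefield of the previous one, and this launch boundary serves as the grid-wide synchronization that __syncthreads() cannot provide across blocks.

// Hypothetical wrapper of the fragment above (name and signature assumed).
__global__ void fd_step(const float *P0, const float *P1, float *P2,
                        int nx, int nz, float alpha);

void time_loop(float *P0, float *P1, float *P2,   // device pointers
               int nx, int nz, float alpha, int nt)
{
    int block = 256;                        // threads per block (assumed)
    int grid  = (nx + block - 1) / block;   // cover NX = 1536 grid columns

    for (int it = 0; it < nt; it++) {
        fd_step<<<grid, block>>>(P0, P1, P2, nx, nz, alpha);  // one launch per step

        float *tmp = P0;                    // rotate the three time levels
        P0 = P1;                            // n-1 <- n
        P1 = P2;                            // n   <- n+1
        P2 = tmp;                           // n+1 buffer reused next step
    }
    cudaDeviceSynchronize();                // wait for the final step to complete
}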
Summary
• The GPU is a cheap, high-performance processor.
• CUDA makes GPU programming relatively quick to learn.
• The current timing result is very promising.
• A better understanding of GPU/CUDA will further improve performance.
Future
• Develop CUDA-based codes for:
  • FD forward modeling
  • RTM
  • Waveform inversion
• Release codes some time in the Fall of 2009.