Computing Spherical Harmonic Transforms on CUDA-Compatible GPUs
Wangqun Lin, Fengshun Lu
College of Computer, National University of Defense Technology
CACHES 2011, Tucson, Arizona, June 4th, 2011
Outline • Motivation • Spherical Harmonic Transforms (SHT) • Methods • Direct Method • Efficiency of Thread Utilization (ETU) • Reshaped Method • Concurrent Kernel Execution • Experiments
Motivation • Computing the SHT on GPUs • The SHT is widely used • But its complexity is O(N^3) • GPUs are powerful • Performance metrics at the SM level • Existing analysis emphasizes only OCCUPANCY • Goal: find another metric that measures how efficiently the launched threads are used
Spherical Harmonic Transforms (1/2) • ξ: state variable • ξ_n^m: spectral coefficients of the state variable ξ • μ: Gaussian latitude • λ: longitude • M: model truncation wavenumber • N(m): highest degree of the associated Legendre function for wavenumber m • P_n^m(μ)e^{imλ}: spherical harmonics, where P_n^m(μ) is the associated Legendre function
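The slide's equation did not survive extraction. Assuming the symbols above follow the standard truncated spherical harmonic expansion used in spectral models such as STSWM, the state variable would be written as follows (for triangular truncation, N(m) = M):

```latex
\xi(\lambda, \mu) = \sum_{m=-M}^{M} \; \sum_{n=|m|}^{N(m)} \xi_n^m \, P_n^m(\mu) \, e^{i m \lambda}
```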
Spherical Harmonic Transforms (2/2) • Forward transform: Forward Fourier, then Forward Legendre • Inverse transform: Inverse Legendre, then Inverse Fourier
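As a sketch of what each stage computes, assume K equally spaced longitudes λ_k, J Gaussian latitudes μ_j with quadrature weights w_j, and P_n^m normalized so that ∫_{-1}^{1} P_n^m P_s^m dμ = δ_{ns}. These grid sizes and this normalization are assumptions, not taken from the slides. Under them, the four stages are:

```latex
\begin{aligned}
\text{Forward Fourier:}  \quad & \xi^m(\mu_j) = \frac{1}{K} \sum_{k=0}^{K-1} \xi(\lambda_k, \mu_j)\, e^{-i m \lambda_k} \\
\text{Forward Legendre:} \quad & \xi_n^m = \sum_{j=1}^{J} w_j \, \xi^m(\mu_j)\, P_n^m(\mu_j) \\
\text{Inverse Legendre:} \quad & \xi^m(\mu_j) = \sum_{n=|m|}^{N(m)} \xi_n^m \, P_n^m(\mu_j) \\
\text{Inverse Fourier:}  \quad & \xi(\lambda_k, \mu_j) = \sum_{m=-M}^{M} \xi^m(\mu_j)\, e^{i m \lambda_k}
\end{aligned}
```

The two Legendre stages are the O(N^3) part: for every wavenumber m they touch every pair (n, j), and only pairs with m ≤ n exist, which is the triangular index space the following slides map onto CUDA threads.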
Methods – Direct (1/9) • Forward Legendre • Index space restricted to m ≤ n • [Figure legend: CUDA thread, thread block]
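A plain-Python emulation of one plausible reading of the direct forward Legendre mapping: a square (M+1) × (M+1) launch with one thread per (m, n) pair, where threads with m > n simply return. The function name, the layout of the precomputed P_n^m(μ_j) table, and which loop plays the block versus thread index are all assumptions for illustration; a real kernel would read blockIdx/threadIdx instead of looping.

```python
def forward_legendre_direct(fm, P, w, M):
    """Emulate a direct (M+1) x (M+1) launch for the forward Legendre step.

    fm[m][j]: Fourier coefficient for wavenumber m at Gaussian latitude j.
    P[m][n][j]: precomputed P_n^m(mu_j) (hypothetical layout).
    w[j]: Gaussian quadrature weights.
    Returns {(m, n): xi_n^m} and the number of threads that did work."""
    J = len(w)
    coeff = {}
    active = 0
    for m in range(M + 1):          # plays the role of the block index
        for n in range(M + 1):      # plays the role of the thread index
            if m > n:               # outside the triangle: thread returns idle
                continue
            active += 1
            # Gaussian quadrature over latitudes for this single coefficient
            coeff[(m, n)] = sum(w[j] * fm[m][j] * P[m][n][j] for j in range(J))
    return coeff, active


M, J = 3, 2
fm = [[1.0] * J for _ in range(M + 1)]
P = [[[1.0] * J for _ in range(M + 1)] for _ in range(M + 1)]
w = [1.0] * J
coeff, active = forward_legendre_direct(fm, P, w, M)
print(active, (M + 1) ** 2)  # 10 16
```

Only (M+1)(M+2)/2 of the (M+1)^2 launched threads do any work, so roughly half of the launch is idle; this is the ETU ≈ 1/2 case that the reshaped method targets.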
Methods – Direct (2/9) • Inverse Legendre • Index space restricted to m ≤ n • [Figure legend: CUDA threads of block j]
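For the inverse Legendre step, a similar hypothetical emulation, assuming block j handles Gaussian latitude j and thread m within that block accumulates the sum over n = m..M. The names and array layout are illustrative only, not the authors' code.

```python
def inverse_legendre_direct(coef, P, M, J):
    """Emulate the direct inverse Legendre launch (hypothetical layout).

    coef[m][n]: spectral coefficient xi_n^m (only n >= m is used).
    P[m][n][j]: precomputed P_n^m(mu_j).
    Returns fm[m][j] = sum_{n=m}^{M} coef[m][n] * P[m][n][j] and the
    loop length of each thread m (identical in every block)."""
    fm = [[0.0] * J for _ in range(M + 1)]
    for j in range(J):             # one block per Gaussian latitude j
        for m in range(M + 1):     # one thread per wavenumber m
            acc = 0.0
            for n in range(m, M + 1):
                acc += coef[m][n] * P[m][n][j]
            fm[m][j] = acc
    steps = [M - m + 1 for m in range(M + 1)]
    return fm, steps


M, J = 3, 2
coef = [[1.0] * (M + 1) for _ in range(M + 1)]
P = [[[1.0] * J for _ in range(M + 1)] for _ in range(M + 1)]
fm, steps = inverse_legendre_direct(coef, P, M, J)
print(steps)  # [4, 3, 2, 1]
```

Thread m runs M - m + 1 iterations, so in each block the thread for m = 0 runs M + 1 iterations while the thread for m = M runs one; this imbalance again leaves about half of the thread-steps idle.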
Methods – ETU Metric (3/9) • Efficiency of Thread Utilization (ETU) • Measures the proportion of launched threads doing useful work over the entire execution interval • Mainly used as an algorithm design guideline • Assumption • Algorithms consist of many micro steps • tu(t,s) function • t: thread • s: micro step
Methods – ETU (4/9) • ETU Metric • Example
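The ETU definition can be read as: over all launched threads t and micro steps s, count the slots where tu(t, s) = 1 and divide by (number of threads × number of micro steps). That formula is our interpretation of the slide, not a quote from the paper. A minimal sketch:

```python
def etu(tu):
    """tu[t][s] is 1 if thread t does useful work in micro step s, else 0.
    Returns the fraction of (thread, micro step) slots that are useful."""
    threads = len(tu)
    steps = max(len(row) for row in tu)
    useful = sum(sum(row) for row in tu)
    return useful / (threads * steps)


# Four threads whose loop lengths shrink: 4, 3, 2, 1 micro steps
# (the shape of a triangular loop over n >= m).
S = 4
triangular = [[1 if s < S - t else 0 for s in range(S)] for t in range(4)]
balanced = [[1] * S for _ in range(4)]
print(etu(triangular))  # 0.625
print(etu(balanced))    # 1.0
```

Occupancy would report the same value for both launches (same thread count), while ETU separates them.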
Methods – Reshaped (5/9) • Forward Legendre • Reshape: ETU ≈ 1/2 before, ETU ≈ 1 after
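One way to reshape the triangle {0 ≤ m ≤ n ≤ M} into a full rectangle, assuming an odd truncation M (true of the T213 and T341 models used here): pair row n = r, which has r + 1 entries, with row n = M - r, which has M - r + 1, so every rectangle row holds exactly M + 2 entries. This fold and the helper name `fold` are illustrative; the slides do not show the exact mapping the authors used.

```python
def fold(r, c, M):
    """Map thread (r, c) of a ((M+1)//2) x (M+2) rectangle to a unique
    (m, n) with 0 <= m <= n <= M. Assumes M is odd."""
    if c <= r:
        return c, r             # first r+1 slots: row n = r
    return c - r - 1, M - r     # remaining M-r+1 slots: row n = M - r


M = 213
assert M % 2 == 1
pairs = [fold(r, c, M) for r in range((M + 1) // 2) for c in range(M + 2)]
triangle = (M + 1) * (M + 2) // 2
print(len(pairs), len(set(pairs)), triangle)  # 23005 23005 23005
# ETU of the direct square launch, for comparison
print(round(triangle / (M + 1) ** 2, 3))      # 0.502
```

With the fold, every one of the ((M + 1)/2) × (M + 2) launched threads maps to a distinct coefficient, so ETU is 1 when each coefficient costs the same J-term quadrature sum.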
Methods – Reshaped (6/9) • Inverse Legendre • T213 model: reshape
Methods – Reshaped (7/9) • Inverse Legendre • T213 model: reconstruct
Methods – Reshaped (8/9) • Inverse Legendre • T213 model: computation for trapezia α and β
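The slides do not spell out how the trapezia α and β are computed, so the following is a different and simpler load-balancing idea, not the authors' construction: for odd M, give one thread both wavenumber m and wavenumber M - m, whose inverse Legendre loops have M - m + 1 and m + 1 terms, so every thread performs M + 2 terms. ETU is computed per the definition above, assuming one micro step per term and lock-step threads; the helper names are hypothetical.

```python
def paired_schedule(M):
    """Assign wavenumbers m and M - m to one thread (requires odd M)."""
    assert M % 2 == 1, "pairing m with M - m assumes an odd truncation"
    return [(m, M - m) for m in range((M + 1) // 2)]


def loop_length(m, M):
    """Terms in the inverse Legendre sum for wavenumber m (n = m..M)."""
    return M - m + 1


def etu_from_lengths(lengths):
    """ETU when thread i is busy for lengths[i] of max(lengths) steps."""
    return sum(lengths) / (len(lengths) * max(lengths))


M = 213
direct = [loop_length(m, M) for m in range(M + 1)]
paired = [sum(loop_length(k, M) for k in pair) for pair in paired_schedule(M)]
etu_direct = etu_from_lengths(direct)
etu_paired = etu_from_lengths(paired)
print(round(etu_direct, 3), etu_paired)  # 0.502 1.0
```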
Methods – Concurrent Kernel (9/9) • Concurrent Kernel Execution • Supported by Fermi and later architectures • Programs with many small kernels can be executed efficiently on GPUs • Considered for future software scalability • T213 model
Experiments (1/4) • Validation of the ETU metric • T341 model • Varying block size • Observations • In general, larger ETU indicates better performance • No direct relationship appears between OCCUPANCY and performance • Equal OCCUPANCY does not imply equal performance • At equal OCCUPANCY, larger ETU gives better performance
Experiments (2/4) • Performance • [Figure panels: Forward Legendre, Inverse Legendre]
Experiments (3/4) • Case Study: STSWM • A global shallow water model based on the SHT • Exhibits many mathematical and computational properties of more complete models • Used to investigate and compare numerical methods for simulating atmospheric models • T213 truncation • Forward Legendre: ftrnve, ftrndi and ftrnpi • Inverse Legendre: shtrns
Experiments (4/4) • Case Study: STSWM
Review • Motivation • Spherical Harmonic Transforms • Methods • Direct Method • Efficiency of Thread Utilization (ETU) • Reshaped Method • Concurrent Kernel Execution • Experiments
Thanks for your attention! Any questions? Email: lufengshun@nudt.edu.cn, linwangqun2005@gmail.com