
GPU Cluster for Scientific Computing and Large-Scale Simulation

Zhe Fan, Feng Qiu, Arie Kaufman, Suzanne Yoakum-Stover
Center for Visual Computing and Department of Computer Science, Stony Brook University
http://www.cs.sunysb.edu/~vislab/projects/gpgpu/GPU_Cluster/GPU_Cluster.html

Presentation Transcript


Stony Brook Visual Computing Cluster
• GPU cluster: 35 nodes with nVIDIA GeForce FX 5800 Ultra, connected by Gigabit Ethernet
• 70 Pentium Xeon 2.4 GHz CPUs
• 35 VolumePro 1000 boards
• 9 HP Sepia-2A with ServerNet II

Application: large-scale CFD simulations using the Lattice Boltzmann Model (LBM)

LBM computation (a reference stream-and-collide step is sketched after this transcript):
• Particles stream along lattice links
• Particles collide when they meet at a site

LBM on the GPU:
• Pack the 3D lattice states into a series of 2D textures (see the packing sketch below)
• Update the lattice with fragment programs

Scale up LBM to the GPU cluster:
• Each GPU computes a sub-lattice
• Particles stream out of the sub-lattice
• Gather the outgoing particle distributions in a texture
• Read them out from the GPU in a single operation
• Transfer them through Gigabit Ethernet (MPI)
• Write them into the neighboring GPU nodes

Network performance optimization (see the MPI overlap sketch below):
• Conduct network transfer while computing
• Schedule transfers to reduce the likelihood of interruption
• Simplify the connection pattern

GPU cluster / CPU cluster speedup:
• Each node computes an 80 x 80 x 80 sub-lattice
• GeForce FX 5800 Ultra vs. Pentium Xeon 2.4 GHz

Times Square area of NYC: flow streamlines and dispersion plume
• 1.66 km x 1.13 km, 91 blocks, 851 buildings
• 480 x 400 x 80 lattice
• 0.31 second / step on 30 GPUs
• 4.6 times faster than the software version on 30 CPUs

Acknowledgements
• NSF CCR0306438
• Department of Homeland Security, Environment Measurement Lab
• HP
• TeraRecon
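The LBM description above (particles stream along lattice links, then collide where they meet) can be made concrete with a plain CPU reference step. The following is a minimal sketch of one stream-and-collide update on a D3Q19 lattice with BGK relaxation and periodic wrapping; the data layout, names, and relaxation time are illustrative assumptions, not the authors' GPU implementation.

```cpp
#include <array>
#include <vector>

// Minimal CPU reference for one LBM step (stream, then BGK collide) on a D3Q19 lattice.
// Illustrative sketch only; not the poster authors' code.
constexpr int Q = 19;
constexpr int c[Q][3] = {                        // D3Q19 discrete velocities
    { 0, 0, 0},
    { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
    { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
    { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
    { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}};
constexpr double w[Q] = {                        // corresponding lattice weights
    1.0/3,
    1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18,
    1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36,
    1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36};

struct Lattice {
    int nx, ny, nz;
    std::vector<std::array<double, Q>> f;        // one set of distributions per site
    int idx(int x, int y, int z) const { return (z * ny + y) * nx + x; }
};

// Streaming: every distribution moves one link to its neighboring site.
void stream(const Lattice& src, Lattice& dst)
{
    for (int z = 0; z < src.nz; ++z)
    for (int y = 0; y < src.ny; ++y)
    for (int x = 0; x < src.nx; ++x)
        for (int q = 0; q < Q; ++q) {
            int xn = (x + c[q][0] + src.nx) % src.nx;   // periodic wrap for simplicity
            int yn = (y + c[q][1] + src.ny) % src.ny;
            int zn = (z + c[q][2] + src.nz) % src.nz;
            dst.f[dst.idx(xn, yn, zn)][q] = src.f[src.idx(x, y, z)][q];
        }
}

// Collision: relax each site's distributions toward local equilibrium (BGK).
void collide(Lattice& lat, double tau)
{
    for (auto& fs : lat.f) {
        double rho = 0, ux = 0, uy = 0, uz = 0;
        for (int q = 0; q < Q; ++q) {
            rho += fs[q];
            ux += c[q][0] * fs[q]; uy += c[q][1] * fs[q]; uz += c[q][2] * fs[q];
        }
        ux /= rho; uy /= rho; uz /= rho;
        double u2 = ux * ux + uy * uy + uz * uz;
        for (int q = 0; q < Q; ++q) {
            double cu  = c[q][0] * ux + c[q][1] * uy + c[q][2] * uz;
            double feq = w[q] * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * u2);
            fs[q] += (feq - fs[q]) / tau;        // relax toward equilibrium
        }
    }
}

int main()
{
    Lattice a{16, 16, 16, {}}, b{16, 16, 16, {}};
    a.f.assign(16 * 16 * 16, std::array<double, Q>{});
    b.f.assign(16 * 16 * 16, std::array<double, Q>{});
    for (auto& fs : a.f)                          // start from rest-state equilibrium
        for (int q = 0; q < Q; ++q) fs[q] = w[q];
    stream(a, b);
    collide(b, 1.0);                              // tau = 1.0 chosen arbitrarily
    return 0;
}
```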

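The "pack 3D lattice states into a series of 2D textures" step amounts to index arithmetic that lays the lattice out slice by slice in a larger 2D image, which fragment programs can then update. Below is a minimal sketch of one such mapping, assuming a simple tiled layout; the structure names and tile width are hypothetical and not taken from the poster.

```cpp
#include <cstdio>

// Illustrative 3D -> 2D packing: each z-slice of the lattice becomes one tile
// in a larger 2D texture, with tiles laid out in row-major order.
// Names and layout are assumptions for illustration only.
struct Lattice3D { int nx, ny, nz; };            // lattice dimensions
struct TexLayout { int tilesPerRow; };           // how many z-slices sit side by side

// Map a 3D lattice site (x, y, z) to a 2D texel (u, v).
void latticeToTexel(const Lattice3D& lat, const TexLayout& tex,
                    int x, int y, int z, int& u, int& v)
{
    int tileCol = z % tex.tilesPerRow;           // which tile column this slice lands in
    int tileRow = z / tex.tilesPerRow;           // which tile row this slice lands in
    u = tileCol * lat.nx + x;                    // texel column inside the packed texture
    v = tileRow * lat.ny + y;                    // texel row inside the packed texture
}

int main()
{
    Lattice3D lat{80, 80, 80};                   // the 80^3 sub-lattice from the poster
    TexLayout tex{9};                            // 9 slices per texture row (assumed)
    int u, v;
    latticeToTexel(lat, tex, 10, 20, 35, u, v);
    std::printf("site (10,20,35) -> texel (%d,%d)\n", u, v);
    return 0;
}
```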
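The cluster-scaling and network-optimization panels describe reading boundary particle distributions back from each GPU, shipping them to neighboring nodes over Gigabit Ethernet with MPI, and overlapping that transfer with computation. The following is a minimal sketch of that overlap using nonblocking MPI; the 1-D neighbor layout, buffer sizes, and GPU-side stubs are assumptions for illustration rather than the authors' code.

```cpp
#include <mpi.h>
#include <vector>

// Stand-in stubs for the GPU-side steps named on the poster (assumed names, not the authors' code).
void readBoundaryFromGPU(std::vector<float>& buf) {}       // "read out from the GPU in a single operation"
void computeInteriorOnGPU() {}                              // fragment-program passes over interior sites
void writeBoundaryToGPU(const std::vector<float>& buf) {}   // incoming distributions become boundary textures

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // 1-D decomposition along x for illustration; the poster's lattice is decomposed among 30 GPUs.
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    const int boundarySize = 80 * 80 * 19;       // one face of an 80^3 D3Q19 sub-lattice (assumed)
    std::vector<float> sendL(boundarySize), sendR(boundarySize);
    std::vector<float> recvL(boundarySize), recvR(boundarySize);

    for (int step = 0; step < 100; ++step) {
        // Gather outgoing distributions on the GPU and read them back.
        readBoundaryFromGPU(sendL);
        readBoundaryFromGPU(sendR);

        // Post nonblocking transfers, then keep the GPU busy:
        // "conduct network transfer while computing".
        MPI_Request reqs[4];
        MPI_Irecv(recvL.data(), boundarySize, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recvR.data(), boundarySize, MPI_FLOAT, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(sendL.data(), boundarySize, MPI_FLOAT, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(sendR.data(), boundarySize, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &reqs[3]);

        computeInteriorOnGPU();                   // interior update hides the Gigabit Ethernet latency

        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        writeBoundaryToGPU(recvL);                // "write into neighboring GPU nodes"
        writeBoundaryToGPU(recvR);
    }

    MPI_Finalize();
    return 0;
}
```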