Combining MPI and GPGPU for Monte Carlo Applications Andrew Dunkman Nathan Eloe
Overview • Monte Carlo Simulations: The Ising Model • Two approaches to parallel solution: • MPI • GPGPU (CUDA/OpenCL) • Combining solutions (MPI/GPGPU)
Monte Carlo Simulation • A method of evaluating multidimensional integrals. • Take a weighted average of randomly sampled points in a bounded region as an approximation of the integral's value. • Applications in statistics and physics. • Inherently parallelizable.
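A minimal host-side sketch of the idea (not from the slides; the integrand f is a hypothetical example): estimate a one-dimensional integral by averaging the function at uniformly sampled points. Each sample is independent, which is what makes the method inherently parallelizable.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>

// Hypothetical integrand on [0, 1]; its exact integral is pi / 4.
static double f(double x) { return std::sqrt(1.0 - x * x); }

int main() {
    const long samples = 1000000;
    double sum = 0.0;
    for (long i = 0; i < samples; ++i) {
        double x = std::rand() / (double)RAND_MAX;  // uniform random point in [0, 1]
        sum += f(x);                                // accumulate the sampled values
    }
    // The mean of the samples approximates the integral over [0, 1].
    const double pi = std::acos(-1.0);
    std::printf("estimate = %f, exact = %f\n", sum / samples, pi / 4.0);
    return 0;
}
```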
The Ising Model A specific Monte Carlo application • Simplistic model of a magnetic material. • Square lattice of sites, each with a magnetic spin of +1 or -1. • During each Monte Carlo sweep, each site is given the chance to flip its spin. • With the Metropolis rule, the probability of a site flipping is min(1, exp(-ΔE / k_B T)), where ΔE is the energy change the flip would cause.
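A minimal serial sketch of one sweep under the Metropolis rule (the update scheme, J, kT, and the array layout are illustrative assumptions, not the authors' code):

```cuda
#include <cstdlib>
#include <cmath>
#include <vector>

// One Metropolis sweep over an n x n lattice of +1/-1 spins with periodic
// boundaries. J is the coupling constant, kT the temperature in energy units.
void sweep(std::vector<int>& s, int n, double J, double kT) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            int up    = s[((i - 1 + n) % n) * n + j];
            int down  = s[((i + 1) % n) * n + j];
            int left  = s[i * n + (j - 1 + n) % n];
            int right = s[i * n + (j + 1) % n];
            int idx   = i * n + j;
            // Energy change if this spin flips: dE = 2 * J * s_ij * (sum of neighbours).
            double dE = 2.0 * J * s[idx] * (up + down + left + right);
            // Accept the flip with probability min(1, exp(-dE / kT)).
            if (dE <= 0.0 || std::rand() / (double)RAND_MAX < std::exp(-dE / kT))
                s[idx] = -s[idx];
        }
    }
}
```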
Parallelizing Monte Carlo Using MPI • Every site can be calculated independently of the other sites. • The only important information is the state of the system at the beginning of the sweep. • Only the sites adjacent to the site of interest matter to the calculation. • Perfect for a mesh configuration of processors.
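One way the mesh could be set up is with MPI's Cartesian-topology routines; a minimal sketch (an assumed implementation detail, not necessarily the authors' approach):

```cuda
#include <mpi.h>

// Arrange the processes in a 2-D mesh so each rank owns one block of the
// lattice and knows its four neighbours -- the only ranks it must talk to.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int dims[2] = {0, 0}, periods[2] = {1, 1};   // periodic, like the lattice
    MPI_Dims_create(nprocs, 2, dims);            // e.g. 4 ranks -> a 2 x 2 mesh
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    int up, down, left, right;
    MPI_Cart_shift(cart, 0, 1, &up, &down);      // neighbours in one dimension
    MPI_Cart_shift(cart, 1, 1, &left, &right);   // neighbours in the other

    // ... each rank sweeps its own block and exchanges only boundary sites ...

    MPI_Finalize();
    return 0;
}
```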
Parallelizing Monte Carlo Optimizing MPI's Communication • Assuming an n-by-n square lattice, communicating the entire lattice after every sweep would give a communication cost of O(n²) per sweep. • This is unnecessary: we only need to communicate the sites along the boundaries of the data separation, plus some aggregate data per sweep. • This reduces the communication cost to O(n).
Communication Example: P = 4 [figure: an n-by-n lattice split into four blocks, one per process P0–P3] • 4 communications, n elements each, per sweep. • Even if you gather the change in energy and magnetism per sweep at P0, you are still looking at O(n) elements to communicate.
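A sketch of the O(n) per-sweep exchange, assuming the Cartesian communicator above; only one dimension is shown (columns are handled the same way), and the buffer names and the reduction of the per-sweep energy/magnetism changes are illustrative:

```cuda
#include <mpi.h>

// Each rank sends its first and last rows to its up/down neighbours and
// receives their edge rows into halo buffers, then reduces the per-sweep
// aggregates. block is the rank's rows x cols portion of the lattice.
void exchange_and_reduce(int* block, int rows, int cols,
                         int* halo_top, int* halo_bottom,
                         int up, int down, MPI_Comm cart,
                         double dE, double dM) {
    MPI_Status st;
    // Send the first row up; receive the upper neighbour's last row.
    MPI_Sendrecv(block, cols, MPI_INT, up, 0,
                 halo_top, cols, MPI_INT, up, 1, cart, &st);
    // Send the last row down; receive the lower neighbour's first row.
    MPI_Sendrecv(block + (rows - 1) * cols, cols, MPI_INT, down, 1,
                 halo_bottom, cols, MPI_INT, down, 0, cart, &st);

    // The per-sweep aggregates are a constant number of values, not O(n^2).
    double local[2] = {dE, dM}, global[2];
    MPI_Allreduce(local, global, 2, MPI_DOUBLE, MPI_SUM, cart);
}
```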
Parallelizing Monte Carlo Using MPI • Pros: reduces the problem size per process (for N >> P); communication can be reduced to O(n) between sweeps. • Cons: communication overhead between all processes; each process still has to sequentially calculate every site in its block.
Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) What is GPGPU?
What is GPGPU? • General-Purpose computation on a GPU (GPGPU) is a relatively new trend in parallel computing. • Graphics adapters are heavily optimized for floating-point math (the core of graphics applications). • The GPU is controlled by a host process running on the CPU.
What are OpenCL and CUDA? • OpenCL is an open standard, supported by all modern graphics card manufacturers, that gives access to the graphics card's computing abilities. • CUDA is NVIDIA's proprietary platform and set of C/C++ extensions for GPGPU on its own graphics cards (a separate toolchain, not an extension of OpenCL).
Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) • One O(n²) communication at the beginning. • Even that can be avoided if the card can generate the initial state. • After each Monte Carlo sweep, no communication of the global state back to the host (CPU) is needed.
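A sketch of what the device-resident lattice could look like in CUDA (parameter names and the checkerboard update are assumptions, not the authors' code): one kernel initializes the spins on the card, so the initial O(n²) transfer is avoided, and each sweep launches the update kernel twice, once per checkerboard colour, so no site reads a neighbour that is flipping in the same launch.

```cuda
#include <curand_kernel.h>

// Initialise the spins on the device so the host never sends the full lattice.
__global__ void init_lattice(int* spin, curandState* rng, int n, unsigned seed) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || j >= n) return;
    int idx = i * n + j;
    curand_init(seed, idx, 0, &rng[idx]);                  // one RNG state per site
    spin[idx] = (curand_uniform(&rng[idx]) < 0.5f) ? -1 : 1;
}

// Metropolis update of one checkerboard sub-lattice (color 0 or 1).
__global__ void sweep_color(int* spin, curandState* rng, int n,
                            float J, float kT, int color) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || j >= n || (i + j) % 2 != color) return;
    int idx = i * n + j;
    int nb = spin[((i + 1) % n) * n + j] + spin[((i - 1 + n) % n) * n + j]
           + spin[i * n + (j + 1) % n]   + spin[i * n + (j - 1 + n) % n];
    float dE = 2.0f * J * spin[idx] * nb;                  // energy change on flip
    if (dE <= 0.0f || curand_uniform(&rng[idx]) < expf(-dE / kT))
        spin[idx] = -spin[idx];
}
```

A full sweep is then two launches, sweep_color with color 0 followed by color 1, with the lattice staying in device memory the whole time.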
Parallelizing Monte Carlo Advantages of GPGPU (CUDA/OpenCL) • Floating-point throughput on GPUs is very high. • Modern GPUs have a large number of processing units onboard. • The new GeForce GTX 580 has 512 CUDA cores. • On-card memory access is very fast, and threads can share on-card memory to some degree. • Closer to P = n.
Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) • Pros: fast, closer to P = n; potentially a one-time communication, with very little communication between sweeps; communication speed limited only by the bandwidth of the PCIe bus. • Cons: limited memory; the GTX 580 has only 1.5 GB of RAM (albeit very fast RAM).
Parallelizing Monte Carlo Using MPI and GPGPU (CUDA/OpenCL) • Use the same communication scheme as MPI alone. • Divide the problem over hosts, and only communicate bordering sites between sweeps. • Only transfer the changed boundary information between the host and the GPU (see the sketch after the communication example below).
Communication Example: P = 4 [figure: the same n-by-n block decomposition across P0–P3 as before]
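A sketch of the combined scheme's per-sweep transfer, assuming a one-dimensional (row-wise) split across hosts and a device-resident block with one halo row at each end; names and layout are illustrative, not the authors' code:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

// After each sweep, only the two edge rows leave the card: device -> host,
// MPI exchange with the neighbouring ranks, then host -> device into the
// halo rows. d_spin holds (rows + 2) x cols ints: halo, owned rows, halo.
void exchange_boundaries(int* d_spin, int rows, int cols,
                         int up, int down, MPI_Comm cart) {
    std::vector<int> send_top(cols), send_bot(cols), recv_top(cols), recv_bot(cols);

    // Copy only the boundary rows off the card: O(n) traffic, not O(n^2).
    cudaMemcpy(send_top.data(), d_spin + 1 * cols, cols * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(send_bot.data(), d_spin + rows * cols, cols * sizeof(int),
               cudaMemcpyDeviceToHost);

    MPI_Status st;
    MPI_Sendrecv(send_top.data(), cols, MPI_INT, up, 0,
                 recv_top.data(), cols, MPI_INT, up, 1, cart, &st);
    MPI_Sendrecv(send_bot.data(), cols, MPI_INT, down, 1,
                 recv_bot.data(), cols, MPI_INT, down, 0, cart, &st);

    // Push the neighbours' edge rows back into the halo rows on the card.
    cudaMemcpy(d_spin, recv_top.data(), cols * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_spin + (rows + 1) * cols, recv_bot.data(), cols * sizeof(int),
               cudaMemcpyHostToDevice);
}
```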
Questions?