
Combining MPI and GPGPU for Monte Carlo Applications



  1. Combining MPI and GPGPU for Monte Carlo Applications Andrew Dunkman Nathan Eloe

  2. Overview • Monte Carlo Simulations: The Ising Model • Two approaches to parallel solution: • MPI • GPGPU (CUDA/OpenCL) • Combining solutions (MPI/GPGPU)

  3. Monte Carlo Simulation • Method of evaluating multidimensional integrals. • Take weighted average of points in bounded area as approximation of the integral’s value. • Applications in Statistics and Physics. • Inherently parallelizable.
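The weighted-average idea above can be sketched in a few lines of Python (an illustration; the function name and parameters are ours, not from the talk): the integral of f over [a, b] is estimated as (b − a) times the average of f at uniformly random sample points.

```python
import random

def monte_carlo_integrate(f, a, b, n_samples, seed=0):
    """Estimate the integral of f over [a, b] as (b - a) times the
    average of f at uniformly random sample points."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples

# Example: the integral of x^2 over [0, 1] is exactly 1/3.
estimate = monte_carlo_integrate(lambda x: x * x, 0.0, 1.0, 100_000)
```

Because every sample is independent, the sum can be split across processes or GPU threads, which is what makes the method inherently parallelizable.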

  4. The Ising Model A specific Monte Carlo simulation • Simplistic model of a magnetic material. • Square lattice of sites, each with a magnetic spin of −1 or +1. • During each Monte Carlo sweep, each site is given the chance to change its polarity. • Under the Metropolis rule, the probability of a site changing its polarity is min(1, e^(−ΔE/kBT)), where ΔE is the energy change the flip would cause.
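A single Metropolis sweep over the lattice can be sketched as follows (a Python illustration; taking k_B = 1 and coupling J = 1, with periodic boundaries, are our conventions, and the function name is hypothetical, not the talk's code):

```python
import math
import random

def metropolis_sweep(lattice, temperature, rng):
    """One Monte Carlo sweep: offer every site the chance to flip.
    A flip costing energy dE is accepted with probability min(1, exp(-dE/T))
    (Metropolis rule, with k_B = 1 and coupling J = 1)."""
    n = len(lattice)
    for i in range(n):
        for j in range(n):
            # Sum of the four nearest-neighbour spins (periodic boundaries).
            neighbours = (lattice[(i - 1) % n][j] + lattice[(i + 1) % n][j]
                          + lattice[i][(j - 1) % n] + lattice[i][(j + 1) % n])
            dE = 2.0 * lattice[i][j] * neighbours  # energy cost of flipping (i, j)
            if dE <= 0 or rng.random() < math.exp(-dE / temperature):
                lattice[i][j] = -lattice[i][j]
    return lattice

rng = random.Random(1)
lattice = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
metropolis_sweep(lattice, temperature=2.0, rng=rng)
```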

  5. Parallelizing Monte Carlo Using MPI • Every site can be calculated independently of the other sites. • The only important information is the state of the system at the beginning of the sweep. • Only the sites adjacent to the site of interest matter to the calculation. • Perfect for a mesh configuration of processors.

  6. Parallelizing Monte Carlo Optimizing MPI’s Communication • Assuming an n × n square lattice, communicating the entire lattice after every sweep would give a communication cost of O(n²) per sweep. • This is unnecessary: we only need to communicate the sites along the boundaries of the data separation, plus some aggregate data per sweep. • This reduces the communication cost to O(n).
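The boundary-only idea can be sketched like this (plain Python standing in for the MPI calls; in real code the ghost-row swap would be an MPI_Sendrecv between adjacent ranks, and all names here are hypothetical):

```python
def split_rows(lattice, n_procs):
    """Divide an n x n lattice into contiguous row blocks, one per process."""
    n = len(lattice)
    rows_per = n // n_procs
    return [lattice[p * rows_per:(p + 1) * rows_per] for p in range(n_procs)]

def exchange_ghost_rows(blocks):
    """Per sweep, each block only needs its neighbours' boundary rows:
    O(n) data per exchange instead of re-sending the whole O(n^2) lattice.
    In MPI this would be an MPI_Sendrecv between adjacent ranks."""
    ghosts = []
    for p, block in enumerate(blocks):
        above = blocks[p - 1][-1] if p > 0 else blocks[-1][-1]          # periodic
        below = blocks[p + 1][0] if p < len(blocks) - 1 else blocks[0][0]
        ghosts.append((above, below))
    return ghosts

n = 8
lattice = [[(i * n + j) % 2 * 2 - 1 for j in range(n)] for i in range(n)]
blocks = split_rows(lattice, 4)
ghosts = exchange_ghost_rows(blocks)
```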

  7. Communication Example: P=4 [Figure: an n × n lattice split into four quadrants, assigned to P0, P1, P2, P3.] 4 communications of n elements each. Even if you gather the change in energy and magnetism per sweep at P0, you are still looking at O(n) elements to communicate.

  8. Parallelizing Monte Carlo Using MPI Pros: • Reduces the problem size per process (for N >> P). • Communication can be reduced to O(n) between sweeps. Cons: • Communication overhead between all processes. • Each process still has to sequentially calculate every site in its block.

  9. Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) What is GPGPU?

  10. What is GPGPU? • General-Purpose computing on Graphics Processing Units (GPGPU) is a relatively new trend in parallel computing. • Graphics adapters are heavily optimized for floating-point math (the core of graphics applications). • The GPU is controlled by a host process (CPU).

  11. What are OpenCL and CUDA? • OpenCL is an open standard supported by all modern graphics card manufacturers that allows access to the graphics card’s computing abilities. • CUDA is NVIDIA’s proprietary platform and programming model for general-purpose computing, specific to NVIDIA graphics cards; it is a separate technology from OpenCL, not an extension of it.

  12. OpenCL and CUDA

  13. OpenCL and AMD APP (Stream)

  14. Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) • One O(n2) communication at beginning • Can be avoided if the card can generate the initial state. • After each Monte Carlo sweep, no communication of the global state to the host (CPU) is needed.

  15. Parallelizing Monte Carlo Advantages of GPGPU (CUDA/OpenCL) • GPUs deliver very high floating-point throughput. • Modern GPUs have a large number of processing units on board. • The new GeForce GTX 580 has 512 CUDA cores. • On-card memory access is very fast. • Shared memory to some degree. • Closer to P = n.

  16. CUDA vs CPU (Single Thread)

  17. Resulting Lattice

  18. Parallelizing Monte Carlo Using GPGPU (CUDA/OpenCL) Pros: • Fast; closer to P = n. • Possibly a one-time initial communication, with very little communication between sweeps. • Communication speed limited only by the bandwidth of the PCIe bus. Cons: • Limited memory: the GTX 580 has only 1.5 GB of RAM (albeit very fast RAM).

  19. So, can we use BOTH!?

  20. Parallelizing Monte Carlo Using MPI and GPGPU (CUDA/OpenCL) • Use same communication scheme as MPI alone. • Divide problem over hosts, and only communicate bordering sites between sweeps • Only update the changed information between the host and the GPU
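The combined per-sweep protocol might look like this sketch (plain Python with the GPU kernel stubbed out; gpu_update and simulate_combined are hypothetical names, and in real code step 1 is a CUDA kernel launch, step 2 a cudaMemcpy of the border rows, and step 3 a pair of MPI_Sendrecv calls per rank):

```python
def gpu_update(block):
    """Stub for the device-side Metropolis sweep; a real CUDA kernel
    would update every site of the block in device memory."""
    pass

def simulate_combined(blocks, n_sweeps):
    """Drive the combined scheme over row blocks held by hypothetical ranks."""
    elements_moved = 0
    n_ranks = len(blocks)
    ghosts = []
    for _ in range(n_sweeps):
        # 1. Each rank's GPU updates its whole block in device memory.
        for block in blocks:
            gpu_update(block)
        # 2. Only the first and last rows of each block cross the PCIe bus
        #    and the network: O(n) per rank per sweep, not O(n^2).
        borders = [(b[0], b[-1]) for b in blocks]
        elements_moved += sum(len(top) + len(bottom) for top, bottom in borders)
        # 3. Neighbouring ranks swap border rows (periodic boundaries);
        #    in MPI this is a pair of MPI_Sendrecv calls per rank.
        ghosts = [(borders[(r - 1) % n_ranks][1], borders[(r + 1) % n_ranks][0])
                  for r in range(n_ranks)]
    return ghosts, elements_moved

n = 8
blocks = [[[1] * n for _ in range(2)] for _ in range(4)]
ghosts, elements_moved = simulate_combined(blocks, n_sweeps=3)
```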

  21. Communication Example: P=4 [Figure: the n × n lattice split into four quadrants, assigned to P0, P1, P2, P3.]

  22. Code Demo

  23. Questions? Bibliography Image Sources http://www.nvidia.com/object/product-geforce-gtx-580-us.html http://pressroom.nvidia.com/easyir/imga.do?easyirid=A0D622CE9F579F09 amd.com • Kirk, David, and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-On Approach. Morgan Kaufmann, 2010. Print. • Pang, Tao. An Introduction to Computational Physics. Cambridge University Press, 2005. Print.
