Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing • Rahul S. Sampath • May 9th, 2007
Floating Point Operations Per Second (FLOPS) • Humans doing long division: milli-FLOPS (1/1000th of one FLOP) • Cray-1 supercomputer, 1976, $8M: 80 MFLOPS • Pentium II, 400 MHz: 100 MFLOPS • Typical high-end PC today (2007): ~1 GFLOPS • Sony PlayStation 3, 2006: 2 TFLOPS • IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS • IBM Blue Gene, before 2010 (with 65,536 microprocessors): 360 TFLOPS
Why do we need more? • "DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft, 1980 • "640K ought to be enough for anybody." -- Bill Gates, 1981 • Bottom line: the demand for computational power will continue to increase.
Some Computationally Intensive Applications Today • Computer-aided surgery • Medical imaging • Molecular dynamics (MD) simulations • FEM simulations with > 10^10 unknowns • Galaxy formation and evolution • 17-million-particle Cold Dark Matter cosmology simulation
Any application that can be scaled up should be treated as a computationally intensive application.
The Need for Parallel Computing • Memory (RAM) • There is a theoretical limit on the RAM that can be addressed on a single computer • 32-bit systems: 4 GB (2^32 bytes) • 64-bit systems: 16 exabytes (about 16 million TB) • Speed • Upgrading the microprocessor can't help you anymore • FLOPS are not the bottleneck; memory access is • What we need is more registers • Think pre-computing, higher-bandwidth memory buses, L2/L3 caches, compiler optimizations, the assembly-language asylum • Or… • Think parallel…
Hacks • If speed is not an issue… • Is an out-of-core implementation an option? • Parallel programs can be converted into out-of-core implementations easily.
The Key Questions • Why? • Memory • Speed • Both • What kind of platform? • Shared Memory • Distributed Computing • Typical size of the application • Small (< 32 processors) • Medium (32-256 processors) • Large (> 256 processors) • How much time and effort do you want to invest? • How many times will the component be used in a single execution of the program?
Factors to Consider in any Parallel Algorithm Design • Give equal work to all processors at all times • Load balancing • Give an equal amount of data to all processors • Efficient memory management • Processors should work independently as much as possible • Minimize communication, especially iterative communication • If communication is necessary, try to do some useful work in the background as well • Overlapping communication and computation (see the sketch after this slide) • Try to keep the sequential part of the parallel algorithm as close to the best sequential algorithm as possible • Optimal-work algorithm
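To make the last point concrete, here is a minimal C/MPI sketch (not from the original slides) of overlapping communication and computation: each processor posts nonblocking sends and receives to its ring neighbors, does purely local work while the messages are in flight, and only then waits for the communication to complete. The ring layout and the dummy work loop are illustrative assumptions.

/* Overlap sketch: mpicc overlap.c -o overlap && mpirun -np 4 ./overlap */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* ring neighbors (assumed layout) */
    int right = (rank + 1) % size;

    double send_val = (double)rank, recv_val = 0.0;
    MPI_Request reqs[2];

    /* 1. Start the communication, but do not wait for it. */
    MPI_Irecv(&recv_val, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&send_val, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. Do local work that does not depend on the incoming message. */
    double local = 0.0;
    for (int i = 0; i < 1000000; i++)
        local += (double)i * 1e-6;

    /* 3. Only now block until the messages have arrived, then use them. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d: local work = %.1f, received %.1f from rank %d\n",
           rank, local, recv_val, left);

    MPI_Finalize();
    return 0;
}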
Differences Between Sequential and Parallel Algorithms • Not all data is accessible at all times • All computations must be localized as much as possible • Can't rely on random access • A new dimension to the existing algorithm – division of work • Which processor does what portion of the work? • If communication cannot be avoided • How will it be initiated? • What type of communication? • What are the pre-processing and post-processing operations? • The order of operations can be critical for performance
Parallel Algorithm Approaches • Data-Parallel Approach • Partition the data among the processors • Each processor executes the same set of commands on its own data (see the sketch after this slide) • Control-Parallel Approach • Partition the tasks to be performed among the processors • Each processor executes different commands • Hybrid Approach • Switch between the two approaches at different stages of the algorithm • Most parallel algorithms fall into this category
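As an illustration of the data-parallel approach (this sketch is not part of the original slides), the following C/MPI program scatters an array so that every processor runs the same loop on its own chunk, then gathers the results back; the chunk size and the squaring operation are arbitrary choices.

/* Data-parallel sketch: mpicc datapar.c -o datapar && mpirun -np 4 ./datapar */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_PER_RANK 4   /* illustrative chunk size per processor */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Rank 0 owns the full data set; every processor gets an equal chunk. */
    double *full = NULL;
    if (rank == 0) {
        full = malloc((size_t)size * N_PER_RANK * sizeof(double));
        for (int i = 0; i < size * N_PER_RANK; i++)
            full[i] = (double)i;
    }

    double chunk[N_PER_RANK];
    MPI_Scatter(full, N_PER_RANK, MPI_DOUBLE,
                chunk, N_PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Every processor executes the same commands on its own data. */
    for (int i = 0; i < N_PER_RANK; i++)
        chunk[i] = chunk[i] * chunk[i];

    MPI_Gather(chunk, N_PER_RANK, MPI_DOUBLE,
               full, N_PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("first few squared values: %.0f %.0f %.0f %.0f\n",
               full[0], full[1], full[2], full[3]);
        free(full);
    }

    MPI_Finalize();
    return 0;
}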
Performance Metrics • Speedup • Overhead • Scalability • Fixed-size • Iso-granular • Efficiency • Speedup per processor • Iso-efficiency • Problem size as a function of p needed to keep the efficiency constant • (The standard definitions are written out after this slide)
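For reference, the textbook definitions behind these metrics (they are not spelled out on the slide) can be written as follows, with T_1 the best sequential running time, T_p the parallel running time on p processors, and W the problem size:

% Speedup, efficiency, and parallel overhead on p processors
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p}, \qquad
T_o(W, p) = p \, T_p - T_1
% Iso-efficiency: how fast W must grow with p so that E(p) stays
% constant, obtained by solving W = K \, T_o(W, p) for W
% (K is a constant fixed by the target efficiency).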
The Take-Home Message • A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm. • Which model to use? – Problem dependent. • e.g. a + b + c + d + … = (a + b) + (c + d) + …, i.e., regroup the sum so that the partial sums can be computed in parallel (see the reduction sketch after this slide) • Not much choice really. • It is a big investment, but it can really be worth it.
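The regrouped sum above is exactly what a parallel reduction does: each processor adds its own operands and the partial sums are then combined in a tree-like order. A minimal C/MPI sketch of this idea (an illustration, not from the slides), where each processor's term is simply its rank plus one:

/* Reduction sketch: mpicc reduce.c -o reduce && mpirun -np 4 ./reduce */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each processor contributes one term of a + b + c + d + ... */
    double my_term = (double)(rank + 1);

    /* MPI_Reduce combines the partial results; the regrouping
       (a + b) + (c + d) + ... is what makes the sum parallel. */
    double total = 0.0;
    MPI_Reduce(&my_term, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 1..%d computed in parallel: %.0f\n", size, total);

    MPI_Finalize();
    return 0;
}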
How does a parallel program work? • You request a certain number of processors • You set up a communicator • Give a unique id to each processor – its rank • Every processor executes the same program • Inside the program • Query for the rank and use it to decide what to do • Exchange messages between different processors using their ranks • In theory, you only need 3 functions: Isend, Irecv, and Wait (a bare-bones skeleton follows this slide) • In practice, you can optimize communication depending on the underlying network topology – Message Passing Standards…
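Putting these steps together, a bare-bones C/MPI skeleton might look like the one below (the message content is arbitrary; the point is the structure: every processor runs the same program, queries its rank, and communicates using only the Isend/Irecv/Wait trio):

/* Skeleton: mpicc skeleton.c -o skeleton && mpirun -np 2 ./skeleton */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);               /* join the requested processors */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* query my unique id (rank) */

    int msg = 0;
    MPI_Request req;
    MPI_Status status;

    if (rank == 0) {                      /* the rank decides what to do: 0 sends ... */
        msg = 42;
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    } else if (rank == 1) {               /* ... and 1 receives */
        MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
        printf("rank 1 received %d from rank 0\n", msg);
    }

    MPI_Finalize();
    return 0;
}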
Message Passing Standards • The standards define a set of primitive communication operations. • The vendors implementing these operations on a given machine are responsible for optimizing them for that machine. • Popular standards • Message Passing Interface (MPI) • OpenMP (Open Multi-Processing; strictly a directive-based shared-memory API rather than a message-passing standard)
Languages that support MPI • Fortran 77 • C/C++ • Python • MATLAB
MPI Implementations • MPICH • ftp://info.mcs.anl.gov/pub/mpi • LAM • http://www.mpi.nd.edu/lam/download • CHIMP • ftp://ftp.epcc.ed.ac.uk/pub/chimp/release • WinMPI (Windows) • ftp://csftp.unomaha.edu/pub/rewini/WinMPI • W32MPI (Windows) • http://dsg.dei.uc.pt/wmpi/intro.html
Open Source Parallel Software • PETSc (linear and nonlinear solvers) • http://www-unix.mcs.anl.gov/petsc/petsc-as/ • ScaLAPACK (linear algebra) • http://www.netlib.org/scalapack/scalapack_home.html • SPRNG (random number generation) • http://sprng.cs.fsu.edu/ • ParaView (visualization) • http://www.paraview.org/HTML/Index.html • NAMD (molecular dynamics) • http://www.ks.uiuc.edu/Research/namd/ • Charm++ (parallel objects) • http://charm.cs.uiuc.edu/research/charm/
References • Parallel Programming with MPI, Peter S. Pacheco • Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar • MPI: The Complete Reference, William Gropp et al. • http://www-unix.mcs.anl.gov/mpi/ • http://www.erc.msstate.edu/mpi • http://www.epm.ornl.gov/~walker/mpi • http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ) • comp.parallel.mpi (newsgroup) • http://www.mpi-forum.org (MPI Forum)