Seminar on parallel computing
• Goal: provide an environment for exploration of parallel computing
• Driven by participants
• Weekly hour for discussion, show & tell
• Focus primarily on distributed memory computing on Linux PC clusters
• Target audience:
  • Experience with Linux computing & Fortran/C
  • Requires parallel computing for own studies
• 1 credit possible for completion of a ‘proportional’ project
Main idea
• Distribute a job over multiple processing units
• Do bigger jobs than are possible on a single machine
• Solve bigger problems faster
• Resources: e.g., www-jics.cs.utk.edu
Sequential limits
• Moore’s law
• Clock speed physically limited
  • Speed of light
  • Miniaturization; dissipation; quantum effects
• Memory addressing
  • 32-bit addresses in PCs: 4 Gbyte RAM max. (2^32 bytes = 4 Gbyte)
Machine architecture: serial
• Single processor
• Hierarchical memory:
  • Small number of registers on the CPU
  • Cache (L1/L2)
  • RAM
  • Disk (swap space)
• Operations require multiple steps:
  • Fetch two floating point numbers from main memory
  • Add and store the result
  • Put the result back into main memory
(The cost of the memory hierarchy is easy to demonstrate; see the sketch below.)
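As an illustration of why the memory hierarchy matters, the following hedged C sketch (array size and timing method are arbitrary choices, not from the slides) sums the same matrix twice: once in the order it is laid out in memory, once with a large stride. The arithmetic is identical, but the strided version causes far more cache misses and typically runs several times slower.

    #include <stdio.h>
    #include <time.h>

    #define N 2048

    /* Sum a large matrix twice: row by row (contiguous, cache friendly)
       and column by column (stride of N doubles, many cache misses). */
    int main(void)
    {
        static double a[N][N];
        double sum;
        clock_t t0;
        int i, j;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = 1.0;

        t0 = clock();
        sum = 0.0;
        for (i = 0; i < N; i++)          /* contiguous access */
            for (j = 0; j < N; j++)
                sum += a[i][j];
        printf("row-wise:    %.3f s (sum = %.0f)\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

        t0 = clock();
        sum = 0.0;
        for (j = 0; j < N; j++)          /* strided access */
            for (i = 0; i < N; i++)
                sum += a[i][j];
        printf("column-wise: %.3f s (sum = %.0f)\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

        return 0;
    }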
Vector processing
• Speed up single instructions on vectors
  • E.g., while adding two floating point numbers, fetch two new ones from main memory
  • Pushing vectors through the pipeline
• Useful in particular for long vectors
• Requires good memory control:
  • Bigger cache is better
• Common on most modern CPUs
• Implemented in both hardware and software (see the loop sketched below)
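The kind of loop a vectorizing compiler (or vector hardware) can pipeline is a simple, independent element-wise operation. A hedged C sketch; the function name is illustrative:

    /* saxpy-style loop: y = a*x + y.  Every iteration is independent,
       so new elements of x and y can be fetched while the current
       multiply-add is still in the pipeline; compilers such as gcc or
       the Intel compiler can vectorize this automatically at high
       optimization levels. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }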
SIMD
• Same instruction works simultaneously on different data sets
• Extension of vector computing
• Example (an SSE version in C is sketched below):
  DO IN PARALLEL
    for i = 1, n
      x(i) = a(i)*b(i)
    end
  DONE PARALLEL
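On commodity CPUs the same idea appears as short-vector SIMD instructions (SSE on the Pentium III/4). A hedged C sketch of the loop above using SSE intrinsics; it assumes n is a multiple of 4 and 16-byte aligned arrays:

    #include <xmmintrin.h>   /* SSE intrinsics, Pentium III and later */

    /* x(i) = a(i)*b(i), four single-precision elements per instruction */
    void multiply(int n, const float *a, const float *b, float *x)
    {
        int i;
        for (i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(&a[i]);            /* load 4 floats */
            __m128 vb = _mm_load_ps(&b[i]);
            _mm_store_ps(&x[i], _mm_mul_ps(va, vb));   /* 4 products at once */
        }
    }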
MIMD
• Multiple instruction, multiple data
• Most flexible; encompasses SIMD/serial
• Often best for ‘coarse-grained’ parallelism
• Message passing
• Example: domain decomposition
  • Divide the computational grid into equal chunks
  • Work on each domain with one CPU
  • Communicate boundary values when necessary (see the sketch below)
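A hedged sketch of the message-passing pattern behind domain decomposition, written with MPI in C: each process owns a strip of a 1-D grid plus two ‘ghost’ points and swaps boundary values with its neighbours. Grid size, tags and the dummy data are illustrative, not from the slides.

    #include <mpi.h>

    #define NLOCAL 100   /* interior points owned by each process */

    int main(int argc, char **argv)
    {
        double u[NLOCAL + 2];   /* u[0] and u[NLOCAL+1] are ghost points */
        int rank, size, left, right, i;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        for (i = 0; i <= NLOCAL + 1; i++)
            u[i] = (double)rank;   /* dummy initial data */

        /* Send my first interior point left and receive the right
           neighbour's first interior point into my right ghost cell;
           then the same in the other direction.  MPI_Sendrecv avoids
           deadlock. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, &status);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, &status);

        /* ... update interior points u[1..NLOCAL] using the ghost values ... */

        MPI_Finalize();
        return 0;
    }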
Historical machines
• 1976 Cray-1 at Los Alamos (vector)
• 1980s Control Data Cyber 205 (vector)
• 1980s Cray X-MP
  • 4 coupled Cray-1s
• 1985 Thinking Machines Connection Machine
  • SIMD, up to 64k processors
• 1984+ NEC/Fujitsu/Hitachi
  • Automatic vectorization
Sun and SGI (1990s)
• Scaling between desktops and compute servers
• Use of both vectorization and large-scale parallelization
• RISC processors
  • SPARC for Sun
  • MIPS for SGI: PowerChallenge/Origin
Happy developments
• High Performance Fortran / Fortran 90
• Definitions of message-passing standards and libraries:
  • PVM
  • MPI
• Linux
• Performance increase of commodity CPUs
• Combination leads to affordable cluster computing (a minimal MPI program is sketched below)
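To give a flavour of MPI, a minimal C program in the style of “Using MPI” (Gropp et al.): every process runs the same executable and learns its own rank and the total number of processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* my process number */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }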
Who’s the biggest?
• www.top500.org
• Ranked by the Linpack benchmark (solution of a dense linear system)
• June 2003:
  • Earth Simulator, Yokohama, NEC, 36 Tflops
  • ASCI Q, Los Alamos, HP, 14 Tflops
  • Linux cluster, Livermore, 8 Tflops
Parallel approaches
• Embarrassingly parallel
  • “Monte Carlo” searches (see the sketch below)
  • SETI@home
  • Analyze lots of small time series
• Parallelize DO-loops in dominantly serial code
• Domain decomposition
• Fully parallel
  • Requires complete rewrite/rethinking
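A hedged C/MPI sketch of an embarrassingly parallel “Monte Carlo” computation, here estimating pi: every process works independently and the only communication is a single reduction at the end. The sample count and seeding are illustrative.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        long n = 1000000, i, hits = 0, total;
        int rank, size;
        double x, y;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand(rank + 1);              /* a different stream per process */
        for (i = 0; i < n; i++) {     /* no communication in this loop */
            x = rand() / (double)RAND_MAX;
            y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }

        /* the only message passing: sum the hit counts on process 0 */
        MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi estimate: %f\n", 4.0 * total / ((double)n * size));

        MPI_Finalize();
        return 0;
    }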
Example: seismic wave propagation
• 3D spherical wave propagation modeled with a high-order finite element technique (Komatitsch and Tromp, GJI, 2002)
• Massively parallel computation on Linux PC clusters
• Approx. 34 Gbyte RAM needed for 10 km average resolution
• www.geo.lsa.umich.edu/~keken/waves
Resolution
• Spectral elements: 10 km average resolution
  • 4th order interpolation functions
• Reasonable graphics resolution: 10 km or better
  • 12 km: 1024³ grid points = 1 GB
  • 6 km: 2048³ grid points = 8 GB
[Figure: simulated earthquake (depth 15 km), wavefield snapshot after 17 minutes; 512x512 image, 256 colors, positive amplitudes only, truncated maximum, log10 scale of particle velocity; labeled phases: P, PPP, PP, PKPab, SK, PKP, PKIKP]
[Figure: same display settings (512x512, 256 colors, positive only, truncated maximum, log10 scale of particle velocity), now showing some S component; labeled phases: PcSS, SS, R, S, PcS, PKS]
Resources at UM
• Various Linux clusters in Geology
  • Agassiz (Ehlers): 8 Pentium 4 nodes @ 2 Gbyte each
  • Panoramix (van Keken): 10 P3 nodes @ 512 Mbyte each
  • Trans (van Keken, Ehlers): 24 P4 nodes @ 2 Gbyte each
• SGIs
  • Origin 2000 (Stixrude, Lithgow, van Keken)
• Center for Advanced Computing @ UM
  • Athlon clusters (384 nodes @ 1 Gbyte each)
  • Opteron cluster (to be installed)
• NPACI
Software resources
• GNU and Intel compilers
  • Fortran/Fortran 90/C/C++
• MPICH (www-fp.mcs.anl.gov)
  • Primary implementation of MPI
  • “Using MPI”, 2nd edition, Gropp et al., 1999
• Sun Grid Engine (batch queueing)
• PETSc (www-fp.mcs.anl.gov)
  • Toolbox for parallel scientific computing
(A typical compile-and-run example follows below.)
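A hedged example of the typical workflow with MPICH (the file name is illustrative; on a cluster the job would normally be submitted through Sun Grid Engine rather than started by hand):

    mpicc -O2 hello.c -o hello    # compile with the MPICH compiler wrapper
    mpirun -np 4 ./hello          # start 4 processes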