
Seminar on parallel computing


Presentation Transcript


  1. Seminar on parallel computing • Goal: provide environment for exploration of parallel computing • Driven by participants • Weekly hour for discussion, show & tell • Focus primarily on distributed memory computing on linux PC clusters • Target audience: • Experience with linux computing & Fortran/C • Requires parallel computing for own studies • 1 credit possible for completion of ‘proportional’ project

  2. Main idea • Distribute a job over multiple processing units • Do bigger jobs than are possible on a single machine • Solve bigger problems faster • Resources: e.g., www-jics.cs.utk.edu

  3. Sequential limits • Moore’s law • Clock speed physically limited • Speed of light • Miniaturization; dissipation; quantum effects • Memory addressing • 32-bit addressing in PCs: 4 Gbyte RAM max.
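
The 4 Gbyte ceiling is just the arithmetic of 32-bit addressing; a minimal, purely illustrative check in C:

    #include <stdio.h>

    int main(void)
    {
        /* A 32-bit address can name 2^32 distinct bytes. */
        unsigned long long bytes = 1ULL << 32;
        printf("2^32 bytes = %llu = %llu Gbyte\n",
               bytes, bytes >> 30);      /* prints 4294967296 = 4 Gbyte */
        return 0;
    }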

  4. Machine architecture: serial • Single processor • Hierarchical memory: • Small number of registers on CPU • Cache (L1/L2) • RAM • Disk (swap space) • Operations require multiple steps • Fetch two floating point numbers from main memory • Add and store • Put back into main memory
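
A hedged illustration of why the hierarchy matters (the array size is arbitrary): the two loops below do the same additions, but the first walks memory contiguously and mostly hits cache, while the second strides across rows and is dominated by main-memory latency.

    #include <stdio.h>

    #define N 2000

    static double a[N][N];

    int main(void)
    {
        double sum = 0.0;

        /* Row-major traversal: consecutive iterations touch neighbouring
           memory locations, so most loads are served from cache. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Column-major traversal: each load jumps N*8 bytes ahead, so the
           cache is far less effective. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%g\n", sum);
        return 0;
    }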

  5. Vector processing • Speed up single instructions on vectors • E.g., while adding two floating point numbers fetch two new ones from main memory • Pushing vectors through the pipeline • Useful in particular for long vectors • Requires good memory control: • Bigger cache is better • Common on most modern CPUs • Implemented in both hardware and software
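
A SAXPY-style loop is the kind of code that pipelines and vectorizes well; the sketch below is generic C, with names and sizes chosen for illustration rather than taken from the seminar:

    #include <stdio.h>
    #include <stddef.h>

    /* SAXPY-style update: y(i) = a*x(i) + y(i).  A vector unit (or a
       vectorizing compiler on a modern CPU) can pipeline this loop,
       fetching the next elements while the current ones are still
       being multiplied and added. */
    static void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        enum { N = 1000 };
        float x[N], y[N];

        for (size_t i = 0; i < N; i++) {
            x[i] = (float)i;
            y[i] = 1.0f;
        }
        saxpy(N, 2.0f, x, y);
        printf("y[10] = %g\n", y[10]);   /* 2*10 + 1 = 21 */
        return 0;
    }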

  6. SIMD • Same instruction works simultaneously on different data sets • Extension of vector computing • Example:

      DO IN PARALLEL
        for i = 1, n
          x(i) = a(i)*b(i)
        end
      DONE PARALLEL

  7. MIMD • Multiple instruction, multiple data • Most flexible, encompasses SIMD/serial. • Often best for ‘coarse grained’ parallelism • Message passing • Example: domain decomposition • Divide computational grid in equal chunks • Work on each domain with one CPU • Communicate boundary values when necessary
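
A minimal MPI sketch of this domain-decomposition pattern, assuming a 1-D array split into equal chunks with one ghost cell per side (the array size, tags, and initialization are illustrative, not code from the seminar):

    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 100   /* interior points owned by each rank (illustrative) */

    int main(int argc, char **argv)
    {
        int rank, size;
        double u[NLOCAL + 2];   /* one ghost cell on each side */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < NLOCAL + 2; i++)
            u[i] = -1.0;                  /* -1 marks "no neighbour" */
        for (int i = 1; i <= NLOCAL; i++)
            u[i] = (double)rank;          /* dummy interior data */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Exchange boundary values with neighbouring domains: send my
           first interior point left while receiving my right ghost cell,
           then the mirror image for the other direction. */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                     &u[0],          1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d: left ghost = %g, right ghost = %g\n",
               rank, u[0], u[NLOCAL + 1]);

        MPI_Finalize();
        return 0;
    }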

  8. Historical machines • 1976 Cray-1 at Los Alamos (vector) • 1980s Control Data Cyber 205 (vector) • 1980s Cray XMP • 4 coupled Cray-1s • 1985 Thinking Machines Connection Machine • SIMD, up to 64k processors • 1984+ Nec/Fujitsu/Hitachi • Automatic vectorization

  9. Sun and SGI (90s) • Scaling between desktops and compute servers • Use of both vectorization and large scale parallelization • RISC processors • Sparc for Sun • MIPS for SGI: PowerChallenge/Origin

  10. Happy developments • High Performance Fortran / Fortran 90 • Standards for message-passing libraries • PVM • MPI • Linux • Performance increase of commodity CPUs • Combination leads to affordable cluster computing

  11. Who’s the biggest • www.top500.org • Linpack dense linear-system benchmark • June 2003: • Earth Simulator, Yokohama, NEC, 36 Tflops • ASCI Q, Los Alamos, HP, 14 Tflops • Linux cluster, Livermore, 8 Tflops

  12. Parallel approaches • Embarrassingly parallel • “Monte Carlo” searches • SETI@home • Analyze lots of small time series • Parallelize DO-loops in dominantly serial code • Domain decomposition • Fully parallel • Requires complete rewrite/rethinking
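
As a sketch of the embarrassingly parallel case, a Monte Carlo estimate of pi in MPI (sample count and seeding are arbitrary): each process works independently and only a single reduction is needed at the end.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        long long local_n = 1000000, local_hits = 0, total_hits = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand(1234 + rank);               /* a different seed per process */
        for (long long i = 0; i < local_n; i++) {
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                local_hits++;
        }

        /* The only communication: add up the hit counts on rank 0. */
        MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG_LONG, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi approx %g\n",
                   4.0 * total_hits / (double)(local_n * size));

        MPI_Finalize();
        return 0;
    }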

  13. Example: seismic wave propagation • 3D spherical wave propagation modeled with high order finite element technique (Komatitsch and Tromp, GJI, 2002) • Massively parallel computation on linux PC clusters • Approx. 34 Gbyte RAM needed for 10 km average resolution • www.geo.lsa.umich.edu/~keken/waves

  14. Resolution • Spectral elements: 10 km average resolution • 4th order interpolation functions • Reasonable graphics resolution: 10 km or better • 12 km: 1024³ = 1 GB • 6 km: 2048³ = 8 GB

  15. [Figure] Simulated EQ (d = 15 km) after 17 minutes • Particle velocity snapshot: 512x512, 256 colors, positive only, truncated max, log10 scale • Phase labels: P, PPP, PP, PKPab, SK, PKP, PKIKP

  16. [Figure] Particle velocity snapshot with some S component: 512x512, 256 colors, positive only, truncated max, log10 scale • Phase labels: PcSS, SS, R, S, PcS, PKS

  17. Resources at UM • Various linux clusters in Geology • Agassiz (Ehlers) 8 Pentium 4 @ 2 Gbyte each • Panoramix (van Keken) 10 P3 @ 512 Mbyte • Trans (van Keken, Ehlers) 24 P4 @ 2 Gbyte • SGIs • Origin 2000 (Stixrude, Lithgow, van Keken) • Center for Advanced Computing @ UM • Athlon clusters (384 nodes @ 1 Gbyte each) • Opteron cluster (to be installed) • NPACI

  18. Software resources • GNU and Intel compilers • Fortran/Fortran 90/C/C++ • MPICH www-fp.mcs.anl.gov • Primary implementation of MPI • “Using MPI” 2nd edition, Gropp et al., 1999 • Sun Grid Engine • PETSc www-fp.mcs.anl.gov • Toolbox for parallel scientific computing
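
With MPICH and one of the compilers above installed, the usual first test is an MPI hello world; this is a generic sketch, not code distributed with the seminar:

    /* Typical MPICH usage (names illustrative):
         mpicc hello.c -o hello
         mpirun -np 4 ./hello        */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }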
