Introductory Courses in High Performance Computing at Illinois
David Padua
Our oldest course
420 Parallel Programming for Scientists and Engineers. Course intended for non-CS majors (but many CS students take it). Taught once a year for the last 20 years.
Catalog entry: CS 420 Parallel Programming: Science & Engineering. Credit: 3 or 4 hours. Fundamental issues in the design and development of parallel programs for various types of parallel computers. Various programming models according to both machine type and application area. Cost models, debugging, and performance evaluation of parallel programs, with actual application examples. Same as CSE 402 and ECE 492. 3 undergraduate hours. 3 or 4 graduate hours. Prerequisite: CS 400 or CS 225.
420 Parallel Programming for Scientists and Engineers
• Machines: clusters, shared-memory machines, and (in the past) vector supercomputers
• Programming models: shared memory, distributed memory, and data parallel, taught through OpenMP, MPI, and Fortran 90 (a minimal OpenMP sketch follows this list)
• Data-parallel numerical algorithms (in Fortran 90/MATLAB)
• Sorting/N-body
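To make the shared-memory model above concrete, here is a minimal OpenMP sketch in C. It is illustrative only, not taken from the course materials: a dot product whose loop iterations are divided among threads, with a reduction clause combining the per-thread partial sums.

```c
/* Illustrative shared-memory (OpenMP) example: parallel dot product.
 * Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp dot.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    /* Initialize the operands. */
    for (int i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* Each thread accumulates a private partial sum; OpenMP combines
     * the partial sums into `sum` when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```

The distributed-memory version of the same computation would replace the shared arrays with per-process pieces and combine the partial sums with an MPI collective such as MPI_Reduce.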
Other courses
• 4xx Parallel Programming. For majors.
• 4xx Performance Programming. For all issues related to performance.
• 5xx Theory of Parallel Computing. For advanced students.
• 554 Parallel Numerical Algorithms.
4xx Parallel Programming. For majors
• Overview of architectures. Architectural characterization of the most important parallel systems today. Issues in effective programming of parallel architectures: exploitation of parallelism, locality (cache, registers), load balancing, communication overhead, consistency, coherency, latency avoidance. Transactional memories.
• Programming paradigms. Shared-memory, message-passing, data parallel (or regular), and functional programming paradigms. Message-passing programming. PGAS programming. Survey of programming languages: OpenMP, MPI, TBB, Charm++, UPC, Co-array Fortran, High Performance Fortran, NESL.
• Concepts. Basic concepts in parallel programming: speedup, efficiency, redundancy, isoefficiency, Amdahl's law.
• Programming principles. Reactive parallel programming. Memory consistency. Synchronization strategies: critical regions, atomic updates, races, deadlock avoidance and prevention, livelock, starvation, scheduling fairness. Lock-free algorithms. Asynchronous algorithms. Speculation. Load balancing. Locality enhancement.
• Algorithms. Basic algorithms: element-by-element array operations, reductions, parallel prefix (a scan sketch follows this list), linear recurrences, Boolean recurrences. Systolic arrays. Matrix multiplication, LU decomposition, Jacobi relaxation, fixed-point iterations. Sorting and searching. Graph algorithms. Data mining algorithms. N-body/particle simulations.
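The parallel prefix (scan) primitive in the Algorithms bullet is worth a sketch, since it is the building block behind the linear and Boolean recurrences also listed there. The following two-pass blocked inclusive scan in C with OpenMP is an illustrative reconstruction, not course code: each thread scans its own block, a short serial pass scans the per-block totals, and a final pass adds each block's offset.

```c
/* Illustrative two-pass blocked parallel prefix (inclusive scan).
 * Compile with e.g. gcc -fopenmp scan.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* In-place inclusive prefix sum of x[0..n-1]. */
void prefix_sum(double *x, int n) {
    int nthreads;
    double *block_total;          /* block_total[t] = offset for block t */

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        #pragma omp single
        {
            nthreads = omp_get_num_threads();
            block_total = malloc((nthreads + 1) * sizeof(double));
            block_total[0] = 0.0;
        }   /* implicit barrier: allocation visible to all threads */

        int lo = (long long)n * t / nthreads;
        int hi = (long long)n * (t + 1) / nthreads;

        /* Pass 1: each thread scans its own block. */
        for (int i = lo + 1; i < hi; i++)
            x[i] += x[i - 1];
        block_total[t + 1] = (hi > lo) ? x[hi - 1] : 0.0;
        #pragma omp barrier

        /* Serial scan of the per-block totals (cheap: nthreads items). */
        #pragma omp single
        for (int k = 1; k <= nthreads; k++)
            block_total[k] += block_total[k - 1];

        /* Pass 2: add the combined total of all preceding blocks. */
        for (int i = lo; i < hi; i++)
            x[i] += block_total[t];
    }
    free(block_total);
}

int main(void) {
    double x[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    prefix_sum(x, 8);
    for (int i = 0; i < 8; i++)
        printf("%g ", x[i]);      /* expect: 1 2 3 4 5 6 7 8 */
    printf("\n");
    return 0;
}
```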
4xx Performance Programming.
• Sequential performance bottlenecks: CPU (pipelining, multiple-issue processors (in-order and out-of-order), support for speculation, branch prediction, execution units, vectorization, registers, register renaming); caches (temporal and spatial locality; compulsory, conflict, capacity, and coherence misses); memory (latency, row/column, read/write); I/O…
• Parallel performance bottlenecks: Amdahl's law, load imbalance, communication, false sharing, granularity of communication (distributed memory).
• Optimization strategies: algorithm and program optimizations. Static and dynamic optimizations. Data-dependent optimizations. Machine-dependent and machine-independent optimizations.
• Sequential program optimizations: redundancy elimination. Peephole optimizations. Loop optimizations. Branch optimizations.
• Locality optimizations: tiling (a blocked matrix-multiply sketch follows this list). Cache-oblivious and cache-conscious algorithms. Padding. Hardware and software prefetch.
• Parallel programming optimizations: brief introduction to parallel programming of shared-memory machines. Dependence graphs and program optimizations. Privatization, expansion, induction variables, wrap-around variables, loop fusion and loop fission. Frequently occurring kernels (reductions, scan, linear recurrences) and their parallel versions. Program vectorization. Multimedia extensions and their programming. Speculative parallel programming. Load balancing. Bottlenecks. Overdecomposition.
• Communication optimizations: aggregation of communication. Redundant computations to avoid communication. False sharing.
• Optimization for power.
• Tools for program tuning: performance monitors. Profiling. Sampling. Compiler switches, directives, and compiler feedback.
• Autotuning: empirical search. Machine-learning strategies for program optimization. Library generators: ATLAS, FFTW, SPIRAL.
• Algorithm choice and tuning: hybrid algorithms. Self-optimizing algorithms. Sorting. Data mining. Numerical error and algorithm choice.
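As an example of the tiling transformation in the locality bullet, here is a hedged sketch in C (the matrix size, tile size, and i-k-j loop order are assumptions for illustration, not course material): a blocked matrix multiply that works on B×B tiles so that each element brought into cache is reused many times before eviction.

```c
/* Illustrative loop tiling: blocked matrix multiply C = A * Bm. */
#include <stdio.h>

#define N 512
#define B 64   /* tile size; tune so three BxB tiles fit in cache */

static double A[N][N], Bm[N][N], C[N][N];

int main(void) {
    /* Initialize the inputs. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            Bm[i][j] = 2.0;
        }

    /* Tiled i-k-j multiply: each (ii,kk,jj) triple streams through one
     * tile of A, Bm, and C, improving temporal and spatial locality
     * relative to the naive triple loop. */
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++) {
                        double a = A[i][k];   /* reused across the j loop */
                        for (int j = jj; j < jj + B; j++)
                            C[i][j] += a * Bm[k][j];
                    }

    printf("C[0][0] = %f\n", C[0][0]);  /* expect 2.0 * N = 1024 */
    return 0;
}
```

The same reuse argument is what cache-oblivious algorithms achieve recursively, without an explicit tile size B.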