Learn the basics of parallel computing, including machine architecture, parallel algorithms, programming environments, and evaluation of performance. Explore the potential of parallel computing in solving complex problems.
CS160 – Spring 2000
http://www-cse.ucsd.edu/classes/sp00/cse160
Prof. Fran Berman - CSE
Dr. Philip Papadopoulos - SDSC
Two Instructors/One Class
• We are team-teaching the class.
• Lectures will be split about 50-50 along topic lines. (We’ll keep you guessing as to who will show up next lecture.)
• The TA is Derrick Kondo. He is responsible for grading homework and programs.
• Exams will be graded by Papadopoulos and Berman.
Prerequisites
• Know how to program in C.
• CSE 100 (Data Structures).
• CSE 141 (Computer Architecture) would be helpful but is not required.
Grading
• 25% Homework
• 25% Programming assignments
• 25% Midterm
• 25% Final
Homework and programming assignments are due at the beginning of section.
Policies
• Exams are closed book, closed notes.
• No late homework.
• No late programs.
• No makeup exams.
• All assignments are to be your own original work.
• Cheating or copying from anyone or anywhere will be dealt with severely.
Office Hours (Papadopoulos)
• My office is SSB 251 (next to SDSC).
• Hours will be TuTh 2:30 – 3:30, or by appointment.
• My email is phil@sdsc.edu.
• My campus phone is 822-3628.
Course Materials
• Book: Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, by B. Wilkinson and Michael Allen.
• Web site: We will try to make lecture notes available before class.
• Handouts: As needed.
Computers/Programming
• Please see the TA about getting an account for the undergrad APE lab.
• We will use PVM for programming on workstation clusters (a minimal sketch follows below).
• A word of advice: with the web, you can probably find nearly complete source code somewhere. Don’t do this. Write the code yourself. You’ll learn more. See the policy on copying.
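To give a taste of what PVM code looks like, here is a minimal master/worker sketch using the PVM 3 C interface. The executable name "hello", the single worker, and the message tag are illustrative assumptions only, not part of any assignment; the program is started under a running PVM daemon (pvmd).

```c
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int ptid, tid, value;

    pvm_mytid();                      /* enroll this process in PVM */
    ptid = pvm_parent();              /* task id of the spawner, or PvmNoParent */

    if (ptid == PvmNoParent) {
        /* Master: spawn one copy of this executable (assumed to be named "hello"). */
        if (pvm_spawn("hello", NULL, PvmTaskDefault, "", 1, &tid) == 1) {
            pvm_recv(tid, 1);         /* block until the worker sends message tag 1 */
            pvm_upkint(&value, 1, 1); /* unpack one int from the receive buffer */
            printf("master received %d from worker t%x\n", value, tid);
        }
    } else {
        /* Worker: pack one int and send it back to the parent with tag 1. */
        value = 42;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_send(ptid, 1);
    }

    pvm_exit();                       /* leave PVM cleanly */
    return 0;
}
```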
Introduction to Parallel Computing
Topics to be covered (see the online syllabus for full details):
• Machine architecture and history
• Parallel machine organization
• Parallel algorithm paradigms
• Parallel programming environments and tools
• Heterogeneous computing
• Evaluating performance
• Grid computing
• Parallel programming and project assignments
What IS Parallel Computing?
• Applying multiple processors to solve a single problem
• Why?
  • Increased performance for rapid turnaround time (wall-clock time)
  • More available memory across multiple machines
  • A natural progression of the standard von Neumann architecture
World’s 10th Fastest Machine (as of November 1999) @ SDSC
• 1152 processors
Are There Really Problems that Need O(1000) Processors?
• Grand Challenge codes
• First-principles materials science
• Climate modeling (ocean, atmosphere)
• Soil contamination remediation
• Protein folding (gene sequencing)
• Hydrocodes
• Simulated nuclear device detonation
• Code breaking (No Such Agency)
There Must Be Problems with the Approach
• Scaling with efficiency (speedup)
• Unparallelizable portions of code (Amdahl’s law; a worked example follows this list)
• Reliability
• Programmability
• Algorithms
• Monitoring
• Debugging
• I/O
• …
• These and more keep the field interesting
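To make the Amdahl’s law bullet concrete, here is a short sketch that evaluates the bound on speedup for an assumed 5% serial fraction; the numbers are illustrative only.

```c
#include <stdio.h>

/* Amdahl's law: if a fraction f of the work is inherently serial,
   speedup on p processors is at most 1 / (f + (1 - f) / p). */
static double amdahl_speedup(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void)
{
    int p;

    /* Assume 5% of the code cannot be parallelized: speedup is capped
       near 20x no matter how many processors are thrown at the problem. */
    for (p = 1; p <= 1024; p *= 4)
        printf("p = %4d   speedup <= %5.2f\n", p, amdahl_speedup(0.05, p));
    return 0;
}
```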
A Brief History of Parallel Supercomputers
• There have been many (now dead) supercomputers
• The Dead Supercomputer Society: http://ei.cs.vt.edu/~history/Parallel.html
• Parallel Computing Works
• We will touch on about a dozen of the important ones
Basic Measurement Yardsticks
• Peak performance (AKA "guaranteed never to exceed") = nprocs × FLOPS/proc (see the sketch after this list)
• NAS Parallel Benchmarks
• Linpack benchmark for the TOP500
• Later in the course, we will explore how to "Fool the Masses" and valid ways to measure performance
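As a quick illustration of the peak-versus-measured distinction, the sketch below computes peak performance from the processor count and per-processor rate and compares it with a measured result; all figures are made up for the example.

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical machine: numbers chosen only to illustrate the arithmetic. */
    int    nprocs          = 1152;    /* processor count */
    double mflops_per_proc = 800.0;   /* per-processor peak rate */
    double measured_gflops = 550.0;   /* e.g., a Linpack result */

    double peak_gflops = nprocs * mflops_per_proc / 1000.0;

    printf("peak     = %.1f Gflops (guaranteed never to exceed)\n", peak_gflops);
    printf("measured = %.1f Gflops (%.0f%% of peak)\n",
           measured_gflops, 100.0 * measured_gflops / peak_gflops);
    return 0;
}
```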
Illiac IV (1966 – 1970)
• $100 million in 1990 dollars
• Single instruction, multiple data (SIMD)
• 32 – 64 processing elements
• 15 megaflops
• Ahead of its time
ICL DAP (1979)
• Distributed Array Processor (also SIMD)
• 1K – 4K bit-serial processors
• Connected in a mesh
• Required an ICL mainframe to front-end the main processor array
• Never caught on in the US
Goodyear MPP (late 1970s)
• 16K bit-serial processors (SIMD)
• NASA Goddard Space Flight Center
• Only a few sold; similar to the ICL DAP
• About 100 Mflops (comparable to a 100 MHz Pentium)
Cray-1 (1976)
• Seymour Cray, designer
• NOT a parallel machine
• Single-processor machine with vector registers
• Largely regarded as starting the modern supercomputer revolution
• 80 MHz processor (80 Mflops)
Denelcor HEP (Heterogeneous Element Processor, early 1980s)
• Burton Smith, designer
• Multiple instruction, multiple data (MIMD)
• Fine-grain (instruction-level) and large-grain parallelism (16 processors)
• Instructions from different programs ran in per-processor hardware queues (128 threads/proc)
• Precursor to the Tera MTA (multithreaded architecture)
• A full/empty bit on every memory location allowed fast synchronization
• Important research machine
Caltech Cosmic Cube (1983)
• Chuck Seitz (founded Myricom) and Geoffrey Fox (lattice gauge theory)
• First hypercube interconnection network
• 8086/8087-based machine with Eugene Brooks’ Crystalline Operating System (CrOS)
• 64 processors by 1983
• About 15x cheaper than a VAX 11/780
• Begat nCUBE, Floating Point Systems, Ametek, Intel Supercomputers (all dead companies)
• 1987 – a vector coprocessor system achieved 500 Mflops
Cray X-MP (1983) and Cray-2 (1985)
• Up to 4-way shared-memory machines
• This was the first supercomputer at SDSC
• Best performance of the time (600 Mflops peak)
• Best price/performance of the time
Late 1980s
• Proliferation of (now dead) parallel computers
• CM-2 (SIMD, Danny Hillis)
  • 64K bit-serial processors, 2048 vector coprocessors
  • Achieved 5.2 Gflops on Linpack (LU factorization)
• Intel iPSC/860 (MIMD MPP)
  • 128 processors
  • 1.92 Gigaflops (Linpack)
• Cray Y-MP (vector super)
  • 8 processors (333 Mflops/proc peak)
  • Achieved 2.1 Gigaflops (Linpack)
• BBN Butterfly (shared memory)
• Many others (long since forgotten)
Early 1990s
• Intel Touchstone Delta and Paragon (MPP)
  • Follow-on to the iPSC/860
  • 13.2 Gflops on 512 processors
  • 1024 nodes delivered to ORNL in 1993 (150 Gflops peak)
• Cray C-90 (vector super)
  • 16-processor update of the Y-MP
  • Extremely popular, efficient, and expensive
• Thinking Machines CM-5 (MPP)
  • Up to 16K processors
  • 1024-node system at Los Alamos National Lab
More 1990s
• Distributed shared memory
  • KSR-1 (Kendall Square Research): COMA (Cache-Only Memory Architecture)
  • University projects: Stanford DASH (Hennessy), MIT Alewife (Agarwal)
• Cray T3D/T3E: fast processor mesh with up to 512 Alpha CPUs
What Can You Buy Today? (not an exhaustive list)
• IBM SP: large MPP or cluster
• SGI Origin 2000: large distributed shared-memory machine
• Sun HPC 10000: 64-processor true shared memory
• Compaq Alpha cluster
• Tera MTA: multithreaded architecture (only one in existence)
• Cray SV-1: vector processor
• Fujitsu and Hitachi vector supers
Clusters
• Poor man’s supercomputer?
• A pile of PCs
• Ethernet or a high-speed (e.g., Myrinet) network
• Likely to be the dominant high-end architecture
• Essentially a build-it-yourself MPP
Next Time …
• Flynn’s taxonomy
• Bit-serial, vector, and pipelined processors
• Interconnection networks
• Routing techniques
• Embedding
• Cluster interconnects
• Network bisection