COMP4300/COMP6430 Parallel Systems 2013
Alistair Rendell and Josh Milthorpe
Research School of Computer Science
Australian National University
Concept and Rationale
• The idea
  • Split your program into bits that can be executed simultaneously
• Motivation
  • Speed, speed, speed… at a cost-effective price
  • If we didn't want it to go faster we would not be bothered with the hassles of parallel programming!
  • Reduce the time to solution to acceptable levels
    • No point waiting 1 week for tomorrow's weather forecast
    • Simulations that take months to run are not useful in a design environment
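The idea above can be made concrete with a minimal sketch (hypothetical illustration only, not part of the course materials): split a sum over a large array into independent chunks and hand each chunk to a separate worker process using Python's multiprocessing module.

```python
# Splitting a program into bits that run simultaneously:
# sum a large array by giving each worker an independent chunk.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work that each worker does independently of the others."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    nworkers = 4
    size = len(data) // nworkers
    # Carve the data into nworkers equal, non-overlapping chunks.
    chunks = [data[i * size:(i + 1) * size] for i in range(nworkers)]
    with Pool(nworkers) as pool:
        partials = pool.map(partial_sum, chunks)  # chunks run concurrently
    print(sum(partials))  # combining the partial results gives the serial answer
```

The pattern (partition, compute independently, combine) is the essence of most of the parallel programs discussed later in the course; only the partitioning and combining steps require coordination.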
Sample Application Areas
• Fluid flow problems
  • Weather forecasting / climate modelling
  • Aerodynamic modelling of cars, planes, rockets, etc.
• Structural mechanics
  • Strength analysis of buildings, bridges, cars, etc.
  • Car crash simulation
• Speech and character recognition, image processing
• Visualisation, virtual reality
• Semiconductor design, simulation of new chips
• Structural biology, molecular-level design of drugs
• Human genome mapping
• Financial market analysis and simulation
• Data mining, machine learning
• Games programming!
World Climate Modelling
• Atmosphere divided into 3D regions or cells
• Complex mathematical equations describe conditions in each cell, e.g. pressure, temperature, velocity
• Conditions change according to neighbouring cells
• Updates repeated frequently as time passes
• The longer the forecast range, the more distant the cells that affect a given cell
• Assume
  • Cells are 1×1×1 mile to a height of 10 miles: 5×10^8 cells
  • 200 flops to update each cell per timestep
  • 10-minute timesteps for a total of 10 days
• ~100 days on a 100 Mflop/s machine
• ~10 minutes on a Tflop/s machine
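The arithmetic behind this estimate is easy to check. A quick sketch under the stated assumptions follows; note that the machine times quoted on the slide are rough order-of-magnitude figures, so the exact values computed here differ by a small constant factor.

```python
# Back-of-envelope cost of the climate-model example above.
cells = 5e8            # 1x1x1 mile cells to a height of 10 miles
flops_per_cell = 200   # floating point operations per cell per timestep
steps = 10 * 24 * 6    # 10 days of 10-minute timesteps = 1440 steps

total_flops = cells * flops_per_cell * steps  # 1.44e14 flops in total

for name, rate in [("100 Mflop/s", 1e8), ("1 Tflop/s", 1e12)]:
    seconds = total_flops / rate
    print(f"{name}: {seconds:.0f} s ({seconds / 86400:.1f} days)")
```

The point of the exercise survives the rounding: the same forecast that is hopelessly slow on a 100 Mflop/s machine completes in minutes at a Tflop/s, which is why weather and climate codes were among the first drivers of parallel systems.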
ParallelSystems@ANU: NCI
• NCI: National Computational Infrastructure
  • http://nci.org.au and http://nf.nci.org.au
• History
  • Establishment of APAC in 1998 with a $19.5M grant from the federal government, renewed in 2004 with a grant of about $29M
  • Changed to NCI in 2007, with funding through the NCRIS and Super Science programs
• 2010 machine is a Sun X6275 Constellation Cluster: 1492 nodes (2 × 2.93 GHz Nehalem), i.e. 11936 cores, with a QDR InfiniBand interconnect
• Installing a new Fujitsu Primergy system with Sandy Bridge nodes: 57,000 cores, 160 TB RAM, 10 PB of disk
ParallelSystems@DCS
• Bunyip: tsg.anu.edu.au/Projects/Bunyip
  • 192-processor PC cluster
  • Winner of the 2000 Gordon Bell prize for best price/performance
• High Performance Computing Group
  • Jabberwocky cluster
  • Sunnyvale cluster
  • Single Chip Cloud Computer
The Rise of Parallel Computing
• Parallelism became an issue for programmers from the late 1980s
• People began compiling lists of big parallel systems
Big Parallel Systems (Nov 2012)
• www.top500.org
• All have multiple processors (many have GPUs)
We also had Increased Node Performance
• Moore's Law: 'Transistor density will double approximately every two years.'
  • which led to higher clock rates and faster flops
• Dennard scaling: 'As MOSFET features shrink, switching time and power consumption will fall proportionately.'
Until the chips became too big…
• 250 nm, 400 mm², 100%
• 180 nm, 450 mm², 100%
• 130 nm, 566 mm², 82%
• 100 nm, 622 mm², 40%
• 70 nm, 713 mm², 19%
• 50 nm, 817 mm², 6.5%
• 35 nm, 937 mm², 1.9%
(feature size, die area, fraction of the chip reachable in one clock cycle)
Agarwal, Hrishikesh, Keckler and Burger, 'Clock Rate versus IPC', ISCA 2000
…so multiple cores appeared on chip
• 2004: Sun releases the UltraSPARC IV with dual cores, heralding the start of multicore
• …until we hit a bigger problem…
…the end of Dennard scaling…
• Moore's Law ✓ 'Transistor density will double approximately every two years.'
• Dennard scaling ✗ 'As MOSFET features shrink, switching time and power consumption will fall proportionately.'
• …ushering in…
Dennard, Gaensslen, Yu, Rideout, Bassous and LeBlanc, IEEE JSSC, 1974
…a new philosophy in processor design is emerging …and a fundamentally new set of building blocks for our petascale systems
Petascale and Beyond: Challenges and Opportunities
• In RSCS we are working in all these areas
Other Important Parallelism
• Multiple instruction units:
  • Typical processors issue ~4 instructions per cycle
• Instruction pipelining:
  • Complicated operations are broken into simple operations that can be overlapped
• Graphics engines:
  • Use multiple rendering pipes and processing elements to render millions of polygons per second
• Interleaved memory:
  • Multiple paths to memory that can be used at the same time
• Input/output:
  • Disks are striped, with different blocks of data written to different disks at the same time
Parallelisation
• Split the program up and run parts simultaneously on different processors
  • On N computers the time to solution should (ideally!) be 1/N
• Parallel programming: the art of writing the parallel code!
• Parallel computer: the hardware on which we run our parallel code!
• COMP4300 will discuss both
• Beyond raw compute power, other motivations include
  • Enabling more accurate simulations in the same time (finer grids)
  • Providing access to huge aggregate memories
  • Providing more and/or better input/output capacity
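The ideal 1/N time motivates the two standard metrics used throughout the course, speedup and efficiency. A small sketch (the timings below are hypothetical, chosen for illustration only):

```python
# Speedup and efficiency: with N processors the ideal parallel time is T1/N.
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n):
    """Fraction of the ideal N-fold speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n

t1 = 100.0          # serial runtime in seconds (hypothetical)
n = 8               # number of processors
t_ideal = t1 / n    # 12.5 s if the program parallelised perfectly
t_actual = 16.0     # measured runtime (hypothetical): overheads cost something

print(speedup(t1, t_actual))        # 6.25x speedup on 8 processors
print(efficiency(t1, t_actual, n))  # 0.78125, i.e. ~78% efficiency
```

The gap between `t_ideal` and `t_actual` (communication, load imbalance, serial fractions) is exactly what the parallel algorithms part of the course is about minimising.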
Health Warning!
• Course is run every other year
  • Drop out this year and it won't be repeated until 2015
• It's a 4000/6000-level course; it's supposed to:
  • Be more challenging than a 3000-level course!
  • Be less well structured
  • Have greater expectations of you
  • Have more student participation
  • Be fun!
Learning Objectives
• Parallel architecture:
  • Basic issues concerning the design and likely performance of parallel systems
• Specific systems:
  • Will make extensive use of NCI facilities
• Programming paradigms:
  • Distributed and shared memory, things in between, data-intensive computing
• Parallel algorithms:
  • Numeric and non-numeric
• The future
Commitment and Assessment
• The pieces
  • 2 lectures per week (~30 core lecture hours)
  • 6 labs (not marked, solutions provided)
  • 2 assignments (40%)
  • 1 mid-semester exam (~2 hours, 20%)
  • 1 final exam (3 hours, 40%)
• Final mark is the sum of the assignment, mid-semester and final exam marks
Lectures
• Two slots
  • Tue 14:00-16:00 Chem T2
  • Thu 15:00-16:00 Chem T2
  • Exact schedule on the web site
• Partial notes will be posted on the web site
  • Bring a copy to the lecture
• Attendance at lectures and labs is strongly recommended
  • Attendance at labs will be recorded
Course Web Site
• http://cs.anu.edu.au/student/comp4300
• We will use Wattle only for lecture recordings
Laboratories
• Start in week 3 (March 4th)
  • See the web page for the detailed schedule
• 2 sessions available
  • Tue 12:00-14:00 N113
  • Thu 10:00-12:00 N112
  • Register via streams now
• Not assessed, but the material is examinable
People
• Course Convener: Alistair Rendell
  • N226 CSIT Building
  • Alistair.Rendell@anu.edu.au
  • Phone 6125 4386
• Lecturer: Josh Milthorpe
  • N216 CSIT Building
  • Josh.Milthorpe@anu.edu.au
  • Phone 6125 4478
Course Communication
• Course web page: cs.anu.edu.au/student/comp4300
• Bulletin board (forum, available from streams): cs.anu.edu.au/streams
• At lectures and in labs
• Email: comp4300@cs.anu.edu.au
• In person
  • Office hours (to be set, see web page)
  • Email for an appointment if you want a specific time
Useful Books
• Principles of Parallel Programming, Calvin Lin and Lawrence Snyder, Pearson International Edition, ISBN 978-0-321-54942-6
• Introduction to Parallel Computing, 2nd ed., Grama, Gupta, Karypis and Kumar, Addison-Wesley, ISBN 0-201-64865-2 (electronic version accessible online from the ANU library; search for the title)
• Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, 2nd ed., Barry Wilkinson and Michael Allen, Prentice Hall, ISBN 0-13-140563-2
• …and others on the web page