ME964 High Performance Computing for Engineering Applications
Fall 2008
Dan Negrut, Assistant Professor
Department of Mechanical Engineering
University of Wisconsin, Madison

Instructor: Dan Negrut
• Polytechnic Institute of Bucharest, Romania
    • B.S. – Aerospace Engineering (1992)
• The University of Iowa
    • Ph.D. – Mechanical Engineering (1998)
• MSC.Software
    • Product Development Engineer, 1998-2005
• The University of Michigan
    • Adjunct Assistant Professor, Dept. of Mathematics (2004)
• Division of Mathematics and Computer Science, Argonne National Laboratory
    • Visiting Scientist, 2004-2005, 2006
• The University of Wisconsin-Madison, joined in Nov. 2005
    • Research: Computer Aided Engineering (Simulation-Based Engineering Lab)
    • Focus: Computational Dynamics (Dynamics of Multi-body Systems)

Good to know…
• Time: 9:30, Tu & Th
• Location: 2106ME
• Office: 2035ME
• Phone: 608 890-0914
• E-Mail: negrut@engr.wisc.edu
• Course Webpage: http://sbel.wisc.edu/Courses/ME964/2008/index.htm
• Graders: Nick Schafer (npschafer@wisc.edu), David Dynerman (dmdynerman@wisc.edu)
• Grades reported at: learnuw.wisc.edu
• ME964 Forum: http://sbel.wisc.edu/Forum/index.php?board=3.0

ME 964 Fall 2008
• Office Hours:
    • Monday 2-4:30 PM
    • Wednesday 2-4:30 PM
• Call or email to arrange for meetings outside office hours
• Walk-ins are fine as long as they are in the afternoon

Text
• No textbook is required, but several are highly recommended:
    • NVIDIA, NVIDIA CUDA Programming Guide V2.0, 2008
        • Download from the NVIDIA website
    • B. Kernighan and D. Ritchie, The C Programming Language
    • B. Stroustrup, The C++ Programming Language, Third Edition
    • H. Nguyen (ed.), GPU Gems 3, Addison Wesley, 2007
        • Very good: if you are interested in GPU computing and can afford it ($60 at Amazon), I highly recommend this book
    • M. Pharr (ed.), GPU Gems 2, Addison Wesley, 2005
    • T. Mattson, et al., “Patterns for Parallel Programming,” Addison Wesley, 2005
• The last three books are on reserve at Wendt Library

Course Related Information
• Handouts will be printed and provided before each lecture
• Lecture slides will be made available online
    • I’m also trying to post mp3 audio files
• Grades will be maintained online at Learn@UW
• The syllabus will be updated as we go
    • It will contain information about the topics we cover and the homework assignments
    • It will be available at the course website: http://sbel.wisc.edu/Courses/ME964/2008/index.htm

Grading
• Homework 30%
• Midterm Exam 10%
• Midterm Project 20%
• Final Project 35%
• Course Participation 5%
• Total 100%
NOTE:
• HW, project, and exam scores will be maintained online at Learn@UW
• Score-related questions (homework, exams, labs) must be raised before the next class after the graded work is returned

Homework Policies
• About seven HWs will be assigned
• Late HW is acceptable under certain circumstances
    • Two late HWs are OK
    • Email us at me964uw@gmail.com to announce a late submission
    • IMPORTANT: request the extension before the deadline
    • Use the message subject “YourFirstName YourLastName ME964: Late HW Notice”
• HW is due at 11:59 PM on the day indicated
• The homework with the lowest score will be dropped

Midterm Exam
• One midterm exam
• Scheduled during regular class hours
• Tentatively scheduled for mid-November
• Doesn’t require a computer (it’s a pen-and-paper exam)

Midterm Project
• Tentatively assigned in mid-October
• You’ll have three weeks to work on this project
• Accounts for 20% of the final grade
• The same problem is assigned to everybody
• The students who come up with the three best solutions are encouraged to co-author a conference paper
    • If the submission is accepted, I will cover the airfare and registration fee for one student to attend the conference
• Ranking will be based on compute time and originality of the design

Final Exam Project
• The Academic Calendar schedules the final exam for December 14, 2:45 PM
    • This two-hour time slot will be used for Final Project presentations
• NOTE:
    • The Final Project presentation is scheduled for Dec. 14
    • The Final Project is due on the last day of finals (Dec. 20), at 11:59 PM
• Final Project (accounts for 35% of the final grade):
    • It is an individual project
    • You choose a problem that suits your research or interests
    • You are *very* much encouraged to tackle a meaningful problem
    • Attempt to solve a useful problem rather than one you are confident you can solve
    • Projects that are not successful are OK, provided you aim high enough and demonstrate good work
• Tentatively:
    • Work on the Final Project will start at the end of October
    • Presentation of the topic and initial solution design is scheduled for mid-November

Class Participation
• For registered students: accounts for 5% of the final grade. To earn the 5%, you must:
    • Have at least five posts on NVIDIA’s CUDA Forum
        • Register there with the ID ??me964uw, replacing the first two characters with your initials (first and last name, upper case)
    • Contribute at least five meaningful posts on the class bulletin board
        • The bulletin board (BB) is live at: http://sbel.wisc.edu/Forum/index.php?board=3.0
        • The BB is meant to be a quick way for the instructor and other ME964 colleagues to answer your questions
        • Your ME964 BB account is already set up; use the same ??me964uw ID
        • The temporary password is ME964 (please reset your password to avoid any surprises)

Class Participation
• For sit-in students:
    • Notes will be posted at http://sbel.wisc.edu/Courses/ME964/2008/index.htm
    • I’ll also make a good-faith attempt to post mp3 audio at http://sbel.wisc.edu/Courses/ME964/2008/audio.htm
    • You are encouraged to complete the assigned HW; I would be happy to grade it as well
    • No formal registration is required to sit in

Scores and Grades
Score     Grade
92-100    A
86-91     AB
78-85     B
70-77     BC
60-69     C
50-59     D
Rounding: scores are rounded to the nearest integer

Rules of Engagement
• You are encouraged to discuss assignments with other students in the class
    • Post and read posts on the BB
• Getting verbal advice and suggestions from anybody is fine
• Copying non-trivial code is not acceptable
    • Non-trivial = more than a line or so
    • This includes reading someone else’s code and then going off to write your own
• Using third-party libraries that directly implement the solution of a HW/Project is not acceptable
    • Comparing your code to third-party code to gauge the efficiency of your implementation is strongly encouraged, though

Rules of Engagement
• Breaking the rules:
    • Zero points on the HW/Exam/Project at the first occurrence
    • Automatic F final grade upon the second occurrence
• These rules are vague and not meant to police you
• I count on your honesty more than anything else

Computer Lab for ME964
• There is only one on-campus lab that has the hardware required for your homework: Lab 1235ME
    • NOTE: remote login is not going to work (even if you “push through” the desktop of one of the machines in the lab)
• The lab has NVIDIA GPUs
    • GeForce 8800 GT
    • 20 machines, running Windows XP
• For those who really prefer to use Linux:
    • I’m investigating the possibility of getting accounts at the supercomputing center at the University of Illinois at Urbana-Champaign

Operating System Related…
• I strongly encourage you to use WinXP Professional
    • All machines in the 1235ME lab are running WinXP
• All assignments will have a DevStudio project that contains the vast majority of the support files needed to implement the solution
    • Superior debugger, integrated development environment
• Linux Makefiles will be provided but won’t be supported
    • Unfortunately, no machine in the lab will run Linux, so you will be on your own
    • I might be able to lend you a GPU if you have a Linux box and really want to use it

Course Objectives
• Introduce students to existing High-Performance Computing (HPC) software and hardware
    • “High-performance” usually refers to parallel architectures or vector machines, i.e., architectures that have the potential to run much faster than your desktop computer
• Introduce students to HPC on the Graphics Processing Unit (GPU)
    • GPU computing is typically associated with fine-grain parallelism
• Three lectures will be dedicated to the Message Passing Interface (MPI) HPC model, which is aimed at coarse-grain parallelism
• Present basic software design patterns for parallel computing

Overview of Material Covered
• Quick C Intro
• General considerations vis-à-vis trends in the chip industry
• Overview of parallel computation paradigms and supporting hardware/software
• GPU computing and the CUDA programming model
• Intro to MPI programming
• Guest Lectures
• Midterm/Final Project related discussions

Overview of the GPU (CUDA) component…
• GPU Computing and CUDA Intro
• CUDA Memory Model
• CUDA Hardware
• GPU Compute Core
• Bank Conflicts
• Control Flow in CUDA
• Parallel Programming - Application Performance
• Parallel Programming - Algorithm Styles

Guest Lectures
• This is a high-level graduate class on a very dynamic and timely topic
• I intend to invite specialists in the HPC field to deliver guest lectures
• Tentative Guest Lectures:
    • Michael Garland – Senior Researcher, NVIDIA
    • Mark Hill – Professor CS and ECE, UW
    • Karu Sankaralingam – Assistant Professor, UW
    • Brent Oster – Researcher, NVIDIA
    • Darius Buntinas – Scientist, Argonne National Lab
    • Two longer lectures on November 11 and 13 will run from 9:30 to 11:30 AM; together they count as three regular lectures
    • Somebody from CS at UW, speaking on the Condor project

At the beginning of the road
• I am teaching the class for the first time
    • There will be rough edges
    • There might be questions that I don’t have an answer for
        • I promise to get back to you with an answer within one week
• Please ask questions

Very quick overview of C
• Read chapter 5 of “The C Programming Language” (Kernighan and Ritchie)
    • Sections 5.1-5.3 and 5.5-5.9 are particularly important
    • This material will show up time and again (including in the midterm exam)
• Acknowledgements:
    • The slides in this C Intro use material from Donghui Zhang and Lewis Girod

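C Syntax and Hello World
• #include inserts another file. “.h” files are called “header” files. They contain the declarations/definitions needed to interface to libraries and to code in other “.c” files.
• What do the < > mean? They tell the preprocessor to look for the header in the standard system include directories; a name in double quotes, such as “myfile.h”, is typically searched for first relative to the current source file.
• Text between /* and */ is a comment, ignored by the compiler.

    #include <stdio.h>

    /* The simplest C Program */
    int main(int argc, char **argv)
    {
        printf("Hello World\n");
        return 0;
    }

• The main() function is always where your program starts running.
• Blocks of code (“lexical scopes”) are marked by { … }
• return 0; returns the value 0 from main(), which tells the operating system that the program finished successfully.
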
Lexical Scoping
Every variable is defined within some scope. A variable cannot be referenced by name (a.k.a. symbol) from outside of that scope. The comments below list the names visible at each point:

    void p(char x)
    {              /* p,x */
        char y;    /* p,x,y */
        char z;    /* p,x,y,z */
    }              /* p */
    char z;        /* p,z */
    void q(char a)
    {
        char b;    /* p,z,q,a,b */
        {
            char c;    /* p,z,q,a,b,c */
        }
        char d;    /* p,z,q,a,b,d (not c) */
    }              /* p,z,q */

• Lexical scopes are defined with curly braces { }.
• The scope of function arguments is the complete body of that function.
• The scope of variables defined inside a function starts at the definition and ends at the closing brace of the containing block.
• The scope of variables defined outside a function starts at the definition and ends at the end of the file. These are called “global” variables.
• Would declaring another char b inside the inner block be legal? Yes: the inner b hides (shadows) the outer b until the inner block’s closing brace.
• Note: declaring d after a statement (here, the inner block) requires C99 or later; C89 requires all declarations to come first in a block.

Comparison and Mathematical Operators
The rules of precedence are clearly defined but often difficult to remember or non-intuitive. When in doubt, add parentheses to make the intent explicit.
Comparison and logical:
• == equal to
• < less than
• <= less than or equal
• > greater than
• >= greater than or equal
• != not equal
• && logical and
• || logical or
• ! logical not
Arithmetic:
• + plus
• - minus
• * mult
• / divide
• % modulo
Bitwise:
• & bitwise and
• | bitwise or
• ^ bitwise xor
• ~ bitwise not
• << shift left
• >> shift right
Beware division:
• If both operands are integers, the result is an integer and the fractional part is discarded (truncated toward zero in C99): 5 / 10 → 0, whereas 5 / 10.0 → 0.5
• Integer division by 0 is undefined behavior; on many systems it raises a floating-point exception (FPE)
Don’t confuse & and &&: 1 & 2 → 0, whereas 1 && 2 → 1 (true)

Assignment Operators
• x = y     assign y to x
• x++       post-increment x
• ++x       pre-increment x
• x--       post-decrement x
• --x       pre-decrement x
• x += y    assign (x+y) to x
• x -= y    assign (x-y) to x
• x *= y    assign (x*y) to x
• x /= y    assign (x/y) to x
• x %= y    assign (x%y) to x
Note the difference between ++x and x++:

    int x=5;
    int y;
    y = ++x;
    /* x == 6, y == 6 */

    int x=5;
    int y;
    y = x++;
    /* x == 6, y == 5 */

Don’t confuse “=” and “==”!

    int x=5;
    if (x==6)   /* false */
    {
      /* ... */
    }
    /* x is still 5 */

    int x=5;
    if (x=6)    /* always true: assigns 6, and 6 is nonzero */
    {
      /* x is now 6 */
    }
    /* ... */
