This course covers parallel programming basics, parallel processing concepts, and performance optimization techniques. Topics include threading, synchronization, memory hierarchies, and more. Prepare for quizzes, exams, and programming assignments.
Welcome to CSE 160! Introduction to parallel computation
Reading
• Two required texts http://goo.gl/SH98DC
  • An Introduction to Parallel Programming, by Peter Pacheco, Morgan Kaufmann, 2011
  • C++ Concurrency in Action: Practical Multithreading, by Anthony Williams, Manning Publications, 2012
• Lecture slides are no substitute for reading the texts!
• Complete the assigned readings before class
  • readings → pre-class quizzes → in-class problems → exams
• Pre-class quizzes will be on TritonEd
• In-class quizzes: register your clicker on TritonEd today!
Background
• Pre-requisite: CSE 100
• Comfortable with C/C++ programming
• If you took Operating Systems (CSE 120), you should be familiar with threads, synchronization, and mutexes
• If you took Computer Architecture (CSE 141), you should be familiar with memory hierarchies, including caches
• We will cover these topics sufficiently to level the playing field
Course Requirements
• 4-5 programming assignments (45%)
  • Multithreading with C++11 + performance programming
  • Assignments shall be done in teams of 2
  • Potential new 5th assignment done individually
• Exams (35%)
  • 1 Midterm (15%) + Final (20%)
  • midterm = (final > midterm) ? final : midterm
• On-line pre-class quizzes (10%)
• Class participation
  • Respond to 75% of clicker questions and you've participated in a lecture
• No cell phone usage unless previously authorized
  • Other devices may be used for note-taking only
Policies
• Academic Integrity
  • Do your own work
  • Plagiarism and cheating will not be tolerated
• You are required to complete an Academic Integrity Scholarship Agreement (part of A0)
Programming Labs
• Bang cluster
• Ieng6
• Make sure your accounts work (stand by for information)
• Software
  • C++11 threads
  • We will use GNU 4.8.4
• Extension students:
  • Add CSE 160 to your list of courses
  • https://sdacs.ucsd.edu/~icc/exadd.php
Class presentation technique
• I will assume that you've read the assigned readings before class
• Consider the slides as talking points; class discussions are driven by your interest
• Learning is not a passive process
• Class participation is important to keep the lecture active
• Different lecture modalities
  • The 2-minute pause
  • In-class problem solving
The 2-minute pause
• An opportunity in class to improve your understanding, to make sure you "got" it by trying to explain to someone else
  • Getting your mind actively working on it
• The process
  • I pose a question
  • You discuss with 1-2 neighbors
  • Important goal: understand why the answer is correct
  • After most seem to be done, I'll ask for quiet
  • A few will share what their group talked about
    • Good answers are those where you were wrong, then realized...
    • Or ask a question!
• Please pay attention and quickly return to "lecture mode" so we can keep moving!
Group Discussion #1: What is your background?
• C/C++, Java, Fortran?
• TLB misses
• Multithreading
• MPI
• RPC
• C++11 async
• CUDA, OpenCL, GPUs
• Abstract base class
• Math background, e.g. derivatives and the Taylor expansion f(x) = f(a) + f′(a)(x−a)/1! + f″(a)(x−a)²/2! + ...
The rest of the lecture
• Introduction to parallel computation
What is parallel processing?
• Compute on simultaneously executing physical resources
• Improve some aspect of performance
  • Reduce time to solution: multiple cores are faster than 1
  • Capability: tackle a larger problem, more accurately
• Multiple processor cores cooperate to process a related set of tasks – tightly coupled
• What about distributed processing?
  • Less tightly coupled; unreliable communication and computation; changing resource availability
• Contrast concurrency with parallelism
  • Correctness is the goal, e.g. database transactions
  • Ensure that shared resources are used appropriately
Group Discussion #2: Have you written a parallel program?
• Threads
• C++11 async
• OpenCL
• CUDA
• RPC
• MPI
Why study parallel computation?
• Because parallelism is everywhere: cell phones, laptops, automobiles, etc.
• If you don't use parallelism, you lose it!
  • Processors generally can't run at peak speed on 1 core
  • Many applications are underserved because they fail to use available resources fully
• But there are many details affecting performance
  • The choice of algorithm
  • The implementation
  • Performance tradeoffs
• The courses you've taken generally talked about how to do these things on 1 processing core only
• Lots of changes on multiple cores
How does parallel computing relate to other branches of computer science?
• Parallel processing generalizes problems we encounter on single-processor computers
• A parallel computer is just an extension of the traditional memory hierarchy
• The need to preserve locality, which prevails in virtual memory, cache memory, and registers, also applies to a parallel computer
What you will learn in this class
• How to solve computationally intensive problems on multicore processors effectively using threads: theory and practice
• Programming techniques, including performance programming
• Performance tradeoffs, esp. the memory hierarchy
• CSE 160 will build on what you learned earlier in your career about programming, algorithm design and analysis
The age of the multi-core processor
• On-chip parallel computer
  • IBM Power4 (2001), Intel, AMD, ...
  • First dual-core laptops (2005-6)
• GPUs (NVIDIA, ATI): desktop supercomputer
  • In smartphones, behind the dashboard
  • blog.laptopmag.com/nvidia-tegrak1-unveiled
• Everyone has a parallel computer at their fingertips
• realworldtech.com
Why is parallel computation inevitable?
• Physical limitations on heat dissipation prevent further increases in clock speed
• To build a faster processor, we replicate the computational engine
• Image credits: http://www.neowin.net/; Christopher Dyken, SINTEF
The anatomy of a multi-core processor
• MIMD
  • Each core runs an independent instruction stream
  • All share the global memory
• 2 types, depending on the uniformity of memory access times
  • UMA: Uniform Memory Access time
    • Also called a Symmetric Multiprocessor (SMP)
  • NUMA: Non-Uniform Memory Access time
Multithreading
• How do we explain how the program runs on the hardware?
• On shared memory, a natural programming model is called multithreading
• Programs execute as a set of threads
  • Threads are usually assigned to different physical cores
  • Each thread runs the same code as an independent instruction stream
  • Same Program Multiple Data programming model = "SPMD" (see the sketch below)
• Threads communicate implicitly through shared memory (e.g. the heap), but have their own private stacks
• They coordinate (synchronize) via shared variables
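A minimal SPMD-style sketch, not taken from the course materials: every thread runs the same function, and each uses its thread ID to pick its own block of a shared array. The thread count NT, the array a, and the function name scale are illustrative choices.

#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

const int NT = 4;            // number of threads (assumed for illustration)
const int N  = 1000;         // problem size
std::vector<double> a(N);    // shared data: declared outside any function

void scale(int tid) {
    // Block partition: thread tid owns elements [lo, hi)
    int chunk = (N + NT - 1) / NT;
    int lo = tid * chunk;
    int hi = std::min(N, lo + chunk);
    for (int i = lo; i < hi; i++)
        a[i] = 2.0 * i;      // each thread writes only its own elements
}

int main() {
    std::vector<std::thread> thrds;
    for (int t = 0; t < NT; t++)          // spawn (fork)
        thrds.push_back(std::thread(scale, t));
    for (auto &th : thrds)                // join
        th.join();
    std::cout << a[N - 1] << std::endl;
    return 0;
}

Because each thread touches a disjoint block of a, no synchronization is needed for the writes; only the joins are required before main reads the result.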
What is a thread?
• A thread is similar to a procedure call, with notable differences
  • The control flow changes
  • A procedure call is "synchronous"; return indicates completion
  • A spawned thread executes asynchronously until it completes, and hence a return doesn't indicate completion
• A new storage class: shared data
  • Synchronization may be needed when updating shared state (thread safety) – see the sketch below
• [Figure: shared variables s and y live in shared memory; each thread P0, P1, ..., Pn keeps its own private copy of i on its stack]
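A small sketch of thread safety, assuming a hypothetical shared counter (not course-provided code): several threads update the same global variable, so the update is protected by a mutex.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

long counter = 0;        // shared state: visible to all threads
std::mutex counter_lock; // guards updates to counter

void work(int n) {
    for (int i = 0; i < n; i++) {
        std::lock_guard<std::mutex> guard(counter_lock); // thread safety
        counter++;       // without the lock this update would race
    }
}

int main() {
    std::vector<std::thread> thrds;
    for (int t = 0; t < 4; t++)
        thrds.push_back(std::thread(work, 100000));
    for (auto &th : thrds)
        th.join();
    std::cout << "counter = " << counter << std::endl; // 400000
    return 0;
}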
Which of these storage classes can never be shared among threads? (see the storage-class sketch below)
A. Globals declared outside any function
B. Local automatic storage
C. Heap storage
D. Class members (variables)
E. B & C
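To make the storage classes concrete, here is a small illustrative sketch (the names global_count, heap_val, and worker are made up for this example):

#include <thread>

int global_count = 0;              // global: visible to every thread

void worker(int *heap_val) {
    int local = 0;                 // local automatic: lives on this thread's
                                   // private stack
    local += *heap_val;            // heap storage: shared via the pointer
    global_count += local;         // shared global (needs synchronization
                                   // if several threads update it)
}

int main() {
    int *heap_val = new int(42);   // heap storage: reachable by any thread
                                   // that holds the pointer
    std::thread t(worker, heap_val);
    t.join();
    delete heap_val;
    return 0;
}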
Why threads?
• Processes are "heavyweight" objects scheduled by the OS
  • Protected address space, open files, and other state
• A thread, AKA a lightweight process (LWP)
  • Threads share the address space and open files of the parent, but have their own stack
  • Reduced management overheads, e.g. thread creation
  • Kernel scheduler multiplexes threads
• [Figure: threads within a process share one heap but have separate stacks, multiplexed onto processors P]
Parallel control flow
• Parallel program
  • Start with a single root thread
  • Fork-join parallelism to create concurrently executing threads
  • Threads communicate via shared memory
• A spawned thread executes asynchronously until it completes
• Threads may or may not execute on different processors
• [Figure: a shared heap with private per-thread stacks, running across processors P]
What forms of control flow do we have in a serial program?
A. Function call
B. Iteration
C. Conditionals (if-then-else)
D. Switch statements
E. All of the above
Multithreading in Practice
• C++11
• POSIX Threads "standard" (pthreads): IEEE POSIX 1003.1c-1995
  • Low-level interface
  • Beware of non-standard features
• OpenMP – program annotations (see the sketch below)
• Java threads – not used in high-performance computation
• Parallel programming languages
  • Co-array Fortran
  • UPC
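For comparison with explicit C++11 threads, here is a minimal sketch of OpenMP's annotation style (illustrative only; per the course requirements slide, the assignments use C++11 threads). The file name and loop are made up; OpenMP code is compiled with the -fopenmp flag in g++.

#include <cstdio>

int main() {
    const int N = 1000;
    double a[N];

    // The pragma asks the compiler/runtime to split the loop iterations
    // across a team of threads; no explicit fork/join code is written.
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("%f\n", a[N - 1]);
    return 0;
}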
C++11 Threads
• Via <thread>, C++ supports a threading interface similar to pthreads, though a bit more user-friendly
• async is a higher-level interface suitable for certain kinds of applications (see the sketch below)
• New memory model
• Atomic template
• Requires a C++11-compliant compiler, gnu 4.7+, etc.
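A small sketch of std::async, assuming a made-up task sum_range: the async call returns a future immediately, and get() blocks until the asynchronous task has finished.

#include <future>
#include <iostream>

long sum_range(long lo, long hi) {
    long s = 0;
    for (long i = lo; i < hi; i++) s += i;
    return s;
}

int main() {
    // Launch the two halves of the sum as asynchronous tasks.
    auto f1 = std::async(std::launch::async, sum_range, 0L, 500000L);
    auto f2 = std::async(std::launch::async, sum_range, 500000L, 1000000L);

    long total = f1.get() + f2.get();   // collect the results (implicit join)
    std::cout << "total = " << total << std::endl;
    return 0;
}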
Hello world with <thread>

#include <iostream>
#include <thread>
#include <cstdlib>
using namespace std;

void Hello(int TID) {
    cout << "Hello from thread " << TID << endl;
}

int main(int argc, char *argv[]) {
    // NT = number of threads (assumed here to come from the command line)
    int NT = (argc > 1) ? atoi(argv[1]) : 3;
    thread *thrds = new thread[NT];
    // Spawn threads
    for (int t = 0; t < NT; t++)
        thrds[t] = thread(Hello, t);
    // Join threads
    for (int t = 0; t < NT; t++)
        thrds[t].join();
    delete [] thrds;
    return 0;
}

Sample runs (the output order varies from run to run because the threads execute asynchronously, and lines can even interleave):

$ ./hello_th3
Hello from thread 0
Hello from thread 1
Hello from thread 2
$ ./hello_th3
Hello from thread 1
Hello from thread 0
Hello from thread 2
$ ./hello_th4
Running with 4 threads
Hello from thread 0
Hello from thread 3
Hello from thread Hello from thread 21
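A note on building: with the GNU compiler, C++11 thread programs need the -std=c++11 and -pthread flags, along the lines of (file name here is illustrative; the course makefiles may differ):

g++ -std=c++11 -pthread hello.cpp -o hello_th3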
Steps in writing multithreaded code
• We write a thread function that gets called each time we spawn a new thread
• Spawn threads by constructing objects of class thread (in the C++ library)
  • Each thread runs on a separate processing core
  • (If more threads than cores, the threads share cores)
• Threads share memory; declare shared variables outside the scope of any functions
• Divide up the computation fairly among the threads
• Join threads so we know when they are done
Summary of today's lecture
• The goal of parallel processing is to improve some aspect of performance
• The multicore processor has multiple processing cores sharing memory, the consequence of technological factors
• We will employ multithreading in this course to "parallelize" applications
• We will use the C++ threads library to manage multithreading
Next Time
• Multithreading
• Be sure your clicker is registered
• By Friday at 6pm: do Assignment #0
  • cseweb.ucsd.edu/classes/wi17/cse160-a/HW/A0.html
• Establish that you can log in to bang and ieng6
  • cseweb.ucsd.edu/classes/wi17/cse160-a/lab.html