Juan Mendivelso. Multithreading Algorithms
Serial Algorithms: Suitable for running on a uniprocessor computer in which only one instruction executes at a time. Parallel Algorithms: Run on a multiprocessor computer that permits multiple instructions to execute concurrently. Serial Algorithms & Parallel Algorithms
Computers with multiple processing units. • They can be: • Chip multiprocessors: inexpensive laptops and desktops. They contain a single multicore integrated circuit that houses multiple processor “cores”, each of which is a full-fledged processor with access to common memory. PARALLEL COMPUTERS
Computers with multiple processing units. • They can be: • Clusters: built from individual computers with a dedicated network system interconnecting them. Intermediate price/performance. PARALLEL COMPUTERS
Computers with multiple processing units. • They can be: • Supercomputers: combinations of custom architectures and custom networks that deliver the highest performance (instructions per second). High price. PARALLEL COMPUTERS
Although the random-access machine model was accepted early on for serial computing, no single model has been established for parallel computing. A major reason is that vendors have not agreed on a single architectural model for parallel computers. Models for parallel computing
For example, some parallel computers feature shared memory, where every processor can access any location of memory. Others employ distributed memory, where each processor has a private memory. However, the trend appears to be toward shared-memory multiprocessors. Models for parallel computing
Shared-memory parallel computers use static threading: a software abstraction of “virtual processors”, or threads, sharing a common memory. Each thread can execute code independently. For most applications, threads persist for the duration of a computation. Static threading
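A minimal sketch of static threading, assuming Python's standard threading module (the array, thread count, and helper names are illustrative): a fixed set of threads is created up front, the work is partitioned among them by hand, and all threads share common memory.

    import threading

    data = list(range(1_000_000))   # shared memory: every thread can read it
    NUM_THREADS = 4                 # fixed "virtual processors" for the whole run
    partial_sums = [0] * NUM_THREADS

    def worker(tid):
        # Each static thread is handed a fixed slice of the work up front;
        # the programmer, not a scheduler, decides the partition.
        chunk = len(data) // NUM_THREADS
        lo = tid * chunk
        hi = len(data) if tid == NUM_THREADS - 1 else lo + chunk
        partial_sums[tid] = sum(data[lo:hi])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(partial_sums))        # 499999500000

If the slices take different amounts of time, some threads finish early while others lag behind, which is exactly the load-balancing problem described next.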
Programming a shared-memory parallel computer directly using static threads is difficult and error prone. Dynamically partitioning the work among the threads so that each thread receives approximately the same load turns out to be complicated. PROBLEMS OF STATIC THREADING
The programmer must use complex communication protocols to implement a scheduler to load-balance the work. This has led to the creation of concurrency platforms: they provide a layer of software that coordinates, schedules, and manages the parallel-computing resources. PROBLEMS OF STATIC THREADING
A class of concurrency platform. It allows programmers to specify parallelism in applications without worrying about communication protocols, load balancing, etc. The concurrency platform contains a scheduler that load-balances the computation automatically. DYNAMIC MULTITHREADING
It supports: • Nested parallelism: it allows a subroutine to be spawned, allowing the caller to proceed while the spawned subroutine is computing its result. • Parallel loops: regular for loops, except that the iterations can be executed concurrently (a sketch follows below). DYNAMIC MULTITHREADING
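A hedged sketch of a parallel loop, using Python's concurrent.futures as a stand-in for a concurrency platform; the loop body f and the input range are placeholders.

    from concurrent.futures import ThreadPoolExecutor

    def f(i):
        # Loop body: iterations must be independent of one another
        # so that they can safely run concurrently.
        return i * i

    with ThreadPoolExecutor() as pool:
        # Like an ordinary for loop over range(10), except that the
        # platform is free to execute the iterations concurrently.
        results = list(pool.map(f, range(10)))
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]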
The user only specifies the logical parallelism. Simple extension of the serial model with: parallel, spawn, and sync. Clean way to quantify parallelism. Many multithreaded algorithms involving nested parallelism follow naturally from the divide-and-conquer paradigm. ADVANTAGES OF DYNAMIC MULTITHREADING
Fibonacci example • The serial algorithm: Fib(n) (sketched below) • Repeated work: the two recursive calls recompute the same subproblems. • Complexity: the running time grows exponentially in n. • However, the recursive calls are independent! • Parallel algorithm: P-Fib(n) BASICS OF MULTITHREADING
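For concreteness, a direct transcription of the serial algorithm in Python (the function name is illustrative):

    def fib(n):
        # Serial Fib: the two recursive calls recompute the same
        # subproblems, so the running time grows exponentially in n,
        # yet the calls are independent and could run in parallel.
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(10))   # 55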
Concurrency keywords: spawn, sync, and parallel. The serialization of a multithreaded algorithm is the serial algorithm that results from deleting the concurrency keywords. Serialization
It occurs when the keyword spawn precedes a procedure call. It differs from an ordinary procedure call in that the procedure instance that executes the spawn (the parent) may continue to execute in parallel with the spawned subroutine (its child) instead of waiting for the child to complete. NESTED PARALLELISM
It doesn’t say that a procedure must execute concurrently with its spawned children, only that it may! The concurrency keywords express the logical parallelism of the computation. At runtime, it is up to the scheduler to determine which subcomputations actually run concurrently by assigning them to processors. Keyword spawn
A procedure cannot safely use the values returned by its spawned children until after it executes a sync statement. The keyword sync indicates that the procedure must wait until all its spawned children have been completed before proceeding to the statement after the sync. Every procedure executes a sync implicitly before it returns. Keyword sync
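A minimal sketch of P-Fib, emulating spawn with submit() and sync with result() from Python's concurrent.futures. It only spawns at the top level to keep the sketch free of thread-pool pitfalls (the real P-Fib spawns at every level of the recursion), and under CPython the threads interleave rather than truly run in parallel; it illustrates the keywords, not a full concurrency platform.

    from concurrent.futures import ThreadPoolExecutor

    def fib(n):
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    def p_fib(n):
        if n < 2:
            return n
        with ThreadPoolExecutor(max_workers=1) as pool:
            x = pool.submit(fib, n - 1)   # "spawn": the child computes Fib(n-1)
            y = fib(n - 2)                # the parent keeps working meanwhile
            return x.result() + y         # "sync": wait for the spawned child

    print(p_fib(10))   # 55

Deleting the submit/result scaffolding, i.e. the concurrency keywords, yields exactly the serial fib above: its serialization.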
We can see a multithreaded computation as a directed acyclic graph G = (V, E) called a computational dag. The vertices are instructions and the edges represent dependencies between instructions, where (u, v) ∈ E means that instruction u must execute before instruction v. Computational dag
If a chain of instructions contains no parallel control (no spawn, sync, or return), we may group them into a single strand; a strand thus represents one or more instructions. Instructions involving parallel control are not included in strands, but are represented in the structure of the dag. Computational dag
For example, if a strand has two successors, one of them must have been spawned, and a strand with multiple predecessors indicates the predecessors joined because of a sync. Thus, in the general case, the set V forms the set of strands, and the set E of directed edges represents dependencies between strands induced by parallel control. Computational dag
If G has a directed path from strand u to strand v, we say that the two strands are (logically) in series. Otherwise, strands u and v are (logically) in parallel. We can picture a multithreaded computation as a dag of strands embedded in a tree of procedure instances. Example! Computational dag
We can classify the edges: • Continuation edge: connects a strand u to its successor u’ within the same procedure instance. • Call edges: representing normal procedure calls. • Return edges: when a strand u returns to its calling procedure and x is the strand immediately following the next sync in the calling procedure, the dag contains the return edge (u, x). • A computation starts with an initial strand and ends with a single final strand. Computational dag
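A small sketch, assuming a hand-made (hypothetical) dag of strands stored as adjacency lists: two strands are in series exactly when the dag has a directed path between them, and in parallel otherwise.

    # Hypothetical computation dag: edges[u] lists the strands that
    # depend on strand u (a diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3).
    edges = {0: [1, 2], 1: [3], 2: [3], 3: []}

    def in_series(dag, u, v):
        # u and v are (logically) in series if there is a directed
        # path from u to v; otherwise they are (logically) in parallel.
        stack, seen = [u], set()
        while stack:
            w = stack.pop()
            if w == v:
                return True
            if w not in seen:
                seen.add(w)
                stack.extend(dag[w])
        return False

    print(in_series(edges, 0, 3))                            # True: in series
    print(in_series(edges, 1, 2) or in_series(edges, 2, 1))  # False: in parallel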
A parallel computer that consists of a set of processors and a sequentially consistent shared memory. Sequential consistency means that the shared memory behaves as if the multithreaded computation’s instructions were interleaved to produce a linear order that preserves the partial order of the computation dag. IDEAL PARALLEL COMPUTER
Depending on scheduling, the ordering could differ from one run of the program to another. • The ideal-parallel-computer model makes some performance assumptions: • Each processor in the machine has equal computing power. • It ignores the cost of scheduling. IDEAL PARALLEL COMPUTER
Work: • Total time to execute the entire computation on one processor. • Sum of the times taken by each of the strands. • In the computation dag, it is the number of strands (assuming each strand takes a time unit). PERFORMANCE MEASURES
Span: • Longest time to execute the strands along any path in the dag. • The span equals the number of vertices on a longest or critical path (again assuming unit-time strands). • Example! PERFORMANCE MEASURES
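For instance (numbers purely illustrative): if each strand takes unit time and the computation dag has 17 strands, with a longest (critical) path containing 8 of them, then the work is T1 = 17 and the span is T∞ = 8.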
The actual running time of a multithreaded computation depends also on how many processors are available and how the scheduler allocates strands to processors. Running time on P processors: TP. Work: T1. Span: T∞ (unlimited number of processors). PERFORMANCE MEASURES
The work and span provide lower bounds on the running time TP of a multithreaded computation on P processors: • Work law: TP ≥ T1/P • Span law: TP ≥ T∞ PERFORMANCE MEASURES
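Continuing the illustrative example with T1 = 17 and T∞ = 8 on P = 4 processors: the work law gives TP ≥ 17/4 = 4.25 and the span law gives TP ≥ 8, so no schedule can finish in fewer than 8 time steps.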
Speedup: • The speedup of a computation on P processors is the ratio T1/TP. • How many times faster the computation is on P processors than on one processor. • It is at most P. • Linear speedup: T1/TP = θ(P) • Perfect linear speedup: T1/TP = P PERFORMANCE MEASURES
Parallelism: • T1/T∞ • The average amount of work that can be performed in parallel for each step along the critical path. • As an upper bound, the parallelism gives the maximum possible speedup that can be achieved on any number of processors. • The parallelism provides a limit on the possibility of attaining perfect linear speedup. PERFORMANCE MEASURES
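In the same illustrative example, the parallelism is T1/T∞ = 17/8 ≈ 2.1: no matter how many processors are used, the speedup can never exceed about 2.1, and perfect linear speedup is impossible on more than 2 processors.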
Good performance depends on more than minimizing the work and span. The strands must also be scheduled efficiently onto the processors of the parallel machine. The multithreaded programming model provides no way to specify which strands to execute on which processors. Instead, we rely on the concurrency platform’s scheduler. SCHEDULING
A multithreaded scheduler must schedule the computation with no advance knowledge of when strands will be spawned or when they will complete—it must operate on-line. Moreover, a good scheduler operates in a distributed fashion, where the threads implementing the scheduler cooperate to load-balance the computation. SCHEDULING
To keep the analysis simple, we shall consider an on-line centralized scheduler, which knows the global state of the computation at any given time. In particular, we shall consider greedy schedulers, which assign as many strands to processors as possible in each time step. SCHEDULING
If at least P strands are ready to execute during a time step, we say that the step is a complete step, and a greedy scheduler assigns any P of the ready strands to processors. Otherwise, fewer than P strands are ready to execute, in which case we say that the step is an incomplete step, and the scheduler assigns each ready strand to its own processor. SCHEDULING
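A sketch of a greedy scheduler for unit-time strands, assuming a hypothetical dag given as adjacency lists; every time step is either complete (P ready strands run) or incomplete (all ready strands run).

    def greedy_time(edges, n, P):
        # edges[u] lists the strands that depend on strand u; all n
        # strands take one time unit.  Returns the number of time
        # steps a greedy scheduler uses on P processors.
        indeg = [0] * n
        for u in range(n):
            for v in edges.get(u, []):
                indeg[v] += 1
        ready = [u for u in range(n) if indeg[u] == 0]
        steps = done = 0
        while done < n:
            batch, ready = ready[:P], ready[P:]   # complete or incomplete step
            steps += 1
            done += len(batch)
            for u in batch:
                for v in edges.get(u, []):
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        ready.append(v)
        return steps

    # The diamond-shaped dag from the earlier sketch: 4 strands, P = 2.
    print(greedy_time({0: [1, 2], 1: [3], 2: [3]}, 4, 2))   # 3 time steps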
A greedy scheduler executes a multithreaded computation in time TP ≤ T1/P + T∞. Greedy scheduling is provably good because it achieves the sum of the lower bounds as an upper bound. Moreover, it is within a factor of 2 of optimal. SCHEDULING
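With the illustrative numbers T1 = 17, T∞ = 8, and P = 4, the greedy bound gives TP ≤ 17/4 + 8 = 12.25, while the work and span laws give TP ≥ 8; the greedy schedule is therefore within a factor of 2 of the best possible running time.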