Juan Mendivelso. Multithreading Algorithms
Serial Algorithms: Suitable for running on a uniprocessor computer in which only one instruction executes at a time. Parallel Algorithms: Run on a multiprocessor computer that permits multiple instructions to execute concurrently. Serial Algorithms & Parallel Algorithms
Computers with multiple processing units. • They can be: • Chip multiprocessors: inexpensive laptops and desktops. They contain a single multicore integrated circuit that houses multiple processor “cores”, each of which is a full-fledged processor with access to common memory. PARALLEL COMPUTERS
Computers with multiple processing units. • They can be: • Clusters: built from individual computers with a dedicated network system interconnecting them. Intermediate price/performance. PARALLEL COMPUTERS
Computers with multiple processing units. • They can be: • Supercomputers: combinations of custom architectures and custom networks that deliver the highest performance (instructions per second). High price. PARALLEL COMPUTERS
Although the random-access machine model was accepted early on for serial computing, no single model has been established for parallel computing. A major reason is that vendors have not agreed on a single architectural model for parallel computers. Models for parallel computing
For example, some parallel computers feature shared memory, where every processor can access any location of memory. Others employ distributed memory, where each processor has a private memory. However, the trend appears to be toward shared-memory multiprocessors. Models for parallel computing
Shared-memory parallel computers use static threading: a software abstraction of “virtual processors”, or threads, sharing a common memory. Each thread can execute code independently. For most applications, threads persist for the duration of a computation. Static threading
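A minimal sketch of static threading, assuming Python's standard threading module (the array, thread count, and helper names are illustrative): a fixed set of threads is created up front, the work is partitioned among them by hand, and all threads share common memory.

    import threading

    data = list(range(1_000_000))   # shared memory: every thread can read it
    NUM_THREADS = 4                 # fixed "virtual processors" for the whole run
    partial_sums = [0] * NUM_THREADS

    def worker(tid):
        # Each static thread is handed a fixed slice of the work up front;
        # the programmer, not a scheduler, decides the partition.
        chunk = len(data) // NUM_THREADS
        lo = tid * chunk
        hi = len(data) if tid == NUM_THREADS - 1 else lo + chunk
        partial_sums[tid] = sum(data[lo:hi])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(partial_sums))        # 499999500000

If the slices take different amounts of time, some threads finish early while others lag behind, which is exactly the load-balancing problem described next.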
Programming a shared-memory parallel computer directly using static threads is difficult and error prone. Dynamically partitioning the work among the threads so that each thread receives approximately the same load turns out to be complicated. PROBLEMS OF STATIC THREADING
The programmer must use complex communication protocols to implement a scheduler to load-balance the work. This has led to the creation of concurrency platforms: they provide a layer of software that coordinates, schedules, and manages the parallel-computing resources. PROBLEMS OF STATIC THREADING
A class of concurrency platform. It allows programmers to specify parallelism in applications without worrying about communication protocols, load balancing, etc. The concurrency platform contains a scheduler that load-balances the computation automatically. DYNAMIC MULTITHREADING
It supports: • Nested parallelism: it allows a subroutine to be spawned, allowing the caller to proceed while the spawned subroutine is computing its result. • Parallel loops: regular for loops, except that the iterations can be executed concurrently (a sketch follows below). DYNAMIC MULTITHREADING
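A hedged sketch of a parallel loop, using Python's concurrent.futures as a stand-in for a concurrency platform; the loop body f and the input range are placeholders.

    from concurrent.futures import ThreadPoolExecutor

    def f(i):
        # Loop body: iterations must be independent of one another
        # so that they can safely run concurrently.
        return i * i

    with ThreadPoolExecutor() as pool:
        # Like an ordinary for loop over range(10), except that the
        # platform is free to execute the iterations concurrently.
        results = list(pool.map(f, range(10)))
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]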
The user only specifies the logical parallelism. Simple extension of the serial model with: parallel, spawn, and sync. Clean way to quantify parallelism. Many multithreaded algorithms involving nested parallelism follow naturally from the divide-and-conquer paradigm. ADVANTAGES OF DYNAMIC MULTITHREADING
Fibonacci example • The serial algorithm: Fib(n) (sketched below) • Repeated work: the two recursive calls recompute the same subproblems. • Complexity: the running time grows exponentially in n. • However, the recursive calls are independent! • Parallel algorithm: P-Fib(n) BASICS OF MULTITHREADING
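For concreteness, a direct transcription of the serial algorithm in Python (the function name is illustrative):

    def fib(n):
        # Serial Fib: the two recursive calls recompute the same
        # subproblems, so the running time grows exponentially in n,
        # yet the calls are independent and could run in parallel.
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(10))   # 55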
Concurrency keywords: spawn, sync, and parallel. The serialization of a multithreaded algorithm is the serial algorithm that results from deleting the concurrency keywords. Serialization
It occurs when the keyword spawn precedes a procedure call. It differs from an ordinary procedure call in that the procedure instance that executes the spawn (the parent) may continue to execute in parallel with the spawned subroutine (its child) instead of waiting for the child to complete. NESTED PARALLELISM
It doesn’t say that a procedure must execute concurrently with its spawned children, only that it may! The concurrency keywords express the logical parallelism of the computation. At runtime, it is up to the scheduler to determine which subcomputations actually run concurrently by assigning them to processors. Keyword spawn
A procedure cannot safely use the values returned by its spawned children until after it executes a sync statement. The keyword sync indicates that the procedure must wait until all its spawned children have been completed before proceeding to the statement after the sync. Every procedure executes a sync implicitly before it returns. Keyword sync
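A minimal sketch of P-Fib, emulating spawn with submit() and sync with result() from Python's concurrent.futures. It only spawns at the top level to keep the sketch free of thread-pool pitfalls (the real P-Fib spawns at every level of the recursion), and under CPython the threads interleave rather than truly run in parallel; it illustrates the keywords, not a full concurrency platform.

    from concurrent.futures import ThreadPoolExecutor

    def fib(n):
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    def p_fib(n):
        if n < 2:
            return n
        with ThreadPoolExecutor(max_workers=1) as pool:
            x = pool.submit(fib, n - 1)   # "spawn": the child computes Fib(n-1)
            y = fib(n - 2)                # the parent keeps working meanwhile
            return x.result() + y         # "sync": wait for the spawned child

    print(p_fib(10))   # 55

Deleting the submit/result scaffolding, i.e. the concurrency keywords, yields exactly the serial fib above: its serialization.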
We can see a multithreaded computation as a directed acyclic graph G = (V, E) called a computational dag. The vertices are instructions and the edges represent dependencies between instructions, where (u, v) ∈ E means that instruction u must execute before instruction v. Computational dag
If a chain of instructions contains no parallel control (no spawn, sync, or return), we may group them into a single strand; a strand thus represents one or more instructions. Instructions involving parallel control are not included in strands, but are represented in the structure of the dag. Computational dag
For example, if a strand has two successors, one of them must have been spawned, and a strand with multiple predecessors indicates the predecessors joined because of a sync. Thus, in the general case, the set V forms the set of strands, and the set E of directed edges represents dependencies between strands induced by parallel control. Computational dag
If G has a directed path from strand u to strand v, we say that the two strands are (logically) in series. Otherwise, strands u and v are (logically) in parallel. We can picture a multithreaded computation as a dag of strands embedded in a tree of procedure instances. Example! Computational dag
We can classify the edges: • Continuation edge: connects a strand u to its successor u’ within the same procedure instance. • Call edges: representing normal procedure calls. • Return edges: when a strand u returns to its calling procedure and x is the strand immediately following the next sync in the calling procedure, the dag contains the return edge (u, x). • A computation starts with an initial strand and ends with a single final strand. Computational dag
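A small sketch, assuming a hand-made (hypothetical) dag of strands stored as adjacency lists: two strands are in series exactly when the dag has a directed path between them, and in parallel otherwise.

    # Hypothetical computation dag: edges[u] lists the strands that
    # depend on strand u (a diamond: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3).
    edges = {0: [1, 2], 1: [3], 2: [3], 3: []}

    def in_series(dag, u, v):
        # u and v are (logically) in series if there is a directed
        # path from u to v; otherwise they are (logically) in parallel.
        stack, seen = [u], set()
        while stack:
            w = stack.pop()
            if w == v:
                return True
            if w not in seen:
                seen.add(w)
                stack.extend(dag[w])
        return False

    print(in_series(edges, 0, 3))                            # True: in series
    print(in_series(edges, 1, 2) or in_series(edges, 2, 1))  # False: in parallel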
A parallel computer that consists of a set of processors and a sequentially consistent shared memory. Sequential consistency means that the shared memory behaves as if the multithreaded computation’s instructions were interleaved to produce a linear order that preserves the partial order of the computation dag. IDEAL PARALLEL COMPUTER
Depending on scheduling, the ordering could differ from one run of the program to another. • The ideal-parallel-computer model makes some performance assumptions: • Each processor in the machine has equal computing power. • It ignores the cost of scheduling. IDEAL PARALLEL COMPUTER
Work: • Total time to execute the entire computation on one processor. • Sum of the times taken by each of the strands. • In the computation dag, it is the number of strands (assuming each strand takes a time unit). PERFORMANCE MEASURES
Span: • Longest time to execute the strands along any path in the dag. • The span equals the number of vertices on a longest or critical path (again assuming unit-time strands). • Example! PERFORMANCE MEASURES
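For instance (numbers purely illustrative): if each strand takes unit time and the computation dag has 17 strands, with a longest (critical) path containing 8 of them, then the work is T1 = 17 and the span is T∞ = 8.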
The actual running time of a multithreaded computation depends also on how many processors are available and how the scheduler allocates strands to processors. Running time on P processors: TP. Work: T1. Span: T∞ (unlimited number of processors). PERFORMANCE MEASURES
The work and span provide lower bounds on the running time TP of a multithreaded computation on P processors: • Work law: TP ≥ T1/P • Span law: TP ≥ T∞ PERFORMANCE MEASURES
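Continuing the illustrative example with T1 = 17 and T∞ = 8 on P = 4 processors: the work law gives TP ≥ 17/4 = 4.25 and the span law gives TP ≥ 8, so no schedule can finish in fewer than 8 time steps.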
Speedup: • The speedup of a computation on P processors is the ratio T1/TP. • How many times faster the computation is on P processors than on one processor. • It is at most P. • Linear speedup: T1/TP = θ(P) • Perfect linear speedup: T1/TP = P PERFORMANCE MEASURES
Parallelism: • T1/T∞ • The average amount of work that can be performed in parallel for each step along the critical path. • As an upper bound, the parallelism gives the maximum possible speedup that can be achieved on any number of processors. • The parallelism provides a limit on the possibility of attaining perfect linear speedup. PERFORMANCE MEASURES
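In the same illustrative example, the parallelism is T1/T∞ = 17/8 ≈ 2.1: no matter how many processors are used, the speedup can never exceed about 2.1, and perfect linear speedup is impossible on more than 2 processors.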
Good performance depends on more than minimizing the work and span. The strands must also be scheduled efficiently onto the processors of the parallel machine. The multithreaded programming model provides no way to specify which strands to execute on which processors. Instead, we rely on the concurrency platform’s scheduler. SCHEDULING
A multithreaded scheduler must schedule the computation with no advance knowledge of when strands will be spawned or when they will complete—it must operate on-line. Moreover, a good scheduler operates in a distributed fashion, where the threads implementing the scheduler cooperate to load-balance the computation. SCHEDULING
To keep the analysis simple, we shall consider an on-line centralized scheduler, which knows the global state of the computation at any given time. In particular, we shall consider greedy schedulers, which assign as many strands to processors as possible in each time step. SCHEDULING
If at least P strands are ready to execute during a time step, we say that the step is a complete step, and a greedy scheduler assigns any P of the ready strands to processors. Otherwise, fewer than P strands are ready to execute, in which case we say that the step is an incomplete step, and the scheduler assigns each ready strand to its own processor. SCHEDULING
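A sketch of a greedy scheduler for unit-time strands, assuming a hypothetical dag given as adjacency lists; every time step is either complete (P ready strands run) or incomplete (all ready strands run).

    def greedy_time(edges, n, P):
        # edges[u] lists the strands that depend on strand u; all n
        # strands take one time unit.  Returns the number of time
        # steps a greedy scheduler uses on P processors.
        indeg = [0] * n
        for u in range(n):
            for v in edges.get(u, []):
                indeg[v] += 1
        ready = [u for u in range(n) if indeg[u] == 0]
        steps = done = 0
        while done < n:
            batch, ready = ready[:P], ready[P:]   # complete or incomplete step
            steps += 1
            done += len(batch)
            for u in batch:
                for v in edges.get(u, []):
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        ready.append(v)
        return steps

    # The diamond-shaped dag from the earlier sketch: 4 strands, P = 2.
    print(greedy_time({0: [1, 2], 1: [3], 2: [3]}, 4, 2))   # 3 time steps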
A greedy scheduler executes a multithreaded computation in time TP ≤ T1/P + T∞. Greedy scheduling is provably good because it achieves the sum of the lower bounds as an upper bound. Moreover, it is within a factor of 2 of optimal. SCHEDULING
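With the illustrative numbers T1 = 17, T∞ = 8, and P = 4, the greedy bound gives TP ≤ 17/4 + 8 = 12.25, while the work and span laws give TP ≥ 8; the greedy schedule is therefore within a factor of 2 of the best possible running time.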