Eden Parallel Functional Programming with Haskell. Rita Loogen Philipps-Universität Marburg, Germany
Eden Parallel Functional Programming with Haskell
Rita Loogen, Philipps-Universität Marburg, Germany
Joint work with
Yolanda Ortega Mallén, Ricardo Peña, Alberto de la Encina, Mercedes Hidalgo Herrero, Cristóbal Pareja, Fernando Rubio, Lidia Sánchez-Gil, Clara Segura, Pablo Roldan Gomez (Universidad Complutense de Madrid)
Jost Berthold, Silvia Breitinger, Mischa Dieterle, Thomas Horstmeyer, Ulrike Klusik, Oleg Lobachev, Bernhard Pickenbrock, Steffen Priebe, Björn Struckmeier (Philipps-Universität Marburg)
CEFP Budapest 2011
Marburg /Lahn Rita Loogen: Eden – CEFP 2011
Overview
• Lectures I & II (Thursday)
  • Motivation
  • Basic Constructs
  • Case Study: Mergesort
  • Eden TV – The Eden Trace Viewer
  • Reducing communication costs
  • Parallel map implementations
  • Explicit Channel Management
  • The Remote Data Concept
  • Algorithmic Skeletons
  • Nested Workpools
  • Divide and Conquer
• Lecture III: Lab Session (Friday Morning)
• Lecture IV: Implementation
  • Layered Structure
  • Primitive Operations
  • The Eden Module
  • The Trans class
  • The PA monad
  • Process Handling
  • Remote Data
Materials
• Lecture Notes
• Slides
• Example Programs (Case studies)
• Exercises
are provided via the Eden web page www.informatik.uni-marburg.de/~eden
Navigate to CEFP!
Motivation
Our Goal
Parallel programming at a high level of abstraction
• functional language (e.g. Haskell)
• => concise programs
• => high programming efficiency
• inherent parallelism
• automatic parallelisation or annotations
Our Approach
Parallel programming at a high level of abstraction
• functional language (e.g. Haskell)
• => concise programs
• => high programming efficiency
+
• parallelism control
• explicit processes
• implicit communication
• distributed memory
• …
Eden = Haskell + Parallelism
www.informatik.uni-marburg.de/~eden
Basic Constructs
Eden = Haskell + Coordination
Parallel programming at a high level of abstraction
• process definition
    process :: (Trans a, Trans b) => (a -> b) -> Process a b
    gridProcess = process (\ (fromLeft, fromTop) -> let ... in (toRight, toBottom))
• process instantiation
    ( # ) :: (Trans a, Trans b) => Process a b -> a -> b
    (outEast, outSouth) = gridProcess # (inWest, inNorth)
• process outputs are computed by concurrent threads, lists are sent as streams
Derived operators and functions
• Parallel function application: often, process abstraction and instantiation are used in the following combination
    ($#) :: (Trans a, Trans b) => (a -> b) -> a -> b
    f $# x = process f # x        -- ($#) = (#) . process
• Eager process creation: eager creation of a series of processes
    spawn :: (Trans a, Trans b) => [Process a b] -> [a] -> [b]
    spawn = zipWith (#)           -- ignoring demand control
    spawnF :: (Trans a, Trans b) => [a -> b] -> [a] -> [b]
    spawnF = spawn . (map process)
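To make the types of these operators concrete, here is a purely sequential model. It is a sketch under strong assumptions and not the Eden library: `Process` is modelled as a wrapped function, the `Trans` constraints are dropped, and nothing is created remotely or communicated. It only mirrors the reading in which process abstraction is lambda abstraction and process instantiation is application.

```haskell
-- Sequential model of Eden's process operators. This is NOT the Eden
-- library: Process is assumed to be a wrapped function, the Trans
-- constraints are dropped, and nothing is evaluated remotely.
newtype Process a b = Process (a -> b)

-- Process abstraction: wrap a function.
process :: (a -> b) -> Process a b
process = Process

-- Process instantiation: plain application in this model.
( # ) :: Process a b -> a -> b
( # ) (Process f) x = f x

-- Parallel function application: abstraction followed by instantiation.
($#) :: (a -> b) -> a -> b
f $# x = process f # x

-- Instantiate a list of processes on a list of inputs
-- (demand control is ignored, as on the slide).
spawn :: [Process a b] -> [a] -> [b]
spawn = zipWith ( # )

-- Like spawn, but starting from plain functions.
spawnF :: [a -> b] -> [a] -> [b]
spawnF = spawn . map process
```

In this model `spawnF [(+ 1), (* 2)] [10, 20]` simply applies each function to the matching input; in Eden, each application would be evaluated by its own process.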
Evaluating f $# e
• graph of process abstraction (process f): will be evaluated by a new child process on a remote PE
• graph of argument expression e: will be evaluated in the parent process by a new concurrent thread and sent to the child process
• the main process creates the child process; the result of e goes to the child process, the result of f $ e comes back to the main process
Defining process nets
Example: Computing Hamming numbers
    import Control.Parallel.Eden

    hamming :: [Int]
    hamming = 1 : sm ((uncurry sm) $# (map (*2) $# hamming,
                                       map (*3) $# hamming))
                     (map (*5) $# hamming)

    sm :: [Int] -> [Int] -> [Int]
    sm [] ys = ys
    sm xs [] = xs
    sm (x:xs) (y:ys)
      | x < y     = x : sm xs (y:ys)
      | x == y    = x : sm xs ys
      | otherwise = y : sm (x:xs) ys
[Diagram: process net in which hamming is fed through the processes *2, *3 and *5, whose outputs are merged by two sm processes and prefixed with 1]
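The structure of the Hamming network can be checked without an Eden installation. The sketch below assumes a stand-in `($#)` that is ordinary function application, so it shows only the data flow of the net, not any parallelism.

```haskell
-- Sequential stand-in (assumption): ($#) is ordinary application here,
-- whereas in Eden it would create a child process.
($#) :: (a -> b) -> a -> b
f $# x = f x

-- The Hamming numbers, defined by the same recursive network as on the slide.
hamming :: [Int]
hamming = 1 : sm ((uncurry sm) $# (map (* 2) $# hamming, map (* 3) $# hamming))
                 (map (* 5) $# hamming)

-- Merge two ascending lists, dropping duplicates.
sm :: [Int] -> [Int] -> [Int]
sm [] ys = ys
sm xs [] = xs
sm (x : xs) (y : ys)
  | x < y     = x : sm xs (y : ys)
  | x == y    = x : sm xs ys
  | otherwise = y : sm (x : xs) ys
```

Because the list is lazy and starts with a known head, `take 10 hamming` is productive even though `hamming` refers to itself three times.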
Questions about Semantics
• simple denotational semantics
  • process abstraction -> lambda abstraction
  • process instantiation -> application
  • value/result of program, but no information about execution, parallelism degree, speedups/slowdowns
• operational
  • When will a process be created? When will a process instantiation be evaluated?
  • To which degree will process in-/outputs be evaluated? Weak head normal form or normal form or ...?
  • When will process in-/outputs be communicated?
Answers
• When will a process be created? When will a process instantiation be evaluated?
    Lazy evaluation (Haskell): only if and when its result is demanded
    Eden: only if and when its result is demanded
• To which degree will process in-/outputs be evaluated? Weak head normal form or normal form or ...?
    Lazy evaluation (Haskell): WHNF (weak head normal form)
    Eden: normal form
• When will process in-/outputs be communicated?
    Lazy evaluation (Haskell): only if demanded: request and answer messages necessary
    Eden: eager (push) communication: values are communicated as soon as available
Lazy evaluation vs. Parallelism
• Problem: lazy evaluation ==> distributed sequentiality
• Eden's approach:
  • eager process creation with spawn
  • eager communication:
    • normal form evaluation of all process outputs (by independent threads)
    • push communication, i.e. values are communicated as soon as available
  • explicit demand control by sequential strategies (module Control.Seq):
        rnf, rwhnf ... :: Strategy a
        using :: a -> Strategy a -> a
        pseq :: a -> b -> b        (module Control.Parallel)
Case Study: MergeSort
Case Study: MergeSort
Haskell Code:
    mergeSort :: (Ord a, Show a) => [a] -> [a]
    mergeSort []  = []
    mergeSort [x] = [x]
    mergeSort xs  = sortMerge (mergeSort xs1) (mergeSort xs2)
      where [xs1, xs2] = unshuffle 2 xs
[Diagram: unsorted list -> split -> unsorted sublists 1 and 2 -> sorted sublists 1 and 2 -> merge -> sorted list]
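The slide uses `unshuffle` and `sortMerge` without showing them. The following self-contained sketch fills them in under assumptions: `unshuffle n` is taken to deal the list round robin into `n` sublists, `sortMerge` to merge two sorted lists, and `takeEach` is a helper name introduced here. The `Show` constraint is dropped because this sketch never prints elements.

```haskell
-- Assumed definition: distribute a list round robin over n sublists.
unshuffle :: Int -> [a] -> [[a]]
unshuffle n xs = [takeEach n (drop i xs) | i <- [0 .. n - 1]]

-- Helper (name chosen for this sketch): keep every n-th element.
takeEach :: Int -> [a] -> [a]
takeEach _ []       = []
takeEach n (x : xs) = x : takeEach n (drop (n - 1) xs)

-- Assumed definition: merge two sorted lists into one sorted list.
sortMerge :: Ord a => [a] -> [a] -> [a]
sortMerge [] ys = ys
sortMerge xs [] = xs
sortMerge (x : xs) (y : ys)
  | x <= y    = x : sortMerge xs (y : ys)
  | otherwise = y : sortMerge (x : xs) ys

-- Sequential mergesort with the structure shown on the slide.
mergeSort :: Ord a => [a] -> [a]
mergeSort []  = []
mergeSort [x] = [x]
mergeSort xs  = sortMerge (mergeSort xs1) (mergeSort xs2)
  where
    [xs1, xs2] = unshuffle 2 xs
```

With this `unshuffle`, a list of length two or more always splits into two strictly shorter sublists, so the recursion terminates.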
Example: MergeSort parallel
Eden Code (simplest version):
    parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]
    parMergeSort []  = []
    parMergeSort [x] = [x]
    parMergeSort xs  = sortMerge (parMergeSort $# xs1) (parMergeSort $# xs2)
      where [xs1, xs2] = unshuffle 2 xs
[Diagram: unsorted list -> split -> unsorted sublists 1 and 2 -> sorted sublists 1 and 2 -> merge -> sorted list]
Example: MergeSort Process net
Eden Code (simplest version):
    parMergeSort :: (Ord a, Show a, Trans a) => [a] -> [a]
    parMergeSort []  = []
    parMergeSort [x] = [x]
    parMergeSort xs  = sortMerge (parMergeSort $# xs1) (parMergeSort $# xs2)
      where [xs1, xs2] = unshuffle 2 xs
[Diagram: process net with the main process and its child processes, which in turn have child processes]
EdenTV: The Eden Trace Viewer Tool
The Eden System
[Diagram: Eden on top of the parallel runtime system (management of processes and communication), which runs on the parallel system; EdenTV alongside]
Compiling, Running, Analysing Eden Programs
• Set up the environment for Eden on the lab computers by calling
    edenenv
• Compile Eden programs with
    ghc -parmpi --make -O2 -eventlog myprogram.hs
  or
    ghc -parpvm --make -O2 -eventlog myprogram.hs
• If you use PVM, you first have to start it.
• Provide a pvmhosts or mpihosts file.
• Run compiled programs with
    myprogram <parameters> +RTS -ls -N<noPe> -RTS
• View the activity profile (trace file) with
    edentv myprogram_..._-N4_-RTS.parevents
Eden Threads and Processes
• An Eden process comprises several threads (one per output channel).
• Thread State Transition Diagram:
    new thread:     -> runnable
    run thread:     runnable -> running
    suspend thread: running -> runnable
    block thread:   running -> blocked
    deblock thread: blocked -> runnable
    kill thread:    runnable, running or blocked -> finished
EdenTV
• Diagrams: Machines (PEs), Processes, Threads
• Message Overlays: Machines, Processes
• zooming
• message streams
• additional infos
• ...
EdenTV Demo
Case Study: MergeSort continued
Example: Activity profile of parallel mergesort
• Program run, length of input list: 1000
• Observation: SLOWDOWN
  • Seq. runtime: 0.0037 s
  • Par. runtime: 0.9472 s
• Reasons:
  • 1999 processes, mostly blocked
  • 31940 messages
  • delayed process creation
  • process placement
How can we improve our parallel mergesort? Here are some rules of thumb.
• Adapt the total number of processes to the number of available processor elements (PEs), in Eden: noPe :: Int
• Use the eager process creation functions spawn or spawnF.
• By default, Eden places processes round robin on the available PEs. Try to distribute processes evenly over the PEs.
• Avoid element-wise streaming if not necessary, e.g. by putting the list into some "box" or by chunking it into bigger pieces.
THINK PARALLEL!
Parallel Mergesort revisited
[Diagram: unsorted list -> unshuffle (noPe-1) -> unsorted sublists 1, 2, ..., noPe-1, noPe -> one mergesort per sublist -> sorted sublists 1, 2, ..., noPe-1, noPe -> merge many lists -> sorted list]
A Simple Parallelisation of map
    map :: (a -> b) -> [a] -> [b]
    map f xs = [ f x | x <- xs ]

    parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
    parMap f = spawn (repeat (process f))
1 process per list element
[Diagram: inputs x1 x2 x3 x4 ..., one f per element, outputs y1 y2 y3 y4 ...]
Alternative Parallelisation of mergesort - 1st try
Eden Code:
    par_ms :: (Ord a, Show a, Trans a) => [a] -> [a]
    par_ms xs = head $ sms $ parMap mergeSort (unshuffle (noPe-1) xs)

    sms :: (NFData a, Ord a) => [[a]] -> [[a]]
    sms []            = []
    sms xss@[xs]      = xss
    sms (xs1:xs2:xss) = sms (sortMerge xs1 xs2 : sms xss)
• Total number of processes = noPe
• eagerly created processes
• round robin placement leads to 1 process per PE
• but maybe still too many messages
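The merging scheme `sms` can be tried out sequentially. The sketch below is a model under assumptions, not the Eden program: `noPe` is a hypothetical fixed constant (Eden takes it from the runtime), `map sort` stands in for `parMap mergeSort`, and `unshuffle`, `takeEach` and `sortMerge` are assumed definitions as in the usual round-robin split and two-way merge.

```haskell
import Data.List (sort)

-- Hypothetical fixed PE count; in Eden, noPe is provided by the runtime.
noPe :: Int
noPe = 4

-- Assumed definition: distribute a list round robin over n sublists.
unshuffle :: Int -> [a] -> [[a]]
unshuffle n xs = [takeEach n (drop i xs) | i <- [0 .. n - 1]]

takeEach :: Int -> [a] -> [a]
takeEach _ []       = []
takeEach n (x : xs) = x : takeEach n (drop (n - 1) xs)

-- Assumed definition: merge two sorted lists.
sortMerge :: Ord a => [a] -> [a] -> [a]
sortMerge [] ys = ys
sortMerge xs [] = xs
sortMerge (x : xs) (y : ys)
  | x <= y    = x : sortMerge xs (y : ys)
  | otherwise = y : sortMerge (x : xs) ys

-- Merge a list of sorted lists pairwise until at most one list remains.
sms :: Ord a => [[a]] -> [[a]]
sms []                = []
sms xss@[_]           = xss
sms (xs1 : xs2 : xss) = sms (sortMerge xs1 xs2 : sms xss)

-- Sequential model of par_ms: 'map sort' replaces 'parMap mergeSort'.
parMsModel :: Ord a => [a] -> [a]
parMsModel xs = head $ sms $ map sort (unshuffle (noPe - 1) xs)
```

With `noPe = 4` the input is split into three sublists, which corresponds to three child processes plus the main process in the Eden version.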
Resulting Activity Profile (Processes/Machine View)
• Input size 1000
• seq. runtime: 0.0037 s
• par. runtime: 0.0427 s
• 8 PEs, 8 processes, 15 threads
• 2042 messages
• Much better, but still SLOWDOWN
• Reason: indeed too many messages
Previous results for input size 1000: seq. runtime 0.0037 s, par. runtime 0.9472 s
Reducing Communication Costs
Reducing Number of Messages by Chunking Streams
Split a list (stream) into chunks:
    chunk :: Int -> [a] -> [[a]]
    chunk size [] = []
    chunk size xs = ys : chunk size zs
      where (ys, zs) = splitAt size xs
Combine with the parallel map implementation of mergesort:
    par_ms_c :: (Ord a, Show a, Trans a) =>
                Int ->        -- chunk size
                [a] -> [a]
    par_ms_c size xs
      = head $ sms $ map concat $
        parMap ((chunk size) . mergeSort . concat)
               (map (chunk size) (unshuffle (noPe-1) xs))
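A quick way to see what chunking buys is to count list elements before and after. The sketch below repeats `chunk` as on the slide (assuming a positive chunk size, since a size of 0 would never consume input) and checks that `concat` undoes it. The link between the number of chunks and the number of messages is only a rough model: it assumes one message per top-level list element and ignores control messages.

```haskell
-- Split a list into pieces of the given size; the last piece may be shorter.
-- Assumes size > 0.
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk size xs = ys : chunk size zs
  where
    (ys, zs) = splitAt size xs

-- Rough model (assumption): a streamed list costs one message per element,
-- so a chunked stream costs one message per chunk.
streamedItems :: Int -> [a] -> Int
streamedItems size = length . chunk size
```

Under that model, a 1000-element list sent with chunk size 200 is streamed as 5 items instead of 1000, and `concat (chunk 200 xs)` gives back `xs`.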
Resulting Activity Profile (Processes/Machine View)
• Input size 1000, chunk size 200
• seq. runtime: 0.0037 s
• par. runtime: 0.0133 s
• 8 PEs, 8 processes, 15 threads
• 56 messages
• Much better, but still SLOWDOWN
• parallel runtime w/o startup and finish of the parallel system: 0.0125 - 0.009 = 0.0035 s
• => increase input size
Previous results for input size 1000: seq. runtime 0.0037 s, par. runtime I 0.9472 s, par. runtime II 0.0427 s
Activity Profile for Input Size 1,000,000
• Input size 1,000,000
• Chunk size 1000
• seq. runtime: 7.287 s
• par. runtime: 2.795 s
• 8 PEs, 8 processes, 15 threads
• 2044 messages
• speedup of 2.6 on 8 PEs
[Trace phases: unshuffle, map mergesort, merge]
Further improvement
Idea: remove the input list distribution by local sublist selection. Each process receives only an index i and selects its sublist itself:
    par_ms_c :: (Ord a, Show a, Trans a) => Int -> [a] -> [a]
    par_ms_c size xs
      = head $ sms $ map concat $
        parMap ((chunk size) . mergeSort . concat)
               (map (chunk size) (unshuffle (noPe-1) xs))

    par_ms_b :: (Ord a, Show a, Trans a) => Int -> [a] -> [a]
    par_ms_b size xs
      = head $ sms $ map concat $
        parMap (\ i -> chunk size (mergeSort ((unshuffle (noPe-1) xs) !! i)))
               [0..noPe-2]
Corresponding Activity Profiles
• Input size 1,000,000
• Chunk size 1000
• seq. runtime: 7.287 s
• par. runtime: 2.795 s
• new par. runtime: 2.074 s
• 8 PEs, 8 processes, 15 threads
• 1036 messages
• speedup of 3.5 on 8 PEs
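The speedup figures follow directly from the measured runtimes. The small sketch below recomputes them; `speedup` uses the usual definition (sequential runtime divided by parallel runtime), while `efficiency` (speedup divided by the number of PEs) is an additional standard measure that the slides do not report.

```haskell
-- Speedup: sequential runtime divided by parallel runtime.
speedup :: Double -> Double -> Double
speedup tSeq tPar = tSeq / tPar

-- Efficiency (not reported on the slides): speedup per processor element.
efficiency :: Int -> Double -> Double -> Double
efficiency pes tSeq tPar = speedup tSeq tPar / fromIntegral pes

-- Measured runtimes from the slides, input size 1,000,000, 8 PEs.
speedupWithDistribution, speedupLocalSelection :: Double
speedupWithDistribution = speedup 7.287 2.795  -- about 2.6
speedupLocalSelection   = speedup 7.287 2.074  -- about 3.5
```

On 8 PEs these speedups correspond to efficiencies of roughly 0.33 and 0.44, so more than half of the available processing capacity is still unused.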
Parallel map implementations
Parallel map implementations: parMap vs farm
    parMap :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
    parMap f xs = spawn (repeat (process f)) xs

    farm :: (Trans a, Trans b) =>
            ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b]
    farm distribute combine f xs = combine (parMap (map f) (distribute xs))
[Diagram: tasks (T), processes (P) and processor elements (PE) for parMap and for farm]
Process farms
1 process per sub-task list, with static task distribution
    farm :: (Trans a, Trans b) =>
            ([a] -> [[a]]) ->        -- distribute
            ([[b]] -> [b]) ->        -- combine
            (a -> b) -> [a] -> [b]
    farm distribute combine f = combine . (parMap (map f)) . distribute
Choose e.g.
• distribute = unshuffle noPe / combine = shuffle
• distribute = splitIntoN noPe / combine = concat
Both choices give 1 process per PE, with static task distribution.
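The two distribute/combine pairs can be compared in a sequential model. Everything below is a sketch under assumptions: `parMap` is replaced by plain `map`, `unshuffle` is taken to be round-robin distribution with `shuffle` as its inverse, and `splitIntoN` is written here as a block-wise split into `n` pieces of nearly equal length. None of these bodies are taken from the slides.

```haskell
import Data.List (transpose)

-- Sequential stand-in (assumption): no processes are created.
parMap :: (a -> b) -> [a] -> [b]
parMap = map

-- Farm skeleton with the structure shown on the slide.
farm :: ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b]
farm distribute combine f = combine . parMap (map f) . distribute

-- Assumed definition: round-robin distribution over n sublists.
unshuffle :: Int -> [a] -> [[a]]
unshuffle n xs = [takeEach n (drop i xs) | i <- [0 .. n - 1]]

takeEach :: Int -> [a] -> [a]
takeEach _ []       = []
takeEach n (x : xs) = x : takeEach n (drop (n - 1) xs)

-- Assumed definition: inverse of unshuffle.
shuffle :: [[a]] -> [a]
shuffle = concat . transpose

-- Assumed definition: n contiguous blocks of nearly equal length (n > 0).
splitIntoN :: Int -> [a] -> [[a]]
splitIntoN n xs = go sizes xs
  where
    (q, r) = length xs `divMod` n
    sizes  = replicate r (q + 1) ++ replicate (n - r) q
    go [] _        = []
    go (s : ss) ys = let (block, rest) = splitAt s ys in block : go ss rest
```

For example, `unshuffle 3 [1..10]` yields `[[1,4,7,10],[2,5,8],[3,6,9]]`, while `splitIntoN 3 [1..10]` yields `[[1,2,3,4],[5,6,7],[8,9,10]]`. Either way the farm computes the same result as `map f`; the choice only changes which tasks end up in the same sub-task list.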
Example: Functional Program for Mandelbrot Sets
Idea: parallel computation of lines
    image :: Double -> Complex Double -> Complex Double -> Integer -> String
    image threshold ul lr dimx
      = header ++ (concat $ map xy2col lines)
      where
        xy2col :: [Complex Double] -> String
        xy2col line = concatMap (rgb . (iter threshold (0.0 :+ 0.0) 0)) line
        (dimy, lines) = coord ul lr dimx
[Diagram: image region with corners ul and lr and width dimx]
Example: Parallel Functional Program for Mandelbrot Sets
Idea: parallel computation of lines
    image :: Double -> Complex Double -> Complex Double -> Integer -> String
    image threshold ul lr dimx
      = header ++ (concat $ map xy2col lines)
      where
        xy2col :: [Complex Double] -> String
        xy2col line = concatMap (rgb . (iter threshold (0.0 :+ 0.0) 0)) line
        (dimy, lines) = coord ul lr dimx
Replace map by
    farm (unshuffle noPe) shuffle
or
    farmB (splitIntoN noPe) concat
[Diagram: image region with corners ul and lr and width dimx]
Mandelbrot Traces
• Problem size: 2000 x 2000
• Platform: Beowulf cluster, Heriot-Watt University, Edinburgh (32 Intel P4-SMP nodes @ 3 GHz, 512 MB RAM, Fast Ethernet)
• farm (unshuffle noPe) shuffle: round robin static task distribution
• farm (splitIntoN noPe) concat: block-wise static task distribution
Example: Ray Tracing
    rayTrace :: Size -> CamPos -> [Object] -> [Impact]
    rayTrace size cameraPos scene = findImpacts allRays scene
      where allRays = generateRays size cameraPos

    findImpacts :: [Ray] -> [Object] -> [Impact]
    findImpacts rays objs = map (firstImpact objs) rays
[Diagram: camera, 2D image, 3D scene]
Reducing Communication Costs by Chunking
Combine chunking with a parallel map implementation:
    chunkMap :: Int
                -> (([a] -> [b]) -> ([[a]] -> [[b]]))
                -> (a -> b) -> [a] -> [b]
    chunkMap size mapscheme f xs
      = concat (mapscheme (map f) (chunk size xs))
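`chunkMap` is parameterised over the map scheme, so its behaviour can be checked with the ordinary `map` in place of a parallel scheme. This is a sequential sketch: passing `map` as `mapscheme` is an assumption made only to keep the example runnable without Eden, where one would pass `parMap` or a farm instead.

```haskell
-- Split a list into pieces of the given size (assumes size > 0).
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk size xs = ys : chunk size zs
  where
    (ys, zs) = splitAt size xs

-- Apply f to every element, but hand the map scheme whole chunks
-- instead of single elements.
chunkMap :: Int
         -> (([a] -> [b]) -> ([[a]] -> [[b]]))
         -> (a -> b) -> [a] -> [b]
chunkMap size mapscheme f xs = concat (mapscheme (map f) (chunk size xs))

-- Usage with the sequential map as a stand-in map scheme.
example :: [Int]
example = chunkMap 3 map (+ 1) [1 .. 10]
```

The result equals `map (+ 1) [1 .. 10]`; what changes is that the map scheme sees 4 chunks rather than 10 elements, which in Eden reduces the number of streamed items accordingly.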
Raytracer Example: Element-wise Streaming vs Chunking
• Element-wise streaming: input size 250, runtime 6.311 s, 8 PEs, 9 processes, 17 threads, 48 conversations, 125048 messages
• Chunking: input size 250, chunk size 500, runtime 0.235 s, 8 PEs, 9 processes, 17 threads, 48 conversations, 548 messages
Communication vs Parameter Passing
Process inputs
• can be communicated: f $# inp
• can be passed as parameter: (\ () -> f inp) $# ()
  where () is a dummy process input
What happens to the two graphs:
• graph of process abstraction: will be packed (serialised) and sent to the remote PE where the child process is created to evaluate this expression
• graph of input expression: will be evaluated in the parent process by a concurrent thread and then sent to the child process
Farm vs Offline Farm
    farm :: (Trans a, Trans b) =>
            ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b]
    farm distribute combine f xs = combine (parMap (map f) (distribute xs))

    offlineFarm :: (Trans a, Trans b) =>
                   ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b]
    offlineFarm distribute combine f xs
      = combine $ spawn (map (rfi (map f)) (distribute xs)) (repeat ())

    rfi :: Trans b => (a -> b) -> a -> Process () b
    rfi h x = process (\ () -> h x)
[Diagram: tasks (T), processes (P) and processor elements (PE) for the farm and for the offline farm]
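The offline farm moves each sub-task list into the process abstraction and leaves only a dummy `()` as process input. The following sequential model shows that this rearrangement does not change the result. It is a sketch under assumptions, not Eden code: `Process` is a wrapped function, `Trans` constraints are dropped, and no data is sent anywhere.

```haskell
-- Sequential model (assumption): a process is a wrapped function.
newtype Process a b = Process (a -> b)

process :: (a -> b) -> Process a b
process = Process

-- Process instantiation: plain application in this model.
( # ) :: Process a b -> a -> b
( # ) (Process f) x = f x

spawn :: [Process a b] -> [a] -> [b]
spawn = zipWith ( # )

-- "Remote function invocation": the argument x becomes part of the
-- process abstraction; the process input is just the dummy value ().
rfi :: (a -> b) -> a -> Process () b
rfi h x = process (\() -> h x)

-- Offline farm with the structure shown on the slide.
offlineFarm :: ([a] -> [[a]]) -> ([[b]] -> [b]) -> (a -> b) -> [a] -> [b]
offlineFarm distribute combine f xs =
  combine $ spawn (map (rfi (map f)) (distribute xs)) (repeat ())
```

In Eden the difference is in what gets transmitted: the input is no longer evaluated and streamed by the parent but travels, unevaluated, with the serialised process abstraction.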
Raytracer Example: Farm vs Offline Farm
• Farm: input size 250, chunk size 500, runtime 0.235 s, 8 PEs, 9 processes, 17 threads, 48 conversations, 548 messages
• Offline farm: input size 250, chunk size 500, runtime 0.119 s, 8 PEs, 9 processes, 17 threads, 40 conversations, 290 messages