Frédéric Gava Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition
Background [Figure: parallel programming approaches ordered from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming; BSML is located on this axis]
Projects • 2002-2004 • ACI Grid • LIFO, LACL, PPS, INRIA • Design of parallel and Grid libraries for OCaml. • 2004-2007 • ACI « Young researchers » • LIFO, LACL • Production of a programming environment in which certified parallel programs can be written and safely executed.
Outline • The BSML language • Multi-programming (superposition) • Implementation of the superposition • Conclusion and future work
The BSML « spirit » • Bugs grow faster than Moore's law. (G. Berry) • High-level language ⇒ fewer lines of code ⇒ fewer bugs • Certified library ⇒ fewer bugs • Small is beautiful. (R. H. Bisseling) • BSML uses only 5 primitives… • Who would drive a non-deterministic car? (G. Berry) • Confluence property of the BSML semantics • French proverb: « All roads lead to Rome », but the best way is to choose the shortest one • One can give BSP costs to BSML programs • Differs from concurrent programming: cost and confluence
The BSP model [Figure: BSP architecture: processor/memory pairs (P/M) connected by a network, with a unit of synchronization] • A BSP architecture is characterized by: • p: number of processors • r: processor speed • L: cost of a global synchronization • g: cost of a communication phase in which each processor sends or receives at most 1 word
Model of execution • Beginning of super-step i • Local computing on each processor (cost $w^i$) • Global (collective) communications between processors (cost $h_i \times g$) • Global synchronization (cost $L$): exchanged data are available for the next super-step i+1 • $\mathrm{Cost}(i) = (\max_{0 \le x < p} w_x^i) + h_i \times g + L$
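The cost formula of a super-step can be checked on small numbers. The following is a minimal OCaml sketch, not part of BSML: the record `bsp_params` and the functions `superstep_cost` and `program_cost` are hypothetical names introduced here, and costs are represented as floats for simplicity.

```ocaml
(* Hypothetical helper types and names, for illustration only. *)
type bsp_params = { p : int; g : float; l : float }

(* Cost(i) = (max over processors of the local work) + h * g + L.
   [w] holds one local-work value per processor; [h] is h_i. *)
let superstep_cost (params : bsp_params) (w : float array) (h : float) : float =
  if Array.length w <> params.p then
    invalid_arg "superstep_cost: expected one work value per processor";
  let w_max = Array.fold_left max neg_infinity w in
  w_max +. h *. params.g +. params.l

(* The cost of a program is the sum of the costs of its super-steps. *)
let program_cost (params : bsp_params) (steps : (float array * float) list) : float =
  List.fold_left (fun acc (w, h) -> acc +. superstep_cost params w h) 0.0 steps
```

With p = 4, g = 2, L = 10, local works 3, 7, 5, 1 and h = 4, the slowest processor dominates the computation phase and the cost is 7 + 4 × 2 + 10.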
Example: broadcast • Direct broadcast (one super-step): BSP cost = p × n × g + L • Broadcast with 2 super-steps: BSP cost = 2 × n × g + 2 × L
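The two broadcast costs can be compared numerically. This is a small sketch under assumptions: the function names are hypothetical, and n is taken to be the size of the broadcast value, which the slide does not state explicitly.

```ocaml
(* Cost formulas copied from the slide; names are illustrative only. *)
let direct_broadcast_cost ~p ~n ~g ~l = float_of_int p *. n *. g +. l

let two_phase_broadcast_cost ~n ~g ~l = 2.0 *. n *. g +. 2.0 *. l

(* The 2 super-step version wins exactly when (p - 2) * n * g > L. *)
let two_phase_is_cheaper ~p ~n ~g ~l =
  two_phase_broadcast_cost ~n ~g ~l < direct_broadcast_cost ~p ~n ~g ~l
```

For large values the extra synchronization is quickly amortized, while for small values on few processors the direct broadcast stays cheaper.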
The BSML language [Figure: ML extended with parallel primitives gives BSML, as the λ-calculus extended with parallel constructions gives the BSλ-calculus] • Structured parallelism as an explicit parallel extension of ML • Functional language with BSP cost predictions • Allows the implementation of skeletons • Implemented as a parallel library for the "Objective Caml" language • Using a parallel data structure called parallel vector
A BSML program [Figure: a program made of a parallel part (vectors with components f0, f1, …, fp-1 and g0, g1, …, gp-1, one per processor), a replicated part and a sequential part]
Parallel primitives of BSML • Asynchronous primitives: • Creation of a vector (creation of local values) mkpar : (int → α) → α par • Parallel pointwise application apply : (α → β) par → α par → β par • Synchronous and communication primitives: • Communications put : (int → α) par → (int → α) par • Projection of local values (to be replicated) proj : α par → (int → α)
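To get a feeling for the four primitives, they can be simulated sequentially. The sketch below is an assumption-laden model, not the BSML library: a parallel vector is represented as an OCaml array with one cell per processor, the machine size is fixed to 4, and no real communication or synchronization takes place.

```ocaml
(* Sequential model only: one array cell per simulated processor. *)
let bsp_p () = 4 (* assumed machine size *)

type 'a par = 'a array

(* mkpar f builds the vector <f 0, ..., f (p-1)>. *)
let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* apply <f0, ...> <v0, ...> = <f0 v0, ...>, without communication. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init (bsp_p ()) (fun i -> fs.(i) vs.(i))

(* put: at processor j, the function applied to i is the message sent to i.
   At processor i, the resulting function applied to j is the message from j. *)
let put (sends : (int -> 'a) par) : (int -> 'a) par =
  let received =
    Array.init (bsp_p ()) (fun i -> Array.init (bsp_p ()) (fun j -> sends.(j) i))
  in
  Array.init (bsp_p ()) (fun i -> fun j -> received.(i).(j))

(* proj turns a vector into a replicated function from pids to values. *)
let proj (v : 'a par) : int -> 'a = fun i -> v.(i)

(* Example: each processor squares its pid, then sends the result to everyone. *)
let pids = mkpar (fun i -> i)
let squares = apply (mkpar (fun _ x -> x * x)) pids
let from_all = put (apply (mkpar (fun _ v -> fun _dst -> v)) squares)
```

In this model `(proj from_all 0) 3` is the value that processor 0 received from processor 3.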
Semantics • Programming model: natural semantics (easy for proofs, in Coq) and small-step semantics (easy for costs) • Execution model: distributed semantics (makes asynchronous steps appear, close to a real implementation)
Natural semantics • Semantics = set of axioms and inference rules • Easy to understand, makes proofs easier • Example: [figure: inference rule]
Small-step semantics • Semantics = set of rewriting rules • Using contexts for the strategy • Easier understanding of costs and errors • Example: [figure: rewriting rules annotated with local costs and the global cost]
Distributed semantics • Semantics = set of parallel rewriting rules • SPMD style: every processor runs the same program (Prog) on its own part of the parallel vector • Distributed evaluation [Figure: the natural view of a parallel vector next to its distributed evaluation, with one copy of Prog per processor]
Prog, as replicated on each processor:
    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                    (fun () -> scan' (mid+1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p-1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in scan' 0 (bsp_p () - 1) op vec
Parallel composition • Several programs on the same machine • Primitive of parallel composition: superposition • Divide-and-conquer BSP algorithms
Parallel superposition • super : (unit → α) → (unit → β) → α × β • super E1 E2 = (E1 (), E2 ()) • Fusion of communications/synchronisations using super-threads • Keeps the BSP model • Pure functional semantics
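The functional meaning of the superposition is simply a pair of results. The sketch below only captures that value-level behaviour in plain OCaml; it is an assumption-based model and deliberately ignores what makes the real primitive interesting, namely the fusion of the communication and synchronization phases of both computations.

```ocaml
(* Value-level model of super: both thunks are run and their results paired.
   The evaluation order is made explicit here; in the real primitive the two
   computations run as super-threads whose super-steps are merged. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let r1 = e1 () in
  let r2 = e2 () in
  (r1, r2)
```

For instance, `super (fun () -> 1 + 1) (fun () -> "b")` evaluates to `(2, "b")`.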
Semantics (1) • Natural semantics : • Small-step semantics: • Solution, the super-threads :
Semantics (2) • Management of the communications : • Management of the superposition :
Semantics-based implementation • The semantics exposes 3 low-level primitives: • Send, to send the data of the communication environment • Rcv, to receive them • Wait, to let a super-thread wait for its sibling • BSML primitives are thus simple calls to them (as in the small-step semantics) • Super-threads can be implemented using threads • A scheduler for these threads is thus needed for the special management of our super-threads • The communication environment is just a hash table with the pids of super-threads as keys
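The communication environment described above can be pictured with OCaml's standard `Hashtbl`. This is a hypothetical sketch, not the actual implementation: the names `env`, `post` and `drain` are invented here, messages are plain strings, and the single fused send is modelled by emptying the table.

```ocaml
(* Hypothetical environment: super-thread pid -> pending outgoing data. *)
type env = (int, string list) Hashtbl.t

let create_env () : env = Hashtbl.create 16

(* A super-thread registers data to be sent at the next communication phase. *)
let post (env : env) (pid : int) (msg : string) : unit =
  let pending = Option.value (Hashtbl.find_opt env pid) ~default:[] in
  Hashtbl.replace env pid (msg :: pending)

(* When every super-thread is waiting, the whole environment is taken at once,
   so that a single communication phase serves all of them. *)
let drain (env : env) : (int * string list) list =
  let all = Hashtbl.fold (fun pid msgs acc -> (pid, List.rev msgs) :: acc) env [] in
  Hashtbl.reset env;
  List.sort compare all
```

Keying the table by super-thread pid keeps the data of each super-thread separate even though they travel in the same super-step.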
Example: prefix computation
scan : (α → α → α) → α par → α par
scan (+) <v0, …, vp-1> = <v0, v0+v1, …, v0+v1+…+vp-1>
Divide-and-conquer, splitting at the middle processor m:
scan (+) <v0, …, vm, …> = <w0, …, wm, …>
scan (+) <…, vm+1, …, vp-1> = <…, wm+1, …, wp-1>
<w0, …, wm, wm+wm+1, …, wm+wp-1> = <v0, v0+v1, …, v0+…+vm, v0+…+vm+1, …, v0+…+vp-1>
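The decomposition above can be replayed sequentially. The following sketch works on a plain OCaml array instead of a parallel vector, so it is a model under that assumption: the two recursive calls that BSML would superpose are simply run one after the other, and the communication of wm becomes a local loop.

```ocaml
(* Sequential divide-and-conquer prefix computation on an array. *)
let scan (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let w = Array.copy v in
  let rec go fst lst =
    if fst < lst then begin
      let mid = (fst + lst) / 2 in
      go fst mid;          (* prefixes of the left half  *)
      go (mid + 1) lst;    (* prefixes of the right half *)
      let wm = w.(mid) in  (* last prefix of the left half *)
      for i = mid + 1 to lst do
        w.(i) <- op wm w.(i) (* combine wm with each right-half prefix *)
      done
    end
  in
  go 0 (Array.length v - 1);
  w
```

Since wm is always placed on the left of `op`, the sketch also works for associative but non-commutative operators such as string concatenation.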
Benchmarks [Figure: execution time (s) against the size of the polynomials, for the direct method (BSML+MPI), the D-a-C method with superposition and the D-a-C method with juxtaposition]
Conclusion • BSML = BSP + ML • Superposition = primitive of parallel composition • Small-step semantics of the superposition • Distributed semantics given in the same small-step style • Superposition implemented using threads, as in the small-step semantics
Future work • Implementation using continuations (source code transformation with the help of a type checker) and proof of equivalence using our semantics • Implementation of bigger algorithms for better benchmarks of BSML and its superposition • Implementation of parallel skeletons (management of tasks) using the superposition? • BSP model-checking of high-level Petri nets (M-nets). The main difficulty: finding a non-trivial algorithm, as the concurrent programming community does. Possible, but needs more theoretical optimisations…
Thanks for your attention