Bulk-Synchronous Parallel ML: Semantics and Implementation of the Parallel Juxtaposition
Frédéric Gava
Background
[Diagram: approaches to parallel programming arranged from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming. BSML is placed on this spectrum.]
Projects
• 2002-2004: ACI Grid (LIFO, LACL, PPS, INRIA)
  Design of parallel and Grid libraries for OCaml.
• 2004-2007: ACI "Young researchers" (LIFO, LACL)
  Production of a programming environment in which certified parallel programs can be written and safely executed.
Outline
• The BSML language
• Parallel compositions
• Superposition: types and semantics
• Juxtaposition: types and semantics
• Implementation of the juxtaposition
• Conclusion and future work
The BSP model
[Diagram: BSP architecture. Processor/memory pairs (P/M) connected by a network, with a unit of synchronization.]
A BSP architecture is characterized by:
• p: number of processors
• r: processor speed
• L: time for a global synchronization
• g: time for a communication phase in which each processor sends or receives at most 1 word
Model of execution
$$T(s) = \max_{0 \le i < p} w_i + h \cdot g + L$$
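As a worked illustration of the cost formula, here is a minimal OCaml sketch. It assumes the standard BSP reading of the symbols, which the slide itself does not spell out: w_i is the local computation time of processor i during superstep s, and h is the largest number of words sent or received by any processor. The record type `bsp` and the function `superstep_cost` are hypothetical names chosen for this example.

```ocaml
(* BSP machine parameters. Field names are illustrative, not from BSML. *)
type bsp = { p : int; g : float; l : float }

(* Cost of one superstep: max_i w_i + h * g + L.
   [w] holds the local work of each processor (assumed non-negative),
   [h] is the assumed maximum number of words sent or received. *)
let superstep_cost (m : bsp) (w : float array) (h : int) : float =
  if Array.length w <> m.p then invalid_arg "superstep_cost: one w_i per processor";
  let wmax = Array.fold_left max 0.0 w in
  wmax +. float_of_int h *. m.g +. m.l

let example_cost =
  superstep_cost { p = 4; g = 2.0; l = 100.0 } [| 10.0; 30.0; 20.0; 5.0 |] 8
```

With these made-up numbers the slowest processor contributes 30, communication contributes 8 × 2 = 16, and the barrier contributes 100.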
Example: broadcast
• Direct broadcast: cost = $p \cdot n \cdot g + L$
• Broadcast with 2 phases: cost = $2 \cdot n \cdot g + 2L$
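The two broadcast costs can be compared numerically. The sketch below assumes that n is the size of the broadcast value in words (the slide does not define n); the function names are hypothetical.

```ocaml
(* Cost of the direct broadcast, as given on the slide: p*n*g + L. *)
let direct_cost ~p ~n ~g ~l = float_of_int (p * n) *. g +. l

(* Cost of the two-phase broadcast, as given on the slide: 2*n*g + 2*L. *)
let two_phase_cost ~n ~g ~l = 2.0 *. float_of_int n *. g +. 2.0 *. l

(* The two-phase version wins exactly when L < (p - 2) * n * g. *)
let prefer_two_phase ~p ~n ~g ~l =
  two_phase_cost ~n ~g ~l < direct_cost ~p ~n ~g ~l
```

For large messages the extra barrier of the two-phase version is amortized; for small messages on a machine with an expensive barrier, the direct version is cheaper.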
The BSML language
[Diagram relating ML and parallel primitives to BSML, and the λ-calculus and parallel constructions to the BSλ-calculus.]
• Structured parallelism as an explicit parallel extension of ML
• Functional language with BSP cost predictions
• Allows the implementation of skeletons
• Implemented as a parallel library for the "Objective Caml" language
• Uses a parallel data structure called the parallel vector
A BSML program
[Diagram: a BSML program with a sequential part, a replicated part, and a parallel part holding per-processor values f0, f1, …, fp-1 and g0, g1, …, gp-1.]
Parallel primitives of BSML
• Asynchronous primitives:
  • Creation of a vector: mkpar : (int → α) → α par
  • Parallel pointwise application: apply : (α → β) par → α par → β par
• Synchronous and communication primitives:
  • Communications: put : (int → α option) par → (int → α option) par
  • Projection of values: proj : α option par → (int → α option)
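To make the four signatures concrete, here is a purely sequential model in which a parallel vector is an OCaml array with one cell per processor. This is a sketch for reading the types, not the BSML library: the fixed machine size `p = 4` and the array representation are assumptions, and no actual communication or synchronization takes place.

```ocaml
(* Sequential model of parallel vectors. NOT the real BSML library. *)
let p = 4                         (* assumed number of processors *)
type 'a par = 'a array            (* cell i models the value held by processor i *)

(* mkpar f builds the vector <f 0, ..., f (p-1)>. *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* apply <f0, ..., fp-1> <v0, ..., vp-1> = <f0 v0, ..., fp-1 vp-1>. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* put: processor src sends (fs.(src) dst) to processor dst.
   In the result, the function at dst maps src to the value received from it. *)
let put (fs : (int -> 'a option) par) : (int -> 'a option) par =
  Array.init p (fun dst ->
    fun src -> if 0 <= src && src < p then fs.(src) dst else None)

(* proj turns a vector of optional values into a function from processor ids. *)
let proj (v : 'a option par) : int -> 'a option =
  fun i -> if 0 <= i && i < p then v.(i) else None

(* Usage: every processor sends its id to its right neighbour (cyclically). *)
let shifted =
  put (mkpar (fun src -> fun dst -> if dst = (src + 1) mod p then Some src else None))
```

In this model `shifted.(1) 0` is `Some 0`, since processor 1 receives the id of processor 0.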
Semantics
• Programming model:
  • Natural semantics: easy for proofs (Coq)
  • Small-step semantics: easy for costs
• Execution model:
  • Distributed semantics: makes asynchronous steps appear; close to a real implementation
Multi-programming
• Several programs on the same machine
• New primitives of parallel composition:
  • Superposition
  • Juxtaposition (implemented with the superposition)
• Divide-and-conquer BSP algorithms
Parallel superposition
• super : (unit → α) → (unit → β) → α * β
• super E1 E2 evaluates to (E1 (), E2 ())
• Fusion of communications/synchronisations using super-threads
• Keeps the BSP model
• Pure functional semantics
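The functional reading of the superposition is just pairing, which the following sketch makes explicit. It captures only the result of `super`, under the assumption of sequential evaluation: the fusion of communications and synchronisations through super-threads, which is the point of the real primitive, is not modelled here.

```ocaml
(* Sequential model of the superposition: only the resulting pair is modelled,
   not the interleaving of the two computations in super-threads. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let a = e1 () in
  let b = e2 () in
  (a, b)

let example = super (fun () -> 1 + 2) (fun () -> "right")
```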
Parallel juxtaposition
• juxta : int → (unit → α par) → (unit → α par) → α par
• juxta m E1 E2: E1 yields <v0, …, vi, …, vm-1> on the first m processors, E2 yields <v'0, …, v'j, …, v'p-1-m> on the remaining p-m processors, and the result is <v0, …, vm-1, v'0, …, v'p-1-m>
• Fusion of communications/synchronisations on each sub-machine
• Keeps the BSP model
• Side effect on the number of processors
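The side effect on the number of processors can be illustrated with a sequential model: a mutable reference holds the size of the current sub-machine, and `juxta` temporarily changes it while each argument is evaluated. This is a sketch under assumptions, not the BSML implementation: vectors are plain arrays, the machine size 4 and the name `nprocs` are hypothetical, and rejecting m ≤ 0 or m ≥ p is a choice made for this example. `Fun.protect` requires OCaml 4.08 or later.

```ocaml
(* Sequential model of juxtaposition. NOT the real BSML implementation. *)
let nprocs = ref 4                      (* size of the current (sub-)machine; assumed *)
let bsp_p () = !nprocs

type 'a par = 'a array
let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* juxta m e1 e2 evaluates e1 on a sub-machine of m processors and e2 on a
   sub-machine of p - m processors, then concatenates the two vectors. *)
let juxta (m : int) (e1 : unit -> 'a par) (e2 : unit -> 'a par) : 'a par =
  let p = bsp_p () in
  if m <= 0 || m >= p then invalid_arg "juxta: need 0 < m < p";
  let on_submachine size e =
    nprocs := size;
    (* restore the enclosing machine size even if e raises *)
    Fun.protect ~finally:(fun () -> nprocs := p) e
  in
  let left = on_submachine m e1 in
  let right = on_submachine (p - m) e2 in
  Array.append left right

(* Inside each argument, processor ids are relative to the sub-machine. *)
let example =
  juxta 1 (fun () -> mkpar (fun i -> i)) (fun () -> mkpar (fun i -> 10 + i))
```

Each argument sees its own virtual processor 0, which is why a real implementation also needs to remember the real PID of that processor.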
Parallel juxtaposition
[Diagram: E3 = (juxta 3 E1 E2). The communication and synchronisation phases of E1 and E2 are shown alongside those of E3.]
Distributed semantics
[Diagram: the same program (Prog) is replicated on every processor, each holding its part of the parallel vector. The replicated program is:]

  let scan op vec =
    let rec scan' fst lst op vec =
      if fst >= lst then vec
      else
        let mid = (fst + lst) / 2 in
        let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                  (fun () -> scan' (mid+1) lst op vec)) in
        let com = ... (* send wm to processes m+1…p+1 *)
        let op' = ... (* applies op to wm and wi, m<i<p *) in
        parfun2 op' com vec'
    in scan' 0 (bsp_p () - 1) op vec

• Semantics = set of parallel rewriting rules
• SPMD style: the parallel vector of the natural semantics becomes a distributed evaluation
• Confluent
• Equivalent
Use of the superposition
• Two references hold the number of processors of a sub-machine and the real PID of the virtual processor 0 of that sub-machine
• Creation of incomplete vectors
• Each sub-machine runs in a super-thread
Example: parallel prefixes
• scan : (α → α → α) → α par → α par
• scan (+) <v0, …, vp-1> = <v0, v0+v1, …, v0+v1+…+vp-1>
[Diagram: op applied to the values held by the processors.]
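The divide-and-conquer structure of the parallel prefix can be checked with a sequential sketch over arrays. This is an illustration of the recursion only, with an array standing in for the parallel vector; it is not the BSML code, and the helper name `go` is hypothetical. Each half is scanned recursively, then the last value of the left half is combined into every element of the right half.

```ocaml
(* Sequential divide-and-conquer prefix computation on an array that
   models the parallel vector <v0, ..., vp-1>. NOT the BSML version. *)
let scan (op : 'a -> 'a -> 'a) (vec : 'a array) : 'a array =
  let v = Array.copy vec in
  let rec go fst lst =
    if fst < lst then begin
      let mid = (fst + lst) / 2 in
      go fst mid;                 (* prefixes of the left half *)
      go (mid + 1) lst;           (* prefixes of the right half *)
      let w = v.(mid) in          (* total of the left half *)
      for i = mid + 1 to lst do
        v.(i) <- op w v.(i)       (* left total goes on the left of op *)
      done
    end
  in
  go 0 (Array.length v - 1);
  v

let sums = scan ( + ) [| 1; 2; 3; 4 |]
```

Because the left total is always passed as the first argument of `op`, the sketch also works for associative operators that are not commutative, such as string concatenation.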
Juxta versus Super
• Code of a direct method: 12 lines
• Code with superposition: 8 lines
• Code with juxtaposition: 6 lines
Performance
[Plot: time (s) against the size of the polynomials, for the direct method (BSML+MPI), the divide-and-conquer method with superposition, and the divide-and-conquer method with juxtaposition.]
Conclusion
• BSML = BSP + ML
• Superposition = primitive of parallel composition
• Juxtaposition is easier for divide-and-conquer algorithms
• Distributed semantics of the juxtaposition
• Juxtaposition implemented using superposition
• Similar performance
Future work
• Proofs of the implementation using the semantics
• Implementation of bigger algorithms
• BSP model-checking of high-level Petri nets (M-nets)
Thanks for your attention