Functional approaches to parallel programming and meta-computers: semantics, implementations and certification
Background • Parallel programming ranges from implicit to explicit approaches: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming
Projects • 2002-2004: ACI Grid, 4 partners. Design of parallel and grid libraries of primitives for OCaml, with applications to distributed DBMS and numeric computations • 2004-2007: ACI Young researchers. Production of a programming environment in which certified parallel programs can be written, proved and safely executed
Outline • Introduction • Semantics of BSML and certification • Extensions • New primitives: parallel composition and parallel IO • Library of parallel data structures • Globalized operations • Conclusion and future work
The BSP model • BSP architecture: processor/memory pairs (P/M) connected by a network, plus a synchronization unit • Characterized by: • p: number of processors • r: processor speed • L: cost of a global synchronization • g: cost of a communication phase in which each processor sends or receives at most 1 word
BSP model of execution • $T(s) = \left(\max_{0 \le i < p} w_i\right) + h \cdot g + L$
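The superstep cost formula can be turned into a small calculator. The sketch below is only an illustration: the record type `bsp_params` and the function `superstep_cost` are hypothetical names, and it assumes the usual BSP reading in which w_i is the local work of processor i and h is the maximum number of words sent or received by any processor during the superstep.

```ocaml
(* Hypothetical record of BSP machine parameters (names are assumptions). *)
type bsp_params = {
  p : int;    (* number of processors *)
  g : float;  (* cost of a communication phase of at most 1 word per processor *)
  l : float;  (* cost of a global synchronization *)
}

(* Cost of one superstep: max local work + h * g + L.
   w.(i) is the local work of processor i; h is the largest number of
   words sent or received by any processor. *)
let superstep_cost (params : bsp_params) (w : float array) (h : int) : float =
  assert (Array.length w = params.p);
  let w_max = Array.fold_left max 0.0 w in
  w_max +. (float_of_int h *. params.g) +. params.l
```

The cost of a whole program is then the sum of the costs of its supersteps.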
The BSML language • BSML = ML + parallel primitives, based on the BSλ-calculus = λ-calculus + parallel constructions • Structured parallelism as an explicit parallel extension of ML • Functional language with BSP cost predictions • Allows the implementation of skeletons • Implemented as a parallel library for the "Objective Caml" language • Using a parallel data structure called parallel vector
A BSML program • Parallel part: parallel vectors such as <f0, f1, …, fp-1> and <g0, g1, …, gp-1> • Replicated part • Sequential part
Asynchronous primitives • mkpar: (int → α) → α par, with (mkpar f) = <(f 0), (f 1), …, (f (p-1))> • apply: (α → β) par → α par → β par, with apply <f0, f1, …, fp-1> <v0, v1, …, vp-1> = <(f0 v0), (f1 v1), …, (fp-1 vp-1)>
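The behaviour of the two asynchronous primitives can be mimicked sequentially. This is not the BSMLlib implementation: it is a minimal sketch that assumes a parallel vector can be modelled by an OCaml array of length p, with a fixed, arbitrary p = 4. The helper `parfun` is a derived function added for illustration.

```ocaml
(* Sequential model (assumption): an array of length p stands in for a
   parallel vector, index i playing the role of processor i. *)
type 'a par = 'a array

(* Assumed number of processors for this sketch. *)
let bsp_p () = 4

(* mkpar f = <f 0, f 1, ..., f (p-1)> *)
let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* apply <f0, ..., fp-1> <v0, ..., vp-1> = <f0 v0, ..., fp-1 vp-1> *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init (bsp_p ()) (fun i -> fs.(i) vs.(i))

(* Derived helper: apply the same function on every processor. *)
let parfun (f : 'a -> 'b) (vs : 'a par) : 'b par =
  apply (mkpar (fun _ -> f)) vs
```

Neither function needs communication or synchronization: each component is computed from data that is local to one processor.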
Synchronous primitives • put: (int → α option) par → (int → α option) par • proj: α option par → (int → α option), with proj <v0, v1, …, vp-1> = f such that (f i) = vi • [Figure: put on processors 0 to 3; the table of None / Some v messages to send is exchanged so that each processor obtains the values addressed to it]
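The effect of the two synchronous primitives can be sketched with the same kind of sequential model. This is an illustration under assumptions, not the library code: parallel vectors are arrays of length p, p is fixed to 4, and out-of-range processor numbers are answered with `None`.

```ocaml
(* Sequential model (assumption): an array of length p stands in for a
   parallel vector. *)
type 'a par = 'a array

(* Assumed number of processors for this sketch. *)
let bsp_p () = 4

(* put: on processor j, the input function gives the message for each
   destination i; on processor i, the output function gives the message
   received from each source j. *)
let put (send : (int -> 'a option) par) : (int -> 'a option) par =
  let p = bsp_p () in
  (* table.(j).(i) is the message sent by processor j to processor i. *)
  let table = Array.init p (fun j -> Array.init p (fun i -> send.(j) i)) in
  Array.init p (fun i ->
      fun j -> if 0 <= j && j < p then table.(j).(i) else None)

(* proj: turn a parallel vector of options into a replicated function f
   such that f i is the value held by processor i. *)
let proj (v : 'a option par) : int -> 'a option =
  let p = bsp_p () in
  fun i -> if 0 <= i && i < p then v.(i) else None
```

In a real BSP execution both primitives end the current superstep, since they need a communication phase and a global synchronization.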
Outline • From the programming model to the execution model: • Natural semantics: easy for proofs • Small steps semantics: easy for costs • Distributed semantics: makes asynchronous steps appear • Abstract machine: close to a real implementation
Mini language • Expressions of our mini language: e ::= λ.e (functional core language) | (e e) | … | (mkpar e) (parallel primitives) | … | <e, e, …, e> (parallel vector) | (e)[s] (substitution) | λ.e[s] (closure)
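The grammar of the mini language maps naturally onto an OCaml variant type. The sketch below is a guess at one possible encoding: the constructor names are hypothetical, variables are assumed to be de Bruijn indices (the grammar shows binders without names), a substitution is assumed to be a list of expressions, and the elided cases of the grammar are left out.

```ocaml
(* Hypothetical abstract syntax for the mini language. *)
type expr =
  | Var of int               (* variable as a de Bruijn index (assumption) *)
  | Lam of expr              (* abstraction *)
  | App of expr * expr       (* application (e e) *)
  | Mkpar of expr            (* parallel primitive (mkpar e) *)
  | Vector of expr list      (* parallel vector <e, e, ..., e> *)
  | Subst of expr * subst    (* explicit substitution (e)[s] *)
  | Closure of expr * subst  (* closure: abstraction body with its substitution *)

and subst = expr list        (* substitution as a list (assumption) *)

(* Does an expression contain a parallel vector anywhere? *)
let rec has_vector (e : expr) : bool =
  match e with
  | Var _ -> false
  | Lam e1 | Mkpar e1 -> has_vector e1
  | App (e1, e2) -> has_vector e1 || has_vector e2
  | Vector _ -> true
  | Subst (e1, s) | Closure (e1, s) ->
    has_vector e1 || List.exists has_vector s
```

A predicate such as `has_vector` is the kind of syntactic check a semantics can use to separate purely local expressions from parallel ones.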
Natural semantics • Semantics = set of axioms and inference rules • Easy to understand, makes proofs easier • Example: [figure] • Confluent
Small steps semantics • Semantics = set of rewriting rules • Using contexts for the strategy • Easier understanding of costs (local costs, global cost) and errors • Example: [figure] • Confluent (costs and values) • Equivalent to the previous semantics
Distributed semantics • Small steps • Semantics = set of parallel rewriting rules • SPMD style: each processor runs the same program (Prog) on its part of the parallel vector • Distributed evaluation • Confluent • Equivalent to the previous semantics • Program shown on the slide:
    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                    (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in
      scan' 0 (bsp_p () - 1) op vec
Abstract machine • BSP-CAM = p × CAM + BSP instructions (SPMD style) • Every CAM runs the same code, e.g. PUSH SWAP PID CONS APP SEND PUSH SWAP APP • PID: the PID of the machine, for mkpar • SEND: synchronous instruction (communications), for put • Minimal set of parallel instructions • Equivalence with the distributed semantics
Certification of BSML programs • The Coq proof assistant: • Typed λ-calculus with dependent types • Specification = term (goal) • Language of tactics to build a proof of this goal • Extraction of the proof (certified program) • BSML and Coq: • Axiomatization of the primitive semantics in Coq • Proofs of BSML programs are done like the usual proofs of ML programs • Certification and extraction of BSML programs: • Broadcast, total exchange … • Prefixes • Sort
Example: replicate • Specification of replicate, with its proof script: intros T a. exists (mkpar T (fun pid: Z => a)). rewrite mkpar_def. • Certified extraction: let replicate a = mkpar (fun pid -> a)
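The extracted `replicate` can be tried out against a sequential stand-in for `mkpar`. This is a sketch under assumptions: a parallel vector is modelled by an array, and the number of processors is fixed to 4; only the last definition corresponds to the extracted program.

```ocaml
(* Sequential stand-ins (assumptions): an array models a parallel vector
   and the machine is taken to have 4 processors. *)
let bsp_p () = 4
let mkpar f = Array.init (bsp_p ()) f

(* The extracted program: the same value on every processor. *)
let replicate a = mkpar (fun _pid -> a)
```

For instance, `replicate 7` yields a vector whose four components are all 7.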
Outline • Extensions of BSML: • Parallel composition: new primitive, divide-and-conquer, properties (confluent semantics, two equivalent semantics) • Parallel data structures (implemented with BSML): simplify programming, OCaml interfaces, load-balancing • External memory (IO): new primitives, new cost model, property (confluent semantics)
Multiprogramming • Several programs on the same machine • New primitives for parallel composition: • Superposition • Juxtaposition (implemented with the superposition) • Divide-and-conquer BSP algorithms
Parallel superposition • super: (unit → α) → (unit → β) → α * β • super E1 E2 = (E1 (), E2 ()) • Fusion of communications/synchronization • Preserves the BSP model • Pure functional semantics
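The pure functional reading of the superposition is easy to write down. The sketch below only captures the value computed: it evaluates the two computations one after the other, whereas the point of the real primitive is that their communications and synchronizations are fused. The function `sum_range` is a hypothetical divide-and-conquer example added for illustration.

```ocaml
(* Functional meaning of super: the pair of both results.
   Sequential stand-in: no fusion of supersteps happens here. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let a = e1 () in
  let b = e2 () in
  (a, b)

(* Hypothetical divide-and-conquer use: sum of a.(lo) .. a.(hi). *)
let rec sum_range (a : int array) (lo : int) (hi : int) : int =
  if lo > hi then 0
  else if lo = hi then a.(lo)
  else
    let mid = (lo + hi) / 2 in
    let left, right =
      super
        (fun () -> sum_range a lo mid)
        (fun () -> sum_range a (mid + 1) hi)
    in
    left + right
```

The two arguments are thunks (functions taking unit) so that the primitive decides when each computation starts.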
Parallel superposition • Confluent • BSP • Equivalence
Example: parallel prefixes • [Figure: time (s) versus size of the polynomials, for the direct version (BSML+MPI), the superposition version and the juxtaposition version]
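The divide-and-conquer structure of a prefix computation can be sketched sequentially. This is an illustration, not the measured BSML code: an array stands in for the parallel vector (index i playing the role of processor i), the two recursive calls run one after the other instead of being superposed, and the combination step is a plain loop instead of a communication.

```ocaml
(* Sequential sketch of a divide-and-conquer inclusive prefix.
   op is assumed to be associative. *)
let scan (op : 'a -> 'a -> 'a) (vec : 'a array) : 'a array =
  let res = Array.copy vec in
  let rec scan' fst lst =
    if fst < lst then begin
      let mid = (fst + lst) / 2 in
      (* Solve both halves independently. *)
      scan' fst mid;
      scan' (mid + 1) lst;
      (* The last value of the left half is combined with every value
         of the right half. *)
      let wm = res.(mid) in
      for i = mid + 1 to lst do
        res.(i) <- op wm res.(i)
      done
    end
  in
  if Array.length res > 0 then scan' 0 (Array.length res - 1);
  res
```

With addition, `scan ( + ) [| 1; 2; 3; 4 |]` gives the running sums of the input.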
Parallel data structures • Observations: • Data structures are as important as algorithms • Symbolic computations use these data structures massively • A parallel implementation of data structures: • Interfaces as close as possible to the sequential ones • Modular implementation for straightforward maintenance • Load-balancing of the data
Parallel data structures • 5 modules: Set, Map, Stack, Queue, Hashtable • Interfaces: • Same as in OCaml • With some specific parallel functions such as parallel reductions • A parallel data structure = one data structure on each processor • Manual or automatic load-balancing: • To get similar sizes of the local data structures • Better performance for parallel iterations • A two-superstep algorithm using histograms
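The idea of "one data structure on each processor" can be illustrated with a sequential model of a parallel set. Everything below is a sketch under assumptions: the type `pset` and the functions `reduce`, `cardinal` and `balance` are hypothetical names, the local sets are assumed to be disjoint, and `balance` is a naive gather-and-redistribute stand-in, not the two-superstep histogram algorithm mentioned above.

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Hypothetical representation: one local set per (simulated) processor. *)
type pset = IntSet.t array

(* Parallel reduction: fold every local set, then combine the partial
   results. op is assumed associative, with neutral as its unit. *)
let reduce (op : int -> int -> int) (neutral : int) (s : pset) : int =
  Array.fold_left
    (fun acc local -> op acc (IntSet.fold op local neutral))
    neutral s

(* Global size = sum of the local sizes (local sets assumed disjoint). *)
let cardinal (s : pset) : int =
  Array.fold_left (fun acc local -> acc + IntSet.cardinal local) 0 s

(* Naive rebalancing: gather all elements, then deal them out
   round-robin so that local sizes differ by at most one. *)
let balance (s : pset) : pset =
  let p = Array.length s in
  let all = Array.fold_left IntSet.union IntSet.empty s in
  let res = Array.make p IntSet.empty in
  List.iteri
    (fun k x -> res.(k mod p) <- IntSet.add x res.(k mod p))
    (IntSet.elements all);
  res
```

Balanced local sizes matter because a parallel iteration takes as long as its slowest processor, which is the one holding the largest local structure.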
Example • Computation of the “nth” nearest neighbor atoms in a molecule • [Figure: time (s) versus number of atoms, for the sequential version and the parallel version (BSML+PUB)]
Example with load balancing • [Figure: time (s) versus number of atoms, without balancing and with balancing]
External memories • Motivations: [Figure: measured and predicted time (s) versus number of elements]
The EM-BSP model • [Figure: each processor/memory pair (P/M) has a bus to its local disks, Disk 1 to Disk D; the P/M pairs are connected by a network] • We add to the BSP model: • D = the number of disks • B = the size of the blocks • O = latency of the disks • G = time to read/write a byte
Shared disks • [Figure: shared disks, Disk 1 to Disk M, and processor/memory pairs (P/M) connected by a network] • We add to the BSP model: • Shared disks • With parameters similar to those of the local disks
External memory in BSML • For safety, two kinds of files: local and global ones • New primitives to manipulate these files (IO primitives) • New semantics • Confluent • EM-BSP cost of the primitives
Modular implementation • BSMLlib: primitives, std library, parallel data structures • Modules: Comm, Super, IO • Lower level: MPI, PUB, TCP/IP, threads
Cost prediction • [Figure: time (s) versus number of elements, for lists and arrays, with predicted (max) and predicted (avg) curves]
IO cost prediction • [Figure: time (s) versus number of elements: predicted BSML, measured BSML-IO, predicted BSML-IO]
Outline • Desynchronize BSML: MSPML • BSML + MSPML: DMML • Semantics • Cost models • Implementations
MSPML • Using the MPM model (parameters similar to those of BSP) • But with a different execution model • Same language as BSML (parallel vector) but with a new communication primitive: mget instead of put
MSPML • From the programming model to the execution model: • Natural semantics (easy for proofs): similar to BSML • Small steps semantics (easy for costs): similar to BSML • Distributed semantics (makes asynchronous steps appear): very different
Asynchronous communications • [Figure: processors 0, 1 and 2 with their environments of communications, holding entries 0,v / 0,v’ / 0,v’’ or empty. Labels: local computation, get v, request, communication, a bit later]
Asynchronous communications • [Figure: processors 0, 1 and 2 with their environments of communications, holding entries 0,v / 0,v’ / 0,v’’ / 1,w / 1,w’ / 2,w’’ or empty. Labels: request, not ready]
Departmental meta-computing • [Figure: several BSML units connected by an intranet, with MSPML between them]
Departmental Meta-computing ML • BSML + MSPML-like for coordination • Two kinds of vectors: • parallel vectors: α par • departmental vectors: α dep • Operational semantics (confluent) • Performance model (the DMM model) • Implementation
Example: departmental prefixes • Computation of the prefixes where each processor contains a value • Naive method: each processor sends its value to other processors • Better method: • Each BSP unit computes a parallel prefix • One processor of each BSP unit receives values of other units • Each BSP unit finishes its computation with this value
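The three steps of the better method can be sketched sequentially. This is an illustration under assumptions, not DMML code: an array of arrays stands in for a departmental vector of BSP units, a sequential loop stands in for the parallel prefix inside a unit, and the function names `prefix` and `dep_prefix` are hypothetical.

```ocaml
(* Inclusive prefix inside one BSP unit (sequential stand-in for a
   parallel prefix). op is assumed to be associative. *)
let prefix (op : 'a -> 'a -> 'a) (values : 'a array) : 'a array =
  let res = Array.copy values in
  for i = 1 to Array.length res - 1 do
    res.(i) <- op res.(i - 1) res.(i)
  done;
  res

(* Departmental prefix over several units. *)
let dep_prefix (op : 'a -> 'a -> 'a) (units : 'a array array) : 'a array array =
  (* Step 1: each unit computes its own prefix. *)
  let res = Array.map (prefix op) units in
  (* Steps 2 and 3: the last value of the preceding units is passed on,
     and each unit finishes its computation with it. *)
  let offset = ref None in
  for u = 0 to Array.length res - 1 do
    (match !offset with
     | None -> ()
     | Some o -> res.(u) <- Array.map (fun x -> op o x) res.(u));
    let n = Array.length res.(u) in
    if n > 0 then offset := Some res.(u).(n - 1)
  done;
  res
```

Only one value per unit crosses the boundary between units, compared with one value per processor in the naive method.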
Experiments • [Figure: time (s) versus size of the polynomials, for the better algorithm, the BSP algorithm (one cluster) and the naive algorithm]
Conclusion • Semantics of BSML: • Confluent and equivalent semantics • Abstract machine • Proof of BSML programs • Expressivity: • Parallel composition • Parallel data structures • Parallel IO • Meta-computing: • Desynchronization of BSML (MSPML) • Departmental Meta-computing ML (DMML) • Semantics • Cost models • Implementations
Future work in the Propac project • Cost prediction: • Static analysis of the programs • Cost prediction of certified programs • Proofs of BSP imperative programs: • IMP: extension with BSP operations • Extension of the logical assertions • [Figure labels: ML, BSML, program correction, Coq]