
Functional Approaches to Parallel Programming and Meta-computing (Approches fonctionnelles de la programmation parallèle et des méta-ordinateurs)



  1. Functional Approaches to Parallel Programming and Meta-computing: Semantics, Implementations and Certification

  2. Background. Approaches to parallelism range from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming, parallel programming.

  3. Projects. 2002-2004, ACI Grid (4 partners): design of parallel and grid libraries of primitives for OCaml, with applications to distributed DBMS and numeric computations. 2004-2007, ACI Young Researchers: production of a programming environment in which certified parallel programs can be written, proved and safely executed.

  4. Outline • Introduction • Semantics of BSML and certification • Extensions • New primitives: parallel composition and parallel IO • Library of parallel data structures • Globalized operations • Conclusion and future work

  5. Introduction

  6. The BSP model. [Figure: a BSP architecture, processor/memory pairs (P/M) connected by a network and a synchronization unit.] A BSP machine is characterized by: • p, the number of processors • r, the processor speed • L, the cost of a global synchronization • g, the cost of the communication phase, in which each processor sends or receives at most one word

  7. BSP model of execution. Cost of a superstep: T(s) = (max_{0 ≤ i < p} w_i) + h·g + L, where w_i is the local computation time of processor i and h is the maximum number of words sent or received by a processor during the superstep.
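To make this cost model concrete, here is a minimal OCaml sketch (illustrative only, not part of the BSMLlib API; the function name and parameters are assumptions) that evaluates the cost of one superstep from the BSP parameters:

    (* Cost of one BSP superstep.
       [w] holds the local computation time of each processor,
       [h] is the maximal number of words sent or received by a processor,
       [g] and [l] are the BSP communication and synchronization parameters. *)
    let superstep_cost (w : float array) (h : float) (g : float) (l : float) : float =
      let w_max = Array.fold_left max 0. w in
      w_max +. h *. g +. l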

  8. The BSML language. [Figure: the λ-calculus extended with parallel constructions gives the BSλ-calculus; ML extended with parallel primitives gives BSML.] • Structured parallelism as an explicit parallel extension of ML • Functional language with BSP cost predictions • Allows the implementation of skeletons • Implemented as a parallel library for the "Objective Caml" language • Uses a parallel data structure called the parallel vector

  9. A BSML program. [Figure: a BSML program combines a replicated part (executed identically by every processor), a sequential part, and a parallel part made of parallel vectors <f0, f1, ..., f(p-1)> and <g0, g1, ..., g(p-1)>, whose i-th component is held by processor i.]

  10. Asynchronous primitives. (mkpar f) builds the parallel vector <(f 0), (f 1), ..., (f (p-1))>; apply applied to <f0, f1, ..., f(p-1)> and <v0, v1, ..., v(p-1)> yields <f0 v0, f1 v1, ..., f(p-1) v(p-1)> (pointwise application). Types: • mkpar: (int → α) → α par • apply: (α → β) par → α par → β par
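A small usage sketch of these two primitives in OCaml (assuming the BSMLlib primitives are in scope; the value names are illustrative):

    (* A parallel vector holding each processor's identifier: <0, 1, ..., p-1> *)
    let this = mkpar (fun pid -> pid)

    (* A parallel vector of functions applied pointwise to [this]:
       processor i computes i * i, without any communication. *)
    let squares = apply (mkpar (fun _pid x -> x * x)) this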

  11. Synchronous primitives. With put, each processor provides a function describing the messages it sends to every other processor (None for no message, Some v for a value v) and, after a communication phase and a global synchronization, receives a function describing the messages sent to it. With proj, a parallel vector is turned back into a replicated function. Types: • put: (int → α option) par → (int → α option) par • proj: α option par → (int → α option), such that the resulting function f satisfies (f i) = vi
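As an illustration, a hedged OCaml sketch of a broadcast from processor 0 written with these primitives (the function name bcast_direct is illustrative, not a library function):

    (* Every processor receives processor 0's value.
       Processor 0 sends its value to every destination; the others send nothing. *)
    let bcast_direct (v : 'a par) : 'a par =
      let to_send =
        apply (mkpar (fun src x _dst -> if src = 0 then Some x else None)) v in
      let received = put to_send in   (* one communication phase + synchronization *)
      apply (mkpar (fun _pid recv ->
        match recv 0 with
        | Some x -> x
        | None -> failwith "nothing received from processor 0")) received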

  12. Semantics and certification

  13. Outline. Four related semantics, from the programming model down to the execution model: • Natural semantics: programming model, easy for proofs • Small steps semantics: easy for costs • Distributed semantics: execution model, makes asynchronous steps appear • Abstract machine: close to a real implementation

  14. Mini language. Expressions of our mini language:
  e ::= λx.e | (e e) | …        functional core language
      | (mkpar e) | …           parallel primitives
      | <e, e, …, e>            parallel vector
      | (e)[s]                  substitution
      | (λx.e)[s]               closure

  15. Natural semantics • Semantics = a set of axioms and inference rules • Easy to understand, makes proofs easier • Example: see the sketch below • Confluent
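As an illustration of such a rule (a sketch only; the rules actually used may be stated differently), a plausible natural-semantics rule for mkpar evaluates its argument to a function and builds the parallel vector of that function applied to every processor identifier:

    \frac{e \Downarrow f \qquad (f\;0) \Downarrow v_0 \quad \cdots \quad (f\;(p-1)) \Downarrow v_{p-1}}
         {(\mathsf{mkpar}\;e) \Downarrow \langle v_0, \ldots, v_{p-1} \rangle}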

  16. Small steps semantics • Semantics = a set of rewriting rules • Contexts define the reduction strategy • Easier understanding of costs and errors • Example rules carry local costs and a global cost (see the sketch below) • Confluent (for both costs and values) • Equivalent to the previous semantics
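A sketch of what such a rewriting rule can look like (illustrative only; the rules of the thesis may differ, and the cost annotations are omitted here): when its argument is a value v, mkpar rewrites in one step to the parallel vector of v applied to every processor identifier:

    (\mathsf{mkpar}\;v) \;\rightharpoonup\; \langle (v\;0), \ldots, (v\;(p-1)) \rangle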

  17. Distributed semantics. [Figure: the parallel vector is split into its parts and each processor runs its own copy of the same program (SPMD); the program shown on the slide is a scan written with the superposition primitive:]

    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                    (fun () -> scan' (mid+1) lst op vec)) in
          let com = ... (* send wm to processors m+1, ..., p-1 *) in
          let op' = ... (* applies op to wm and wi, m < i < p *) in
          parfun2 op' com vec'
      in scan' 0 (bsp_p () - 1) op vec

  • Semantics = a set of parallel (small-step) rewriting rules • SPMD style: the parallel vector is distributed and evaluated locally • Confluent • Equivalent to the previous semantics

  18. Abstract machine. [Figure: p copies of the CAM, each executing the same instruction stream (PUSH SWAP PID CONS APP SEND PUSH SWAP APP) and exchanging data through communications.] BSP-CAM = p * CAM + BSP instructions (SPMD style): the PID of the machine for mkpar, a synchronous communication instruction for put • Minimal set of parallel instructions • Equivalence with the distributed semantics

  19. Certification of BSML programs • The Coq proof assistant: • Typed λ-calculus with dependent types • Specification = a term (the goal) • A language of tactics to build a proof of this goal • Extraction of the proof (certified program) • BSML and Coq: • Axiomatization of the semantics of the primitives in Coq • Proofs of BSML programs carried out like usual proofs of ML programs • Certification and extraction of BSML programs: • Broadcast, total exchange … • Prefixes • Sort

  20. Example: replicate. Specification of replicate (proof script): intros T a. exists (mkpar T (fun pid: Z => a)). rewrite mkpar_def. Certified extraction: let replicate a = mkpar (fun pid -> a)

  21. Extensions and parallel data structures

  22. Outline. Extensions of BSML: • Parallel composition: a new primitive, divide-and-conquer, properties (confluent semantics, two equivalent semantics) • Parallel data structures, implemented with the composition: simplify programming, OCaml interfaces, load-balancing • External memory (IO): new primitives, a new cost model, property: confluent semantics

  23. Multiprogramming • Several programs on the same machine • New primitives for parallel composition: • Superposition • Juxtaposition (implemented with the superposition) • Divide-and-conquer BSP algorithms

  24. Parallel superposition • super: (unit → α) → (unit → β) → α × β • super E1 E2 = (E1 (), E2 ()) • Fusion of the communications and synchronizations of the two computations • Preserves the BSP model • Pure functional semantics (a usage sketch is given below)
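A minimal usage sketch (assuming the scan function shown earlier and two parallel vectors vec1 and vec2, whose names are illustrative), showing two independent parallel computations evaluated under superposition:

    (* Both scans are evaluated superposed: their communication phases and
       synchronization barriers are fused instead of being paid twice. *)
    let sums1, sums2 =
      super (fun () -> scan (+) vec1)
            (fun () -> scan (+) vec2)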

  25. Parallel superposition • Confluent • BSP • Equivalence

  26. Example: parallel prefixes. [Plot: execution time (s) versus size of the polynomials, comparing the direct version (BSML+MPI), the superposition version and the juxtaposition version.]

  27. Parallel data structures • Observations: • Data structures are as important as algorithms • Symbolic computations make heavy use of these data structures • A parallel implementation of data structures: • Interfaces as close as possible to the sequential ones • Modular implementation for straightforward maintenance • Load-balancing of the data

  28. Parallel data structures • 5 modules: Set, Map, Stack, Queue, Hashtable • Interfaces: • Same as in OCaml • With some specific parallel functions such as parallel reductions • A parallel data structure = one data structure on each processor • Manual or automatic load-balancing: • To obtain local data structures of similar sizes • Better performance for parallel iterations • A two-superstep algorithm using histograms
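A hypothetical OCaml sketch of what such an interface can look like (the module and function names below are illustrative, not the actual library API; it mirrors OCaml's Set signature and adds a balancing operation):

    (* One local set per processor; operations mirror OCaml's Set module. *)
    module type PAR_SET = sig
      type elt
      type t
      val empty : t
      val add : elt -> t -> t
      val mem : elt -> t -> bool
      (* Reduction over all the local sets, a specific parallel function *)
      val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
      (* Redistribute the elements so that the local sizes are similar *)
      val balance : t -> t
    end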

  29. Example: computation of the n-th nearest-neighbour atoms in a molecule. [Plot: execution time (s) versus number of atoms, sequential version versus parallel version (BSML+PUB).]

  30. Example with load balancing. [Plot: execution time (s) versus number of atoms, with and without balancing.]

  31. External memories. Motivation: [Plot: measured versus predicted execution time (s) as a function of the number of elements.]

  32. The EM-BSP model. [Figure: each processor/memory pair is connected through a bus to D local disks.] We add to the BSP model: • D = the number of disks • B = the size of the blocks • O = the latency of the disks • G = the time to read or write a byte

  33. Shared disks. [Figure: the network also connects the processors to M shared disks.] We further add to the model: • Shared disks • With parameters similar to those of the local disks

  34. External memory in BSML • For safety, two kinds of files: local and global ones • New primitives to manipulate these files (IO primitives) • New semantics • Confluent • EM-BSP cost of the primitives

  35. Modular implementation. [Diagram: the BSMLlib library is organized in layers. Upper level: the primitives, the standard library and the parallel data structures. Middle level: the Comm, Super and IO modules. Lower level: MPI, PUB, TCP/IP, threads.]

  36. Cost prediction. [Plot: execution time (s) versus number of elements, for lists and arrays, compared with the predicted maximum and average costs.]

  37. IO cost prediction. [Plot: execution time (s) versus number of elements: predicted BSML, measured BSML-IO and predicted BSML-IO.]

  38. Globalized operations

  39. Outline. Desynchronizing BSML gives MSPML; combining BSML and MSPML gives DMML. For both: • Semantics • Cost models • Implementations

  40. MSPML • Uses the MPM model (parameters similar to those of BSP) • But with a different execution model, without global synchronization barriers • Same language as BSML (parallel vectors), but with a new communication primitive: mget instead of put

  41. MSPML • Natural semantics: programming model, easy for proofs, similar to BSML • Small steps semantics: easy for costs, similar to BSML • Distributed semantics: execution model, makes asynchronous steps appear, very different from that of BSML

  42. Asynchronous communications. [Figure: three processors, each keeping an environment of communications in which the values it has computed (v, v', v'', ...) are stored with their step number; a bit later, a processor needing a value sends a request and receives it by a point-to-point communication, without global synchronization.]

  43. Asynchronous communications. [Figure: the same mechanism when the requested value is not yet ready: the request is answered only once the producing processor has stored the corresponding value in its environment of communications.]

  44. Departmental meta-computing. [Figure: several clusters, each running BSML, connected through an intranet and coordinated in the MSPML style.]

  45. Departmental Meta-computing ML • BSML + MSPML-like coordination • Two kinds of vectors: • parallel vectors: α par • departmental vectors: α dep • Operational semantics (confluent) • Performance model (the DMM model) • Implementation

  46. Example: departmental prefixes • Computation of the prefixes where each processor contains a value • Naive method: each processor sends its value to all the other processors • Better method: • Each BSP unit computes a parallel prefix • One processor of each BSP unit receives the values of the other units • Each BSP unit finishes its computation with this value

  47. Experiments. [Plot: execution time (s) versus size of the polynomials, comparing the naive algorithm, the BSP algorithm (one cluster) and the better algorithm.]

  48. Conclusion and future work

  49. Conclusion • Semantics of BSML: • Confluent and equivalent semantics • Abstract machine • Proof of BSML programs • Expressivity: • Parallel composition • Parallel data structures • Parallel IO • Meta-computing: • Desynchronization of BSML (MSPML) • Departmental Meta-computing ML (DMML) • Semantics • Cost models • Implementations

  50. Future work in the Propac project. [Diagram: an ML program and its correctness proof in Coq.] • Cost prediction: • Static analysis of the programs • Cost prediction of certified programs • Proofs of BSP imperative programs: extend the IMP language with BSP operations (as BSML extends ML) and extend the logical assertions accordingly
