A Strong Programming Model Bridging Distributed and Multi-Core Computing
Denis.Caromel@inria.fr
• Background: INRIA, Univ. Nice, OASIS Team
• Programming: Parallel Programming Models:
• Asynchronous Active Objects, Futures, Typed Groups
• High-Level Abstractions (OO SPMD, Components, Skeletons)
• Optimizing
• Deploying
Denis Caromel
1. Background and Team at INRIA
OASIS Team & INRIA
• INRIA: Computer Science and Control
• 8 centers all over France; workforce: 3,800
• Strong in standardization committees: IETF, W3C, ETSI, …
• Strong industrial partnerships; fosters company foundation: 90 startups so far, including Ilog (Nasdaq, Euronext), …, ActiveEon
• A joint team between INRIA, Nice Univ., and CNRS
• Participation in EU projects: CoreGrid, EchoGrid, Bionets, SOA4ALL, GridCOMP (Scientific Coordinator)
• ProActive 4.0.1: distributed and parallel, from multi-cores to enterprise and science grids
Startup Company Born of INRIA
• Co-developing and providing support for the Open Source ProActive Parallel Suite
• Worldwide customers (EU, Boston USA, etc.)
OASIS Team Composition (35)
• Researchers (5): D. Caromel (UNSA, Det. INRIA), E. Madelaine (INRIA), F. Baude (UNSA), F. Huet (UNSA), L. Henrio (CNRS)
• PostDoc (1): Regis Gascon (INRIA)
• PhDs (11): Antonio Cansado (INRIA, Conicyt), Brian Amedro (SCS-Agos), Cristian Ruz (INRIA, Conicyt), Elton Mathias (INRIA-Cordi), Imen Filali (SCS-Agos / FP7 SOA4All), Marcela Rivera (INRIA, Conicyt), Muhammad Khan (STIC-Asia), Paul Naoumenko (INRIA/Région PACA), Viet Dung Doan (FP6 Bionets), Virginie Contes (SOA4ALL), Guilherme Pezzi (AGOS, CIFRE SCP)
• Engineers (10): Elaine Isnard (AGOS), Fabien Viale (ANR OMD2, Renault), Franca Perrina (AGOS), Germain Sigety (INRIA), Yu Feng (ETSI, FP6 EchoGrid), Bastien Sauvan (ADT Galaxy), Florin-Alexandru Bratu (INRIA CPER), Igor Smirnov (Microsoft), Fabrice Fontenoy (AGOS), open position (Thales)
• Trainees (2): Etienne Vallette d’Osia (Master 2 ISI), Laurent Vanni (Master 2 ISI)
• Assistants (2): Patricia Maleyran (INRIA), Sandra Devauchelle (I3S)
• + Visitors + Interns
ProActive Contributors
ProActive Parallel Suite: Architecture
ProActive Parallel Suite: Physical Infrastructure
ProActive Parallel Suite
2. Programming Models for Parallel & Distributed Computing
ProActive Parallel Suite
Distributed and Parallel Active Objects
ProActive: Active Objects
A ag = newActive("A", [...], VirtualNode);
V v1 = ag.foo(param);
V v2 = ag.bar(param);
...
v1.bar(); // Wait-By-Necessity
An active object runs in its own JVM with its own thread and request queue; callers hold a proxy and future objects for pending results.
Wait-By-Necessity is a dataflow synchronization: the caller blocks only when it actually uses a future's value.
(Figure legend: Java object, active object, request queue, future object, proxy, thread, request.)
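The mechanism can be sketched in plain Java, using `CompletableFuture` as a stand-in for ProActive's transparent futures; the class and method names below are illustrative, not part of the ProActive API:

```java
import java.util.concurrent.CompletableFuture;

public class WaitByNecessityDemo {
    // Stand-in for an active object's asynchronous method: the call
    // returns a future immediately; the body runs on another thread.
    static CompletableFuture<Integer> foo(int param) {
        return CompletableFuture.supplyAsync(() -> param * 2);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> v1 = foo(21); // returns at once, no blocking
        // ... the caller keeps computing here, overlapping with foo() ...
        int value = v1.join(); // wait-by-necessity: block only when the value is used
        System.out.println(value); // 42
    }
}
```

Unlike this sketch, ProActive makes the future itself look like an ordinary object of type V, so the blocking point is implicit in the first method call on the result.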
First-Class Futures Update
Wait-By-Necessity: First-Class Futures
V = b.bar();
...
c.gee(V); // a future can be passed to another active object before it is resolved
Futures are global single-assignment variables.
Wait-By-Necessity: Eager Forward-Based Update
An active object forwarding a future will have to forward its value once it is received.
Wait-By-Necessity: Eager Message-Based Update
An active object receiving a future sends a message, so that the value can be delivered to it directly.
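The first-class nature of futures can be sketched in plain Java: an unresolved future is handed to a consumer before its value exists, and the consumer blocks only when it uses it. Again `CompletableFuture` is an analogue, not the ProActive mechanism:

```java
import java.util.concurrent.CompletableFuture;

public class FirstClassFutureDemo {
    // Analogue of c.gee(V): a consumer that receives a future
    // before the value has been computed.
    static int gee(CompletableFuture<Integer> v) {
        return v.join() + 1; // blocks here only if the value has not arrived yet
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> v = new CompletableFuture<>(); // V = b.bar()
        // The unresolved future is passed along immediately...
        CompletableFuture<Integer> result =
                CompletableFuture.supplyAsync(() -> gee(v));
        v.complete(41); // ...and its value is forwarded once b computes it
        System.out.println(result.join()); // 42
    }
}
```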
Standard System at Runtime: No Sharing
• NoC: Network on Chip
• Proofs of determinism
(2) ASP: Asynchronous Sequential Processes
• ASP confluence and determinacy:
• Future updates can occur at any time
• Execution is characterized by the order of request senders
• Determinacy of programs communicating over trees, …
• A strong guide for implementation, fault tolerance and checkpointing, model checking, …
No Sharing Even for Multi-Cores: Related Talks at PDP 2009
• SS6 Session, today at 16:00:
• “Impact of the Memory Hierarchy on Shared Memory Architectures in Multicore Programming Models”, Rosa M. Badia, Josep M. Perez, Eduard Ayguade, and Jesus Labarta
• “Realities of Multi-Core CPU Chips and Memory Contention”, David P. Barker
Typed Asynchronous Groups
Creating Active Objects and Groups
A ag = newActiveGroup("A", [...], VirtualNode);
V v = ag.foo(param);
...
v.bar(); // Wait-By-Necessity
A typed group can gather Java or active objects. Group, type, and asynchrony are crucial for composition.
Broadcast and Scatter
• Broadcast is the default behavior
• When a group is used as a parameter, whether it is broadcast or scattered depends on ranking:
ag.bar(cg); // broadcast cg
ProActive.setScatterGroup(cg);
ag.bar(cg); // scatter cg
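The two dispatch modes can be sketched in plain Java. This toy model (names are illustrative, not ProActive's) shows what each of three group members would receive under broadcast versus scatter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DispatchDemo {
    // Broadcast: every group member receives the whole parameter group.
    static List<List<Integer>> broadcast(int members, List<Integer> arg) {
        List<List<Integer>> perMember = new ArrayList<>();
        for (int i = 0; i < members; i++) perMember.add(arg);
        return perMember;
    }

    // Scatter: elements are distributed round-robin by member rank.
    static List<List<Integer>> scatter(int members, List<Integer> arg) {
        List<List<Integer>> perMember = new ArrayList<>();
        for (int i = 0; i < members; i++) perMember.add(new ArrayList<>());
        for (int j = 0; j < arg.size(); j++)
            perMember.get(j % members).add(arg.get(j));
        return perMember;
    }

    public static void main(String[] args) {
        List<Integer> cg = Arrays.asList(1, 2, 3);
        System.out.println(broadcast(3, cg)); // [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
        System.out.println(scatter(3, cg));   // [[1], [2], [3]]
    }
}
```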
Static Dispatch Group
ag.bar(cg);
Requests are statically assigned to members in rank order: the fastest member ends up with an empty queue while the slowest is still backlogged.
Dynamic Dispatch Group
ag.bar(cg);
Requests are dispatched on demand: each member fetches the next request when free, balancing load between slow and fast members.
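Dynamic dispatch can be sketched with a shared work queue: a fast and a slow worker pull requests on demand, so the fast worker naturally serves more of them. This is a plain-Java illustration of the scheduling idea, not ProActive's implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class DynamicDispatchDemo {
    // Each member pulls the next request from a shared queue when free,
    // so faster members end up serving more requests than slower ones.
    static Map<String, Integer> dispatch(List<Integer> requests) throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(requests);
        Map<String, Integer> served = new ConcurrentHashMap<>();
        Thread fast = worker("fast", 1, queue, served);   // 1 ms per request
        Thread slow = worker("slow", 50, queue, served);  // 50 ms per request
        fast.join();
        slow.join();
        return served; // request count handled by each member
    }

    static Thread worker(String name, long costMs, BlockingQueue<Integer> q,
                         Map<String, Integer> served) {
        Thread t = new Thread(() -> {
            Integer req;
            while ((req = q.poll()) != null) { // fetch next request when free
                try { Thread.sleep(costMs); } catch (InterruptedException e) { return; }
                served.merge(name, 1, Integer::sum);
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
    }
}
```

With static dispatch the same workload would be split 5/5 regardless of speed; here the split adapts to each member's pace.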
Handling Group Failures
V vg = ag.foo(param);
Group groupV = PAG.getGroup(vg);
el = groupV.getExceptionList();
...
vg.gee();
A failure of one member ends up in the result group's exception list rather than aborting the whole call.
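The idea behind `getExceptionList()` can be sketched in plain Java: invoke an operation for each group member, collect the failures on the side, and let the surviving members' results go through. Names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class GroupFailureDemo {
    // Invoke op on each member's input; collect failures instead of
    // aborting the whole group call (analogue of getExceptionList()).
    static List<Exception> invokeAll(List<Integer> inputs, IntUnaryOperator op,
                                     List<Integer> results) {
        List<Exception> exceptionList = new ArrayList<>();
        for (int in : inputs) {
            try {
                results.add(op.applyAsInt(in));
            } catch (Exception e) {
                exceptionList.add(e); // failed member recorded, others continue
            }
        }
        return exceptionList;
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>();
        List<Exception> el = invokeAll(List.of(4, 0, 2), x -> 100 / x, results);
        System.out.println(results);   // [25, 50] - surviving members' results
        System.out.println(el.size()); // 1 - the division by zero
    }
}
```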
Abstractions for Parallelism: The Right Tool to Execute the Task
Object-Oriented SPMD
OO SPMD
A ag = newSPMDGroup("A", [...], VirtualNode);
// In each member:
myGroup.barrier("2D"); // global barrier
myGroup.barrier("vertical"); // any barrier
myGroup.barrier("north", "south", "east", "west"); // neighbor barrier
Still not based on raw messages, but on typed method calls ==> Components
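A global barrier of this kind can be sketched in plain Java with `java.util.concurrent.CyclicBarrier`: every SPMD member runs the same program, and none enters phase 2 until all have finished phase 1. This is an analogue of the semantics, not ProActive's barrier implementation:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class SpmdBarrierDemo {
    // Returns how much phase-1 work each member could observe after the
    // barrier; equals the member count when the barrier worked.
    static int run(int members) throws Exception {
        AtomicInteger phase1Done = new AtomicInteger();
        AtomicInteger minSeenAtPhase2 = new AtomicInteger(members);
        CyclicBarrier barrier = new CyclicBarrier(members);
        Thread[] ts = new Thread[members];
        for (int i = 0; i < members; i++) {
            ts[i] = new Thread(() -> {
                phase1Done.incrementAndGet(); // phase 1 work
                try {
                    barrier.await(); // analogue of myGroup.barrier("2D")
                } catch (InterruptedException | BrokenBarrierException e) {
                    return;
                }
                // phase 2: every member must see all phase-1 work completed
                minSeenAtPhase2.updateAndGet(m -> Math.min(m, phase1Done.get()));
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return minSeenAtPhase2.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4)); // 4: all members saw phase 1 complete
    }
}
```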
Object-Oriented SPMD: Single Program, Multiple Data
• Motivation: use enterprise technology (Java, Eclipse, etc.) for parallel computing
• Able to express MPI's collective communications in Java: broadcast, reduce, scatter, allscatter, gather, allgather
• Together with barriers and topologies
MPI Communication Primitives
For some (historical) reasons, MPI has many communication primitives:
• MPI_Send — standard send
• MPI_Recv — receive (any source, any tag, …)
• MPI_Ssend — synchronous send
• MPI_Irecv — immediate receive
• MPI_Bsend — buffered send
• MPI_Rsend — ready send
• MPI_Isend — immediate (asynchronous/future) send
• MPI_Ibsend, …
I'd rather put the burden on the implementation, not the programmers! How to do an adaptive implementation in that context?
Not even talking about:
• the combinatorics between send and receive variants
• the semantic problems that occur in distributed implementations
Application Semantics Rather than Low-Level Architecture-Based Optimization
• MPI: MPI_Send, MPI_Recv, MPI_Ssend, MPI_Irecv, MPI_Bsend, MPI_Rsend, MPI_Isend, MPI_Ibsend
• What we propose: high-level information from the application programmer (experimented on 3D electromagnetism and the NASA benchmarks)
• Examples:
• ro.foo(ForgetOnSend(params));
• ActiveObject.exchange(… params);
• Optimizations for both distributed & multi-core
NAS Parallel Benchmarks
• Designed by NASA to evaluate the benefits of high-performance systems
• Strongly based on CFD
• 5 benchmarks (kernels) to test different aspects of a system
• 2 categories of focus: communication intensive and computation intensive
Communication Intensive: CG Kernel (Conjugate Gradient)
• Floating-point operations; eigenvalue computation
• High number of unstructured communications
• 12,000 calls/node; 570 MB sent/node; 1 min 32 s runtime; 65% of wall time in communication
(Figures: data density distribution, message density distribution.)
Communication Intensive: CG Kernel (Conjugate Gradient)
Comparable performance.
Communication Intensive: MG Kernel (Multi Grid)
• Floating-point operations; solving a Poisson problem
• Structured communications
• 600 calls/node; 45 MB sent; 1 min 32 s runtime; 80% of wall time in communication
(Figures: data density distribution, message density distribution.)
Communication Intensive: MG Kernel (Multi Grid)
Problem with high-rate communications; 2D vs. 3D matrix access.
Computation Intensive: EP Kernel (Embarrassingly Parallel)
• Random number generation
• Almost no communication
This is Java!!!
Related Talk at PDP 2009
• T4 Session, today at 14:00: “NPB-MPJ: NAS Parallel Benchmarks Implementation for Message Passing in Java”, Damián A. Mallón, Guillermo L. Taboada, Juan Touriño, Ramón Doallo (Univ. Coruña, Spain)
Parallel Components
GridCOMP Partners
Objects to Distributed Components
• IoC: Inversion of Control (set in XML)
• Example of a component instance: truly distributed components, built from typed groups of Java or active objects across JVMs
GCM Scopes and Objectives
Grid codes that compose and deploy: no programming, no scripting, … no pain.
Innovations:
• Abstract deployment
• Composite components
• Multicast and gathercast interfaces
Optimizing M×N Operations
Two or more composites can be involved in the gather-multicast.
Related Talk at PDP 2009
• T1 Session, yesterday at 11:30: “Towards Hierarchical Management of Autonomic Components: a Case Study”, Marco Aldinucci, Marco Danelutto, and Peter Kilpatrick
Skeleton
Algorithmic Skeletons for Parallelism
• High-level programming model [Cole89]
• Hides the complexity of parallel/distributed programming
• Exploits nestable parallelism patterns
Parallelism patterns — task parallelism: farm, pipe, seq, if, for, while, divide & conquer (d&c(fb, fd, fc)); data parallelism: map, fork.
Example: a BLAST skeleton program built by nesting these patterns.
Algorithmic Skeletons for Parallelism (cont.)
The divide condition of the BLAST divide & conquer skeleton — split while the database file exceeds the maximum size:

public boolean condition(BlastParams param) {
    File file = param.dbFile;
    return file.length() > param.maxDBSize;
}
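The d&c(fb, fd, fc) pattern above can be sketched as a generic skeleton in plain Java; the signature and the array-summing example are illustrative, not the actual skeleton library API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DacSkeleton {
    // d&c(fb, fd, fc): while the condition fb holds, divide with fd,
    // recurse, then combine partial results with fc; otherwise execute
    // the leaf task directly.
    static <T, R> R dac(Predicate<T> condition,       // fb
                        Function<T, List<T>> divide,  // fd
                        Function<T, R> execute,       // leaf task
                        Function<List<R>, R> conquer, // fc
                        T input) {
        if (!condition.test(input)) return execute.apply(input);
        List<R> partial = divide.apply(input).stream()
                .map(sub -> dac(condition, divide, execute, conquer, sub))
                .collect(Collectors.toList());
        return conquer.apply(partial);
    }

    // Example: sum an array, splitting while it has more than 2 elements
    // (like splitting a BLAST database while it exceeds maxDBSize).
    static int sumExample(int[] data) {
        return dac(
            a -> a.length > 2,
            a -> List.of(Arrays.copyOfRange(a, 0, a.length / 2),
                         Arrays.copyOfRange(a, a.length / 2, a.length)),
            a -> Arrays.stream(a).sum(),
            parts -> parts.stream().mapToInt(Integer::intValue).sum(),
            data);
    }

    public static void main(String[] args) {
        System.out.println(sumExample(new int[]{1, 2, 3, 4, 5, 6, 7, 8})); // 36
    }
}
```

In a skeleton framework the recursive branches would be executed in parallel; the sequential recursion here only shows how the four muscle functions compose.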