Abstractions: Programming and Deploying Applications on Grids
Franck Cappello, INRIA* (*this is my own opinion!)
CCGRID’08 - Panel
Grid’5000: a fully reconfigurable and controllable environment (resource “dedication”) spanning every layer of the stack: application, programming environments, application runtime, measurement tools, experimental-conditions injector, Grid or P2P middleware, operating system, and networking.
• More than 400 experiments in total
• More than 100 experiments on applications
What are the main distributed applications in your project?
Application domains:
• Life science (mammogram comparison, protein sequencing, gene prediction, virtual screening, conformation sampling, etc.)
• Physics (seismic imaging, parallel solvers, hydrogeology, self-propelled solids, seismic tomography, etc.)
• Applied mathematics (sparse matrix computation, combinatorial optimization, parallel model checkers, PDE solvers, etc.)
• Chemistry (molecular simulation, estimation of thin-film thickness)
• Industrial processes
• Financial computing
Main usage of Grid’5000 for these applications:
• Evaluating the performance of applications ported to the Grid
• Testing alternatives
• Designing new algorithms and new methods
What are the programming difficulties and abstraction opportunities?
• Organizing the computation
• Tolerating performance variations and hardware and software failures
• Scheduling computation and communications
• Implementing computing codes
• Synchronizing task executions
• Implementing global operations
• Selecting the communication protocols
• Dealing with resources (data, computers, etc.)
• Dealing with administration domains
Current infrastructures: how they mask complexity
• Solution 1) ask the “user” to conform to a given abstraction of the execution platform --> applications are developed against standard interfaces (HPC centers, most deployed Grids)
• Solution 2) ask the execution platform to conform to the “users’” abstractions --> users keep their applications and environment unchanged, and the platform is reconfigured for them (Grid’5000, Amazon Elastic Compute Cloud)
• Solution 3) ask the user to choose from a variety of predefined execution environments
What are the common patterns in programming?
Programming models tested on Grid’5000:
• Rule reduction (e.g. chemical computing) --> soon
• Graph of components (data & workflow) --> OpenWP
• Specific control graph driven by data (e.g. divide & conquer, branch & bound) --> ProActive, ParadisEO
• Components (code coupling) --> Grid CORBA component model
• Components with control graphs (workflow) --> DAGMan & Condor
• Global operations (MapReduce) --> none we are aware of
• SPMD (MPI for Grids) --> QcGOpenMPI, MPICH-G2, etc.
• Client-server (Grid-RPC) --> DIET, XtremWeb, etc.
• Assembly languages (sets of scripts) …
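Global operations (MapReduce) are the one model above for which no Grid’5000 implementation is reported. To make the pattern concrete, here is a minimal sketch, assuming a local thread pool stands in for grid workers; the function names (`map_words`, `shuffle`, `reduce_counts`, `map_reduce`) and the word-count job are hypothetical illustrations, not any grid middleware API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map phase: emit a (word, 1) pair for every word in one input chunk."""
    return [(word, 1) for word in chunk.split()]


def shuffle(mapped):
    """Group intermediate values by key, as the runtime does between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(chunks, workers=4):
    # Assumption: a thread pool plays the role of remote grid workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_counts(*kv), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    print(map_reduce(["grid cloud grid", "cloud p2p", "grid"]))
```

The point of the pattern is that the user writes only the map and reduce functions; distribution, grouping, and the global combination step belong to the runtime.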
Example 1: Combinatorial Optimization Problems
• Flow-shop (one of the hardest challenge problems in combinatorial optimization):
  • Schedule a set of jobs on a set of machines so as to minimize the makespan.
  • Exhaustive enumeration of all combinations would take several years.
  • The challenge is thus to reduce the number of explored solutions.
• Many success stories in combinatorial optimization; one of the most promising, in 2008:
  • Grid’5000 was used to design and improve the algorithm (MoGo) behind the first computer victory against a professional Go player (5 Dan) on a 9x9 board, at the last Paris tournament! (it’s close to the Dan!)
• A new exact grid method based on Branch-and-Bound, combining new approaches in combinatorial algorithms, grid computing, load balancing and fault tolerance:
  • Problem: 50 jobs on 20 machines, solved optimally for the first time, with 1245 CPUs (peak)
  • Grid’5000 sites involved (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse
  • The optimal solution required a wall-clock time of 25 days.
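The slide does not describe the grid Branch-and-Bound itself. As a toy illustration of how B&B reduces the number of explored solutions, here is a sequential sketch for the permutation flow-shop with a simple machine-based lower bound; it is not the team’s grid method (which adds load balancing and fault tolerance), and all function names are hypothetical.

```python
from itertools import permutations


def completion_times(seq, p):
    """Completion time of the last scheduled job on each machine.

    p[j][k] is the processing time of job j on machine k. Every job visits
    machines 0..m-1 in order, and all machines process jobs in the same order.
    """
    m = len(p[0])
    finish = [0] * m
    for j in seq:
        for k in range(m):
            ready = finish[k - 1] if k > 0 else 0  # job j done on machine k-1
            finish[k] = max(finish[k], ready) + p[j][k]
    return finish


def makespan(seq, p):
    return completion_times(seq, p)[-1]


def lower_bound(finish, remaining, p):
    """On each machine k, the unscheduled jobs still need their full load
    there, and whichever runs last still needs at least the smallest
    remaining tail (its work on machines after k)."""
    bound = 0
    for k in range(len(p[0])):
        load = sum(p[j][k] for j in remaining)
        tail = min(sum(p[j][k + 1:]) for j in remaining)
        bound = max(bound, finish[k] + load + tail)
    return bound


def branch_and_bound(p):
    """Depth-first B&B over job orders; returns (best makespan, job order)."""
    best = {"value": float("inf"), "order": None}

    def explore(seq, remaining):
        finish = completion_times(seq, p)
        if not remaining:
            if finish[-1] < best["value"]:
                best["value"], best["order"] = finish[-1], list(seq)
            return
        if lower_bound(finish, remaining, p) >= best["value"]:
            return  # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            seq.append(j)
            remaining.remove(j)
            explore(seq, remaining)
            remaining.add(j)
            seq.pop()

    explore([], set(range(len(p))))
    return best["value"], best["order"]


if __name__ == "__main__":
    # 4 jobs x 3 machines: small enough to cross-check by brute force.
    p = [[3, 2, 4], [1, 4, 2], [2, 3, 1], [4, 1, 3]]
    value, order = branch_and_bound(p)
    brute = min(makespan(s, p) for s in permutations(range(len(p))))
    print("B&B:", value, order, "brute force:", brute)
```

Pruning is what makes instances like 50 jobs on 20 machines approachable at all; a grid version would additionally distribute subtrees across workers and share the incumbent bound.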
Example 2: OpenWP
• OpenWP:
  • A directive-based language and runtime for coarse-grain distributed executions
  • Expresses dependencies between computing blocks and work distribution
  • Targets existing codes
  • Uses a virtual shared memory model
  • Runs over existing workflow engines
• AMIBES (EADS): mesher module of jCAE (a CAD environment in Java)
[Figure: OpenWP performance results; annotations: linear speedup, non-parallel region, workflow engine overhead, negligible cost, effect of optimizations]
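OpenWP lets a program declare dependencies between coarse-grain computing blocks and leaves execution to a workflow engine. A toy illustration of that execution model follows, assuming blocks are Python callables that read earlier results from a shared dictionary (standing in for the virtual shared memory) and a thread pool stands in for the engine; `run_workflow` and the block names are hypothetical and are not OpenWP’s directive syntax or API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(blocks, deps, workers=4):
    """Execute coarse-grain blocks in dependency order.

    blocks: name -> callable taking the shared results dict
    deps:   name -> set of block names that must finish first
    Blocks whose dependencies are all satisfied run concurrently as a "wave".
    Returns (results, waves).
    """
    results = {}  # stands in for the virtual shared memory
    done, waves = set(), []
    pending = set(blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            ready = sorted(b for b in pending if deps.get(b, set()) <= done)
            if not ready:
                raise ValueError("cyclic dependencies among %s" % sorted(pending))
            futures = {b: pool.submit(blocks[b], results) for b in ready}
            # Wait for the whole wave before publishing its results.
            wave_results = {b: fut.result() for b, fut in futures.items()}
            results.update(wave_results)
            done.update(ready)
            pending.difference_update(ready)
            waves.append(ready)
    return results, waves


if __name__ == "__main__":
    blocks = {
        "load": lambda shared: [1, 2, 3, 4],
        "left": lambda shared: sum(shared["load"][:2]),
        "right": lambda shared: sum(shared["load"][2:]),
        "merge": lambda shared: shared["left"] + shared["right"],
    }
    deps = {"left": {"load"}, "right": {"load"}, "merge": {"left", "right"}}
    results, waves = run_workflow(blocks, deps)
    print(waves, results["merge"])
```

This sketch schedules level by level for simplicity; a real workflow engine would typically start each block as soon as its own inputs are ready rather than waiting for a whole wave.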
What are the common patterns in deployment?
Application “deployment” on Grid’5000:
• Site level:
  • Node selection --> OAR
  • Node reservation (isolation) --> OAR (batch scheduler)
  • Reconfiguration --> Kadeploy
• Grid level --> GRUDU (Grid Reservation Utility)
• Application configuration and launch --> Adage
Deployment: Grudu (G5K Reservation Utility)
An all-in-one, client-side GUI tool for monitoring the Grid’5000 platform. Main goals:
• Displaying the status of the platform
• Allocating resources through OAR
• Monitoring resources through Ganglia
• Managing deployments with a GUI for KaDeploy
• Providing a terminal emulator and a file transfer manager
Application deployment: ADAGE
ADAGE: automatic deployment of large-scale applications that need one or multiple middleware systems: MPI, CCM, JXTA, jobs, GFarm, P2P overlays, DIET.
[Figure: example deployed applications: MPI application, CCM application, LEGO application, JXTA edge peers and rendez-vous peers]
Deployment process: generic application description + resource description + control parameters --> deployment planning --> application configuration --> deployment --> execution
JXTA scalability test:
• Evaluation of the peerview and discovery protocols
• Deployment of thousands of JXTA peers
• Run of the scalability test
[Figure: “rendez-vous” peers known by one of the rendez-vous peers; x-axis: time, y-axis: rendez-vous peer ID]
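ADAGE’s pipeline turns a generic application description, a resource description, and control parameters into a deployment plan before configuring and launching anything. The slide does not describe ADAGE’s planner, so the following is only a toy round-robin planner, assuming an application is a list of process groups with instance counts and resources are plain host names; `ProcessGroup`, `plan_deployment`, and `max_per_host` are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class ProcessGroup:
    """Hypothetical generic description: a program and its instance count."""
    name: str
    instances: int


def plan_deployment(groups, hosts, max_per_host=1):
    """Round-robin placement of every instance onto hosts.

    max_per_host acts as a control parameter capping instances per host.
    Returns a list of (group name, instance index, host).
    """
    total = sum(g.instances for g in groups)
    if total > len(hosts) * max_per_host:
        raise ValueError("not enough resources for %d instances" % total)
    slots = cycle(hosts)
    load = {h: 0 for h in hosts}
    plan = []
    for g in groups:
        for i in range(g.instances):
            host = next(slots)
            # The capacity check above guarantees some host still has room.
            while load[host] >= max_per_host:
                host = next(slots)
            load[host] += 1
            plan.append((g.name, i, host))
    return plan


if __name__ == "__main__":
    # Loosely modeled on the JXTA test: a few rendez-vous peers, more edge peers.
    groups = [ProcessGroup("rendezvous", 2), ProcessGroup("edge", 4)]
    for entry in plan_deployment(groups, ["h0", "h1", "h2"], max_per_host=2):
        print(entry)
```

Separating planning from configuration and launch, as the ADAGE pipeline does, means the same application description can be re-planned onto a different set of reserved nodes without changing the application.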
Bronze Standard: a method addressing the evaluation of medical image-processing algorithms
• Applied to estimating the rigid spatial transformation between two images (useful for aligning two images of the same patient acquired separately).
• Complex workflow of computations on a large number of data sets.
• Typically requires tens to hundreds of 3D image pairs, at 15 minutes per image pair.
• The method is executed with MOTEUR (a workflow engine).
• Several degrees of parallelism are tested:
  • Naive execution: only the workflow’s intrinsic parallelism
  • Data parallelism: data sets are processed concurrently
  • Data parallelism + pipelining: in addition, services in sequential branches are pipelined
[Figure: Resource dedication, G5K vs. EGEE: execution time (seconds) vs. number of images for each of the three parallelism modes]
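Why the three modes differ can be seen with an idealized timing model for a linear chain of services. Assumptions (not from the slide): unlimited resources, no scheduling or transfer overhead, no intrinsic parallelism in the chain, and `t[i][s]` is the time service `s` takes on data set `i`. The function names are hypothetical, and this is not how MOTEUR computes anything.

```python
def naive(t):
    """Data sets go through the whole chain one after another."""
    return sum(sum(row) for row in t)


def data_parallel(t):
    """Each service processes all data sets concurrently, but the next
    service waits for the slowest data set (a barrier between services)."""
    n_services = len(t[0])
    return sum(max(row[s] for row in t) for s in range(n_services))


def data_parallel_pipelined(t):
    """Each data set moves on to the next service as soon as it is done,
    so the makespan is the slowest single path through the chain."""
    return max(sum(row) for row in t)


if __name__ == "__main__":
    # 3 data sets x 2 services, times in seconds
    t = [[10, 5], [2, 8], [6, 6]]
    print(naive(t))                    # 37
    print(data_parallel(t))            # 18
    print(data_parallel_pipelined(t))  # 15
```

For non-negative times the model always gives data parallelism + pipelining ≤ data parallelism ≤ naive, which matches the ordering of the three modes on the slide; real overheads shrink the gaps.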
Gap Analysis
Are the patterns (applications) well supported?
--> Thanks to the node reconfiguration model, many patterns are well supported.
What further abstractions should be considered?
--> Node configuration and deployment are still difficult and require too much effort from users.
--> Network resources should be reserved and isolated.
What abstractions have worked for you?
--> Reservation, isolation, reconfiguration and deployment.
What abstractions do you feel you need?
--> Abstractions addressing reservation, configuration and deployment issues.
How well will abstractions work with the next generation of infrastructure that your project will use?
--> Reservation, isolation, reconfiguration and deployment will be required for “transparent” Cloud Computing.
[Figure: two software stacks compared, each made of a programming interface, compile-time operations & optimizations, runtime operations & optimizations, and the Grid infrastructure; in the second stack the programming interface offers less abstraction but more optimization opportunities]
The notion of energy “conservation”
Programming difficulties:
• Organizing the computation
• Tolerating variations
• Scheduling computation and communications
• Implementing computing codes
• Synchronizing task executions
• Implementing global operations
• Selecting the communication protocols
• Dealing with resources (data, computers, etc.)
• Dealing with administration domains
“Programming” models & abstractions:
• Chemical computing
• Data & workflow
• Divide & conquer
• Workflow
• MapReduce
• MPI for Grids
• Grid-RPC
• Set of scripts
I didn’t know that the Grid had to be programmed (??)
• Is there anything so different about Grids that it justifies programming them in a specific way?
• What was the promise? An infrastructure providing resources (data, storage, computing) the way the power Grid provides electricity --> transparently
• So why should we care about “programming Grids”?
• Because the “abstracting job” is not finished:
  • Moving data and programs rapidly (protocols)
  • Dealing with several (many) administration domains (VOs)
  • Dealing with several (many) batch schedulers (interfaces)
  • Moving data and jobs in a smart way (control)
  • Tolerating performance variations and resource failures
  • Providing QoS
  • Etc.
• Even the right software layer(s) in which to implement the abstraction is (are) not yet settled (middleware, OS, network?)
• So YES, I still have to program Grids
Cloud Computing: that’s not a problem
• Why should I care about the Grid at all?
• There is a new, very promising solution…
• It is cleaner (environment friendly, more abstract, etc.)
• It is not compared with electricity distribution (the power Grid)
• BUT with water distribution…
• It’s