SC2002 Panel: Desktop Grids: 10,000-Fold Parallelism for the masses
Desktop Grid with XtremWeb: User experiences and feedback
F. Cappello, A. Bouteiller, S. Djilali, G. Fedak, C. Germain, O. Lodygensky, F. Magniette, V. Neri, A. Selikhov
Cluster and GRID group, LRI, Université Paris Sud
fci@lri.fr, www.lri.fr/~fci/Group
Desktop Grids Panel at SC2002, Baltimore
XtremWeb: General Architecture
• XtremWeb: free, open-source PC-Grid framework
• For research studies and production
• 3 entities: client / coordinator / worker (in different protection domains); an illustrative worker loop follows below
[Architecture diagram: PC clients and PC workers reach the XtremWeb coordinator over the Internet or a LAN; the coordinator itself can be hierarchical or peer-to-peer, and a Global Computing client submits work to it.]
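To make the worker side of this architecture concrete, here is a minimal, purely illustrative C sketch of the pull model: the worker repeatedly polls the coordinator for a task, runs it, and uploads the result. The function names (pull_task, push_result) and the task structure are hypothetical placeholders, not the actual XtremWeb API; in the real framework, the fact that all connections are initiated outbound by the worker is also what lets it operate behind firewalls.

/* Illustrative pull-model worker loop (hypothetical API, not XtremWeb's). */
#include <stdio.h>
#include <unistd.h>

typedef struct { int id; const char *cmd; } task_t;

/* Stub: a real worker would make an outbound call to the coordinator here. */
static int pull_task(task_t *t) {
    static int next = 0;
    if (next >= 3) return 0;          /* coordinator has no more work */
    t->id = next++;
    t->cmd = "run simulation";        /* e.g. one AIRES or Senkin task */
    return 1;
}

/* Stub: a real worker would upload the output files to the coordinator. */
static void push_result(const task_t *t) {
    printf("worker: uploading result of task %d\n", t->id);
}

int main(void) {
    task_t t;
    while (pull_task(&t)) {           /* poll the coordinator for work */
        printf("worker: running task %d (%s)\n", t.id, t.cmd);
        sleep(1);                     /* stands in for hours of computation */
        push_result(&t);
    }
    return 0;
}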
XtremWeb: User projects
1. CGP2P ACI GRID (academic research on Desktop Grid systems), France
2. C@sper: industry research project (Airbus + Alcatel Space), France
3. Augernome XtremWeb (campus-wide Desktop Grid), France
4. IFP (French Petroleum Institute), France
5. EADS (airplane + Ariane rocket manufacturer), France
6. University of Geneva (research on Desktop Grid systems), Switzerland
7. University of Wisconsin-Madison, Condor+XW, USA
8. University of Guadeloupe + Pasteur Institute: tuberculosis, France
9. Mathematics lab, University of Paris South (PDE solver research), France
10. University of Lille (control language for Desktop Grid systems), France
11. ENS Lyon: research on large-scale storage, France
Casper (Airbus, Alcatel Space)
Building an open-source framework for sharing software and hardware resources.
[Architecture diagram: an ASP portal (web pages, HTML over SSL) provides user management, data management, job submission, data transfer and scheduling; numerical components are executed on Desktop Grids (via XtremWeb), clusters and parallel computers, with job submission and data transfer implemented in Java; clients are PCs or workstations.]
Augernome XtremWeb
• Particle Physics Laboratory (LAL-IN2P3)
• Understanding the origin of very-high-energy cosmic rays (10^20 eV)
• Air Shower Extended Simulator (AIRES)
• Typical execution time: 10 hours
• Number of simulations: millions
• Protein Modeling and Engineering Laboratory
• Structural genomics: numerical simulation of protein conformation changes
• Charm molecular dynamics
• Varying execution times
• Movie generation (large number of simulations)
Pierre Auger Observatory
Understanding the origin of very-high-energy cosmic rays:
• AIRES: Air Showers Extended Simulation
• Sequential, Monte Carlo; time for a run: 5 to 10 hours
• Estimated PC number: ~5,000
[Deployment diagram: an air shower parameter database (Lyon, France) and traditional supercomputing centers (CINES, France; Fermilab, USA) feed an XtremWeb server; PC clients and PC workers run AIRES over the Internet and LANs.]
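A rough order-of-magnitude check of the scale involved (assuming, purely for the estimate, one million runs of 10 hours each): 10^6 runs × 10 h = 10^7 CPU-hours; spread over the ~5,000 expected PCs this is about 2,000 hours of wall-clock time, close to three months even with the whole Desktop Grid kept busy, which is why the project targets thousands of volunteer PCs rather than a single cluster.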
French Petroleum Institute (IFP)
Gibbs application: molecular modeling
• Monte Carlo simulation
• Task duration: ~48 hours on a typical PC
Senkin application: computation of the auto-ignition time of a gas
• Application to car gasoline engines
• Multi-parameter execution: 12,000 parameter sets per study
• 10 minutes of computation per task
• Input file size: 200 kB; output file size: 5 MB
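The Senkin figures give a feel for the aggregate cost of one study: 12,000 parameter sets × 10 minutes = 2,000 CPU-hours, together with about 2.4 GB of input (200 kB × 12,000) and 60 GB of output (5 MB × 12,000) moving through the coordinator. If the roughly 2,000 nodes expected at IFP (see the feedback slide below) were all available, each worker would handle about 6 tasks and the computation itself would finish in about an hour.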
EADS-CCR (Airbus, Ariane)
Cassiope application: ray tracing
[Figure: execution time (h:m:s) of XtremWeb vs. MPI on 4, 8 and 16 processors.]
Feedback
What applications are you running on Desktop Grids? At what scale?
Scientific applications:
• Multi-parameter: particle physics with AIRES (AIR shower Extended Simulation), 5,000 nodes expected; molecular dynamics with Charm, 1,000 nodes expected; computational fluid dynamics (at IFP), 2,000 nodes expected; etc.
• Master-Worker: ray tracing (EADS), ??? nodes expected
Application range & deployment
What range of applications and scale of deployment do you expect?
What reduces the application range?
• Computational resource capacities: limited memory (128 MB, 256 MB), limited network performance (100BASE-T)
• Available programming models: Master-Worker, RPCs; need for MPI
What makes deployment a complex issue?
• Human factors (system administrators, PC owners)
• Use of network resources (backups during the night)
• Dispatcher scalability (hierarchical, distributed?)
• Complex topology (NAT, firewalls, proxies)
MPICH-V (Volatile)
Goal: execute existing or new MPI applications.
Programmer's view unchanged: the application keeps calling MPI_Send() and MPI_Recv() as usual (a minimal example follows below).
Problems:
1) Volatile nodes (any number may leave at any time)
2) Firewalls (PC Grids)
3) Non-named receptions (must be replayed in the same order as in the previous, failed execution)
Objective summary:
1) Automatic fault tolerance
2) Transparency for the programmer and user
3) Tolerate n faults (n being the number of MPI processes)
4) Firewall bypass (tunnel) for cross-domain execution
5) Scalable infrastructure/protocols
6) Avoid global synchronizations (checkpoint/restart)
7) Theoretical verification of protocols
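As a reminder of what "transparency for the programmer" means in practice, here is a minimal standard MPI program in C; the intent of MPICH-V is that such code runs unchanged, with message logging and checkpoint/restart supplied by the runtime rather than by the application. The program below is a generic illustration written for this summary, not code taken from the MPICH-V distribution.

/* Minimal MPI program: run with 2 processes, e.g. "mpirun -np 2 ./a.out". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Ordinary point-to-point send: nothing fault-tolerance specific. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}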
Concluding remarks
What we have learned:
• Deployment is critical and may take a long time for non-specialists
• Users do not immediately grasp the computational power a Desktop Grid can offer
• Once they do, they propose new uses of their applications (similar to the transition from sequential to parallel)
• They also rapidly ask for more resources!
• Users ask for more programming paradigms, especially MPI
• Strong need for tools that help users browse their mountain of results
We need more feedback on:
• Deployment
• Applications
• Programming models
• MPICH-V
www.xtremweb.org