Grid eXplorer (GdX): A research tool for exploring Grid/P2P issues
Pascale Primet, INRIA RESO, LIP, ENS de Lyon. Pascale.Primet@ens-lyon.fr
Franck Cappello, INRIA Grand-Large, LRI, University of Paris-Sud. fci@lri.fr
Christophe Cérin, LARIA, University of Amiens. cerin@laria.u-picardie.fr
Olivier Richard, INRIA Apache, ID-IMAG. olivier.richard@imag.fr
Pierre Sens, INRIA Regal, LIP6, University of Paris 6. Pierre.Sens@lip6.fr
www.lri.fr/~fci/GdX
Research Program ACI "Data Mass". INRIA meeting with NII
Outline • Motivating a large scale instrument for Grid • A large scale instrument for exploring Grid issues in reproducible experimental conditions • Concluding remarks
Open issues in Grid/P2P • Security • Data storage/access/movement • Multi-user / multi-application scheduling • Coordination (virtual, ephemeral infrastructure) • Programming • Fault tolerance! • Scalability • Performance • Easy/efficient deployment techniques • Application characterization techniques • Etc.
Fundamental components of Grids
• Systems
  • Static: nodes, OS, distributed system mechanisms (resource discovery, storage, scheduling, etc.), middleware, runtimes
  • Dynamic: faults (crash, transient), workload (multiple users / multiple applications), heterogeneity (resource diversity, performance), malicious users/behaviors
• Networks
  • Static: routers, links, topology, protocols; theoretical features: synchronous, pseudo-synchronous or asynchronous
  • Dynamic: disconnection, packet loss, congestion
What are the current approaches for studying systems and networks?
• Theoretical models:
  • Scheduling, load balancing, performance, etc.
  • Congestion, routing, packet loss, topology, traffic, etc.
  • Difficult to model dynamic behaviors and system complexity
• Simulators:
  • SimGrid, SimGrid2, GridSim, MicroGrid, Bricks
  • NS, NS2, Cnet, Real, etc.
  • Strong limitations (scalability, different from execution of real codes, validation)
• Experimental testbeds:
  • For Grids, most testbeds are for production and each testbed is specific
  • Long tradition in networking (Arpanet, Magic, Geant, Renater, VTHD): shared testbeds not fully decoupled (cost) from the production network, experimental conditions difficult to reproduce, questionable representativeness
• We have no way to test ideas: a) independently, at a significant scale, b) with realistic parameters and behaviors!
Case Study 1: XtremWeb-Auger
Understanding the origin of very high energy cosmic rays:
• Aires: Air Showers Extended Simulation
• Sequential, Monte Carlo. Time for a run: 5 to 10 hours (500 MHz PC)
• Trivial parallelism, master-worker paradigm (see the sketch below)
• Estimated PC number: ~5000
[Architecture diagram: air shower parameter database (Lyon, France), XtremWeb server, PC client and PC workers running Aires over the Internet and LAN]
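Since the parallelism is trivial, the coordination pattern fits in a few lines. The following is only a hypothetical master-worker sketch, not XtremWeb's actual API: run_air_shower, the task payloads and the worker count are invented for illustration (a real Aires run is a multi-hour Monte Carlo simulation).

```python
import queue
import threading

def run_air_shower(params):
    # Hypothetical stand-in for one Aires run: in reality each task is a
    # multi-hour Monte Carlo simulation; here it is just a tiny computation.
    return sum(params) % 97

def worker(task_queue, results):
    while True:
        try:
            task_id, params = task_queue.get_nowait()
        except queue.Empty:
            return  # no work left: the worker simply leaves the pool
        results[task_id] = run_air_shower(params)
        task_queue.task_done()

# The coordinator publishes independent tasks; workers pull them at will,
# which is what makes the parallelism "trivial" (no inter-task communication).
tasks = queue.Queue()
for task_id in range(100):
    tasks.put((task_id, (task_id, task_id + 1, task_id + 2)))

results = {}
pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(8)]
for t in pool:
    t.start()
for t in pool:
    t.join()
print(len(results), "tasks completed")
```

In the real deployment the "queue" is the XtremWeb coordinator and the workers are volatile PCs scattered over several sites, but the pull-based pattern is the same.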
Case Study 1: XtremWeb-Auger
• Application: AIRES (Auger)
• Deployment:
  • Coordinator at LRI
  • Madison, Wisconsin: 700 workers, Pentium III, Linux (500 MHz + 933 MHz), Condor pool
  • Grenoble Icluster: 146 workers (733 MHz), PBS
  • LRI: 100 workers, Pentium III, Athlon, Linux (500 MHz, 733 MHz, 1.5 GHz), Condor pool
[Deployment diagram: XW coordinator and XW client on the lri.fr / U-Psud network, reaching the LRI Condor pool, other labs, the Grenoble Icluster (PBS) and Madison, Wisconsin (Condor) over the Internet]
Case Study 1: XtremWeb-Auger
• 3000 tasks
• No way to reproduce the same experimental conditions (the configuration, but also the dynamics of the system)
• How, then, to compare fundamental mechanisms (e.g. scheduling) at large scale (100,000 nodes)?
Case study 2: MPICH-V, fault tolerant MPI for the Grid
Problems:
1) volatile nodes (any number, at any time)
2) firewalls (PC Grids)
3) non-named receptions (must be replayed in the same order as in the previous, failed execution; see the logging sketch below)
Programmer's view unchanged: MPI_send() / MPI_recv() between PC clients
[Architecture diagram: dispatcher, channel memories, checkpoint servers and compute nodes connected through the network]
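The channel memory is essentially a message log that lets a restarted node replay its receptions in the original order. The sketch below illustrates that idea only; the class and method names are invented for the example and do not correspond to MPICH-V's actual interfaces.

```python
from collections import defaultdict

class ChannelMemory:
    """Toy pessimistic message log: every message to a node is recorded
    before delivery, so a restarted node can replay its receptions in the
    exact order of the original (failed) execution."""

    def __init__(self):
        self._log = defaultdict(list)      # dest -> ordered list of messages
        self._cursor = defaultdict(int)    # dest -> index of next message to deliver

    def send(self, dest, msg):
        self._log[dest].append(msg)        # log first, deliver later

    def receive(self, dest):
        i = self._cursor[dest]
        if i >= len(self._log[dest]):
            return None                    # nothing pending for this node
        self._cursor[dest] += 1
        return self._log[dest][i]

    def restart(self, dest):
        # After a crash the node re-executes from its checkpoint; resetting the
        # cursor makes the channel memory replay the same messages in the same order.
        self._cursor[dest] = 0

cm = ChannelMemory()
cm.send("node3", {"tag": 7, "data": [1, 2, 3]})
cm.send("node3", {"tag": 8, "data": [4]})
first = cm.receive("node3")
cm.restart("node3")                    # simulate node3 failing and re-executing
assert cm.receive("node3") == first    # replay preserves the reception order
```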
Case study 2: MPICH-V, fault tolerant MPI for the Grid
• Icluster-Imag: 216 PIII 733 MHz, 256 MB/node
• 5 subsystems with 32 to 48 nodes, 100BaseT switch
• 1 Gb/s switch mesh between subsystems
• Linux, PGI Fortran or GCC compiler
• Very close to a typical building LAN, very close to the LRI network!
• Simulate node volatility (see the fault-injection sketch below)
[Topology diagram with link bandwidths of ~4.8 Gb/s and ~1 Gb/s]
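One simple way to simulate node volatility on such a cluster, assuming the worker processes' PIDs are known, is to kill randomly chosen processes at random intervals. This is only an illustrative sketch, not the fault injector actually used in these experiments; the interval, fault count and example PIDs are all made up.

```python
import os
import random
import signal
import time

def inject_faults(pids, mean_interval_s=110.0, faults=5, seed=42):
    """Kill randomly chosen worker processes at exponentially distributed
    intervals (Unix only), emulating volatile nodes on a stable cluster."""
    rng = random.Random(seed)
    victims = list(pids)
    for _ in range(min(faults, len(victims))):
        time.sleep(rng.expovariate(1.0 / mean_interval_s))
        pid = victims.pop(rng.randrange(len(victims)))
        os.kill(pid, signal.SIGKILL)   # the "fault": this rank disappears abruptly

# Hypothetical usage: PIDs of the MPI ranks running on this machine.
# inject_faults([12001, 12002, 12003], mean_interval_s=110.0, faults=2)
```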
Case study 2: MPICH-V, fault tolerant MPI for the Grid
Execution time with faults (fault injection): MPICH-V vs. MPICH-P4
[Plot: total execution time in seconds (y-axis from ~610 to ~1100) versus number of faults during execution (0 to 10, roughly 1 fault/110 sec.); curves for MPICH-P4, MPICH-V (CM but no logs), MPICH-V (CM with logs), MPICH-V (CM+CS+ckpt), and the base execution without checkpoint and fault]
Interesting but, what about MPICH-V on 10,000 nodes?
New generation of research tools
[Methodology spectrum diagram, log(abstraction) axis: live networks (HENP, Abilene, CalREN, WAIL, PlanetLab, CAIRN, NLR, WANiLab), emulation (DummyNet, EmuLab, ModelNet, WAIL), simulation (NS, SSFNet, QualNet, JavaSim), math (Mathis formula)]
• Emulab / NIST Net / Modelnet
  • Network emulators with actual PCs + router emulators (dummynet)
  • Reproducible experimental conditions
  • Real applications and protocols
• PlanetLab
  • A real platform distributed over the Internet (planet-wide)
  • Real life conditions (not reproducible)
  • Real applications and protocols
• WANiLab
  • Cluster with real routers
  • Reproducible experimental conditions
  • Real applications and protocols
Experiment range demanded for Grid/P2P: "GRIDinLAB"
• Virtual Grids (1 Grid node on 1 PC):
  • Emulation of a Grid/P2P system at 1:1 scale
  • Execute real applications on Grid nodes (possibly slowing down the CPU for heterogeneity emulation; see the sketch after this list)
  • Use actual routers or emulators (Dummynet)
  • Inject traces (congestion, workload, faults)
• Emulation of Grids (10 or 100 Grid nodes on 1 PC):
  • Emulation of a Grid/P2P system at 10/100:1 scale
  • Execute real applications or cores of applications
  • Use network emulators
  • Inject synthetic traces
• Large scale simulation of Grids (1000 Grid nodes on 1 PC):
  • Simulation of large Grid/P2P systems at 1000:1 scale
  • Simulate the application
  • Simulate the network
  • Simulate dynamic conditions
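For the 1:1 "virtual Grid" case, one generic way to emulate a slower, heterogeneous CPU is to duty-cycle a process with SIGSTOP/SIGCONT so that it only runs for a fraction of each period. This is a sketch of that general technique (Unix only), not the mechanism chosen for GdX; the example PID is hypothetical.

```python
import os
import signal
import time

def throttle(pid, speed_fraction, period_s=0.1):
    """Emulate a slower CPU for process `pid` by duty-cycling it:
    let it run for speed_fraction of each period and stop it for the rest.
    This only emulates compute speed, not memory or I/O performance."""
    run = period_s * speed_fraction
    pause = period_s - run
    try:
        while True:
            os.kill(pid, signal.SIGCONT)   # resume the emulated Grid node
            time.sleep(run)
            os.kill(pid, signal.SIGSTOP)   # freeze it for the rest of the period
            time.sleep(pause)
    except ProcessLookupError:
        pass  # the emulated node finished or was killed

# Hypothetical usage: make process 12345 behave roughly like a machine half as fast.
# throttle(12345, 0.5)
```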
"GRIDinLAB" in the methodology spectrum
[Methodology spectrum diagram (credits: WANiLab): log(cost) vs. log(abstraction), ranging from live networks (HENP, Abilene, CalREN, WAIL, PlanetLab, CAIRN, NLR; real WANs and routers) through emulation (DummyNet, EmuLab, ModelNet, WAIL, WANiLab; emulation-driven simulation) and simulation (NS, SSFNet, QualNet, JavaSim; large scale simulation) to math (Mathis formula; optimization, linear, nonlinear and stochastic models); Grid eXplorer / "GRIDinLAB" covers the emulation and simulation part of the spectrum]
Outline • Motivating a large scale instrument for Grid • A large scale instrument for exploring Grid issues in reproducible experimental conditions • Concluding remarks
Grid eXplorer
• A "GRIDinLAB" instrument for CS researchers (not a production facility)
• For the Grid/P2P and network researcher communities
• Addressing specific issues of each domain
• Enabling research studies combining the two domains
• Easing and developing collaborations between the two communities
Grid eXplorer
• A tool set for conducting experiments and measurements, and for result analysis
• An experimental conditions database or generator
• An experimental platform: a cluster + high-performance network + software
Grid eXplorer: the big picture
• A set of sensors
• A set of tools for analysis
• Emulator core: hardware + software for emulation and simulation
• An experimental conditions database (a sketch of the idea follows this list)
• Validation on a real life testbed
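The slides do not detail the experimental conditions database, so the following is only an assumed sketch of the idea: a store of timestamped condition events (faults, workload, congestion) per named trace, which an emulation run replays in time order to reproduce the same dynamic conditions. The schema, trace name, node and link names are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a real deployment would use a server-side database
conn.execute("""CREATE TABLE condition_events (
                    trace_name TEXT,     -- e.g. a recorded week of a Condor pool
                    t_offset   REAL,     -- seconds since experiment start
                    kind       TEXT,     -- 'fault', 'workload', 'congestion', ...
                    target     TEXT,     -- node or link concerned
                    value      TEXT)""")

# Record two synthetic events for a hypothetical trace.
events = [("demo-trace", 110.0, "fault", "node042", "crash"),
          ("demo-trace", 200.5, "congestion", "link-a-b", "loss=0.02")]
conn.executemany("INSERT INTO condition_events VALUES (?,?,?,?,?)", events)

# An emulation run replays the trace in time order to reproduce the conditions.
for row in conn.execute("SELECT t_offset, kind, target, value FROM condition_events "
                        "WHERE trace_name=? ORDER BY t_offset", ("demo-trace",)):
    print(row)
```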
Grid eXplorer (GdX) research project
1) Build the instrument: "Design and develop, for the community of Computer Science Researchers, an emulation platform for Large Scale Distributed Systems (Grid, P2P and other distributed systems)"
2) Use the instrument for "a set of research experiments investigating the impact of Large Scale in distributed systems and especially related to large data sets (security, reliability, performance)"
• 1K CPU cluster (maybe only 500 depending on the budget)
• Configurable network (Ethernet, Myrinet, others?)
• Configurable OS (kernel, distribution, etc.)
• A set of emulation/simulation tools (existing + new ones)
• Multi-user
• Located at and managed by IDRIS
Laboratories involved in GdX: 13 labs
84 members: Alain Lecluse (IBCP), Alexandre Genoud, (Projet OASIS, INRIA Sophia Antipolis) Antoine Vernois, (IBCP) Arnaud Contes, (Projet OASIS, INRIA Sophia Antipolis) Aurélien Bouteiller, (LRI), Bénedicte Legrand (LIP6) Brice Goglin (doctorant), (INRIA LIP RESO), Brigitte Rozoy (LRI) Cécile Germain (LRI) Christophe Blanchet, (IBCP) Christophe Cérin, (Amiens, Laria) Christophe Chassot, (LAAS-ENSICA), Colette Johnen (LRI) CongDuc Pham, (LIP) Cyril Randriamaro, (LaRIA) Denis Caromel, (Projet OASIS, INRIA Sophia Antipolis) Eddy Caron, (LIP/ENS Lyon), Emmanuel Jeannot, (Loria) Eric Totel (Supélec Rennes) Fabrice Huet, (Projet OASIS, INRIA Sophia Antipolis) Faycal Bouhaf (DEA)(INRIA LIP RESO), Franck Cappello, (LRI) Françoise Baude, (Projet OASIS, INRIA Sophia Antipolis) Frédéric Desprez, (LIP/INRIA Rhône-Alpes), Frédéric Magniette, (LRI) Gabriel Antoniu, (IRISA/INRIA Rennes), George Bosilca, (LRI) Georges Da Costa, (ID-IMAG), Géraud Krawezik (LRI) Gil Utard, (LaRIA) Gilles Fedak, (LRI) Grégory Mounié (ID-IMAG) Guillaume Auriol, (LAAS-ENSICA), Guillaume Mercier, (LaBRI), Guy Bergère, (LIFL, GrandLarge INRIA Futur) Haiwu He, (LIFL, GrandLarge INRIA Futur) Isaac Scherson, (LIFL, GrandLarge, INRIA Futur) Jens Gustedt (LORIA & INRIA Lorraine) Joffroy Beauquier (LRI) Johanne Cohen, (Loria) Kavé Salamatian (LIP6), Lamine Aouad (LIFL, GrandLarge, INRIA Futur) Laurent Baduel, (Projet OASIS, INRIA Sophia Antipolis) Laurent Dairaine, (LAAS) Luc Bougé, (IRISA/ENS Cachan Antenne de Bretagne), Luciana Arantes (LIP6), Ludovic Mé, (Supélec Rennes) Luis Angelo Estefanel, (ID-IMAG) Marin Bertier (LIP6), Mathieu Goutelle, (KIP) Mathieu Jan, (IRISA) Michel Diaz, (LAAS-ENSICA), Michel Koskas (Amiens, Laria) Nicolas Lacorne, (IBCP) Nicolas Larrieu (LAAS-ENSICA), Nicolas Viovy (CEA-DSM-LSCE) Oleg Lodygensky, (LRI) Olivier Richard (ID-IMAG), Olivier Soyez, (LaRIA) Pascal Berthou, (LAAS-ENSICA), Pascale Primet (LIP), Pascale Vicat-Blanc Primet, (INRIA LIP RESO), Patrick Sénac, (LAAS-ENSICA), Philippe d'Anfray (CEA-DTI/SISC), Philippe Gauron, (LRI) Philippe Owezarski (LAAS) Pierre Fraigniaud, (LRI) Pierre Lemarini, (LRI) Pierre Sens (LIP6 / INRIA), Pierre-André Wacrenier, (LaBRI), Raymond Namyst, (LaBRI), Samir Djilali, (LRI) Sébastien Tixeuil (LRI) Serge Petiton, (LIFL, GrandLarge INRIA Futur) Stéphane Vialle (Supélec) Tanguy Pérennou (LAAS) Thierry Gayraud, (LAAS-ENSICA), Thierry Priol, (IRISA) Thomas Hérault, (LRI) Timur Friedman (LIP6) Vincent Danjean, (LaBRI), Vincent Néri (LRI)
4 Research Topics
The 4 research topics and their leaders:
- Infrastructure (hardware + system): Olivier Richard (ID-IMAG, Grenoble)
- Emulation: Pierre Sens (LIP6, Paris 6)
- Network: Pascale Primet (LIP, Lyon)
- Applications: Christophe Cérin (LaRIA, Amiens)
Grid eXplorer (GdX) current status:
• First stage: building the instrument
• First GdX meeting was on September 16, 2003
• Hardware design meeting planned for October 15
• Hardware selection meeting on November 8:
  • Choosing the nodes (single or dual?)
  • Choosing the CPU (Intel IA-32, IA-64, Athlon 64, etc.)
  • Choosing the experimental network (Myrinet, Ethernet, Infiniband, etc.)
  • Choosing the general experiment production architecture (parallel OS architecture, user access, batch scheduler, result repository)
  • Choosing the experimental database hardware
  • Etc.
Example: Nearest Neighbor Scheduling (distribute a task set among a large number of nodes)
• Three-phase self-stabilizing algorithm (sand heap): negotiation, distribution, execution
• Negotiation rule: if X > Y (neighbors considered anti-clockwise), then X = X - (X - Y)/2 and Y = Y + (X - Y)/2 (sketched after this list)
• Distribution rule: tasks and parameters follow the negotiation route, immediately
• Execution rule: execution starts immediately when a local load balance is reached
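To make the negotiation rule concrete, here is a minimal sequential sketch on a ring; the mesh used in the experiments only changes the neighborhood function. It applies the X = X - (X - Y)/2, Y = Y + (X - Y)/2 rule until a full sweep moves nothing; the ring size, task count and sweep order are illustrative choices, not the actual experimental setup.

```python
def negotiate(load, neighbours):
    """One sweep of the sand-heap negotiation rule: whenever a node X is more
    loaded than a neighbour Y, it hands over half of the difference:
        X <- X - (X - Y)/2 ,  Y <- Y + (X - Y)/2
    Returns True if at least one transfer happened during the sweep."""
    moved = False
    for i in range(len(load)):
        for j in neighbours(i):
            d = (load[i] - load[j]) // 2   # integer number of tasks to hand over
            if d > 0:
                load[i] -= d
                load[j] += d
                moved = True
    return moved

def ring_neighbours(n):
    # Neighbours visited in a fixed order (standing in for the anti-clockwise
    # order used on the mesh in the talk).
    return lambda i: ((i + 1) % n, (i - 1) % n)

# Toy run: all tasks start on one node of a small ring and sweeps are repeated
# until the system stabilizes (no transfer happens during a full sweep).
n, tasks = 16, 1000
load = [0] * n
load[0] = tasks
sweeps = 0
while negotiate(load, ring_neighbours(n)):
    sweeps += 1
print(sweeps, load)   # the load has diffused away from node 0
```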
Simulation/Emulation tools
Nearest Neighbor Scheduling with a 3D visualization tool
• 10K tasks on 900 nodes in a mesh
• Negotiation (red movie), distribution (blue movie), execution (green movie)
• Observation results:
  • Symmetry for the negotiation phase
  • Asymmetry for the distribution and execution phases
  • Waves
• Several hours to get one movie: parallel simulation is required!
Outline • Motivating a large scale instrument for Grid • A large scale instrument for exploring Grid issues in reproducible experimental conditions • Concluding remarks
www.lri.fr/~fci/GdX
Grid eXplorer (GdX)
• A long term effort, with a medium term milestone: 2 years for a fully functional prototype
• For the Grid and network researcher communities
• Many scientific issues (large scale emulation, experimental conditions injection, distance to reality, etc.)
• A cluster of 1K CPUs + experimental conditions database + simulators/emulators + visualization tools