The French ACI GRID* initiative and its latest achievements using Grid'5000
* Member of the non-usual suspect National Grid Initiatives (W. Gentzsch)
Thierry PRIOL, Director of the French ACI GRID, Thierry.Priol@inria.fr
Franck Cappello, Director of ACI GRID Grid’5000, Franck.Cappello@inria.fr
Contents
• An overview of the ACI GRID initiative and some of the projects
• The Grid’5000 project
• Concluding remarks
Objectives of the ACI GRID
• Push the national research effort on grid computing
• Increase the visibility of French Grid research activities
• Fund medium and long term research activities in Grid using a bottom-up approach (nothing imposed!)
• Stimulate synergies between research groups
• Encourage experimentation with the available grid infrastructure being deployed through national projects
• Develop new software for experimental grid infrastructures
• New system and programming environments for distributed computing or large data management
Organisation
• Programme Director: Thierry Priol (since January 2004, M. Cosnard before)
• Scientific council: Brigitte Plateau
• Budget: ~8 M€* (including 8 PhD grants)
• This is incentive funding (around 98.3 M€ estimated by GridCoord)
Calls for projects (timeline axis: 2001 to 2007)
• Call 1: 18 projects, 2.25 M€
• Call 2: 12 projects, 3 M€
• Call 3 (G5K): 5 projects, 1 M€
• Call 4 (G5K): 6 projects, 1 M€
* Computing and network infrastructures and permanent researchers' salaries are already paid by the state
Several kinds of projects
• Multidisciplinary project
• Software project
• Young research team
• Collaboration
• International
• Testbed
ACI GRID projects
• Middleware, tools, environments: CGP2P (F. Cappello, LRI/CNRS); ASP (F. Desprez, ENS Lyon/INRIA); EPSN (O. Coulaud, INRIA); PADOUE (A. Doucet, LIP6); MEDIAGRID (C. Collet, IMAG); DARTS (S. Frénot, INSA-Lyon); Grid-TLSE (M. Dayde, ENSEEIHT); RMI (C. Pérez, IRISA); CONCERTO (Y. Maheo, VALORIA); CARAML (G. Hains, LIFO)
• Algorithms: TAG (S. Genaud, LSIIT); ANCG (N. Emad, PRISM); DOC-G (V-D. Cung, UVSQ)
• Compiler techniques: Métacompil (G-A. Silbert, ENMP)
• Networks and communication: RESAM (C. Pham, ENS Lyon); ALTA (C. Pérez, IRISA/INRIA)
• Applications: COUMEHY (C. Messager, LTHE) - Climate; GenoGrid (D. Lavenier, IRISA) - Bioinformatics; GeoGrid (J-C. Paul, LORIA) - Oil reservoir; IDHA (F. Genova, CDAS) - Astronomy; Guirlande-fr (L. Romary, LORIA) - Language; GriPPS (C. Blanchet, IBCP) - Bioinformatics; HydroGrid (M. Kern, INRIA) - Environment; Medigrid (J. Montagnat, INSA-Lyon) - Medical
• Grid testbeds: CiGri-CIMENT (L. Desbat, UjF); Mecagrid (H. Guillard, INRIA); GLOP (V. Breton, IN2P3); GRID5000 (F. Cappello, INRIA)
• Support for dissemination: ARGE (A. Schaff, LORIA); GRID2 (J-L. Pazat, IRISA/INSA); DataGRAAL (Y. Denneulin, IMAG)
ASP: Client/Server Approach for Simulation over the Grid
[Figure: DIET architecture on the GRID. A client call grpc_call(MatPROD, A, B) goes through agents/schedulers, backed by a distributed software database (C, Fortran, Java) and a distributed performance database, to servers S1 (batch system), S2 (local scheduler) and S3 (direct connection), plus a visualization server.]
• Call 1 (2001 - 2003)
• Project coordinator: F. Desprez
• E-mail: Frederic.Desprez@inria.fr
• Web: http://graal.ens-lyon.fr/ASP/
• Participants
• ENS-Lyon, INRIA, LORIA, LIFC, IRCOM, LST, SRSMC, Physique Lyon1
• Objectives
• Building a portable set of tools for computational servers in an ASP (Application Service Provider) model
• DIET (Distributed Interactive Engineering Toolbox)
• Porting several different applications
• physics, geology, chemistry, electronic device simulation, robotics, …
• Focus on issues
• resource localization (hierarchical) scheduling, performance evaluation (both static and dynamic), data persistence, data redistribution between servers
• Clients
• C, C++, Scilab, Web browser
TLSE: Web expert site for sparse matrices based on grid infrastructure
• Call 2 (2002 - 2004)
• Project coordinator: Michel Daydé
• E-mail: Michel.Dayde@enseeiht.fr
• Web: http://www.enseeiht.fr/lima/tlse/
• Participants
• CERFACS, FéRIA-IRIT, LIP-ENSL, LaBRI, CEA, CNES, EADS, EDF, IFP
• Objectives
• Design a Web expertise site for sparse matrices
• Dissemination of our expertise in sparse linear algebra
• Easy access and experimentation with software and tools: only statistics are provided, not computing resources
• Exploitation of the computing power of the grid for parametric studies
• Contents: Sparse matrix software, Bibliography, Collections of sparse matrices
CGP2P: Global P2P Computing, “Fusion of Desktop Grid and P2P systems”
• Call 1 (2001 - 2003)
• Coordinator: Franck Cappello
• E-mail: fci@lri.fr
• Web: www.lri.fr/~fci
• Participants: LRI, LIFL, ID IMAG, LARIA, LAL, EADS
[Figure: clients (PCs) send requests to a coordination system and service providers (PCs) accept them and provide results; potential direct communications between PCs for parallel applications (MPI). Requests concern computations or data; services concern computation or data.]
• Desktop Grid middleware: XtremWeb
• Fault tolerant MPI: MPICH-V
• Sandbox for binary applications: SBSLM
• Large Scale Storage: US
• Workflow/Dataflow language: YML
• Scheduling simulator: SimLargeGrid
• French ADSL analysis
• Theoretical proof of the protocols
• Convergence/Integration with GRID (GT3)
RMI: Programming the Grid with distributed Objects
• Call 1 (2001 - 2003)
• Project coordinator: C. Pérez
• E-mail: Christian.Perez@irisa.fr
• Web: http://www.irisa.fr/Grid-RMI/en/
• Participants
• IRISA, ENS-Lyon, LIFL, INRIA, EADS
• Objectives
• Provide a framework to combine various communication middleware and runtimes
• For parallel programming:
• Message based runtimes (MPI, PVM, …)
• DSM-based runtimes (TreadMarks, …)
• For distributed programming:
• RPC/RMI based middleware (DCE, CORBA, Java)
• Middleware for discrete-event based simulation (HLA)
• Get the maximum performance from the network!
• Offer zero-copy mechanism to middleware/runtime
[Figure: PadicoTM architecture. Middleware and runtimes (OmniORB, MICO, Orbacus, Orbix/E, Mome, Mpich, Kaffe, CERTI) run on PadicoTM Services (HLA, DSM, MPI, CORBA, JVM) and a personality layer above PadicoTM Core; the internal engine combines Madeleine (portability across networks: Myrinet, SCI, TCP) and Marcel (I/O aware multi-threading).]
[Figure: bandwidth (MB/s, 0 to 250) versus message size (bytes, 1 to 1E+07) for CORBA/Myrinet-2000, MPI/Myrinet-2000, Java/Myrinet-2000, CORBA/SCI, MPI/SCI and TCP/Ethernet-100.]
HydroGrid: distributed code coupling in hydrogeology, using software components
• Call 2 (2002 - 2004)
• Project coordinator: M. Kern
• E-mail: Michel.Kern@inria.fr
• Web: http://www-rocq.inria.fr/~kern/HydroGrid/HydroGrid-en.html
• Participants: INRIA Rocquencourt, INRIA Rennes, IMFS Strasbourg, Geosciences Rennes
• Objectives
• Simulate flow and transport of pollutants in the subsurface
• Take into account couplings between different physical phenomena
• Couple parallel codes on a grid, using software from the ACI GRID RMI project
• Links between numerical and software coupling
• Example applications: reactive transport (top), density driven flow (bottom), fractured media
[Figure: coupled components (flow, transport, chemistry, meshing, visualization) and a density driven flow result (mass fraction).]
Main feedback from Call 1 & Call 2 projects
• Lack of a large scale testbed available for experiments
• Several small scale testbeds at the regional level
• Duplication of effort when setting up testbeds
• Various types of Grids
• Need to be able to experiment with various software layers
• Incompatible with a production Grid
Grid’5000: How to proceed…
[Figure: experimental methods plotted as log(cost & coordination) versus log(realism). Method classes: math, simulation, emulation, live systems. Difficulty labels: Reasonable, Challenging, Major challenge. Examples shown: Model, Protocol proof; SimGrid, MicroGrid, Bricks, NS, etc.; Data Grid eXplorer, WANinLab, Emulab; DAS, PlanetLab, Naregi Testbed, NSF GENI.]
• In the first half of 2003, it was decided to design and develop an experimental platform for Grid researchers: Grid’5000 as a real life system
Grid’5000 Objective
• Deploy an experimental large scale computing infrastructure to allow any kind of experiments
• Experiments on any kind of grid (Virtual Supercomputer, Desktop Grid, …)
• Experimental conditions
• Configuration of the entire software stack
• from the application to the operating system
[Figure: a computer testbed stack (Application, Middleware, OS, (…), BIOS) next to the corresponding grid testbed stack (Grid Application, Grid Middleware, Grid OS, (…), Grid BIOS), the latter being Grid’5000.]
The Grid’5000 Project
• Building a nationwide experimental platform for large scale Grid & P2P experiments
• 9 geographically distributed sites
• Every site hosts a cluster (from 256 CPUs to 1K CPUs)
• All sites are connected by RENATER (French Res. and Edu. Net.)
• RENATER hosts probes to trace network load conditions
• Design and develop a system/middleware environment for safely testing and repeating experiments
• Use the platform for Grid experiments in real life conditions
• Port and test applications, develop new algorithms
• Address critical issues of Grid system/middleware:
• Programming, Scalability, Fault Tolerance, Scheduling
• Address critical issues of Grid Networking
• High performance transport protocols, QoS
• Investigate original mechanisms
• P2P resource discovery, Desktop Grids
[Figure: Grid’5000 planning, June 2003 to 2007. Phases: Discussions, Prototypes; Installations (Clusters & Net); Preparation, Calibration; First Experiments; Experiments; International collaborations (CoreGRID). Processor-count axis labelled 1250 CPUs, 2000, 2300, ~2500, 3500 and 5000, with "Funded" and "Today" markers.]
Grid’5000 foundations: Measurements and condition injection
• Quantitative metrics:
• Performance: execution time, throughput, overhead, QoS (batch, interactive, soft real time, real time).
• Scalability: resource occupation (CPU, memory, disc, network), application algorithms, number of users, number of resources.
• Fault-tolerance: tolerance to very frequent failures (volatility), tolerance to massive failures (a large fraction of the system disconnects), fault tolerance consistency across the software stack.
• Experimental condition injection:
• Background workloads: CPU, memory, disk, network, traffic injection at the network edges.
• Stress: high number of clients, servers, tasks, data transfers.
• Perturbation: artificial faults (crash, intermittent failure, memory corruptions, Byzantine), rapid platform reduction/increase, slowdowns, etc.
Allow users to run their favorite measurement tools and experimental condition injectors
Grid’5000 principle: A highly reconfigurable experimental platform
[Figure: the user software stack, from top to bottom: Application; Programming Environments; Application Runtime; Grid or P2P Middleware; Operating System; Networking; with measurement tools and an experimental conditions injector alongside.]
Let users create, deploy and run their software stack, including the software to test and their environment + measurement tools + experimental conditions injectors
Experiment workflow
• Log into Grid’5000
• Import data/codes
• Build an env.?
• yes: Reserve 1 node; Reboot node (existing env.*); Adapt env.; Reboot node; Env. OK? (if not, adapt again)
• no: continue directly
• Reserve nodes corresponding to the experiment
• Reboot the nodes in the user experimental environment (optional)
• Transfer params + Run the experiment
• Collect experiment results
• Exit Grid’5000
* Available on all sites: Fedora4all, Ubuntu4all, Debian4all
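The workflow on this slide can be read as a short decision procedure. The sketch below is only one reading of the flowchart: the function name, the `build_env` flag and the `adapt_rounds` parameter are illustrative assumptions and do not correspond to any Grid’5000 tool or command.

```python
def experiment_workflow(build_env, adapt_rounds=1):
    """Return the ordered steps of one experiment session.

    build_env: True if the user first builds a custom environment.
    adapt_rounds: hypothetical number of adapt/reboot cycles needed
                  before the environment is judged OK.
    """
    steps = ["log into Grid'5000", "import data/codes"]
    if build_env:
        # Environment-building branch: work on a single node first.
        steps += ["reserve 1 node", "reboot node (existing env.)"]
        for _ in range(adapt_rounds):
            steps += ["adapt env.", "reboot node"]
    # Common part: run the actual experiment.
    steps += [
        "reserve nodes corresponding to the experiment",
        "reboot nodes in the user environment (optional)",
        "transfer params + run the experiment",
        "collect experiment results",
        "exit Grid'5000",
    ]
    return steps


if __name__ == "__main__":
    for step in experiment_workflow(build_env=True, adapt_rounds=2):
        print(step)
```

Modelling the "Env. OK?" test as a fixed number of rounds keeps the sketch deterministic; in practice that check is a human judgment made after each reboot.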
Grid’5000 map
• Lille: 500 (106)
• Nancy: 500 (94)
• Rennes: 518 (518)
• Orsay: 1000 (684)
• Lyon: 500 (252)
• Bordeaux: 500 (96)
• Grenoble: 500 (270)
• Toulouse: 500 (116)
• Sophia Antipolis: 500 (434)
Should be red today at Orsay!
Rennes, Lyon, Sophia, Grenoble, Bordeaux, Toulouse, Orsay
Hardware Configuration
Grid’5000 network provided by RENATER
• 10 Gbps
• Dark fiber
• Dedicated lambda
• Fully isolated traffic!
Grid’5000 as an Instrument
• High security for Grid’5000 and the Internet, despite the deep reconfiguration feature
• Grid’5000 is confined: communications between sites are isolated from the Internet and vice versa (level 2 MPLS, dedicated lambda).
• A software infrastructure allowing users to access Grid’5000 from any Grid’5000 site and have a simple view of the system
• A user has a single account on Grid’5000; Grid’5000 is seen as a cluster of clusters, with 9 (1 per site) unsynchronized home directories
• Reservation/scheduling tools allowing users to select nodes and schedule experiments
• a reservation engine + batch scheduler (1 per site) + OAR Grid (a co-reservation scheduling system)
• A user toolkit to reconfigure the nodes
• a software image deployment and node reconfiguration tool
OS Reconfiguration techniques: Reboot OR Virtual Machines
Virtual Machine:
• No need for reboot
• Selecting a virtual machine technology is not so easy
• Xen has some limitations:
- Xen3 in “initial support” status for Intel VT-x
- Xen2 does not support x86/6
- Many patches not supported
- High overhead on high speed networks
Reboot:
• Remote control with IPMI, RSA, etc.
• Disc repartitioning, if necessary
• Reboot or kernel switch (Kexec)
Currently we use Reboot, but Xen will be used in the default environment.
Let users select their experimental environment: fully dedicated or shared within virtual machines
Resource usage: activity (Feb’06)
April: just before SC’06 and Grid’06 deadlines
Activity > 70%
Community: Grid’5000 users
345 registered users coming from 45 laboratories:
Univ. Nantes, Sophia, CS-VU.nl, FEW-VU.nl, Univ. Nice, ENSEEIHT, CICT, IRIT, CERFACS, ENSIACET, INP-Toulouse, SUPELEC, IBCP, IMAG, INRIA-Alpes, INSA-Lyon, Prism-Versailles, BRGM, INRIA, CEDRAT, IME/USP.br, INF/UFRGS.br, LORIA, UFRJ.br, LABRI, LIFL, ENS-Lyon, EC-Lyon, IRISA, RENATER, IN2P3, LIFC, LIP6, UHP-Nancy, France-telecom, LRI, IDRIS, AIST.jp, UCD.ie, LIPN-Paris XIII, U-Picardie, EADS, EPFL.ch, LAAS, ICPS-Strasbourg
About 230+ Experiments
About 200 Publications
A series of Events
Grid@work (October 10-14, 2005)
• Series of conferences and tutorials including Grid PlugTest (N-Queens and Flowshop Contests).
The objective of this event was to bring together ProActive users, to present and discuss current and future features of the ProActive Grid platform, and to test the deployment and interoperability of ProActive Grid applications on various Grids.
• The N-Queens Contest (4 teams), where the aim was to find the number of solutions to the N-queens problem, N being as big as possible, in a limited amount of time
• The Flowshop Contest (3 teams)
• 1600 CPUs in total: 1200 provided by Grid’5000 + 50 by the other Grids (EGEE, DEISA, NorduGrid) + 350 CPUs on clusters.
Don’t miss Grid@work 2006, Nov. 26 to Dec. 1: http://www.etsi.org/plugtests/Upcoming/GRID2006/GRID2006.htm
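To make the N-Queens contest task concrete, here is a minimal single-machine sketch that counts solutions with a bitmask backtracking search. It is not the contestants' code (their ProActive programs are not described on the slide); the function name is an assumption, and a grid version would typically distribute the subtrees below the first rows across CPUs.

```python
def count_n_queens(n):
    """Count placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column, all set when every row is filled

    def place(cols, diag_left, diag_right):
        if cols == full:          # a queen sits in every column: one solution
            return 1
        total = 0
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free    # lowest free square in the current row
            free -= bit
            total += place(
                cols | bit,
                ((diag_left | bit) << 1) & full,  # attacks shift left per row
                (diag_right | bit) >> 1,          # attacks shift right per row
            )
        return total

    return place(0, 0, 0)


if __name__ == "__main__":
    print(count_n_queens(8))  # the classic 8-queens count, 92
```

The running time grows very quickly with N, which is why the contest measured how large an N could be reached in a fixed time.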
Experiment: Geophysics: Seismic Ray Tracing in 3D mesh of the Earth
Stéphane Genaud, Marc Grunberg, and Catherine Mongenet
IPGS: “Institut de Physique du Globe de Strasbourg”
Building a seismic tomography model of the Earth's geology using seismic wave propagation characteristics in the Earth. Seismic waves are modeled from events detected by sensors. Ray tracing algorithm: waves are reconstructed from rays traced between the epicenter and one sensor.
An MPI parallel program composed of 3 steps:
1) Master-worker: ray tracing and mesh update by each process, with blocks of rays successively fetched from the master process,
2) all-to-all communications to exchange submesh information between the processes,
3) merging of cell information of the submesh associated with each process.
Reference: 32 CPUs
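The three-step structure described above can be illustrated without MPI. The sketch below uses threads and a shared queue in place of MPI processes and a master, and it replaces real ray tracing with a stand-in where each "ray" is simply the list of mesh cells it crosses. All names, the cell-ownership rule (`cell % n_workers`) and the data are assumptions for illustration, not the authors' program.

```python
import queue
import threading
from collections import Counter


def trace_block(block):
    """Stand-in for ray tracing: count how often each mesh cell is crossed."""
    hits = Counter()
    for ray in block:
        hits.update(ray)
    return hits


def run_master_worker(rays, n_workers=4, block_size=2):
    # Step 1: the "master" queue hands out blocks of rays on demand.
    blocks = queue.Queue()
    for i in range(0, len(rays), block_size):
        blocks.put(rays[i:i + block_size])
    local = [Counter() for _ in range(n_workers)]  # per-worker mesh updates

    def worker(rank):
        while True:
            try:
                block = blocks.get_nowait()
            except queue.Empty:
                return
            local[rank].update(trace_block(block))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Steps 2 and 3: every worker sends each cell's data to the cell's owner
    # (all-to-all exchange), and the owner merges what it receives.
    merged = [Counter() for _ in range(n_workers)]
    for source in local:
        for cell, count in source.items():
            merged[cell % n_workers][cell] += count
    return merged


if __name__ == "__main__":
    demo_rays = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [0, 4]]
    print(run_master_worker(demo_rays, n_workers=2))
```

Fetching blocks on demand balances load when workers run at different speeds, which matters on a grid with heterogeneous clusters; the merged result does not depend on which worker traced which block.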
JXTA DHT scalability
• Goals: study of a JXTA “DHT”
• “Rendezvous” peers form the JXTA DHT
• Performance of this DHT?
• Scalability of this DHT?
• Organization of a JXTA overlay (peerview protocol)
• Each rendezvous peer has a local view of other rendezvous peers
• Loosely-consistent DHT between rendezvous peers
• Mechanism for ensuring convergence of local views
• Benchmark: time for local views to converge
• Up to 580 nodes on 6 sites
• Observations:
• It takes 2 hours to contact all rendezvous peers
• With the default settings, the view of every rendezvous peer is limited to only 300 rendezvous peers
• The view of every rendezvous peer is very unstable
[Figure: JXTA overlay showing edge peers and rendezvous (rdv) peers.]
Fully Distributed Batch Scheduler
L. Rilling et al., 2006
• Motivation: evaluation of a fully distributed resource allocation service (batch scheduler)
• Vigne: unstructured network, flooding (random walk optimized for scheduling).
• Experiment: a bag of 944 homogeneous tasks / 944 CPUs
• Synthetic sequential code (Monte Carlo application).
• Measure of the mean execution time for a task (computation time depends on the resource)
• Measure of the overhead compared with an ideal execution (central coordinator)
• Objective: 1 task per CPU.
• Tested configuration: 944 CPUs: Bordeaux (82), Orsay (344), Rennes Paraci (98), Rennes Parasol (62), Rennes Paravent (198), Sophia (160)
• Duration: 12 hours
• Result: [figure not reproduced]
[Figure: unstructured overlay of nodes A to G.]
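The slide mentions measuring the overhead relative to an ideal execution but gives no formula or figures. A common definition, assumed here, is the relative difference between the measured mean task time and the ideal mean task time. The timings in the usage example are hypothetical; only the per-site CPU counts come from the slide.

```python
# Per-site CPU counts as listed for the tested configuration.
SITE_CPUS = {
    "Bordeaux": 82,
    "Orsay": 344,
    "Rennes Paraci": 98,
    "Rennes Parasol": 62,
    "Rennes Paravent": 198,
    "Sophia": 160,
}


def relative_overhead(measured_mean, ideal_mean):
    """Overhead of the distributed scheduler, as a fraction of the ideal time.

    Assumed definition: (measured - ideal) / ideal.
    """
    if ideal_mean <= 0:
        raise ValueError("ideal_mean must be positive")
    return (measured_mean - ideal_mean) / ideal_mean


if __name__ == "__main__":
    # The site counts add up to the 944 CPUs of the experiment.
    print(sum(SITE_CPUS.values()))
    # Hypothetical timings in seconds, for illustration only.
    print(f"{relative_overhead(105.0, 100.0):.1%}")
```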
Large Scale experiment of DIET: A GridRPC environment
Raphaël Bolze
1120 clients submitted more than 45 000 REAL GridRPC requests (dgemm matrix multiply) to GridRPC servers
Objectives:
- Prove that the DIET environment is scalable.
- Test the functionalities of DIET at large scale
7 sites: Lyon, Orsay, Rennes, Lille, Sophia, Toulouse, Bordeaux
8 clusters - 585 machines - 1170 CPUs.
Solving the Flow-Shop Scheduling Problem
E. Talbi, N. Melab, 2006
“one of the hardest challenge problems in combinatorial optimization”
• Schedule a set of jobs on a set of machines, minimizing the makespan.
• Job order must be respected and each machine can execute 1 job at a time.
• Complexity is very high for large instances (number of possible schedules).
• Exhaustive enumeration of all combinations would take several years.
• The challenge is thus to reduce the number of explored solutions.
• But the problem cannot be efficiently solved without computational grids.
• New Grid exact method based on the Branch-and-Bound algorithm (Talbi, Melab, et al.), combining new approaches of combinatorial algorithmics, grid computing, load balancing and fault tolerance.
• Problem: 50 jobs on 20 machines, optimally solved for the 1st time, with 1245 CPUs (peak)
• Using simultaneously Grid5000 and other clusters
• Involved Grid5000 sites (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse.
• The optimal solution required a wall-clock time of 1 month and 3 weeks.
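As a small-scale illustration of the problem, the sketch below computes the makespan of a permutation flow-shop schedule and runs a plain sequential branch-and-bound with a deliberately simple lower bound. It is not the grid method of Talbi, Melab et al.; the bound, the function names and the 3-job instance are assumptions chosen for readability.

```python
def extend(finish, job, proc):
    """Completion times per machine after appending `job` to the schedule."""
    new = list(finish)
    for m in range(len(new)):
        ready = new[m - 1] if m > 0 else 0   # job done on the previous machine
        new[m] = max(new[m], ready) + proc[job][m]
    return new


def makespan(order, proc):
    """Makespan of processing jobs in `order`; proc[j][m] is job j's time on machine m."""
    finish = [0] * len(proc[0])
    for job in order:
        finish = extend(finish, job, proc)
    return finish[-1]


def branch_and_bound(proc):
    """Return (best_makespan, best_order) by exploring job permutations."""
    best = [float("inf"), None]

    def search(prefix, finish, remaining):
        if not remaining:
            if finish[-1] < best[0]:
                best[0], best[1] = finish[-1], prefix
            return
        # Lower bound: the last machine still has to run every remaining job.
        bound = finish[-1] + sum(proc[j][-1] for j in remaining)
        if bound >= best[0]:
            return  # prune: this subtree cannot beat the incumbent
        for job in sorted(remaining):
            search(prefix + [job], extend(finish, job, proc), remaining - {job})

    search([], [0] * len(proc[0]), frozenset(range(len(proc))))
    return best[0], best[1]


if __name__ == "__main__":
    times = [[3, 2], [1, 4], [2, 1]]  # 3 jobs on 2 machines (hypothetical)
    print(branch_and_bound(times))
```

A 50-job instance has 50! permutations, so the quality of the bound and the distribution of subtrees across CPUs decide whether the search finishes at all; the bound used here is far weaker than what an exact method at that scale would need.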
TCP limits over 10Gb/s links
P. Primet et al., 2006
• Highlighting TCP stream interaction issues in very high bandwidth links (congestion collapse) and poor bandwidth fairness
• Grid’5000 10Gb/s connections evaluation
• Evaluation of TCP variants over Grid’5000 10Gb/s links (BIC TCP, H-TCP, Westwood…)
Interaction of 10 1Gb/s TCP streams over the 10Gb/s Rennes-Nancy link during 1 hour: aggregated bandwidth of 9.3 Gb/s over a time interval of a few minutes, then a very sharp drop in the bandwidth of one of the connections.
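The slide reports poor bandwidth fairness without naming a metric. One widely used measure is Jain's fairness index, J = (Σx)² / (n · Σx²), which equals 1 when all streams get the same share and 1/n when a single stream takes everything. Using it here is an assumption, and the throughput values below are hypothetical, not measurements from the experiment.

```python
def jain_index(throughputs):
    """Jain's fairness index of a list of per-stream throughputs."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one stream")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("all throughputs are zero")
    return (total * total) / (n * squares)


if __name__ == "__main__":
    # Hypothetical per-stream rates in Gb/s for 10 streams.
    even = [0.93] * 10                 # equal shares of 9.3 Gb/s
    one_dropped = [1.0] * 9 + [0.05]   # one connection collapses
    print(round(jain_index(even), 3))
    print(round(jain_index(one_dropped), 3))
```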
Grid’5000 main achievements in 2006
• A large scale and highly reconfigurable Grid experimental platform
• Used by Master's students, Ph.D. students, postdocs and researchers (and results are presented in their reports, theses, papers, etc.)
• Grid’5000 offers in 2006:
• 9 clusters distributed over 9 sites in France,
• about 10 Gigabit/s (directional) of bandwidth
• the capability for all users to reconfigure the platform [protocols/OS/Middleware/Runtime/Application]
• Grid’5000 results in 2006:
• 300+ users
• ~200 publications
• ~230 planned experiments
• Grid’5000 has been open to French Grid researchers since July 2005
• Grid’5000 is open to other communities in 2006 (CoreGRID)
• Grid’5000 winter school (Philippe d’Anfray, ~January 2007)
• Connection to other Grid experimental platforms
• Netherlands (from October 2006), Japan (under discussion)
• Sustainability ensured by INRIA after 2007
Concluding remarks
• GRID in its wider definition
• Computing, data and knowledge Grids, P2P
• Not only focusing on the use of Supercomputers… neither on Globus…
• An emphasis on middleware but also on applications/algorithms to make them Grid-aware
• The French ACI GRID led to many European initiatives
• Several groups of the ACI GRID projects are involved in EU funded projects (almost absent in FP5, involved in 10 projects in FP6 and leader of 3 projects)
• The idea to set up a Network of Excellence in Grid Research came from the ACI GRID (M. Cosnard)
• On-going discussions to have a European dimension of Grid’5000 funded under the 7th Framework Programme
• Funding for Grid research is still available
• Through the “Agence Nationale de la Recherche”
• To get more information about the ACI GRID
• http://www-sop.inria.fr/aci/grid
• Thierry.Priol@inria.fr
[Figure: Oct. 2006, DAS3 (1500 CPUs) and Grid’5000 (2600 CPUs).]
Announcement: Project consultation meeting
Bridging Global Computing with Grid (BIGG)
In conjunction with (event logo not reproduced), November 28-29, 2006
The objective of the workshop is to provide a direct gateway facilitating interactions between two different communities: Grid & Global Computing