Grid’5000 A Nation Wide Experimental Grid
Grid: a renewal of distributed system research problems
• Grid raises a lot of research issues: security, performance, fault tolerance, load balancing, fairness, coordination, message passing, data storage, programming, communication protocols and architecture, deployment, etc.
• Theoretical models and simulators cannot capture real-life conditions
• Production platforms have great difficulty reproducing experimental conditions
• How to test and compare?
  - Fault tolerance protocols
  - Security mechanisms
  - Deployment tools
  - etc.
Tools for Distributed System Studies
To investigate distributed system issues, we need:
1) Tools (models, simulators, emulators, experimental platforms)
2) Strong interaction between these research tools
[Figure: Tools for Large Scale Distributed Systems. Horizontal axis: log(realism); vertical axis: log(cost). Four classes of tools, from least to most realistic and costly:
- math: models of systems, applications, platforms and conditions
- simulation: key system mechanisms, algorithms and application kernels; virtual platforms; synthetic conditions
- emulation: real systems and real applications; “in-lab” platforms; synthetic conditions
- live systems: real systems, real applications, real platforms, real conditions]
We need a Grid experimental platform
To our current knowledge, there is no large scale testbed dedicated to Grid experiments
• Grid’5000 as a live system
• Grid eXplorer as a large scale emulator
[Figure: existing tools placed on the same axes, log(realism) horizontally and log(cost) vertically:
- math: models, protocol proofs
- simulation: SimGrid, MicroGrid, Bricks, NS, etc.
- emulation: Grid eXplorer, WANinLab, Emulab
- live systems: Grid’5000, TERAGrid, PlanetLab, Naregi Testbed]
What do we need for Grid experiments?
• Remotely controllable Grid nodes installed in geographically distributed laboratories
• A « controllable » and « monitorable » network between the Grid nodes
• A middleware infrastructure connecting the nodes (security)
• A playground to prepare experiments
• A toolkit to deploy, manage and run experiments and collect results
The Grid’5000 Project
1) Build a nationwide experimental platform for Grid research (like a particle accelerator for computer scientists)
• 10/11 geographically distributed sites
• Every site hosts a cluster (from 256 CPUs to 1K CPUs)
• All sites are connected by RENATER (the French academic network)
• RENATER hosts probes to trace network load conditions
• Design and develop a system/middleware environment to safely test and repeat experiments
2) Use the platform for Grid experiments
• Address critical issues of Grid system/middleware: programming, scalability, fault tolerance, scheduling
• Address critical issues of Grid networking: high performance transport protocols, QoS
• Port and test applications
• Investigate original mechanisms: P2P resource discovery, Desktop Grids
Grid’5000 Committees
(organizer: Franck Cappello, Orsay)
Technical Committee:
- David Gueldrech (Sophia)
- Jean Claude Barbet (Orsay)
- Franck Bonnassieux (UREC)
- Julien le duc (Grenoble)
- Fred Desprez (Lyon)
- Yvon Jégou (Rennes)
- Olivier Coulaud (Bordeaux)
- Frédéric Barbaresco (Toulouse)
Steering Committee:
- Thierry Priol (ACI Grid Director)
- Brigitte Plateau (President of ACI Grid SC)
- Dani Vandrome (Director of Renater)
- Frédéric Desprez (Lyon)
- Michel Daydé (Toulouse)
- Yvon Jégou (Rennes)
- Stéphane Lantéri (Sophia)
- Raymond Namyst (Bordeaux)
- Pascale Primet (Lyon)
- Olivier Richard (Grenoble)
Forums:
- Deployment/exploitation: Franck Cappello (AS1, RTP8)
- Programming models: Raymond Namyst (AS2, RTP8)
Grid’5000 Schedule
[Figure: project timeline. Time marks: Sept 03, Nov 03, Jan 04, March 04, June/July 04, Sept 04, Oct 04, Nov 04.
Milestones shown: Call for Expression of Interest; Vendor selection; Installation, first tests; Final review; First demo (SC04); Call for proposals; Selection of 7 sites; ACI GRID funding.
Activities shown: Grid’5000 Hardware; Grid’5000 System/middleware Forum; Security prototypes; Control prototypes; Grid’5000 Programming Forum; Grid’5000 Builder Community; Grid’5000 Experiments.]
Grid’5000 Funding (ACI + Local District/Prefecture)
[Figure: funding amounts shown per site: 0.6 M€, ~0.4 M€, ~0.5 M€, ~0.35 M€, ~0.5 M€, ~0.3? M€, ~0.35 M€]
Grid’5000 2004: ~3 M€ for hardware only
Grid’5000 in September 2004
3 Grid’5000 nodes (soon 4)
Grid’5000 prototype (Control)
[Figure: control architecture across a control site and Sites 1, 2 and 3.
- Control site: users (ssh login + password), front end, Control Master
- Each site: LAB/firewall router, Control Slave, firewall/NAT, boot server + DHCP, test cluster, lab’s network
- The Control Master drives the Control Slaves with SSH commands: rsync (kernel, dist) and orders (boot, reset)]
System kernels and distributions are downloaded from a boot server. They are uploaded by the users as system images.
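The control flow described above (a Control Master that pushes kernels and distributions with rsync, then sends boot or reset orders over SSH to a Control Slave at each site) could be sketched roughly as follows. This is only an illustration under stated assumptions: the host names, the paths and the `g5k-node-ctl` command are hypothetical placeholders that do not come from the slides, and the sketch only builds the command lines without running them.

```python
from typing import Dict, List

# The two orders named on the slide.
ALLOWED_ORDERS = {"boot", "reset"}


def rsync_image_cmd(slave: str, image_dir: str, dest_dir: str) -> List[str]:
    """Build an rsync-over-ssh command that pushes a system image to a slave."""
    source = image_dir.rstrip("/") + "/"  # trailing slash: copy the directory contents
    return ["rsync", "-az", "-e", "ssh", source, f"{slave}:{dest_dir}"]


def order_cmd(slave: str, order: str, node: str) -> List[str]:
    """Build the ssh command that asks a slave to boot or reset one node."""
    if order not in ALLOWED_ORDERS:
        raise ValueError(f"unsupported order: {order!r}")
    # "g5k-node-ctl" is a made-up name standing in for whatever the slave runs.
    return ["ssh", slave, "g5k-node-ctl", order, node]


def plan_deployment(image_dir: str, dest_dir: str,
                    nodes_by_slave: Dict[str, List[str]]) -> List[List[str]]:
    """For each site: push the image once, then boot every node of that site."""
    plan: List[List[str]] = []
    for slave, nodes in nodes_by_slave.items():
        plan.append(rsync_image_cmd(slave, image_dir, dest_dir))
        for node in nodes:
            plan.append(order_cmd(slave, "boot", node))
    return plan


if __name__ == "__main__":
    # Hypothetical site layout, used only to show the shape of the plan.
    sites = {"slave.site1.example": ["node1", "node2"]}
    for cmd in plan_deployment("/images/demo", "/srv/boot/demo", sites):
        print(" ".join(cmd))
```

Keeping the plan as a list of argument vectors makes it possible to review or log the commands before handing them to a process runner.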
Grid’5000 prototype (test)
[Figure: test architecture across a control site and Sites 1, 2 and 3.
- Control site: users (ssh login + password), front end, Control Master
- Each site: LAB/firewall router, Control Slave, firewall/NAT, test cluster, lab’s network
- Gateway + VPN (192. for all nodes)
- Other labels: One machine; Can be seen as a Virtual Grid; Gateway]
Grid5000@Orsay
• Grid’5000 control
  - Experiment automation
  - VGrid « mapping a virtual Grid on a real testbed »
• Fault tolerance
  - Fault tolerant Grid-RPC (RPC-V)
  - Hierarchical fault tolerant MPI (MPICH-V)
• MPI environment
  - Time sharing Grid resources
  - Migration over clusters with heterogeneous high speed networks
• Global Computing/P2P middleware (XtremWeb)
  - Executing Web Services on Desktop Grid workers
  - Distributing the coordination in Desktop Grids
  - Harnessing clusters as parallel workers
Grid5000@Lyon (GRAAL Team)
• Large scale experimentation of the DIET platform (Distributed Interactive Engineering Toolbox)
  - Client/Agent/Server model following the GridRPC standard with distributed scheduling agents
• Hierarchical and distributed scheduling
  - Validation of distributed scheduling (hierarchical, distributed, duplicated) and associated heuristics
• Automatic deployment
  - Agent mapping depending on the characteristics of the target architecture (network hierarchy, dynamicity, performance variations)
• Mixed parallelism (task and data parallelism)
  - Scheduling of requests one after another (online scheduling) or using sets of requests (static knowledge of the dependences and application graph)
• Mixing data management and task scheduling
  - Replica management, data mapping and redistribution between clusters, task graphs
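The Client/Agent/Server model with hierarchical scheduling agents mentioned above can be illustrated with a toy model. The sketch below is not DIET's API: the class names, the `load` field and the "lowest load wins" rule are assumptions chosen only to show how a request can travel down a tree of agents while the best candidate server travels back up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Server:
    """Leaf of the hierarchy: offers some services and reports a cost estimate."""
    name: str
    services: Tuple[str, ...]
    load: float  # hypothetical cost estimate, lower is better

    def estimate(self, service: str) -> Optional[Tuple[float, str]]:
        # A server that does not offer the service returns no candidate.
        return (self.load, self.name) if service in self.services else None


@dataclass
class Agent:
    """Inner node: forwards the request to its children and keeps the best answer."""
    name: str
    children: List[Union["Agent", Server]] = field(default_factory=list)

    def estimate(self, service: str) -> Optional[Tuple[float, str]]:
        answers = [child.estimate(service) for child in self.children]
        candidates = [a for a in answers if a is not None]
        # Each agent only passes one candidate upward, so no node sees the whole grid.
        return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical two-level hierarchy: one root agent, two local agents.
    lyon = Agent("local-lyon", [Server("s1", ("dgemm",), 0.7),
                                Server("s2", ("dgemm", "fft"), 0.2)])
    rennes = Agent("local-rennes", [Server("s3", ("fft",), 0.1)])
    root = Agent("root", [lyon, rennes])
    print(root.estimate("dgemm"))
```

In this toy version a client would call `root.estimate(service)` and then contact the returned server directly, which mirrors the separation between scheduling agents and computational servers described on the slide.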
Grid5000@Lyon (RESO Team)
• End host communication layer
  - Intelligent usage of NICs for local and wide area communications
  - Direct file access over Myrinet: ORFA/NFS and ORFA/LUSTRE
• High performance long distance protocols
  - Alternative transport for very high speed networks (backpressure)
  - Differentiated transport with delay control on WAN
  - Reliable active and non-active multicast
• High speed network emulation
  - Automatic deployment of emulated high speed domains
  - Experiment design for grid flow interaction studies
• Grid networking layer
  - Network resource and QoS on demand
  - Grid overlay and programmable routers
  - Measurement services for network-aware middleware
Grid5000@Toulouse/Midi-Pyrénées
- Large-scale scientific computing applications (CFD, astrophysics, …)
- Multi-parametric applications
- ACI GRID-TLSE Project: expertise site for sparse linear algebra
- Climate modeling and Global Change
- DataGène Project: Functional genomic Management
- AROMA tool: management of resources over a Grid of clusters with different classes of services
- Mobile agents for open Grid management
- Management of Grids and hosted services (security, QoS, monitoring & control, dynamic configuration, …)
- Optimization for wide area distributed query processing
- Virtualization of data storage on Grids
Grid5000@Sophia
• JAVA programming on the grid (Oasis project-team)
  - ProActive: a JAVA library for parallel, distributed, concurrent computing with security and mobility
  - Assessment of scalability, deployment, security and fault tolerance issues
  - Hierarchical components architecture
  - Large scale experimentation of distributed applications
• JECS: a JAVA Environment for Computational Steering
  - Distributed computing and interactive visualization of 3D numerical simulations (Caiman and Oasis project-teams)
  - Collaborative environment
  - Computational Electromagnetism application (JEM3D)
• MECAGRID (ACI GRID project, Smash project-team)
  - Massively parallel computations in multi-material fluid mechanics
  - Study of numerical algorithms for heterogeneous computing platforms
• Grid computing for medical applications (Epidaure project-team)
  - Interoperable medical image registration grid service
• Optimal design of complex systems (Coprin project-team)
  - Evaluation of parallel optimization algorithms based on interval analysis techniques
  - Study of load balancing strategies on heterogeneous resources
Grid5000@Grenoble
• Grid’5000 control
  - Computing environment deployment (Ka-tools)
• Grid computing
  - Multi-cluster and lightweight Grid resource management (OAR/CIGRI)
  - Grid file system (NFSG)
  - Scheduling: data transfers, global communications, hierarchical work stealing, ...
  - Monitoring, benchmarking, performance characterisation and analysis
• Code coupling
  - Application coupling with Athapascan
  - Communication / method invocation rescheduling into ORB (HOMA)
• Fault tolerance / Security
  - Fault tolerance in a data-flow approach (Athapascan)
  - Probabilistic certification in peer-to-peer systems
• Miscellaneous
  - Collaborative tools in virtual 3D environments
Grid5000@Rennes
• Middleware
  - PadicoTM/Paco++ combining parallel and distributed computing (ACI GRID RMI)
  - Scalability of consistency protocol in DSM (ACI MD GDS)
  - Experimenting management services for textual documents in P2P systems
  - Data redistribution in Grid (ARC INRIA ReDist)
  - Coupling Computational Grid with Reality Center (INRIA SIAMES, RNTL SALOME2)
  - Task distribution and load balancing in heterogeneous Grid (ACI GenoGrid)
• Code coupling
  - Fluid transfer simulation and geological code with PadicoTM/Paco++ (ACI GRID HydroGrid)
• Networking
  - Network bandwidth optimization in Grid (VTHD++, Paco++)
Grid5000@Bordeaux
• Runtime systems for clusters of clusters and grids (RUNTIME INRIA Team)
  - High performance communication across heterogeneous networks
  - Communication libraries: Madeleine, MPICH/Madeleine
  - Efficient use of high speed networks (Myrinet, Infiniband, SCI)
  - Fast forwarding and multiplexing of data on gateway nodes
• Distributed Systems and Objects (SOD Team)
  - Tools to support the development, administration and usage of heterogeneous resources over the Grid
• Large-scale scientific computing applications (SCALAPPLIX INRIA Project)
  - Fluid mechanics, molecular dynamics and host-parasite systems in population dynamics, etc.
• Steering of numerical simulations (ACI GRID-EPSN Project)
  - Parallel on-line visualization / monitoring
  - Data redistribution
  - Computational steering by direct image manipulation