ACI GRID ASP: Client-Server Approach for Simulation over the GRID. Frédéric Desprez, LIP ENS Lyon, ReMaP Project
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform
INTRODUCTION • Future of parallel computing: distributed and heterogeneous • Metacomputing/Grid Computing = using distributed sets of heterogeneous platforms • Network Computing today! • SMP clusters with very fast processors, high-performance (and low-cost) networks, (almost) mature software • (Too) many projects • Target: many applications in many different fields (not only number crunching or embarrassingly parallel ones) • Some important problems: • algorithmic (data distribution, load balancing, latency-tolerant algorithms, ...) • system (administration, fault tolerance, security, resource localisation, …) • software (interoperability, code reuse, ...) • Global Grid Forum
INTRODUCTION, cont. • One long-term idea for Grid computing: renting computational power and memory capacity over the net • Very high potential • Need for PSEs (Problem Solving Environments) and ASPs (Application Service Providers) • Applications will always need more computational power and memory capacity • Some libraries or codes need to stay where they have been developed • Some confidential data must not travel over the net • Use of computational servers reachable through a simple interface • Still difficult to use for non-specialists • Almost no transparency • Security and fault-tolerance problems are generally not addressed well enough • Often application-dependent PSEs • No standards (CORBA, Java/Jini, sockets, …) to build the computational servers
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform
RPC and Grid Computing: GridRPC • One simple idea • Implement the (old!) RPC programming model over the GRID • Use computational resources available over the net • Applications that have huge computational and/or data storage needs • Task-parallel programming model (synchronous and asynchronous calls) + data parallelism on the servers themselves, i.e., mixed parallelism • Features needed • Load balancing (resource localisation and performance evaluation, scheduling), • Simple interface, • Data distribution and migration, • Security, • Fault tolerance, • Interoperability with other systems, …
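The call model above boils down to remote invocations with a blocking and a non-blocking variant. A minimal Python sketch of that pattern (the method names loosely echo the grpc_call / grpc_call_async / grpc_wait functions of the GridRPC API proposal, but this wrapper and the dgemm stand-in are illustrative assumptions, not the real API):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote numerical service (2x2 matrix product).
def dgemm(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

class GridRPCClient:
    """Sketch of the GridRPC call model: a synchronous call blocks until
    the result is back; an asynchronous call returns a session handle
    the client can later wait on, overlapping its own work meanwhile."""
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, func, *args):        # like grpc_call: blocking
        return func(*args)

    def call_async(self, func, *args):  # like grpc_call_async: non-blocking
        return self._pool.submit(func, *args)

    def wait(self, session):            # like grpc_wait: join one session
        return session.result()

client = GridRPCClient()
a = [[1, 0], [0, 1]]
b = [[2, 3], [4, 5]]
print(client.call(dgemm, a, b))         # synchronous call
s = client.call_async(dgemm, a, b)      # asynchronous call; do other work here
print(client.wait(s))
```

The same handle-based pattern extends to task parallelism: issuing several call_async requests to different servers gives the client-side half of the mixed-parallelism model.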
RPC and Grid Computing: GridRPC, cont. • Five fundamental components: • Client: provides several user interfaces and submits requests to servers • Server: receives clients' requests and executes the software modules on their behalf • Database: stores the static and dynamic data about the software and hardware resources • Scheduler: catches clients' requests and decides how to map the tasks onto the servers, depending on the data stored in the database • Monitor: dynamically monitors the status of computational resources and stores the obtained information in the database
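How the scheduler uses the database can be sketched in a few lines. Everything here is an assumption for illustration (the field names, the load model, the server names): the scheduler filters the database for servers offering the requested routine and picks the one with the lowest estimated completion time.

```python
# Hypothetical snapshot of the database component: static data (which
# routines each server offers, peak speed in Mflop/s) and dynamic data
# (current load), as the monitor component would keep refreshing it.
database = {
    "s1": {"routines": {"dgemm", "dgesv"}, "mflops": 800,  "load": 0.9},
    "s2": {"routines": {"dgemm"},          "mflops": 400,  "load": 0.1},
    "s3": {"routines": {"dgesv"},          "mflops": 1200, "load": 0.2},
}

def schedule(routine, flops, db):
    """Sketch of the scheduler component: among the servers offering the
    routine, return the name of the one with the lowest estimate."""
    def est_time(info):
        # naive model: effective speed shrinks linearly with current load
        effective = info["mflops"] * (1.0 - info["load"])
        return flops / max(effective, 1e-9)
    candidates = {n: i for n, i in db.items() if routine in i["routines"]}
    if not candidates:
        raise LookupError(f"no server offers {routine}")
    return min(candidates, key=lambda n: est_time(candidates[n]))

print(schedule("dgemm", 2e9, database))  # the lightly loaded s2 wins
```

A real database would of course also carry memory capacity and network information; the point is only the filter-then-minimize shape of the decision.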
ASP Scheme [Diagram: the client submits the request Op(C, A, B) to the AGENT(s); the agent answers "S2!"; the client ships A, B, C to server S2 (among S1, S2, S3, S4) and gets back the answer (C)]
RPC and Grid Computing: GridRPC, cont. • Middleware between portals and Grid components • Basic tools for the deployment of large-scale environments (Web portals, Problem Solving Environments, Grid toolkits, …) • Big success with several applications • Discussed in the Advanced Programming Models (APM) working group of the Global Grid Forum • GridRPC client API proposed [Image: SCIRun torso defibrillator application, Chris Johnson, U. of Utah]
RPC and Grid Computing: GridRPC, related problems • Security • Authentication and authorization • Data transfers • Fault tolerance • Servers or agents • Interoperability • Problem description • API • Data management • Data persistence • Data (re)distribution • Garbage collection • Checkpointing • Fast parallel I/O • Scalability • Hierarchy of servers/agents • User assistance/PSE • Automatic choice of solutions • Resource localization • Hardware and software • Scheduling • On-line or off-line scheduling • Sharing servers between users • Security problems • Lock/unlock, data consistency, race conditions • Performance evaluation • Heterogeneity • Batch systems • Data visualization • Scalability problems • Dynamic platform • Resource localization • Agents/servers mapping
RPC and Grid Computing: GridRPC, cont. • Some available tools • NetSolve (University of Tennessee, USA) • Ninf and OmniRPC (Japan) • DIET (ReMaP, ARES, LIFC, Résédas) • Based on CORBA • NEOS, Meta-NEOS (Argonne National Lab., USA) • Combinatorial optimization problems • RCS (ETH Zürich) • ScaLAPACK servers • NIMROD, NIMROD-G (Monash University, Australia)
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform
Project Overview • Multi-disciplinary project • Rent computational power and memory capacity over the net • Four applications with different needs and different behavior • Develop a toolbox for the deployment of application servers • Study the impact of these applications on our environment and adapt it to these new needs • A highly hierarchical and heterogeneous network (VTHD + networks of the labs involved in the project) • A software architecture developed in an RNTL project (GASP)
Experimentation platform: VTHD • High-speed network between INRIA research centers (2.5 Gb/s) and several other research institutes • Connecting several PC clusters, SGI O2K machines, and virtual reality caves • Ideal test platform for our developments • RNRT project • Several Grid computing projects • Parallel CORBA objects, • Grid computing environments and multi-protocol communication layers, • Computational servers, • Code coupling, • Virtual reality, ...
ASP Partners • ReMaP – LIP ENS Lyon: F. Desprez, E. Caron, P. Combes, M. Quinson, F. Suter, Ing. X, Y • ARES – INSA Lyon: E. Fleury • Résédas – LORIA: Y. Caniou, E. Jeannot • SDRP – LIFC: J.-M. Nicod, L. Philippe, S. Contassot, F. Lombard • Physique Lyon 1, Physique ENS Lyon, MAPLI: J.-L. Barrat, V. Volpert • LST – ENS Lyon: G. Vidal • SRMSC Nancy: G. Monard • IRCOM: R. Quéré, R. Sommet
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform
Target Applications • Researchers from four different fields (chemistry, physics, electronics, geology) • Four applications with different needs and different behavior [Images: Digital Elevation Models, Molecular Dynamics, Microwave circuit simulation, HSEP]
Applications in ASP Mode • Study the target applications • Validate the parallel versions on servers • Develop client and server "glue" code and adapt DIET • Validate the prototype with non-specialist users • Adapt DIET if necessary
Digital Elevation Models (MNT) • Stereoscopic processing: • Maximal matching between the spots of both pictures • Elevation computation • Geometrical constraints • Optical disparities [Diagram: MNT binary files; view-angle information and coordinates of initial corresponding points] LST
Digital Elevation Models (MNT), cont. [Diagram: a geologist's client talks to the DIET AGENT(s), which dispatch requests to an MNT server (S1) and a maps server (S2)] LST
Digital Elevation Models (MNT), cont. • Specific needs: • Large amounts of memory • Large amounts of data • Visualization • ASP approach: • Computational power: • Processing high-definition pictures (e.g., pictures from the SPOT satellite, resolution < 5 m) • Reducing processing time (e.g., after an earthquake) LST
Molecular Dynamics • Simulation of atomic trajectories from molecular interactions • Hydrodynamics (velocity fields, temperature, etc.) • Mechanical properties of solids at a micro scale • Short-range interactions: partitioning gives good parallelism • Differential equation solving • Logs dumped on disk and exploited postmortem • Private and public codes Physique Lyon 1, Physique ENS Lyon, MAPLI
Molecular Dynamics, cont. [Diagram: a physicist's client talks to the DIET AGENT(s), which dispatch requests to two application servers (S1, S2)] Physique Lyon 1, Physique ENS Lyon, MAPLI
Molecular Dynamics, cont. • Specific needs: • High accuracy • Large systems • Disk logs • ASP approach: • Computational power • Checkpointing mechanisms on the grid Physique Lyon 1, Physique ENS Lyon, MAPLI
Potential Energy HyperSurface (HSEP) • Distributed computation of various points on a surface (quantum chemistry) • Existing software: • Gaussian (PSMN) • QC++ (free code) [Diagram: molecular configuration mapped to computed points on the surface] SRSMC
HSEP, cont. [Diagram: a chemist's client and a database of computed points (DB) talk to the DIET AGENT(s), which dispatch requests to a Gaussian server (S1) and a QC++ server (S2)] SRSMC
HSEP, cont. • Specific needs: • A relational DB (MySQL) storing all computations, done and to be done • A Web interface (HTTP+PHP) linking the client to the RDB and DIET • Results filtering through Python scripts • Complexity: O(N4) • ASP approach: • DB as a DIET client • Security • Coarse-grain parallelism SRSMC
Microwave Circuits Simulation • Direct coupling between transport equations of Hetero-junction Bipolar Transistors and a circuit simulator, for coupled microwave circuit/component design • Coupling between • Physical simulator of HBT • Circuit simulator • Thermal reduced model derived from 3D finite element simulation • Integrated simulator • Analysis tool, predictive and "process" oriented (co-design of the circuit and the transistor devices for a given application: amplifier, mixer, ...) IRCOM
Microwave Circuits Simulation, cont. [Diagram: the client talks to the DIET AGENT(s), which dispatch requests to a simulation server (S1) and a sparse solver server (S2)] IRCOM
Microwave Circuits Simulation, cont. • Large systems to solve • Clients look for fast and efficient sparse solvers • Simulator source code may be confidential • Dedicated servers for physical simulation, reachable through DIET, provide the part of the Jacobian matrix needed to build the large system to solve IRCOM
Metacompil (CRI, École des Mines de Fontainebleau) [Diagram: the client submits a problem and source code to the DIET AGENT, which dispatches to a compilation server (S1) producing parallelized source code, and to an application server (S2)]
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform
Where do we start from? • 1998-2000: ARC INRIA OURAGAN, tools for the resolution of large-scale numerical problems • Parallelization of Scilab (PVM, MPI, PBLAS, BLACS, ScaLAPACK, Pastix, NetSolve) • Use of Scilab in front of computational servers (parallel or sequential) • NetSolve optimization (data persistence, development of an environment for the evaluation of communication and computational performance) • ReMaP, Métalau, Résédas, LIFC, LaBRI
Our first view of computational servers • Ideas • Scilab as a first target application • Simplify the use of new libraries (sparse system libraries) • Benefit from the development of software components around Grid computing • Develop a toolkit for the deployment of computational servers • First prototype developed from existing software modules • NetSolve (University of Tennessee, Knoxville) • NWS (UCSB and UTK) for the dynamic evaluation of performance • Our developments on libraries (data redistribution routines, sparse solvers, out-of-core routines) • LDAP software database and CORBA for the server management
Our first goals • Add some features to NetSolve for our applications • Data persistence on servers • Data redistribution and parallelism between servers • Better evaluation of [routine, machine] pairs for fine-grain computation • Portable database for available libraries (LDAP-based) • Get an experimentation platform for our developments • Mixed parallelism (data- and task-parallelism) • Scheduling heuristics for data-parallel tasks • Parallel algorithms for heterogeneous platforms • Performance evaluation • Server management using CORBA
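The data-persistence goal can be made concrete with a small sketch. The class and handle names below are illustrative assumptions, not the NetSolve or DIET API: intermediate results stay on the server under a handle, so a chain of calls reuses them instead of shipping data back through the client between each call.

```python
class DataServer:
    """Sketch of server-side data persistence: uploaded data and
    intermediate results live in a handle-indexed store on the server,
    so chained calls avoid re-sending the data from the client."""
    def __init__(self):
        self._store = {}        # handle -> data kept server-side
        self.items_received = 0 # stand-in for bytes transferred in

    def upload(self, handle, data):
        self.items_received += len(data)
        self._store[handle] = data

    def call(self, func, handle_in, handle_out):
        # operate on persistent data; keep the result persistent too
        self._store[handle_out] = func(self._store[handle_in])
        return handle_out

    def download(self, handle):
        return self._store[handle]

srv = DataServer()
srv.upload("A", list(range(1000)))                 # one transfer in
srv.call(lambda v: [x * 2 for x in v], "A", "B")   # reuses A on the server
srv.call(sum, "B", "s")                            # chains on B, no transfer
print(srv.download("s"), srv.items_received)
```

Only the initial upload moves data; the two chained calls work entirely server-side, which is exactly the traffic the persistence extension was meant to eliminate.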
NetSolve over VTHD [Diagram: clients, an agent, and servers distributed over the VTHD network]
NetSolve Behavior • Intensive use [chart] • VTHD network • Clients: Rennes cluster (paraski) • Scheduler: NetSolve agent (Rocquencourt) • Server: paraski26 (paraski)
DIET (Distributed Interactive Engineering Toolbox) [Diagram: a hierarchy of AGENTs, each with a scheduler, backed by a distributed software database and a distributed performance database; clients in C, Fortran, or Java; servers S1 (direct connection), S2 (visualization server), S3 (batch system with a local scheduler)]
DIET Goals • Our goals: • Develop a toolbox for the deployment of ASP environments with different applications • Use standard (and public domain) software as much as possible • Obtain a high-performance and scalable environment • Implement our more theoretical results in this environment (scheduling, data (re)distribution, performance evaluation, algorithms for heterogeneous platforms) • Use CORBA, NWS, LDAP and our software components (SLiM and FAST) • Different applications (simulation, compilation, …) • ReMaP, ARES, Résédas, LIFC, Sun Labs (RNTL GASP) http://www.ens-lyon.fr/~desprez/DIET/
Hierarchical Architecture • Hierarchical architecture for scalability • Information distributed across the entire tree • Plug-in schedulers • Data persistence [Diagram: Master Agents (MA) at the top, Local Agents (LA) below, computational server front-ends at the leaves; direct connection between client and server]
DIET AGENT(s) • Distributed set of agents for improved scalability • Study of several connection schemes between agents (hierarchical, distributed, duplicated agents, …) and of agent mapping • Tree-based scheduling algorithms with information distributed in each node in the hierarchical approach • Connection to FAST to gather information about resources, and to SLiM to find the available applications • Different generic and application-dependent schedulers • CORBA, JXTA [Diagram: clients (C), agents (A), and servers (S) arranged in a hierarchy]
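The tree-based scheduling idea can be sketched as a recursive best-answer aggregation: a request travels down the agent tree, each server leaf answers with its own estimate, and each agent keeps only the best answer on the way back up. The class names and numbers below are illustrative assumptions, not DIET code.

```python
class Server:
    """Leaf of the hierarchy: answers a request with its own
    estimated completion time (here a fixed number for illustration)."""
    def __init__(self, name, est):
        self.name, self.est = name, est
    def best(self, request):
        return (self.est, self.name)

class Agent:
    """Master or local agent: forwards the request to its children and
    keeps only the best (time, server) pair on the way back up."""
    def __init__(self, children):
        self.children = children
    def best(self, request):
        return min(child.best(request) for child in self.children)

# A small MA -> LA -> servers hierarchy.
tree = Agent([
    Agent([Server("s1", 4.2), Server("s2", 1.3)]),
    Agent([Server("s3", 2.7)]),
])
print(tree.best("dgemm"))  # -> (1.3, 's2')
```

Because each agent only compares its children's answers, no node needs a global view of the platform, which is what makes the hierarchy scale.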
Performance Evaluation • Performance evaluation of the GridRPC platform • Finding one (or many) efficient server(s) (computational cost of the requested function, server load, communication costs between the client and the server, memory capacity, …) • Performance database for the scheduler • Hard to accurately model (and understand) networks like the Internet or VTHD • Need for a small response time • Need to model applications (a problem for applications whose execution time depends on the input data) • Accounting
FAST: Fast Agent's System Timer • NWS-based (Network Weather Service from UCSB) • Computational performance • Load, memory capacity, and performance of batch queues (dynamic) • Benchmarks and modeling of available libraries (static) • Communication performance • To be able to guess the data redistribution cost between two servers (or client to server) as a function of the network architecture and dynamic information • Bandwidth and latency (hierarchical) • Hierarchical set of agents • Scalability problems
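Combining the static and dynamic sides of FAST amounts to summing a communication term and a computation term. The formula and every field name below are assumptions sketched for illustration, not the FAST interface:

```python
def estimate(flops, in_bytes, out_bytes, server):
    """Illustrative FAST-like estimate: static benchmark data for the
    routine (flops) combined with dynamic measurements (load, latency,
    bandwidth). All field names are assumptions, not the FAST API."""
    # communication: one latency plus data volume over the measured bandwidth
    comm = server["latency"] + (in_bytes + out_bytes) / server["bandwidth"]
    # computation: work over the load-degraded speed of the server
    comp = flops / (server["flops_per_s"] * (1.0 - server["cpu_load"]))
    return comm + comp

server = {"latency": 0.01, "bandwidth": 12.5e6,   # ~100 Mb/s link
          "flops_per_s": 1e9, "cpu_load": 0.5}
t = estimate(2e9, 8e6, 8e6, server)
print(round(t, 2))  # seconds: communication + computation
```

Estimates like this, computed per candidate server, are exactly what a scheduler needs to rank servers; their accuracy depends on how fresh the dynamic measurements are, hence the coupling with NWS.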
Availability of the System: NWS • Network Weather Service (Wolski, UCSB) • Measures the availability of resources • CPU load, bandwidth, etc. • Forecasts the variations with statistics • Extensible and open • Used by many projects (Globus, NetSolve, Ninf, etc.) [Diagram: a client queries the name server; sensors run tests and store measurements in memory/data storage; a forecaster answers requests from the stored series]
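The "forecast with statistics" idea can be illustrated in miniature: run several cheap predictors over the measurement history, and trust the one that would have made the smallest error so far. NWS works on this replay-and-select principle; the three predictors below are illustrative stand-ins, not NWS's actual predictor set.

```python
def forecast(history):
    """Replay each predictor over the history, score it by mean
    absolute error, and return (best predictor name, its prediction)."""
    predictors = {
        "last_value": lambda h: h[-1],
        "mean":       lambda h: sum(h) / len(h),
        "median":     lambda h: sorted(h)[len(h) // 2],
    }
    def past_error(p):
        # predict h[i] from the prefix h[:i], accumulate absolute error
        errs = [abs(p(history[:i]) - history[i])
                for i in range(1, len(history))]
        return sum(errs) / len(errs)
    best = min(predictors, key=lambda n: past_error(predictors[n]))
    return best, predictors[best](history)

# Hypothetical CPU-load measurements from a sensor.
cpu_load = [0.30, 0.32, 0.31, 0.33, 0.32, 0.31]
print(forecast(cpu_load))
```

On a noisy but stationary series like this one, the running mean tends to win; on a series with a trend, last_value would take over, which is the point of selecting dynamically.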
Overall Architecture [Diagram: the client application combines a structural approach (needs modeling) with a run-time library (system availabilities); LDAP and NWS feed the run-time library; at installation time, benchmarkers populate the database]
Time Modeling of DGEMM [chart; mean error: 1%]
Performance Forecasting: Complex Matrix Multiplication [chart; mean error: 15%, maximum shown: 23%]
NWS Optimization: Collaboration with the Scheduler [timeline diagram: execution of a task, surrounded by idle times]