ACI GRID ASP Client-Server Approach for Simulation over the GRID



  1. ACI GRID ASP Client-Server Approach for Simulation over the GRID Frédéric Desprez LIP ENS Lyon ReMaP Project

  2. Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform RNTL

  3. INTRODUCTION • Future of parallel computing: distributed and heterogeneous • Metacomputing/Grid Computing = using distributed sets of heterogeneous platforms • Network Computing today! • SMP clusters with very fast processors, high-performance (and low-cost) networks, (almost) mature software • (Too) many projects • Target: many applications in many different fields (not only number crunching or embarrassingly parallel ones) • Some important problems: • algorithmic (data distribution, load-balancing, latency-tolerant algorithms, ...) • system (administration, fault-tolerance, security, resource localisation, …) • software (interoperability, code re-use, ...) • Global Grid Forum

  4. INTRODUCTION, cont. • One long-term idea for Grid computing: renting computational power and memory capacity over the net • Very high potential • Need for PSEs (Problem Solving Environments) and ASPs (Application Service Providers) • Applications will always need more computational power and memory capacity • Some libraries or codes need to stay where they were developed • Some confidential data must not travel over the net • Use of computational servers reachable through a simple interface • Still difficult to use for non-specialists • Almost no transparency • Security and fault-tolerance problems are generally not sufficiently addressed • Often application-dependent PSEs • No standard way (CORBA, Java/Jini, sockets, …) to build the computational servers

  5. Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform

  6. RPC and Grid Computing: GridRPC • One simple idea • Implement the (old!) RPC programming model over the GRID • Use computational resources available over the net • Applications that have huge computational and/or data storage needs • Task-parallel programming model (synchronous and asynchronous calls, see the client sketch below) + data-parallelism on the servers themselves = mixed parallelism • Features needed • Load-balancing (resource localisation and performance evaluation, scheduling), • Simple interface, • Data distribution and migration, • Security, • Fault-tolerance, • Interoperability with other systems, …
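A minimal client-side sketch of this programming model, written in the spirit of the GridRPC C API discussed later in the deck (grpc_call for synchronous calls, grpc_call_async/grpc_wait for asynchronous ones). The "matmul" service name, its argument list, and the configuration file are illustrative assumptions; the exact calls and signatures differ between NetSolve, Ninf, and DIET.

```c
#include <stdio.h>
#include "grpc.h"        /* GridRPC-style client API header (assumption) */

#define N 100

int main(int argc, char *argv[])
{
    grpc_function_handle_t handle;
    grpc_sessionid_t       session;
    static double A[N*N], B[N*N], C[N*N], D[N*N];

    if (argc < 2) { fprintf(stderr, "usage: %s config_file\n", argv[0]); return 1; }

    grpc_initialize(argv[1]);                        /* read the client configuration  */
    grpc_function_handle_default(&handle, "matmul"); /* let the agent choose a server  */

    /* Synchronous (blocking) call: returns once C = A x B is back. */
    grpc_call(&handle, N, A, B, C);

    /* Asynchronous call: submit, overlap client-side work, then wait. */
    grpc_call_async(&handle, &session, N, A, C, D);
    /* ... other client work or further submissions ... */
    grpc_wait(session);

    grpc_function_handle_destruct(&handle);
    grpc_finalize();
    return 0;
}
```

Several such asynchronous submissions against different handles give the task parallelism mentioned above, while each server may itself run a data-parallel (e.g. ScaLAPACK) implementation of the routine.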

  7. RPC and Grid Computing: GridRPC, cont. • Five fundamental components: • Client: provides several user interfaces and submits requests to servers • Server: receives client requests and executes the software modules on their behalf • Database: stores the static and dynamic data about the software and hardware resources • Scheduler: catches client requests and decides how to map the tasks onto the servers, depending on the data stored in the database • Monitor: dynamically monitors the status of computational resources and stores the obtained information in the database

  8. ASP Scheme (diagram) • The client submits a request Op(C, A, B) to the agent(s) • The agent answers with the chosen server (here S2, among S1-S4) • The client sends A, B, C to S2 and gets the answer (C) back

  9. RPC and Grid Computing: GridRPC, cont. • Middleware between portals and Grid components • Basic tools for the deployment of large-scale environments (Web portals, Problem Solving Environments, Grid toolkits, …) • Big success with several applications • Discussion in the Advanced Programming Models (APM) working group of the Global Grid Forum • GridRPC Client API proposed SCIRun torso defibrillator application – Chris Johnson, U. of Utah

  10. RPC and Grid Computing: GridRPC, related problems • Security • Authentication and authorization • Data transfers • Fault-tolerance • Servers or agents • Interoperability • Problem description • API • Data management • Data persistence • Data (re)distribution • Garbage collection • Check-pointing • Fast parallel I/O • Scalability • Hierarchy of servers/agents • User assistance/PSE • Automatic choice of solutions • Resource localization • Hardware and software • Scheduling • On-line or off-line scheduling • Sharing servers between users • Security problems • Lock/unlock, data consistency, race conditions • Performance evaluation • Heterogeneity • Batch systems • Data visualization • Scalability problems • Dynamic platform • Resource localization • Agents/servers mapping

  11. RPC and Grid Computing: GridRPC, cont. • Some available tools • NetSolve (University of Tennessee, USA) • Ninf and OmniRPC (Japan) • DIET (ReMaP, ARES, LIFC, Résédas) • Based on CORBA • NEOS, Meta-NEOS (Argonne National Lab., USA) • Combinatorial optimization problems • RCS (ETH Zürich) • ScaLAPACK servers • NIMROD, NIMROD-G (Monash University, Australia)

  12. Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform

  13. Project Overview • Multi-disciplinary project • Rent computational power and memory capacity over the net • Four applications with different needs and different behavior • Develop a toolbox for the deployment of application servers • Study the impact of these applications on our environment and adapt it to these new needs • A highly hierarchical and heterogeneous network (VTHD + networks of the labs involved in the project) • A software architecture developed in an RNTL project (GASP)

  14. Experimentation platform: VTHD • High-speed network between INRIA research centers (2.5 Gb/s) and several other research institutes • Connecting several PC clusters, SGI O2K machines, and virtual reality caves • Ideal test platform for our developments • RNRT project • Several Grid computing projects • Parallel CORBA objects, • Grid computing environments and multi-protocol communication layers, • Computational servers, • Code coupling, • Virtual reality, ...

  15. ASP Partners • ReMaP – LIP ENS Lyon: F. Desprez, E. Caron, P. Combes, M. Quinson, F. Suter, Ing. X, Y • ARES – INSA Lyon: E. Fleury • Résédas – LORIA: Y. Caniou, E. Jeannot • SDRP – LIFC: J.-M. Nicod, L. Philippe, S. Contassot, F. Lombard • Physique Lyon 1, Physique ENS Lyon, MAPLI: J.-L. Barrat, V. Volpert • LST – ENS Lyon: G. Vidal • SRSMC Nancy: G. Monard • IRCOM: R. Quéré, R. Sommet

  16. Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform

  17. Target Applications • Researchers from four different fields (chemistry, physics, electronics, geology) • Four applications with different needs and different behaviors Digital Elevation Models Molecular Dynamics Microwave circuits simulation HSEP

  18. Applications in ASP Mode • Study the target applications • Validate the parallel versions on the servers • Develop client and server « glues » and adapt DIET • Validate the prototype with non-specialist users • Adapt DIET if necessary

  19. Digital Elevation Models (MNT) • Stereoscopic processing: • Maximal matching between the spots of both pictures • Elevation computation • Geometrical constraints • Optical disparities (figure: binary input files plus view-angle information and coordinates of the initial corresponding points feed the MNT computation) LST

  20. Digital Elevation Models (MNT), cont. (diagram) • The geologist's client contacts the DIET agent(s) • The agent(s) dispatch requests to an MNT server and a maps server (S1, S2) LST

  21. Digital Elevation Models (MNT), cont. • Specific needs: • Large amounts of memory • Large amounts of data • Visualization • ASP approach: • Computational power: • Processing high-definition pictures (e.g., pictures from the SPOT satellite, < 5 m) • Reducing processing time (e.g., earthquakes) LST

  22. Molecular Dynamics • Simulation of atomic trajectories from molecular interactions (a schematic integration sketch follows) - Hydrodynamics (velocity fields, temperature, etc.) - Mechanical properties of solids at the micro scale • Short-range interactions: - Partitioning → good parallelism • Differential equation solving • Logs dumped on disk and exploited post-mortem • Private and public codes (figure annotation: k = 1…10-6) Physique Lyon 1, Physique ENS Lyon, MAPLI
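A schematic sketch of what such a simulation computes, not the project's code: one velocity-Verlet time step with a short-range pairwise (Lennard-Jones) force at unit mass. The particle count, time step, cutoff, and the O(n²) neighbour loop are illustrative simplifications; production codes partition space (cell lists, domain decomposition), which is exactly where the good parallelism mentioned above comes from.

```c
#define NPART 256       /* number of particles (illustrative)        */
#define DT    1.0e-3    /* time step in reduced units (illustrative) */
#define RCUT  2.5       /* short-range cutoff radius                 */

typedef struct { double x, y, z; } vec3;

static vec3 pos[NPART], vel[NPART], frc[NPART];

/* Short-range pairwise Lennard-Jones forces with a cutoff (sigma = epsilon = 1).
 * Written O(n^2) for clarity; real codes use cell lists / domain decomposition. */
static void compute_forces(void)
{
    for (int i = 0; i < NPART; i++) frc[i] = (vec3){0.0, 0.0, 0.0};
    for (int i = 0; i < NPART; i++)
        for (int j = i + 1; j < NPART; j++) {
            double dx = pos[i].x - pos[j].x;
            double dy = pos[i].y - pos[j].y;
            double dz = pos[i].z - pos[j].z;
            double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 > RCUT * RCUT) continue;             /* short range only */
            double inv6 = 1.0 / (r2 * r2 * r2);         /* r^-6             */
            double f = 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2;
            frc[i].x += f * dx;  frc[j].x -= f * dx;
            frc[i].y += f * dy;  frc[j].y -= f * dy;
            frc[i].z += f * dz;  frc[j].z -= f * dz;
        }
}

/* One velocity-Verlet step of Newton's equations of motion (unit mass). */
static void step(void)
{
    for (int i = 0; i < NPART; i++) {                   /* half kick + drift */
        vel[i].x += 0.5 * DT * frc[i].x;  pos[i].x += DT * vel[i].x;
        vel[i].y += 0.5 * DT * frc[i].y;  pos[i].y += DT * vel[i].y;
        vel[i].z += 0.5 * DT * frc[i].z;  pos[i].z += DT * vel[i].z;
    }
    compute_forces();
    for (int i = 0; i < NPART; i++) {                   /* second half kick  */
        vel[i].x += 0.5 * DT * frc[i].x;
        vel[i].y += 0.5 * DT * frc[i].y;
        vel[i].z += 0.5 * DT * frc[i].z;
    }
}

int main(void)
{
    int n = 0;                                          /* cubic lattice start */
    for (int ix = 0; ix < 8 && n < NPART; ix++)
        for (int iy = 0; iy < 8 && n < NPART; iy++)
            for (int iz = 0; iz < 8 && n < NPART; iz++, n++)
                pos[n] = (vec3){1.5 * ix, 1.5 * iy, 1.5 * iz};

    compute_forces();
    for (int t = 0; t < 1000; t++)
        step();   /* in the real application, trajectories are logged to disk */
    return 0;
}
```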

  23. Molecular Dynamics, cont. (diagram) • The physicist's client contacts the DIET agent(s) • The agent(s) dispatch requests to two application servers (S1, S2) Physique Lyon 1, Physique ENS Lyon, MAPLI

  24. Molecular Dynamics, cont. • Specific needs: • High accuracy • Large systems • Disk logs • ASP approach: • Computational power • Checkpointing mechanisms on the grid Physique Lyon 1, Physique ENS Lyon, MAPLI

  25. Potential Energy HyperSurface (HSEP) • Distributed computation of various points on a surface (quantum chemistry) • Existing software: • Gaussian (PSMN) • QC++ (free code) (figure: computed points on the surface as a function of the molecular configuration) SRSMC

  26. HSEP, cont. (diagram) • The chemist's client is backed by a database of computed points (DB) • Requests go through the DIET agent(s) to a Gaussian server and a QC++ server (S1, S2) SRSMC

  27. HSEP, cont. • Specific needs: • Use of a relational DB (MySQL) storing all computations done and to be done • A Web interface (HTTP+PHP) links the client to the RDB and DIET • Results filtering through Python scripts • Complexity: O(N^4) • ASP approach: • DB as a DIET client (a hedged sketch follows) • Security • Coarse-grain parallelism SRSMC
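To illustrate the "DB as a DIET client" idea with coarse-grain parallelism, a hedged sketch: pull the configurations still to be computed from the MySQL database and submit each one as an independent job through a GridRPC-style asynchronous call. The table layout, the connection parameters, and the "hsep_point" service are assumptions, not the project's actual schema or API; error handling is minimal.

```c
#include <stdio.h>
#include <mysql/mysql.h>   /* MySQL C client API                           */
#include "grpc.h"          /* GridRPC-style client API header (assumption) */

int main(void)
{
    MYSQL *db = mysql_init(NULL);
    if (!mysql_real_connect(db, "localhost", "hsep", "secret", "hsep_db",
                            0, NULL, 0)) {
        fprintf(stderr, "DB connection failed: %s\n", mysql_error(db));
        return 1;
    }

    grpc_function_handle_t handle;
    grpc_initialize("client.cfg");                       /* hypothetical config  */
    grpc_function_handle_default(&handle, "hsep_point"); /* hypothetical service */

    /* Hypothetical table layout: points(id, geometry, status). */
    mysql_query(db, "SELECT id, geometry FROM points WHERE status = 'todo'");
    MYSQL_RES *res = mysql_store_result(db);

    MYSQL_ROW row;
    while ((row = mysql_fetch_row(res)) != NULL) {
        grpc_sessionid_t session;
        /* One coarse-grain job per point: the agent picks a Gaussian or
         * QC++ server; the server is assumed to store the computed energy
         * back in the DB, so only the geometry is shipped here.          */
        grpc_call_async(&handle, &session, row[1]);
        printf("submitted point %s\n", row[0]);
    }
    grpc_wait_all();       /* block until every submitted point is done */

    mysql_free_result(res);
    mysql_close(db);
    grpc_function_handle_destruct(&handle);
    grpc_finalize();
    return 0;
}
```

In the project itself the Web interface (HTTP+PHP) plays this role and the Python scripts filter the results; the C sketch only shows the shape of the control flow.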

  28. Microwave Circuits Simulation • Direct coupling between the transport equations of Hetero-junction Bipolar Transistors (HBT) and a circuit simulator, for coupled microwave circuit/component design • Coupling between • A physical simulator of the HBT • A circuit simulator • A reduced thermal model derived from 3D finite-element simulation • Integrated simulator • Analysis tool, predictive and "process" oriented (co-design of the circuit and the transistor devices for a given application: amplifier, mixer ...) IRCOM

  29. Microwave Circuits Simulation, cont. (diagram) • The client contacts the DIET agent(s) • The agent(s) dispatch requests to a simulation server and a sparse solver server (S1, S2) IRCOM

  30. Microwave Circuits Simulation, cont. • Large systems to solve • Clients look for fast and efficient sparse solvers • The simulators' source code may be confidential • Dedicated servers for physical simulation, reachable through DIET, each providing its part of the Jacobian matrix in order to build the large system to solve IRCOM

  31. Metacompil (CRI, École des Mines, Fontainebleau) (diagram) • The client submits a problem through the DIET agent • Source code goes to a compilation server (S1), and the parallelized source code is deployed on an application server (S2)

  32. Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET • History • An ASP platform

  33. Where do we start from? • 1998-2000: ARC INRIA OURAGAN: tools for the resolution of large-size numerical problems • Parallelization of Scilab (PVM, MPI, PBLAS, BLACS, ScaLAPACK, Pastix, NetSolve) • Use of Scilab in front of computational servers (parallel or sequential) • NetSolve optimization (data persistence, development of an environment for the evaluation of communication and computational performance) • ReMaP, Métalau, Résédas, LIFC, LaBRI

  34. Our first view of computational servers • Ideas • Scilab as a first target application • Simplify the use of new libraries (sparse system libraries) • Benefit from the development of software components around Grid computing • Develop a toolkit for the deployment of computational servers • First prototype developed from existing software modules • NetSolve (University of Tennessee, Knoxville) • NWS (UCSD and UTK) for the dynamic evaluation of performance • Our developments on libraries (data-redistribution routines, sparse solvers, out-of-core routines) • LDAP software database and CORBA for the server management

  35. Our first goals • Add some features to NetSolve for our applications • Data-persistence on servers • Data-redistribution and parallelism between servers • Better evaluation of [routine, machine] pairs for fine grain computation • Portable database for available libraries (LDAP-based) • Get an experimentation platform for our developments • Mixed parallelism (data- and task-parallelism) • Scheduling heuristics for data-parallel tasks • Parallel algorithms for heterogeneous platforms • Performance evaluation • Server management using CORBA

  36. NetSolve over VTHD (diagram: clients, an agent, and servers deployed across the VTHD network)

  37. NetSolve Behavior: intensive use • VTHD network • Clients: Rennes cluster (paraski) • Scheduler: NetSolve agent (Rocquencourt) • Server: paraski26 (paraski)

  38. DIET (Distributed Interactive Engineering Toolbox) (diagram) • C, Fortran, and Java clients reach a set of agents, each with a scheduler • The schedulers rely on a distributed software database and a distributed performance database • Servers: S1 reached through a direct connection, S2 a visualization server, S3 behind a batch system with a local scheduler

  39. DIET Goals • Our goals: • Develop a toolbox for the deployment of ASP environments with different applications • Use standard (and public-domain) software as much as possible • Obtain a high-performance and scalable environment • Implement our more theoretical results in this environment (scheduling, data (re)distribution, performance evaluation, algorithms for heterogeneous platforms) • Use CORBA, NWS, LDAP and our own software components (SLiM and FAST) • Different applications (simulation, compilation, …) • ReMaP, ARES, Résédas, LIFC, Sun Labs (RNTL GASP) http://www.ens-lyon.fr/~desprez/DIET/

  40. Hierarchical Architecture • Hierarchical architecture for scalability • Distributing information in the entire tree • Plug-in schedulers • Data persistence (diagram: Master Agents (MA) at the top, Local Agents (LA) below them, computational server front-ends at the leaves, plus direct client-server connections)

  41. Evaluation of DIET's Server Invocation

  42. DIET Agent(s) (diagram: clients C, agents A, and servers S in several connection topologies) • Distributed set of agents for improved scalability • Study of several connection schemes between agents (hierarchical, distributed, duplicated agents, …) and of agent mapping • Tree-based scheduling algorithms with information distributed in each node of the hierarchy (a toy sketch follows) • Connection to FAST to gather information about resources and to SLiM to find the available applications • Different generic and application-dependent schedulers • CORBA, JXTA
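A toy sketch of the tree-based request aggregation idea, not DIET's actual CORBA interfaces: each agent forwards the client request to its children, each server produces a local completion-time estimate (from FAST-like data), and every agent passes only the best candidate of its subtree back up, so the Master Agent ends up comparing one reply per branch. All names and the cost field are assumptions.

```c
#include <stdio.h>
#include <stddef.h>

/* Reply travelling back up the agent tree: the best server found below. */
typedef struct {
    const char *server;    /* chosen server, NULL if no server can solve it */
    double      est_time;  /* estimated completion time (seconds)           */
} reply_t;

typedef struct node node_t;
struct node {
    int          is_server;
    const char  *name;
    /* leaf (server): local estimate for this request, from FAST-like data */
    double     (*estimate)(const node_t *self, const char *problem);
    /* internal node (MA or LA): children in the hierarchy                 */
    node_t     **children;
    size_t       nb_children;
};

/* Recursive scheduling: each agent keeps only the best reply of its subtree,
 * so the Master Agent finally compares one candidate per branch.            */
static reply_t submit(const node_t *n, const char *problem)
{
    if (n->is_server) {
        double t = n->estimate(n, problem);
        return (reply_t){ t >= 0.0 ? n->name : NULL, t };
    }
    reply_t best = { NULL, 0.0 };
    for (size_t i = 0; i < n->nb_children; i++) {
        reply_t r = submit(n->children[i], problem);
        if (r.server && (!best.server || r.est_time < best.est_time))
            best = r;
    }
    return best;
}

/* Toy estimate: S1 is slower than S2 for any problem. */
static double fixed_est(const node_t *s, const char *p)
{
    (void)p;
    return s->name[1] == '1' ? 2.0 : 0.5;
}

int main(void)
{
    node_t s1 = { 1, "S1", fixed_est, NULL, 0 };
    node_t s2 = { 1, "S2", fixed_est, NULL, 0 };
    node_t *leaves[] = { &s1, &s2 };
    node_t la = { 0, "LA", NULL, leaves, 2 };          /* Local Agent  */
    node_t *branch[] = { &la };
    node_t ma = { 0, "MA", NULL, branch, 1 };          /* Master Agent */

    reply_t r = submit(&ma, "dgemm");
    printf("best server: %s (%.1f s)\n", r.server, r.est_time);
    return 0;
}
```

Running it prints "best server: S2 (0.5 s)", the same kind of answer the agent returns in the ASP scheme shown earlier.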

  43. Performance Evaluation • Performance evaluation of the GridRPC platform • Finding one (or several) efficient server(s) (computational cost of the requested function, server load, communication costs between the client and the server, memory capacity, …) → performance database for the scheduler • Hard to accurately model (and understand) networks like the Internet or VTHD • Need for a small response time • Need to be able to model applications (problems with applications whose execution time depends on the input data) • Accounting

  44. FAST: Fast Agent's System Timer • NWS-based (Network Weather Service, UCSB) • Computational performance • Load, memory capacity, and performance of batch queues (dynamic) • Benchmarks and modeling of the available libraries (static) • Communication performance • To be able to estimate the data-redistribution cost between two servers (or client to server) as a function of the network architecture and of dynamic information • Bandwidth and latency (hierarchical) • Hierarchical set of agents • Scalability problems • (A rough cost sketch combining these measurements follows)
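A rough sketch of the kind of estimate that static benchmarks plus NWS-style dynamic measurements allow, and that a scheduler (or a data-redistribution decision) can compare across candidate servers. The formulas and variable names are assumptions, not FAST's real interface.

```c
#include <stdio.h>

/* Time to ship 'bytes' over a link, from NWS-style dynamic measurements
 * of latency (seconds) and bandwidth (bytes per second).                */
static double comm_time(double bytes, double latency, double bandwidth)
{
    return latency + bytes / bandwidth;
}

/* Execution time on a server: the static, installation-time benchmark of
 * the routine for this problem size, inflated by the measured CPU load. */
static double exec_time(double benchmark_s, double cpu_load)
{
    double slowdown = cpu_load < 1.0 ? 1.0 : cpu_load;
    return benchmark_s * slowdown;
}

/* Total cost of running one request remotely: send the input, compute,
 * get the output back.  This is the value a scheduler would compare
 * across candidate servers (or use to price a data redistribution).     */
static double total_time(double in_bytes, double out_bytes,
                         double latency, double bandwidth,
                         double benchmark_s, double cpu_load)
{
    return comm_time(in_bytes, latency, bandwidth)
         + exec_time(benchmark_s, cpu_load)
         + comm_time(out_bytes, latency, bandwidth);
}

int main(void)
{
    /* 10 MB in and out, 5 ms latency, 100 MB/s, 12 s benchmark, load 1.5 */
    printf("estimated: %.1f s\n",
           total_time(10e6, 10e6, 5e-3, 100e6, 12.0, 1.5));
    return 0;
}
```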

  45. Availability of the System: NWS • Network Weather Service (Wolski, UCSB) • Measures the availability of resources • CPU load, bandwidth, etc. • Forecasts the variations with statistics • Extensible and open • Used by many projects (Globus, NetSolve, Ninf, etc.) (diagram: clients send requests to a name server and a forecaster; sensors run tests and store the measurements in a memory/data-storage component) (a toy forecaster sketch follows)
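A toy illustration of the forecasting idea only, not NWS's actual algorithms: keep a history of measurements, replay a couple of simple predictors on it (last value, sliding mean), and trust the one whose past error was smallest. NWS follows this "pick the historically best predictor" scheme with a much richer set of models.

```c
#include <math.h>
#include <stdio.h>

#define WINDOW 10

/* Predictor 1: the next value is the last measured one. */
static double pred_last(const double *h, int n) { return h[n - 1]; }

/* Predictor 2: the next value is the mean of the last WINDOW measurements. */
static double pred_mean(const double *h, int n)
{
    int    k = n < WINDOW ? n : WINDOW;
    double s = 0.0;
    for (int i = n - k; i < n; i++) s += h[i];
    return s / k;
}

/* Forecast the next measurement: replay both predictors over the history,
 * accumulate their absolute errors, and trust the historically better one. */
static double forecast(const double *history, int n)
{
    double err_last = 0.0, err_mean = 0.0;
    for (int i = 1; i < n; i++) {
        err_last += fabs(history[i] - pred_last(history, i));
        err_mean += fabs(history[i] - pred_mean(history, i));
    }
    return err_last <= err_mean ? pred_last(history, n)
                                : pred_mean(history, n);
}

int main(void)
{
    /* e.g. periodic bandwidth measurements in Mb/s */
    double bw[] = { 92, 95, 40, 90, 93, 91, 38, 94, 92, 90 };
    printf("forecast: %.1f Mb/s\n", forecast(bw, 10));
    return 0;
}
```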

  46. Overall Architecture (diagram: the client application calls a run-time library that combines needs modeling, following a structural approach fed at installation time by a benchmarker and stored in LDAP, with system availabilities obtained from NWS)

  47. Time Modeling of DGEMM (figure; mean error: 1%)
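The figure itself is not reproduced. As an illustration of what a time model of a BLAS routine can look like, a hedged sketch: the formula below (2·m·n·k floating-point operations divided by a benchmarked sustained rate, plus a fixed overhead) is an assumption about the general approach, not FAST's exact model; its two coefficients would come from installation-time benchmarks on each server.

```c
/* Predicted wall-clock time of DGEMM on an (m x k) by (k x n) product.
 * 'rate_flops' (sustained flop/s) and 'overhead_s' (per-call overhead in
 * seconds) are assumed to be measured at installation time per server.  */
double dgemm_time(int m, int n, int k, double rate_flops, double overhead_s)
{
    return overhead_s + (2.0 * (double)m * n * k) / rate_flops;
}
```

For instance, at a 1 Gflop/s sustained rate, a 1000 x 1000 x 1000 product (2·10⁹ flops) is predicted at roughly 2 s plus the overhead.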

  48. Performance Forecasting: Complex Matrix Multiplication (figure; mean error: 15%, with a 23% value annotated on the plot)

  49. NWS Optimization: response time

  50. NWS Optimization: collaboration with the scheduler (figure: execution of a task, with idle times around it)
