ACI GRID ASP Client-Server Approach for Simulation over the GRID. Frédéric Desprez LIP ENS Lyon ReMaP Project. Outline. Grid RPC and ASP concepts ACI Grid ASP Target applications DIET. RNTL. INTRODUCTION.
ACI GRID ASP Client-Server Approach for Simulation over the GRID Frédéric Desprez LIP ENS Lyon ReMaP Project
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET
INTRODUCTION • One long-term idea for Grid computing: renting computational power and memory capacity over the net using RPC • Very high potential • Need for PSEs (Problem Solving Environments) and ASPs (Application Service Providers) • Applications will always need more and more computational power and memory capacity • (Parallel) application encapsulation • Some libraries or codes need to stay where they have been developed • Some confidential data must not travel over the net • Use of computational servers reachable through a simple interface • Still some problems • Security and fault-tolerance problems • PSEs are often application-dependent • No single standard (CORBA, Java/Jini, sockets, …) to build the computational servers
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET
RPC and Grid-Computing: GridRPC • One simple idea • Implement the RPC programming model over the GRID • Use computational resources available over the net • Applications that have huge computational and/or data storage needs • Task-parallel programming model (synchronous and asynchronous calls) + data-parallelism on the servers themselves: mixed parallelism • Features needed • Load-balancing (resource localization and performance evaluation, scheduling) • Simple interface • Data distribution and migration • Security • Fault-tolerance • Interoperability with other systems, …
RPC and Grid-Computing: GridRPC, cont. • Five fundamental components: • Client: provides several user interfaces and submits requests to servers • Server: receives client requests and executes the software modules on their behalf • Database: stores the static and dynamic data about the software and hardware resources • Scheduler: catches client requests and decides how to map the tasks onto the servers, depending on the data stored in the database • Monitor: dynamically monitors the status of computational resources and stores the information obtained in the database
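GridRPC Big Picture • [Figure: the client sends a request Op(C, A, B) to the AGENT(s) and is told "S2 !"; the client then sends A, B, C to server S2 and receives the answer (C). Servers shown: S1, S2, S3, S4]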
RPC and Grid-Computing: GridRPC, cont. • Middleware between portals and Grid components • Basic tools for the deployment of large-scale environments (Web portals, Problem Solving Environments, Grid Toolkits, …) • Big success on several applications • Discussion in the Advanced Programming Models (APM) working group of the Global Grid Forum • GridRPC client API proposed:
    double A[n][n], B[n][n], C[n][n];  /* Data declaration */
    dmmul(n, A, B, C);                 /* Local function call */
    GRPC_call(handle, n, A, B, C);     /* Server function call */
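The contrast between the local call and the server call can be sketched in C as follows. This is only an illustration and does not use the GridRPC API: `mock_handle` and `mock_grpc_call` are hypothetical stand-ins that run the service in the same process, and `dmmul` is given a plain row-major signature for the purpose of the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins: NOT the GridRPC API, only a local mock that
   mimics its shape (a handle bound to a service on some server). */
typedef void (*service_fn)(int n, const double *A, const double *B, double *C);

typedef struct {
    const char *server;   /* name of the server chosen by the agent    */
    service_fn  service;  /* in a real system this is a network stub   */
} mock_handle;

/* Local dense matrix multiply, C = A * B, row-major n x n. */
void dmmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* Mock of a synchronous server call: same arguments as the local call,
   plus a handle. Returns 0 on success, -1 if the handle is unusable. */
int mock_grpc_call(const mock_handle *h, int n,
                   const double *A, const double *B, double *C)
{
    if (h == NULL || h->service == NULL)
        return -1;
    h->service(n, A, B, C);  /* a real stub would ship A and B, then fetch C */
    return 0;
}
```

The point of the design is that the client code changes by one argument only: the handle hides which server was selected and how the data travel.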
RPC and Grid-Computing: GridRPC: related problems • Security • Authentication and Authorization • Data transfers • Fault-tolerance • Servers or agent(s) • Interoperability • Problem description • API • Data management • Data persistence • Data (re)distribution • Garbage collection • Check-pointing • Fast parallel IO • Scalability • Distributed servers/agent(s) • User assistance/PSE • Automatic choice of solutions • Resource localization • Hardware and software • Scheduling • Semi-static scheduling • On-line scheduling or off-line scheduling • Sharing servers between users • Security problems • Lock/unlock, data consistency, race conditions • Performance evaluation • Heterogeneity • Batch systems • Data visualization • Scalability problems • Dynamic platform • Resource localization • Agents/servers mapping
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET
ASP Project Overview • Multi-disciplinary project • Rent computational power and memory capacity over the network • Four applications with different needs and different behavior • Develop a toolbox for the deployment of application servers • Study the impact of these applications on our environment and adapt it to these new needs • A highly hierarchical and heterogeneous network (VTHD + networks of the labs involved in the project) • A software architecture developed in an RNTL project (GASP)
Experimentation platform: VTHD • High-speed network between INRIA research centers (2.5 Gb/s) and several other research institutes • Connecting several PC clusters, SGI O2K, and virtual reality caves • Ideal test platform for our developments • RNRT project • Several Grid computing projects • Parallel CORBA objects • Grid computing environments and multi-protocol communication layers • Computational servers • Code coupling • Virtual reality, ...
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET
Target Applications • Researchers from four different fields (chemistry, physics, electronics, geology) • Four applications with different needs and different behavior • Digital Elevation Models • Molecular Dynamics • Microwave circuits simulation • HSEP
Digital Elevation Models (MNT) • Stereoscopic processing: • Maximal matching between the spots of both pictures • Elevation computation • Geometrical constraints • Optical disparities • [Figure labels: binary files; view-angle information and coordinates of initial corresponding points; MNT] LST
Digital Elevation Models (MNT), cont. • [Figure labels: geologist; client; DIET AGENT(s); S1, S2; MNT server; map server] LST
Digital Elevation Models (MNT), cont. • Specific needs: • Large amount of memory • Large amount of data • Visualization • ASP approach: • Computational power: • Processing high-definition pictures (e.g. pictures from the SPOT satellite, < 5 m) • Reducing processing time (e.g. after an earthquake) LST
Microwave Circuits Simulation • Direct coupling between the transport equations of Hetero-junction Bipolar Transistors (HBT) and a circuit simulator for coupled microwave circuit/component design • Coupling between • Physical simulator of HBT • Circuit simulator • Thermal reduced model derived from 3D Finite Element simulation • Integrated simulator • Analysis tool, predictive and “process” oriented (co-design of the circuit and the transistor devices for a given application: amplifier, mixer ...) IRCOM
Microwave Circuits Simulation, cont. • [Figure labels: client; DIET AGENT(s); S1, S2; simulation server; sparse solver server] IRCOM
Microwave Circuits Simulation, cont. • Large systems to solve • Clients look for fast and efficient sparse solvers • Simulator source code may be confidential • Dedicated servers for physical simulation, reachable through DIET, which provide the part of the Jacobian matrix needed to build the large system to solve IRCOM
Potential Energy HyperSurface (HSEP) • Distributed computation of various points on a surface (quantum chemistry) • Existing software: • Gaussian (PSMN) • QC++ (public domain) • [Figure labels: molecular configuration; computed points] SRSMC
HSEP, cont. • [Figure labels: chemist; client; database (DB) of computed points; DIET AGENT(s); S1, S2; Gaussian server; QC++ server] SRSMC
HSEP, cont. • Specific needs: • Use of a relational DB (MySQL) storing all computations done and to be done • A Web interface (HTTP + PHP) links the client to the RDB and DIET • Results filtering through Python scripts • Complexity: O(N^4) • ASP approach: • DB as a DIET client • Security • Coarse-grain parallelism SRSMC
Metacompil (CRI, Ecole des Mines Fontainebleau) • [Figure labels: problem; source; parallelized source code; client; DIET AGENT; S1, S2; compilation server; application server]
Outline • Grid RPC and ASP concepts • ACI Grid ASP • Target applications • DIET
Where do we start from? • 1998-2000: ARC INRIA OURAGAN: tools for the resolution of large numerical problems • Parallelization of Scilab (PVM, MPI, PBLAS, BLACS, ScaLAPACK, Pastix, NetSolve) • Use of Scilab in front of computational servers (parallel or sequential) • NetSolve optimization (data persistence, development of an environment for the evaluation of communication and computational performance) • ReMaP, Métalau, Résédas, LIFC, LaBRI
Our first view of computational servers • Ideas • Scilab as a first target application • Simplify the use of new libraries (sparse systems libraries) • Benefit from the development of software components around Grid computing • Develop a toolkit for the deployment of computational servers • Experimentation platform for our developments • Mixed parallelism (data- and task-parallelism) • Scheduling heuristics for data-parallel tasks • Parallel algorithms for heterogeneous platforms • Performance evaluation • Server management using CORBA
DIET Goals • Our goals: • Develop a toolbox for the deployment of ASP environments with different applications • Use standard (and public domain) software as much as possible • Obtain a high-performance and scalable environment • Implement our more theoretical results in this environment (scheduling, data (re)distribution, performance evaluation, algorithms for heterogeneous platforms) • Use of CORBA, NWS, LDAP, Grid Engine and our software components (SLiM and FAST) • Different applications (simulation, compilation, …) • ReMaP, ARES, Résédas, LIFC, Sun Labs Europe (RNTL GASP) http://www.ens-lyon.fr/~desprez/DIET/
NetSolve over VTHD • [Figure labels: clients; agent; servers]
Hierarchical Architecture • Hierarchical architecture for scalability • Information distributed across the entire tree • Plug-in schedulers • Data persistence • [Figure legend: MA = Master Agent; LA = Local Agent; computational server front-end; direct connection]
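One way to picture how a request could travel through such a tree is sketched below in C. It is an assumption-laden illustration, not DIET code: the `node` type, the `is_server` flag, the `est_time` field and `best_server` are all hypothetical names. Each agent asks its children for their best candidate and forwards only the best one upward, so no single node needs to know every server.

```c
#include <stddef.h>

/* Hypothetical tree node: an agent (MA or LA) or a server (leaf). */
typedef struct node {
    const char *name;
    int is_server;                        /* 1 for a server, 0 for an agent */
    double est_time;                      /* estimated time, servers only   */
    const struct node *const *children;   /* NULL for a server              */
    size_t n_children;
} node;

/* Return the server with the smallest estimate in the subtree rooted at
   `nd`, or NULL if the subtree contains no server. Each agent only
   compares the single best answer returned by each of its children. */
const node *best_server(const node *nd)
{
    if (nd == NULL)
        return NULL;
    if (nd->is_server)
        return nd;

    const node *best = NULL;
    for (size_t i = 0; i < nd->n_children; i++) {
        const node *cand = best_server(nd->children[i]);
        if (cand != NULL && (best == NULL || cand->est_time < best->est_time))
            best = cand;
    }
    return best;
}
```

A plug-in scheduler would replace the comparison on `est_time` with its own criterion; the traversal itself stays the same.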
DIET AGENT(s) • Distributed set of agents for improved scalability • Study of several connection schemes between agents (hierarchical, distributed, duplicated agents, …) and of agent mapping • Tree-based scheduling algorithms with information distributed in each node in the hierarchical approach • Connection to FAST to gather information about resources and to SLiM to find the available applications • Different generic and application-dependent schedulers • CORBA, JXTA • [Figure: graph of nodes labeled C, A and S]
Performance Evaluation • Performance evaluation of the GridRPC platform • Finding one (or many) efficient server(s) (computational cost of the function requested, server’s load, communication costs between the client and the server, memory capacity, …) • Performance database for the scheduler • Hard to accurately model (and understand) networks like the Internet or VTHD • Need for a small response time • Need to model applications (difficult for applications whose execution time depends on the input data) • Accounting
FAST: Fast Agent’s System Timer • NWS-based (Network Weather Service, UCSB) • Computational performance • Load, memory capacity, and performance of batch queues (dynamic) • Benchmarks and modeling of available libraries (static) • Communication performance • To be able to estimate the data redistribution cost between two servers (or from clients to a server) as a function of the network architecture and dynamic information • Bandwidth and latency (hierarchical)
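The combination of static and dynamic information can be illustrated with a deliberately simple cost model in C. This is a sketch under stated assumptions, not FAST's interface or its actual model: the `server_info` fields and the three functions are hypothetical, transfer time is taken as latency plus size over bandwidth, and compute time as operation count over the benchmarked rate scaled by the free CPU share.

```c
/* All names and fields are illustrative assumptions, not FAST's API. */
typedef struct {
    double flops_per_s;    /* sustained rate from a static benchmark    */
    double cpu_avail;      /* fraction of CPU currently free, in (0, 1] */
    double latency_s;      /* client-to-server latency, seconds         */
    double bandwidth_Bps;  /* client-to-server bandwidth, bytes/second  */
} server_info;

/* Transfer time for `bytes` of data: latency plus size over bandwidth. */
double comm_time(const server_info *s, double bytes)
{
    return s->latency_s + bytes / s->bandwidth_Bps;
}

/* Compute time for `flops` operations, scaled by the free CPU share. */
double comp_time(const server_info *s, double flops)
{
    return flops / (s->flops_per_s * s->cpu_avail);
}

/* Forecast for C = A * B on n x n doubles: 2 n^3 floating-point
   operations, two matrices sent to the server and one sent back. */
double dgemm_forecast(const server_info *s, int n)
{
    double elems = (double)n * (double)n;
    double flops = 2.0 * elems * (double)n;
    double bytes_in = 2.0 * elems * sizeof(double);
    double bytes_out = elems * sizeof(double);
    return comm_time(s, bytes_in) + comp_time(s, flops)
         + comm_time(s, bytes_out);
}
```

A scheduler would evaluate such a forecast for each candidate server; the dynamic fields (`cpu_avail`, latency, bandwidth) are the ones a monitor such as NWS would refresh.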
Time Modeling of DGEMM • [Plot] Mean error: 1%
Performance forecasting: complex matrix multiplication • [Plot] Mean error: 15% (plot annotation: 23%)
NWS Optimization: collaboration with the scheduler • [Figure labels: execution of a task; idle time]
Scheduling • Shortest Execution Time • Other algorithms possible (economic model, deadline scheduling, classical on-line scheduling problems) • Request sequencing • On-line scheduling or static scheduling • Single (or distributed) agent(s) for Ninf and NetSolve • Hierarchy of agents for DIET (local scheduling) • Model the cost of the scheduling itself • Simulation using SimGrid (UCSD) • Dynamic deployment
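A minimal on-line scheduler in the spirit of the first and third bullets can be sketched in C as follows. This is one possible reading, not the DIET, Ninf or NetSolve scheduler: `sched_server` and `schedule_set` are hypothetical names, and the policy shown takes already queued work into account (a variant often called minimum completion time), which is an assumption about what request sequencing implies.

```c
#include <stddef.h>

/* Hypothetical per-server state kept by the scheduler. */
typedef struct {
    double ready_at;  /* time at which the server finishes its queued work */
    double speed;     /* relative speed, in work units per second          */
} sched_server;

/* Choose the server with the earliest estimated finish time for a task
   of size `work` submitted at time `now`, then record the task on that
   server so that later requests see the updated queue.
   Returns the server index, or -1 if there are no servers. */
int schedule_set(sched_server *srv, size_t n, double now, double work)
{
    int best = -1;
    double best_finish = 0.0;

    for (size_t i = 0; i < n; i++) {
        double start = srv[i].ready_at > now ? srv[i].ready_at : now;
        double finish = start + work / srv[i].speed;
        if (best < 0 || finish < best_finish) {  /* ties keep the first */
            best = (int)i;
            best_finish = finish;
        }
    }
    if (best >= 0)
        srv[best].ready_at = best_finish;
    return best;
}
```

Because each decision updates `ready_at`, a burst of requests is spread over the servers instead of all landing on the fastest one.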
Short-Term Work • Security • Authentication and Authorization • Data transfers • Fault-tolerance • Servers or agents • Interoperability • Problem description • API • Data management • Data persistence • Data (re)distribution • Garbage collection • Check-pointing • Fast parallel IO • Scalability • Hierarchy of servers/agents • User assistance/PSE • Automatic choice of solutions • Resource localization • Hardware and software • Scheduling • Semi-static scheduling • On-line scheduling or off-line scheduling • Sharing servers between users • Security problems • Lock/unlock, data consistency, race conditions • Performance evaluation • Heterogeneity • Batch systems • Data visualization • Scalability problems • Dynamic platform • Resource localization • Agents/servers mapping
Conclusion and Future Work • Development of a portable (and open-source) set of tools to build ASP environments • Multi-application, multi-platform and multi-interface • Use of developments made in other projects (NWS, NetSolve, SimGrid, Sun Grid Engine, Paris, IBP) • Brings together several problems: resource localization, scheduling, agent deployment, algorithms for heterogeneous platforms, performance analysis • Things to be explored (quickly!): security, fault tolerance • Find new applications … not only number-crunching ones (e.g. Metacompil) • Support of the GridRPC standard proposed by the NetSolve and Ninf teams (SC2002 ?) http://www.ens-lyon.fr/~desprez/DIET/