A Framework for Collaborative Distributed Simulation over the Grid Stephen John Turner Parallel & Distributed Computing Centre Nanyang Technological University Singapore
Outline • Background • Distributed Simulation • Grid Computing • Motivation • Research Challenges • HLA-based Distributed Simulation • Grid Services and Service Discovery • Load Management System • Grid Enabled HLA/RTI • Conclusions Brunel
Distributed Simulation • Provides a way of linking simulation components (federates) of various types at possibly different locations to create a common virtual environment (federation)
Example Application Areas • Battlefield Simulation • Linking different types of forces at multiple physical locations to create a realistic and complex virtual world • Supply Chain Simulation • Managing material and information flow, from manufacturers through distributors to customers • Air Traffic Control • Simulating airports and airspace sectors to provide faster-than-real-time simulation for “what-if” analysis • Multi-player Internet Games • Involving massively multi-player virtual worlds (~10,000 players)
High Level Architecture [Figure: a federation of simulations, simulation surrogates and passive viewers, each described by a SOM and sharing a common FOM, connected through an interface to the Run-Time Infrastructure (RTI), which provides Federation Management, Declaration Management, Object Management, Ownership Management, Time Management and Data Distribution Management; labels also show the FED and the HLA rules for federations and for federates.]
High Level Architecture • Features of High Level Architecture • Each federate has a simulation object model (SOM) defining the data to be shared with other federates, allowing reuse in different federations • The federation (set of federates) has a common federation object model (FOM) • HLA supports distributed simulations linking the federates of a federation over a LAN or the Internet • Time Management can be used to ensure the correct ordering of events • HLA is an IEEE standard (IEEE 1516) and an OMG standard
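To make the Time Management point concrete, here is a minimal sketch of the conservative idea behind a time-advance grant, assuming a single centralised manager and a fixed lookahead per federate. A federate may advance to time t only if no other federate can still send it a message stamped at or before t. The classes `FederateClock` and `ToyTimeManager` and their methods are hypothetical illustrations, not the IEEE 1516 RTI API.

```python
from dataclasses import dataclass, field


@dataclass
class FederateClock:
    """Hypothetical per-federate time state (not the IEEE 1516 RTI API)."""
    name: str
    logical_time: float
    lookahead: float  # promise: no future message is stamped earlier than time + lookahead


@dataclass
class ToyTimeManager:
    """Hypothetical, centralised illustration of conservative time advance."""
    federates: dict = field(default_factory=dict)

    def join(self, name: str, start: float, lookahead: float) -> None:
        self.federates[name] = FederateClock(name, start, lookahead)

    def lower_bound(self, requester: str) -> float:
        """Earliest timestamp any *other* federate could still send to requester."""
        bounds = [f.logical_time + f.lookahead
                  for n, f in self.federates.items() if n != requester]
        return min(bounds, default=float("inf"))

    def request_advance(self, name: str, target: float) -> bool:
        """Grant only if no message stamped at or before target can still arrive."""
        if target < self.lower_bound(name):
            self.federates[name].logical_time = target
            return True
        return False  # caller must wait for the other federates to advance
```

With federates A (lookahead 5) and B (lookahead 2) both at time 0, A can advance to 1 but not to 3 until B itself advances; a larger lookahead therefore lets the others run further ahead.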
Grid Computing • Grid technology is the next step in the evolution of computing, enabling new forms of collaboration through the seamless sharing of distributed computing and data resources • Communities can share geographically distributed resources for a common purpose
Grid Computing [Figure: relationship between Web Services, Grid Services, OGSA, OGSI and the Globus Toolkit.]
Motivation • Collaborative Simulation Development • The development of complex simulations usually requires collaborative effort from analysts with different domain knowledge and expertise, possibly at different locations • Sharing of Computing Resources • Simulation systems often require large computing resources, and the participants in the simulation and/or the data sets required may also be geographically distributed
Motivation • HLA-based Distributed Simulation on the Grid • HLA defines a standard for reuse and interoperability • Grid technologies enable collaboration and the use of distributed computing resources • Collaborative • Distributed • Complex & Multi-dimensional
Simulation Life Cycle [Figure: Service/Model Discovery, Service/Model Composition, Execution and Security, together with Resource Management, Semantic Interfaces, Policies and Workflow.]
Research Challenges • Service/Model Discovery • Based on requirements, “suitable” component models are selected to form an overall simulation • Research Issues • How are simulation models registered as grid services? • How are simulation models discovered? • How are the interfaces defined? • Are the simulation models HLA compliant? • Do they conform to any standard reference models (e.g. HLA-CSPIF)?
Research Challenges • Service/Model Composition • Checking semantic interoperability between individual component simulation models from different sources • Research Issues • Can the output of one simulation model feed into the input of another? • How is the workflow of the configuration described? • What are the mechanisms for verifying the correctness of the simulation?
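The question "can the output of one model feed into the input of another?" has a simple syntactic first step, sketched below under the assumption that each registered model describes its outputs and inputs as attribute names with type or unit labels. `ModelInterface` and `composition_gaps` are hypothetical; matching labels does not by itself establish semantic interoperability, which is the harder part of the challenge.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInterface:
    """Hypothetical interface description of a registered simulation model."""
    name: str
    outputs: dict = field(default_factory=dict)  # attribute -> type/unit label
    inputs: dict = field(default_factory=dict)   # attribute -> type/unit label


def composition_gaps(producer: ModelInterface, consumer: ModelInterface) -> list:
    """List consumer inputs the producer does not supply with a matching label."""
    gaps = []
    for attr, expected in consumer.inputs.items():
        supplied = producer.outputs.get(attr)
        if supplied is None:
            gaps.append(f"{attr}: not produced by {producer.name}")
        elif supplied != expected:
            gaps.append(f"{attr}: {producer.name} gives {supplied}, "
                        f"{consumer.name} expects {expected}")
    return gaps


factory = ModelInterface("Factory", outputs={"shipments": "pallets/day"})
warehouse = ModelInterface("Warehouse",
                           inputs={"shipments": "units/day", "orders": "units/day"})
for problem in composition_gaps(factory, warehouse):
    print(problem)
```

Here the supply-chain pair fails on two counts: a unit mismatch on `shipments` and a missing `orders` feed, so a composition tool would reject or flag this configuration.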
Research Challenges • Security • Simulation partners should be allowed to specify selective access to their simulation models • Research Issues • Does a user have access to a particular simulation model or data? • Can a user selectively share sensitive data with different partners? • Does the simulation model originate from a trusted partner? • Must the model be executed on a particular resource?
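Selective access of this kind can be pictured as a per-model policy. The sketch below is a hypothetical illustration covering two of the questions above: which partner may perform which action, and whether the model must run on particular resources. `ModelPolicy`, `is_permitted` and the action names are assumptions; establishing that a model originates from a trusted partner would additionally need authentication, e.g. via grid credentials, which is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ModelPolicy:
    """Hypothetical access policy a partner attaches to one shared simulation model."""
    owner: str
    grants: dict = field(default_factory=dict)        # partner -> set of allowed actions
    allowed_hosts: set = field(default_factory=set)   # empty set means "any resource"


def is_permitted(policy: ModelPolicy, partner: str, action: str, host: str) -> bool:
    """Check one request such as ("PartnerB", "execute", "hostA1")."""
    if policy.allowed_hosts and host not in policy.allowed_hosts:
        return False  # model must run on an approved resource, whoever asks
    if partner == policy.owner:
        return True   # owners keep full access to their own model
    return action in policy.grants.get(partner, set())
```

For example, a policy owned by `PartnerA` that grants `PartnerB` only `{"execute"}` and pins execution to `hostA1` lets `PartnerB` run the model there but not read its data, and refuses any run elsewhere.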
Research Challenges • Execution • Simulation partners may obtain computing resources from the Grid to supplement their needs • Research Issues • How can the different simulation runs be partitioned onto the available computing resources? • What mechanisms should be used for scheduling and load management of simulations on the Grid? • What kind of fault tolerance mechanisms are required?
Main work [Figure: the Simulation Life Cycle (Service/Model Discovery, Service/Model Composition, Execution, Security; Resource Management, Semantic Interfaces, Policies, Workflow), marking the areas addressed by the main work.]
HLA-based Distributed Simulation [Figure: federates created by model factories and connected through several RTIs.] • Discovery and Composition of Models • Discovery of Resources • Management of Simulation Execution
Grid Services and Service Discovery • 1. Query the Index Service for the RTI Service handle for the federation • 2. Create an RtiExec if necessary and get the endpoint used by the RtiExec • 3. Query the Index Service for the Federate Factory Service handle • 4. Create the Federate Service and Federate Process • 5. Federate Processes join the federation
Grid Services and Service Discovery • 3. Query the Index Service for the Federate Factory Service handle • 4. Create the Federate Service and Federate Process • 4a. The Federate Service can query the Index Service for the RtiExec endpoint • 5. Federate Processes join the federation
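The discovery steps can be traced with in-memory stand-ins. In the sketch below, `IndexService`, `RtiService`, `FederateFactoryService`, `launch`, the service names `"RTIService"` and `"FederateFactoryService"`, and the `host:port` endpoint format are all hypothetical; real grid services would be remote and would start actual RtiExec and federate processes. The step numbers in the comments follow the slides.

```python
from dataclasses import dataclass, field


@dataclass
class IndexService:
    """Hypothetical stand-in for the grid Index Service: service name -> handle."""
    registry: dict = field(default_factory=dict)

    def register(self, name: str, handle) -> None:
        self.registry[name] = handle

    def query(self, name: str):
        return self.registry[name]


@dataclass
class RtiService:
    host: str
    rtiexecs: dict = field(default_factory=dict)  # federation -> RtiExec endpoint
    members: dict = field(default_factory=dict)   # federation -> joined federates

    def get_or_create_rtiexec(self, federation: str) -> str:
        if federation not in self.rtiexecs:        # create an RtiExec only if necessary
            self.rtiexecs[federation] = f"{self.host}:{8000 + len(self.rtiexecs)}"
        return self.rtiexecs[federation]

    def join(self, federation: str, federate: str) -> None:
        self.members.setdefault(federation, []).append(federate)


@dataclass
class FederateFactoryService:
    index: IndexService = field(repr=False)

    def create_federate(self, name: str, federation: str) -> str:
        rti = self.index.query("RTIService")
        endpoint = rti.get_or_create_rtiexec(federation)  # 4a. look up RtiExec endpoint
        rti.join(federation, name)                        # 5. process joins federation
        return endpoint


def launch(index: IndexService, federation: str, names: list) -> str:
    rti = index.query("RTIService")                    # 1. RTI Service handle
    endpoint = rti.get_or_create_rtiexec(federation)   # 2. RtiExec endpoint
    factory = index.query("FederateFactoryService")    # 3. Federate Factory handle
    for name in names:
        factory.create_federate(name, federation)      # 4. create federate (+4a, 5)
    return endpoint
```

Because the RtiExec is created only when a federation has none, a second federation receives its own endpoint while federates of an existing federation reuse the first one.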
Load Management System [Figure: the HLA architecture (simulations, simulation surrogates and passive viewers connected through the interface to the RTI and its Federation, Declaration, Object, Ownership, Time and Data Distribution Management services) extended with a Load Management System (LMS).] • Use Grid software for: • Authentication • Resource Discovery, Allocation & Monitoring • Facilitating Federate Migration
[Figure: Load Management System deployment. Federates run on several hosts connected by a high-speed Myrinet switch, with LMS components alongside them; the Load Management System uses Globus for resource discovery, allocation & monitoring, while the federates communicate through the Run-Time Infrastructure.]
[Figure: a federate combining the Simulation Code and SimKernel with a FederateAmbassador and RTIambassador (towards the RTI) and an LMClient (towards the LMS), plus Shared Data.] • The simulation code is extended with two interfaces: • One for communicating with the Run-Time Infrastructure (RTI) • One for communicating with the Load Management System (LMS)
[Figure: SimKernel across Design, Implementation and Execution: sub-models are mapped onto federates, each combining a SimKernel and an LMClient, which communicate through the RTI and with the Load Manager (LM).]
Federate • Each federate contains two threads: a simulation thread (SimKernel) and a load management thread (LMClient) • SimKernel processes simulation events as defined by the user and communicates with the RTI • LMClient works with the Load Manager (LM) to perform federate migration: • receives instructions from the LM • stops SimKernel • gets the SimKernel execution state • transfers the SimKernel configuration and execution state
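The two-thread structure and the LMClient's migration steps can be sketched with Python's `threading` module. This is a hypothetical illustration, not SimKernel itself: the "simulation" merely consumes a list of event timestamps, and the execution state is serialised to a JSON string standing in for what would actually be sent to the destination host.

```python
import json
import threading


class SimKernel(threading.Thread):
    """Hypothetical simulation thread: processes events until asked to stop."""

    def __init__(self, events, state=None):
        super().__init__(daemon=True)
        self.pending = list(events)                      # remaining event timestamps
        self.state = state or {"clock": 0.0, "processed": 0}
        self.stop_requested = threading.Event()
        self.lock = threading.Lock()

    def run(self):
        while not self.stop_requested.is_set():
            with self.lock:
                if not self.pending:
                    break
                t = self.pending.pop(0)                  # process the next event
                self.state["clock"] = t
                self.state["processed"] += 1

    def snapshot(self) -> str:
        with self.lock:
            return json.dumps({"state": self.state, "pending": self.pending})


class LMClient:
    """Hypothetical load-management side of the same federate."""

    def __init__(self, kernel: SimKernel):
        self.kernel = kernel

    def on_migrate_instruction(self) -> str:
        self.kernel.stop_requested.set()   # stop SimKernel
        self.kernel.join()                 # wait until it has really stopped
        return self.kernel.snapshot()      # execution state to transfer


def restore(blob: str) -> SimKernel:
    """Destination side: rebuild a kernel from the transferred state."""
    data = json.loads(blob)
    return SimKernel(data["pending"], data["state"])
```

Whenever the migration instruction arrives, the source and restored kernels together process every event exactly once, because the pending queue and the counters travel together in one snapshot taken after the simulation thread has stopped.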
Load Manager • The Load Manager: • Constantly monitors and collects load information from each participating computing node • Runs a load balancing algorithm to determine which federate should migrate from which host to which destination • Communicates with the LMClients at both the source and destination hosts until migration succeeds
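The slides do not specify which load balancing algorithm the Load Manager runs. As one possible policy, assumed here purely for illustration, the sketch below compares the most and least loaded hosts and, if their difference exceeds a threshold, picks the federate on the busiest host whose move best evens out the pair. `plan_migration`, the load units and the 0.2 threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple


def plan_migration(host_loads: Dict[str, float],
                   federate_loads: Dict[str, float],
                   placement: Dict[str, str],
                   threshold: float = 0.2) -> Optional[Tuple[str, str, str]]:
    """Return (federate, source host, destination host), or None if balanced enough.

    host_loads: host -> current load (e.g. CPU utilisation between 0 and 1)
    federate_loads: federate -> load it contributes to its host
    placement: federate -> host it currently runs on
    """
    src = max(host_loads, key=host_loads.get)   # most loaded host
    dst = min(host_loads, key=host_loads.get)   # least loaded host
    best, best_gap = None, host_loads[src] - host_loads[dst]
    if best_gap <= threshold:
        return None                             # imbalance too small to pay for a migration
    for fed, host in placement.items():
        if host != src:
            continue
        w = federate_loads[fed]
        gap = abs((host_loads[src] - w) - (host_loads[dst] + w))
        if gap < best_gap:                      # this move narrows the imbalance most so far
            best, best_gap = fed, gap
    return (best, src, dst) if best is not None else None
```

The threshold reflects that migration itself has a cost, so small imbalances are tolerated rather than chased.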
Migration Approaches • Federation-wide synchronization [Figure: all federates take part in a Federation-Wide Save, the federate migrates, then all take part in a Federation-Wide Restore. Costly operation!]
Migration Approaches • Communication among federates: • Messages may be lost in transit during migration [Figure: federates publish and subscribe over the network; while the migrating federate unsubscribes, resigns, rejoins and subscribes again, messages in transit may be lost.]
Our Approach • We developed an algorithm that aims to: • Provide transparent migration • Minimize the migration overhead • Two instances of the migrating federate run until event integrity is ensured • No synchronization or FTP communication is required • The implementation is specific to federates based on SimKernel
Federate Migration [Figure: sequence diagram of the migration protocol between the Load Manager, the LMClient at the source, the migrating federate, the RTI, the LMClient at the destination and the new restarting federate. Messages include Req_migrate, requestInformation/returnInformation, new, restore, joinFederation, pub/sub, getMsgCount/recvMsgCount, notifyMissingMsg, missingMsg, sendOutgoingEvents, flushQueueRequest, receivedInteraction, collect, suspend, resignFederationExec, resume, returnStatus and migrationSucceeded, with a latency period marked.]
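The benefit of keeping two instances alive can be shown with a toy publish/subscribe network. In this hypothetical sketch, messages are numbered, the "state" shipped to the new instance is the list of messages already received, and the old instance forwards anything the new one missed (loosely mirroring `notifyMissingMsg`/`missingMsg`). It compares this with resigning first; it is not the actual SimKernel protocol, which also handles outgoing events and queue flushing.

```python
class Broker:
    """Toy publish/subscribe network: delivers each message to current subscribers."""

    def __init__(self):
        self.subscribers = {}   # name -> list of received sequence numbers
        self.next_seq = 0

    def subscribe(self, name: str) -> None:
        self.subscribers[name] = []

    def unsubscribe(self, name: str) -> list:
        return self.subscribers.pop(name)

    def publish(self) -> int:
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        for inbox in self.subscribers.values():
            inbox.append(seq)
        return seq


def migrate(broker: Broker, old: str, new: str, gap: int, overlap: bool) -> list:
    """Return the messages reflected in the migrated federate after traffic resumes."""
    snapshot = list(broker.subscribers[old])       # state captured and shipped to new
    if not overlap:
        broker.unsubscribe(old)                    # naive: old instance resigns at once
    for _ in range(gap):
        broker.publish()                           # messages sent while new one restarts
    broker.subscribe(new)                          # new instance joins and subscribes
    if overlap:
        old_inbox = broker.unsubscribe(old)        # old resigns only once new is subscribed
        seen = set(snapshot) | set(broker.subscribers[new])
        snapshot += [s for s in old_inbox if s not in seen]  # forward what new missed
    broker.publish()                               # normal traffic resumes
    return sorted(snapshot + broker.subscribers[new])


def demo(overlap: bool) -> list:
    broker = Broker()
    broker.subscribe("fed")
    broker.subscribe("other")
    for _ in range(3):
        broker.publish()
    return migrate(broker, "fed", "fed@dst", gap=2, overlap=overlap)


print(demo(False))  # → [0, 1, 2, 5]  (messages 3 and 4 lost in transit)
print(demo(True))   # → [0, 1, 2, 3, 4, 5]
```

The price of the overlap is that both instances briefly consume messages, which is much cheaper than the federation-wide save and restore of the first approach.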
Experimental Results
Grid Enabled HLA/RTI [Figure: clients 1 … n of federations 1 … m connect over the Grid network to proxies on a resource that runs RtiExec and FedExec1…m.]
Design [Figure: layered design. Client side: Simulation Code over a Grid-enabled HLA API over Globus. Resource side: Proxies & Federates using a Grid-enabled API and the HLA API over the RTI on the LAN, with Globus. Client and resource communicate over the Grid network. Grid Services: indexing, discovery, resource management, monitoring services …]
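The proxy design can be sketched as follows: the client calls what looks like an HLA-style method, the call is serialised and sent (here via any callable standing in for a grid-service invocation), and a proxy on the resource replays it against the RTI on the LAN. `LocalRti`, `FederateProxy`, `GridEnabledRtiAmbassador` and the simplified `update_attribute_values(obj, attrs, time)` signature are hypothetical; they are loosely modelled on HLA's `updateAttributeValues` but are not the real API.

```python
import json


class LocalRti:
    """Stand-in for an RTI reachable only on the resource's LAN."""

    def __init__(self):
        self.updates = []

    def update_attribute_values(self, obj, attrs, time):
        self.updates.append((obj, attrs, time))
        return "ok"


class FederateProxy:
    """Runs on the resource: turns grid-service requests into local RTI calls."""

    ALLOWED = {"update_attribute_values"}   # only forward known HLA-style calls

    def __init__(self, rti: LocalRti):
        self.rti = rti

    def handle(self, request: str) -> str:
        call = json.loads(request)
        if call["method"] not in self.ALLOWED:
            raise ValueError(f"unsupported call {call['method']!r}")
        result = getattr(self.rti, call["method"])(*call["args"])
        return json.dumps(result)


class GridEnabledRtiAmbassador:
    """Client side: HLA-like call shape, but every call crosses the grid."""

    def __init__(self, send):
        self.send = send          # e.g. a grid-service invocation; here any callable
        self.round_trips = 0      # each simulation event costs one remote round trip

    def update_attribute_values(self, obj, attrs, time):
        self.round_trips += 1
        request = json.dumps({"method": "update_attribute_values",
                              "args": [obj, attrs, time]})
        return json.loads(self.send(request))
```

Only the proxy needs to reach the RTI, which is why client code can move freely; the `round_trips` counter makes the cost noted in the discussion visible, since every simulation event becomes a remote call.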
Discussion • Advantages • Avoids some firewall issues, since the client communicates with the proxy via grid services • Client application code can run on heterogeneous platforms • Allows easy migration of client code; the proxy does not need to be migrated • Disadvantages • Communication overhead, since all simulation events pass through grid services
Conclusions • Work Done: • Developed a simple prototype using Globus for resource discovery, allocation and federate deployment (DS-RT ’02) • Developed the SimKernel framework to allow modelers to concentrate on the simulation rather than its implementation (DS-RT ’03) • Developed a federate migration protocol that does not use federation synchronization (ICCS ’04) • Developed a Grid Service and Service Discovery Framework (submitted to DS-RT ’04)
Conclusions • Future Work: • Service/model discovery • Service/model composition • Grid workflow languages • Grid enabled HLA/RTI • Performance measurement • Alternative communication mechanisms • Migration and fault tolerance • Integration of sub-projects • Convert to GT4 (WS-RF)
Thank you for your attention! Questions & Answers
The HLA defines a standard for the construction of large-scale distributed simulations, while Grid technologies enable collaboration and the use of distributed computing resources and also facilitate access to geographically distributed data sets.