Explore the efficient and scalable Grid Application Monitoring OCM-G system, including components, services, request types, and message routing. Learn about the benefits of this system in handling multiple applications and users.
An Application Monitoring System in Grid Bartosz Baliś Tomasz Szepieniec
AGENDA • PART I Monitoring – introduction (B. Baliś) • PART II Concepts and problems in Grid application monitoring (T. Szepieniec)
PART I - OUTLINE • Application Monitoring • OMIS • Grid Monitoring System – OCM-G • Instrumentation
Application monitoring • Monitor = obtain information on or manipulate target application • e.g. read status of application’s processes, suspend application, read / write memory, etc. • Monitoring module needed by tools • Debuggers • Performance analyzers • Visualizers • ...
Monitoring – integrated module • Monitoring module integrated with GUI • Usual case
Monitoring – autonomous system • Separate monitoring system • Tool / Monitor interface – OMIS
Monitoring system – benefits • Modularity • GUI development separated from monitoring module development • Single monitoring system • multiple tools do not conflict with each other • coordination of access to shared objects • enables tool interoperability
Interoperability • Multiple tools • Common monitoring system • Interoperability • cooperation • e.g. debugger + visualizer
OMIS • Universal tool / monitoring system interface • Target system view • hierarchical set of objects • nodes, processes, threads • grid: additional objects – sites • objects identified by tokens, e.g. n_1, p_1, etc. • Three types of services • information services • manipulation services • event services
OMIS – services • Information services • obtain information on target system • e.g. node_get_info = obtain information on nodes in the target system • Manipulation services • perform manipulations on the target system • e.g. thread_stop = stop specified threads • Event services • detect events in the target system • e.g. thread_started_libcall = detect invocations of specified functions
OMIS – requests • Services can be combined into monitoring requests • Two types of requests: • Unconditional requests • to be executed immediately • executed only once • Conditional requests • to execute actions whenever an event occurs • actions can be executed multiple times
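The difference between the two request types can be sketched in a few lines of Python. This is only an illustration of the semantics described on the slide, not OCM-G code: the `Monitor` class, its method names, and the use of plain callables as actions are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Action = Callable[[], None]


class Monitor:
    """Toy monitor (hypothetical) contrasting the two OMIS request types."""

    def __init__(self) -> None:
        # event name -> actions registered by conditional requests
        self._handlers: DefaultDict[str, List[Action]] = defaultdict(list)

    def unconditional(self, action: Action) -> None:
        # Unconditional request: executed immediately, exactly once.
        action()

    def conditional(self, event: str, action: Action) -> None:
        # Conditional request: nothing runs yet; the action is stored
        # and executed every time the event is detected.
        self._handlers[event].append(action)

    def raise_event(self, event: str) -> None:
        # Called when the monitor detects an event in the target system.
        for action in self._handlers[event]:
            action()
```

Registering an action with `conditional` and then calling `raise_event` twice runs the action twice, whereas `unconditional` runs its action once at the time of the call.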
OMIS – unconditional requests • :thread_stop(t_1) • Action: thread_stop • Operand: t_1 • = stop thread t_1
OMIS – conditional requests • thread_started_libcall(t_1, "MPI_Send"): counter_inc(c_1) • Event: thread_started_libcall • Event operands: t_1, "MPI_Send" • Action: counter_inc • Action operand: c_1 • = whenever thread t_1 invokes MPI_Send, increment counter c_1
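The request layout shown on the two slides above (an optional event, a colon, then an action, each with operands) can be split mechanically. The following Python sketch handles only the two example requests from the slides; it is not a full OMIS parser. The class and function names are hypothetical, it assumes a single action per request, and it assumes that operands contain no colons, commas, or parentheses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceCall:
    name: str
    operands: List[str]


@dataclass
class Request:
    event: Optional[ServiceCall]  # None for an unconditional request
    actions: List[ServiceCall]

    @property
    def conditional(self) -> bool:
        return self.event is not None


# service_name(operand, operand, ...), no nested parentheses
_CALL = re.compile(r'\s*(\w+)\s*\(([^()]*)\)\s*$')


def parse_call(text: str) -> ServiceCall:
    m = _CALL.match(text)
    if m is None:
        raise ValueError(f"not a service call: {text!r}")
    # Split operands on commas and drop surrounding whitespace and quotes.
    operands = [op.strip().strip('"') for op in m.group(2).split(",") if op.strip()]
    return ServiceCall(m.group(1), operands)


def parse_request(text: str) -> Request:
    # Everything before the first ':' is the event; an empty event part
    # means the request is unconditional.
    event_part, sep, action_part = text.partition(":")
    if not sep:
        raise ValueError("request must contain ':'")
    event = parse_call(event_part) if event_part.strip() else None
    return Request(event, [parse_call(action_part)])
```

For example, `parse_request(':thread_stop(t_1)')` yields a request with no event, while `parse_request('thread_started_libcall(t_1, "MPI_Send"): counter_inc(c_1)')` yields an event with operands `t_1` and `MPI_Send` and one action, `counter_inc`.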
Grid-enabled OMIS-compliant Monitoring System – OCM-G • Scalable • distributed • decentralized • Efficient • local buffers Three types of components • local monitors (LM) • service managers (SM) • application monitors (AM)
Service Managers and Local Monitors • Service Managers • one or more in the system • request distribution • reply collection • Local Monitors • one per node • handle local objects • actual execution of requests
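The division of labour between the two components (the SM distributes a request and collects replies, each LM executes the part that concerns its own node) might look roughly like the Python sketch below. It is a single-process toy model under stated assumptions: the classes, the `proc_get_status` service name, and the dictionary-based replies are invented for illustration and are not the OCM-G implementation or an OMIS service list.

```python
from typing import Dict, List


class LocalMonitor:
    """One per node; executes requests on the objects local to that node."""

    def __init__(self, node: str, processes: Dict[str, str]) -> None:
        self.node = node
        self.processes = processes  # process token -> status

    def execute(self, service: str, tokens: List[str]) -> Dict[str, str]:
        # "proc_get_status" is a hypothetical service name for this sketch.
        if service != "proc_get_status":
            raise ValueError(f"unsupported service: {service}")
        return {t: self.processes[t] for t in tokens}


class ServiceManager:
    """Splits a request among local monitors and merges their replies."""

    def __init__(self, monitors: List[LocalMonitor]) -> None:
        self.monitors = monitors

    def request(self, service: str, tokens: List[str]) -> Dict[str, str]:
        reply: Dict[str, str] = {}
        for lm in self.monitors:
            # Request distribution: each LM gets only its own tokens.
            local = [t for t in tokens if t in lm.processes]
            if local:
                # Reply collection: merge the partial replies.
                reply.update(lm.execute(service, local))
        return reply
```

A tool talking to the `ServiceManager` sees one reply covering all nodes and does not need to know which local monitor handled which process.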
Application monitors • Embedded in applications • Handle some actions locally • buffering data • filtering of instrumentation • monitoring requests • E.g. REQ: read variable a, REP: value of a • asynchronous • no OS mechanisms involved
PART II – OUTLINE Concepts and problems in Grid application monitoring • OCM-G as Grid service • OCM-G component discovery • Handling Multiple Applications • OMIS message routing • Application start-up
OCM-G as Grid service • Permanently running in Grid • Global service • multiple applications • multiple users based on Globus users • multiple tools
OCM-G component discovery • Use cases • Assemble individual parts into a single monitoring system • Enabling attachment of application processes and tools • External mechanisms based on • local file system • MDS • R-GMA • ...
Handling Multiple Applications • Create a "virtual" monitoring system • One SM becomes MainSM for each application • which one? • Expansion and localization mechanism works in "virtual" mode
OMIS message routing • Two options: • route via MainSM (option 1) • SMs communicate directly (option 2) • Where should localization info be stored, and how should it be updated?
Application start-up – requirements • Application started as usual (e.g. PORTAL, globusrun) • Independent of the communication protocol the application uses • Use GRAM to place application processes where OCM-G is enabled
Application start-up – HOWTO • Monitoring parameters are put on the command line • Initial OMIS requests can also be put on the command line (e.g. breakpoints) • Obligatory starting parameters • application token chosen a priori • MainSM token should be known a priori (we are working on how to avoid this)
Conclusion • OMIS as communication protocol • Autonomous application monitoring system • OCM-G uses Globus services • Normal application startup can be achieved • Some problems are still to be addressed...
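The two routing options differ in the path a message takes between service managers. The Python sketch below only makes that difference concrete; the function name and the SM tokens are hypothetical, and it does not answer the open question of where localization info is kept.

```python
from typing import List


def route(src: str, dst: str, main_sm: str, direct: bool) -> List[str]:
    """Return the sequence of SMs a message visits (illustrative only)."""
    if src == dst:
        return [src]
    if direct or main_sm in (src, dst):
        # Option 2, or one endpoint is already the MainSM: single hop.
        return [src, dst]
    # Option 1: every message is relayed by the application's MainSM.
    return [src, main_sm, dst]
```

With `sm_1` as MainSM, a message from `sm_2` to `sm_3` visits `sm_2`, `sm_1`, `sm_3` under option 1 and only `sm_2`, `sm_3` under option 2. One consequence worth weighing, offered here as an observation rather than a design decision from the slides: in the sketch, option 1 needs the location of `dst` only at the MainSM, while option 2 needs it at every sending SM.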
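One way to pass monitoring parameters on the command line without disturbing the application's own arguments is to strip them out before the application sees them. The sketch below uses Python's standard `argparse`; the flag names `--app-token`, `--main-sm`, and `--omis-request` are made up for this example and are not claimed to be the options OCM-G actually uses.

```python
import argparse
from typing import List, Tuple


def split_monitoring_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Separate (hypothetical) monitoring flags from application arguments."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Obligatory starting parameters, both chosen or known a priori.
    parser.add_argument("--app-token", required=True)
    parser.add_argument("--main-sm", required=True)
    # Optional initial OMIS requests, e.g. to set breakpoints at start-up.
    parser.add_argument("--omis-request", action="append", default=[])
    # parse_known_args leaves everything it does not recognise untouched,
    # so the application still receives its normal arguments.
    monitoring, app_args = parser.parse_known_args(argv)
    return monitoring, app_args
```

Called with `["--app-token", "app_1", "--main-sm", "sm_1", "input.dat"]`, the function returns the two monitoring values in the namespace and `["input.dat"]` as the remaining application arguments.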