OMIS Approach to Grid Application Monitoring

Bartosz Baliś, Marian Bubak, Włodzimierz Funika, Roland Wismueller


Presentation Transcript


  1. OMIS Approach to Grid Application Monitoring • Bartosz Baliś, Marian Bubak, Włodzimierz Funika, Roland Wismueller

  2. AGENDA • Introduction • Monitoring architecture • sensors (local monitors, application monitors) • service managers • Performance • efficient data gathering • scalability of grid-scale monitoring • Producer / consumer communication protocol • Comparison to DATAGRID • Experience • Conclusion

  3. Introduction • Need for monitoring applications • improve performance • localize bugs • For these purposes – specialized tools needed • debuggers, performance analyzers, visualizers, etc. • Tools composed of two modules • user interface • monitoring module

  4. Introduction (cont’d) • Main issues of monitoring on Grid • scale of Grid enormous • many applications, many users, high distribution, high heterogeneity • simply porting existing environments not sufficient! • A solution: • underlying universal monitoring system • well-defined interface to tools • Experience with OMIS / OCM: PVM → MPI, port of tools • next step – move to Grid?

  5. Monitoring architecture • Compliance with GMA (Grid Monitoring Architecture) • producer / consumer model • Sensors – producers of performance data • Tools – consumers of the data • Direct communication between producers and consumers • Producers located via e.g. a directory service
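
The producer / consumer model on this slide can be illustrated with a minimal sketch. It assumes a toy in-memory directory; the names `Producer`, `DirectoryService`, and `consume` are hypothetical and do not come from GMA or OMIS. The point it shows is that the directory is used only to locate producers, while the data itself flows directly from producer to consumer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Producer:
    """A sensor that serves performance data directly to consumers."""
    name: str
    node: str
    read: Callable[[], Dict[str, float]]  # returns the current measurements


@dataclass
class DirectoryService:
    """Hypothetical registry: it locates producers but never relays data."""
    _entries: Dict[str, Producer] = field(default_factory=dict)

    def publish(self, producer: Producer) -> None:
        self._entries[producer.name] = producer

    def lookup(self, node: str) -> List[Producer]:
        return [p for p in self._entries.values() if p.node == node]


def consume(directory: DirectoryService, node: str) -> Dict[str, Dict[str, float]]:
    """A tool (consumer) finds producers via the directory, then queries them directly."""
    return {p.name: p.read() for p in directory.lookup(node)}


directory = DirectoryService()
directory.publish(Producer("lm-node1", "node1", lambda: {"cpu_time": 1.5}))
directory.publish(Producer("lm-node2", "node2", lambda: {"cpu_time": 0.7}))
result = consume(directory, "node1")
```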

  6. Sensors • Collect performance data from applications • Two types of sensors • local monitors (process sensors) • application monitors

  7. Sensors (cont’d) • Local monitors • one per node • collect data only from processes on this node • publish themselves in the directory service • Application monitors • embedded parts of applications • collect data on various events, e.g. function calls • may improve efficiency and portability • interact with local monitors

  8. Monitoring Architecture

  9. Service managers • Tool + local monitors – one consumer, multiple producers • Intermediate entity: service manager • handles requests coming from a tool • splits them into sub-requests for local monitors • collects replies from local monitors • assembles them into a single reply for the tool • Both producer (of data for tools) and consumer (of data from local monitors) • Offers the functionality of local monitors but on a per-application basis
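
The split / collect / assemble cycle described on this slide might look roughly like the sketch below. The request form (a service name plus a list of process ids), the `placement` map, and the `make_monitor` stand-in for a local monitor are all assumptions made for illustration, not part of the OMIS interface.

```python
from typing import Callable, Dict, List

# Hypothetical local monitor interface: (service, pids on this node) -> {pid: value}
LocalMonitor = Callable[[str, List[int]], Dict[int, float]]


def make_monitor(data: Dict[str, Dict[int, float]]) -> LocalMonitor:
    """Build a fake local monitor that answers from a fixed table."""
    def monitor(service: str, pids: List[int]) -> Dict[int, float]:
        return {pid: data[service][pid] for pid in pids}
    return monitor


class ServiceManager:
    def __init__(self, monitors: Dict[str, LocalMonitor], placement: Dict[int, str]):
        self.monitors = monitors    # node name -> local monitor on that node
        self.placement = placement  # process id -> node it runs on

    def split(self, pids: List[int]) -> Dict[str, List[int]]:
        """Group the requested processes by node: one sub-request per local monitor."""
        by_node: Dict[str, List[int]] = {}
        for pid in pids:
            by_node.setdefault(self.placement[pid], []).append(pid)
        return by_node

    def handle(self, service: str, pids: List[int]) -> Dict[int, float]:
        """Send each sub-request, collect the replies, and merge them into one reply."""
        reply: Dict[int, float] = {}
        for node, node_pids in self.split(pids).items():
            reply.update(self.monitors[node](service, node_pids))
        return reply


sm = ServiceManager(
    monitors={
        "node1": make_monitor({"cpu_time": {1: 0.5, 2: 1.0}}),
        "node2": make_monitor({"cpu_time": {3: 2.0}}),
    },
    placement={1: "node1", 2: "node1", 3: "node2"},
)
reply = sm.handle("cpu_time", [1, 3, 2])
```

From the tool's point of view there is a single request and a single reply; the per-node fan-out stays hidden inside the service manager.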

  10. Application Monitors • Part of the monitoring system embedded in the application’s processes • have access to the application address space! • Many possible usages • efficient data gathering and storing • may take over some of the local monitor’s tasks • may be used to dynamically load monitoring extensions • even more for multithreaded applications

  11. Application Monitors – debugging example • A debugger wants to access a process’s address space • Standard system mechanisms: ptrace, /proc • /proc more powerful yet platform-dependent • synchronous control • Via application monitors → request from the debugger to access the data • portable, asynchronous • question: how to ensure that application monitors are not corrupted by the application?
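
As a loose analogy for the asynchronous path on this slide, the sketch below runs a hypothetical `ApplicationMonitor` as a thread inside the application and lets a debugger-like caller queue a read request without stopping the application. A Python dict stands in for the process address space; a real application monitor would read actual process memory, and the class and method names here are assumptions.

```python
import queue
import threading


class ApplicationMonitor(threading.Thread):
    """Monitor embedded in the application; serves read requests asynchronously."""

    def __init__(self, address_space):
        super().__init__(daemon=True)
        self.address_space = address_space  # stand-in for process memory
        self.requests = queue.Queue()

    def run(self):
        while True:
            item = self.requests.get()
            if item is None:  # sentinel: shut the monitor down
                break
            symbol, reply = item
            reply.put(self.address_space.get(symbol))

    def read(self, symbol):
        """Queue a request and return immediately; the caller collects the reply later."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((symbol, reply))
        return reply

    def stop(self):
        self.requests.put(None)


state = {"iteration": 0}          # application data
monitor = ApplicationMonitor(state)
monitor.start()

state["iteration"] = 42           # the application keeps running
pending = monitor.read("iteration")
value = pending.get(timeout=1.0)  # the debugger picks up the answer when ready

monitor.stop()
monitor.join(timeout=1.0)
```

Because the monitor shares the application's memory, nothing in this sketch protects it from a misbehaving application, which is the open question the slide raises.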

  12. Performance • Efficient data gathering • data production much more frequent than retrieval • frequency and time of access – difficult to predict • Scalability • grid-scale monitoring system • distributed vs. centralized

  13. Efficient data gathering • Local storing • performance data first stored locally, in the context of application processes • on request, passed to local monitors • saves communication and context switches between application and local monitor processes • Efficient data structures • performance data initially preprocessed • summarized information stored in e.g. counters and integrators
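
A minimal sketch of local storing with preprocessed data follows. It assumes that a counter accumulates event counts and that an integrator accumulates a level multiplied by elapsed time; these semantics and all class names are illustrative assumptions rather than the OCM implementation. Events update the summaries in place, and only a small snapshot is handed over when a local monitor asks for it.

```python
class Counter:
    """Counts events."""

    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by


class Integrator:
    """Accumulates level * elapsed time (assumed semantics)."""

    def __init__(self):
        self.total = 0.0
        self._level = 0.0
        self._last = None

    def update(self, level, now):
        # Close the interval that ran at the previous level, then switch level.
        if self._last is not None:
            self.total += self._level * (now - self._last)
        self._level = level
        self._last = now

    def read(self, now):
        self.update(self._level, now)  # account for time since the last event
        return self.total


class LocalStore:
    """Kept inside the application process; queried only on request."""

    def __init__(self):
        self.calls = Counter()
        self.time_in_call = Integrator()

    def on_call_enter(self, now):
        self.calls.increment()
        self.time_in_call.update(1.0, now)

    def on_call_exit(self, now):
        self.time_in_call.update(0.0, now)

    def snapshot(self, now):
        return {"calls": self.calls.value,
                "time_in_call": self.time_in_call.read(now)}


store = LocalStore()
store.on_call_enter(0.0)
store.on_call_exit(2.0)
store.on_call_enter(5.0)
store.on_call_exit(5.5)
summary = store.snapshot(6.0)
```

Four events are reduced to two numbers before anything leaves the process, which is where the savings in communication and context switches come from.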

  14. Scalability • Decentralization → multiple service managers instead of one • Possible approaches • fixed number of service managers, each responsible for part of the system • one service manager started for every monitored application

  15. Fixed number of SMs

  16. One SM per application

  17. Scalability (cont’d) • In the first approach • tighter cooperation between service managers will be necessary • In the second approach • local monitors must have the ability to serve multiple service managers • service managers locate local monitors via directory service
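
The two decentralization approaches can be contrasted with a small sketch. The round-robin partitioning in `assign_fixed` and the `sm-<application>` naming in `assign_per_application` are assumptions chosen for illustration; the slides do not specify how nodes are divided.

```python
def assign_fixed(nodes, num_managers):
    """Approach 1: a fixed pool of service managers, each owning part of the system."""
    parts = {m: [] for m in range(num_managers)}
    for i, node in enumerate(nodes):
        parts[i % num_managers].append(node)
    return parts


def assign_per_application(app_nodes):
    """Approach 2: one service manager per application.

    Returns, for each node, the service managers its local monitor must serve.
    """
    served = {}
    for app, nodes in app_nodes.items():
        for node in nodes:
            served.setdefault(node, []).append(f"sm-{app}")
    return served


fixed = assign_fixed(["n1", "n2", "n3", "n4", "n5"], 2)
per_app = assign_per_application({"appA": ["n1", "n2"], "appB": ["n2", "n3"]})
```

In the first result an application spanning `n1` and `n2` touches both service managers, so they have to cooperate. In the second, the local monitor on `n2` serves two service managers at once.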

  18. Communication protocol • Based on the OMIS specification • OMIS = On-line Monitoring Interface Specification • specification of a universal interface between tools and a monitoring system • supports various types of tools • allows for easy extension • Necessary Grid-specific extensions (e.g. for authentication)

  19. Comparison to DATAGRID • Monitoring approach • DG: (semi-)on-line • CG: on-line • Architecture • DG: centralized distributed (local monitors and one main monitor) • CG: distributed (local monitors and multiple service managers)

  20. Comparison to DATAGRID (cont’d) • Data collection • DG: local storing with trace buffering or counters • CG: local storing with preprocessing (counters, integrators) • Communication protocol • DG: Not specified • CG: OMIS

  21. Experience • OMIS-based monitoring system for clusters of workstations – OCM • OMIS-based tools – PATOP (performance analysis), DETOP (debugging), others... • Local storing and efficient data structures (counters and integrators) proved to be very efficient • full monitoring overhead of about 4% • Instrumentation techniques used incur zero overhead when monitoring is inactive

  22. Summary • Demand for accurate data from monitoring tools • Monitoring data handling: production / consumption • A general scheme of monitoring compliant with GMA • Need for an advanced monitoring infrastructure • Concepts of OMIS will be extended to fit the Grid
