
Grid Monitoring

Grid Monitoring. 23 Nov 2004, Ferrara . Sergio Andreozzi INFN-CNAF Bologna (Italy) sergio.andreozzi@cnaf.infn.it Ferrara, 29 Nov 2005. OUTLINE. Grid Monitoring Motivations Requirements for a Grid system Functional and Non-functional Requirements Monitoring Phases:



  1. Grid Monitoring 23 Nov 2004, Ferrara Sergio Andreozzi INFN-CNAF Bologna (Italy) sergio.andreozzi@cnaf.infn.it Ferrara, 29 Nov 2005

  2. OUTLINE • Grid Monitoring Motivations • Requirements for a Grid system • Functional and Non-functional Requirements • Monitoring Phases: • Data Measurement, distribution, presentation and processing • GridICE • Architecture • Demo Ferrara, 29 Nov 2004

  3. A Use Case Perspective • The availability of Grid resources is subject to failures • Resource observability is necessary for using the Grid • Different users need to analyze the usage, behavior and performance of a Grid: • VO manager • Grid operations manager • Site administrator

  4. VO Manager Viewpoint • Visualization of the actual set of resources accessible to the VO members • Evaluation of how well the Grid mapping functionalities satisfy the members' demand • Evaluation of the Service Level Agreement (SLA) for the global Grid service offered • Post-mortem analysis

  5. Grid Operations Manager Viewpoint • Detection of fault situations related to wide-area distributed resources • Coordination of the deployment and upgrade of the Grid middleware installed at several sites • Investigation of Grid resources for statistical purposes

  6. Site Administrator Viewpoint • Detection of fault situations related to the site's own resources • Control over how the site's own resources appear to the Grid

  7. Our Definition • Grid Monitoring: • the activity of measuring significant parameters related to Grid resources • in order to • analyze usage, behavior and performance of the Grid • detect and notify fault situations

  8. Grid Monitoring: Functional Requirements • dynamically partition resource and service usage using three criteria: site ownership, operations domain, and virtual organization accessibility; • collect data in order to enable retrospective analysis; • collect both fine-grained and coarse-grained monitoring data;
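Partitioning resource and service usage by site, operations domain and VO amounts to grouping the same usage records along three different keys. A minimal Python sketch, assuming a hypothetical `UsageRecord` whose field names and sample values are illustrative only (not taken from GridICE):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical usage record; field names are illustrative only.
    site: str
    operations_domain: str
    vo: str
    cpu_hours: float


def partition(records, criterion):
    """Sum CPU hours per value of one criterion: 'site', 'operations_domain' or 'vo'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, criterion)] += record.cpu_hours
    return dict(totals)


# Hypothetical sample data.
records = [
    UsageRecord("site-a", "domain-1", "vo-x", 10.0),
    UsageRecord("site-a", "domain-1", "vo-y", 5.0),
    UsageRecord("site-b", "domain-2", "vo-x", 7.5),
]

for criterion in ("site", "operations_domain", "vo"):
    print(criterion, partition(records, criterion))
```

The same records serve all three viewpoints; only the grouping key changes, which keeps the partitioning dynamic rather than fixed at collection time.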

  9. Grid Monitoring: Functional vs. Non-functional Requirements • Functional: what the system does • Non-functional: how it does it

  10. Grid Monitoring: Functional Requirements • help to detect fault situations and possibly prevent them; • integrate with local monitoring systems, when available; • track which machines are running the VO applications, the status and behavior of each machine, and the behavior of the software.
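Detecting a fault, and ideally catching it before it happens, can be as simple as checking each resource's latest observation against a warning threshold, a critical threshold and a maximum age. A minimal sketch; the thresholds, the status names and the staleness rule are assumptions chosen for illustration:

```python
import time

# Hypothetical thresholds for one attribute (e.g., disk usage in percent).
WARNING_THRESHOLD = 80.0
CRITICAL_THRESHOLD = 95.0
MAX_AGE_SECONDS = 600  # an observation older than this is treated as missing


def classify(value, timestamp, now=None):
    """Return 'unknown', 'fault', 'warning' or 'ok' for the latest observation."""
    now = time.time() if now is None else now
    if now - timestamp > MAX_AGE_SECONDS:
        return "unknown"   # no recent data: the resource may be unreachable
    if value >= CRITICAL_THRESHOLD:
        return "fault"
    if value >= WARNING_THRESHOLD:
        return "warning"   # early signal that may allow preventing a fault
    return "ok"
```

The warning level is what turns detection into possible prevention: it fires while there is still margin to act.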

  11. Grid Monitoring: Non-functional Requirements • Scalability: monitoring systems have to cope efficiently with a growing number of resources, events and users. • Extensibility: monitoring systems must be extensible with respect to the supported resources. • Flexibility: monitoring systems must integrate different delivery models, depending on the needs (e.g., periodic, on-demand). • Portability: any encapsulated measurement must be platform-independent. • Security: monitoring systems must deal with security concerns such as privacy, data integrity and confidentiality.

  12. Concepts & Terminology • Entity: any networked, useful resource with a considerable lifetime (e.g., processors, memory, disk capacity) • Attribute: a characteristic of an entity • Observation: a timestamped measure associated with an attribute of an entity • Sensor: a process that monitors an entity and generates observations
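The four terms map naturally onto a few small types. A minimal Python sketch; representing an attribute as a plain name string, the `probe` callable and the host name are assumptions made for illustration:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Entity:
    # A networked resource with a considerable lifetime, e.g. a worker node.
    name: str
    kind: str


@dataclass(frozen=True)
class Observation:
    # A timestamped measure of one attribute of one entity.
    entity: Entity
    attribute: str
    value: float
    timestamp: float


@dataclass
class Sensor:
    # A process that monitors an entity and generates observations.
    entity: Entity
    attribute: str
    probe: Callable[[], float]  # hypothetical function that measures the attribute
    produced: List[Observation] = field(default_factory=list)

    def observe(self) -> Observation:
        obs = Observation(self.entity, self.attribute, self.probe(), time.time())
        self.produced.append(obs)
        return obs


node = Entity("wn01.example.org", "worker node")
sensor = Sensor(node, "memory_free_mb", probe=lambda: 2048.0)
print(sensor.observe())
```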

  13. Concepts & Terminology • Measurement: the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules. • Measure: Let A be a set of physical or empirical objects. Let B be a set of formal objects, such as numbers. A measure μ is defined to be a one-to-one mapping μ: A → B.

  14. Concepts & Terminology • Measurement Method: a logical sequence of operations, described generically, used to quantify an attribute with respect to a specified scale • Measurement Unit: a particular quantity, defined and adopted by convention, with which other quantities of the same kind are compared in order to express their magnitude relative to that quantity • The importance of the measurement unit: http://www.space.com/news/mco_report-b_991110.html
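Storing the unit alongside every value, rather than keeping bare numbers, makes the kind of unit mix-up illustrated by the linked report fail loudly instead of silently. A minimal sketch for memory sizes; the `Unit` enum and the `Quantity` class are illustrative and not part of any monitoring tool:

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    # Each value is the number of bytes in one unit.
    BYTE = 1
    KIBIBYTE = 1024
    MEBIBYTE = 1024 ** 2


@dataclass(frozen=True)
class Quantity:
    magnitude: float
    unit: Unit

    def to(self, target: Unit) -> "Quantity":
        # Convert through the common base unit (bytes).
        return Quantity(self.magnitude * self.unit.value / target.value, target)

    def __add__(self, other: "Quantity") -> "Quantity":
        # Refuse implicit mixing of units; callers must convert explicitly.
        if self.unit is not other.unit:
            raise ValueError(f"unit mismatch: {self.unit.name} vs {other.unit.name}")
        return Quantity(self.magnitude + other.magnitude, self.unit)
```

For example, `Quantity(1, Unit.MEBIBYTE) + Quantity(1024, Unit.KIBIBYTE)` raises `ValueError`, while converting the second operand with `.to(Unit.MEBIBYTE)` first yields 2 MiB.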

  15. Concepts & Terminology • Measurement Scale: the triple (ERS, FRS, μ), where ERS is the empirical relation system, FRS the formal relation system and μ the measure, such that (informally) • for every relation defined on the physical objects, there is an equivalent relation defined on the measures of those objects (i.e., if a statement about a relationship between or among objects is true, then the corresponding relationship between or among their measures is also true) • for every operation defined on the physical objects, there is a corresponding operation defined on the measures, such that the result of measuring the combined objects is the same as performing the corresponding operation on the measures of the individual objects
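Stated more formally, and assuming ERS and FRS denote the empirical and formal relation systems over the sets A and B of the measure μ: A → B, the two informal conditions read as follows. Here R is a relation and ∘ an operation on the physical objects, and R′ and ⊕ are their counterparts on the measures:

```latex
\begin{aligned}
&\forall a, b \in A: \quad a \mathrel{R} b \iff \mu(a) \mathrel{R'} \mu(b) \\
&\forall a, b \in A: \quad \mu(a \circ b) = \mu(a) \oplus \mu(b)
\end{aligned}
```

For disk capacity, for instance, R could be "has more capacity than" with R′ being >, and ∘ the combination of two disks with ⊕ being +.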

  16. Concepts & Terminology • Relevant Measurement Scale Types: • Nominal scale: classification of measurement values • gives numeric “names” to objects (e.g., OS) • Ordinal scale: order of measurement values • assigns numbers to objects in a particular order, but any numbers that maintain that order are equally good (e.g., colors) • Interval scale: interval between measurement values • assigns numbers to objects in such a way that the interval between two measure values is meaningful throughout the range of values (e.g., temperature) • Ratio scale: ratio between measurement values • assigns values in such a way that the ratio of two measures is meaningful (e.g., bandwidth) • Absolute scale: only one possible measurement • there is only one way of measuring the objects • Each scale type listed includes the properties of the preceding one
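Because each scale type includes the properties of the preceding one, the five types form a simple ordering that a monitoring tool could record next to each attribute. A minimal sketch; the attribute names are hypothetical, and counting (here, running jobs) as an absolute scale is the usual textbook example rather than one given on the slide:

```python
from enum import IntEnum


class ScaleType(IntEnum):
    # Ordered so that each type has all the properties of the ones before it.
    NOMINAL = 1    # classification only
    ORDINAL = 2    # + order
    INTERVAL = 3   # + meaningful intervals
    RATIO = 4      # + meaningful ratios
    ABSOLUTE = 5   # only one possible measurement (e.g., counting)


# Hypothetical attribute-to-scale annotations.
ATTRIBUTE_SCALES = {
    "operating_system": ScaleType.NOMINAL,
    "temperature_celsius": ScaleType.INTERVAL,
    "bandwidth_mbps": ScaleType.RATIO,
    "running_jobs": ScaleType.ABSOLUTE,
}


def has_properties_of(attribute: str, required: ScaleType) -> bool:
    """True if the attribute's scale includes all properties of `required`."""
    return ATTRIBUTE_SCALES[attribute] >= required


print(has_properties_of("bandwidth_mbps", ScaleType.INTERVAL))    # True
print(has_properties_of("temperature_celsius", ScaleType.RATIO))  # False
```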

  17. Example • Why the Celsius and Fahrenheit temperature scales are interval scales and not ratio scales • Both are valid measurement scales for temperature • A transformation exists that converts a value on one scale into a value on the other (F = 1.8 × C + 32) • A ratio scale requires that t1/t2 is meaningful • E.g.: • A month ago it was 10°C, today it is 20°C • So today would be twice as hot as a month ago • If we were using Fahrenheit: • A month ago it was 50°F, today it is 68°F • Today is not twice as hot as a month ago • The ratio between two temperatures is not a valid operation • Consequences: • The sentence “today is twice as hot as a month ago” is not admissible • The geometric mean of a set of temperatures cannot be computed, as it requires multiplying two or more temperatures
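The argument can be checked numerically. Taking a month-ago temperature of 10°C and today's 20°C, plus a hypothetical third reading of 30°C, the sketch below shows that ratios of values change under the Celsius-to-Fahrenheit conversion while ratios of intervals do not, which is exactly what separates an interval scale from a ratio scale:

```python
def celsius_to_fahrenheit(c: float) -> float:
    # Affine transformation F = 1.8 * C + 32, admissible on an interval scale.
    return 1.8 * c + 32


month_ago_c, today_c = 10.0, 20.0
next_month_c = 30.0  # hypothetical third reading

month_ago_f = celsius_to_fahrenheit(month_ago_c)    # 50.0
today_f = celsius_to_fahrenheit(today_c)            # 68.0
next_month_f = celsius_to_fahrenheit(next_month_c)  # 86.0

# Ratios of values are NOT preserved by the change of scale...
print(today_c / month_ago_c)  # 2.0
print(today_f / month_ago_f)  # 1.36

# ...but ratios of intervals are.
print((next_month_c - today_c) / (today_c - month_ago_c))  # 1.0
print((next_month_f - today_f) / (today_f - month_ago_f))  # 1.0
```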

  18. Concepts & Terminology • Relationships among the various types of measurement scales, within the universe of possible measurement scales • The more properties a scale type has, the fewer measurement scales belong to that category

  19. Concepts & Terminology • Identifying the measurement scale: • shows which measurement scales are acceptable alternatives • shows which statistics are supported
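Knowing the scale type tells a tool which summary statistics it may compute. A minimal sketch using Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`); the correspondence between statistics and their weakest admissible scale follows the usual measurement-theory table, and `median_low` is used so that an ordinal median is an observed value rather than an average of two ranks:

```python
import statistics
from enum import IntEnum


class ScaleType(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4
    ABSOLUTE = 5


# Weakest scale type on which each statistic is meaningful.
STATISTICS = {
    "mode": (ScaleType.NOMINAL, statistics.mode),
    "median": (ScaleType.ORDINAL, statistics.median_low),
    "arithmetic_mean": (ScaleType.INTERVAL, statistics.mean),
    "geometric_mean": (ScaleType.RATIO, statistics.geometric_mean),  # positive values only
}


def supported_statistics(values, scale: ScaleType) -> dict:
    """Compute only the statistics that are meaningful for `scale`."""
    return {
        name: func(values)
        for name, (minimum, func) in STATISTICS.items()
        if scale >= minimum
    }


print(supported_statistics([3, 1, 3, 2], ScaleType.ORDINAL))  # {'mode': 3, 'median': 2}
```

For temperatures on an interval scale, the geometric mean is simply never offered, matching the consequence drawn in the Celsius/Fahrenheit example.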

  20. The Four Main Phases of Monitoring • Generation: sensors enquiring entities and encoding the measurements according to a schema (active/passive, intrusive/non-intrusive) • Processing: e.g., filtering according to some predefined criteria, or summarising a group of events • Distributing: transmission of the events from the source to any interested parties (data delivery model: push vs. pull; periodic vs. aperiodic; unicast vs. 1-to-N) • Presenting: further processing that abstracts the received events so that the consumer can draw conclusions about the operation of the monitored system
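The four phases can be chained as a small pipeline. A minimal Python sketch with toy event dictionaries; the schema, the filtering threshold and the function names are assumptions, and real systems may perform processing at several points rather than in a single stage:

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Iterator, List


def generate(readings: Iterable[float]) -> Iterator[Dict]:
    # Generation: encode raw sensor readings as events following a toy schema.
    for i, value in enumerate(readings):
        yield {"seq": i, "attribute": "cpu_load", "value": value}


def process(events: Iterable[Dict], threshold: float) -> Iterator[Dict]:
    # Processing: filter events according to a predefined criterion.
    return (e for e in events if e["value"] >= threshold)


def distribute(events: Iterable[Dict], consumers: List[Callable[[Dict], None]]) -> None:
    # Distribution: push every event to all interested parties (1-to-N).
    for event in events:
        for consumer in consumers:
            consumer(event)


def present(received: List[Dict]) -> str:
    # Presentation: abstract many events into a summary a person can act on.
    if not received:
        return "no high-load events"
    return f"{len(received)} high-load events, mean load {mean(e['value'] for e in received):.2f}"


inbox: List[Dict] = []
distribute(process(generate([0.2, 1.5, 3.0, 0.9]), threshold=1.0), [inbox.append])
print(present(inbox))  # 2 high-load events, mean load 2.25
```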

  21. About Sensor Categorization • Active: the sensor interacts directly with the entity whose attribute is being monitored • Passive: the sensor performs the measurement without interacting with the entity (e.g., by reading log files) • Intrusive: running the sensor can significantly affect the performance of the entity being monitored • Non-intrusive: running the sensor does not affect the target entity
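The active/passive distinction shows up clearly in code. A minimal sketch with two hypothetical sensors: the active one opens a TCP connection to the monitored service, while the passive one only scans log lines the service has already written. The function names and the "ERROR" marker are assumptions:

```python
import socket
from typing import Iterable


def active_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    # Active sensor: interacts directly with the monitored service
    # by opening a TCP connection to it.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def passive_error_count(log_lines: Iterable[str], marker: str = "ERROR") -> int:
    # Passive sensor: inspects what the entity already logged,
    # without interacting with the entity itself.
    return sum(1 for line in log_lines if marker in line)
```

Intrusiveness is a separate axis: a single connection per check is negligible, whereas an active probe that ran a heavy benchmark could noticeably load the node it measures.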

  22. The GridICE approach to Grid Monitoring

  23. GridICE: a bit of history • Grid monitoring tool developed by INFN in the framework of European Grid-related projects • Started in 2003 • Now monitoring several Grids, among them the EGEE/LCG Grid

  24. Generating Events • Generation of events: • Sensors: typically Perl scripts or C programs. • Schema: • GLUE Schema v.1.1 + GridICE extension. • System related (e.g., CPU load, CPU type, memory size). • Grid service related (e.g., CE ID, queued jobs). • Network related (e.g., packet loss). • Job usage (e.g., CPU time, wall time). • All sensors are executed periodically.
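Since all sensors run periodically, a sensor can be pictured as a loop that samples, timestamps and emits. A minimal Python sketch, not GridICE's actual Perl/C code; it reads the load average with `os.getloadavg()` (available on Unix-like systems only), and the attribute name is a hypothetical stand-in rather than a real GLUE Schema attribute:

```python
import os
import time
from typing import Callable, Dict, List


def cpu_load_sensor() -> Dict:
    # One sample: the 1-minute load average (Unix only), with a timestamp.
    load1, _, _ = os.getloadavg()
    return {"attribute": "cpu_load_1min", "value": load1, "timestamp": time.time()}


def run_periodically(sensor: Callable[[], Dict], interval: float, iterations: int) -> List[Dict]:
    # Execute the sensor every `interval` seconds, `iterations` times.
    observations = []
    for i in range(iterations):
        observations.append(sensor())
        if i < iterations - 1:
            time.sleep(interval)
    return observations


for obs in run_periodically(cpu_load_sensor, interval=1.0, iterations=3):
    print(obs)
```

A long-running deployment would schedule against fixed deadlines instead of sleeping a fixed interval, so that sampling times do not drift.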

  25. Distributing Events • Distribution of events: • Hierarchical model. • Intra-site: by means of the local monitoring service • default choice, LEMON (http://www.cern.ch/lemon). • Inter-site: by offering data through the Grid Information Service (GIS). • Final consumer: depending on the client application. • Mixed data delivery model. • Intra-site: depending on the local monitoring service (push for LEMON). • Inter-site: depending on the GIS (current choice: MDS 2.x, pull). • Final consumer: pull (browser/application), push (publish/subscribe notification service).
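The mixed delivery model combines pull (a consumer queries when it needs data) with push (a consumer subscribes once and is notified on each change). A toy publisher that supports both illustrates the difference; the `Publisher` class is hypothetical and does not model LEMON, MDS or any real notification service:

```python
from typing import Callable, Dict, List


class Publisher:
    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}
        self._subscribers: List[Callable[[str, float], None]] = []

    def subscribe(self, callback: Callable[[str, float], None]) -> None:
        # Push model: the consumer registers once and is notified on each update.
        self._subscribers.append(callback)

    def update(self, attribute: str, value: float) -> None:
        self._latest[attribute] = value
        for callback in self._subscribers:
            callback(attribute, value)

    def query(self, attribute: str) -> float:
        # Pull model: the consumer asks whenever it needs the value.
        return self._latest[attribute]


pub = Publisher()
pushed = []
pub.subscribe(lambda attr, val: pushed.append((attr, val)))
pub.update("queued_jobs", 12)
pub.update("queued_jobs", 15)
print(pushed)                    # [('queued_jobs', 12), ('queued_jobs', 15)]
print(pub.query("queued_jobs"))  # 15
```

Pull suits on-demand browsing, where only the current value matters; push avoids polling when a consumer must react quickly, as with fault notifications.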

  26. Presenting Monitoring Information • Data stored in a DBMS are used to build aggregated statistics. • Data retrieved from the DBMS are encoded in XML files. • XSL transformations to XHTML publish aggregated data in a Web context. • Pure XML over HTTP for applications. • Publish/subscribe-based information delivery.
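The chain of storing in a DBMS, aggregating, and encoding as XML for XSLT or for applications can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table layout, element names and sample rows are assumptions, not GridICE's schema:

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database standing in for the monitoring DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (site TEXT, attribute TEXT, value REAL)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("site-a", "queued_jobs", 10),
        ("site-a", "queued_jobs", 20),
        ("site-b", "queued_jobs", 5),
    ],
)

# Build aggregated statistics inside the DBMS.
rows = conn.execute(
    "SELECT site, AVG(value), COUNT(*) FROM observation "
    "WHERE attribute = 'queued_jobs' GROUP BY site ORDER BY site"
).fetchall()

# Encode the aggregates as XML, ready for an XSLT transformation or an HTTP client.
root = ET.Element("aggregates", attribute="queued_jobs")
for site, avg, count in rows:
    ET.SubElement(root, "site", name=site, average=f"{avg:.1f}", samples=str(count))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```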

  27. GridICE Server [architecture diagram; components: Discovery, Scheduler, Persistent storage, XML abstraction, XSLT->HTML, Charts, Notification Service, Consumers]

  28. Resource Observability: UML State Diagram • Observability from the viewpoint of the GridICE Server

  29. GridICE [diagram; components: Monitored Resource, Sensor, Site Collector, Local Publisher, Site Publisher, Site Consumer, Site Persistent Storage]

  30. Deployment View

  31. Challenges for Data Collection • The distribution of monitoring data is subject to demanding requirements (e.g., scalability, heterogeneity, security, system health) • None of the existing tools satisfies all of these requirements

  32. Challenges for Data Presentation • Different Grid users are interested in different subsets of Grid data and different aggregation levels • Usability principles should be taken into account to help users find relevant Grid monitoring information • Concise (synthetic) data aggregation is crucial to allow drill-down navigation of the Grid data (from the general to the detailed)
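Drill-down navigation relies on a compact aggregate at each level: a Grid-wide total, then a per-site breakdown, then per-host detail. A minimal sketch with hypothetical (site, host, free slots) records; the three levels are an assumption chosen to match "from the general to the detailed":

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (site, host, free CPU slots).
Record = Tuple[str, str, int]


def aggregate(records: List[Record], level: str) -> Dict[str, int]:
    """Sum free slots at the 'grid', 'site' or 'host' level."""
    totals: Dict[str, int] = defaultdict(int)
    for site, host, free_slots in records:
        if level == "grid":
            key = "grid"
        elif level == "site":
            key = site
        elif level == "host":
            key = f"{site}/{host}"
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += free_slots
    return dict(totals)


def drill_down(records: List[Record], site: str) -> Dict[str, int]:
    # Move from the general (one site's total) to the detailed (its hosts).
    return aggregate([r for r in records if r[0] == site], "host")


records = [("site-a", "wn01", 4), ("site-a", "wn02", 0), ("site-b", "wn01", 8)]
print(aggregate(records, "grid"))     # {'grid': 12}
print(aggregate(records, "site"))     # {'site-a': 4, 'site-b': 8}
print(drill_down(records, "site-a"))  # {'site-a/wn01': 4, 'site-a/wn02': 0}
```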

  33. Conclusions • Monitoring a Grid system is a complex activity, as it has to deal with a distributed, dynamic and multi-institutional system offering different abstraction levels • We have seen the requirements and characteristics of Grid monitoring • We have also seen a working solution

  34. REFERENCES [1] S. Zanikolas, R. Sakellariou. A taxonomy of grid monitoring systems. Future Generation Computer Systems, 21:163–188, 2005. [2] Measurement Theory for Software Engineers. http://www2.umassd.edu/SWPI/curriculummodule/em9ps/em9.part3.pdf [3] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G.L. Rubini, G. Tortone, M.C. Vistoli. GridICE: a Monitoring Service for Grid Systems. Future Generation Computer Systems, Elsevier, 21(4):559–571, 2005. http://infnforge.cnaf.infn.it/gridice/doc/gridice4FGCS_revised_1.pdf [4] GridICE website. http://grid.infn.it/gridice [5] GridICE server for INFN-Grid. http://gridice4.cnaf.infn.it:50080/gridice [6] GridICE server for LCG-Grid. http://gridice2.cnaf.infn.it:50080/gridice
