Grid Monitoring 23 Nov 2004, Ferrara Sergio Andreozzi INFN-CNAF Bologna (Italy) sergio.andreozzi@cnaf.infn.it Ferrara, 29 Nov 2005
OUTLINE • Grid Monitoring Motivations • Requirements for a Grid system • Functional and Non-functional Requirements • Monitoring Phases: • Data Measurement, distribution, presentation and processing • GridICE • Architecture • Demo Ferrara, 29 Nov 2004
A Use Case Perspective • The availability of Grid resources is subject to failures • Resource observability is necessary for using the Grid • The usage, behavior and performance of a Grid need to be analyzed from the viewpoint of different users: • VO manager • Grid operations manager • Site administrator
VO Manager Viewpoint • Visualization of the actual set of resources accessible to the VO's members • Evaluation of members’ demand satisfaction on the Grid mapping functionalities • Evaluation of the Service Level Agreement (SLA) for the global Grid service offers • Running post-mortem analysis
Grid Operations Manager Viewpoint • Detection of fault situations related to wide area distributed resources • Coordination of the deployment and upgrade of the Grid middleware installed at several sites • Investigation of Grid resources for statistical purposes
Site Administrator Viewpoint • Detection of fault situations related to the site's own resources. • Control of how the site's own resources appear to the Grid.
Our Definition • Grid Monitoring • the activity of measuring significant parameters related to Grid resources • in order to • analyze usage, behavior and performance of the Grid • detect and notify fault situations
Grid Monitoring: Functional Requirements • dynamically partition resources and service usage using three criteria: site ownership, operations domain, and virtual organization accessibility; • collect data in order to enable retrospective analysis; • collect both fine-grained and coarse-grained monitoring data;
Grid Monitoring: Functional vs. Non-Functional Requirements • Functional: what the system does • Non-functional: how it does it
Grid Monitoring: Functional Requirements • help to detect fault situations and possibly prevent them; • integrate with local monitoring systems, when available; • track which machines are running the VO applications, the status and behavior of each machine, and the behavior of the software.
Grid Monitoring: Non-functional Requirements • Scalability: monitoring systems have to cope efficiently with a growing number of resources, events and users. • Extensibility: monitoring systems must be extensible with respect to the supported resources. • Flexibility: monitoring systems must integrate different delivery models, depending on the needs (e.g. periodic, on-demand, etc.). • Portability: any encapsulated measurement must be platform independent. • Security: monitoring systems must deal with security concerns such as privacy, data integrity and confidentiality.
Concepts & Terminology • Entity: any networked and useful resource having a considerable lifetime (e.g. processors, memories, disk capacity, etc.) • Attribute: a characteristic of an entity • Observation: a timestamped measure associated with the attribute of an entity • Sensor: a process monitoring an entity and generating observations
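The four terms above can be read as a small data model. The following is a minimal sketch under stated assumptions: the class and field names, the `probe` and `clock` callables, and the sample values are all hypothetical and are not taken from GridICE or from any Grid middleware.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    """A networked resource with a considerable lifetime (e.g. a processor)."""
    name: str


@dataclass(frozen=True)
class Observation:
    """A timestamped measure associated with one attribute of an entity."""
    entity: Entity
    attribute: str
    value: float
    timestamp: float


class Sensor:
    """A process that monitors one attribute of an entity and generates observations."""

    def __init__(self, entity: Entity, attribute: str,
                 probe: Callable[[], float],
                 clock: Callable[[], float] = time.time) -> None:
        self.entity = entity
        self.attribute = attribute
        self.probe = probe  # hypothetical callable returning the current measure
        self.clock = clock  # injectable so that timestamps are reproducible

    def observe(self) -> Observation:
        return Observation(self.entity, self.attribute, self.probe(), self.clock())


# Illustrative values only: a fixed probe and a fixed clock.
node = Entity("worker-node-01")
sensor = Sensor(node, "cpu_load", probe=lambda: 0.42, clock=lambda: 1101686400.0)
observation = sensor.observe()
```

Keeping the attribute as a field of the observation, rather than of the entity, lets a single entity be watched by several sensors, one per attribute.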
Concepts & Terminology • Measurement: the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules. • Measure: Let A be a set of physical or empirical objects. Let B be a set of formal objects, such as numbers. A measure μ is defined to be a one-to-one mapping μ : A → B.
Concepts & Terminology • Measurement Method: logical sequence of operations, described generically, used in quantifying an attribute with respect to a specified scale • Measurement Unit: a particular quantity, defined and adopted by convention, with which other quantities of the same kind are compared in order to express their magnitude relative to that quantity THE IMPORTANCE OF MEASUREMENT UNIT http://www.space.com/news/mco_report-b_991110.html
Concepts & Terminology • Measurement Scale: the triple (ERS, FRS, μ) where (informally) • for every relation defined on the physical objects, there is an equivalent relation defined on the measures of those objects (i.e., if a statement about a relationship between or among objects is true, then the corresponding relationship between or among their measures is also true) • for every operation defined on the physical objects, there is a corresponding operation defined on the measures, such that the result of measuring the combined objects is the same as performing the corresponding operation on the measures of the individual objects
Concepts & Terminology • Relevant Measurement Scale Types: • Nominal scale: classification of measurement values • gives numeric “names” to objects (e.g. OS) • Ordinal scale: order of measurement values • assigns numbers to objects in a particular order, but any numbers that maintain that order are equally good (e.g., colors) • Interval scale: interval between measurement values • assigns numbers to objects in such a way that the interval between two measure values is meaningful throughout the range of values (e.g., temperature) • Ratio scale: ratio between measurement values • assigns values in such a way that the ratio of two measures is meaningful (e.g., bandwidth) • Absolute scale: only one possible measurement • has only one way of measuring objects • Each measurement scale presented includes the properties of the preceding measurement scale
Example • Why do the Celsius and Fahrenheit measurement scales for temperature belong to the interval scale type and not to the ratio scale type? • They are both valid measurement scales for temperature • A transformation exists that maps a value in one scale to a value in the other • A ratio scale requires that t1/t2 is meaningful • E.g.: • A month ago it was 10°C, today it is 20°C • Today is twice as hot as a month ago • If we were using Fahrenheit, • A month ago it was 50°F, today it is 68°F • Today is not twice as hot as a month ago • The ratio between two temperatures is not a valid operation • Consequences: • The sentence “today is twice as hot as a month ago” is not admissible • The geometric mean of a set of temperatures cannot be computed, as it requires multiplying two or more temperatures
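The argument can be checked numerically. This is a small sketch using readings chosen for illustration (10°C, 20°C and 40°C); the variable names are arbitrary. It shows that the ratio of two temperatures changes with the scale, while the ratio of two temperature intervals does not.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit (an affine transformation)."""
    return celsius * 9.0 / 5.0 + 32.0


# Illustrative readings, chosen for this sketch.
month_ago_c, today_c, heatwave_c = 10.0, 20.0, 40.0
month_ago_f, today_f, heatwave_f = (
    celsius_to_fahrenheit(t) for t in (month_ago_c, today_c, heatwave_c)
)

# Ratios of values are not preserved by the change of scale...
ratio_c = today_c / month_ago_c
ratio_f = today_f / month_ago_f

# ...but ratios of intervals (differences) are.
interval_ratio_c = (heatwave_c - today_c) / (today_c - month_ago_c)
interval_ratio_f = (heatwave_f - today_f) / (today_f - month_ago_f)

print(ratio_c, ratio_f)                    # ratio of values: differs between scales
print(interval_ratio_c, interval_ratio_f)  # ratio of intervals: identical
```

The offset of 32 in the conversion is what breaks the ratio of values; it cancels out as soon as differences are taken.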
Concepts & Terminology • Relationships among the various types of measurement scales, given the universe of possible measurement scales • The more properties a measurement scale type requires, the fewer measurement scales belong to that category
Concepts & Terminology • The identification of the measurement scale: • makes it possible to understand which measurement scales are acceptable • makes it possible to know which statistics are supported
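As a rough illustration of how the scale type determines the supported statistics, the sketch below accumulates statistics from the weakest scale type upward. The assignment of statistics to scale types follows the classic textbook treatment of measurement scales and is an assumption here; it is not stated in the slides, and the names `Scale`, `INTRODUCED` and `supported_statistics` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Set


class Scale(IntEnum):
    """Scale types ordered so that each one includes the properties of the previous."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4
    ABSOLUTE = 5


# Statistic that becomes meaningful at each scale type (assumed textbook assignment).
INTRODUCED: Dict[Scale, Set[str]] = {
    Scale.NOMINAL: {"mode"},
    Scale.ORDINAL: {"median"},
    Scale.INTERVAL: {"arithmetic mean"},
    Scale.RATIO: {"geometric mean"},
    Scale.ABSOLUTE: set(),
}


def supported_statistics(scale: Scale) -> Set[str]:
    """Collect the statistics of the given scale type and of all weaker ones."""
    result: Set[str] = set()
    for candidate in Scale:
        if candidate <= scale:
            result |= INTRODUCED[candidate]
    return result


print(sorted(supported_statistics(Scale.INTERVAL)))
```

Under this assignment a temperature in Celsius (interval scale) supports the arithmetic mean but not the geometric mean, whereas a bandwidth (ratio scale) supports both.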
The Four Main Phases of Monitoring • Generation: sensors enquiring entities and encoding the measurements according to a schema (active/passive, intrusive/non-intrusive) • Processing: e.g., filtering according to some predefined criteria, or summarising a group of events • Distributing: transmission of the events from the source to any interested parties (data delivery model: push vs. pull; periodic vs. aperiodic; unicast vs. 1-to-N) • Presenting: processing and abstracting the number of received events in order to enable the consumer to draw conclusions about the operation of the monitored system
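The four phases can be sketched as a chain of small functions. This is only an illustrative toy under stated assumptions: the event layout, the function names, the threshold and the sample readings are hypothetical, and consumers are modelled as plain in-memory lists.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def generate(readings: List[Tuple[str, float]]) -> List[Event]:
    """Generation: encode raw measurements as events following a (hypothetical) schema."""
    return [{"entity": name, "attribute": "cpu_load", "value": value}
            for name, value in readings]


def process(events: List[Event], threshold: float) -> List[Event]:
    """Processing: filter events according to a predefined criterion."""
    return [event for event in events if event["value"] >= threshold]


def distribute(events: List[Event], consumers: List[List[Event]]) -> None:
    """Distributing: push the events to every interested party (1-to-N)."""
    for inbox in consumers:
        inbox.extend(events)


def present(events: List[Event]) -> Dict[str, Any]:
    """Presenting: abstract many events into a short summary for the consumer."""
    values = [event["value"] for event in events]
    return {"count": len(values), "mean_cpu_load": mean(values) if values else None}


# Illustrative run with three made-up readings and one consumer inbox.
readings = [("wn01", 0.25), ("wn02", 0.75), ("wn03", 1.0)]
inbox: List[Event] = []
distribute(process(generate(readings), threshold=0.5), [inbox])
summary = present(inbox)
print(summary)
```

Here processing happens before distribution, which reduces the number of events sent over the network; a real system may also process again on the consumer side.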
About Sensor Categorization • Active: if it interacts directly with the entity whose attribute is being monitored • Passive: if it performs the measurement without interacting with the entity (e.g., by reading log files) • Intrusive: if it can significantly affect the performance of the entity being monitored during the measurement process • Non-intrusive: if running the sensor does not affect the target entity
The GridICE approach to Grid Monitoring
GridICE: a bit of history • Grid monitoring tool developed by INFN in the framework of European Grid-related projects • Started in 2003 • Now monitoring several Grids, among them the EGEE/LCG Grid
Generating Events • Generation of events: • Sensors: typically Perl scripts or C programs. • Schema: • GLUE Schema v.1.1 + GridICE extension. • System related (e.g., CPU load, CPU Type, Memory size). • Grid service related (e.g., CE ID, queued jobs). • Network related (e.g., Packet loss). • Job usage (e.g., CPU Time, Wall Time). • All sensors are executed in a periodic fashion.
Distributing Events • Distribution of events: • Hierarchical model. • Intra-site: by means of the local monitoring service • default choice, LEMON (http://www.cern.ch/lemon). • Inter-site: by offering data through the Grid Information Service. • Final Consumer: depending on the client application. • Mixed data delivery model. • Intra-site: depending on the local monitoring service (push for LEMON). • Inter-site: depending on the GIS (current choice, MDS 2.x, pull). • Final consumer: pull (browser/application), push (publish/subscribe notification service).
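The difference between the pull and push delivery models mentioned on this slide can be shown with two toy sources. This is a generic sketch only: the class names and the event layout are hypothetical, and nothing here reflects the actual interfaces of LEMON, MDS or GridICE.

```python
from typing import Callable, List, Optional


class PullSource:
    """Pull model: the source keeps the latest event; consumers ask for it."""

    def __init__(self) -> None:
        self._latest: Optional[dict] = None

    def update(self, event: dict) -> None:
        self._latest = event

    def query(self) -> Optional[dict]:
        return self._latest


class PushSource:
    """Push model: the source forwards every event to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def update(self, event: dict) -> None:
        for callback in self._subscribers:
            callback(event)


# Two made-up events delivered through both models.
events = [{"queued_jobs": 3}, {"queued_jobs": 7}]

pull_source = PullSource()
push_source = PushSource()
received_by_push: List[dict] = []
push_source.subscribe(received_by_push.append)

for event in events:
    pull_source.update(event)
    push_source.update(event)

seen_by_pull = pull_source.query()  # a consumer polling once, after both updates
```

The push subscriber receives both events, while the polling consumer only sees the state at the time of its query; that trade-off between completeness and consumer-controlled load is one reason for mixing the two models.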
Presenting Monitoring Information • Data stored in a DBMS used to build aggregated statistics. • Data retrieved from the DBMS are encoded in XML files. • XSL to XHTML transformations to publish aggregated data in a Web context. • Pure XML over HTTP to applications • Publish/Subscribe-based information delivery
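The step of encoding data retrieved from the DBMS as XML could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The row layout and the element and attribute names (`sites`, `site`, `queue`, `queuedJobs`) are invented for illustration and are not the GridICE XML format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Rows as they might come back from a database query (made-up data).
rows = [
    {"site": "site-a", "queue": "short", "queued_jobs": 12},
    {"site": "site-a", "queue": "long", "queued_jobs": 3},
]


def rows_to_xml(rows: List[dict]) -> str:
    """Group rows by site and encode them as a small XML document."""
    root = ET.Element("sites")
    site_elements: Dict[str, ET.Element] = {}
    for row in rows:
        site = site_elements.get(row["site"])
        if site is None:
            site = ET.SubElement(root, "site", name=row["site"])
            site_elements[row["site"]] = site
        ET.SubElement(site, "queue", name=row["queue"],
                      queuedJobs=str(row["queued_jobs"]))
    return ET.tostring(root, encoding="unicode")


document = rows_to_xml(rows)
print(document)
```

A document of this kind can then be served as-is to applications or passed through an XSL stylesheet to obtain XHTML for browsers.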
GridICE Server [architecture diagram; component labels: Charts, XSLT->HTML, XML, Notification S., XML abstraction, Persistent storage, Discovery, Consumers, Scheduler]
Resource Observability: UML State Diagram • Observability from the viewpoint of the GridICE Server
GridICE Sensor [diagram; labels: Monitored Resource, Site Collector, Local Publisher, Site Consumer, Site Publisher, Sensor, Site Persistent Storage]
Deployment View
Challenges for Data Collection • The distribution of monitoring data is strongly characterised by significant requirements (e.g., Scalability, Heterogeneity, Security, System Health) • None of the existing tools satisfies all of these requirements
Challenges for Data Presentation • Different Grid users are interested in different subsets of Grid data and different aggregation levels • Usability principles should be taken into account to help users find relevant Grid monitoring information • A synthetic (summary) data aggregation is crucial to permit drill-down navigation (from the general to the detailed) of the Grid data
Conclusions • Monitoring of a Grid system is a complex activity as it has to deal with a distributed, dynamic and multi-institutional system offering different abstraction levels • We have seen the requirements and characteristics of Grid monitoring • We have also seen a working solution