Performance and Exception Monitoring workshop
"PEM" projects
• Focus on
  • NGOP (Fermilab) http://www-isd.fnal.gov/ngop/
  • PEM (CERN) http://proj-pem.cern.ch/proj-pem/
• History and timescales
• Differences and similarities
• Requirements
• Architecture
• Understanding the system
• Areas of common interest
• No implementation details yet...
Common history
• Autumn 1999/Winter 2000
  • User requirements, tools survey
• February
  • Meeting at CHEP
• March
  • Workshop and meeting at Fermilab
• Spring
  • HEPiX Braunschweig
  • NGOP/PEM meeting at CERN
Project timescales
• PEM
  • May: End of analysis phase
  • June: Design/prototyping
  • Summer: Implementation and deployment on selected clusters
  • Time constraint: increasing number of PCs
• NGOP
  • May-August: Design
  • September-October: Testing
  • Winter 2001: Integration and release
  • Time constraint: Run II
User requirements
• Mostly the same
• Different focus on operators
  • NGOP - and OPEP2 - include host management tools
  • PEM allows only automatic recovery actions; alarm severities are not built in
• Scope
  • Both only partially include the implementation of monitoring agents
  • PEM does not include user interfaces, although it will provide some
Architecture
• NGOP
  • based on exception monitoring
  • performance monitoring added separately
• PEM
  • based on performance monitoring
  • exception monitoring added on top
• Both foresee
  • events/measurements that cannot be generated locally
  • metric hierarchies
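
The PEM layering described above (performance measurements first, exception monitoring added on top) can be sketched as follows. This is only an illustration under stated assumptions: the record layouts, the `ThresholdRule` class, the `derive_exceptions` function, and the metric name `cpu_load` are all hypothetical and do not come from either project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    """One performance sample (hypothetical record layout)."""
    host: str
    metric: str
    value: float


@dataclass
class ExceptionEvent:
    """An exception derived from a measurement (hypothetical)."""
    host: str
    metric: str
    value: float
    limit: float


@dataclass
class ThresholdRule:
    """Assumed rule form: raise an exception when a metric exceeds a limit."""
    metric: str
    limit: float

    def check(self, m: Measurement) -> Optional[ExceptionEvent]:
        if m.metric == self.metric and m.value > self.limit:
            return ExceptionEvent(m.host, m.metric, m.value, self.limit)
        return None


def derive_exceptions(measurements: List[Measurement],
                      rules: List[ThresholdRule]) -> List[ExceptionEvent]:
    """Exception layer: apply every rule to every collected measurement."""
    events = []
    for m in measurements:
        for rule in rules:
            event = rule.check(m)
            if event is not None:
                events.append(event)
    return events


# Performance layer output (made-up sample values).
samples = [
    Measurement("node01", "cpu_load", 0.35),
    Measurement("node02", "cpu_load", 7.80),
]
events = derive_exceptions(samples, [ThresholdRule("cpu_load", 5.0)])
for e in events:
    print(f"{e.host}: {e.metric} = {e.value} exceeds {e.limit}")
```

In the NGOP ordering the dependency would run the other way: agents would emit events directly, and performance data would be collected by a separate path.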
Metric hierarchy
• PEM glossary
  • Service
    • A predefined set of functionalities provided to users on a set of hosts
    • NGOP equivalent: cluster
  • Host
    • A piece of computing equipment with a network interface
  • Host type
    • The property of a set of hosts of having a common goal
  • Metric (simple or composite)
    • From GQM
    • NGOP equivalent: system/subsystem/component
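
A hierarchy of simple and composite metrics could be modelled roughly as below. The class names, the boolean `ok` status, and the aggregation rule (a composite is healthy only if all its children are) are assumptions made for illustration; the slides do not specify how composite metrics are evaluated.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleMetric:
    """A leaf metric with a directly measured status (hypothetical)."""
    name: str
    ok: bool

    def healthy(self) -> bool:
        return self.ok


@dataclass
class CompositeMetric:
    """A metric built from other metrics, e.g. a host or a service."""
    name: str
    children: List[Union["SimpleMetric", "CompositeMetric"]] = field(
        default_factory=list)

    def healthy(self) -> bool:
        # Assumed aggregation rule: every child must be healthy.
        return all(child.healthy() for child in self.children)

    def failing(self) -> List[str]:
        """Names of the failing leaf metrics anywhere below this node."""
        bad = []
        for child in self.children:
            if isinstance(child, CompositeMetric):
                bad.extend(child.failing())
            elif not child.healthy():
                bad.append(child.name)
        return bad


# Made-up example: a service spanning two hosts.
node01 = CompositeMetric("node01", [
    SimpleMetric("disk_space", True),
    SimpleMetric("daemon_alive", False),
])
node02 = CompositeMetric("node02", [SimpleMetric("disk_space", True)])
service = CompositeMetric("batch", [node01, node02])

print(service.healthy())
print(service.failing())
```

The same tree could equally be read in NGOP terms, with system, subsystem and component as the levels.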
PEM sub-systems
• Monitoring Client
• Monitoring Server
• Measurement DB
• Configuration DB
• Correlation Server
• Alarm GUI
• Notifier
• Access Server
• History Display
• Report Generator
NGOP sub-systems
(diagram labels) Sensor; DBServer; Monitoring Agent; Central Server; Component; Name Server; Looping Mon Agent; Mon Agent; Persis. Config Data; Alarm GUI; Notifier; History Display; Report Generator
What to measure
• What quantities have to be monitored?
• Use GQM
  • Goal to be achieved
  • How to achieve it
  • What to measure in order to verify
• Example
  • Goal: Provide information about machine configuration
  • Question: What is the hardware configuration
  • Metric: Number, type and clock of CPU(s)
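
One way to record GQM entries so that the list of quantities to monitor can be derived from them is sketched below. The `GQMEntry` class and `metrics_to_collect` function are hypothetical, and splitting the slide's single metric line into three named metrics is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GQMEntry:
    """One Goal/Question/Metric row (hypothetical record layout)."""
    goal: str
    question: str
    metrics: Tuple[str, ...]


def metrics_to_collect(entries: List[GQMEntry]) -> List[str]:
    """Collect every metric named by any entry, without duplicates,
    in order of first appearance."""
    seen: List[str] = []
    for entry in entries:
        for metric in entry.metrics:
            if metric not in seen:
                seen.append(metric)
    return seen


# The example from the slide, with the metric line split into three items.
hardware = GQMEntry(
    goal="Provide information about machine configuration",
    question="What is the hardware configuration",
    metrics=("number of CPUs", "CPU type", "CPU clock"),
)

for name in metrics_to_collect([hardware]):
    print(name)
```

Keeping the goal and question next to each metric makes it possible to trace why a given quantity is being collected.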
Discussion (1)
• Areas of collaboration
  • GQM
  • Configuration
  • Correlation language and engine
  • Interface to repositories
  • Agents to collect information
Discussion (2)
• Open points
  • Do we need different glossaries?
  • Do we need multiple architectures?
  • Role of operators
  • Role of system openness
• Contacts
  • tim.smith@cern.ch
  • alessandro.miotto@cern.ch