Derived Metrics Prototyping
Dennis.Waldron@cern.ch
IT-FIO-FS
Objective
• Initially:
  • How to calculate derived metrics in a global context?
• Expanded:
  • Evaluation of the Heidelberg Fault Tolerance package in the context of a global framework of recovery.
• Note: Derived metrics are also referred to as:
  • Global correlations
  • Combined metrics
Motivation
• Produce a sensor that is:
  • Simple
  • Extensible
  • Highly Configurable
  • Powerful
• A prototype sensor, ‘CMsensor’, exists:
  • Coded in PERL
  • Adheres to the WP4 architecture
  • Can be used in both local and global contexts
  • Limitations apply to the global context
Architecture
[Figure: architecture diagram. Recoverable labels: User; System Boundary; multiplicity 1…*; Measurement Repository; Database; Cache; Subscription; Sensor; MR Server (OraMonServer); Collector Agent (MSA); MR API; Node; Configuration File(s); MRs; sensorAPI; CMsensor; XML files; Interpreter.]
CMsensor
• Responsible for:
  • Configuration Management
    • Autoload of configuration changes on the fly: instantiation, removal, alteration
  • Subscription Management
    • Manages active subscriptions, re-subscription requests, shutdowns, reconnects, etc.
  • Metric Triggers
    • Calculation of derived metrics can be triggered by:
      • MSA scheduling (GET request)
      • Subscription callback
  • Data Caching
  • Rule Evaluation and Error Handling
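The "autoload of configuration changes" item can be illustrated with a small sketch. It is written in Python rather than the sensor's PERL and is not the CMsensor implementation. It assumes, hypothetically, that changes are detected by comparing file modification times between two scans of a configuration directory. The names `snapshot` and `diff_config` are invented for this illustration.

```python
from pathlib import Path
from typing import Dict, Set, Tuple


def snapshot(directory: str) -> Dict[str, float]:
    """Map each *.xml file name in the directory to its modification time."""
    return {p.name: p.stat().st_mtime for p in Path(directory).glob("*.xml")}


def diff_config(
    previous: Dict[str, float], current: Dict[str, float]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two snapshots and report which metric files need action.

    added   -> definitions to instantiate
    removed -> definitions to remove
    altered -> definitions to reload
    """
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    altered = {
        name
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    }
    return added, removed, altered


# Two hand-written snapshots (hypothetical file names and timestamps).
before = {"a.xml": 1.0, "b.xml": 2.0}
after = {"b.xml": 3.0, "c.xml": 1.0}
added, removed, altered = diff_config(before, after)
```

A sensor loop could call `snapshot` periodically and pass the previous and current results to `diff_config`. How CMsensor actually detects changes is not stated in the slides.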
CMsensor cont.
• One XML file exists for each derived metric. It defines:
  • Metric name and description
  • Subscription requirements
  • Metric processing code (rule)

Example 1:

    <sensorMetric name="example1" description="this is an example metric">
      <metric>10002</metric>
      <rule>
        # PERL code here
        $value = &getMetric($host, $metric) * 50;
        # return value to MR
        &storeSample(03, $mid, 0, $host, $value);
      </rule>
    </sensorMetric>

Note: $host, $metric and $mid (re-injection id) are locally scoped at execution.
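To make the per-metric XML layout concrete, the following Python sketch reads a definition shaped like Example 1 and pulls out the name, description, subscribed metric id and rule text. It is an illustration only: the element and attribute names are taken from Example 1, `load_definition` is a hypothetical helper, and the rule is wrapped in a CDATA section because a strict XML parser rejects a bare `&`. The slides do not say how CMsensor's own interpreter handles this.

```python
import xml.etree.ElementTree as ET

# Sample definition modelled on Example 1. The CDATA wrapper is an
# assumption made so that a standard XML parser accepts the PERL rule.
SAMPLE = """<sensorMetric name="example1" description="this is an example metric">
  <metric>10002</metric>
  <rule><![CDATA[
    $value = &getMetric($host, $metric) * 50;
    &storeSample(03, $mid, 0, $host, $value);
  ]]></rule>
</sensorMetric>"""


def load_definition(xml_text: str) -> dict:
    """Parse one derived-metric definition into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "description": root.get("description"),
        # A definition may list zero or more <metric> ids to subscribe to.
        "metrics": [int(m.text) for m in root.findall("metric")],
        # The rule body is kept as text; this sketch does not execute it.
        "rule": (root.findtext("rule") or "").strip(),
    }


definition = load_definition(SAMPLE)
```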
CMsensor cont.

Example 2:

    <sensorMetric name="daemonUp" description="Does at least 1 instance of a daemon exist across a defined list of nodes">
      <rule>
        # PERL code here
        my $retval = 0;
        # $params holds a space-separated list of node names
        my @node_list = split(/ /, $params);
        foreach my $node (sort @node_list) {
          # stop at the first node that reports the daemon as running
          if (&getMetric($node, 12345) == 1) {
            $retval = 1;
            last;
          }
        }
        &storeSample(01, $mid, 0, $retval);
      </rule>
    </sensorMetric>
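The logic of the daemonUp rule can be restated outside the sensor as a short Python sketch, which may help when checking the expected result by hand. This is not part of CMsensor: `daemon_up` and `fake_get_metric` are hypothetical names, and the dictionary of values stands in for the measurement repository.

```python
from typing import Callable, Dict, Iterable, Tuple


def daemon_up(
    nodes: Iterable[str],
    get_metric: Callable[[str, int], int],
    metric_id: int = 12345,
) -> int:
    """Return 1 if any node reports the metric as 1, otherwise 0."""
    for node in sorted(nodes):
        if get_metric(node, metric_id) == 1:
            return 1  # first match is enough, as with 'last' in the rule
    return 0


# Stand-in for the measurement repository (invented values).
latest: Dict[Tuple[str, int], int] = {
    ("node1", 12345): 0,
    ("node2", 12345): 1,
}


def fake_get_metric(node: str, metric_id: int) -> int:
    """Look up a value; nodes with no sample count as 'not running'."""
    return latest.get((node, metric_id), 0)


result = daemon_up("node1 node2 node3".split(" "), fake_get_metric)
```

With the invented values above, node2 reports 1, so `result` is 1. A list containing only node1 gives 0.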
CMsensor cont.
• A sensorAPI PERL module implementing the latest ASCII MSA – MS protocol (v1.3) now exists.
• The sensor uses the new simplified PERL API for MR access.
• No hard-coded metrics!
• All fatal messages are trapped and logged, and appropriate action is triggered.
• The system allows access to sources of information other than the central measurement repository, e.g. LSF.
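The point about trapping and logging fatal messages can be sketched as follows. This Python fragment is a generic illustration of the idea that one failing rule should not stop the others. It does not reproduce CMsensor's error handling, and `evaluate_all` is a hypothetical name. Which "appropriate action" the real sensor triggers is not described in the slides, so the sketch only records the failure.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("cmsensor_sketch")


def evaluate_all(
    rules: Dict[str, Callable[[dict], Any]], context: dict
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run every rule, trapping and logging failures individually."""
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for name, rule in rules.items():
        try:
            results[name] = rule(context)
        except Exception as exc:  # trap everything so other rules still run
            logger.error("rule %s failed: %s", name, exc)
            failures[name] = str(exc)
    return results, failures


# Two toy rules: one succeeds, one raises ZeroDivisionError.
rules = {
    "scaled": lambda ctx: ctx["x"] * 50,
    "broken": lambda ctx: 1 / 0,
}
results, failures = evaluate_all(rules, {"x": 2})
```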
Limitations
• Large-volume re-injection causes performance issues with the MSA.
• The impact of large-volume insertions on the MR is unknown.
• Derived metrics requiring a lot of processing can cause a backlog of metrics to be processed, so some scheduled executions may be skipped.
• The extraction of large volumes of data takes a long time.
  • Approx. 266 metrics/second can be retrieved through the simplified API, compared with 4668 through direct SQL calls.
• Current MR queries are limited to 12,000 values
  • EDG Bugzilla 2380, 2381
• Segmentation faults in the subscription mechanism via PERL MRs
  • EDG Bugzilla 2320, 2366
• CMsensor lacks some functionality
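The retrieval figures above can be turned into a rough worked example. The Python sketch below uses only the numbers quoted on the slide (266 and 4668 metrics/second, and the 12,000-value query limit). It assumes the rates stay constant as volume grows, which the slides do not claim, so the results are order-of-magnitude estimates only.

```python
# Figures quoted on the Limitations slide.
API_RATE = 266        # metrics/second via the simplified API
SQL_RATE = 4668       # metrics/second via direct SQL calls
QUERY_LIMIT = 12_000  # maximum values per MR query


def seconds_to_fetch(values: int, rate: float) -> float:
    """Time to retrieve 'values' samples at a constant rate (assumed)."""
    return values / rate


api_seconds = seconds_to_fetch(QUERY_LIMIT, API_RATE)  # about 45 s
sql_seconds = seconds_to_fetch(QUERY_LIMIT, SQL_RATE)  # about 2.6 s
slowdown = SQL_RATE / API_RATE                         # about 17.5x
```

Under that assumption, one maximum-size query takes roughly 45 seconds through the simplified API against under 3 seconds through direct SQL, a factor of about 17.5.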