This document outlines the milestones, functionality, and timeline for the operational tools, as presented at the SA1 management meeting in Catania. It gives an overview of the completed architecture and design phase and of the plans each tool has provided for functionality and delivery.
Operational Tools Milestones
James Casey
SA1 Management Meeting, Catania
Introduction
• Architecture and design phase now finished
• All tools have provided plans with functionality and milestones for delivery
    • Some deployment milestones too
• The aim is a set of milestone deliverables, each of which gives a certain complete functionality
    • 3-month intervals, starting April 2009
• If timescales slip, we can stop at any of the milestones and still have a complete, functional solution
    • Sacrificing functionality or distribution
M1 Features – April 2009
• Regional Dashboard
    • ‘Regionalized’ dashboards at IN2P3 using current SAM tests for alarms
• GOCDB
    • Programmatic Interface (XML over HTTP) available
    • GOCDB 4 schema deployed with current data inserted for validation
• Configuration repositories
    • Aggregate Topology Provider (ATP): what resources should I test?
    • Metric Description Database: what tests should I use?
• Gstat
    • First prototype of new monitoring (based on Nagios) done
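The two configuration repositories answer complementary questions: the ATP says which resources to test, and the Metric Description Database says which tests apply to them. The following is a minimal sketch of how a Nagios configuration generator might combine the two answers into a test plan. The record layout, service-type names, hostnames and metric names are hypothetical placeholders, not the real ATP or Metric Description Database schemas.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Resource:
    """A service endpoint as a topology source might list it (hypothetical fields)."""
    site: str
    hostname: str
    service_type: str


# Hypothetical ATP-style answer to "What resources should I test?"
ATP_RESOURCES: List[Resource] = [
    Resource("SITE-A", "ce01.site-a.example", "CE"),
    Resource("SITE-A", "se01.site-a.example", "SRM"),
    Resource("SITE-B", "ce.site-b.example", "CE"),
]

# Hypothetical Metric-Description-style answer to "What tests should I use?"
METRICS_BY_SERVICE_TYPE: Dict[str, List[str]] = {
    "CE": ["job-submit", "ca-certs"],
    "SRM": ["srm-put", "srm-get"],
}


def build_test_plan(
    resources: List[Resource], metrics_by_type: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Pair every resource with each metric defined for its service type.

    Resources whose service type has no metrics defined are skipped.
    """
    plan = []
    for res in resources:
        for metric in metrics_by_type.get(res.service_type, []):
            plan.append((res.hostname, metric))
    return plan


if __name__ == "__main__":
    for host, metric in build_test_plan(ATP_RESOURCES, METRICS_BY_SERVICE_TYPE):
        print(f"{host}: {metric}")
```

Keeping the "what" (topology) and the "how" (metrics) in separate sources means either can change without editing the other, which is the point of having two repositories.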
M1 Features – April 09
• ROC-level Nagios-based monitoring available
    • Configured from Metric Description Database and ATP
    • ‘SAM Portal’ level of visualization complete
• Full Nagios testing of all resources in the grid running
    • At CERN: central system, simulating 11 ROCs
    • Used to validate equivalence to SAM
    • Availability calculation using current algorithm but with new metrics
• QR Reporting Portal (MSA1.3)
    • Initial version with metrics for job usage implemented
• Accounting
    • Central infrastructure for ActiveMQ-based accounting deployed
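The slide does not spell out the "current algorithm" for availability. As a rough illustration only, assuming the SAM-era rule that a service instance is up when all its critical metrics pass, a service type is up when any of its instances is up, and a site is up when every service type is up, one time slot could be evaluated as below. The data layout, hostnames and metric names are hypothetical, and the real calculation has more states (e.g. unknown, scheduled downtime) than this boolean sketch.

```python
from typing import Dict, List

# Hypothetical layout for one site in one time slot:
# service type -> service instance (hostname) -> critical metric -> passed?
SiteSnapshot = Dict[str, Dict[str, Dict[str, bool]]]


def instance_up(metric_results: Dict[str, bool]) -> bool:
    """An instance is up only if it has results and every critical metric passed."""
    return bool(metric_results) and all(metric_results.values())


def site_up(snapshot: SiteSnapshot) -> bool:
    """A site is up if, for every service type, at least one instance is up."""
    return bool(snapshot) and all(
        any(instance_up(results) for results in instances.values())
        for instances in snapshot.values()
    )


def availability(slot_states: List[bool]) -> float:
    """Fraction of time slots in which the site was up (0.0 for no data)."""
    return sum(slot_states) / len(slot_states) if slot_states else 0.0


if __name__ == "__main__":
    slot = {
        "CE": {"ce01": {"job-submit": True, "ca-certs": True},
               "ce02": {"job-submit": False, "ca-certs": True}},
        "SRM": {"se01": {"srm-put": True, "srm-get": True}},
    }
    print(site_up(slot))                              # True: ce01 and se01 are up
    print(availability([True, True, False, True]))  # 0.75
```

Swapping SAM tests for Nagios metrics only changes what feeds `metric_results`; the combination rule stays the same, which is what lets the new system be validated for equivalence against SAM.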
M1 Impact on ROCs
• Not much!
    • Something to look at and play with
    • No impact on operations
• ROCs can start integrating the GOCDB PI into tools that currently use a direct DB connection to GOCDB
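Moving a tool from a direct database connection to the GOCDB PI mainly means fetching XML over HTTP and parsing it instead of running SQL. Below is a minimal standard-library sketch of such a client. The element and attribute names in the sample response (`SITE`, `NAME`, `ROC`, `HOSTNAME`) are hypothetical and not taken from the real GOCDB PI schema, and the PI endpoint URL is deliberately left to the caller because it is not given here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical PI response; the real GOCDB PI element names may differ.
SAMPLE_RESPONSE = """<results>
  <SITE NAME="SITE-A" ROC="ROC-X"><HOSTNAME>ce01.site-a.example</HOSTNAME></SITE>
  <SITE NAME="SITE-B" ROC="ROC-Y"><HOSTNAME>ce.site-b.example</HOSTNAME></SITE>
</results>"""


def fetch_xml(url: str, timeout: float = 30.0) -> str:
    """Fetch a PI document over HTTP from whatever endpoint the tool is configured with."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_sites(xml_text: str) -> List[Dict[str, str]]:
    """Turn a PI-style XML document into plain dicts, replacing rows
    the tool used to read directly from the GOCDB database."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": site.get("NAME", ""),
            "roc": site.get("ROC", ""),
            "hostname": site.findtext("HOSTNAME", default=""),
        }
        for site in root.iter("SITE")
    ]


def sites_in_roc(sites: List[Dict[str, str]], roc: str) -> List[str]:
    """Names of the sites belonging to one ROC."""
    return [s["name"] for s in sites if s["roc"] == roc]


if __name__ == "__main__":
    sites = parse_sites(SAMPLE_RESPONSE)  # in a real tool: parse_sites(fetch_xml(pi_url))
    print(sites_in_roc(sites, "ROC-Y"))
```

Isolating the parsing in one function keeps the rest of the tool unchanged: it still receives site records, only their origin moves from the database to the PI.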
M2 Features – July 09
• Regional Dashboard
    • ‘Regionalized’ dashboards interfaced with regional Nagios
    • Raising alarms based on Nagios
• Configuration repositories
    • Metric Description Database: availability calculation definition implemented
• GOCDB
    • Write functions for Programmatic Interface (XML over HTTP) available
    • XML over HTTP, Web Service and API available
    • Prototype interface to new GOCDB4 available
    • ‘region1’ use case deployed; interfaces to ‘region3’ use case defined
GOCDB Regional Models
A reminder of some terms:
• Region 1
    • Use a distributed GOCDB instance
    • Customise it to their needs with minimal effort
• Region 2
    • Keep on using central GOCDB
• Region 3
    • Use their own model and implementation
    • Publish required data to a central system
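To make the three models concrete, here is a small sketch of how a central component might record which model each region follows and where that region's authoritative data lives. The enum, its values, and the ROC names in the usage are illustrative labels for the slide's terms, not part of GOCDB.

```python
from enum import Enum
from typing import Dict, List


class RegionModel(Enum):
    """The three GOCDB regional models (illustrative labels)."""
    REGION_1 = "distributed GOCDB instance"  # runs its own GOCDB, customised locally
    REGION_2 = "central GOCDB"               # keeps using the central instance
    REGION_3 = "own implementation"          # own model, publishes required data centrally


def source_of_truth(model: RegionModel) -> str:
    """Where the authoritative records for a region live (illustrative wording)."""
    return {
        RegionModel.REGION_1: "regional GOCDB instance",
        RegionModel.REGION_2: "central GOCDB",
        RegionModel.REGION_3: "regional system, published to central",
    }[model]


def regions_needing_publisher(regions: Dict[str, RegionModel]) -> List[str]:
    """Region-3-style regions must run a publisher to feed the central system."""
    return sorted(name for name, model in regions.items() if model is RegionModel.REGION_3)


if __name__ == "__main__":
    regions = {"ROC-A": RegionModel.REGION_1,
               "ROC-B": RegionModel.REGION_2,
               "ROC-C": RegionModel.REGION_3}
    for name, model in regions.items():
        print(name, "->", source_of_truth(model))
    print(regions_needing_publisher(regions))
```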
GOCDB Details and Components
[Diagram: GOCDB details and components. Elements shown: Region 1 (R1) local portal with custom tables; Region 2 (R2); Region 3 (R3) homemade local DB, local portal and publisher; data collector and data processor (data, metadata); GOCDB internal queries; RAL; central portal for end users; WS and query interfaces for 3rd-party tools.]
M2 Features – July 09
• ROC-level Nagios-based monitoring available
    • Now publishes to new central metric store
    • Submission framework fully uses ATP
    • Central metric store result visualization (SAM Portal/GridView)
• QR Reporting Portal (MSA1.3)
    • Added reports for operations and user support use cases
    • Some still missing (operations.{1,2}, size.2)
• Accounting
    • Tested ActiveMQ transport with some selected sites
    • Patch submitted for gLite certification
• GGUS
    • MSG interface for ticket submission/update available
• SLA Calculation
    • Multiple simultaneous availability calculations for a VO
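"Multiple simultaneous availability calculations for a VO" can be read as evaluating the same test results under several definitions of which metrics are critical. Below is a minimal sketch of that idea; the profile names and metric names are hypothetical, and it treats a missing result as a failure, which is an assumption rather than a documented rule.

```python
from typing import Dict, FrozenSet

# Latest result of each metric for one service instance (hypothetical metric names).
Results = Dict[str, bool]

# Each calculation profile names the metrics it treats as critical (hypothetical profiles).
PROFILES: Dict[str, FrozenSet[str]] = {
    "ops-critical": frozenset({"job-submit", "ca-certs"}),
    "vo-analysis": frozenset({"job-submit", "software-area"}),
}


def instance_status(results: Results, critical: FrozenSet[str]) -> bool:
    """Up if every critical metric passed; a missing result counts as a failure."""
    return all(results.get(metric, False) for metric in critical)


def evaluate_all(results: Results, profiles: Dict[str, FrozenSet[str]]) -> Dict[str, bool]:
    """Run every profile over the same results, yielding one status per calculation."""
    return {name: instance_status(results, critical) for name, critical in profiles.items()}


if __name__ == "__main__":
    ce01 = {"job-submit": True, "ca-certs": True, "software-area": False}
    print(evaluate_all(ce01, PROFILES))  # {'ops-critical': True, 'vo-analysis': False}
```

Because the profiles only select from results that are already stored, adding a new calculation for a VO does not require running any extra tests.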
M2 Impact on ROCs
• Things are ready to be deployed
    • GOCDB: at least one ROC will have a regional GOCDB instance communicating with the central instance as a prototype
    • Regional Nagios for monitoring, which can interact with the operations dashboard
    • Accounting will now work with MSG, so you can start the migration away from R-GMA
• This is the point at which we could turn off the SAM OPS tests
    • Relying now on Nagios testing, either centrally from CERN or from the ROC
    • Smooth migration of testing from central to regional as ROCs come on board
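The smooth central-to-regional migration amounts to a simple routing rule: each site is tested by its ROC's regional Nagios once that ROC has come on board, and by the central system at CERN until then, so every site is always tested by exactly one instance. Below is a minimal sketch of that rule; the site names, ROC names and instance labels are hypothetical.

```python
from typing import Dict, Set


def assign_test_instance(site_to_roc: Dict[str, str], regional_rocs: Set[str]) -> Dict[str, str]:
    """Map each site to the single Nagios instance that should test it.

    Sites whose ROC runs a regional Nagios go there; all others stay central.
    """
    return {
        site: (f"regional:{roc}" if roc in regional_rocs else "central:CERN")
        for site, roc in site_to_roc.items()
    }


if __name__ == "__main__":
    sites = {"SITE-A": "ROC-X", "SITE-B": "ROC-Y"}
    print(assign_test_instance(sites, set()))      # everything tested centrally
    print(assign_test_instance(sites, {"ROC-X"}))  # ROC-X has come on board
```

Onboarding a ROC is then just adding it to `regional_rocs`, and rolling back is removing it, which matches the idea of a migration that can pause at any point.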
M3 Features – October 09
• GOCDB
    • GOCDB4 in production
    • At least 2 regional instances working in parallel
• Gstat
    • First version of Gstat 2 available
• QR Reporting Portal (MSA1.3)
    • Remaining metrics added
• Accounting
    • APEL consumer/publisher available for use in regional model
• Aggregate Topology Provider
    • Packaged for easy deployment in regions
    • For use with other regional tools (e.g. accounting, …)
M3 Impact on ROCs
• More regionalization available
    • Accounting
    • Aggregate Topology Provider, for regional tools which need integrated topology views
    • GOCDB: more regions could have moved, to either the region 1 or the region 3 model
• QR metric reports can be generated automatically from a portal
M4 – December 09
• Regional Dashboard
    • First prototype of pure regional dashboard
• Availability Calculation
    • Regional version of availability calculation available
• Metric Store
    • Regional version of metric store available, to function along with the regional availability calculator
• Accounting
    • Testing publication via MSG with regions that don't use APEL
• Gstat
    • Gstat 2 in production
    • Possibility to deploy regional instances
M4 Impact on ROCs
• Where possible, components are now available for integration as pure regional tools
    • Nagios for testing
    • Dashboard for operational ticketing
    • Metric store + availability calculator
    • Accounting
• In my personal opinion, this is very optimistic
    • Perhaps take it as a “target” instead of a “milestone”
Summary
• Progressive milestones
    • Incremental changes in both functionality and regionalization
• 2 ‘big’ milestones for ROCs
    • July: regional monitoring deployment
    • December: regional version of dashboard; regional version of availability calculation
• Risks
    • Regionalization comes too late in the project, when people are winding down
• We aim to have a stable ‘escape point’ at each milestone
    • And can run with the system delivered at any point
Resources
• https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
• Architecture and components: https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview
• Milestone tracking: https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringMilestones