DIAMON and other CO diagnostics tools • DIAMON for monitoring • Presentation - Pierre • Demo - Joel • Discussion • Diagnostics for Applications • Logging diagnostics and recovering - Marine • SIS diagnostics and recovering - Jakub • The JAPC diagnostics tool kit - Eric • Reporting bugs/issues – OP Jira portal - Niall
P. Charrue on behalf of the DIAMON team • Introduction to DIAMON
Outline • Some definitions • What is DIAMON • DIAMON Architecture • Team • GUI Demo • DIAMON Roadmap
Definitions • Two separate concepts • Monitoring => Continuous snapshot • Diagnostic => Towards recovery of the problem • [Diagram labels: MONITORING, Central Logger, GUIs, Monitored items, DIAGNOSTIC]
Monitoring => Continuous snapshot • Periodic monitoring of the infrastructure • Runs locally on the system being monitored • Performs a well-defined set of tests • Test results are analyzed against rules which define “normal behavior” • Summary results are pushed to central processing • Monitored items : • Memory, CPU, Operating System Variables • Background processes (servers, tasks, daemons) • FrontEnds, BackEnds • Consoles • FieldBuses • PLCs
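The monitoring cycle described on this slide (run tests locally, compare against rules for “normal behavior”, push a summary) can be sketched as follows. This is only an illustration under stated assumptions: the `Rule` class, the `evaluate` function, the item names and the thresholds are all hypothetical and are not taken from the DIAMON agents, which the slides say are written in C++ or Java.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a monitored item plus a predicate for 'normal'."""
    name: str
    is_normal: Callable[[float], bool]


def evaluate(measurements: Dict[str, float], rules: List[Rule]) -> Dict[str, str]:
    """Compare each measurement with its rule and build a summary."""
    summary = {}
    for rule in rules:
        value = measurements.get(rule.name)
        if value is None:
            # No measurement was collected for this item.
            summary[rule.name] = "UNKNOWN"
        elif rule.is_normal(value):
            summary[rule.name] = "OK"
        else:
            summary[rule.name] = "ERROR"
    return summary


# Assumed thresholds, chosen only for the example.
RULES = [
    Rule("cpu_load_percent", lambda v: v < 90.0),
    Rule("free_memory_mb", lambda v: v > 100.0),
    Rule("free_disk_percent", lambda v: v > 10.0),
]

# One monitoring cycle: the disk measurement is deliberately missing.
summary = evaluate({"cpu_load_percent": 97.5, "free_memory_mb": 2048.0}, RULES)
print(summary)
```

Only the summary, not the raw measurements, would be sent on to central processing; how that transport works is outside this sketch.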
Diagnostic => Towards recovery of the problem • Tool for ‘post-mortem’ • Collection of specialists' diagnostic utilities • Provide access to statistics of the behavior of the infrastructure • Get more details on a problem • Provide help to solve a problem • Propose actions • Give the responsible person’s name
What is DIAMON • DIAGNOSTIC + MONITORING = DIAMON • The DIAMON project was launched to provide : • Software infrastructure for monitoring the AB Controls Infrastructure • Easy-to-use first-line diagnostics and tools to solve problems, or to help decide who is responsible for first-line intervention
DIAMON Scope • The AB Controls Infrastructure spans huge distances around CERN and covers a multitude of different equipment. • Around 3'000 items in the AB Controls Infrastructure are eligible to be monitored: • Devices connected to Ethernet (PLCs, VME or PC FrontEnds, File and Application servers, Consoles) • High-level applications, daemons and servers (e.g. LSA, LASER, SIS, ...) • The main users of the DIAMON project are: • Operators in the CCC (GUI part) • Equipment owners (agent and GUI) • Controls experts (agent and GUI)
DELIVERABLES • A standard interface for sending monitoring results • A central repository of these results • A common visualization tool offering many different views of the controls infrastructure (by name, by accelerator, by sub-systems, by building, …) • A standard interface for the diagnostic tools
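To illustrate how one repository of results can feed several views (by accelerator, by building, …), here is a minimal sketch. The `MonitoringResult` fields, the status names, the host names and the building numbers are assumptions made for the example; the slides do not describe the actual result format or the GUI's aggregation logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Assumed severity ordering; higher means worse.
SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


@dataclass(frozen=True)
class MonitoringResult:
    """Hypothetical record for one monitored host."""
    host: str
    accelerator: str
    building: str
    status: str


def worst_status_by(results: Iterable[MonitoringResult], attribute: str) -> Dict[str, str]:
    """Group results by the given attribute and keep the worst status per group."""
    worst: Dict[str, str] = {}
    for result in results:
        key = getattr(result, attribute)
        current = worst.get(key, "OK")
        worst[key] = max(current, result.status, key=SEVERITY.__getitem__)
    return worst


# Made-up sample data.
RESULTS = [
    MonitoringResult("fe-01", "LHC", "2001", "OK"),
    MonitoringResult("fe-02", "LHC", "2002", "ERROR"),
    MonitoringResult("console-01", "SPS", "2001", "WARNING"),
]

print(worst_status_by(RESULTS, "accelerator"))
print(worst_status_by(RESULTS, "building"))
```

The same list of results produces each view simply by changing the grouping attribute, which is one plausible reason for keeping a single central repository behind the common visualization tool.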
Technical choices • JAVA for the GUI • C++ or JAVA for the agents • LASER infrastructure and services for the core (see Katarina’s talk of last week) • MW lightweight communication package for downwards communications • RBAC to protect the actions • [Diagram labels: DIAMON GUI, DIAMON Agents]
Responsibilities • Each system is responsible for its own monitoring • Each system is responsible for its own diagnostic tool • The communication, central repository and graphical visualization tools are under the project's responsibility • The same applies to the standard interfaces for monitoring and diagnostics
Version available today • Fully functional DIAMON GUI • Agents are deployed on more than 150 hosts (80 FE, 150 consoles and 30 ProLiants) : • CLIC agent surveying CPU, FileSpace, Process, Memory, time, … • CMW agent centrally surveying all CMW servers • PLC agent centrally pinging a long list of PLCs • Timing agent, integrated into the CLIC agent, checking timing distribution • The first version of DIAMON proposed here is intended to replace XCLUC and offer (at least) the same functionality.
The DIAMON Team • Mark Buttner • Project leader, Configuration, DIAMON daemon, CLIC agent • Pierre Charrue • RBAC integration • Joel Lauener • GUI • Maciej Sobczak • DIAMON communication library • Katarina Sigerud & Niall Stapley • LASER integration
What’s next • We look forward to your first comments once you have tried the GUI • More agents deployed (DSC I/Os, VME monitoring, PVSS, Interlocks, Daemons, …) • More functionality added in DIAMON • More specific diagnostic tools • Trend analysis • Relationships between monitored information • Tuning of the limits for monitored values from the GUI • Panic mode study and implementation • How DIAMON can work if the network, the DB, LASER or the diamon-server is not available
Further documentation • DIAMON website : • http://wikis/display/DIAMON/Home • Contact the team : • DIAMON-support@cern.ch • Report problems via JIRA • http://issues.cern.ch/browse/DMN • (see the presentation from Niall)
DIAMON and LASER • LASER is a complete infrastructure offering many services, such as the definition, creation, transport, display and archiving of ALARMS • An ALARM, as explained last Wednesday by Katarina, “informs the operators of an event that requires their attention”. LASER is an infrastructure for reporting events • The LASER GUI offers tools to get all the information on any ALARM and to acknowledge/terminate/mask/... • DIAMON is a Monitoring and Diagnostic infrastructure based on the services offered by LASER (definition, creation, transport, archiving) • Its main purpose is to MONITOR the state of the controls infrastructure and report anomalies. A specific GUI is provided to visualise the monitored environment and to offer pre-defined actions for getting more details or repairing the problem • The DIAMON AGENTS are pieces of software that run periodically to monitor CONTROLS items (system variables, process list, I/O, Timing, Network,...). They also implement pre-defined actions to 'repair' simple errors. • The ultimate goal of DIAMON is to present the operators with a first-line tool to diagnose and solve problems in the Controls Infrastructure. • LASER or DIAMON? • The first version of DIAMON proposed here is intended to replace XCLUC and offer (at least) the same functionality. • CCC operators should continue to work as before, using the LASER GUI to monitor ALARMS and DIAMON (replacing XCLUC) to monitor the controls infrastructure and solve first-line problems • In the medium term, discussions will take place inside the Controls group to unify these tools