
DIAMON and other CO diagnostics tools



  1. DIAMON and other CO diagnostics tools • DIAMON for monitoring • Presentation - Pierre • Demo - Joel • Discussion • Diagnostics for Applications • Logging diagnostics and recovering - Marine • SIS diagnostics and recovering - Jakub • The JAPC diagnostics tool kit - Eric • Reporting bugs/issues – OP Jira portal - Niall

  2. Introduction to DIAMON • P. Charrue, on behalf of the DIAMON team

  3. Outline • Some definitions • What is DIAMON • DIAMON Architecture • Team • GUI Demo • DIAMON Roadmap

  4. Definitions • Two separate concepts • Monitoring => Continuous snapshot • Diagnostic => Towards recovery of the problem • [Diagram: MONITORING, Central Logger, GUIs, Monitored items, DIAGNOSTIC]

  5. Monitoring => Continuous snapshot • Periodic monitoring of the infrastructure • Runs locally on the system being monitored • Performs a well-defined set of tests • Test results are analyzed against rules that define “normal behavior” • Summary results are pushed to central processing • Monitored items: • Memory, CPU, operating system variables • Background processes (servers, tasks, daemons) • FrontEnds, BackEnds • Consoles • FieldBuses • PLCs
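
The monitoring cycle on this slide (run a set of tests, compare the results against rules that define "normal behavior", and produce a summary) can be illustrated with a minimal sketch. It is not DIAMON code: the `Rule` class, the metric names and the thresholds are all hypothetical, and Python is used only for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: a metric name plus a predicate defining 'normal behavior'."""
    metric: str
    is_normal: Callable[[float], bool]
    description: str


def evaluate(measurements: Dict[str, float], rules: List[Rule]) -> Dict[str, object]:
    """Check each measurement against its rule and build a summary."""
    failures = []
    for rule in rules:
        value = measurements.get(rule.metric)
        # A missing measurement counts as a failure: the test could not be run.
        if value is None or not rule.is_normal(value):
            failures.append(
                {"metric": rule.metric, "value": value, "rule": rule.description}
            )
    return {"status": "OK" if not failures else "ERROR", "failures": failures}


# Illustrative thresholds only; real limits would come from configuration.
RULES = [
    Rule("cpu_load_percent", lambda v: v < 90.0, "CPU load below 90 %"),
    Rule("free_memory_mb", lambda v: v > 100.0, "more than 100 MB free"),
]

summary = evaluate({"cpu_load_percent": 97.5, "free_memory_mb": 512.0}, RULES)
print(summary["status"], [f["metric"] for f in summary["failures"]])
# prints: ERROR ['cpu_load_percent']
```

In a real agent, the summary dictionary would be what gets pushed to central processing; how that transport works is outside the scope of this sketch.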

  6. Diagnostic => Towards recovery of the problem • Tool for ‘post-mortem’ analysis • Collection of specialists' diagnostic utilities • Provides access to statistics on the behavior of the infrastructure • Gives more details on a problem • Provides help to solve a problem • Proposes actions • Gives the responsible person’s name

  7. What is DIAMON • DIAGNOSTIC + MONITORING = DIAMON • The DIAMON project was launched to provide: • A software infrastructure for monitoring the AB Controls Infrastructure • Easy-to-use first-line diagnostics and tools to solve problems, or to help decide who is responsible for the first-line intervention

  8. DIAMON Scope • The AB Controls Infrastructure spans large distances around CERN and covers a multitude of different equipment. • Around 3,000 items in the AB Controls Infrastructure are eligible to be monitored: • Devices connected to Ethernet (PLCs, VME or PC FrontEnds, file and application servers, consoles) • High-level applications, daemons and servers (e.g. LSA, LASER, SIS, ...) • The main users of DIAMON are: • Operators in the CCC (GUI part) • Equipment owners (agent and GUI) • Controls experts (agent and GUI)

  9. DIAMON Architecture

  10. Deliverables • A standard interface for sending monitoring results • A central repository of these results • A common visualization tool offering many different views of the controls infrastructure (by name, by accelerator, by sub-system, by building, …) • A standard interface for the diagnostic tools
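
To make the idea of "many different views" over one central repository concrete, here is a minimal sketch, assuming each monitoring result carries the attributes used for grouping. The `MonitoringResult` fields, the status values and the item names are hypothetical and do not describe the actual DIAMON interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MonitoringResult:
    """Hypothetical record sent by an agent to the central repository."""
    name: str
    accelerator: str
    building: str
    status: str  # one of "OK", "WARNING", "ERROR"


SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


def view_by(results: List[MonitoringResult], key: str) -> Dict[str, str]:
    """Group results by one attribute and keep the worst status per group."""
    worst: Dict[str, str] = {}
    for result in results:
        group = getattr(result, key)
        current = worst.get(group, "OK")
        if SEVERITY[result.status] >= SEVERITY[current]:
            worst[group] = result.status
    return worst


# Made-up items, used only to exercise the grouping.
results = [
    MonitoringResult("frontend-a", "acc-A", "bldg-1", "OK"),
    MonitoringResult("console-b", "acc-A", "bldg-2", "ERROR"),
    MonitoringResult("plc-c", "acc-B", "bldg-1", "WARNING"),
]

print(view_by(results, "accelerator"))  # {'acc-A': 'ERROR', 'acc-B': 'WARNING'}
print(view_by(results, "building"))     # {'bldg-1': 'WARNING', 'bldg-2': 'ERROR'}
```

The same stored results feed every view; only the grouping key changes, which is why a single standard result format makes a common visualization tool possible.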

  11. Technical choices • Java for the GUI • C++ or Java for the agents • LASER infrastructure and services for the core (see Katarina’s talk of last week) • MW lightweight communication package for downward communications • RBAC to protect the actions • [Diagram: DIAMON GUIs and DIAMON Agents]

  12. Responsibilities • Each system is responsible for its own monitoring • Each system is responsible for its own diagnostic tool • The communication, central repository and graphical visualization tools are the project's responsibility • The same applies to the standard interfaces for monitoring and diagnostics

  13. Version available today • Fully functional DIAMON GUI • Agents are deployed on more than 150 hosts (80 FE, 150 consoles and 30 ProLiants): • CLIC agent surveying CPU, file space, processes, memory, time, … • CMW agent centrally surveying all CMW servers • PLC agent centrally pinging a long list of PLCs • Timing agent, integrated into the CLIC agent, checking timing distribution • The first version of DIAMON proposed here is intended to replace XCLUC and offer (at least) the same functionality.

  14. The DIAMON GUI demo

  15. The DIAMON Team • Mark Buttner • Project leader, Configuration, DIAMON daemon, CLIC agent • Pierre Charrue • RBAC integration • Joel Lauener • GUI • Maciej Sobczak • DIAMON communication library • Katarina Sigerud & Niall Stapley • LASER integration

  16. What’s next • We await your first comments after your experience with the GUI • More agents deployed (DSC I/Os, VME monitoring, PVSS, Interlocks, Daemons, …) • More functionality added to DIAMON • More specific diagnostic tools • Trend analysis • Relationships between monitored information • Tuning of the limits of monitored values from the GUI • Panic mode study and implementation • How DIAMON can work if the network, the DB, LASER or the diamon-server is not available

  17. Further documentation • DIAMON website : • http://wikis/display/DIAMON/Home • Contact the team : • DIAMON-support@cern.ch • Report problems via JIRA • http://issues.cern.ch/browse/DMN • (see the presentation from Niall)

  18. Thanks for your attention

  19. DIAMON and LASER • LASER is a complete infrastructure offering many services, such as the definition, creation, transport, display and archiving of ALARMS • An ALARM, as explained last Wednesday by Katarina, “informs the operators of an event that requires their attention”. LASER is an infrastructure for reporting events • The LASER GUI offers tools to get all the information on any ALARM and to acknowledge/terminate/mask/... • DIAMON is a Monitoring and Diagnostic infrastructure based on the services offered by LASER (definition, creation, transport, archiving) • Its main purpose is to MONITOR the state of the controls infrastructure and report anomalies. A specific GUI is provided to visualise the monitored environment and to offer pre-defined actions for getting more details or repairing the problem • The DIAMON AGENTS are pieces of software that run periodically to monitor CONTROLS items (system variables, process list, I/O, Timing, Network, ...). They also implement pre-defined actions to 'repair' simple errors. • The ultimate goal of DIAMON is to present the operators with a first-line tool to diagnose and solve problems in the Controls Infrastructure. • LASER or DIAMON? • The first version of DIAMON proposed here is intended to replace XCLUC and offer (at least) the same functionality. • CCC operators should continue to work as before: the LASER GUI to monitor ALARMS, and DIAMON (replacing XCLUC) to monitor the controls infrastructure and solve first-line problems • In the medium term, discussions will take place inside the Controls group to unify these tools
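
The agent behavior described on this slide (monitor an item, and apply a pre-defined action to 'repair' a simple error) can be sketched as follows. This is an illustration under stated assumptions, not the DIAMON agent: the `Check` class, the status strings and the simulated `logger` process are hypothetical, and the periodic scheduling is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    """Hypothetical monitored item: a test plus an optional pre-defined repair action."""
    name: str
    test: Callable[[], bool]
    repair: Optional[Callable[[], None]] = None


def run_check(check: Check) -> str:
    """Run the test; on failure, try the repair action once and test again."""
    if check.test():
        return "OK"
    if check.repair is None:
        return "ERROR"
    check.repair()
    return "REPAIRED" if check.test() else "ERROR"


# Simulated process table standing in for a real system query.
running = {"logger": False}


def restart_logger() -> None:
    """Pretend to restart the process by flipping the simulated state."""
    running["logger"] = True


check = Check("logger process", lambda: running["logger"], restart_logger)
print(run_check(check))  # prints: REPAIRED
print(run_check(check))  # prints: OK
```

Retesting after the repair is the important design choice here: the agent reports "REPAIRED" only when the follow-up test passes, so a repair action that silently fails still surfaces as an error.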
