
Alarms in CC

A brief overview of alarm management in the computer centre. Introduction: all alarms of the computer centre are presented to the Operators 24/7. An alarm means that an action is required; the action can range from simply logging the event to calling out an expert at 2:00 AM.

jaron

Presentation Transcript


  1. Alarms in CC: a brief overview of alarm management in the computer centre

  2. Introduction • All alarms of the computer centre are presented to the Operators 24/7 • An alarm means that an action is required • The action can range from simply logging the event to calling out an expert at 2:00 AM

  3. Computers’ alarms workflow [diagram labels: Contract type E; if not administered by SysAdmins; Contract type D]

  4. A typical use case • Log each event (alarm) • Do not analyze a situation • Apply procedures! • Step by step • Tools must be provided, access granted • Fix and/or escalate problems • Allowed to call people outside working hours [diagram labels: LAS, OPM, SDB]

  5. Use case: steps 1 & 2 • OPM (web server) returns a ranked list of matching procedures • Operator selects the most appropriate one • LAS is a web-based GUI

  6. Use case: steps 3 & 4 • SDB (Service DB) lists Service Managers • Use the short URL provided at the bottom of each page to reference them in procedures! • Procedure content: • List of nodes (applies to) • One entry per alarm • Commands to type are highlighted • Support links to SDB

  7. Service Managers’ Controls • Not covered • Do-it-yourself (tuning, corrective actions, etc.) • We pay for a number of alarms per month • Assistance needed (out of working hours, h/w faults, etc.)

  8. Providing procedures • In general: • Only Service Managers know what to do if anything goes wrong on their service(s) • Simple or urgent actions → Operators • e.g. reboot machine, take it out of production, ... • More complex solutions → SysAdmins • e.g. regenerate certificates, look in log files, ... • Different Service Managers: • Application SM: service-related procedures • Infrastructure SM: machine-related procedures

  9. Providing procedures: GUI • http://cern.ch/service-cc-opm/ (demo) • Hints / restrictions: • Quick help for the impatient (6 steps) • Start from the proposed template (Operators) • Save locally, edit, upload the new procedure • Validation! • Further edits can be done online (IE vs FF)

  10. Useful links • Service Managers Guidelines • http://cern.ch/it-div-fio-sao/guides/SM_guidelines.htm • Lemon alarm system • http://cern.ch/lemon-status/ • Using the SysAdmin Service • http://cern.ch/service-cc-sysadmin/SM_guidelines.htm
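
The use case in slides 4 to 8 (log the event, get a ranked list of matching procedures, apply one, otherwise escalate) can be sketched roughly as below. This is only an illustration under stated assumptions: the slides do not say how OPM ranks procedures, so keyword overlap is used as a stand-in, and the `Procedure` type, the `rank_procedures` and `handle_alarm` functions, the sample procedures, and the node name `lxb001` are all hypothetical. Taking the top-ranked entry stands in for the Operator's own choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    """Hypothetical stand-in for a procedure stored in OPM."""
    title: str
    keywords: frozenset  # assumed matching terms, not OPM's real schema
    handler: str         # "operator" for simple/urgent, "sysadmin" for complex


def rank_procedures(alarm_text, procedures):
    """Return procedures that match the alarm, best match first.

    Assumption: rank by the number of shared keywords; ties are broken
    alphabetically by title so the order is deterministic.
    """
    words = set(alarm_text.lower().split())
    scored = [(len(words & p.keywords), p) for p in procedures]
    matches = [(score, p) for score, p in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1].title))
    return [p for _, p in matches]


def handle_alarm(alarm_text, procedures, log):
    """Log the event, then pick a procedure or escalate."""
    log.append(alarm_text)  # slide 4: log each event (alarm)
    ranked = rank_procedures(alarm_text, procedures)
    if not ranked:
        # No procedure to apply: escalate rather than analyze the situation.
        return "escalate: no matching procedure"
    chosen = ranked[0]  # stand-in for the Operator selecting the best match
    return f"{chosen.handler}: {chosen.title}"


# Hypothetical sample data based on the examples in slide 8.
PROCEDURES = [
    Procedure("Reboot node", frozenset({"node", "unresponsive", "reboot"}), "operator"),
    Procedure("Regenerate certificates", frozenset({"certificate", "expired"}), "sysadmin"),
]

event_log = []
print(handle_alarm("node lxb001 unresponsive", PROCEDURES, event_log))
print(handle_alarm("host certificate expired", PROCEDURES, event_log))
print(handle_alarm("disk full", PROCEDURES, event_log))
```

The split between `"operator"` and `"sysadmin"` mirrors the rule in slide 8 that simple or urgent actions go to Operators and more complex solutions go to SysAdmins; who actually gets called for an escalation would come from SDB, which is not modelled here.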
