
ALMA Integrated Computing Team ICT Coordination and Planning Meeting #2 Santiago 28-29 January 2014



Presentation Transcript


  1. ALMA Integrated Computing Team: ICT Coordination and Planning Meeting #2, Santiago, 28-29 January 2014. Alarm system. A. Caproni

  2. Alarm system status
     • According to operators the alarm panel is useless
       • Too many alarms
       • Stale alarms
       • False alarms
     • Result of a 4h profiling by Patricio (mid Nov 2013)
       • ~31k alarms
         • ACTIVE 16103
         • TERMINATE 15407
         • Pri 0: 41
         • Pri 1: 1820
         • Pri 2: 500
         • Pri 3: 29149
     • Insufficient coverage:
       • Scripts and tools not provided by ALMA computing
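
The profiling figures on slide 2 can be cross-checked with a few lines of Python. This is only an arithmetic sketch over the counts quoted on the slide; the variable names and the `share` helper are introduced here for illustration and are not part of any ALMA tool.

```python
# Counts quoted on slide 2 (4h profiling, mid Nov 2013).
by_state = {"ACTIVE": 16103, "TERMINATE": 15407}
by_priority = {0: 41, 1: 1820, 2: 500, 3: 29149}

# Both breakdowns should describe the same set of alarms.
total = sum(by_state.values())
assert total == sum(by_priority.values())

def share(count, total):
    """Percentage of `total` represented by `count`, one decimal place."""
    return round(100.0 * count / total, 1)

# Average alarm rate over the 4-hour profiling window.
rate_per_minute = total / (4 * 60)

print(f"total alarms: {total}")
print(f"priority 3 share: {share(by_priority[3], total)}%")
print(f"average rate: {rate_per_minute:.0f} alarms/minute")
```

Both breakdowns add up to the same total, which matches the "~31k" on the slide, and the lowest-priority class accounts for the large majority of the traffic.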

  3. Snapshot - 1

  4. Snapshot - 2

  5. Snapshot - 3

  6. AS improvement plan (proposal)
     • Show only “real alarms”, remove the others (trust)
     • Useful documentation in panel (twiki?)
     • Fix most chattering alarms
       • DGCK:*:1, DGCK:*:4
       • FLOOG,*,7
     • Fix stale alarms
       • Manager,*,1
       • LO2BBpX:*:1, LO2BBpX:*:10, LO2BBpX:*:11
       • WCA:*:1
     • Improve system startup and device initialization
     • Profile during operations like array creation/destruction, total power…
     • TMCDB configuration (input from System Engineering for BACI props)
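
One way to find candidates for the "chattering" and "stale" categories on slide 6 is to scan a recorded event stream. The sketch below is an assumption-laden illustration: the slides do not define either term numerically, so the thresholds, the `(timestamp, alarm_id, state)` event layout, and the function names are all hypothetical, and the member name `ANT01` is a placeholder.

```python
from collections import defaultdict

def find_chattering(events, min_transitions=10):
    """Return alarm IDs whose state flipped at least `min_transitions` times.

    `events` is an iterable of (timestamp, alarm_id, state) tuples
    (hypothetical layout, not an ACS format).
    """
    last_state = {}
    transitions = defaultdict(int)
    for _ts, alarm_id, state in sorted(events):
        if alarm_id in last_state and last_state[alarm_id] != state:
            transitions[alarm_id] += 1
        last_state[alarm_id] = state
    return sorted(a for a, n in transitions.items() if n >= min_transitions)

def find_stale(events, now, max_age_s):
    """Return alarm IDs still ACTIVE whose last update is older than `max_age_s`."""
    latest = {}
    for ts, alarm_id, state in sorted(events):
        latest[alarm_id] = (ts, state)
    return sorted(
        a for a, (ts, state) in latest.items()
        if state == "ACTIVE" and now - ts > max_age_s
    )

# Made-up events: one alarm toggling every second, one raised once and never cleared.
events = [
    (t, "DGCK:ANT01:1", "ACTIVE" if t % 2 == 0 else "TERMINATE")
    for t in range(12)
]
events.append((0, "WCA:ANT01:1", "ACTIVE"))

print(find_chattering(events, min_transitions=10))
print(find_stale(events, now=3600, max_age_s=600))
```

The thresholds would need tuning against real operations data before any alarm is declared chattering or stale.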

  7. AS improvement plan (proposal)
     • ACS next improvements
       • Alarm server to dump alarms on files (ICT-1908)
         • Offline profiling
         • Correlate alarms and logs while debugging (?)
         • After-the-fact GUIs and tools
       • Alarm panel to group alarms belonging to the same array (ICT-1760)
     • Nominate an “Alarm System Manager”
       • Regularly profile the AS
       • Check and update the documentation
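
Once alarms are dumped to files, offline profiling could be as simple as counting events per alarm. The slides do not specify the dump format, so the sketch below assumes a CSV file with a header row `timestamp,alarm_id,state,priority`; that layout, the sample rows, and the `top_alarms` helper are all hypothetical.

```python
import csv
import io
from collections import Counter

def top_alarms(dump, n=3):
    """Return the `n` most frequent alarm IDs in a CSV dump (assumed format)."""
    reader = csv.DictReader(dump)
    counts = Counter(row["alarm_id"] for row in reader)
    return counts.most_common(n)

# Made-up sample standing in for a dump file opened with open(path, newline="").
sample = io.StringIO(
    "timestamp,alarm_id,state,priority\n"
    "0,DGCK:ANT01:1,ACTIVE,3\n"
    "1,DGCK:ANT01:1,TERMINATE,3\n"
    "2,DGCK:ANT01:1,ACTIVE,3\n"
    "3,FLOOG:ANT01:7,ACTIVE,3\n"
    "4,FLOOG:ANT01:7,TERMINATE,3\n"
    "5,WCA:ANT01:1,ACTIVE,1\n"
)

for alarm_id, count in top_alarms(sample):
    print(alarm_id, count)
```

Run regularly, a report like this would give the proposed Alarm System Manager a short list of the noisiest alarms to investigate first.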

  8. Scalability
     • ACS handed over to OSF after fixing persistence and NCs
     • RTI/DDS tested with 48 antennas
     • Number of alarms expected to grow as more antennas are added
     • Alarm system performance
       • AS persists alarms in memory
       • Already decoupled from source NC
     • ACS “new” AlarmSource API
       • Avoid resending an alarm if its state did not change
       • Enable/disable alarm sending
       • Queuing of alarms
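
The three AlarmSource behaviours listed on slide 8 can be illustrated with a small stand-alone class. This is not the ACS AlarmSource API: the class name, method names, and the choice to queue alarms while sending is disabled and flush them on re-enable are assumptions made here purely to show the idea.

```python
from collections import deque

class AlarmSourceSketch:
    """Hypothetical illustration of the slide 8 ideas, not the ACS API."""

    def __init__(self, publish):
        self._publish = publish   # callable(alarm_id, active) that sends the alarm
        self._last = {}           # alarm_id -> last state accepted
        self._enabled = True
        self._queue = deque()     # alarms held back while sending is disabled

    def set_alarm(self, alarm_id, active):
        """Accept a state; return False if it repeats the previous one."""
        if self._last.get(alarm_id) == active:
            return False          # state unchanged, nothing to resend
        self._last[alarm_id] = active
        if self._enabled:
            self._publish(alarm_id, active)
        else:
            self._queue.append((alarm_id, active))
        return True

    def disable(self):
        self._enabled = False

    def enable(self):
        """Re-enable sending and flush queued alarms in arrival order."""
        self._enabled = True
        while self._queue:
            self._publish(*self._queue.popleft())

sent = []
source = AlarmSourceSketch(lambda alarm_id, active: sent.append((alarm_id, active)))
source.set_alarm("LO2BBpX:ANT01:1", True)
source.set_alarm("LO2BBpX:ANT01:1", True)   # duplicate, suppressed
source.disable()
source.set_alarm("LO2BBpX:ANT01:1", False)  # queued until enable()
source.enable()
```

Suppressing unchanged states at the source reduces traffic before it reaches the alarm server, which matters as the number of antennas, and therefore alarms, grows.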
