ALMA Integrated Computing Team
ICT Coordination and Planning Meeting #2
Santiago, 28-29 January 2014
Alarm system
A. Caproni
Alarm system status
• According to operators the alarm panel is useless
  • Too many alarms
  • Stale alarms
  • False alarms
• Result of a 4h profiling by Patricio (mid Nov 2013)
  • ~31k alarms
  • ACTIVE: 16103
  • TERMINATE: 15407
  • Pri 0: 41
  • Pri 1: 1820
  • Pri 2: 500
  • Pri 3: 29149
• Insufficient coverage:
  • Scripts and tools not provided by ALMA computing
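The profiling figures above can be cross-checked with a quick tally. This is a minimal sketch that uses only the counts from the slide; the variable names are illustrative, and the run is assumed to have lasted exactly 4 hours.

```python
# Counts copied from the 4 h profiling run (mid Nov 2013).
by_state = {"ACTIVE": 16103, "TERMINATE": 15407}
by_priority = {0: 41, 1: 1820, 2: 500, 3: 29149}

total = sum(by_state.values())
# Both breakdowns should describe the same set of alarms.
assert total == sum(by_priority.values())

window_minutes = 4 * 60  # assumption: the run lasted exactly 4 hours
rate_per_minute = total / window_minutes
share = {pri: count / total for pri, count in by_priority.items()}

print(f"total alarms: {total}")
print(f"average rate: {rate_per_minute:.0f} per minute")
for pri, frac in share.items():
    print(f"priority {pri}: {frac:.1%}")
```

Under these assumptions the two breakdowns agree on 31510 alarms, roughly 131 per minute on average, with about 92.5% of them at priority 3.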
AS improvement plan (proposal)
• Show only “real alarms” and remove the others, to restore trust in the panel
• Useful documentation in panel (twiki?)
• Fix most chattering alarms
  • DGCK:*:1, DGCK:*:4
  • FLOOG:*:7
• Fix stale alarms
  • Manager:*:1
  • LO2BBpX:*:1, LO2BBpX:*:10, LO2BBpX:*:11
  • WCA:*:1
• Improve system startup and device initialization
• Profile during operations like array creation/destruction, total power…
• TMCDB configuration (input from System Engineering for BACI props)
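One possible way to separate chattering from stale alarms in a profiling run is to count state transitions per alarm and to measure how long an alarm has stayed active. The sketch below is not an ALMA tool: the thresholds, the event tuple layout, and the fault member names (`DV01`, `DA41`) are assumptions made for illustration, and the events are synthetic.

```python
from collections import defaultdict

# Hypothetical thresholds; real values would come from operations.
CHATTER_MIN_TRANSITIONS = 10   # state changes within the profiled window
STALE_MIN_SECONDS = 3600       # continuously ACTIVE for at least this long


def classify(events, window_end):
    """events: iterable of (timestamp_s, alarm_id, state), sorted by time.

    state is "ACTIVE" or "TERMINATE". Returns (chattering, stale) sets.
    """
    transitions = defaultdict(int)
    last_state = {}
    active_since = {}
    for ts, alarm_id, state in events:
        if last_state.get(alarm_id) == state:
            continue  # repeated state, not a transition
        last_state[alarm_id] = state
        transitions[alarm_id] += 1
        if state == "ACTIVE":
            active_since[alarm_id] = ts
        else:
            active_since.pop(alarm_id, None)
    chattering = {a for a, n in transitions.items()
                  if n >= CHATTER_MIN_TRANSITIONS}
    stale = {a for a, t0 in active_since.items()
             if window_end - t0 >= STALE_MIN_SECONDS and a not in chattering}
    return chattering, stale


# Synthetic data for illustration only, not measured values.
events = [(i * 5, "DGCK:DV01:1", "ACTIVE" if i % 2 == 0 else "TERMINATE")
          for i in range(20)]
events.append((0, "WCA:DA41:1", "ACTIVE"))
events.sort()
chattering, stale = classify(events, window_end=4 * 3600)
print("chattering:", chattering)
print("stale:", stale)
```

An alarm that flips state many times lands in the chattering set, while one raised once and never cleared lands in the stale set. Where to put the thresholds is an operational decision.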
AS improvement plan (proposal)
• ACS next improvements
  • Alarm server to dump alarms to files (ICT-1908)
    • Offline profiling
    • Correlate alarms and logs while debugging (?)
    • After-the-fact GUIs and tools
  • Alarm panel to group alarms belonging to the same array (ICT-1760)
• Nominate an “Alarm System Manager”
  • Regularly profile the AS
  • Check and update the documentation
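Once alarms are dumped to files, correlating them with logs could be as simple as a time-window lookup. The sketch below assumes both sources carry comparable timestamps in seconds. The record layout, the `logs_near` helper, and the sample data are hypothetical, since the dump format is not specified here.

```python
from bisect import bisect_left, bisect_right


def logs_near(alarm_ts, log_times, log_lines, window_s=2.0):
    """Return log lines whose timestamp is within window_s of alarm_ts.

    log_times must be sorted ascending and aligned with log_lines.
    """
    lo = bisect_left(log_times, alarm_ts - window_s)
    hi = bisect_right(log_times, alarm_ts + window_s)
    return log_lines[lo:hi]


# Synthetic records for illustration; the real dump format is not specified.
log_times = [10.0, 11.5, 12.1, 30.0]
log_lines = ["init start", "LO2 lock lost", "retrying lock", "init done"]
alarms = [(12.0, "LO2BBpX:DV01:1", "ACTIVE")]

for ts, alarm_id, state in alarms:
    print(alarm_id, state, logs_near(ts, log_times, log_lines))
```

Binary search keeps each lookup cheap even for long log files, which matters for an offline tool that scans tens of thousands of alarms.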
Scalability
• ACS handed over to OSF after fixing persistence and NCs
• RTI/DDS tested with 48 antennas
• Number of alarms expected to grow with more antennas
• Alarm system performance
  • AS persists alarms in memory
  • Already decoupled from source NC
• ACS “new” AlarmSource API
  • Avoid resending an alarm if its state did not change
  • Enable/disable alarm sending
  • Queuing of alarms
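The three source-side behaviours listed for the AlarmSource API (no resend on unchanged state, enable/disable, queuing) can be illustrated with a small stand-alone class. This is a sketch of the idea only, not the ACS AlarmSource API: the class name, method names, and the `send` callback are all hypothetical.

```python
class AlarmSourceSketch:
    """Illustrative only: not the ACS AlarmSource API."""

    def __init__(self, send):
        self._send = send      # callable(alarm_id, active) that publishes
        self._last = {}        # alarm_id -> last state actually sent
        self._enabled = True
        self._queue = None     # dict while queuing, else None

    def set_alarm(self, alarm_id, active):
        if not self._enabled:
            return  # sending disabled: the request is dropped
        if self._queue is not None:
            self._queue[alarm_id] = active  # keep only the latest state
            return
        self._publish(alarm_id, active)

    def _publish(self, alarm_id, active):
        if self._last.get(alarm_id) == active:
            return  # state unchanged: do not resend
        self._last[alarm_id] = active
        self._send(alarm_id, active)

    def start_queuing(self):
        if self._queue is None:
            self._queue = {}

    def flush(self):
        queued, self._queue = self._queue or {}, None
        for alarm_id, active in queued.items():
            self._publish(alarm_id, active)

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False


sent = []
src = AlarmSourceSketch(lambda alarm_id, active: sent.append((alarm_id, active)))
src.set_alarm("Manager:M1:1", True)
src.set_alarm("Manager:M1:1", True)    # duplicate, suppressed
src.start_queuing()
src.set_alarm("Manager:M1:1", False)
src.set_alarm("Manager:M1:1", True)    # flaps back while queued
src.flush()                            # net state unchanged, nothing sent
src.disable()
src.set_alarm("Manager:M1:1", False)   # dropped while disabled
print(sent)
```

Queuing collapses a flapping alarm to its final state before anything reaches the alarm service, which is one way to reduce chattering at the source. In this sketch, requests made while sending is disabled are simply dropped; a real implementation might handle that case differently.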