SAM Development & Integration
David Collados, CERN IT/GD
COD-11 - Athens
Outline
• Site Availability Metrics
• SAM Alarms
• FCR Portal
• SAM Portal
SAM Status, COD-11, Athens, 2006-11-08
Site Availability Metrics
• How Availability is Calculated:
  • Possible Service Status:
    • 10 - OK
    • 20 - Down
    • 30 - Degraded
  • Per-site service status: the OR of the individual services (Site BDII, CE, SE).
  • Per site: the AND of each service status.
• Daily & Hourly availability for T1s and T0:
  • http://lcg-sam.cern.ch:8080/sqldb/site_avail.xsql
• TODO:
  • Display availability at service level.
  • Similar for any site.
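The OR/AND rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAM implementation: it reads "OR" as "a service type is up if at least one of its instances is up" and "AND" as "the site is available only if every service type is up". The slides do not say how Degraded counts, so the set of statuses treated as "up" is a parameter (`up_statuses`, defaulting to OK only). All function names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    # Status codes as listed on the slide.
    OK = 10
    DOWN = 20
    DEGRADED = 30


# Assumption: only OK counts as "up"; the slides do not define
# how Degraded is treated, so callers can override this set.
DEFAULT_UP = frozenset({Status.OK})


def service_type_up(instance_statuses, up_statuses=DEFAULT_UP):
    """OR over the instances of one service type (e.g. all CEs of a site)."""
    return any(s in up_statuses for s in instance_statuses)


def site_available(site_services, up_statuses=DEFAULT_UP):
    """AND over the service types of a site.

    site_services maps a service type name to the list of statuses
    of its instances, e.g. {"CE": [Status.OK, Status.DOWN]}.
    """
    return all(
        service_type_up(statuses, up_statuses)
        for statuses in site_services.values()
    )


example_site = {
    "Site BDII": [Status.OK],
    "CE": [Status.DOWN, Status.OK],  # one working CE is enough
    "SE": [Status.OK],
}
print(site_available(example_site))
```

With this reading, a single failing CE does not make the site unavailable as long as another CE is OK, but losing every SE does.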
SAM Alarms 1/3
• Procedure to trigger an alarm:
  • the site is not in maintenance, AND
  • the node belongs to a certified site, AND
  • the node is not in maintenance, AND
  • VO is 'OPS', AND
  • Service is not in ('SE', 'SRM' or 'LFC'), AND
  • test status is > 40 (ERROR=50 and CRIT=60), AND
  • the test is critical, AND
  • there is no alarm already for that test, VO and node.
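The trigger conditions above form a single conjunction, which can be written as one predicate. The sketch below is a hypothetical Python rendering, not the SAM code: the `TestResult` record, its field names, and the representation of existing alarms as a set of `(test, vo, node)` tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Services excluded from alarm generation, as listed on the slide.
EXCLUDED_SERVICES = frozenset({"SE", "SRM", "LFC"})
# Statuses above this value raise alarms (ERROR=50, CRIT=60).
ALARM_THRESHOLD = 40


@dataclass(frozen=True)
class TestResult:
    # Hypothetical record; field names are not taken from SAM.
    vo: str
    test: str
    node: str
    service: str
    status: int
    is_critical: bool
    site_certified: bool
    site_in_maintenance: bool
    node_in_maintenance: bool


def should_trigger_alarm(result, existing_alarms):
    """Return True only if every condition from the slide holds.

    existing_alarms is assumed to be a set of (test, vo, node) tuples
    for which an alarm already exists.
    """
    return (
        not result.site_in_maintenance
        and result.site_certified
        and not result.node_in_maintenance
        and result.vo == "OPS"
        and result.service not in EXCLUDED_SERVICES
        and result.status > ALARM_THRESHOLD
        and result.is_critical
        and (result.test, result.vo, result.node) not in existing_alarms
    )
```

Keeping the conditions in the same order as the slide makes the predicate easy to check against the written procedure.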
SAM Alarms 2/3
• Data stored in each alarm:
  • alarmid
  • vo
  • test
  • node
  • test exec time
  • alarm status (new, assigned, masked, off)
  • update time
  • ticket id (GGUS)
SAM Alarms 3/3
• Automatic Alarms Masking:
  • If there are one or more alarms with status='new' for this VO, node and test => the new alarm is triggered as masked.
  • Rules defining test relationships among alarms:
    • http://lcg-sam.cern.ch:8080/alarms/mask_alarm.xsql
• More work is expected to improve this area.
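As a rough illustration, the alarm record and the masking rule can be sketched as follows. This takes the masking rule literally as written on the slide and does not model the test-relationship rules published at mask_alarm.xsql. The Python types, the `Alarm` class and the `initial_status` helper are assumptions, not part of SAM.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Alarm statuses as listed on the slide.
ALARM_STATUSES = ("new", "assigned", "masked", "off")


@dataclass
class Alarm:
    # Fields follow the slide; the Python types are assumptions.
    alarmid: int
    vo: str
    test: str
    node: str
    test_exec_time: datetime
    status: str
    update_time: datetime
    ticket_id: Optional[str] = None  # GGUS ticket, if one was opened


def initial_status(vo: str, node: str, test: str,
                   existing: Iterable[Alarm]) -> str:
    """Status for a newly triggered alarm.

    Returns "masked" if an alarm with status "new" already exists
    for the same VO, node and test; otherwise returns "new".
    """
    for alarm in existing:
        if alarm.status == "new" and (alarm.vo, alarm.node, alarm.test) == (vo, node, test):
            return "masked"
    return "new"
```

A fuller version would presumably consult the relationship rules to decide which tests mask which others; that lookup is left out here because its format is not described in the slides.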
FCR Portal
• For VO managers to:
  • Manipulate top-level BDIIs.
  • Set critical tests for all services.
  • Display site resources (CE & SE) to be used by the VO (or remove unstable ones).
  • The same for central services (RBs, BDIIs, etc.).
• Changes generate an LDIF file that the BDII picks up every 2 minutes.
• Service availability plots will depend on the new set of critical tests.
• Available at https://lcg-fcr.cern.ch:8443/fcr/fcr.cgi
SAM Portal
• Judit is working on the display of the latest results per site and service. Expected around Christmas.
• Available at https://lcg-sam.cern.ch:8443/sam/sam.py
The End
Comments? Questions?