Site Validation Session Report
Co-Chairs: Piotr Nyczyk, CERN IT/GD; Leigh Grundhoefer, IU / OSG
Notes from Judy Novak
WLCG-OSG-EGEE Workshop, CERN, June 19-20th 2006
Service Availability Monitoring (SAM) - “extension” of SFT:
• generalized framework to monitor all LCG/EGEE services, not only CE: BDII, RB, LFC, FTS, etc.
• most of the sensors run remotely (from a central machine)
• no installation needed on service machines
• moved from MySQL to Oracle, optimized data schema
Available at: https://lcg-sam.cern.ch:8443/sam/sam.cgi
SAM sensors:
• currently: BDII (Taiwan), RB (RAL), CE, SRM, LFC, FTS, SE (CERN)
• release updates + SAM (SFT)
  • certifying current tests with each new release
  • create updated tests as necessary
  • CA cert. releases are special
• Availability views
  • current, daily, weekly, monthly
  • for CE, SE, SRM, siteBDII
  • displayed with GridView
http://glite.cvs.cern.ch/cgi-bin/glite.cgi/sft2/tests/
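The daily, weekly and monthly availability views can be illustrated with a minimal sketch. It assumes each test result is reduced to a (timestamp, passed) pair and that availability is simply the fraction of passed tests in a period; the function name and this definition are illustrative assumptions, not the formula SAM or GridView actually uses.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical period keys: calendar day, ISO week, calendar month.
PERIOD_KEYS = {
    "daily": lambda t: t.strftime("%Y-%m-%d"),
    "weekly": lambda t: "%d-W%02d" % tuple(t.isocalendar())[:2],
    "monthly": lambda t: t.strftime("%Y-%m"),
}

def availability_by_period(results, period="daily"):
    """Group (datetime, passed) pairs by period and return {key: fraction passed}.

    Assumption: availability = passed tests / total tests in the period.
    """
    key_of = PERIOD_KEYS[period]
    counts = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for timestamp, passed in results:
        bucket = counts[key_of(timestamp)]
        bucket[0] += 1 if passed else 0
        bucket[1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}

# Example with made-up results for one service at one site.
sample = [
    (datetime(2006, 6, 19, 0), True),
    (datetime(2006, 6, 19, 12), False),
    (datetime(2006, 6, 20, 0), True),
]
daily = availability_by_period(sample, "daily")
```

A "current" view would then be the status of the most recent result alone, while the other three views reuse the same grouping with a different key.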
OSG Validation services
• CE/SE validation aggregation: VORS - site scanner, BDII info
  • http://vors.grid.iu.edu/
• VOMS validation for OSG VOs
  • http://voms-monitor.grid.iu.edu/
• GridEX - application validation (pilot job submissions)
  • http://www.cs.wisc.edu/condor/tools/exerciser/
• Site policy template and publication
  • http://vors.grid.iu.edu/site_policies.html
• GIP validation
  • http://grow.its.uiowa.edu/osg-gip/Production.shtml
• Monitoring validation: MonALISA client status (VO jobs, I/O)
  • http://grid02.uits.indiana.edu:8080/stats?page=summary
• GridCat and the MIS-CI client
  • Production instance: http://osg-cat.grid.iu.edu/
  • Client software: http://software.grid.iu.edu/pacman/tarballs/misci-0.4.1.tar.gz
Summary
• It seems impossible to avoid cross-monitoring (OSG monitoring doesn't include LCG-specific services, and the other way around)
• We should synchronize at the VO level, but LCG/EGEE also uses regional structuring
OSG and EGEE Validation Interoperability
• Site discovery - sites are discovered using BDII
• Ops VO - supported only on OSG sites which are interoperable (fully deployed in July)
• How can we determine if an EGEE site is interoperable?
  • Review certain BDII information
• Cross installation of necessary tools and libraries for site validation
  • LCG tools - added as an optionally installed package for OSG sites
  • OSG environment variables - ? (GIP)
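As a rough illustration of site discovery combined with the Ops VO point above, the sketch below filters already-parsed BDII CE records for those that advertise access for a given VO. It assumes the records have been fetched and parsed elsewhere into dictionaries, that the hypothetical "site" key holds the site name, and that access rules appear in the GLUE attribute GlueCEAccessControlBaseRule in the form "VO:<name>". Which BDII information actually decides interoperability is left open in the session notes, so this criterion is only an assumption.

```python
def sites_supporting_vo(ce_records, vo="ops"):
    """Return sorted site names whose CE records advertise access for `vo`.

    ce_records: list of dicts with a hypothetical "site" key and a
    "GlueCEAccessControlBaseRule" list such as ["VO:ops", "VO:cms"].
    """
    wanted = "VO:" + vo
    sites = set()
    for record in ce_records:
        rules = record.get("GlueCEAccessControlBaseRule", [])
        if wanted in rules:
            sites.add(record["site"])
    return sorted(sites)

# Made-up records standing in for parsed BDII output.
example_records = [
    {"site": "SITE_A", "GlueCEAccessControlBaseRule": ["VO:ops", "VO:cms"]},
    {"site": "SITE_B", "GlueCEAccessControlBaseRule": ["VO:atlas"]},
    {"site": "SITE_A", "GlueCEAccessControlBaseRule": ["VO:atlas"]},
    {"site": "SITE_C"},  # record without any access rules
]
ops_sites = sites_supporting_vo(example_records)
```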
OSG and EGEE Validation Interoperability (cont)
• Use of the existing GGUS-OSG GOC ticket exchange for error reporting
  • SAM database to use contact information for OSG GOC
• Issue of coordinating scheduled downtime
  • OSG GOC will maintain a web page with downtimes
• Propose a review of the effort needed to add OSG-specific validations to the SAM framework
• Testing and iterative development will be accomplished using Pre-Production sites and the OSG ITB
DB monitoring in SAM for Tier 1s (Dirk Duellmann)
• Jobs connect to the DB either over http (VO lib) or directly to Oracle (instant client)
• Should be completed by October, when experiments will start using DBs
• CMS and ALICE don't need the DBs, only 'squid'
• Existing DB monitoring is too detailed for SAM/SFT, but SAM could provide high-level monitoring of the DB service
• Some DB services (like LFC) are already tested by SAM, BUT only the functionality is tested, not the DB! The test could be:
  • threshold for connection between T0 -> T1
  • user access (squid)
  • client latency (?)
• Oracle client will be installed on the Worker Nodes
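The proposed threshold and client-latency tests could take the shape of a simple timed probe. The sketch below is an assumption about how such a check might be structured, not an existing SAM sensor: `connect` is a hypothetical callable that opens a connection (to a squid cache or an Oracle service, say), and the threshold values and status names are placeholders.

```python
import time

def probe_latency(connect, warn_s=1.0, crit_s=5.0, clock=time.perf_counter):
    """Time one call to `connect` and classify the result against thresholds.

    connect: hypothetical zero-argument callable that opens a connection.
    warn_s / crit_s: placeholder thresholds in seconds.
    clock: injectable time source, which makes the probe easy to test.
    """
    start = clock()
    try:
        connect()
    except Exception as exc:
        # Any failure to connect is reported as an error, with no latency.
        return {"status": "error", "latency_s": None, "detail": str(exc)}
    latency = clock() - start
    if latency >= crit_s:
        status = "critical"
    elif latency >= warn_s:
        status = "warning"
    else:
        status = "ok"
    return {"status": status, "latency_s": latency, "detail": ""}

# Example: a stand-in connection that does nothing and returns immediately.
result = probe_latency(lambda: None)
```

Keeping the probe to a single timed connection keeps it at the high level suggested for SAM, leaving detailed diagnostics to the existing DB monitoring.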