Issues from GDB

John Gordon, STFC. WLCG MB meeting, September 28th 2010.


Presentation Transcript


  1. Issues from GDB
     John Gordon, STFC
     WLCG MB meeting, September 28th 2010

  2. Topics at September GDB
     • OPN Monitoring
     • APEL
     • CERNVMFS
     • Experiments' Operational Issues (Quarterly)
     • Others

  3. Monitoring
     • Missing a central view of LHCOPN
     • HADES data exists (at DFN?)
     • Prototype dashboard
       a) Site status is up when the one-way delay (OWD) is within +/-15% of baseline and packet loss is less than 0.1% per five minutes
       b) Site status is down when packet loss = 100% per five minutes
       c) Site status is degraded when measurement values are between a) and b)
     Source: J. Shade/GDB LHCOPN Update
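The three status rules on this slide can be sketched as a small classifier. This is only an illustration of the thresholds quoted above, not the dashboard's actual code: the `Measurement` structure, its field names, and the `site_status` function are hypothetical, and the sketch assumes a positive baseline delay and treats everything that is neither up nor down as degraded.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One five-minute measurement window for a site (hypothetical structure)."""
    owd_ms: float           # measured one-way delay
    baseline_owd_ms: float  # baseline one-way delay for the same path, assumed > 0
    packet_loss: float      # fraction of packets lost in the window, 0.0 to 1.0


def site_status(m: Measurement) -> str:
    """Classify a window as 'up', 'down' or 'degraded' using the slide's thresholds."""
    # Down: every packet in the five-minute window was lost.
    if m.packet_loss >= 1.0:
        return "down"
    # Up: delay within +/-15% of baseline and loss below 0.1%.
    owd_deviation = abs(m.owd_ms - m.baseline_owd_ms) / m.baseline_owd_ms
    if owd_deviation <= 0.15 and m.packet_loss < 0.001:
        return "up"
    # Degraded: anything between the two cases above.
    return "degraded"
```

For example, `site_status(Measurement(owd_ms=13.0, baseline_owd_ms=10.0, packet_loss=0.0))` is classified as degraded, because the delay is 30% above baseline even though no packets were lost.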

  4. Prototype Dashboard (figure not reproduced in transcript). Source: J. Shade/GDB LHCOPN Update

  5. Prototype Dashboard (figure not reproduced in transcript). Source: J. Shade/GDB LHCOPN Update

  6. Monitoring
     • DANTE baulked at the idea of developing their prototype further and supporting it.
     • SARA and CERN have picked up the gauntlet.
     • A historical view was requested and is foreseen.
     • Questions were raised about problem-solving procedures.
     Source: J. Shade/GDB LHCOPN Update

  7. APEL
     • Update on latest status.
     • Version using ActiveMQ message passing has been in production since June.
     • New node type glite-apel replaces glite-MON.
     • Performant and reliable.
     • Sites encouraged to migrate.
     • Anticipate switching off central R-GMA registry at end of 2010.
     • Requested WLCG input for EGI/EMI development plans.

  8. CERNVMFS for Software Servers
     • The stress on shared software servers has been an issue for experiment and site operations over the summer.
     • PIC and RAL have tested CERNVMFS as a mechanism for distributing experiment software from CERN to worker nodes.
     • CERNVMFS was developed in OpenLab and has been used to build virtual machine images on demand with experiment software.
     • It uses squid caches to bring software to a site on demand and also caches on the worker nodes (WN), relieving pressure on site servers.
     • Removes the need to run jobs to install software at the site. Only caches versions used at that site. Removes duplicate files between and within releases.
     • Initial feedback is encouraging. Tests will be scaled up to a full site in cooperation with the experiments: ATLAS for now, but others are interested.
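The two caching properties claimed on this slide (only what is actually used gets cached, and duplicate files are held once) can be illustrated with a toy content-addressed cache. This is a sketch of the general idea, assuming files are identified by a hash of their content; it is not CERNVMFS code, and `build_repository`, `OnDemandCache`, and the file paths are hypothetical names made up for the example.

```python
import hashlib


def build_repository(files):
    """Return (catalog, objects) for a dict of path -> bytes.

    catalog maps each path to the hash of its content;
    objects holds one copy of each distinct content, keyed by that hash.
    """
    catalog, objects = {}, {}
    for path, data in files.items():
        digest = hashlib.sha1(data).hexdigest()
        catalog[path] = digest
        objects[digest] = data
    return catalog, objects


class OnDemandCache:
    """Toy local cache that fetches content by hash the first time it is read."""

    def __init__(self, catalog, fetch_by_hash):
        self._catalog = catalog      # path -> content hash
        self._fetch = fetch_by_hash  # stands in for the upstream server or proxy
        self._blobs = {}             # content hash -> bytes held locally
        self.upstream_fetches = 0

    def read(self, path):
        digest = self._catalog[path]
        if digest not in self._blobs:
            # First use of this content: fetch it once and keep it.
            self._blobs[digest] = self._fetch(digest)
            self.upstream_fetches += 1
        return self._blobs[digest]

    def cached_objects(self):
        return len(self._blobs)
```

In this sketch, reading the same library from two releases triggers a single upstream fetch, and a release that no job ever reads never occupies local cache space.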

  9. Experiment Operations Feedback
     • ALICE were happy.
     • ATLAS raised the issue of disk server reliability. What they measured was the number of incidents where a server was out for more than 24 hours. This is a combination of hardware/software reliability and the promptness of the site in restoring the service. There is scope for standardising responses across Tier1s.
     • Concerns about ASGC performance.
     • CMS interested in CernVMFS work for their Tier3s.
     • Discussion around information publishing (related to L. Field's proposal on a WLCG Information Officer).
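The ATLAS metric described above, the number of incidents in which a server was out for more than 24 hours, is simple to compute from outage start and end times. The sketch below is only an illustration of that definition: the function name `long_outages` and the input format (a list of start/end `datetime` pairs) are assumptions, not something presented at the meeting.

```python
from datetime import datetime, timedelta


def long_outages(incidents, threshold=timedelta(hours=24)):
    """Return the incidents whose duration strictly exceeds the threshold.

    incidents is a list of (start, end) datetime pairs, one per outage.
    """
    return [(start, end) for start, end in incidents if end - start > threshold]
```

Because the duration includes the time the site takes to restore the service, the same hardware fault can count or not count depending on how quickly the site responds, which is the point the slide makes.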

  10. Experiment Operations Feedback
     • LHCb have problems with differing configurations at sites. They believe they can adapt their use if only they have enough information. One suggestion would be a Site Card (cf. the VO Card) which specifies enough information about the site to enable LHCb to automate optimisation of their use. Discussion in the meeting doubted whether this could be automated and suggested one-to-one discussion with the site as a better route.

  11. gLite 3.1 Support
     • Further work on retiring some gLite 3.1 services.
     • gLite developers have proposed the end of life of some services. WLCG asked for comment.
     • https://twiki.cern.ch/twiki/bin/view/EGEE/LCGprioritiesgLite
     • EGI Operations will plan with NGIs and their sites, taking WLCG views on board.
     • Potential gap in EMI support filled. Specific sites have agreed to continue middleware support of batch systems required by WLCG. This covers support of CE Information Providers, blahd, and APEL parser.

  12. Misc.
     • Gstat
       • Announced new WLCG gstat to be checked by sites.
       • Gave Ian's timeline.
     • glexec
       • New Condor release over the summer should address the concerns of ATLAS. ATLAS and CMS asked to run tests again with the latest Condor.

  13. October GDB
     • Feedback from the DAaMonstrators
       • What can they show now?
       • What will they deliver for the end of the year?
       • Review by panel early in new year.
     • Security Incident response
     • gLite 3.1 retiral
     • Installed capacity
     • glexec testing
