
GridPP3 project status


Presentation Transcript


  1. GridPP3 project status – Sarah Pearce, 14 April 2010, GridPP24, RHUL

  2. Since last GridPP meeting
  • LHC has turned on
  • First collisions in December
  • “First physics” 30 March
  • Grid seems to be working as expected for all the experiments so far
  • Tier-1 had some ‘settling in’ problems, mostly resolved
  • R89 officially opened
  • 2nd tranche of T2 hardware grants issued (?)
  • EPSRC review
  • EGI/ROSCOE/CUE/GridTalk proposals submitted and variously accepted or rejected
  • GridPP4 proposal written and submitted

  3. Tier-1
  • Acceptance problems with one tranche of disk delivery. Now resolved: drives replaced with those of a different manufacturer
  • Three triggers for disaster management from R89:
    • Cooling failures in August – remedial work undertaken, and work to address Building Management System problems planned
    • Water leak onto tape robot – audit of water sources carried out and rectification work done
    • Impedance problems with the UPS supply – remedial work ongoing
  • Tier-1 procurements have been carried out. Disk and CPU have been delivered and are undergoing acceptance testing
  • R89 opened on 30 March – same day as the LHC event

  4. EPSRC review
  • Report explicitly mentions:
    • GridPP’s expertise in large-scale distributed data management and analysis
    • Our work with start-up companies
    • The substantial secondary economic benefit arising from the ability to rapidly screen drugs “in-silico”. GridPP resources were used in this way to screen potential agents in the fight against bird-flu and malaria
  • NGS and GridPP have been highly successful, providing many users with access to more computing power than they could otherwise easily obtain. Looking forward, we recommend that these efforts, including enhanced capacity and function of distributed storage, be sustained and expanded.

  5. EGI-Inspire etc.
  • Wide range of project bids submitted to the EC in November:
  • EGI-Inspire. In negotiation, will go ahead with small changes.
    • UK involved in security, helpdesk, monitoring, regional support (and training)
  • GridTalk-II. In negotiation, will go ahead.
    • QMUL and IC UK partners: GridBriefings, GridCasts, GridCafe, RTM…
    • May change name?
  • ROSCOE (SSC including HEP) – not successful
  • CUE (Training, outreach) – not successful

  6. December performance
  • Tier-1
    • Efficiency for all 4 LHC experiments over 90%
    • Delivered a record amount of CPU to wLCG
  • Tier-2s
    • ScotGrid: 98% availability / 98% reliability
    • NorthGrid: 98% availability / 98% reliability
    • SouthGrid: 97% availability / 98% reliability
    • LondonGrid: 91% availability / 94% reliability
    • All above the 90% threshold (a sketch of how these figures are computed follows)
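
The availability and reliability figures above follow the usual WLCG-style definitions: availability counts all time, while reliability discounts scheduled downtime. A minimal sketch in Python, assuming equal-length monitoring intervals classified as up, scheduled-down or unscheduled-down (the interval model is an illustrative assumption, not something stated on the slide):

```python
from collections import Counter

def availability_reliability(interval_states):
    """interval_states: list of "up", "scheduled_down" or "unscheduled_down"."""
    counts = Counter(interval_states)
    total = len(interval_states)
    up = counts["up"]
    scheduled = counts["scheduled_down"]
    availability = up / total
    # Reliability removes scheduled downtime from the denominator.
    reliability = up / (total - scheduled) if total > scheduled else 0.0
    return availability, reliability

# Example: 98 intervals up, 1 scheduled outage, 1 unscheduled outage
states = ["up"] * 98 + ["scheduled_down"] + ["unscheduled_down"]
print(availability_reliability(states))  # -> (0.98, ~0.99)
```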

  7. UKI CPU contribution [charts: CPU 2009 up to GridPP23; since GridPP23; CPU April 2010]

  8. UKI Tier-1 & Tier-2 contribution [charts: GridPP22-GridPP23; since GridPP23]

  9. CPU efficiency, September 09 – April 10 [chart]

  10. Storage • From gstat (and previous talks…) [charts: September 2008, March 2009, September 2009, April 2010]

  11. Since data taking started

  12. Project Map GridPP3 Q4 08

  13. Project Map Q4 09

  14. Project map - statistics [Metrics and Milestones tables]

  15. Experiments - red metrics
  • ATLAS
    • One red metric, due to problems with T1 disk acceptance
    • Job success rate and T1 data availability up
  • LHCb
    • Red metrics similar to last quarter. Low numbers of jobs account for some.
    • RAL downtime from database issues reduced the T1 LHCb SAM test uptime
    • T2 SAM tests low due to issues with EFDA and UCL
  • CMS
    • ‘Good quarter’
  • Other experiments
    • Efficiency up to 82% in December, with ALICE data
    • Fractions used by other experiments down (7.6% of T1 CPU, compared with 14% in Q3)
    • New users: T2K, Super-B, SuperNemo

  16. Grid services
  • Operations
    • 2.1.3 Fraction of KSI2k used (target 80%, achieved 61%)
    • 2.1.6 Job success rates – this metric is no longer used, as the SLL test results are unreliable
  • Security
    • One red milestone – site security review. Questionnaires sent out in December. Mingchao will report at this meeting
    • Security incident at Oxford: well handled
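
One plausible reading of metric 2.1.3 is delivered KSI2k-hours divided by installed KSI2k capacity times the hours in the reporting period. A minimal sketch under that assumption, with purely illustrative numbers (the 5000 KSI2k capacity and quarter length are not figures from the slide):

```python
def ksi2k_fraction_used(ksi2k_hours_delivered, installed_ksi2k, hours_in_period):
    # Fraction of the installed KSI2k capacity actually consumed over the period.
    return ksi2k_hours_delivered / (installed_ksi2k * hours_in_period)

# Illustrative only: 5000 KSI2k installed over a ~2190-hour quarter,
# with about 6.68 million KSI2k-hours of work delivered.
print(ksi2k_fraction_used(6_680_000, 5000, 2190))  # ~0.61, matching the 61% reported
```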

  17. Tier-1
  • T1 operated well during data taking in Nov/Dec
  • Work on UPS supply problems continuing. Other issues resolved (cooling, water leak)
  • One tranche of disk capacity failed acceptance – now solved
  • MoU commitment for tape not met, due to a change in requirements because of the LHC schedule
  • Milestone 3.4.21 General ADS Service Ends. Major users have been migrated. Considering next steps.
  • Milestone 3.3.31 R89 document available. A document detailing the transition to R89 was published in March and is available on request.

  18. Tier-2s
  • % of promised disk and CPU available – green for all Tier-2s (metrics 1 & 2)
  • SAM availability and reliability tests green or orange (so above 90%) for most Tier-2s (metrics 3 & 4)
  • Metric 5 (SLL ATLAS test) now suspended
  • Other red metrics:
    • Average SLL SE test performance (metric 6): London
    • CPU utilisation (wall clock time & CPU time, metrics 7/8): LondonGrid, SouthGrid
    • Number of management meetings (metric 11): NorthGrid
    • Tier-2 meeting LCG MoU service levels (metric 14): LondonGrid – UCL-central was slow to install a kernel update, and the site was taken offline
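
The utilisation metrics quoted in wall-clock and CPU time, and the CPU efficiency plot on slide 9, rest on the same per-job quantities; a common way to compute aggregate CPU efficiency is total CPU time over total wall-clock time. A minimal sketch under that assumption, taking per-job (cpu_seconds, wall_seconds) pairs from the batch or accounting system (the data format here is illustrative, not from the slides):

```python
def cpu_efficiency(jobs):
    """jobs: iterable of (cpu_seconds, wall_seconds) tuples."""
    jobs = list(jobs)  # allow any iterable to be summed twice
    total_cpu = sum(cpu for cpu, _ in jobs)
    total_wall = sum(wall for _, wall in jobs)
    return total_cpu / total_wall if total_wall else 0.0

# Example: three jobs with varying efficiency
print(cpu_efficiency([(3600, 4000), (7200, 7500), (1800, 3600)]))  # ~0.83
```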

  19. Management and external
  • Project execution – red metrics
    • Quarterly reports target not met (too much focus on GridPP4!)
  • Rest of Map
    • No red metrics

  20. Finances - summary

  21. Finances: Tier-1 hardware
  • Tier-1 disk purchase for FY09 increased in order to address a potential shortfall: now allows for half a year’s buffer in disk
  • £661k moved forward into FY09 from FY10, at the request of STFC
  • Hardware requirements re-profiled, as a result of changes in the LHC schedule
  • Likely charge for networking costs per annum, due to a new arrangement between the research councils and JISC

  22. Finances: Tier-2s and staff
  • Second tranche of Tier-2 hardware agreed
  • 0.5 FTE CMS support post at RAL delayed due to recruitment restrictions. Increased to 1 FTE for FY10, to compensate for the delay.
  • Bridging funding to retain the expertise of some EGEE-funded staff where posts are envisaged to continue in GridPP4.

  23. In parallel

  24. EGEE moves to EGI

  25. And data comes…
