GridPP3 project status
Sarah Pearce
14 April 2010
GridPP24 RHUL
Since last GridPP meeting
• LHC has turned on
  • First collisions in December
  • “First physics” 30 March
• Grid seems to be working as expected for all the experiments so far
• Tier-1 had some ‘settling in’ problems, mostly resolved
• R89 officially opened
• 2nd tranche of T2 hardware grants issued (?)
• EPSRC review
• EGI/ROSCOE/CUE/GridTalk proposals submitted and variously accepted or rejected
• GridPP4 proposal written and submitted
Tier-1
• Acceptance problems with one tranche of disk delivery. Now resolved: drives replaced with those of a different manufacturer
• Three triggers for disaster management from R89:
  • Cooling failures in August – remedial work undertaken, and work to address Building Management System problems planned
  • Water leak onto tape robot – audit of water sources carried out and rectification work undertaken
  • Impedance problems with the UPS supply – remedial work ongoing
• Tier-1 procurements have been carried out. Disk and CPU have been delivered and are undergoing acceptance testing
• R89 opened on 30 March – same day as the LHC “first physics” event
EPSRC review
• Report explicitly mentions:
  • GridPP’s expertise in large-scale distributed data management and analysis
  • Our work with start-up companies
  • The substantial secondary economic benefit arising from the ability to rapidly screen drugs “in silico”. GridPP resources were used in this way to screen potential agents in the fight against bird flu and malaria.
• NGS and GridPP have been highly successful, providing many users with access to more computing power than they could otherwise easily obtain. Looking forward, we recommend that these efforts, including enhanced capacity and function of distributed storage, be sustained and expanded.
EGI-Inspire etc.
• Wide range of project bids submitted to the EC in November:
  • EGI-Inspire: in negotiation, will go ahead with small changes
    • UK involved in security, helpdesk, monitoring, regional support (and training)
  • GridTalk-II: in negotiation, will go ahead
    • QMUL and IC UK partners: GridBriefings, GridCasts, GridCafe, RTM…
    • May change name?
  • ROSCOE (SSC including HEP) – not successful
  • CUE (training, outreach) – not successful
December performance
• Tier-1
  • Efficiency for all 4 LHC experiments over 90%
  • Delivered a record amount of CPU to WLCG
• Tier-2s
  • ScotGrid: 98% availability / 98% reliability
  • NorthGrid: 98% availability / 98% reliability
  • SouthGrid: 97% availability / 98% reliability
  • LondonGrid: 91% availability / 94% reliability
  • All above the 90% threshold
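(Note added for context, not from the slides: the SAM availability and reliability figures quoted above are conventionally defined along the lines of

  availability = T_up / T_total
  reliability  = T_up / (T_total - T_scheduled_downtime)

where T_up is the time the site passes the SAM tests during the reporting period, so a site that declares scheduled downtime can show a higher reliability than availability. This is a sketch of the usual WLCG convention, assumed rather than stated in the talk.)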
UKI CPU contribution
[Charts: CPU since GridPP23; 2009 up to GridPP23; CPU April 2010]
UKI Tier-1 & Tier-2 contribution
[Charts: since GridPP23; GridPP22–GridPP23]
CPU efficiency
[Chart: September 2009 – April 2010]
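(Note added for context, not from the slides: the CPU efficiency plotted here is conventionally the ratio of CPU time to wall-clock time,

  efficiency = T_CPU / T_wall

so, for example, a job that uses 8.2 hours of CPU during a 10-hour wall-clock run is 82% efficient; low values usually indicate jobs idling while waiting for data. This is the standard definition, assumed rather than taken from the slide.)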
Storage
• From gstat (and previous talks…)
[Charts: September 2008, March 2009, September 2009, April 2010]
Project map – statistics
[Charts: metrics and milestones]
Experiments – red metrics
• ATLAS
  • One red metric, due to problems with T1 disk acceptance
  • Job success rate and T1 data availability up
• LHCb
  • Red metrics similar to last quarter; low numbers of jobs account for some
  • RAL downtime from database issues reduced T1 LHCb SAM test uptime
  • T2 SAM tests low due to issues with EFDA and UCL
• CMS
  • ‘Good quarter’
• Other experiments
  • Efficiency up to 82% in December, with ALICE data
  • Fraction used by other experiments down (7.6% of T1 CPU, compared with 14% in Q3)
  • New users: T2K, Super-B, SuperNemo
Grid services
• Operations
  • 2.1.3 Fraction of KSI2k used (target 80%, achieved 61%)
  • 2.1.6 Job success rates – metric no longer used, as SLL test results are unreliable
• Security
  • One red milestone – site security review. Questionnaires sent out in December; Mingchao will report at this meeting
  • Security incident at Oxford: well handled
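(Note added for context, not from the slides: KSI2k here is the kilo-SPECint2000 unit used at the time to express CPU capacity, and the “fraction used” metric is read as delivered work over installed capacity, roughly

  fraction used = KSI2k-hours used / KSI2k-hours available

over the quarter, which is how the 61% against the 80% target is interpreted. This reading of the metric is an assumption, not stated on the slide.)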
Tier-1
• T1 operated well during data taking in Nov/Dec
• Work on UPS supply problems continuing; other issues resolved (cooling, water leak)
• One tranche of disk capacity failed acceptance – now resolved
• MoU commitment for tape not met, due to a change in requirements because of the LHC schedule
• Milestone 3.4.21 General ADS service ends: major users have been migrated; considering next steps
• Milestone 3.3.31 R89 document available: a document detailing the transition to R89 was published in March and is available on request
Tier-2s
• % of promised disk and CPU available – green for all Tier-2s (metrics 1 & 2)
• SAM availability and reliability tests green or orange (i.e. above 90%) for most Tier-2s (metrics 3 & 4)
• Metric 5 (SLL ATLAS test) now suspended
• Other red metrics:
  • Average SLL SE test performance (metric 6): LondonGrid
  • CPU utilisation (wall-clock time & CPU time, metrics 7/8): LondonGrid, SouthGrid
  • Number of management meetings (metric 11): NorthGrid
  • Tier-2 meeting LCG MoU service levels (metric 14): LondonGrid – UCL-central slow to install kernel update; site taken offline
Management and external
• Project execution – red metrics
  • Quarterly reports target not met (too much focus on GridPP4!)
• Rest of map
  • No red metrics
Finances: Tier-1 hardware
• Tier-1 disk purchase for FY09 increased to address a potential shortfall: now allows for half a year’s buffer of disk
• £661k moved forward into FY09 from FY10, at the request of STFC
• Hardware requirements re-profiled, as a result of changes in the LHC schedule
• Likely charge for networking costs per annum, due to a new arrangement between the research councils and JISC
Finances: Tier-2s and staff
• Second tranche of Tier-2 hardware agreed
• 0.5 FTE CMS support post at RAL delayed due to recruitment restrictions; increased to 1 FTE for FY10 to compensate for the delay
• Bridging funding to retain the expertise of some EGEE-funded staff where posts are envisaged to continue in GridPP4