GridPP23 – Final Steps to Data David Britton, 8/Sep/09
Since GridPP22 in April…
• Validated the UK infrastructure with STEP09.
• Moved the Tier-1 to R89.
• Procured new hardware.
• Exercised our disaster management process (several times!)
… and before GridPP24 at RHUL we will have data.
WLCG Growth September 2009 March 2009 >315,000 KSI2K
UK CPU Contribution Same picture if non-LHC VOs included
UK Site Contributions (2007 – 2008 – 2009)
NorthGrid: 34 – 22 – 15%
London: 28 – 25 – 32%
ScotGrid: 18 – 17 – 22%
Tier-1: 13 – 15 – 13%
SouthGrid: 7 – 16 – 13%
GridIreland: 0 – 6 – 5%
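As a quick sanity check on the figures above, the sketch below tabulates the quoted percentages, sums each year's column, and computes each region's change between 2007 and 2009. The numbers are copied from the slide; the function and variable names are illustrative only, and the small excess over 100% in 2008 is assumed to be rounding in the original figures.

```python
# Contribution shares in percent, copied from the slide: (2007, 2008, 2009).
shares = {
    "NorthGrid": (34, 22, 15),
    "London": (28, 25, 32),
    "ScotGrid": (18, 17, 22),
    "Tier-1": (13, 15, 13),
    "SouthGrid": (7, 16, 13),
    "GridIreland": (0, 6, 5),
}
years = (2007, 2008, 2009)


def column_totals(table):
    """Sum the shares of all regions for each year."""
    return tuple(sum(row[i] for row in table.values()) for i in range(len(years)))


def change(table, first=0, last=2):
    """Change in share (percentage points) between two year columns."""
    return {site: row[last] - row[first] for site, row in table.items()}


totals = column_totals(shares)
deltas = change(shares)

for year, total in zip(years, totals):
    print(f"{year}: shares sum to {total}%")
for site, delta in deltas.items():
    print(f"{site}: {delta:+d} percentage points, 2007 to 2009")
```

The 2007 to 2009 changes necessarily sum to zero when both columns total 100%, so a gain in one region's share implies a loss elsewhere.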
Storage Gstat gives: September 2008 March 2009 September 2009 … the last set could actually be sensible!
STEP09 Operations Report at WLCG MB; 16/Jun
The lack of “hero-mode” is a direct consequence of all the (heroic) effort that has been put in over the last year to make the UK Grid more resilient.
More STEP09 Highlights
• I won’t preempt (too much) the upcoming talks…
• RAL was the best ATLAS Tier-1 after the BNL ATLAS-only Tier-1.
• Glasgow ran more jobs than any of the 50-60 ATLAS Tier-2 sites throughout the world.
• Most Tier-2 sites made good contributions and many gained valuable insight into tuning issues during STEP09 and subsequent testing.
• “The responsiveness of RAL to CMS during STEP09 was in stark-contrast to many other Tier-1s.”
• CMS noted the tape performance at RAL was very good, as was the CPU efficiency.
• Many (if not all) of the metrics for the experiments were met, and in some cases significantly exceeded, at RAL during STEP09.
(GridPP22) Current Issues: R89
In the end, hand-over was delayed from Dec to Apr 09. Hardware was delayed but we were (almost) rescued by the LHC schedule change. Minor (?) issues remain with R89 (aircon trips; water-proof membrane?)
Tier-1 Hardware
• The FY2008 hardware procurement had to await the acceptance of R89.
• The CPU is tested, accepted, and being deployed (14,000 HEPSPEC06).
• The disk procurement (2.2 PB) was split into two halves (different disks and controllers to mitigate acceptance risk). This has proved sensible, as one batch has demonstrated ejection issues.
• One half of the disk is being deployed; progress is being made on the other half and the best guess is deployment by the end of November.
• A second SL8500 tape robot is available.
• The FY09 hardware procurement is underway.
Disaster Management
• A four-stage disaster management process was established at the Tier-1 earlier this year as part of our focus on resilience and disaster management.
• Designed to be used regularly so that the process is familiar. This means a low threshold to trigger Stage-1 “disasters”.
• At Stage-3, the process formally involves stake-holders outside the Tier-1, including GridPP management. This has now happened several times, including:
  • R89 aircon trip
  • R89 water leak
  • Disk procurement problem
  • Swine flu planning
• The process is still being honed, but I believe it is very useful.
Tier-2 Performance Resource-weighted averages
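For readers without the plot, a resource-weighted average weights each site's metric by the resources that site provides, so large sites count for more than small ones. The sketch below is illustrative only: the slide does not say which resource (CPU or storage) was used as the weight, and the two-site numbers are hypothetical, not taken from the talk.

```python
def resource_weighted_average(values, weights):
    """Average of `values`, each weighted by the matching entry of `weights`.

    `values` are per-site metrics (e.g. a percentage); `weights` are the
    per-site resource amounts used for weighting (an assumption here).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: a large site (weight 3) at 90% and a small site
# (weight 1) at 80%. These are NOT figures from the talk.
example_values = [90, 80]
example_weights = [3, 1]

weighted = resource_weighted_average(example_values, example_weights)
unweighted = sum(example_values) / len(example_values)
print(f"weighted: {weighted}, unweighted: {unweighted}")
```

In this made-up case the weighted figure sits closer to the large site's value than the plain mean does, which is the point of weighting by resources.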
Tier-2 Resources 8/Sep/09
EGI/NGI
[Diagram labels: EGI (coordinating body in Amsterdam); NGIs (national initiatives in member countries), including the UK-NGI; GridPP; NGS; “Involves STFC, EPSRC and JISC (at least) in the UK.”]
EGI is vital to GridPP but it is not GridPP’s core business to run an e-science infrastructure for the whole of the UK: seek a middle ground.
Jigsaw Puzzle
EGI: UK involvement via the UK NGI with global tasks such as GOCDB, security, dissemination, training...
[Diagram labels: Heavy Users SSC; EMI; SSC; SSC (Roscoe); Unicore; ARC; gLite]
UK involvement with Ganga?
UK involvement with APEL, GridSite? …
Next Steps
• Oversight Committee meeting next week.
• Approval for OPN resilient link
• Confirmation of remaining GridPP3 spending profile
• Some guidance on GridPP4?
• The LHC start-up, round-2 (Roger Bailey’s talk next!)
• Moving towards a UK NGI in the perspective of EGI, SSCs, EMI, etc. (Monologue by John Cleese: “There will be a certain degree of uncertainty, of that we can be quite (long pause) … sure.”)
• Shaking down R89; settling down for a long run.
• Tier-2 hardware allocations.
• GridPP4
• … and data!
Summary and the Future
• STEP09 validated the UK infrastructure for LHC data-taking and proved that we are in good shape.
• We are building on this with careful tuning and further improvements to resilience and management processes.
• Great care must be taken not to invalidate the validation (but we cannot sit still either).
LHC Data
Thoroughly deserved team effort which did not require (much) divine intervention.
Oh god! Hand of god?