This article provides a historical overview of GridPP, the UK Computing for Particle Physics initiative, including the challenges faced, resource accounting, and performance monitoring. It discusses GridPP's role in the development of an EU Grid infrastructure and its leadership position in particle physics.
GridPP: UK Computing for Particle Physics
Tony Doyle
Outline
• Context
• A Brief History Of GridPP
• UK Computing Centres
• The Grid & its Challenges
• Resource Accounting
• Performance Monitoring
• Outlook
• Conclusions
The Icemen Cometh
R-ECFA Meeting
Context (2000)
• To create a UK Particle Physics Grid and the computing technologies required for the Large Hadron Collider (LHC) at CERN
• To place the UK in a leadership position in the international development of an EU Grid infrastructure
A Brief History Of GridPP
1999: Grid: Blueprint for a New Computing Infrastructure by Ian Foster and Carl Kesselman is published.
February 2000: A Joint Infrastructure Fund (JIF) bid is submitted for £6.2m to fund a prototype Tier-1 centre at RAL for the EU-funded DataGrid project. At the time of the JIF bid the LHC was expected to produce 4PB of data a year for 10 years; by 2005 the expected figures had risen to 15PB a year for 15 years. RAL was chosen as the location of the Tier-1 centre because it already hosted the UK BaBar computing centre.
May 2000: Last R-ECFA meeting in the UK.
October 2000: PPARC signs up to the EU DataGrid project, contributing 20 people and a Tier-1 centre.
November 2000: Trade and Industry Secretary Stephen Byers announces £98m for e-Science in Spending Review 2000. This includes £26m for PPARC to develop HEP and astronomy Grid projects.
December 2000: GridPP plan created at a meeting at RAL. Initially the £26m was to help fund UK posts to coordinate the UK arm of LCG, as part of that organisation.
April 2001: A Shadow Project Management Board, referred to as "DataGrid-UK", is established. The first GridPP proposal is submitted.
30th/31st May 2001: PPARC's e-Science Committee meets to consider the proposal and approves the GridPP project, allocating £17m.
1st September 2001: GridPP officially starts, with funding for 3 years.
January 2002: DataGrid releases the first production version of the testbed middleware.
February 2002: First international file transfers using X.509 digital certificates.
1st March 2002: RAL takes part in a test of DataGrid by creating a small 5-site testbed Grid with CERN, IN2P3-Lyon, CNAF-Bologna and NIKHEF.
11th March 2002: LHC Computing Grid Project launched.
23rd March 2002: First prototype Tier-1/A hardware delivered to RAL, consisting of 156 dual-CPU PCs with 30GB of storage each.
25th April 2002: UK National e-Science Centre (NeSC) opened in Edinburgh by Gordon Brown.
June 2002: ScotGrid, one of the four Tier-2s in GridPP, goes into production.
August 2002: GridPP makes its first visit to the All Hands e-Science meeting.
December 2002: PPARC receives a further £31.6m for its e-Science programme. The UK plays a significant role in the LHCb Data Challenge.
February 2003: PPARC puts out a call for proposals for the second phase of its e-Science programme.
June 2003: Proposal for GridPP2 submitted.
August 2003: The UKHEP Certificate Authority is replaced by the UK e-Science Certificate Authority, which issues the digital certificates needed to use the Grid.
September 2003: LHC Computing Grid is launched.
December 2003: GridSite, initially a tool used by the GridPP website, gets its first production release. GridPP2 proposal accepted by PPARC, ensuring the project will run until September 2007 with £16.9m.
April 2004: EU DataGrid project ends and is replaced by EGEE (Enabling Grids for E-science in Europe).
September 2004: GridPP2 is launched. The GridPP website wins the award for Best e-Science Project Website at the All Hands Meeting.
October 2004: CERN's 50th anniversary.
January 2005: BaBar UK demonstrates the first successful integration of the Grid into the official BaBar Monte Carlo production system.
March 2005: LCG passes 100 sites worldwide.
May 2005: GridPP has grown to 2,740 CPUs and 67 TB of storage.
July 2005: GridPP members use the UKLight high-speed connection between Lancaster and RAL for the first time, moving data 50 times faster than a normal ADSL line.
September 2005: First WISDOM biomedical data challenge for drug discovery is run, looking for drugs against malaria. GridSiteWiKi software released, which allows users with the correct digital certificate to edit wiki pages. New version of the Real Time Monitor launched at the e-Science All Hands Meeting.
October 2005: International Grid Trust Federation (IGTF) established to regulate the digital certificates used on the Grid worldwide.
November 2005: GridPP's storage capacity reaches 100TB. 2,000,000 jobs were run on the EGEE Grid in 2005.
January 2006: LCG reaches data speeds of 1GB/s during testing of the infrastructure.
March 2006: The PEGASUS project is announced, a social science study of GridPP by researchers from the London School of Economics. PPARC signs the LCG Memorandum of Understanding with CERN, which commits the UK Tier-1 at RAL and the four UK Tier-2s to provide services and resources to the LCG.
April 2006: EGEE enters its 2nd phase. PPARC looks for proposals for the continuation of the UK's Grid computing for particle physicists after September 2007.
May 2006: Second WISDOM biomedical data challenge for drug discovery is run, looking for drugs against avian flu.
July 2006: Proposal for GridPP3 submitted; this would extend the project beyond the current end date of September 2007.
August 2006: GridPP has 3,240 CPUs and 246.25TB of storage.
December 2006: GridPP accounts for 27% of the 2006 total EGEE CPU resources.
March 2007: PPARC announces £30m for the GridPP extension.
Context (2007)
• 2006 was the second full year for the UK Production Grid
• More than 5,000 CPUs and more than half a petabyte of disk storage
• The UK is the largest CPU provider on the EGEE Grid, with total CPU used of 15 GSI2k-hours in 2006
• The GridPP2 project has met 69% of its original targets, with 92% of the metrics within specification
• The initial LCG Grid Service is now underway and will run for the first 6 months of 2007
• The aim is to continue to improve reliability and performance in readiness for startup of the full Grid service on 1st July 2007
• The GridPP2 project has been extended by 7 months to April 2008
• The GridPP3 proposal was recently accepted by PPARC (£30m), extending the project to March 2011
• We anticipate a challenging period ahead
Real Time Monitor
[Figure: Real Time Monitor display]
Tier-1 Centre at RAL
• High quality data services
• National and international role
• UK focus for international Grid development
• 1500 CPUs
• 750 TB disk
• 530 TB tape (capacity 1PB)
• Grid Operations Centre
UK Tier-2 Centres
• ScotGrid: Durham, Edinburgh, Glasgow
• NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD
• London: Brunel, Imperial, QMUL, RHUL, UCL
Mostly funded by HEFCE (SFC)
GridPP: Who are we?
19 UK Universities + STFC
• GridPP1 2001-2004 "From Web to Grid" [£16m+]
• GridPP2+ 2004-2008 "From Prototype to Production" [£17m+]
• GridPP3 2008-2011 "From Production to Exploitation" [£30m]
GridPP Middleware is...
• Workload Management
• Grid Data Management
• Network Monitoring
• Information Services
• Security
• Storage Interfaces
Grid Challenges
Data Management, Security and Sharing
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies
Grid Status
• Aim: by 2008 (full year's data taking)
• CPU ~100 MSI2k (100,000 CPUs)
• Storage ~80PB
• Involving >100 institutes worldwide
• Build on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
• Prototype went live in September 2003 in 12 countries
• Extensively tested by the LHC experiments in September 2004
• February 2006: 25,547 CPUs, 4,398 TB storage
Status in May 2007 (last night): 177 sites, 29,266 CPUs, 13,815 TB storage
Monitoring via Grid Operations Centre
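The two snapshots above can be compared with the 2008 goals by simple ratios. This is a minimal sketch, assuming decimal units (1 PB = 1,000 TB) and treating the raw CPU count as a stand-in for capacity, even though the CPU target is really stated in SI2k and individual CPUs differ in speed.

```python
# Progress of the worldwide grid towards the 2008 goals quoted on the slide.
# Assumption: decimal units (1 PB = 1,000 TB); CPU count used as a proxy for SI2k.
TARGET_CPUS = 100_000            # "CPU ~100 MSI2k (100,000 CPUs)"
TARGET_STORAGE_TB = 80 * 1_000   # "Storage ~80PB"

snapshots = {
    "February 2006": (25_547, 4_398),
    "May 2007": (29_266, 13_815),
}


def progress(cpus: int, storage_tb: float) -> tuple[float, float]:
    """Return (fraction of CPU target, fraction of storage target)."""
    return cpus / TARGET_CPUS, storage_tb / TARGET_STORAGE_TB


for label, (cpus, tb) in snapshots.items():
    cpu_frac, disk_frac = progress(cpus, tb)
    print(f"{label}: {cpu_frac:.1%} of CPU target, {disk_frac:.1%} of storage target")
```

For May 2007 this gives about 29.3% of the CPU count and 17.3% of the storage goal. Storage therefore had further to grow, relative to its target, than CPU did.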
Resources
Accumulated EGEE CPU usage: 104,126,019 kSI2k-hours, or >100 GSI2k-hours (!)
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php
UKI: 30,644,586 kSI2k-hours
Via APEL accounting
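The step from kSI2k-hours to GSI2k-hours is a factor of 10^6, because 1 GSI2k = 10^6 kSI2k. The same two totals also give the UKI share of EGEE usage. This short sketch uses only the numbers on the slide. The variable names are illustrative and are not part of APEL.

```python
# Accounting totals as quoted on the slide (APEL accounting).
EGEE_TOTAL_KSI2K_HOURS = 104_126_019
UKI_KSI2K_HOURS = 30_644_586

KSI2K_PER_GSI2K = 1_000_000  # 1 GSI2k = 10^6 kSI2k


def to_gsi2k_hours(ksi2k_hours: float) -> float:
    """Convert kSI2k-hours to GSI2k-hours."""
    return ksi2k_hours / KSI2K_PER_GSI2K


total_g = to_gsi2k_hours(EGEE_TOTAL_KSI2K_HOURS)
uki_g = to_gsi2k_hours(UKI_KSI2K_HOURS)
share = UKI_KSI2K_HOURS / EGEE_TOTAL_KSI2K_HOURS

print(f"EGEE total: {total_g:.1f} GSI2k-hours")
print(f"UKI: {uki_g:.1f} GSI2k-hours ({share:.1%} of the total)")
```

This confirms the ">100 GSI2k-hours" figure (about 104.1) and puts the UKI share of accumulated usage at roughly 29%.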
Job Slots and Use
[Figures: job slots and their use, 2004 to 2007]
Resource Accounting
• CPU resources at ~required levels (just-in-time delivery)
• Grid-accessible disk accounting being improved
[Figure: CPU resources against time, with LHC start-up and the level of 100,000 3GHz CPUs marked; source: Grid Operations Centre]
Efficiency (measured by the UK Tier-1 for all VOs)
• Target ~90%: some loss of CPU efficiency due to I/O bottlenecks is OK
• Concern that efficiency is currently ~75%
• Each experiment needs to work to improve its system/deployment practice, anticipating e.g. hanging GridFTP connections during batch work
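This sketch shows why a few hanging transfers can pull efficiency well below target. It assumes the usual batch definition of CPU efficiency, total CPU time divided by total wall-clock time, because the slide does not define it. The job sample is hypothetical: three healthy jobs plus one that waits two hours on a stalled connection while using ten minutes of CPU.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpu_seconds: float   # CPU time consumed by the job
    wall_seconds: float  # wall-clock time the job occupied its batch slot


def cpu_efficiency(jobs: list[Job]) -> float:
    """Aggregate CPU efficiency: total CPU time / total wall-clock time."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        raise ValueError("no wall-clock time recorded")
    return sum(j.cpu_seconds for j in jobs) / total_wall


TARGET = 0.90

# Hypothetical sample: three healthy one-hour jobs, plus one that held its
# slot for 2 hours on a hanging transfer while using only 10 minutes of CPU.
jobs = [Job(3_420, 3_600), Job(3_400, 3_600), Job(3_500, 3_600), Job(600, 7_200)]

for label, sample in (("healthy jobs only", jobs[:3]), ("with hanging job", jobs)):
    eff = cpu_efficiency(sample)
    print(f"{label}: efficiency = {eff:.1%}, target met: {eff >= TARGET}")
```

The three healthy jobs alone reach about 96%. Adding the single stalled job drops the aggregate to about 61%. This is why wall-clock time lost while waiting on I/O matters as much as CPU usage.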
UK Resources
[Figure: 2006 CPU usage by experiment]
Site testing
SAM tests (critical = subset):
• BDII: Top-level BDII
• sBDII: Site BDII
• FTS: File Transfer Service
• gCE: gLite Computing Element
• LFC: Global LFC
• VOMS
• CE: Computing Element
• SRM
• gRB: gLite Resource Broker
• MyProxy
• RB: Resource Broker
• VOBOX: VO BOX
• SE: Storage Element
• RGMA: RGMA Registry
[Figure: SAM test results for the global Tier-1s and the UK Tier-1 (RAL)]
http://gridview.cern.ch/GRIDVIEW/same_index.php
ATLAS site testing
• End-user analysis tests in advance of LHC data-taking
• Example: ATLAS
• Hourly polling of all sites
[Figure: ATLAS site test results, 12/01/07 to 10/05/07]
http://hepwww.ph.qmul.ac.uk/~lloyd/atlas/atest.php
• Measurably improved performance
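Hourly polling can be turned into a single availability figure per site. This minimal sketch assumes a site counts as up in a given hour only if every test in a critical subset passed, and that failures of non-critical tests are ignored. The test names, the critical set and the generated results are all hypothetical; they are not the real SAM or ATLAS test identifiers.

```python
from datetime import datetime, timedelta

# Hypothetical critical subset; the real SAM/ATLAS test names are not listed on the slide.
CRITICAL = {"job-submit", "replica-management", "ca-certs"}


def sample_ok(results: dict[str, bool]) -> bool:
    """A sample passes only if every critical test is present and passed."""
    return all(results.get(test, False) for test in CRITICAL)


def availability(samples: list[tuple[datetime, dict[str, bool]]]) -> float:
    """Fraction of hourly samples in which the site passed all critical tests."""
    if not samples:
        raise ValueError("no samples")
    return sum(sample_ok(results) for _, results in samples) / len(samples)


# One day of hypothetical hourly results: job submission fails every sixth hour,
# and a non-critical check fails every other hour without affecting availability.
start = datetime(2007, 1, 12)
samples = [
    (
        start + timedelta(hours=h),
        {
            "job-submit": h % 6 != 0,
            "replica-management": True,
            "ca-certs": True,
            "optional-check": h % 2 == 0,
        },
    )
    for h in range(24)
]

print(f"availability = {availability(samples):.1%}")
```

Restricting the pass/fail decision to a critical subset keeps the headline number focused on the services that end-user analysis actually depends on.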
CMS Challenge
CSA06: a successful CMS global 25% capacity test over a 6-week period in Sep/Oct 2006.
• Reconstruction, event selection, calibration, alignment, analysis
• 1PB of data shipped between T0, T1 and T2 sites in 6 weeks
• 30 analysis projects involving 70 physicists
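Shipping 1PB in 6 weeks implies a sustained average throughput, which is easy to work out. This sketch assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and spreads the volume evenly over the whole period. Real transfers were burstier, and the 1PB was summed over all T0, T1 and T2 routes.

```python
PB = 10**15                       # decimal petabyte, in bytes
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 s


def mean_rate_mb_per_s(volume_bytes: float, weeks: float) -> float:
    """Average throughput in MB/s (1 MB = 10^6 bytes) over the given period."""
    return volume_bytes / (weeks * SECONDS_PER_WEEK) / 10**6


rate = mean_rate_mb_per_s(1 * PB, 6)
print(f"{rate:.0f} MB/s sustained on average")
```

The result, roughly 276 MB/s averaged around the clock, is the aggregate rate the CSA06 transfer system had to sustain.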
LHCb Production
[Figure: LHCb production by country: CERN, Germany, UK, Spain, France, Italy]
UK consistently the largest producer for LHCb
Forward Look
Scenario Planning: Resource Requirements [TB, kSI2k]
• GridPP requested a fair share of global requirements, according to experiment requirements
• Changes in the LHC schedule prompted a(nother) round of resource planning, presented to the CRRB on Oct 24th
• New UK resource requirements have been derived and incorporated in the scenario planning
[Figure: e.g. Tier-1 requirements]
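One plausible reading of "a fair share of global requirements, according to experiment requirements" is this: apply a UK share to each experiment's global request, then sum over experiments. The sketch assumes that reading. The experiment shares and global CPU requirements are hypothetical placeholders, not GridPP's or the experiments' figures.

```python
# Hypothetical inputs: neither the shares nor the global requirements come from
# the slide; they only illustrate the arithmetic of a per-experiment fair share.
global_cpu_ksi2k = {"ATLAS": 10_000.0, "CMS": 8_000.0, "LHCb": 2_000.0}
uk_share = {"ATLAS": 0.10, "CMS": 0.05, "LHCb": 0.15}


def uk_requirement(global_req: dict[str, float], share: dict[str, float]) -> float:
    """Sum over experiments of (UK share x global requirement); missing shares count as 0."""
    return sum(req * share.get(exp, 0.0) for exp, req in global_req.items())


print(f"UK CPU request: {uk_requirement(global_cpu_ksi2k, uk_share):,.0f} kSI2k")
```

Keeping the share per experiment means that a schedule change affecting only one experiment's requirements feeds straight through to the UK total. A fresh planning round then only needs updated inputs.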
Forward Look
Input to Scenario Planning: Hardware Costing
• Empirical extrapolations, with extrapolated (large) uncertainties
• Hardware prices have been re-examined following the recent Tier-1 purchase
• CPU (Woodcrest) was cheaper than expected from extrapolating the previous 4 years of data
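An empirical price extrapolation can be illustrated by fitting an exponential decline to past prices, which is a straight line in log-price, and projecting it one year ahead. This is a minimal sketch with hypothetical prices in arbitrary units. The slide does not say which model GridPP actually fitted.

```python
import math


def fit_exponential(years: list[float], prices: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(price) = a + b * year; returns (a, b)."""
    n = len(years)
    logs = [math.log(p) for p in prices]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(a: float, b: float, year: float) -> float:
    """Extrapolated price for the given year."""
    return math.exp(a + b * year)


# Hypothetical price per unit of CPU capacity for the previous four years,
# falling by 40% per year.
years = [2002, 2003, 2004, 2005]
prices = [1000.0, 600.0, 360.0, 216.0]

a, b = fit_exponential(years, prices)
expected_2006 = predict(a, b, 2006)
observed_2006 = 110.0  # hypothetical purchase price
print(f"extrapolated {expected_2006:.1f}, observed {observed_2006:.1f}, "
      f"ratio {observed_2006 / expected_2006:.2f}")
```

A ratio below 1 matches the situation on the slide, where the purchase came in cheaper than the extrapolation. The scatter of real data about the fitted line would be one way to size the "large" uncertainty.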
Forward Look
Scenario Planning
GridPP3 was funded predominantly to install and operate resources, and to deploy the wLCG.
[Figure: an example 70% "minimum viable level" scenario [£m], broken down into Experiment Application Software, Application Middleware, Grid Middleware, Integration, and Facilities and Fabrics]
Conclusion
• From the UK particle physics perspective, the Grid is the basis for computing in the 21st century:
  - needed to utilise computing resources efficiently and securely
  - uses gLite middleware (with evolving standards for interoperation)
  - required significant investment from PPARC (STFC): O(£100m) over 10 yrs, including support from HEFCE/SFC
  - required 3 years' prototype testbed development [GridPP1]
  - provides a working production system that has been running for over two years in the build-up to LHC data-taking [GridPP2]
  - enables seamless discovery of computing resources: utilised to good effect across the UK, and internationally significant
  - not (yet) as efficient as end-user analysts require: ongoing work to improve performance
  - ready for the LHC: just-in-time delivery
  - future operations-led activity as part of LCG, working with EGEE/EGI (EU) and NGS (UK) [GridPP3]
• The future challenge is to exploit this infrastructure to perform (previously impossible) physics analyses from the LHC (and ILC and nFact and...)
Further Info
http://www.gridpp.ac.uk/