Overview & Status Al-Ain, UAE November 2007

Presentation Transcript


  1. Overview & Status Al-Ain, UAE November 2007

  2. Outline • Introduction – the computing challenge: why grid computing? • Overview of the LCG Project • Project Status • Challenges & Outlook (Ian.Bird@cern.ch)

  3. The LHC Computing Challenge • Signal/noise ratio ~10^-9 • Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year • Compute power: event complexity × number of events × thousands of users → ~100k of today's fastest CPUs • Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere → grid technology
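
For scale, the two "→" estimates above can be reproduced with a back-of-envelope calculation. The sketch below is illustrative only: the trigger rate, event size, CPU time per event and reprocessing factor are assumptions chosen to land in the same order of magnitude as the slide, not official experiment parameters.

```python
# Back-of-envelope estimate of LHC data volume and CPU need.
# Every input number is an illustrative assumption, not an official figure.

LIVE_SECONDS_PER_YEAR = 1.0e7     # assumed accelerator "live" seconds per year
WALL_SECONDS_PER_YEAR = 3.15e7
EVENT_RATE_HZ = 300               # assumed events written per second, per experiment
EVENT_SIZE_MB = 1.5               # assumed average raw event size in MB
N_EXPERIMENTS = 4                 # ALICE, ATLAS, CMS, LHCb

raw_data_pb = (EVENT_RATE_HZ * EVENT_SIZE_MB * LIVE_SECONDS_PER_YEAR
               * N_EXPERIMENTS) / 1.0e9              # MB -> PB
print(f"raw data per year : ~{raw_data_pb:.0f} PB")  # ~18 PB, same order as the slide

CPU_SEC_PER_EVENT = 25     # assumed 2007-era CPU seconds per event
PROCESSING_PASSES = 10     # assumed factor for reprocessing, simulation and user analysis
events_per_year = EVENT_RATE_HZ * LIVE_SECONDS_PER_YEAR * N_EXPERIMENTS
busy_cpus = (events_per_year * CPU_SEC_PER_EVENT * PROCESSING_PASSES
             / WALL_SECONDS_PER_YEAR)
print(f"CPUs kept busy    : ~{busy_cpus:,.0f}")      # order of 100k CPUs
```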

  4. Timeline: LHC Computing [Timeline chart: ATLAS (or CMS) requirements for the first year at design luminosity as estimated at successive milestones – ATLAS & CMS CTP: ~10^7 MIPS, 100 TB disk; "Hoffmann" Review: 7×10^7 MIPS, 1,900 TB disk; Computing TDRs: 55×10^7 MIPS (140 MSi2K), 70,000 TB disk – shown alongside LHC approval, the approvals of ATLAS, CMS, ALICE and LHCb, and LHC start.]

  5. Evolution of CPU Capacity at CERN [Chart: CPU capacity and cost (in 2007 Swiss Francs, including infrastructure – computer centre, power, cooling, … – and physics tapes) across successive machines: SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV).]

  6. Requirements Match • CPU & disk requirements are more than 10 times what CERN alone can provide

  7. LHC Computing → Multi-science Grid • 1999 – MONARC project: first LHC computing architecture – hierarchical distributed model • 2000 – growing interest in grid technology; the HEP community is the main driver in launching the DataGrid project • 2001-2004 – EU DataGrid project: middleware & testbed for an operational grid • 2002-2005 – LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments

  8. The Worldwide LHC Computing Grid • Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments • Ensure the computing service … and common application libraries and tools • Phase I (2002-05) – development & planning • Phase II (2006-2008) – deployment & commissioning of the initial services

  9. WLCG Collaboration • The Collaboration: 4 LHC experiments; ~250 computing centres; 12 large centres (Tier-0, Tier-1); 38 federations of smaller "Tier-2" centres; growing to ~40 countries; grids: EGEE, OSG, NorduGrid • Technical Design Reports: WLCG and the 4 experiments, June 2005 • Memorandum of Understanding: agreed in October 2005 • Resources: 5-year forward look

  10. LCG Service Hierarchy • Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data to the Tier-1 centres • Tier-1 – "online" to the data acquisition process → high availability: managed mass storage – grid-enabled data service; data-heavy analysis; national and regional support. The Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY) • Tier-2 – ~130 centres in ~35 countries: end-user (physicist, research group) analysis – where the discoveries are made; simulation
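
The tier roles above can be summarised as a small data structure. The sketch below is a hypothetical Python model of the hierarchy (the roles and site names are taken from this slide and slide 32); it is not part of any WLCG software.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    """One level of the LCG service hierarchy (illustrative model only)."""
    name: str
    roles: list[str]
    sites: list[str] = field(default_factory=list)

hierarchy = [
    Tier("Tier-0", ["data acquisition & initial processing",
                    "long-term data curation",
                    "distribution of data to Tier-1 centres"],
         sites=["CERN"]),
    Tier("Tier-1", ["managed mass storage (grid-enabled data service)",
                    "data-heavy analysis",
                    "national and regional support"],
         sites=["TRIUMF", "IN2P3", "FZK", "CNAF", "SARA/NIKHEF", "NDGF",
                "PIC", "ASCC", "RAL", "FNAL", "BNL"]),
    Tier("Tier-2", ["end-user analysis", "simulation"]),  # ~130 centres in ~35 countries
]

for tier in hierarchy:
    print(f"{tier.name}: {'; '.join(tier.roles)}")
```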

  11. Distribution of Computing Services [Charts: shares of CPU, disk and tape across sites] • About 100,000 CPU cores • New data will grow at about 15 PetaBytes per year – with two copies • A significant fraction of the resources is distributed over more than 120 computing centres

  12. Grid Activity • Continuing increase in usage of the EGEE and OSG grids • All sites reporting accounting data (CERN, Tier-1, -2, -3) • Increase over the past 17 months: 5× the number of jobs, 3.5× the CPU usage – now around 100K jobs/day

  13. October 2007 – CPU Usage at CERN, Tier-1s and Tier-2s • More than 85% of CPU usage is external to CERN (NDGF usage is for September 2007)

  14. Tier-2 Sites – October 2007 • 30 sites deliver 75% of the CPU • 30 sites deliver 1%
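
A statement like "30 sites deliver 75% of the CPU" falls out of ranking sites by delivered CPU time and accumulating their shares. A minimal sketch of that calculation, with made-up accounting numbers:

```python
# Cumulative-share calculation over per-site CPU accounting (illustrative).
# cpu_hours maps site name -> delivered CPU time; all numbers are made up.
cpu_hours = {"site_a": 900_000, "site_b": 450_000, "site_c": 120_000,
             "site_d": 30_000, "site_e": 5_000}

total = sum(cpu_hours.values())
running = 0.0
for rank, (site, hours) in enumerate(
        sorted(cpu_hours.items(), key=lambda kv: kv[1], reverse=True), start=1):
    running += hours
    print(f"{rank:2d}  {site:8s}  share {hours/total:6.1%}  cumulative {running/total:6.1%}")
```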

  15. LHCOPN Architecture

  16. Data Transfer out of Tier-0

  17. Middleware: Baseline Services • The basic baseline services from the TDR (2005), with continuing evolution in reliability, performance, functionality and requirements • Storage Elements: Castor, dCache, DPM (with SRM 1.1); StoRM added in 2007; SRM 2.2 – long delays incurred, now being deployed in production • Basic transfer tools – GridFTP, … • File Transfer Service (FTS) • LCG File Catalog (LFC) • LCG data management tools – lcg-utils • POSIX I/O – Grid File Access Library (GFAL) • Synchronised databases T0 → T1s – 3D project • Information System • Compute Elements: Globus/Condor-C; web services (CREAM) • gLite Workload Management – in production at CERN • VO Management System (VOMS) • VO Boxes • Application software installation • Job Monitoring Tools
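
To make the division of labour among the data-management services concrete, here is a toy model of a "find a replica, copy it if needed, then read it" flow. The class and function names (ReplicaCatalog, TransferService, stage_and_open) are hypothetical stand-ins for the roles played by the LFC, FTS and GFAL; this is not the real LCG API.

```python
# Hypothetical model of the baseline data-management flow:
# catalog lookup (LFC role) -> scheduled copy (FTS role) -> POSIX-like read (GFAL role).
# Class and method names are illustrative stand-ins, not real LCG interfaces.

class ReplicaCatalog:                                  # plays the LFC role
    def __init__(self):
        self.replicas = {}                             # logical file name -> list of SURLs
    def register(self, lfn, surl):
        self.replicas.setdefault(lfn, []).append(surl)
    def lookup(self, lfn):
        return self.replicas.get(lfn, [])

class TransferService:                                 # plays the FTS role
    def copy(self, src_surl, dst_se):
        # A real FTS queues and schedules the transfer between storage elements;
        # here we just pretend the file now exists at the destination.
        return f"srm://{dst_se}/{src_surl.rsplit('/', 1)[-1]}"

def stage_and_open(lfn, catalog, fts, local_se):
    """Find a replica, copy it to the local SE if needed, return a readable URL."""
    replicas = catalog.lookup(lfn)
    local = [r for r in replicas if local_se in r]
    if not local:
        new_surl = fts.copy(replicas[0], local_se)
        catalog.register(lfn, new_surl)
        local = [new_surl]
    return local[0]            # a GFAL-like layer would open this for POSIX I/O

catalog = ReplicaCatalog()
catalog.register("/grid/atlas/raw/run1234", "srm://tier1.example.org/raw/run1234")
print(stage_and_open("/grid/atlas/raw/run1234", catalog, TransferService(), "tier2.example.edu"))
```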

  18. Site Reliability – CERN + Tier-1s • "Site reliability" is a function of: grid services; middleware; site operations; storage management systems; networks; …
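
One way to read this slide: end-to-end site reliability behaves roughly like the product of the availabilities of the layers involved, so a single weak layer dominates. A minimal sketch under that simplifying assumption, with made-up availability figures:

```python
from math import prod

# Illustrative component availabilities (fraction of time working); all made up.
components = {
    "grid services":    0.99,
    "middleware":       0.98,
    "site operations":  0.97,
    "storage system":   0.95,
    "network":          0.995,
}

site_reliability = prod(components.values())
print(f"combined reliability: {site_reliability:.1%}")  # ~89% even though every layer is >= 95%
```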

  19. Tier-2 Site Reliability • 83 Tier-2 sites being monitored

  20. Improving Reliability • Monitoring • Metrics • Workshops • Data challenges • Experience • Systematic problem analysis • Priority from software developers

  21. LCG depends on two major science grid infrastructures: EGEE – Enabling Grids for E-Science, and OSG – the US Open Science Grid

  22. LHC Computing → Multi-science Grid • 1999 – MONARC project: first LHC computing architecture – hierarchical distributed model • 2000 – growing interest in grid technology; the HEP community is the main driver in launching the DataGrid project • 2001-2004 – EU DataGrid project: middleware & testbed for an operational grid • 2002-2005 – LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments • 2004-2006 and 2006-2008 – EU EGEE project: starts from the LCG grid; shared production infrastructure; expanding to other communities and sciences; now preparing its 3rd phase

  23. Grid infrastructure project (EGEE) co-funded by the European Commission – now in its 2nd phase, with 91 partners in 32 countries • 240 sites in 45 countries • 45,000 CPUs • 12 PetaBytes • > 5,000 users • > 100 VOs • > 100,000 jobs/day • Application areas: archeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, …

  24. EGEE infrastructure use • > 90k jobs/day from the LHC (LCG) VOs, > 143k jobs/day in total • Data from the EGEE accounting system • LHCC Comprehensive Review, 19-20 November 2007

  25. EGEE working with related infrastructure projects

  26. Sustainability: Beyond EGEE-II • Need to prepare a permanent, common grid infrastructure • Ensure the long-term sustainability of the European e-infrastructure, independent of short project funding cycles • Coordinate the integration and interaction between National Grid Infrastructures (NGIs) • Operate the European level of the production grid infrastructure for a wide range of scientific disciplines, linking the NGIs

  27. EGI – European Grid Initiative (www.eu-egi.org) • EGI Design Study – proposal to the European Commission (started September 2007) • Supported by 37 National Grid Initiatives (NGIs) • A 2-year project to prepare the set-up and operation of a new organisational model for a sustainable pan-European grid infrastructure after the end of EGEE-3

  28. Challenges • Short timescale – preparation for start-up: resource ramp-up across Tier-1 and Tier-2 sites; site and service reliability • Longer term: infrastructure – power and cooling; multi-core CPUs – how will we make the best use of them?; supporting large-scale analysis activities – just starting now – what new problems will arise?; migration from today's grid to a model of national infrastructures – how to ensure that the LHC gets what it needs

  29. Combined Computing Readiness Challenge (CCRC) • A combined challenge by all experiments & sites to validate the readiness of the WLCG computing infrastructure before the start of data taking, at a scale comparable to that needed for data taking in 2008 • Should be done well in advance of the start of data taking, to identify flaws and bottlenecks and allow time to fix them • A wide battery of tests run simultaneously by all experiments: driven from the DAQ with full Tier-0 processing; site-to-site data transfers, storage system to storage system; required functionality and performance; data access patterns similar to 2008 processing; CPU and data loads simulated as required to reach the 2008 scale • Coordination team in place • Two test periods – February and May
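
Reaching "2008 scale" means injecting load at a known rate rather than waiting for real data. The toy sketch below only illustrates the idea of driving and accounting for a target daily job rate; the rate and success probability are assumptions, and this bears no relation to the actual CCRC tooling.

```python
import random

# Toy load generator for a readiness-challenge style test: submit a known number
# of dummy "jobs" per day and count the outcomes. Purely illustrative.
TARGET_JOBS_PER_DAY = 100_000     # assumed target injection rate
ASSUMED_SUCCESS_RATE = 0.92       # assumed average job success probability

def run_one_day(seed=1):
    rng = random.Random(seed)
    succeeded = sum(rng.random() < ASSUMED_SUCCESS_RATE
                    for _ in range(TARGET_JOBS_PER_DAY))
    return succeeded, TARGET_JOBS_PER_DAY - succeeded

ok, failed = run_one_day()
print(f"submitted {TARGET_JOBS_PER_DAY:,}, succeeded {ok:,}, failed {failed:,}")
```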

  30. Ramp-up Needed for Startup [Charts: installed capacity, pledges, usage and target usage from April 2006 to September 2008, with required ramp-up factors of roughly 2.3×, 2.9×, 3× and 3.7×.]
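
The factors on this chart are simply the ratio of the capacity required for 2008 running to the capacity installed at the time. A one-line check with hypothetical numbers chosen to reproduce one of the quoted factors:

```python
# Ramp-up factor = capacity required for 2008 / capacity installed today.
# The numbers below are hypothetical, chosen only to show the arithmetic.
installed_2007 = 10_000   # e.g. installed CPU capacity in kSI2K (assumed)
required_2008 = 37_000    # pledged / required capacity for 2008 (assumed)

ramp_up = required_2008 / installed_2007
print(f"required ramp-up: {ramp_up:.1f}x")   # -> 3.7x
```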

  31. Summary • We have an operational grid service for LHC • EGEE – The European Grid Infrastructure - is the world’s largest multi-disciplinary grid for science • ~240 sites; > 100 application groups • Over the next months before LHC comes on-line: • Ramp-up resources to the MoU levels • Improve service reliability and availability • Full program of “dress-rehearsals” to demonstrate the complete computing system

  32. The Grid is now in operation; work continues on reliability, scaling up and sustainability • Tier-1 centres: TRIUMF (Canada); GridKa (Germany); IN2P3 (France); CNAF (Italy); SARA/NIKHEF (NL); Nordic Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain)
