Grid computing at CERN
Oxana Smirnova, Lund University/CERN
2nd NGN meeting, Tallinn, January 20, 2005
CERN: the European Particle Physics Lab
Large Hadron Collider: the world's biggest accelerator, at CERN
http://www.cern.ch
Experiments at LHC and computing challenges • Data-intensive tasks • Large datasets, large files • Lengthy processing times • Large memory consumption • High throughput is necessary • Very distributed user base • 50+ countries, thousands of researchers • Distributed computing resources of modest size • Produced and processed data are hence distributed, too • Issues of coordination, synchronization and authorization are outstanding The HEP community at CERN was the first to recognize the necessity of Grid computing
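The processing pattern behind these bullets is high-throughput rather than high-performance: a large set of event files is split into many independent jobs that can run on modest resources at many sites. Below is a minimal, purely illustrative Python sketch of that splitting step; it is not CERN or LCG code, and all site and file names are invented for the example.

# Illustrative only: a toy model of splitting a large dataset into
# independent jobs spread over distributed sites, the processing style
# described above (many files, high throughput, modest resources at
# many locations). All names are hypothetical.
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Job:
    site: str          # site the job is dispatched to (hypothetical)
    files: list[str]   # input files the job will process

def split_into_jobs(files: list[str], sites: list[str], files_per_job: int = 100) -> list[Job]:
    """Chunk the input file list and assign chunks to sites round-robin."""
    site_cycle = cycle(sites)
    jobs = []
    for i in range(0, len(files), files_per_job):
        jobs.append(Job(site=next(site_cycle), files=files[i:i + files_per_job]))
    return jobs

if __name__ == "__main__":
    # 10,000 hypothetical event files, three hypothetical sites
    files = [f"run123.evt.{n:05d}.root" for n in range(10_000)]
    jobs = split_into_jobs(files, sites=["cern", "lund", "tallinn"])
    print(len(jobs), "independent jobs")   # -> 100 independent jobs

The round-robin assignment here merely stands in for the brokering that real Grid middleware performs when it matches jobs to sites based on data location and available resources.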
Grid projects at and around CERN • The MONARC project developed a multi-tiered model for distributed analysis of data • The Particle Physics Data Grid (PPDG) and GriPhyN projects by US physicists started using Grid technologies • Used parts of the Globus Toolkit • Globus was picked up by the CERN-led EU DataGrid (EDG) project • EDG did not satisfy production-level requirements; many simpler (still Globus-based) solutions appeared: • NorduGrid (Northern Europe and others) • Grid3 (USA) • EGEE’s gLite (EU, prototype) • LHC Computing Grid (LCG): builds the Grid for CERN, uses modified EDG and aims towards gLite
LHC experiments’ usage of Grid • The experiments recently presented their computing models • All rely on Grid computing in many aspects • Common points: • Multi-tiered hierarchy • Tier0 (CERN) → Tier1 (regional) → Tier2 (local) • Raw and reconstructed data: 2-3 copies worldwide; analysis objects: a copy per Tier1, some at dedicated Tier2s • Grid(s) to be used to manage centralized production at Tier2s and processing at Tier1s, and eventually for analysis • Differences: • 3 out of 4 use different non-LCG Grid-like solutions • ALICE: AliEn (assumes it will transform into gLite) • ATLAS: Grid3, ARC • LHCb: Dirac • Only ALICE makes an explicit statement on the Grid middleware (needs AliEn) • Some see the Grid as a necessity, others as a possible optimization • Some require a single Grid, others realize there will be many
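As a rough illustration of the replication policy summarized above (2-3 worldwide copies of raw and reconstructed data, analysis objects at every Tier1 and at some dedicated Tier2s), here is a toy Python sketch. The site names and the choice of which Tier1s hold the extra raw copies are invented for the example; no experiment's actual policy code is implied.

# A minimal sketch of the tiered replication scheme described above.
# Site names and replica choices are illustrative assumptions only.
TIER0 = "CERN"
TIER1 = ["RAL", "FZK", "CNAF", "IN2P3", "PIC", "NIKHEF"]   # regional centres (examples)
DEDICATED_TIER2 = ["Lund", "Tallinn"]                      # local centres (examples)

def replica_sites(data_type: str) -> list[str]:
    """Return the sites that should hold a copy of a dataset of this type."""
    if data_type in ("raw", "reconstructed"):
        # 2-3 copies worldwide: Tier0 plus a couple of Tier1s (choice is arbitrary here)
        return [TIER0] + TIER1[:2]
    if data_type == "analysis":
        # a copy per Tier1, plus the dedicated Tier2s
        return TIER1 + DEDICATED_TIER2
    raise ValueError(f"unknown data type: {data_type}")

print(replica_sites("raw"))       # ['CERN', 'RAL', 'FZK']
print(replica_sites("analysis"))  # all six Tier1s plus the two dedicated Tier2s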
LCG: the central Grid project for CERN • Sometimes referred to as “the fifth LHC experiment” • Major activities (see http://cern.ch/lcg): • Fabric • Grid deployment and operations • Common applications • Distributed data analysis (“ARDA”) • Originally chose EDG as the basic middleware • Applies some modifications; uses only selected services • Took over EDG middleware support • Later agreed to share some operational responsibilities and middleware with EGEE • CERN is actually the coordinator of EGEE (see http://cern.ch/egee) • EGEE’s gLite middleware is expected to inherit many EDG solutions • Since late 2003, LCG has been in production mode, to be used by the LHC experiments • 80+ sites, 7000+ processors
LCG status today • LCG Comprehensive Review took place on November 22-23, 2004 • Materials publicly available at http://agenda.cern.ch/fullAgenda.php?ida=a043872 • Excerpts from the final report (from slides by K.Bos): • Middleware: Progress was reported in the development and use of the middleware but the LHCC noted outstanding issues concerning the LCG-2 low job success rate, inadequacies of the workload management and data management systems, as well as delays in the release of the EGEE gLite services. Continued delays in gLite may hinder future progress in ARDA. LCG-2 has been used as a production batch system, but Grid-based analysis of the simulated data is only just starting. The interoperability of the various types of middleware being produced should be pursued together with common interface tools, and developers of the gLite middleware should remain available for the support phase. • Grid Deployment and Regional Centers: Good progress was reported on the installation of Grid software in remote sites. A large amount of data has been processed on the LCG-2 Grid as part of the Data Challenges and the LCG-2 Grid has been operated successfully for several months. However, the LHCC noted that the service provided by LCG-2 was much less than production quality and the experiments and LCG Project expended a large amount of effort to be in a position to use the service.
LCG status, continued • Excerpts from the final report, continued: • Fabric and Network: The LHCC has no major concerns regarding the Fabric Area and Wide Area Networking. In view of the reported delays, the Committee will continue checking on the availability and performance of the CASTOR disk pool management system. • Applications area: The LHCC noted the good progress in the Applications Area with all projects demonstrating significant steps in the development and production of their respective products and services. The major outstanding issues lie with the insufficient coordination between the Applications Area and ROOT and with the imminent reduction of manpower due to the transition from the development to the deployment, maintenance and support phases. • Management and Planning: The LHCC took note of the upcoming milestones for the LCG and noted that discussions are currently underway to secure the missing manpower to develop, deploy and support the Grid services. The lines of responsibility and authority in the overall organization structure need further clarification.
Plans for the gLite middleware in 2005 • End of March • use the gLite middleware (beta) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments) • Release Candidate 1 • End of June • use the gLite middleware (version 1.0) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments) • Release 1 • End of September • use the gLite middleware (version 1.1) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments) • Interim Integrated Build • End of December • use the gLite middleware (version 1.2, release candidate 2) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments) • Release Candidate 2 Slide by F.Hemmer
LCG planning in 2005 • February/March – Fabric & Grid workshop on the computing models • First quarter 2005: • Improve/work out relations between Tier0/1/2 • Understand data access patterns, define experiments’ shares at Tier1s • Prepare documentation for MoUs between LCG and Tier0/1/2 centers • Work on the Technical Design Report • Other: • March – detailed plan for the service challenges • March – phase 2 Applications Area plan • April – initial plan for Tier-0/1/2 networking • April – prepare “final” version of the LCG MoU • May – proposal for middleware evolution • End June – Technical Design Report • Detailed plan for installation and commissioning of the LHC computing environment • September – final installation and commissioning plan • October – ready to sign the MoU Based on slides by L.Robertson
Summary • CERN expects LCG to provide an adequate Grid-like computing infrastructure for the future LHC data processing • The resources are available, and the owners will sign MoUs with CERN/LCG in 2006 • The experiments tested the LCG system extensively throughout 2004 • No satisfactory production-level service • No optimization yet; tremendous effort is needed to keep it running • Other Grid solutions offered fewer resources but better reliability with less effort (see e.g. talks at CHEP04) • Major problems: • Operational and organizational issues • Inadequate middleware without official developers’ support • EGEE is expected to help out: • Manpower for operation, infrastructure and support centers • Improved middleware (gLite) • Still, it is becoming clear that there will be no single Grid solution for CERN • EDG/LCG, AliEn, gLite, Dirac, Grid3/OSG, NorduGrid’s ARC, INFN-Grid and counting: all are being used and have avid supporters • Some expect LCG to concentrate on fabric, operations and applications