Grid Computing in High Energy Physics: Challenges and Opportunities
Dr. Ian Bird, LHC Computing Grid Project Leader
Göttingen Tier 2 Inauguration, 13th May 2008
The scales
High Energy Physics machines and detectors
(Detector schematic: muon chambers, calorimeter)
• L: 2×10³²/cm²/s – 2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50–100 Hz; 25 MB/sec digitized recording
• pp @ √s = 14 TeV, L: 10³⁴/cm²/s – 40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/sec digitized recording
LHC: 4 experiments … ready!
First physics expected in autumn 2008. Is the computing ready?
The LHC Computing Challenge
• Signal/noise: 10⁻⁹
• Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year (see the estimate sketched below)
• Compute power: event complexity × number of events × thousands of users → 100k of today's fastest CPUs
• Worldwide analysis & funding: computing is funded locally in the major regions and countries, yet efficient analysis must be possible everywhere → GRID technology
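The arithmetic behind these totals is simple enough to check. A minimal back-of-envelope sketch follows; the trigger rate, event size and live time used here are illustrative assumptions, not official experiment parameters.

```python
# Back-of-envelope check of the slide's numbers. The per-experiment rate,
# event size, and live time below are illustrative assumptions.

TRIGGER_RATE_HZ = 200        # assumed average rate to storage per experiment
RAW_EVENT_SIZE_MB = 1.5      # assumed average raw event size
LIVE_SECONDS_PER_YEAR = 1e7  # typical accelerator "live" year
N_EXPERIMENTS = 4

raw_per_experiment_pb = (TRIGGER_RATE_HZ * RAW_EVENT_SIZE_MB
                         * LIVE_SECONDS_PER_YEAR) / 1e9   # MB -> PB
total_raw_pb = raw_per_experiment_pb * N_EXPERIMENTS

print(f"raw data per experiment: ~{raw_per_experiment_pb:.0f} PB/year")   # ~3 PB/year
print(f"raw data, all experiments: ~{total_raw_pb:.0f} PB/year")          # ~12 PB/year
# Adding reconstructed and simulated data roughly brings this to the
# ~15 PB/year quoted on the slide.
```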
A collision at LHC
Luminosity: 10³⁴ cm⁻² s⁻¹
Bunch crossings at 40 MHz – every 25 ns
~20 overlapping events (pile-up) per crossing
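The ~20 overlapping events follow directly from the design luminosity and the crossing rate. A minimal sketch, assuming an inelastic pp cross-section of roughly 80 mb (an assumed value, not stated on the slide):

```python
# Rough pile-up estimate from the quantities on the slide.
LUMINOSITY = 1e34              # cm^-2 s^-1 (design)
SIGMA_INELASTIC_CM2 = 80e-27   # ~80 mb at 14 TeV, assumed
CROSSING_RATE_HZ = 40e6        # bunch crossings every 25 ns

interactions_per_second = LUMINOSITY * SIGMA_INELASTIC_CM2
pileup = interactions_per_second / CROSSING_RATE_HZ

print(f"pp interactions per second: {interactions_per_second:.1e}")   # ~8.0e+08
print(f"average overlapping events per crossing: ~{pileup:.0f}")      # ~20
```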
The Data Acquisition
Tier 0 at CERN: acquisition, first-pass reconstruction, storage & distribution
1.25 GB/sec (ions)
Tier 0 – Tier 1 – Tier 2
• Tier-0 (CERN): data recording, first-pass reconstruction, data distribution
• Tier-1 (11 centres): permanent storage, re-processing, analysis
• Tier-2 (>200 centres): simulation, end-user analysis
(The division of labour is summarised in the sketch below.)
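For reference, the tier model on this slide can be written down as a small data structure; the site counts are the ones quoted above, everything else is a paraphrase.

```python
# Summary of the tier roles as described on the slide (counts from the slide).
TIER_MODEL = {
    "Tier-0 (CERN)":         ["data recording", "first-pass reconstruction",
                              "data distribution"],
    "Tier-1 (11 centres)":   ["permanent storage", "re-processing", "analysis"],
    "Tier-2 (>200 centres)": ["simulation", "end-user analysis"],
}

for tier, roles in TIER_MODEL.items():
    print(f"{tier}: {', '.join(roles)}")
```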
Evolution of requirements
ATLAS (or CMS) requirements for the first year at design luminosity.
Timeline (from the original chart): LHC approved → ATLAS & CMS CTP → "Hoffmann" Review → Computing TDRs → LHC start, with the LHCb, ATLAS & CMS and ALICE approvals along the way.
The estimates grew from ~10⁷ MIPS and 100 TB of disk, to 7×10⁷ MIPS and 1,900 TB of disk, to 55×10⁷ MIPS (≈140 MSi2K) and 70,000 TB of disk.
Evolution of CPU capacity at CERN
Tape & disk requirements: >10 times what CERN can provide.
Costs (2007 Swiss Francs) include infrastructure (computer centre, power, cooling, …) and physics tapes.
(The original chart spans CERN's machines: SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV).)
Evolution of Grids
US line: GriPhyN, iVDGL, PPDG → Grid3 → OSG
EU line: EU DataGrid → EGEE 1 → EGEE 2 → EGEE 3
LCG line: LCG 1 → LCG 2 → WLCG
Milestones along the way: Data Challenges, Service Challenges, Cosmics, First physics
The Worldwide LHC Computing Grid
Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service and common application libraries and tools.
• Phase I (2002–05): development & planning
• Phase II (2006–2008): deployment & commissioning of the initial services
WLCG Collaboration
• The Collaboration: 4 LHC experiments; ~250 computing centres; 12 large centres (Tier-0, Tier-1); 56 federations of smaller "Tier-2" centres; growing to ~40 countries; grids: EGEE, OSG, NorduGrid
• Technical Design Reports: WLCG and the 4 experiments, June 2005
• Memorandum of Understanding: agreed in October 2005
• Resources: 5-year forward look
MoU signing status – Tier 1: all have now signed. Tier 2 signatories: Australia, Belgium, Canada*, China, Czech Rep.*, Denmark, Estonia, Finland, France, Germany(*), Hungary*, Italy, India, Israel, Japan, JINR, Korea, Netherlands, Norway*, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden*, Switzerland, Taipei, Turkey*, UK, Ukraine, USA. Still to sign: Austria; Brazil (under discussion). (* recent additions)
WLCG Service Hierarchy
• Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data to the Tier-1 centres
• Tier-1 – "online" to the data acquisition process, high availability: managed mass storage (grid-enabled data service); data-heavy analysis; national and regional support
• Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – Fermilab (Illinois) and Brookhaven (NY)
• Tier-2 – ~130 centres in ~35 countries: end-user (physicist, research group) analysis – where the discoveries are made; simulation
Recent grid use (across all grid infrastructures: EGEE, OSG, NorduGrid)
CERN: 11%, Tier 1: 35%, Tier 2: 54%
The grid concept really works – all contributions, large & small, are essential!
Recent grid activity
• WLCG ran ~44 M jobs in 2007 – the workload has continued to increase
• 29 M jobs so far in 2008 – now at >300k jobs/day (chart peaks marked at 230k/day and 300k/day; cross-checked in the sketch below)
• The distribution of work across Tier-0/Tier-1/Tier-2 really illustrates the importance of the grid system
• The Tier-2 contribution is around 50%; >85% of the work is external to CERN
• These workloads (reported across all WLCG centres) are at the level anticipated for 2008 data taking
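As a rough cross-check of the quoted rates (a sketch; the number of elapsed days in 2008 is an assumption based on the date of this talk):

```python
# Cross-check of the job rates quoted above.
jobs_2007 = 44e6
print(f"2007 average: ~{jobs_2007 / 365 / 1e3:.0f}k jobs/day")   # ~121k jobs/day

jobs_2008_so_far = 29e6
days_elapsed = 133   # assumed: 1 January to mid-May 2008
print(f"2008 average so far: ~{jobs_2008_so_far / days_elapsed / 1e3:.0f}k jobs/day")   # ~218k jobs/day
```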
LHCOPN Architecture
Data transfer out of Tier-0 – target for 2008/2009: 1.3 GB/s
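To put the export target in perspective, a one-line conversion (a sketch; sustained 24-hour operation is assumed):

```python
# Daily volume implied by the 1.3 GB/s Tier-0 export target.
target_gb_per_s = 1.3
tb_per_day = target_gb_per_s * 24 * 3600 / 1000   # GB -> TB
print(f"~{tb_per_day:.0f} TB exported from Tier-0 per day")   # ~112 TB/day
```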
Production Grids
• WLCG relies on a production-quality infrastructure, which requires standards of availability/reliability, performance and manageability
• It will be used 365 days a year ... (and has been for several years!)
• Tier 1s must store the data for at least the lifetime of the LHC – ~20 years – and not passively: this requires active migration to newer media
• It is vital that we build a fault-tolerant and reliable system that can deal with individual sites being down and recover (see the sketch below)
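The kind of fault tolerance meant here is illustrated below. This is a purely conceptual sketch, not the real WLCG middleware API, and the file and site names are made up: a client falls back to another replica site when one is down.

```python
# Conceptual sketch (not the real WLCG middleware API) of tolerating a site
# outage: try each site that holds a replica until one responds.
import random

def site_is_up(site):
    """Stand-in for a real availability probe; each site is 'up' 80% of the time."""
    return random.random() > 0.2

def fetch_replica(filename, replica_sites):
    """Return a description of where the file was fetched from."""
    for site in replica_sites:
        if site_is_up(site):
            return f"{filename} fetched from {site}"
    raise RuntimeError(f"no replica of {filename} currently reachable")

print(fetch_replica("raw_run1234.dat", ["Tier1-A", "Tier1-B", "Tier1-C"]))
```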
The EGEE Production Infrastructure
• Test-beds & services: production service; pre-production service; certification test-beds (SA3)
• Operations: Operations Coordination Centre; Regional Operations Centres; EGEE Network Operations Centre (SA2); Operations Advisory Group (+NA4)
• Support structures & processes: Global Grid User Support; training activities (NA3); training infrastructure (NA4)
• Security & policy groups: Operational Security Coordination Team; Joint Security Policy Group; EuGridPMA (& IGTF); Grid Security Vulnerability Group
Site Reliability
Improving Reliability
• Monitoring
• Metrics (an illustrative availability/reliability calculation is sketched below)
• Workshops
• Data challenges
• Experience
• Systematic problem analysis
• Priority from software developers
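For the metrics point above, a minimal sketch of how site availability and reliability can be computed from monitoring results. The convention used here (reliability discounts scheduled downtime, availability does not) is a common one and is an assumption, not the exact WLCG definition.

```python
# Illustrative site availability/reliability metrics (assumed convention).

def availability(up_hours, total_hours):
    return up_hours / total_hours

def reliability(up_hours, total_hours, scheduled_downtime_hours):
    return up_hours / (total_hours - scheduled_downtime_hours)

total = 30 * 24               # one month of wall-clock hours
scheduled = 12                # announced maintenance
up = total - scheduled - 20   # plus 20 hours of unscheduled failures

print(f"availability: {availability(up, total):.1%}")            # ~95.6%
print(f"reliability:  {reliability(up, total, scheduled):.1%}")  # ~97.2%
```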
Middleware: Baseline Services
The basic baseline services – from the TDR (2005):
• Storage Element: Castor, dCache, DPM (StoRM added in 2007); SRM 2.2 deployed in production in Dec 2007
• Basic transfer tools: GridFTP, …
• File Transfer Service (FTS)
• LCG File Catalog (LFC)
• LCG data management tools (lcg-utils)
• Posix I/O: Grid File Access Library (GFAL)
• Synchronised databases Tier-0 → Tier-1s: 3D project
• Information System: scalability improvements
• Compute Elements: Globus/Condor-C, improvements to LCG-CE for scale/reliability; web services (CREAM); support for multi-user pilot jobs (glexec, SCAS) – see the conceptual sketch below
• gLite Workload Management: in production
• VO Management System (VOMS)
• VO Boxes
• Application software installation
• Job monitoring tools
Focus is now on the continuing evolution of reliability, performance, functionality and requirements. For a production grid the middleware must allow us to build fault-tolerant and scalable services: this is more important than sophisticated functionality.
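The multi-user pilot-job item deserves a word of explanation. The sketch below is purely conceptual (it is not any experiment's actual framework, and all names in it are made up): a generic pilot lands on a worker node and then pulls real user payloads from a central queue, so the binding of work to resources happens late. In production, switching identity between users inside one pilot is what glexec/SCAS are for.

```python
# Conceptual illustration of the pilot-job model (not a real framework).
import queue

task_queue = queue.Queue()
for user, payload in [("alice_user", "analysis job 1"),
                      ("bob_user", "analysis job 2")]:
    task_queue.put((user, payload))

def run_pilot(node_name):
    """Pilot loop: claim payloads from the central queue until it is empty."""
    while not task_queue.empty():
        user, payload = task_queue.get()
        # In production, identity switching between users inside a pilot is
        # handled by tools like glexec (mentioned on the slide); here we just print.
        print(f"[{node_name}] running '{payload}' on behalf of {user}")

run_pilot("worker-node-42")
```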
Database replication
• In full production; several GB/day of user data can be sustained to all Tier 1s
• ~100 DB nodes at CERN and several tens of nodes at Tier 1 sites – a very large distributed database deployment
• Used for several applications: experiment calibration data; replicating (central, read-only) file catalogues
LCG depends on two major science grid infrastructures:
• EGEE – Enabling Grids for E-sciencE
• OSG – US Open Science Grid
Interoperability & interoperation are vital; significant effort has gone into building the procedures to support them.
EGEE: grid infrastructure project co-funded by the European Commission – now in its 2nd phase, with 91 partners in 32 countries.
Scale: 240 sites in 45 countries; 45,000 CPUs; 12 PetaBytes; >5,000 users; >100 VOs; >100,000 jobs/day.
Application domains: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, …
EGEE: increasing workloads – about ⅓ of the work is non-LHC
Grid Applications: seismology, medical, chemistry, astronomy, particle physics, fusion
Share of EGEE resources – HEP, 5/07–4/08: 45 million jobs
HEP use of EGEE: May 07 – Apr 08
Sustainability: beyond EGEE-II
• Need to prepare a permanent, common grid infrastructure
• Ensure the long-term sustainability of the European e-infrastructure, independent of short project funding cycles
• Coordinate the integration of, and interaction between, National Grid Infrastructures (NGIs)
• Operate the European level of the production grid infrastructure for a wide range of scientific disciplines, linking the NGIs
EGI – European Grid Initiative (www.eu-egi.org)
• EGI Design Study: proposal to the European Commission (started Sept 07)
• Supported by 37 National Grid Initiatives (NGIs)
• A 2-year project to prepare the setup and operation of a new organisational model for a sustainable pan-European grid infrastructure after the end of EGEE-3
Summary
We have an operating, production-quality grid infrastructure that:
• is in continuous use by all 4 experiments (and many other applications);
• is still growing in size – sites and resources (and still has to finish the ramp-up for LHC start-up);
• demonstrates interoperability (and interoperation!) between 3 different grid infrastructures (EGEE, OSG, NorduGrid);
• is becoming more and more reliable;
• is ready for LHC start-up.
For the future we must:
• learn how to reduce the effort required for operation;
• tackle upcoming infrastructure issues (e.g. power, cooling);
• manage the migration of the underlying infrastructures to longer-term models;
• be ready to adapt the WLCG service to new ways of doing distributed computing.