LHC Computing Grid: CCIN2P3 role and Contribution KISTI-CCIN2P3 Workshop Ghita Rahal KISTI, December 1st, 2008
Index • LHC computing grid • LCG France • LCG at CCIN2P3 • Infrastructure Validation: An example with Alice • General issues • Conclusions Credits to Fabio Hernandez (CC), Latchezar Betev (Alice)
Worldwide LCG Collaboration • LHC Computing Grid • Purpose: develop, build and maintain a distributed computing environment for the storage and processing of data for the 4 LHC experiments • Ensure the computing service and the application libraries and tools common to the 4 experiments • Resources contributed by the countries participating in the experiments • Commitments made each October of year N for year N+1 • Planning looks 5 years forward
LHC Data Flow • Raw data generated by the detectors needs to be permanently stored • These figures include neither derived nor simulated data • Accelerator duty cycle: 14 hours/day, 200 days/year • 7 PB of additional raw data per nominal year
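For orientation, a minimal back-of-the-envelope sketch of what these figures imply (the inputs are the slide's numbers; the derived rate is not from the source):

```python
# Average aggregate raw-data rate implied by ~7 PB of new raw data per
# nominal year and the quoted accelerator duty cycle.
HOURS_PER_DAY = 14        # accelerator duty cycle (from the slide)
DAYS_PER_YEAR = 200       # nominal data-taking days per year (from the slide)
RAW_PB_PER_YEAR = 7       # additional raw data per nominal year (from the slide)

seconds_of_data_taking = HOURS_PER_DAY * DAYS_PER_YEAR * 3600
avg_rate_mb_s = RAW_PB_PER_YEAR * 1e15 / seconds_of_data_taking / 1e6
print(f"~{avg_rate_mb_s:.0f} MB/s aggregate while taking data")
# -> roughly 700 MB/s summed over the four experiments
```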
Processing power for LHC data • Computing resource requirements for all LHC experiments in 2009 • About 28,000 quad-core Intel Xeon 2.33 GHz (Clovertown) CPUs (14,000 compute nodes) • More than 73,000 1 TB disk spindles • … and 5 MW of electrical power!!! Source: WLCG Revised Computing Capacity Requirements, Oct. 2007
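Rough totals implied by those requirement figures (input numbers from the slide; the totals are derived here purely for illustration):

```python
# Totals implied by the 2009 requirement figures quoted above.
cpus = 28_000             # quad-core Xeon 2.33 GHz (Clovertown)
nodes = 14_000            # compute nodes, i.e. two CPU sockets per node
disks = 73_000            # 1 TB disk spindles

print(f"{cpus * 4:,} cores on {nodes:,} nodes ({cpus // nodes} sockets/node)")
print(f"~{disks / 1000:.0f} PB of raw disk capacity before any redundancy")
```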
WLCG Architecture (cont.) • Resource location per tier level • Significant fraction of the resources distributed over 130+ centres
Tier-1 centres Source: WLCG Memorandum of Understanding – 2007/12/07
LCG-France project • Goal • Set up, develop and maintain a WLCG Tier-1 and an Analysis Facility at CC-IN2P3 • Promote the creation of French Tier-2/Tier-3 sites and coordinate their integration into the WLCG collaboration • Funding • National funding for the Tier-1 and Analysis Facility • Tier-2s and Tier-3s funded by universities, local/regional governments, hosting laboratories, … • Schedule • Started in June 2004 • 2004-2008: setup and ramp-up phase • 2009 onwards: cruise phase • Equipment budget for Tier-1 and Analysis Facility • 2005-2012: 32 M€
LCG-France (map of sites; source: http://lcg.in2p3.fr) • CC-IN2P3 (Lyon): Tier-1 & analysis facility • GRIF (Île-de-France): Tier-2, comprising APC, CEA/DSM/IRFU, IPNO, LAL, LLR and LPNHE • Subatech (Nantes): Tier-2 • LAPP (Annecy): Tier-2 • LPC (Clermont-Ferrand): Tier-2 • IPHC (Strasbourg): Tier-3 • IPNL (Lyon): Tier-3 • LPSC (Grenoble): Tier-3 • CPPM (Marseille): Tier-3
Associated Tier-2s (linked to CC-IN2P3, Lyon) • Belgium CMS Tier-2s • Romanian ATLAS federation • IHEP ATLAS/CMS Tier-2 in Beijing • ICEPP ATLAS Tier-2 in Tokyo
LCG-France sites • Most sites serve the needs of more than one experiment and group of users
Tier-2s planned contribution (plot; LCG-France target shown for comparison)
Connectivity • Excellent connectivity to other national and international institutions provided by RENATER • The role of the national academic & research network is instrumental for the effective deployment of the grid infrastructure • (Map of the RENATER backbone: dark fibre, 2.5 Gbit/s and 1 Gbit/s (GE) links, reaching e.g. Genève (CERN), Kehl, Le Mans, Angers, Tours, Cadarache. Source: Frank Simon, RENATER)
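A quick sketch of what those link speeds mean in transfer terms (conversions only; protocol overhead and link sharing are ignored):

```python
# Convert the line rates in the map legend to sustainable transfer rates.
def gbit_to_mb_per_s(gbit: float) -> float:
    """Line rate in Gbit/s -> MB/s, with 1 MB = 1e6 bytes."""
    return gbit * 1e9 / 8 / 1e6

for name, gbit in [("1 Gbit/s (GE) link", 1.0), ("2.5 Gbit/s link", 2.5)]:
    print(f"{name}: up to ~{gbit_to_mb_per_s(gbit):.0f} MB/s")
# A single GE link (~125 MB/s) could not sustain the 160 MB/s
# Tier-0 -> CC-IN2P3 transfer goal quoted later in this talk.
```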
LCG-France tier-1 & AF • Roughly equivalent to 305 Thumpers (with 1 TB disks) or 34 racks
CPU ongoing activity at CC (plots of ATLAS, CMS, Alice and LHCb activity over 2007-2008; note: the scale is not the same on all plots)
Resource deployment (plot; × 6.7 increase over the period shown)
LCG tier-1: availability & reliability • Scheduled shutdown of services on: 18/09/2007, 03/11/2007, 11/03/2008 • Source: WLCG T0 & T1 Site Reliability Reports
LCG tier-1: availability & reliability (cont.) Source: WLCG T0 & T1 Site Reliability Reports
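A minimal sketch of the availability vs. reliability distinction behind these reports, assuming the usual WLCG convention that reliability discounts scheduled downtime while availability does not:

```python
# Availability vs. reliability (assumed WLCG convention: reliability
# excludes scheduled downtime, availability does not).
def availability(up_hours: float, total_hours: float) -> float:
    return up_hours / total_hours

def reliability(up_hours: float, total_hours: float, scheduled_down_hours: float) -> float:
    return up_hours / (total_hours - scheduled_down_hours)

# Hypothetical month: 720 h total, 24 h scheduled maintenance, 12 h unscheduled outage
up = 720 - 24 - 12
print(f"availability: {availability(up, 720):.1%}")    # 95.0%
print(f"reliability:  {reliability(up, 720, 24):.1%}")  # ~98.3%
```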
Validation program: Goal • Registration of data at T0 and on the Grid • T0→T1 replication • Condition data on the Grid • Quasi-online reconstruction • Pass 1 at T0 • Reprocessing at T1 • Replication of ESD: T1→T2/CAFs • Quality control • MC production and user analysis at T2/CAFs
Data flow and rates • First part: ½ nominal acquisition rate for p+p (DAQ) + nominal rate for distribution • DAQ → CASTOR2 (rfcp); CASTOR2 → T1 storage (FTS/GridFTP, 60 MB/s); CASTOR2 → CAF and reco@T0 (xrootd, average 60 MB/s, peak 3 GB/s) • Source: L. Betev
CCRC08, 15 February – 10 March • Tests with half the DAQ-to-CASTOR rates • 82 TB total with 90K files (0.9 GB/file) • 70% of the nominal monthly p+p volume
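A quick arithmetic check of these figures (the 24-day duration is an approximation of 15 February – 10 March):

```python
# Average throughput and file size implied by the CCRC08 phase-1 figures above.
total_tb = 82
n_files = 90_000
days = 24                          # ~15 February to 10 March

avg_mb_s = total_tb * 1e12 / (days * 86_400) / 1e6
print(f"~{avg_mb_s:.0f} MB/s sustained over the exercise")   # ~40 MB/s
print(f"~{total_tb * 1e6 / n_files / 1e3:.1f} GB per file")  # ~0.9 GB, as quoted
```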
T0→T1 replication (plot; end of data taking marked; expected rate: 60 MB/s)
T0→T1 replication, all experiments (plot covering CCRC phase 1 and CCRC phase 2)
T0 → CC-IN2P3 (plot; end of data taking marked; tests before Run III in May)
T0 → CC-IN2P3, all experiments • Goal: 160 MB/s or 14 TB/day • Note: the expected rates are still unknown for some experiments (and keep changing). This is the goal according to the Megatable, which is the reference document (even if it is no longer maintained)
Alice CCRC08: May period • Detector activities • Alice offline upgrades • New VO-box installation • New AliEn version • Tuning of reconstruction software • Exercise of ‘fast lane’ calib/alignment procedure… • Data replication • T0→T1 scheduled according to Alice shares
May: All 4 experiments concurrently • Tier-0 → CCIN2P3 • Goal: 160 MB/s or 14 TB/day • Note: the expected rates are still unknown for some experiments (and keep changing). This is the goal according to the Megatable, which is the reference document (even if it is no longer maintained)
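The two numbers in the goal are mutually consistent, as a one-line check shows:

```python
# Consistency check: 160 MB/s sustained for a full day is about 14 TB.
rate_mb_s = 160
print(f"{rate_mb_s} MB/s -> {rate_mb_s * 1e6 * 86_400 / 1e12:.1f} TB/day")  # ~13.8 TB/day
```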
Post-mortem of CCRC08 • Reliable central data distribution • High CC-IN2P3 efficiency/stability (dCache, FTS, …) • Good, high performance of the French Tier-2s • Demonstrated large safety margins for transfers between T1 and T2s
High priority: Analysis Farm 1/2 • Time to concentrate on user analysis: • Must take place in parallel with other tasks • Unscheduled burst access to the data • Users expect fast return of their output • Interactivity… • Ongoing activity at CC: • Identify the needs • Set up a common infrastructure for the 4 LHC experiments
High priority: Analysis Farm 2/2 • Ongoing activity at CC (cont’d): • Goal: prototype to be tested at the beginning of 2009 • Alice specifics: • Farm design already being tested at CERN. Expect to deploy one in France according to the same specs, but shareable with the other experiments
General issues for CC-IN2P3 • Improve each component: • Storage: higher performance for HPSS and improved interaction with dCache • Increase the level of redundancy of the services to reduce human intervention (VO-boxes, LFC, …) • Monitoring, monitoring, monitoring… • Manpower: need to reach a higher level of staffing, mainly for storage
Conclusion • The 2008 challenge has shown the capability of LCG-France to meet the challenges of LHC computing • It has also shown the need for permanent background testing and monitoring of the worldwide platform • The level of reliability of the storage and data-distribution components still needs to improve
ALICE Computing Model • p-p: • Quasi-online data distribution and first reconstruction at T0 • Further reconstruction at Tier-1s • A-A: • Calibration, alignment and pilot reconstruction during data taking • Data distribution and first reconstruction at T0 • One copy of RAW at T0 and one spread among the Tier-1s
ALICE Computing Model • T0: • First-pass reconstruction, storage of one copy of RAW • Calibration and first-pass ESD • T1: • Storage of a share of RAW, ESDs and AODs on disk • Reconstruction passes • Scheduled analysis • T2: • Simulation • End-user analysis • Copy of ESD and AOD
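The tier roles above can be summarised in a small data structure (a sketch that simply restates the slide content; the mapping name is illustrative):

```python
# ALICE computing model summarised as a plain mapping (content from the slides).
ALICE_TIER_ROLES = {
    "T0": ["first-pass reconstruction", "storage of one copy of RAW",
           "calibration and first-pass ESD"],
    "T1": ["storage of a share of RAW, ESDs and AODs on disk",
           "reconstruction passes", "scheduled analysis"],
    "T2": ["simulation", "end-user analysis", "copy of ESD and AOD"],
}

for tier, roles in ALICE_TIER_ROLES.items():
    print(f"{tier}: " + "; ".join(roles))
```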