270 likes | 374 Views
LHCb report to LHCC and C-RSG. Philippe Charpentier CERN on behalf of LHCb. Activities in 2009-Q3/Q4. Core Software Stable versions of Gaudi and LCG- AA Applications Stable as of September for real data Fast minor releases to cope with reality of life … Monte-Carlo
E N D
LHCb report toLHCC and C-RSG Philippe Charpentier CERN on behalf of LHCb
Activities in 2009-Q3/Q4 • Core Software • Stable versions of Gaudi and LCG-AA • Applications • Stable as of September for real data • Fast minor releases to cope with reality of life… • Monte-Carlo • Intensive MC09 simulation (@ 5TeV) • Minimum bias • b- and c- inclusive • b signal channels • Few events in foreseen 2009 configuration (450 GeV) • MC09 stripping (2 passes) • Trigger stripping • Physics stripping • Real data reconstruction and stripping • As of November 20th … LHCb to LHCC and C-RSG review, PhC
Resource usage LHCb to LHCC and C-RSG review, PhC
139 sites hit, 4.2 million jobs • Start in June: start of MC09 LHCb to LHCC and C-RSG review, PhC
Job failure: 15% (17% at Tier1s) LHCb to LHCC and C-RSG review, PhC
Failure breakdown LHCb to LHCC and C-RSG review, PhC
Production and user jobs LHCb to LHCC and C-RSG review, PhC
Jobs at Tier1s LHCb to LHCC and C-RSG review, PhC
Job types at Tier1s LHCb to LHCC and C-RSG review, PhC
CPU used (not normalised) • Average job duration • 5.6 hours for all jobs • 20 mn for user jobs (20%) • 6.6 hours for production jobs LHCb to LHCC and C-RSG review, PhC
Average job duration • 5.6 hours for all jobs • 20 mn for user jobs • 6.6 hours for production jobs LHCb to LHCC and C-RSG review, PhC
CPU usage (not normalised) LHCb to LHCC and C-RSG review, PhC
WLCG vs LHCb accounting (unnormalised) • 13% more in WLCG than in DIRAC (unnormalised) • 1.26 Mdaysvs 1.1 Mdays • Overhead of non reporting jobs + pilot/LCG/batch frameworks • Average CPU power: 1.5 kSI2k (from WLCG accounting) LHCb to LHCC and C-RSG review, PhC
Normalised CPU usage in 2009 • Ramping up of pilot role in summer • Resource usage decreased since LHC restarted • Concentrate on (few) real data • Wait for data analysis for continuing MC simulation • Group 1: production • Group 2: pilot • Group 3 & 4: user • Group 5: lcgadmin LHCb to LHCC and C-RSG review, PhC
Resource usage • Note: CERN above does not include non-Grid usage • From WLCG accounting: 32% is non-Grid at CERN • CERN number should then read: 2.18 kHS06.years • CPU usage within 10% of requests • Distribution not exactly like expected • More non-Tier1 resources available • Less MC ran at CERN + Tier1s • Almost no real data: less resources used at CERN • CAF not used as much as expected LHCb to LHCC and C-RSG review, PhC
Storage usage • *) From Castor queries today • **) From WLCG accounting end December • ***) Including 420 TB for T1D0 cache • Sites provided slightly more than the pledges • Thanks! • At CERN, some disk pools (default, T1D0) were not included in the requests but are in the accounting LHCb to LHCC and C-RSG review, PhC
Experience with real data LHCb to LHCC and C-RSG review, PhC
First experience with real data • Very low crossing rate • Maximum 8 bunches colliding (88 kHz crossing) • Very low luminosity • Minimum bias trigger rate: from 0.1 to 10 Hz • Data taken with single beam and with collisions No zero-suppression in VELO Otherwise ~25 GB only! LHCb to LHCC and C-RSG review, PhC
Real data processing • Iterative process • Small changes in reconstruction application • Improved alignment • In total 7 sets of processing conditions • Only last files were all processed4 times now (twice in 2010) • Processing submission • Automatic job creation and submission after: • File is successfully migrated in Castor • File is successfully replicated at Tier1 • If job fails for a reason other than application crash • The file is reset as “to be processed” • New job is created / submitted (automatic) • Processing more efficient at CERN (see later) • Eventually after few trials at Tier1, the file is processed at CERN • No stripping ;-) • DST files distributed to all Tier1s for analysis LHCb to LHCC and C-RSG review, PhC
Reconstruction jobs LHCb to LHCC and C-RSG review, PhC
Issues with real data • Castor migration • Very low rate: had to change the migration algorithm for more frequent migration (1 hour instead of 8 hours) • Issue with large files (above 2 GB) • Real data files are not ROOT files but open by ROOT • There was an issue with a compatibility library for slc4-32 bit on slc5 nodes • Fixed within a day • Wrong magnetic field sign • Due to different coordinate systems for LHCb and LHC ;-) • Fixed within hours • Data access problem (by protocol, directly from server) • Still dCache issue at IN2P3 and NIKHEF • dCache experts working on it • Moved to copy mode paradigm for reconstruction • Still a problem for user jobs: a pain! • Sites are regularly banned for analysis LHCb to LHCC and C-RSG review, PhC
Transfers and job latency • No problem observed during file transfers • Files randomly distributed to Tier1 • Will move to distribution by runs (few 100’s files) • For 2009, runs were never longer than 4-5 files! • Max file size set to 3 GB • Very good Grid latency • Time between submission and jobs starting running LHCb to LHCC and C-RSG review, PhC
Resource requests LHCb to LHCC and C-RSG review, PhC
Resource requests for 2010-12 • 2010 running • The requests were made in April-June 2009 • No additional resources expected • Try to fit within those requests • Running scenario for LHCb • March: 35% LHC efficiency @ 100 Hz • April-May-June: 50% LHC efficiency @ 1 kHz in average • July-August-September-half October: 50% @ 2 kHz • no Heavy Ion run for LHCb • This corresponds to 6.1 106 seconds @ 2 kHz • The 2009-10 request accounted precisely by chance for 6.1 106 seconds (0.5+5.6) • Therefore we use 6.1 106 seconds for 2010 at 2 kHz trigger rate • 2011 running • Use the recommendation of MB • March: 35% LHC efficiency @ 2 kHz • April to mid-October: 50% LHC efficiency @ 2 kHz • Total running time: 8.9 106 seconds • 2012: no run LHCb to LHCC and C-RSG review, PhC
Resource requirements for 2010-12 LHCb to LHCC and C-RSG review, PhC
Comments on resources • Very uncertain and fluctuating running plans! • Depending on LHC running, MC requests may be different • Minimum bias, charm physics, b physics… • Only after one year (at least) experience we can see how running analysis on the Grid works • Analysis at CERN? • Analysis at Tier3s? • Reliability for analysis? • 2012 is still very uncertain • No LHC running • Will the MC requests be the same as previous years • How many reprocessings? • Currently assume 1 full reprocessing of 2010 and 2 of 2011 LHCb to LHCC and C-RSG review, PhC
Conclusions • Real data in 2009 • So few that it didn’t impact resource usage • Was extremely valuable for • Setting procedures • Start understanding the detector • Already very promising performance after a few days • Π0 peak, Λ and K0 reconstruction… • Exercising automatic processes • 2010 • Still expect somewhat chaotic running • Frequent changes in LHC settings, LHCb trigger commissioning • No change in LHCb resource requests w.r.t. June 2009 • 2011 • More precise requests with experience from 2010 • 2012 • Still very preliminary, but small increase only compared to 2011 LHCb to LHCC and C-RSG review, PhC