Matthias Kasemann, CERN/DESY
The CMS Computing System: getting ready for Data Analysis
CMS achievements 2006
• Magnet & Cosmics Test (August 06)
• Detector Lowering (January 07)
CMS achievements 2006: Physics TDRs
• Feb 2006: Volume I of the P-TDR describes detector performance and software.
• Jun 2006: Volume II describes the physics performance.
• The two volumes are the culmination of our plans for data analysis in CMS with up to 30 fb⁻¹ of data.
• The special study of detector commissioning and data analysis during the startup of CMS has been deferred to 2007.
• This activity mobilized hundreds of collaborators during the past two years, and many useful lessons have been learned.
CMS: Computing highlights 2006 • Main computing/software milestones: • Magnet Test Cosmic challenge (Apr 06) • Computing Software and Analysis Challenge 06 (Nov 06) • 2006: a year of fundamental software changes • New simulation and reconstruction software packages released • Very positive feedback from users • Developed procedures for release integration, building and distribution. • Control release tools, Hypernews, Nightly builds, Tag collector, WorkBook,… • Design control of all interfaces and data formats in place • CMSSW framework, framework-light, ROOT available for data access • Integration with CMS detector and commissioning activities • Strong connections with various detector groups – key for commissioning • Validation software packages and validation procedure in place – crucial for startup preparation
Major Milestone in 2006: CSA06
• Combined Computing, Software, and Analysis challenge (CSA06)
• A "25% of 2008" data challenge of the CMS data-handling model and computing operations
• Integrated test of the full end-to-end chain of the complete system, from (simulated) raw data to analysis at Tier-1 and Tier-2 centers
• Launched on Oct 2, 2006, after many months of preparation and the development of about 0.5M lines of software in the new CMSSW framework
• Six weeks later, all technical goals of the challenge had been achieved; the code ran with a negligible crash rate and without any memory problems on all samples
• By the end of CSA06, the Tier-0 centre had reconstructed > 200M events, and > 1 Petabyte of data had been shipped across the network between Tier-0, Tier-1, and Tier-2 centers
• Excellent collaboration with the IT department was an important factor in the success of the challenge
• World-wide distributed system of regional Tier-1 and Tier-2 centers
CSA06: T0 Goals & Achievements
• Prompt reconstruction at 40 Hz
  • 50 Hz for 2 weeks, then 100 Hz
  • Peak rate: > 300 Hz for > 10 hours
  • 207M events total
• Uptime goal: 80% over the best 2 weeks
  • Achieved 100% over 4 weeks
• Use of Frontier for DB access to prompt-reconstruction conditions (sketched below)
  • The CSA challenge was the first opportunity to test this on a large scale with the developed reconstruction software
  • Initial difficulties were encountered during commissioning, but patches and reduced logging allowed full inclusion into CSA
• CPU use
  • Max CPU efficiency: 96% of 1400 CPUs over ~12 hours
• Explored realistic T0 operations, upgrading and intervening on a running system
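To make the Frontier bullet concrete: Frontier encodes conditions-database queries as HTTP requests so that ordinary web proxies (squid) can cache the results and shield the central database from thousands of concurrent reconstruction jobs. A minimal sketch of that caching pattern, with an in-memory dict standing in for the proxy cache; the URL and function names are illustrative, not the real CMS endpoints:

```python
# Sketch of the Frontier idea: conditions-DB queries become HTTP GETs so
# that web proxies (squid) can cache the results, shielding the central
# database. URL and names are illustrative, not real CMS endpoints.
import hashlib
import urllib.parse
import urllib.request

_proxy_cache = {}  # stands in for the squid proxy cache

def get_conditions(query: str,
                   frontier_url: str = "http://frontier.example.org/query") -> bytes:
    """Fetch a conditions payload, hitting the 'proxy cache' first."""
    key = hashlib.sha1(query.encode()).hexdigest()
    if key in _proxy_cache:                    # cache hit: no DB load at all
        return _proxy_cache[key]
    url = f"{frontier_url}?q={urllib.parse.quote(query)}"
    with urllib.request.urlopen(url) as resp:  # cache miss: one DB-backed fetch
        payload = resp.read()
    _proxy_cache[key] = payload
    return payload
```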
CSA06: T0 → T1 Transfers
• Goal was to sustain 150 MB/s to the T1s
  • Twice the expected 40 Hz output rate
• Last week's averages hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. exceeded 2008 levels for ~10 days (with some backlog observed)
[Plot: monthly T0 → T1 transfer rate vs. time, marking the start of signal samples, the target rate, the min-bias-only period at the start, and the T0 rate ramping 54 → 110 → 170 → 160 Hz]
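As a quick cross-check of the numbers above, the 150 MB/s goal being "twice the expected 40 Hz output rate" pins down the implied average event size; this back-solves it from the slide's own figures, with no assumed inputs:

```python
# Back-solve the average event size implied by the CSA06 transfer goal:
# the 150 MB/s target was stated as twice the expected 40 Hz T0 output.
target_mb_s = 150        # CSA06 goal for aggregate T0 -> T1 export
safety_factor = 2        # "twice the expected ... output rate"
event_rate_hz = 40       # nominal T0 output rate
implied_event_size = target_mb_s / safety_factor / event_rate_hz
print(f"implied average event size: {implied_event_size:.2f} MB")  # ~1.9 MB
```

By the same arithmetic, the observed 650 MB/s hourly peak is more than four times the 150 MB/s target, consistent with the "exceeded 2008 levels" remark.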
CSA06: Individual T0 → T1 Performance
Goals vs. achievements:
• 6 of 7 Tier-1s exceeded 90% availability for 30 days
• The U.S. T1 (FNAL) hit 2× its goal
• 5 sites stored data to MSS (tape)
CSA06: Job Execution on the Grid
• > 50K jobs/day submitted on all but one day of the final week
  • > 30K/day robot jobs
  • 90% job completion efficiency
• Robot jobs use the same mechanics as user job submissions via CRAB (see the sketch below)
• Mostly T2 centers, as expected
  • OSG carries a large proportion
• Scaling issues were encountered, but subsequently solved
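The "robot jobs" above are automated probes that exercise the same submission machinery as real user analysis jobs. A toy sketch of such a robot; submit_job and poll_status are hypothetical stand-ins for the real grid calls, and the 90% success rate from CSA06 is wired in as the simulated outcome:

```python
# Toy "job robot": probe jobs through the same path as user analysis
# (CRAB in CMS), tallying completion efficiency. submit_job/poll_status
# are hypothetical stand-ins; 90% success is the simulated outcome.
import random

def submit_job(site: str) -> str:
    """Stand-in for a CRAB-style grid submission; returns a job id."""
    return f"{site}-{random.randrange(1 << 30):08x}"

def poll_status(job_id: str) -> str:
    """Stand-in poll; a real robot would query the grid job status."""
    return "done" if random.random() < 0.90 else "failed"

def run_robot(sites, jobs_per_site=100):
    tally = {"done": 0, "failed": 0}
    for site in sites:
        for _ in range(jobs_per_site):
            tally[poll_status(submit_job(site))] += 1
    total = sum(tally.values())
    print(f"completion efficiency: {tally['done'] / total:.0%} of {total} jobs")

run_robot(["T2_Example_A", "T2_Example_B", "T2_Example_C"])
```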
CSA06: Prompt Tracker Alignment
• Determine the new alignment:
  • Run the "HIP" algorithm on multiple CPUs at CERN over a dedicated alignment skim from the T0 (a simplified toy version follows below)
  • 1 million events in ~4 h on 20 CPUs
• Write the new alignment into the offline DB at the T0 (ORCOFF)
  • Distribute the offline DB to the T1/T2s
[Plot: TIB DS module positions; results 2 days after AlCaReco!]
• Closing the loop: analysis of re-reconstructed Z → μ+μ− data at a T1/T2 site
  • Three scenarios: ideal / misaligned / realigned (grid jobs at T1-PIC)
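For intuition about the iterative idea behind a HIP-style alignment, where each module's correction is derived from the mean residual of the hits crossing it and the procedure repeats, here is a deliberately simplified 1-D toy, not the CMS implementation:

```python
# Simplified 1-D toy of iterative HIP-style alignment: each module's
# correction is the mean hit residual, applied and re-derived over a
# few iterations. Not the CMS implementation.
import random

random.seed(1)
true_shift = {m: random.gauss(0, 0.05) for m in range(10)}  # unknown misalignments (cm)
align = {m: 0.0 for m in range(10)}                         # alignment constants so far

def residual(module: int) -> float:
    """Hit residual = remaining misalignment plus single-hit noise."""
    return true_shift[module] - align[module] + random.gauss(0, 0.01)

for _ in range(5):                          # a few HIP-style iterations
    for m in align:
        hits = [residual(m) for _ in range(1000)]
        align[m] += sum(hits) / len(hits)   # correct by the mean residual

worst = max(abs(true_shift[m] - align[m]) for m in align)
print(f"worst remaining misalignment: {worst * 1e4:.1f} um")  # cm -> um
```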
CSA06: Physics Analysis Demonstrations
• These demonstrations proved to be useful training exercises for collaborators in the new software and computing tools.
• Muon:
  • Extraction of W → μν
  • Di-muon reconstruction efficiency
  • Z, J/ψ → μ+μ−
  • [Plot legend: 2 GLB tracks; 1 GLB + 1 tracker track; 1 GLB + 1 STA track]
  • Northwestern and Purdue groups and T2 activity
• Tau:
  • Selection of Z → ττ → ℓ + jet
  • Tau mis-identification study from Z+jet
  • Tau-tagging efficiency
CSA06 Summary
• All goals were met
  • T0 prompt reconstruction of RECO, AOD and AlCaReco, with Frontier access, @ 100% efficiency for 207M events
  • Export to T1 @ 150 MB/s and higher
  • Data reduction (skim) production performed at the T1s, transferred to T2s
  • Re-reconstruction demonstrated at 6 T1 centers
  • Job load exceeded 50K/day
  • Alignment/calibration/physics analyses widely demonstrated
• CSA06 was a huge enterprise
  • Commissioned the CMS data-handling workflow @ 25% scale
  • Everything worked, down to the final analysis plots
• Many lessons can be drawn for the future as we prepare for data-handling operations, and more things remain to commission
  • DAQ Storage Manager → T0
  • Support of global data-taking during detector commissioning
Some Lessons from CSA06
• CMS needs some development work to ease the operations load
• Strong engagement with OSG, WLCG and the sites was extremely useful
  • Grid service and site problems were addressed promptly
  • FTS at CERN was carefully monitored, with prompt response when needed
  • CASTOR support at CERN was excellent
  • Support from CERN IT was key to the success, and instrumental
• Data management needs an automatic way to ensure consistency across all components (illustrated below)
• Scale testing continues to be an extremely important activity
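As an illustration of the "automatic consistency" bullet above, a checker of this kind would diff the data-management catalogue against what a site's storage actually holds; both listing functions below are hypothetical stand-ins for the real catalogue and storage queries:

```python
# Sketch of an automatic consistency check: diff the data-management
# catalogue against the site's storage and report files that are
# missing or orphaned. Both listing functions are hypothetical.
def catalogue_files(dataset: str) -> set:
    """Stand-in: files the bookkeeping catalogue claims the site has."""
    return {"/store/data/evt_001.root", "/store/data/evt_002.root"}

def storage_files(site: str) -> set:
    """Stand-in: files actually found on the site's storage element."""
    return {"/store/data/evt_001.root", "/store/data/evt_003.root"}

def check_consistency(dataset: str, site: str) -> None:
    expected, found = catalogue_files(dataset), storage_files(site)
    missing = expected - found   # catalogued but absent from storage
    orphans = found - expected   # on storage but unknown to the catalogue
    print(f"{site}: {len(missing)} missing, {len(orphans)} orphan file(s)")
    for f in sorted(missing):
        print(f"  MISSING {f}")
    for f in sorted(orphans):
        print(f"  ORPHAN  {f}")

check_consistency("/ExampleDataset/CSA06/RECO", "T1_Example")
```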
CMS Outlook and Perspectives for 2007
• Lower the entire detector and commission it underground.
• Prepare the final distributed computing and software system and the physics analysis capability.
• The initial* CMS detector will be ready for collisions at 900 GeV at the end of 2007.
• The low-luminosity detector will be ready for collisions at design energy in mid-2008.
• *The initial CMS detector is the low-luminosity detector minus the ECAL endcaps and pixels; both are installed during the 07/08 winter shutdown.
CMS computing goals in 2007
• Demonstrate physics analysis performance using the final software with high statistics.
  • Major MC production of up to 200M events started last week
  • Analysis starts in June, finishes by September
• Regular data taking: Detector → HLT → Tape → T0 → T1
  • At regular intervals, 3–4 days per month, starting in May
  • Month of October: MTCC3, readout of (successively more) components; data will be processed and distributed to the T1s
Computing Commissioning Plans 2007
• February
  • Deploy PhEDEx 2.5
  • T0-T1, T1-T1, T1-T2 independent transfers
  • Restart job robot
  • Start work on SAM
  • FTS full deployment
• March
  • SRM v2.2 tests start
  • T0-T1(tape)-T2 coupled transfers (same data)
  • Measure data serving at sites (esp. T1)
  • Production/analysis share at sites verified
• April
  • Repeat transfer tests with SRM v2.2, FTS v2
  • Scale up job load
  • gLite WMS test completed (synchronized with ATLAS)
• May
  • Start ramping up to CSA07
• July
  • CSA07
[Timeline bar: start of large MC production, Event Filter tests, start of analysis, start of global data-taking runs, preCSA07, CSA07, Global Detector Run, LHC engineering run]
Motivations for CSA07
There are two important goals for 2007, the last year of preparations for physics and analysis:
1) Scaling: we need to reach 100% of system scale and functionality by spring 2008
  • CSA06 demonstrated between 25% and 50%, depending on the metric
2) Transition to sustainable operations, spanning all areas of computing:
  • Data management
  • Job processing
  • User support
  • Site configuration and consistency
In the past, functionality was valued more highly than the operations load
  • As we prepare for long-term support, this emphasis needs to change
CSA07 Goals: Increase Scale
CMS demonstrated 25% performance in 2006. We have two more factors of 2 to ramp up before data taking in 2008 (see the arithmetic below).
• The data transfer between Tier-0 and Tier-1 reached about 50% of scale
  • A very successful test, but some signs of system stress were visible
• The job submission rate reached 25%
We plan another formal challenge in 2007:
• A > 50% challenge in the summer of 2007
• Extend the system to include the HLT farm
• Add elements like simulation production
• Increase the user load
• Run concurrently with other experiments stressing the system
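The scaling argument in numbers, a trivial but explicit restatement of the "two more factors of 2":

```python
# The ramp in numbers: CSA06 ran at roughly 25% of the 2008 scale,
# CSA07 targets >50%, and 100% is needed by spring 2008.
csa06 = 0.25
csa07 = csa06 * 2      # first factor of 2: the summer 2007 challenge
full = csa07 * 2       # second factor of 2: 2008 data taking
print(f"CSA06 {csa06:.0%} -> CSA07 {csa07:.0%} -> 2008 {full:.0%}")
```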
CMS Computing Model & Resources
CMS Tier-1 centers:
CSA07 Workflow
CSA07 success metrics
CSA07 Goals for Tier-1s
In the Computing Model the Tier-1 centers perform 4 functions:
• Archive data, both real data and simulation from the Tier-2 centers
• Execute skimming and selection on the data for users and groups
• Re-reconstruct the raw data
• Serve data samples to the Tier-2 centers for further analysis
As we transition to operations we should bring the Tier-1 centers into alignment with this core functionality.
CSA07: Expectations of Tier-2s
MC production at Tier-2s
• Tier-2s were a significant contributor to the 25M events/month for CSA06
• When the experiment is running, the Tier-2s are the only dedicated simulation resource, and the expectation is 100M events per month (a rough CPU estimate follows below)
• CMS now produces 30M events/month; the goal for CSA07 is 50M
Analysis submission
• The Tier-2s are expected to support communities
  • Either local groups or regions of interest
  • So far implemented in only a couple of specific communities
• Unlike Tier-1 data subscriptions and processing expectations, which are largely specified centrally by the experiment, the Tier-2s have control over the data and the activity
CMS will work to improve the reliability and availability of the Tier-2 centers.
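For a feel of what the 100M events/month target means in hardware, a back-of-envelope CPU estimate; the 60 s/event full-simulation time is an assumed ballpark, not a number from the talk:

```python
# Back-of-envelope CPU for the 100M events/month simulation target.
# The 60 s/event full-simulation time is an assumed ballpark for 2007.
events_per_month = 100e6
sec_per_event = 60                 # assumed average full-simulation time
wall_sec = 30 * 86400              # one month of wall-clock seconds
cores = events_per_month * sec_per_event / wall_sec
print(f"~{cores:,.0f} continuously busy CPU cores across all Tier-2s")
```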
Tier-2 Analysis goals in 2007
Tier-2s are the primary analysis resource controlled by physicists
• The activities are intended to be controlled by user communities
Up to now, most of the analysis has been hosted at the Tier-1 sites
CMS will enlarge analysis support by hosting important physics samples exclusively at Tier-2 centers
• Roughly 10–15 sites have sufficient disk and CPU resources to support multiple datasets (a rough sizing follows below)
  • Skims in CSA06 were ~500 GB each
  • The largest of the raw samples was ~8 TB
• Drive the migration of analysis to the Tier-2s by hosting the data there
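Using the sample sizes quoted above, a rough sizing of what "hosting multiple datasets" means for a Tier-2; the 50 TB of analysis disk is an assumed figure for a well-resourced 2007 site, not from the talk:

```python
# Rough sizing of "hosting multiple datasets" at a Tier-2, using the
# sample sizes quoted above. The 50 TB of analysis disk is assumed.
t2_disk_tb = 50.0      # assumed disk available for hosted physics samples
skim_tb = 0.5          # "skims in CSA06 were ~500 GB"
raw_tb = 8.0           # "the largest of the raw samples was ~8 TB"

print(f"skims only: ~{int(t2_disk_tb / skim_tb)} datasets")
print(f"one raw sample + skims: ~{int((t2_disk_tb - raw_tb) / skim_tb)} skims")
```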
Transition to operations in 2007: Goals
We plan to measure the transition to operations with concrete metrics.
Site availability: SAM tests (Site Availability Monitor)
• Put CMS functions into the site functional testing
  • Analysis submissions
  • Production
  • Frontier
  • Data transfer
• Measure the site availability (rolled up as sketched below)
  • The WLCG goal for the Tier-1s in early 2007 is 90%
  • We should establish a goal for Tier-2s; 80% seems reasonable
  • Goals for summer of '07 would be 95% and 90%, respectively
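A minimal sketch of how SAM-style test results roll up into the availability number compared against those targets; the test names follow the CMS functions listed above, and the hourly pass/fail history is invented for illustration:

```python
# Minimal roll-up of SAM-style test results into a site availability
# number. Test names follow the CMS functions above; data is invented.
TESTS = ("analysis", "production", "frontier", "transfer")

def availability(history):
    """A time bin counts as 'up' only if every critical test passed."""
    up = sum(1 for bin_ in history if all(bin_[t] for t in TESTS))
    return up / len(history)

# 24 hourly bins: 22 fully green, 2 with a failed production test
history = [dict.fromkeys(TESTS, True)] * 22 + [
    {"analysis": True, "production": False, "frontier": True, "transfer": True}
] * 2
print(f"availability: {availability(history):.0%} "
      f"(targets: T1 90%, T2 80%; summer '07: 95% / 90%)")
```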
Prepare CMS for Analysis: Summary
• 2006 was a very successful year for CMS software and computing
• 2007 promises to be a very busy year for Computing and Offline
  • Commissioning and integration remain the major tasks in 2007
  • Balancing the needs of physics, computing and the detector will be a logistics challenge
• The transition to operations has started; a data operations group has been formed
• Facilities will be ramping up resources to be ready for the pilot run and the 2008 physics run
• An increased number of CMS people will be involved in the facilities, commissioning and operations to prepare for CMS analysis