The CMS Computing System: getting ready for Data Analysis

Presentation Transcript


  1. The CMS Computing System: getting ready for Data Analysis
     Matthias Kasemann, CERN/DESY

  2. CMS achievements 2006
     • Magnet & Cosmics Test (August 06)
     • Detector Lowering (January 07)
     ISGC 2007: CMS Computing

  3. CMS achievements 2006: Physics TDRs
     • Feb 2006: Volume I of the P-TDR describes detector performance and software.
     • Jun 2006: Volume II describes the physics performance.
     • The two volumes are the culmination of our plans for data analysis in CMS with up to 30 fb-1 of data.
     • The special study of detector commissioning and data analysis during the startup of CMS has been deferred to 2007.
     • This activity mobilized hundreds of collaborators during the past two years, and many useful lessons have been learned.

  4. CMS: Computing highlights 2006
     • Main computing/software milestones:
        • Magnet Test Cosmic Challenge (Apr 06)
        • Computing, Software and Analysis Challenge 06 (Nov 06)
     • 2006: a year of fundamental software changes
        • New simulation and reconstruction software packages released
        • Very positive feedback from users
     • Developed procedures for release integration, building and distribution
        • Release control tools, HyperNews, nightly builds, Tag Collector, WorkBook, …
     • Design control of all interfaces and data formats in place
        • CMSSW framework, framework-light and ROOT available for data access
     • Integration with CMS detector and commissioning activities
        • Strong connections with various detector groups, key for commissioning
        • Validation software packages and validation procedure in place, crucial for startup preparation

  5. Major Milestone in 2006: CSA06
     • Combined Computing, Software, and Analysis challenge (CSA06)
        • A "25% of 2008" data challenge of the CMS data handling model and computing operations
        • Integrated test of the full end-to-end chain of the complete system, from (simulated) raw data to analysis at Tier-1 and Tier-2 centers
     • Launched on Oct 2, 2006, after many months of preparation and the development of about 0.5M lines of software in the new CMSSW framework
     • Completed 6 weeks later, having achieved all technical goals of the challenge; the code ran with a negligible crash rate and without any memory problems on all samples
     • By the end of CSA06, the Tier-0 center had reconstructed > 200M events, and > 1 petabyte of data had been shipped across the network between Tier-0, Tier-1 and Tier-2 centers
     • Excellent collaboration with the IT department was an important factor in the success of the challenge
     • World-wide distributed system of regional Tier-1 and Tier-2 centers

  6. CSA06: T0 Goals & Achievements
     • Goal: prompt reconstruction at 40 Hz
        Achieved: 50 Hz for 2 weeks, then 100 Hz; peak rate > 300 Hz for > 10 hours; 207M events in total
     • Goal: uptime of 80% over the best 2 weeks
        Achieved: 100% over 4 weeks
     • Goal: use of Frontier for DB access to prompt reconstruction conditions
        Achieved: CSA06 was the first opportunity to test this on a large scale with the newly developed reconstruction software; after initial difficulties during commissioning, patches and reduced logging allowed full inclusion in CSA06
     • Goal: CPU use
        Achieved: max CPU efficiency of 96% of 1400 CPUs over ~12 hours
     • Explored realistic T0 operations, including upgrades and interventions on a running system
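     The rate figures above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only. It assumes a constant rate with no downtime and reads the achieved profile as exactly 2 weeks at 50 Hz followed by 2 weeks at 100 Hz. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def events_processed(rate_hz: float, days: float) -> float:
    """Events reconstructed at a constant rate over `days` (assumes no downtime)."""
    return rate_hz * days * SECONDS_PER_DAY

def busy_cpus(n_cpus: int, efficiency: float) -> float:
    """CPUs effectively busy at the given utilisation fraction."""
    return n_cpus * efficiency

# Assumed reading of the slide: 2 weeks at 50 Hz, then 2 weeks at 100 Hz.
profile = [(50, 14), (100, 14)]
nominal_total = sum(events_processed(rate, days) for rate, days in profile)
goal_total = events_processed(40, 28)  # the 40 Hz goal sustained for 4 weeks

print(f"nominal profile: {nominal_total / 1e6:.1f}M events")      # 181.4M
print(f"40 Hz goal:      {goal_total / 1e6:.1f}M events")         # 96.8M
print(f"busy CPUs at 96% of 1400: {busy_cpus(1400, 0.96):.0f}")  # 1344
```

     The sketch ignores the higher-rate periods reported on the slide, such as the > 300 Hz peak, so its total is lower than the quoted 207M events.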

  7. CSA06: T0T1 Transfers Last week’s averages hit350MB/s (daily) 650MB/s (hourly)i.e. exceeded 2008 levels for ~10 days (with some backlog observed) • Goal was to sustain 150 MB/s to T1s • Twice the expected 40 Hz output rate Monthly T1 Transfer plot signals start Target rate Min bias only @ start T0 rate: 54 110 170 160 Hz ISGC 2007: CMS Computing

  8. CSA06: Individual T0-T1 Performance
     • 6 of 7 Tier-1s exceeded 90% availability for 30 days
     • U.S. T1 (FNAL) hit 2× its goal
     • 5 sites stored data to MSS (tape)

  9. CSA06: Job Execution on the Grid
     • > 50K jobs/day submitted on all but one day in the final week
        • > 30K/day robot jobs
        • 90% job completion efficiency
     • Robot jobs use the same mechanics as user job submissions via CRAB
     • Jobs ran mostly at T2 centers, as expected
        • OSG sites carried a large proportion
     • Scaling issues were encountered, but subsequently solved
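     The slide reports a 90% completion efficiency at more than 50K jobs/day. The Python sketch below assumes efficiency is defined simply as completed jobs over submitted jobs. The outcome labels are hypothetical, and the actual CSA06 accounting may have classified failures differently.

```python
from collections import Counter

def completion_efficiency(outcomes: list[str]) -> float:
    """Fraction of submitted jobs whose outcome is 'completed'."""
    if not outcomes:
        return 0.0
    return Counter(outcomes)["completed"] / len(outcomes)

def expected_completions(jobs_per_day: int, efficiency: float) -> int:
    """Completed jobs per day implied by a submission rate and an efficiency."""
    return round(jobs_per_day * efficiency)

sample = ["completed"] * 9 + ["failed"]   # hypothetical day with 10 jobs
print(completion_efficiency(sample))       # 0.9
print(expected_completions(50_000, 0.9))   # 45000
```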

  10. CSA06: Prompt Tracker Alignment
     • Determine new alignment:
        • Run the "HIP" algorithm on multiple CPUs at CERN over a dedicated alignment skim from the T0
        • 1 million events: ~4 h on 20 CPUs
     • Write the new alignment into the offline DB at the T0 (ORCOFF)
     • Distribute the offline DB to the T1/T2s
     [Figure: TIB DS module positions; results 2 days after AlCaReco!]
     • Closing the loop: analysis of re-reconstructed Z → μ+μ− data at a T1/T2 site, in three scenarios: ideal, misaligned and realigned (grid jobs at T1-PIC)
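     The slide quotes ~4 h on 20 CPUs for 1 million events. For illustration only, assume the cost scales linearly with the number of events and the work parallelizes perfectly. That gives about 80 CPU-hours per million events, and the sketch converts this into the CPU count needed for a given turnaround. HIP is iterative, so real scaling may be less favourable than this. The names are hypothetical.

```python
import math
from fractions import Fraction

# Reference point quoted on the slide: ~1M events in ~4 h on 20 CPUs.
REF_EVENTS = 1_000_000
REF_HOURS = 4
REF_CPUS = 20

def cpu_hours(events: int) -> Fraction:
    """CPU-hours for `events`, assuming cost is linear in the number of events."""
    return Fraction(REF_HOURS * REF_CPUS * events, REF_EVENTS)

def cpus_for_turnaround(events: int, hours: int) -> int:
    """CPUs needed to finish within `hours`, assuming perfect parallel scaling."""
    return math.ceil(cpu_hours(events) / hours)

def events_per_cpu_second() -> float:
    """Throughput per CPU implied by the reference point."""
    return REF_EVENTS / (REF_HOURS * 3600 * REF_CPUS)

print(cpu_hours(1_000_000))               # 80
print(cpus_for_turnaround(1_000_000, 1))  # 80
print(f"{events_per_cpu_second():.2f} events/s per CPU")  # 3.47 events/s per CPU
```

     `Fraction` keeps the CPU-hour arithmetic exact, so rounding up with `math.ceil` cannot be thrown off by floating-point error.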

  11. CSA06: Physics Analysis Demonstrations
     [Figure: di-muon categories: 2 GLB tracks; 1 GLB + 1 tracker track; 1 GLB + 1 STA track]
     • These demonstrations proved to be useful training exercises for collaborators in the new software and computing tools.
     • Muon:
        • Extraction of W
        • Di-muon reconstruction efficiency
        • Z, J/ψ → μ+μ−
        • Northwestern and Purdue groups and T2 activity
     • Tau:
        • Selection of Z → ττ → ℓ + jet
        • Tau mis-ID study from Z + jet
        • Tau tagging efficiency

  12. CSA06 Summary
     • All goals were met
        • T0 prompt reconstruction of RECO, AOD and AlCaReco, with Frontier access, at 100% efficiency for 207M events
        • Export to T1s at 150 MB/s and higher
        • Data reduction (skim) production performed at T1s and transferred to T2s
        • Re-reconstruction demonstrated at 6 T1 centers
        • Job load exceeded 50K/day
        • Alignment, calibration and physics analyses widely demonstrated
     • CSA06 was a huge enterprise
        • Commissioned the CMS data-handling workflow at 25% scale
        • Everything worked, down to the final analysis plots
     • Many lessons can be drawn for the future as we prepare for data-handling operations, and there are more things to commission:
        • DAQ Storage Manager → T0
        • Support of global data-taking during detector commissioning

  13. Some Lessons from CSA06
     • CMS needs some development work to ease the operations load
     • Strong engagement with OSG, WLCG and sites was extremely useful
        • Grid service and site problems were addressed promptly
        • FTS at CERN was carefully monitored, with a prompt response when needed
        • CASTOR support at CERN was excellent
        • Support from CERN IT was key to the success and very instrumental
     • Data management needs an automatic way to ensure consistency across all components
     • Scale testing continues to be an extremely important activity
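     The lesson on data management calls for an automatic way to keep all components consistent. One common approach is to periodically compare what a catalogue believes a site holds with what the site's storage actually reports, and flag the differences. The Python sketch below illustrates that idea under stated assumptions: it uses hypothetical dictionaries mapping file name to size in bytes, and all names are hypothetical. It does not describe the CMS tools.

```python
from dataclasses import dataclass

@dataclass
class ConsistencyReport:
    missing_on_storage: set[str]  # catalogued, but not found at the site
    orphans_on_storage: set[str]  # found at the site, but unknown to the catalogue
    size_mismatches: set[str]     # in both, but with different sizes

    @property
    def consistent(self) -> bool:
        return not (self.missing_on_storage or self.orphans_on_storage
                    or self.size_mismatches)

def check_site(catalogue: dict[str, int], storage: dict[str, int]) -> ConsistencyReport:
    """Compare a catalogue view of a site (name -> size) with a storage listing."""
    cat, sto = set(catalogue), set(storage)
    return ConsistencyReport(
        missing_on_storage=cat - sto,
        orphans_on_storage=sto - cat,
        size_mismatches={f for f in cat & sto if catalogue[f] != storage[f]},
    )

report = check_site({"a.root": 10, "b.root": 20}, {"b.root": 21, "c.root": 5})
print(report.consistent)  # False
print(report)
```

     Running such a check on a schedule and alerting on any non-empty set is one way to make the consistency requirement automatic rather than manual.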

  14. CMS Outlook and Perspectives for 2007
     • Lower the entire detector and commission it underground.
     • Prepare the final distributed computing and software system and the physics analysis capability.
     • The initial* CMS detector will be ready for collisions at 900 GeV at the end of 2007.
     • The low luminosity detector will be ready for collisions at design energy in mid-2008.
     * The initial CMS detector is the low luminosity detector minus the ECAL endcaps and pixels. Both will be installed during the 07/08 winter shutdown.

  15. CMS computing goals in 2007
     • Demonstrate physics analysis performance using final software with high statistics
        • Major MC production of up to 200M events started last week
        • Analysis starts in June and finishes by September
     • Regular data taking: Detector → HLT → tape → T0 → T1
        • At regular intervals, 3-4 days per month, starting in May
     • Month of October: MTCC3
        • Readout of (successively more) components; data will be processed and distributed to the T1s

  16. Computing Commissioning Plans 2007
     • Start large MC production
     • February
        • Deploy PhEDEx 2.5
        • T0-T1, T1-T1, T1-T2 independent transfers
        • Restart job robot
        • Start work on SAM
        • FTS full deployment
     • March
        • SRM v2.2 tests start
        • T0-T1(tape)-T2 coupled transfers (same data)
        • Measure data serving at sites (esp. T1)
        • Production/analysis share at sites verified
     • April
        • Repeat transfer tests with SRM v2.2, FTS v2
        • Scale up job load
        • gLite WMS test completed (synch. with ATLAS)
     • May
        • Start ramping up to CSA07
     • July
        • CSA07
     [Figure: 2007 timeline showing Event Filter tests, start of analysis, start of global data-taking runs, pre-CSA07, CSA07, Global Detector Run and LHC engineering run]

  17. Motivations for CSA07
     There are two important goals for 2007, the last year of preparations for physics and analysis:
     1) Scaling: we need to reach 100% of system scale and functionality by spring 2008
        • CSA06 demonstrated between 25% and 50%, depending on the metric
     2) We need to transition to sustainable operations. This spans all areas of computing:
        • Data management
        • Job processing
        • User support
        • Site configuration and consistency
     In the past, functionality was valued more highly than reducing the operations load
        • As we prepare for long-term support, this emphasis needs to change

  18. CSA07 Goals: Increase Scale
     CMS demonstrated 25% of the target performance in 2006. We have two more factors of 2 to ramp up before data taking in 2008.
     • The data transfer between Tier-0 and Tier-1 reached about 50% of scale
        • Very successful test, but some signs of system stress were visible
     • Job submission rate reached 25%
     We plan another formal challenge in 2007:
     • A > 50% challenge in the summer of 2007
     • Extend the system to include the HLT farm
     • Add elements like simulation production
     • Increase user load
     • Run concurrently with other experiments stressing the system
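     The "two more factors of 2" follow from the fraction of scale already reached. At 25%, two doublings are needed to reach 100%; at 50%, one is enough. The small Python sketch below does that bookkeeping for the per-metric CSA06 fractions quoted above. The metric labels are informal.

```python
import math

def doublings_to_full_scale(achieved_fraction: float) -> int:
    """Number of factor-of-2 steps needed to go from `achieved_fraction` to 100%."""
    if not 0 < achieved_fraction <= 1:
        raise ValueError("achieved_fraction must be in (0, 1]")
    return math.ceil(math.log2(1 / achieved_fraction))

csa06_scale = {"T0-T1 transfer": 0.50, "job submission": 0.25}
for metric, fraction in csa06_scale.items():
    steps = doublings_to_full_scale(fraction)
    print(f"{metric}: {fraction:.0%} reached, {steps} doubling(s) to go")
```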

  19. CMS Computing Model & Resources CMS Tier-1 centers:

  20. CSA07 Workflow

  21. CSA07 success metrics

  22. CSA07 Goals for Tier-1s
     In the Computing Model, the Tier-1 centers perform 4 functions:
     • Archive data, both real data and simulation from Tier-2 centers
     • Execute skimming and selection on the data for users and groups
     • Re-reconstruct raw data
     • Serve data samples to Tier-2 centers for further analysis
     As we transition to operations, we should bring the Tier-1 centers into alignment with their core functionality

  23. CSA07: expectations of Tier-2s
     MC production at Tier-2s
     • Tier-2s were a significant contributor to the 25M events/month for CSA06
     • When the experiment is running, the Tier-2s are the only dedicated simulation resources, and the expectation is 100M events per month
     • CMS now produces 30M events/month; the goal for CSA07 is 50M
     Analysis submission
     • The Tier-2s are expected to support communities
        • Either local groups or regions of interest
        • So far implemented only for a couple of specific communities
     • Unlike Tier-1 data subscriptions and processing expectations, which are largely specified centrally by the experiment, the Tier-2s have control over the data and the activity
     CMS will work to improve the reliability and availability of the Tier-2 centers
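     To put the simulation numbers above on a common footing, the Python sketch below converts the monthly figures into daily rates and a ramp factor. It assumes a 30-day month, purely to keep the numbers round.

```python
DAYS_PER_MONTH = 30  # assumption: 30-day month, for round numbers only

def per_day(events_per_month: float) -> float:
    """Daily production rate implied by a monthly total."""
    return events_per_month / DAYS_PER_MONTH

def ramp_factor(current: float, target: float) -> float:
    """How many times the current rate the target rate is."""
    return target / current

monthly = {"CSA06": 25e6, "now": 30e6, "CSA07 goal": 50e6, "data taking": 100e6}
for label, n in monthly.items():
    print(f"{label:>12}: {per_day(n) / 1e6:.2f}M events/day")
print(f"now -> data taking: x{ramp_factor(monthly['now'], monthly['data taking']):.1f}")
```

     The ramp from today's rate to the data-taking expectation is a factor of about 3.3. The CSA07 goal covers roughly the first factor of 1.7 of that ramp.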

  24. Tier-2 Analysis goals in 2007
     Tier-2s are the primary analysis resource controlled by physicists
     • The activities are intended to be controlled by user communities
     Up to now, most of the analysis has been hosted at the Tier-1 sites
     CMS will enlarge analysis support by hosting important physics samples exclusively at Tier-2 centers
     • We have roughly 10-15 sites with sufficient disk and CPU resources to support multiple datasets
        • Skims in CSA06 were ~500 GB
        • The largest of the raw samples was ~8 TB
     • Force the migration of analysis to Tier-2s by hosting data at Tier-2s
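     Hosting samples exclusively at Tier-2s becomes a placement problem: each dataset must land at a site with enough free disk. The Python sketch below uses a simple greedy heuristic that places the largest dataset first at the site with the most free space. It is offered only as an illustration and is not the CMS placement policy. Site names and capacities are hypothetical; dataset sizes echo the ~8 TB raw and ~500 GB skim figures above.

```python
def place_datasets(datasets: dict[str, float], free_tb: dict[str, float]) -> dict[str, str]:
    """Greedy placement: largest dataset first, onto the site with the most free disk.

    If the emptiest site cannot hold a dataset, no site can, so we fail early.
    """
    free = dict(free_tb)  # don't mutate the caller's capacities
    placement: dict[str, str] = {}
    for name, size in sorted(datasets.items(), key=lambda kv: kv[1], reverse=True):
        site = max(free, key=free.get)
        if free[site] < size:
            raise RuntimeError(f"no Tier-2 has room for {name} ({size} TB)")
        placement[name] = site
        free[site] -= size
    return placement

datasets = {"raw_sample": 8.0, "skim_1": 0.5, "skim_2": 0.5}  # sizes in TB
sites = {"T2_A": 10.0, "T2_B": 4.0}                           # hypothetical free disk, TB
placement = place_datasets(datasets, sites)
print(placement)
```

     Largest-first placement is a standard bin-packing heuristic. A real policy would also weigh CPU capacity and which communities each Tier-2 serves.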

  25. Transition to operations in 2007: Goals
     We plan to measure the transition to operations with concrete metrics
     Site availability: SAM tests (Site Availability Monitor)
     • Put CMS functions in the site functional testing
        • Analysis submissions
        • Production
        • Frontier
        • Data transfer
     • Measure the site availability
        • The WLCG goal for the Tier-1s in early 2007 is 90%
        • We should establish a goal for Tier-2s; 80% seems reasonable
        • Goals for summer 2007 would be 95% and 90%, respectively
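     The proposed availability metric can be sketched in a few lines of Python. The sketch assumes availability is the fraction of time bins in which all CMS-specific tests pass. The test names, the binning and the "all must pass" rule are assumptions for illustration; the slide does not describe how SAM actually computes site status.

```python
# Availability goals quoted on the slide (the Tier-2 goals are proposals).
GOALS = {
    ("T1", "early 2007"): 0.90,
    ("T2", "early 2007"): 0.80,
    ("T1", "summer 2007"): 0.95,
    ("T2", "summer 2007"): 0.90,
}

# CMS functions the slide proposes adding to site testing (names are illustrative).
CMS_TESTS = ("analysis", "production", "frontier", "transfer")

def bin_available(results: dict[str, bool]) -> bool:
    """A time bin counts as available only if every CMS test passed (missing = failed)."""
    return all(results.get(test, False) for test in CMS_TESTS)

def availability(bins: list[dict[str, bool]]) -> float:
    """Fraction of time bins in which the site was available."""
    return sum(bin_available(b) for b in bins) / len(bins) if bins else 0.0

def meets_goal(tier: str, period: str, bins: list[dict[str, bool]]) -> bool:
    return availability(bins) >= GOALS[(tier, period)]

all_pass = {test: True for test in CMS_TESTS}
bins = [all_pass] * 9 + [{**all_pass, "frontier": False}]  # hypothetical: 10 time bins
print(availability(bins))                     # 0.9
print(meets_goal("T1", "early 2007", bins))   # True
print(meets_goal("T1", "summer 2007", bins))  # False
```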

  26. Prepare CMS for Analysis: Summary
     • 2006 was a very successful year for CMS software and computing
     • 2007 promises to be a very busy year for Computing and Offline
        • Commissioning and integration remain major tasks in 2007
        • Balancing the needs of physics, computing and the detector will be a logistical challenge
     • The transition to operations has started; a data operations group has been formed
     • Facilities will be ramping up resources to be ready for the pilot run and the 2008 physics run
     • An increased number of CMS people will be involved in the facilities, commissioning and operations to prepare for CMS analysis
