
CMS Experience on the GRID

CMS Experience on the GRID. Peter Kreuzer for the CMS Computing project, RWTH Aachen / CERN. NEC2009 Conference, Varna, Bulgaria, Sep 10, 2009.


Presentation Transcript


  1. CMS Experience on the GRID Peter Kreuzer for the CMS Computing project RWTH Aachen / CERN NEC2009 Conference, Varna, Bulgaria, Sep 10, 2009

  2. No need to provide the scientific motivation here, since Sergio Cittolin and Livio Mapelli did this very well in the introductory session of this conference!

  3. Goal of the CMS Computing project
  • Provide Resources and Services to store/serve O(10) PB of data/year
  • Provide access to the most interesting physics events to O(1500) CMS collaborators located in 200 institutions around the world
  • Minimize constraints due to user localisation and resource variety
  • Decentralize control and costs of the computing infrastructure
  • Share resources with other LHC experiments
  • Find the answer on the Worldwide LCG GRID
  (Event display: up to 200 tracks/event after reconstruction/selection)

  4. The way CMS uses the GRID
  (Diagram of the WLCG computing grid infrastructure: CMS detector → Tier-0 at 450 MB/s (300 Hz); Tier-0 → Tier-1s at 30-300 MB/s per site (aggregate ~800 MB/s); Tier-1 ↔ Tier-1 at ~100 MB/s; Tier-1 → Tier-2 at 50-500 MB/s; Tier-2 → Tier-1 at 10-20 MB/s; ~50k jobs/day at the Tier-1s and ~150k jobs/day at the Tier-2s; Tier-3s attach to the Tier-2s; the CAF sits at CERN)
  CMS in total: 1 Tier-0 at CERN (Geneva), 7+1 Tier-1s on 3 continents, ~50 Tier-2s on 4 continents
  • Tier-0: Prompt Reconstruction, archival of RAW and first RECO data, calibration streams (CAF), data distribution → Tier-1
  • 7+1 Tier-1s: Re-Reconstruction, skimming, custodial copy of RAW and RECO, served copy of RECO, archival of simulation, data distribution → Tier-2
  • ~50 Tier-2s: primary resources for physics analysis and detector studies by users, MC simulation → Tier-1

  5. Computing Resources: Setting the scale
  • Run 2009-10 (Oct-Mar / Apr-Sep): 300 Hz / 6 x 10^6 sec / 2.2 x 10^9 events
  • Size & CPU per event (see table)
  • CMS datasets: O(10) Primary Datasets (RAW, RECO, AOD), O(30) Secondary Datasets (RECO, AOD), plus 1.5 times more simulated data (SIMRAW, SIMRECO, AOD)
  • During run 2009-10, CMS plans 5 full re-Reconstructions, needing: 400 kHS06 CPU, 26 PB disk, 38 PB tape
  • Resources ratio CERN / (T1+T2): CPU 25%, disk 15%
  • HEP-SPEC06: modern CPU ~ 8 HS06/core; 100 HS06-sec ~ 12.5 sec/event; 100 kHS06 ~ 12,500 cores (see the check below)
  (Chart: disk [PB] and CPU [kHS06] per tier)
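
  As a quick check of the HEP-SPEC06 arithmetic above, a small sketch; the 100 HS06-sec per event and 8 HS06/core figures are taken from the slide:

    # Back-of-the-envelope check of the HEP-SPEC06 figures quoted on this slide.
    hs06_per_core = 8.0          # "modern CPU ~ 8 HS06 / core"
    hs06_sec_per_event = 100.0   # per-event cost assumed on the slide

    # Wall time per event on one core: 100 HS06-sec / 8 HS06 = 12.5 s/event
    print(hs06_sec_per_event / hs06_per_core, "s/event per core")

    # Capacity corresponding to 100 kHS06: 100,000 HS06 / 8 HS06/core = 12,500 cores
    print(100_000 / hs06_per_core, "cores for 100 kHS06")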

  6. And not to forget the obvious...
  • Networking
  • GRID Middleware Services: Storage Elements, Computing Element, Workload Management System, Local File Catalog, Information System, Virtual Organisation Management Service
  • Inter-operability between GRIDs: EGEE, OSG, NorduGrid, ...
  • Site specificities, e.g. the storage and batch systems at the CMS Tier-1s (FNAL, RAL, CCIN2P3, PIC, ASGC, INFN, FZK): storage based on dCache (with Enstore, HPSS, TSM or Castor back-ends), Castor and StoRM; batch systems include Condor, Torque/Maui, BQS, LSF and PBSPro

  7. Russian/Ukrainian CMS T2s, CERN-T1
  • 2006: agreement between the WLCG and CMS managements on the association of RU/UA T2 sites with a partial T1 at CERN
  • 2008: a 30 TB disk buffer in front of 450 TB of tape space deployed at CERN to serve AODs to, and archive MC data from, the RU/UA T2s
  • RU/UA T2s providing WLCG/CMS service work, e.g. Transfer Link Commissioning and Data Management support
  • CMS established a strong collaboration and weekly Operations meetings with the RU/UA T2 community; a lot of progress made over the past year!
  • Active RU/UA CMS sites as of today: T2_RU_IHEP (Protvino), T2_RU_PNPI (Petersburg Nucl. Phys. Inst.), T2_RU_RRC_KI (Kurchatov Inst.), T2_RU_INR (Inst. for Nucl. Res. RAS), T2_RU_JINR (Dubna), T2_RU_ITEP (R.F. Inst. for Theor. and Exp. Phys.), T2_RU_SINP (MSU development site), T2_UA_KIPT (NSC Kharkov Inst. of Phys. & Tech.)
  • And many participating physics communities, like Bulgaria!

  8. In the rest of this talk I will cover
  • Snapshots of achieved CMS computing milestones: Data Distribution, Data Processing, Distributed Analysis
  • A few more aspects related to CMS Computing Operations on the GRID: Site Availability & Readiness monitoring, SW Deployment, Computing shift procedures

  9. Acronyms of CMS Computing activities

  10. Data Transfers
  • PhEDEx: a distributed set of perl agents on top of a large ORACLE DB
  • The CMS Data Distribution strategy relies on full-mesh Data Management
  • Links are individually "commissioned"; target rates depend on the size of the sites
  • Minimum requirement to sites: T1: 4 links to T1s / 20 downlinks to T2s; T2: 2 uplinks / 4 downlinks to/from T1s
  • CMS has >600 commissioned links! (see the link-count sketch below)
  • Historical view: gained 3 orders of magnitude in transfer volume since 2004 (chart: 100-200 TB/day reached)
  • Main milestone (CCRC08) achieved in parallel to the other LHC VOs
  • Reached the expected burst level of first-year LHC running
  (Credit: CMS Data Transfer team)
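
  To put the ">600 commissioned links" figure in context, a small sketch comparing the per-site minimum requirements quoted above with a complete mesh; the site counts are taken from slide 4 and the comparison itself is an illustration, not from the talk:

    # Rough link-count arithmetic for the CMS full-mesh transfer topology.
    # Site counts from slide 4; per-site minima from this slide.
    n_t1, n_t2 = 8, 50

    # Full mesh among T1s plus full T1<->T2 mesh (both directions counted).
    full_mesh = n_t1 * (n_t1 - 1) + 2 * n_t1 * n_t2
    print("full T1-T1 + T1<->T2 mesh:", full_mesh, "links")   # 856 links

    # Minimum commissioning requirements per site.
    min_links = n_t1 * (4 + 20) + n_t2 * (2 + 4)
    print("minimum required:", min_links, "links")            # 492 links

    # CMS reports >600 commissioned links, i.e. well above the minimum
    # and a sizeable fraction of the complete mesh.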

  11. Data Transfer challenges
  • T0 - T1: CMS regularly above the design rate in the CCRC08 multi-VO challenge (chart: Tier-0 → Tier-1 throughput (MB/s), multi-VO, 05/01/2008 - 06/02/2008); STEP09 included a tape-writing step at the T1s, and the observed T0 → T1 transfer latency was impacted by the T1 tape system state (busy, overloaded, ...)
  • T1 - T1: STEP09: simultaneous transfer of 50 TB of AOD from 1 T1 to all T1s -> average 970 MB/s over 3 days, no big problems encountered
  • T1 - T2: see "Tier-1 Data Serving tests in STEP09" below

  12. Data Transfers from/to RU/UA T2s
  • T2 → T1 transfers of produced MC data: 32.7 TB of events over the past year, from 6 different T2s to the custodial T1
  • Transfer quality to/from the RU/UA T2s has improved lately
  • T1 → T2 downloads: 73 TB to the RU/UA T2s over the past year

  13. Production of simulated data
  • A good measure of Computing performance on the GRID (a CMS activity since 2004)
  • ProdAgent: modular Python daemons on top of a MySQL DB, interacting with the CMS Data Management systems (a minimal sketch of this pattern follows below)
  • CMS has a centralized production strategy handled by the Data Operations teams
  • 2006-07: 6 "regional teams" + a Production Manager
  • 2008-09: 1 team (6 people) managing submissions in defined "T1 regions"
  (Credit: CMS DataOps Production team)
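
  ProdAgent itself is not reproduced here; the following is only a minimal sketch of the pattern the slide describes — Python daemons polling a database-backed work queue — with all table, field and function names invented for illustration (SQLite stands in for MySQL):

    # Minimal sketch of a polling production agent, in the spirit of the
    # ProdAgent design described above. All names are illustrative only.
    import sqlite3
    import time

    def fetch_pending_request(db):
        """Grab one pending production request from the work queue, if any."""
        return db.execute(
            "SELECT id, dataset, n_events FROM requests WHERE state='pending' LIMIT 1"
        ).fetchone()

    def process(request):
        """Placeholder for the real work: create, submit and track grid jobs."""
        req_id, dataset, n_events = request
        print(f"submitting {n_events} events for {dataset} (request {req_id})")

    def agent_loop(db, poll_interval=30, max_cycles=None):
        """Poll the queue, process one request per cycle, mark it done."""
        cycles = 0
        while max_cycles is None or cycles < max_cycles:
            request = fetch_pending_request(db)
            if request:
                process(request)
                db.execute("UPDATE requests SET state='done' WHERE id=?", (request[0],))
                db.commit()
            else:
                time.sleep(poll_interval)
            cycles += 1

    if __name__ == "__main__":
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, dataset TEXT, n_events INTEGER, state TEXT)")
        db.execute("INSERT INTO requests VALUES (1, '/MinBias/Summer09/GEN-SIM-RECO', 1000000, 'pending')")
        db.commit()
        agent_loop(db, poll_interval=0, max_cycles=2)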

  14. Resource Utilization for Production
  • Summer 2007: on average ~4,700 slots utilized across 32 T1 and T2 sites, i.e. 75% of the total then available
  • Summer 2009: on average 9-10,000 slots utilized across 53 T2 and T3 sites, i.e. 60% of the total T2 capacity then available; the rest of the T2 resources were used by analysis
  • Note: besides the scale, the other important performance metric is to reach a sustained site occupancy
  (Chart: slot utilization growing from the 5,000-slot to the 10,000-slot level)

  15. Summer 09 Production at RU/UA T2s
  • So far, around 33M events produced at all RU/UA Tier-2 sites
  • This was possible thanks to a strong effort by the site coordinators (V. Gavrilov, E. Tikhonenko, O. Kodolova) and the local site admins to satisfy the CMS Site Availability and Readiness requirements
  • Note: the total pledge of RU/UA job slots for CMS is 807 (June 09)

  16. Primary Reconstruction at Tier-0
  • The CMS T0 is mainly operated from the CMS Centre (CERN) and the FNAL ROC
  • Despite the lack of collision data, CMS was able to commission the T0 workflows, based on cosmic-ray data taking at 300 Hz (CRAFT08: 270M events, CRAFT09: 320M events; chart: Tier-0 jobs, running and pending):
  – Repacking: reformatting raw data, splitting it into primary datasets (see the toy sketch below)
  – Prompt Reconstruction: first pass with a few days' turnaround
  – Automated subscriptions to the Data Management system
  – Alignment and Calibration data skimming as input to the CAF
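
  To illustrate the "splitting into primary datasets" step, a toy sketch that routes events to primary datasets according to which trigger paths fired; the trigger names and the mapping are invented examples, and the real Tier-0 repacker operates on binary streamer files rather than Python dictionaries:

    # Toy illustration of splitting a raw event stream into primary datasets
    # by trigger decision. Trigger names and the mapping are invented examples.
    from collections import defaultdict

    PRIMARY_DATASET_MAP = {
        "HLT_Mu9": "Mu",
        "HLT_Ele15": "EG",
        "HLT_Jet50U": "JetMET",
        "HLT_MinBias": "MinimumBias",
    }

    def split_into_primary_datasets(events):
        """Assign each event to every primary dataset whose trigger it fired."""
        datasets = defaultdict(list)
        for event in events:
            for trigger in event["fired_triggers"]:
                dataset = PRIMARY_DATASET_MAP.get(trigger)
                if dataset is not None:
                    datasets[dataset].append(event["id"])
        return datasets

    if __name__ == "__main__":
        sample = [
            {"id": 1, "fired_triggers": ["HLT_Mu9"]},
            {"id": 2, "fired_triggers": ["HLT_Jet50U", "HLT_Ele15"]},
            {"id": 3, "fired_triggers": ["HLT_MinBias"]},
        ]
        for name, ids in split_into_primary_datasets(sample).items():
            print(name, ids)

  Note that an event can land in more than one primary dataset if it fired several triggers, which is why the sum of dataset sizes can exceed the raw event count.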

  17. Primary Data Archival at Tier-0
  • Transfer RAW data from the detector and archive it on tape at CERN (chart: CCRC08 Phase 1 and STEP09, peaks at 1.4 GB/s)
  • Routinely sustained archival performance above the nominal rate of 500 MB/s
  • Until LHC startup, the T0 performance will be tested further with simulated collision-like data
  (Credit: CMS Tier-0 Operations team)

  18. Data Reprocessing, Data Serving at T1s
  The CMS Data Distribution Model puts a heavy load on the Tier-1s:
  • Disk and tape storage capacity: custodial copy of (a fraction of) the recorded data & archival of simulated data; full set of Analysis Object Data (AOD, a subset sufficient for 90% of analyses); non-custodial copies of RECO/AOD encouraged
  • Processing capacity: Re-Reconstruction; skimming of data to reduce sample sizes
  • Tape I/O bandwidth: reading many times to serve data to the T2s for analysis (full-mesh T1 - T2 strategy); writing for storage and reading for Re-Reconstruction if not on disk; pre-staging strategy?
  • Other strong requirements: 24/7 coverage and high (98%) availability (WLCG)
  • CMS Data Operations at the Tier-1s: central operations and specialized workflows, no user access

  19. Routine Re-Reconstruction operations
  • CMS routinely reprocessed simulated data (CSA08, ...) and cosmic data (CRAFT08/09, ...) at all Tier-1s
  • CPU performance in line with the CMS pledges
  • Using both gLite/WMS (bulk) and glideIn submissions
  • Regular data "back-fill" jobs to test site reliability
  • Example (chart): transfer and Re-Reconstruction of 500 TB of CRAFT08 data
  (Credit: CMS DataOps reprocessing team)

  20. Tier-1 Pre-Staging tests in STEP09
  • Rolling Re-Reconstruction exercise: pre-stage / process / purge data at all T1s during 2 weeks (processing paused at 1 T1 due to a staging backlog)
  • The tape systems at almost all T1s demonstrated pre-staging capability above the required rate; to be re-done in Fall 09 where needed
  • Re-Reconstruction CPU efficiency is significantly better with pre-staging (see the toy model below); CMS is considering an integrated pre-staging solution.
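
  Why pre-staging helps CPU efficiency can be pictured with a toy model that overlaps the tape recall of the next data block with the processing of the current one; the durations below are invented for illustration:

    # Toy model of the "pre-stage / process / purge" rolling exercise:
    # compare running with and without overlapping the tape recall of the
    # next data block with the processing of the current one.
    STAGE_TIME = 4.0    # hours to recall one block from tape (invented)
    PROC_TIME = 6.0     # hours to process one block (invented)
    N_BLOCKS = 5

    def wall_time_without_prestaging():
        # Each block is staged, then processed, strictly in sequence.
        return N_BLOCKS * (STAGE_TIME + PROC_TIME)

    def wall_time_with_prestaging():
        # Block i+1 is staged while block i is processed, so after the first
        # recall the longer of the two steps dominates each cycle.
        return STAGE_TIME + N_BLOCKS * max(STAGE_TIME, PROC_TIME)

    if __name__ == "__main__":
        busy = N_BLOCKS * PROC_TIME  # hours of actual processing work
        for label, wall in [("no pre-staging", wall_time_without_prestaging()),
                            ("pre-staging", wall_time_with_prestaging())]:
            print(f"{label}: {wall:.0f} h wall, CPU efficiency {busy / wall:.0%}")

  With these invented numbers the efficiency rises from 60% to roughly 88%, which is the qualitative effect the STEP09 exercise observed.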

  21. Tier-1 Data Serving tests in STEP09
  • T1 → T2 data serving exercise: transfer from T1 tapes to the T2s, i.e. put load on the T1 tape recall, not on the WAN rate to the T2s
  • Latency performance: overall result OK
  • Isolated weaknesses at some T1s during the test, to be investigated before LHC startup

  22. Analysis Model in CMS
  Analysis in CMS is performed on a globally distributed collection of computing facilities (diagram: CERN CAF and Tier-0, Tier-1s, Tier-2s, Tier-3s):
  • CERN CAF / Tier-0
  • Tier-1: several Tier-1s have separately accounted analysis resources
  • Tier-2: facilities devoted half to simulation and half to user analysis; the primary resource for analysis
  • Tier-3: facilities entirely controlled by the providing institution, used for analysis

  23. Analysis tool and users on the GRID
  • CMS Remote Analysis Builder (CRAB): users pass their job specifications to CRAB (or to the CRAB server), which submits to Tier-2s globally; jobs access official data or user data files registered in Data Management; small data products are returned, larger output is staged out to user space at the local Tier-2 (or, eventually, to temporary space); a minimal configuration example follows below
  • Ongoing work: data integrity, remote stage-out (problems with reliability and scale), grid failures, continued work on scaling and submission
  • Since 2008: >1000 distinct CMS users
  • 100,000 analysis jobs/day within reach
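
  For orientation, a user-side CRAB job is driven by a small configuration file; the sketch below reflects CRAB 2-era parameter names as remembered and may differ between versions, and the dataset, file and site names are placeholders:

    [CRAB]
    jobtype   = cmssw
    scheduler = glite            ; or e.g. glidein, depending on the grid flavour

    [CMSSW]
    datasetpath            = /Cosmics/CRAFT09-v1/RECO    ; placeholder dataset
    pset                   = my_analysis_cfg.py          ; user's CMSSW configuration
    total_number_of_events = -1
    events_per_job         = 100000

    [USER]
    return_data     = 0
    copy_data       = 1
    storage_element = T2_XX_MySite                       ; placeholder stage-out Tier-2
    user_remote_dir = my_first_skim

  Jobs are then created, submitted and tracked with the crab command-line tool (crab -create, crab -submit, crab -status), either directly or via the CRAB server shown in the diagram.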

  24. CMS Tier-2 Disk Space management
  • In CMS, jobs go to the data: distribute data broadly
  • CMS attempts to share the management of space across Analysis Groups; this ensures that the people doing the work have some control
  • 200 TB of disk space at a nominal Tier-2, allocated as follows (tallied in the sketch below):
  • 20 x 1 TB identified for storing locally produced user files and making them grid accessible
  • 30 TB identified for use by the local group
  • 2-3 x 30 TB reserved for the CMS PH Analysis groups
  • 30 TB for centrally managed Analysis Operations (expected to be able to host most RECO data in the first year)
  • 20 TB for DataOps as an MC staging buffer
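
  Summing the allocations listed above for a nominal 200 TB Tier-2 (the "2-3 x 30 TB" item is taken at both extremes):

    # Tally of the nominal Tier-2 disk-space allocation described above.
    allocations_tb = {
        "local user space (20 x 1 TB)": 20,
        "local group space": 30,
        "CMS PH analysis groups (2-3 x 30 TB)": (60, 90),
        "central Analysis Operations": 30,
        "DataOps MC staging buffer": 20,
    }

    low = sum(v[0] if isinstance(v, tuple) else v for v in allocations_tb.values())
    high = sum(v[1] if isinstance(v, tuple) else v for v in allocations_tb.values())
    print(f"allocated: {low}-{high} TB of the nominal 200 TB")
    # -> 160-190 TB, leaving some headroom at a nominal site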

  25. Analysis Group associations to T2s
  • Currently 17 physics analysis / detector performance groups
  • The association improved communication with sites and user involvement
  • Example: current associations of RU/UA T2s to PH groups: exotica, heavy ion, JetMET, muon

  26. Group Space, Private Skims, Priority Access @ Tier-2s
  • Many groups do not yet use the space assigned to them (chart: % of used group space vs group space pledged at the T2s)
  • CMS analysis groups and users are producing "private" skims, written to the private /store/user area
  • New tool to "promote" private skims to the full collaboration: validate skims and "migrate" them to the public /store/results area
  • Priority queues: 2 people per Analysis Group can access 25% of the pledged CPU at the associated T2s
  (Credit: CMS Analysis Operations team)

  27. Job Slot utilization for Analysis
  • Current CMS total CPU pledge at the T2s: 17k job slots, of which the analysis pledge is 50%
  • Utilization in August was reasonable (chart: production vs analysis slots), but we need to move into a sustained analysis mode

  28. Analysis Success Rates
  • Need to understand the differences in success rates, in particular the application failures (chart legend: application failures not well understood — stage-out, ...; mainly read failures; some Grid failures)
  • And convince the majority of CMS physicists to favor Distributed Analysis

  29. CMS Site Commissioning
  • Objectives: test all the functionality CMS requires from each site in a continuous mode; determine whether the site is usable and stable
  • What is tested? Job submission; local site configuration and CMS software installation; data access and data stage-out from the batch node to storage; "fake" analysis jobs; quality of data transfers across sites
  • What is measured? Site availability: fraction of time all functional tests at a site are successful; Job Robot efficiency: fraction of successful "fake" analysis jobs; link quality: number of data transfer links with an acceptable quality
  • What is calculated? A global estimator which expresses how good and stable a site is (see the illustrative sketch below)
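
  The exact form of the global estimator is not spelled out on the slide; purely as an illustration, a site-readiness flag could combine the three measured quantities with thresholds like the ones below (the threshold values and function names are assumptions, not the official CMS criteria):

    # Illustrative site-readiness estimator combining the three metrics named
    # above. Thresholds are invented for the example, not the official CMS ones.
    def site_ready(availability, jobrobot_efficiency, good_links, required_links,
                   min_availability=0.90, min_efficiency=0.80):
        """Return True if all functional metrics pass their (assumed) thresholds."""
        return (availability >= min_availability
                and jobrobot_efficiency >= min_efficiency
                and good_links >= required_links)

    if __name__ == "__main__":
        # A hypothetical Tier-2: 95% availability, 85% JobRobot success,
        # 5 of the 6 required commissioned links in good shape.
        print(site_ready(availability=0.95, jobrobot_efficiency=0.85,
                         good_links=5, required_links=6))   # -> False (links short)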

  30. Statistics and plots
  (Screenshots: site summary table, site ranking, site history. Credit: CMS Site Commissioning team)

  31. How can this be used?
  • To measure global trends in the evolution of site reliability; impressive results in the last year
  • Weekly reviews of the site readiness
  • Production teams can better plan where to run productions
  • Automatically map to production and analysis tools?
  (Credit: CMS Site Commissioning team)

  32. CMS Software Deployment
  • Basic strategy: use RPM (with apt-get) in the CMS SW area
  • Deployment of CMS SW to 90% of the sites within 3 hours
  (Credit: CMSSW Deployment teams)

  33. CMS Centers and Computing Shifts
  • Carried out from the CMS Experiment Control Room, the CMS Remote Operations Centre at Fermilab and the CMS Centre at CERN (monitoring, computing operations, analysis)
  • CMS is running Computing shifts 24/7; remote shifts are encouraged
  • Main task: monitor CMS sites and alert sites & Computing Experts

  34. Conclusions
  • In the last 5 years CMS has gained very large expertise in GRID Computing Operations and is in a good state of readiness
  • Dedicated data challenges and cosmic data taking were very valuable, but are not quite "the real thing", which means: sustained data processing, stronger demands on site readiness, and higher demands on data accessibility by physicists
  • Remaining program until the LHC startup:
  • Tier-0: repeat scale tests using simulated collision-like events
  • Tier-1: repeat the STEP09 tape and processing exercises where needed + add automation ("WMBS") to T1 processing
  • Tier-2: support and improve distributed analysis efficiency
  • Review Critical Services coverage
  • Fine-tune the Computing Shift procedures
  • Make sure the pledged site resources for 2010 become available

  35. Backup

  36. Scientific Motivation
  • 20 collisions per bunch crossing (~1 PB/s at the detector)
  • L1 Trigger output: ~100 kHz
  • After the High-Level Trigger: O(300 Hz)
  • Storage rate: O(500 MB/s)
