
CMS computing: model, status and plans

This presentation discusses the CMS data processing model: the different data formats and their volumes, the role of each tier in processing and storage, and the current status and future plans.



  1. CMS computing: model, status and plans
  C. Charlot / LLR, 2nd LCG-France Colloquium, March 2007

  2. The problem: data volume
  • RAW
    • Detector data + L1 and HLT results after online formatting
    • Includes safety factors for limited initial understanding of the detector, compression, etc.
    • 1.5 MB/evt @ 150 Hz → ~4.5 PB/year (two copies, one distributed)
  • RECO
    • Reconstructed objects with their associated hits
    • 250 kB/evt → ~1.5 PB/year (including 3 reprocessing versions)
  • AOD
    • The main analysis format: clusters, tracks, particle ID, ...
    • 50 kB/evt → ~2 PB/year; a whole copy at each Tier-1 (e.g. CC-IN2P3)
  • TAG
    • High-level physics objects, run info (event directory), <10 kB/evt
  • FEVT
    • Bundling of RAW+RECO for distribution and storage
  • Plus MC data, in an estimated 1:1 ratio with experiment data
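
These volumes follow directly from the per-event sizes and the trigger rate. A minimal back-of-the-envelope check in Python, assuming ~1e7 live seconds per LHC year (a standard rule of thumb, not stated on the slide):

```python
# Back-of-the-envelope check of the data volumes quoted above.
# ASSUMPTION: ~1e7 live seconds per LHC year (common rule of thumb).
LIVE_SECONDS_PER_YEAR = 1.0e7

def yearly_volume_pb(event_size_mb: float, rate_hz: float, copies: int = 1) -> float:
    """Yearly data volume in PB for a given event size, trigger rate and copy count."""
    return event_size_mb * rate_hz * LIVE_SECONDS_PER_YEAR * copies / 1e9  # MB -> PB

print(yearly_volume_pb(1.5, 150, copies=2))   # RAW, two copies: ~4.5 PB/year
print(yearly_volume_pb(0.25, 150, copies=3))  # RECO, 3 reprocessings: ~1.1 PB/year
```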

  3. Data processing
  • We aim for prompt data reconstruction and analysis
    • Backlogs are the real killer
  • Prioritisation will be important
    • At the beginning, the computing system will not be 100% efficient
    • Cope with backlogs without delaying critical data
    • Reserve the possibility of 'prompt calibration' using low-latency data
  • Streaming
    • Rule #1 of hadron collider physics: understanding your trigger and selection is everything
    • LHC analyses rarely mix inclusive triggers
    • Classifying events early allows prioritisation
    • Crudest example: express line of 'hot' / calibration events
  • Propose O(50) 'primary datasets', immutable but
    • Can have overlap (10% assumed)
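
To make the streaming idea concrete, here is an illustrative sketch (not CMS code; the trigger names and the trigger-to-dataset mapping are hypothetical) of classifying an event into primary datasets from its fired triggers, with overlap arising naturally when an event fires triggers mapped to different datasets:

```python
# Hypothetical trigger -> primary-dataset mapping; an event that fires triggers
# belonging to several datasets is written to all of them, which is where the
# ~10% overlap assumed on the slide comes from.
PRIMARY_DATASETS = {
    "HLT_Electron": "Electrons",
    "HLT_Muon": "Muons",
    "HLT_Jet": "Jets",
}

def classify(fired_triggers: set[str]) -> set[str]:
    """Return the set of primary datasets this event belongs to."""
    return {PRIMARY_DATASETS[t] for t in fired_triggers if t in PRIMARY_DATASETS}

# An event firing both an electron and a jet trigger lands in two datasets:
print(classify({"HLT_Electron", "HLT_Jet"}))  # {'Electrons', 'Jets'}
```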

  4. Tier-0 Centre
  • Prompt reco (24/200), FEVT storage, data distribution
  • Provided by IT division
  • CPU: 4.6 MSI2K, Disk: 0.4 PB, MSS: 4.9 PB, WAN: >5 Gbps

  5. Tier-1 Centres
  • Data storage, heavy processing (re-reco, skimming, AOD extraction), raw data access, Tier-2 support
  • 7 Tier-1s: ASCC, CCIN2P3, FNAL, GridKa, CNAF, PIC, RAL
  • Nominally, CPU: 2.5 MSI2K, Disk: 1.2 PB, MSS: 2.8 PB, WAN: >10 Gbps

  6. Tier-2 Centres
  • Analysis, MC production, specialised support tasks
  • Local + common use
  • Nominally, CPU: 0.9 MSI2K, Disk: 0.2 PB, no MSS, WAN: >1 Gbps

  7. CMS-CAF
  • Latency-critical services, analysis, Tier-1 functionality
  • CERN responsibility, open to all collaborators
  • Roughly: Tier-1 MSS + 2 Tier-2s

  8. Resource evolution
  • We should be frightened by these numbers
  • Revised LHC planning
    • Keep the integrated data volume roughly the same by increasing the trigger rate

  9. Tier-1/Tier-2 Associations
  • Associated Tier-1: hosts the Tier-2's MC production + acts as reference for AOD serving
  • Full AOD sample at the Tier-1 (after T1→T1 transfers for re-recoed AODs)
  • Stream "allocation" ~ available disk storage at the centre
  • French association: CCIN2P3-AF, GRIF
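
As a rough illustration of the "allocation ~ disk" rule, the sketch below (assumed logic, not the actual CMS placement machinery; the site names and disk sizes are examples) assigns streams to Tier-1s greedily in proportion to their disk:

```python
# Assign each stream to the Tier-1 currently furthest below its fair share,
# where the fair share is proportional to the site's available disk.
def allocate_streams(streams: list[str], disk_tb: dict[str, float]) -> dict[str, list[str]]:
    total = sum(disk_tb.values())
    assigned = {site: 0 for site in disk_tb}
    allocation = {site: [] for site in disk_tb}
    for stream in streams:
        site = min(disk_tb, key=lambda s: assigned[s] / (disk_tb[s] / total))
        allocation[site].append(stream)
        assigned[site] += 1
    return allocation

print(allocate_streams([f"stream{i:02d}" for i in range(10)],
                       {"CCIN2P3": 1200.0, "FNAL": 2400.0, "RAL": 1200.0}))
```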

  10. Transfer Rates
  • These are raw rates: no catch-up, no overhead
  • T1→T1: total AOD size / replication period (currently 14 days)
  • T1→T2: T2 capacity / refresh period at the T2 (currently 30 days)
  • These are average rates; the worst-case peak for a T1 is the sum of its T2s' transfer capacities
    • Weighted by the data fraction hosted at the T1
  • Rates (MB/s) per link type:
    • OPN in: FEVT (T0→T1) + AOD (T1→T1)
    • OPN out: AOD (T1→T1)
    • T2 in: FEVTsim + AODsim (T2→T1)
    • T2 out: FEVT + AOD (T1→T2)
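
The rate rule here is simply volume over period. A small worked example; the ~75 TB AOD copy size is derived from the slide-2 numbers (50 kB/evt × 150 Hz × ~1e7 s), and the 200 TB Tier-2 disk capacity is hypothetical:

```python
# Average link rate = volume to move / replication (or refresh) period.
def avg_rate_mb_s(volume_tb: float, period_days: float) -> float:
    return volume_tb * 1e6 / (period_days * 86400)  # TB -> MB, days -> s

print(avg_rate_mb_s(75, 14))   # T1->T1 AOD replication over 14 days: ~62 MB/s
print(avg_rate_mb_s(200, 30))  # refresh of a 200 TB T2 over 30 days: ~77 MB/s
```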

  11. Tier-0 Status (CSA06)
  • Prompt reconstruction at 40 Hz
    • 50 Hz for 2 weeks, then 100 Hz
    • Peak rate: >300 Hz for >10 hours
    • 207M events in total
  • Uptime: 80% over the best 2 weeks
    • Achieved 100% of the 4 weeks
  • Use of Frontier for DB access to the prompt-reconstruction conditions
    • The CSA challenge was the first opportunity to test this at large scale with developed reconstruction software
    • Initial difficulties encountered during commissioning, but patches and reduced logging allowed full inclusion into CSA

  12. Data Processing & Placement
  • Reminder: in the CMS model, each Tier-1 receives only a fraction of the total RAW+RECO
  • Tier-1 destinations are chosen to match analysis interest without exceeding site storage capacity or bandwidth from the Tier-0

  13. Tier-0→Tier-1 Transfers
  • Goal was to sustain 150 MB/s to the T1s
    • Twice the expected 40 Hz output rate
  • Last week's averages hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. exceeded 2008 levels for ~10 days (with some backlog observed)
  • [Plot: monthly T1 transfer volume, marking the start of signal samples, the target rate, a min-bias-only period at the start, and the T0 rate stepping 54 → 110 → 170 → 160 Hz]
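
The 150 MB/s goal can be cross-checked against the slide-2 event sizes; this is a sketch assuming the FEVT bundle shipped to the T1s is RAW + RECO:

```python
# ASSUMPTION: FEVT = RAW (1.5 MB/evt) + RECO (0.25 MB/evt), sizes from slide 2.
rate_hz = 40
fevt_mb_per_evt = 1.5 + 0.25
expected = rate_hz * fevt_mb_per_evt  # nominal T0 output: ~70 MB/s
goal = 2 * expected                   # factor-two headroom: ~140 MB/s
print(expected, goal)                 # 70.0 140.0, consistent with the 150 MB/s goal
```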

  14. Tier-1 Transfer Performance goals
  • 6 of 7 Tier-1s exceeded 90% availability over 30 days
  • The U.S. Tier-1 (FNAL) hit 2x its goal
  • 5 sites stored data to MSS (tape)

  15. Tier-1 Skim Jobs
  • Tested the workflow that reduces primary datasets to manageable sizes for analyses
  • Computing provided a centralised skim-job workflow at the T1s
    • 4 production teams
  • Secondary datasets are registered in the Dataset Bookkeeping Service and accessed like any other data
  • Common skim-job tools prepared, based on "MC truth" or reconstruction quantities (both types tested)
  • Overwhelming response from the CSA analysis demos
    • About 25 filters producing ~37 (+ 21 jet) datasets!
    • Variety of output formats (FEVT, RECO, AOD, custom)
    • Selected event fractions range from <1% to 100% (for the Jets split)
    • Sizes range from <0.001 TB to 2.5 TB
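
Conceptually, a skim job runs a selection over a primary dataset and writes the surviving events to a much smaller secondary dataset. A minimal sketch (not the actual CMS skim tools; the dijet selection and event records are hypothetical):

```python
from typing import Callable, Iterable

def skim(events: Iterable[dict], selector: Callable[[dict], bool]) -> list[dict]:
    """Keep only the events passing the analysis-level selection."""
    return [evt for evt in events if selector(evt)]

# Hypothetical selection: at least two jets with pT above 30 GeV.
dijet = lambda evt: sum(1 for pt in evt.get("jet_pt", []) if pt > 30.0) >= 2

events = [{"jet_pt": [45.0, 33.0]}, {"jet_pt": [12.0]}, {"jet_pt": [80.0, 31.0, 25.0]}]
print(len(skim(events, dijet)), "of", len(events), "events kept")  # 2 of 3 events kept
```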

  16. Job Execution on the Grid
  • >50K jobs/day submitted on all but one day of the final week
    • >30K/day robot jobs
    • 90% job completion efficiency
  • Robot jobs use the same mechanics as user job submissions via CRAB
    • 2 submission teams set up
  • Mostly T2 centres, as expected
    • OSG carries a large proportion
  • Scaling issues encountered, but subsequently solved
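
For context on what submission at this scale involves, here is a minimal sketch of the job-splitting step a tool like CRAB performs before submitting to the grid; the dataset size and events-per-job value are illustrative, not from the talk:

```python
import math

def split_into_jobs(total_events: int, events_per_job: int) -> list[tuple[int, int]]:
    """Return inclusive (first_event, last_event) ranges, one per grid job."""
    n_jobs = math.ceil(total_events / events_per_job)
    return [(i * events_per_job, min((i + 1) * events_per_job, total_events) - 1)
            for i in range(n_jobs)]

jobs = split_into_jobs(total_events=1_000_000, events_per_job=25_000)
print(len(jobs), "jobs; first:", jobs[0], "last:", jobs[-1])  # 40 jobs
```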

  17. CMS Tier-2: data transfers
  • 24 Tier-2 sites took part in the CSA06 transfers, GRIF among them
  • Example: the /CSA06-106-os-Jets0-0/RECO/CMSSW_1_0_6-RECO dataset (~3 TB), used for a fake-rate study of electrons from jets

  18. T1 Re-Reconstruction
  • Demonstrated re-reconstruction at the T1 centres, with access to the offline DB for new constants
    • 4 teams set up to run 100K events at each T1
  • Re-reconstruction demonstrated on >100K events at 6 T1s
    • 100% efficiency at CCIN2P3 (although on a small sample)
  • Initially ran into a problem with a couple of reconstruction modules
    • Had to drop pixel tracks and vertices out of ~100 modules, due to a technical issue with getting the products stored in the Event
  • For the Tracker and ECAL calibration exercises, the new constants inserted into the DB were used for re-reconstruction, and the resulting datasets were published and accessed
    • A full reprocessing workflow!

  19. 2007 MC production
  • 1_2_0 validation production completed
  • 03/07: production for HLT (1_3_0)
  • 04-05/07: production for physics (1_4_0)
  • [Plot annotation: stage-out problems]

  20. CMS Computing timeline 2007
  • Computing support for the preparation of the 2008 papers
    • Large-scale MC production: March → May 2007
    • Analysis → autumn 2007
    • Core software, final procedures and algorithms → autumn 2007
  • Computing, Software and Analysis challenge, CSA07
    • Computing model at ~50% scale
    • Data production and distribution at the Tier-1s
    • Skimming and re-reco at the Tier-1s, distribution to the Tier-2s
    • Analysis at the Tier-2s, together with MC production
    • → July 2007
  • Data taking: end 2007

  21. Conclusions
  • CMS is preparing for data taking
  • Activities at the CC-IN2P3 Tier-1 will refocus on its primary missions
    • CSA07 is the number-one objective of the first half of the year
    • Plus participation in MC production
  • The emphasis is now on the Tier-2s
    • Ramp-up of GRIF
      • MC production
      • Local analysis
    • Tier-2 at CC-IN2P3
  • Significant analysis needs in Q2-Q3 2007
