CMS Computing (Calcolo CMS)
M. Paganoni, CSN1, 18/9/07
Goals of Computing in 2007
• Support of global data taking during detector commissioning
  • commissioning of the end-to-end chain: P5 --> T0 --> T1s (tape)
  • data transfers and access through the complete DM system
  • 3-4 days every month starting in May
• Demonstrate physics analysis performance using the final software with high statistics (validation)
  • major MC production of up to 200M events started in March
  • analysis starts in June, finishes by September
• Ramp up the distributed computing at scale (CSA07)
  • challenge at 50% of the 2008 system scale
  • adding new functionalities: HLT farm (DAQ storage manager -> T0), T1-T1 and non-regional T1-T2 transfers
  • increase the user load for physics analysis
  • the crucial part starts on September 24th
Data processing: roles of the Tiers
• T0: prompt reconstruction, analysis/calibration object generation, ...
• Skimming
  • data selection at T1s based on physics filters (followed by data transfer to T2s for analysis)
  • exercised at large scale during CSA06 only (several filters running)
• Re-reconstruction
  • data reprocessing at T1s with access to new calibration constants
  • successfully demonstrated during CSA06 (0.1-1 k reco evts/T1, reading conditions data from the local FroNTier)
  • reprocessing-like DIGI-RECO MC production ramping up at T1s
• Simulation
  • submission of jobs (simulation code) at T2s
  • for the time being also at T1s (to catch all available SL4 resources)
• Analysis
  • submission of jobs (analysis code) at T2s
  • analysis users complemented by robotic submissions
  • for the time being also at T1s (due to MC data placed there)
Data processing: concepts and tools
• Computing Model: a data-placement based approach
  • jobs are sent where the data is, no data transfer in response to jobs
  • the data placement system calls for data replication prior to processing (if needed)
  • data is read directly from the storage system via posix-like I/O access protocols (no stage-in from local storage to the WNs)
• Computing Model: organized vs 'unpredictable' processing
  • scheduled data processing at T1s
  • organized MC production at T2s
  • 'unpredictable' user analysis at T2s
• Tools
  • ProdAgent: the unique CMS workflow management tool for MC production, data skimming and reprocessing (also reconstruction at Tier-0)
  • CRAB: the CMS tool for data analysis
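The data-placement approach above can be illustrated with a minimal sketch (this is not CMS code; the dataset name, site names and slot counts are invented for illustration): a job is routed to a site that already hosts a replica of its dataset, instead of moving data on demand.

```python
# Illustrative sketch of "jobs are sent where the data is":
# given a replica catalogue, pick a hosting site rather than transfer data.

def choose_site(dataset, replicas, site_free_slots):
    """Pick the replica-hosting site with the most free batch slots.

    replicas: dataset name -> list of sites holding a replica
    site_free_slots: site name -> number of free slots
    """
    hosts = replicas.get(dataset, [])
    if not hosts:
        # In the real model this would trigger data placement first.
        raise LookupError(f"no replica of {dataset}: request replication first")
    return max(hosts, key=lambda s: site_free_slots.get(s, 0))

# Hypothetical catalogue entries:
replicas = {"/TTbar/CSA07/RECO": ["T2_IT_Pisa", "T2_IT_Legnaro"]}
slots = {"T2_IT_Pisa": 120, "T2_IT_Legnaro": 40}
print(choose_site("/TTbar/CSA07/RECO", replicas, slots))  # -> T2_IT_Pisa
```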
CSA07 workflow
[Diagram: prompt reconstruction and calibration at the T0, data transfer to the T1s for skims and re-reconstruction, MC production upload from the T2s, skim download to the T2s]
Possible analysis workflows in CSA07
• First pass at Tier-0/CAF: RAW -> RECO, AOD
• RECO and AOD shipped to the Tier-1s
• Central analysis skims at the Tier-1s (AOD + analysis algorithms); skim output shipped to the Tier-2s
• Final analysis pre-selection at the Tier-2s: further selection, reduced output (fewer AOD collections)
• Final samples shipped to the Tier-3s: fast processing and FWLite
CSA07 goals
• CMS planned a Computing, Software and Analysis data challenge (Jun-Oct 07) to test the status of the system one year from data taking
• All the goals are scaled to 50% of the final ones
• 7 T1s and ~40 T2s participate
T0 -> T1 transfer Reliable tape archiving/retrieval still an issue
SAM for CMS
• SAM test outputs are published on the web (with per-site history and log files) to compile a site quality map
• They define a metric and are used to construct an availability matrix
• 37% of all sites, and 62% of Tier-0/1 sites, are over the 80% availability mark
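The availability metric can be sketched as follows (a toy illustration, not the SAM implementation; the site names and pass/fail records are invented):

```python
# Toy sketch of a SAM-style availability metric: each site has a list of
# test outcomes (True = passed), and sites are compared against a threshold.

def availability(results):
    """Fraction of SAM tests a site passed."""
    return sum(results) / len(results)

def fraction_over(site_results, threshold=0.80):
    """Fraction of sites whose availability meets the threshold."""
    ok = sum(1 for r in site_results.values() if availability(r) >= threshold)
    return ok / len(site_results)

sites = {
    "T1_IT_CNAF": [True] * 9 + [False],      # 90% available
    "T2_IT_Bari": [True] * 7 + [False] * 3,  # 70% available
}
print(fraction_over(sites))  # half of these toy sites clear the 80% mark
```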
Data processing: 1 year
• Currently >20k jobs/day
• Massive MC production
• Constant, significant presence of analysis jobs, complemented by JobRobot-driven submissions
• Middleware tests, CRAB Analysis Server tests, ...
[Plot: terminated jobs per 6 days over the past year, showing in sequence middleware testing, CSA06 job submission (dominated by the JobRobot), steady user analysis, more JobRobot activity, and massive MC production for CSA07]
JobRobot
• A simple wrapper around the job submission system
• Submits short analysis jobs via the Grid (EGEE/OSG), ~100 jobs/day/site to all CMS Tier-1/2s
• Web access to statistics and log files
• Same data, same code, same user at all Tiers, every day: the load is flat, and tuned by the robot managers
• A standard candle for user analysis: "why then do those jobs fail?"
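The idea can be sketched in a few lines (a minimal illustration, not the real JobRobot; the failure rate and submission function are invented stand-ins for Grid submission):

```python
# Minimal sketch of the JobRobot idea: submit the same short job to every
# site at a fixed daily rate and record per-site success rates.
import random

def submit_probe(site, rng):
    """Hypothetical stand-in for one Grid job; True = job succeeded."""
    return rng.random() > 0.1  # assume a 10% failure rate for illustration

def run_robot(sites, jobs_per_site=100, seed=0):
    """Return the measured success rate per site after one daily cycle."""
    rng = random.Random(seed)
    return {
        site: sum(submit_probe(site, rng) for _ in range(jobs_per_site)) / jobs_per_site
        for site in sites
    }

stats = run_robot(["T2_IT_Pisa", "T2_IT_Bari"])
print(stats)
```

Because the workload is identical everywhere, a site whose success rate drops below the others stands out immediately, which is what makes it a useful standard candle.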
LoadTest07
• Goal T2 -> *: ~2.5 TB/week; goal * -> T2: >12 TB/week
[Plots: weekly transfer volumes to and from CNAF, LNL, Roma, Bari and Pisa]

CSA07 MC event production
[Plots: produced events at Roma, LNL, Pisa and Bari]
Moving MC events around
• 65 Mevt/month (23 evt/s)
• 20 kjobs/day
• Occupancy ~50%
• Job efficiency ~80%
• Data moved >1 PB/month
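As a back-of-the-envelope check of these production figures (a sketch only; a full 30-day month gives ~25 evt/s, so the quoted ~23 evt/s presumably folds in some downtime):

```python
# Back-of-the-envelope check of the MC production rates quoted above.
events_per_month = 65e6
seconds_per_month = 30 * 86400

rate = events_per_month / seconds_per_month
print(f"{rate:.1f} evt/s")  # ~25 evt/s at full duty cycle; ~23 evt/s with downtime

jobs_per_day = 20_000
evts_per_job = events_per_month / 30 / jobs_per_day
print(f"{evts_per_job:.0f} events/job on average")
```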
CNAF usage
Problems related to:
• CASTOR
• lack of disk
• commissioning of the FTS channels
CNAF resources
New allocations: 65 TB, 1/3 on CASTOR (D0T1) and 2/3 on StoRM (D1T0)
Tier-2 status
• All the CMS T2s migrated to dCache in spring-summer 2007
• The migration to SLC4 at the T2s is proving much more complex and slower than expected (software distribution problems and limited central CMS manpower)
• After the July upgrade, urged by CMS, CASTOR (2.1.3) is more stable in its basic performance
  • opted for a single service class for the time being (for better competitiveness in imports from FNAL)
  • higher-order problems exposed (e.g. optimal tape usage in migration/recall)
  • it remains the main potential show-stopper for CMS activities
• All the FTS channels between CNAF and the INFN T2s have been commissioned
• >70% of the links from/to CNAF also commissioned in the DDT project
Roma
[Photos: the Knuerr racks, the thermal buffer (3 t of water), the water distribution system, the chiller]
• The works were delayed by more than 2 months
• Electrical and plumbing works finished
• Chiller, Knuerr racks and UPS installed and connected
• Final acceptance testing still pending
• For this reason Roma will run CSA07 at the current site and migrate immediately afterwards
• The new CPUs (8 quad-cores) will be used anyway, switching off machines of other experiments
Bari
• The INFN Section is preparing the new computing room
  • relocation and unification of all the computing resources of the Section
  • total area ~90 m2
  • the space inside the new room is adequate to host a possible Tier-2 (CMS+ALICE)
• Status of the refurbishment works (electrical and fire-prevention systems, water distribution for rack cooling):
  • the INFN Board of Directors of 20 July 2007 approved the inclusion of the works in INFN's Annual List of public works for 2007
  • the tender to select the designer has started (expected to conclude in September)
  • completion of the refurbishment is expected in the first months of 2008
• UPS, chiller and APC island (6 racks + 4 coolers) already purchased with 2006 funds (in addition to part of the infrastructure in use in the current computing room)
Monitoring
• Local monitoring with Ganglia
• EGEE official accounting site: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html
Links commissioned for CSA07
• Bari->CERN, Bari->CNAF, FNAL->Bari, FZK->Bari (status of Bari as of 10 September 2007)
• To commission a link one must transfer 1.7 TB per week = (4 x 300 GB/day) + 500 GB/day
• FNAL->Bari: 12 TB in 10 days
• Two links per direction commissioned since the start of CSA07
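The commissioning arithmetic above checks out directly:

```python
# Link-commissioning target: four days at 300 GB/day plus one 500 GB day.
weekly_gb = 4 * 300 + 500
print(weekly_gb)  # 1700 GB = 1.7 TB/week

# Sustained FNAL->Bari transfer quoted above: 12 TB in 10 days.
daily_tb = 12 / 10
print(daily_tb)  # 1.2 TB/day, well above the weekly commissioning target
```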
Software at CSA06/07
CSA06 (Sept/Oct 06):
• Total events processed >100M
• Performance (on 1 kSI2K CPUs): ~25 s/ev on ttbar, ~3 s/ev on minimum bias
• Memory tops at 500 MB/job after hours/thousands of events
• Crash rate < 10^-7/event
• First definition of the data tiers (FEVT, RECO, AOD)
• Re-reconstruction and skimming demonstrated
CSA07 (Sept/Oct 07):
• Also includes HLT and analysis-skim workflows working from raw data
• Extensive physics validation of the reconstruction code (the complete switch took 2 years)
• More realistic calibration flow
• Enhanced software: tracking optimized; electrons/photons optimal; b-tagging, tau tagging; many more jet algorithms; lots of vertexing algorithms; muons optimal; Particle Flow
Validation results
[Plots: muon pT resolution at pT = 10, 50, 100, 500 and 1000 GeV; tracking fully efficient for single muons at pT = 10 GeV across the tracker; electron classification and electron energy]
[Plots: tau HLT efficiencies (QCD and Z -> tau tau); b-tag efficiency (track counting); jet resolution, standard vs Particle Flow]
Visualization
IGUANA, a powerful visualization tool, is already used for commissioning, geometry checks and algorithm development
[Event display: a real cosmic muon seen during commissioning]
Performance
• HLT: 43 ms/event, OK with the CMS TDRs
• Offline reconstruction: minimum bias ~3 s/ev, high-pT QCD ~25 s/ev; TDR numbers give ~20 s/ev on today's machines (T = 25 kSI2K*s / P)
• Simulation: still a factor ~2 more than in the TDR (T = 90 kSI2K*s / P)
• Data sizes: currently event size (FEVT) = 1.5 MB; reduced events: RECO = 500 kB, AOD = 50 kB
  • a factor 2 more than anticipated, but a lot of room for improvement
  • RECO (500 kB/ev) breakdown: track RecHits 113 kB/ev, HCAL RecHits 42 kB/ev, ECAL RecHits 36 kB/ev, calo towers 26 kB/ev
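The kSI2K normalization used above can be made explicit (a sketch; the ~1.25 kSI2K figure for "today's machines" is my inference from the quoted 25 kSI2K*s cost giving ~20 s/ev):

```python
# T = cost in kSI2K*s; an event takes T / P seconds of wall time
# on a CPU of power P (in kSI2K).
def seconds_per_event(cost_ksi2k_s, cpu_power_ksi2k):
    return cost_ksi2k_s / cpu_power_ksi2k

# Reconstruction: 25 kSI2K*s -> 25 s/ev on a 1 kSI2K CPU,
# ~20 s/ev if "today's machines" are ~1.25 kSI2K (assumption).
print(seconds_per_event(25, 1.0))
print(seconds_per_event(25, 1.25))

# Simulation: 90 kSI2K*s, a factor ~2 above the TDR expectation.
print(seconds_per_event(90, 1.0))
```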
2008 requests
• The 2008 funding requests assume that the release of the sub judice funds (125 kEuro) is discussed in September
• We strongly ask that Pisa be approved as a T2, on the merits it has earned in the field
• The 2008 time profile requested by CMS is to have half of the resources by early April 2008 (WLCG agreement)
• The request for the T1 is to follow the "Forti plan", given that, between cosmics and LHC, real data taking will start in 2008
Detector commissioning
• TK: data processed at the TIF in 2007 (25% of the Tracker) = 20 TB --> 100 TB
• ECAL:
  • data taking with cosmic trigger (Global Running)
  • data taking in ECAL local mode with cosmic trigger
  • data taking in ECAL local mode with pedestal, test-pulse or laser trigger
• Non-zero-suppressed ECAL event = 1.8 MB
• Zero-suppressed CMS event = 1.5 MB (0.3 MB without the Tracker)
• Cosmic rate in the cavern = 100 Hz
• ECAL local trigger rate = 5 Hz
• 6 TB/day for a global run with cosmic trigger
• 500 GB/day for an ECAL local run
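The daily-volume arithmetic behind these figures can be sketched as rate x event size x live time; the live fractions below are my assumptions, chosen to show how the quoted totals follow from the quoted rates and sizes:

```python
# Daily data volume in TB from trigger rate (Hz), event size (MB)
# and the fraction of the day the run actually takes data.
def daily_volume_tb(rate_hz, event_mb, live_fraction=1.0):
    return rate_hz * event_mb * 86400 * live_fraction / 1e6

# Global cosmic run: 100 Hz x 1.5 MB is ~13 TB for a full day,
# so the quoted 6 TB/day implies roughly half a day of effective live time.
print(daily_volume_tb(100, 1.5))

# ECAL local run: 5 Hz x 1.8 MB is ~0.78 TB for a full day,
# ~0.5 TB/day at an assumed ~65% live time.
print(daily_volume_tb(5, 1.8, live_fraction=0.65))
```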
Bari requests
• SJ 2007
  • 5 WN (75 kSI2K) [17.5 kE]
  • 2 servers [7 kE]
• 2008
  • 12 WN (180 kSI2K) [45 kE]
  • 1 network switch [10 kE]
  • 90 TB non-SAN disk [135 kE]
Pisa requests
• SJ 2007
  • 20 TBN disk [30 kE]
  • storage head node [15 kE]
• 2008
  • 22 WN (330 kSI2K) [80 kE]
  • 125 TBN SAN disk [190 kE + 40 kE]
  • 1 core switch [25 kE]
Roma requests
• SJ 2007
  • 1 storage head node [15 kE]
  • 1 core switch [3.5 kE]
• 2008
  • 25 WN (375 kSI2K) [112 kE]
  • 100 TBN SAN disk + control server [150 kE + 25 kE]
  • 1 core switch at 10 Gb/s [25 kE]
  • remote control devices [10 kE]
Legnaro requests
• SJ 2007
  • 11 WN (165 kSI2K) [45 kE]
• 2008
  • 27 WN (400 kSI2K) [100 kE]
  • 120 TBN SAN disk [180 kE + 25 kE]
  • 8 service servers [28 kE]
  • 1 core switch [40 kE]
  • 2 rack switches [20 kE]
Summary of 2008 requests
Unit costs: 15 kSI2K/WN (dual-CPU quad-core); 0.25 E/SI2K; 1.5 E/GB; 5 kE/server (1 per 20 TB) for SAN
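Applying these unit costs to a site request reproduces the quoted figures up to rounding (a sketch; the per-site totals in the slides are evidently rounded):

```python
# Cost model from the unit costs above, applied to one site's 2008 request.
import math

def site_cost_keur(ksi2k, disk_tb, san=True):
    """Return (CPU, disk, server) costs in kE."""
    cpu = ksi2k * 1000 * 0.25 / 1000        # 0.25 E/SI2K, converted to kE
    disk = disk_tb * 1000 * 1.5 / 1000      # 1.5 E/GB, converted to kE
    servers = math.ceil(disk_tb / 20) * 5 if san else 0  # 5 kE/server, 1 per 20 TB
    return cpu, disk, servers

# Pisa 2008: 330 kSI2K and 125 TB of SAN disk.
print(site_cost_keur(330, 125))  # (82.5, 187.5, 35) kE vs quoted 80, 190, 40
```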