ATLAS Distributed Computing, Kors Bos, Annecy, 18 May 2009
ATLAS Workflows (diagram): the Tier-0 with CASTOR, prompt reconstruction and the CAF (calibration & alignment, express-stream analysis); the Tier-1's for RAW re-processing and HITS reconstruction; the Tier-2's for simulation and analysis.
At the Tier-0
• RAW, data from the detector: 1.6 MB/ev
• ESD, Event Summary Data: 1.0 MB/ev
• AOD, Analysis Object Data: 0.2 MB/ev
• DPD, Derived Physics Data: 0.2 MB/ev
• TAG, event tag data: 0.01 MB/ev
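A back-of-the-envelope sketch in Python of what these per-event sizes mean in MB/s and TB/day, assuming the nominal 200 Hz output rate quoted on the next slide:

```python
# Rough data rates at the Tier-0, assuming a nominal 200 Hz output rate
# (figure quoted elsewhere in this talk); event sizes are from the slide above.
EVENT_SIZES_MB = {
    "RAW": 1.6,   # data from the detector
    "ESD": 1.0,   # Event Summary Data
    "AOD": 0.2,   # Analysis Object Data
    "DPD": 0.2,   # Derived Physics Data
    "TAG": 0.01,  # event tag data
}
TRIGGER_RATE_HZ = 200  # nominal ATLAS output rate (assumption for this estimate)

for fmt, size_mb in EVENT_SIZES_MB.items():
    rate_mb_s = size_mb * TRIGGER_RATE_HZ          # MB/s written for this format
    tb_per_day = rate_mb_s * 86400 / 1e6           # using 1 TB = 1e6 MB
    print(f"{fmt}: {rate_mb_s:6.1f} MB/s  ~ {tb_per_day:5.1f} TB/day")
```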
From the detector: data streams, runs and RAW merging
• A start/stop falls between 2 luminosity blocks; each ~30-second luminosity block gives one file per stream
• All files of a run form a dataset
• 200 Hz for 30 seconds is 6000 events, but split between ~10 streams
• Streams are unequal in size and some create files that are too small
• Small RAW files are merged into 2 GB files (see the sketch below); only merged files are written to tape and exported
Physics streams: egamma, muon, Jet, Etmiss, tau, Bphys, minBias
Calibration streams:
• Inner Detector Calibration Stream: contains only partial events
• Muon Calibration Stream: contains only partial events, analyzed outside CERN
• Express line: full events, 10% of the data
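A minimal sketch of the greedy grouping behind "small RAW files are merged into 2 GB files"; the file names and the 60 MB-per-luminosity-block size are invented for illustration, not real stream sizes:

```python
# Minimal sketch: greedily group small per-luminosity-block RAW files of one
# stream into merge jobs of roughly 2 GB each (all sizes here are illustrative).
TARGET_BYTES = 2 * 1024**3  # ~2 GB merged output files

def plan_merges(files):
    """files: list of (name, size_bytes); returns a list of merge groups."""
    groups, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > TARGET_BYTES:
            groups.append(current)            # close the current ~2 GB group
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Example: a sparse stream producing ~60 MB per ~30-second luminosity block.
lumiblock_files = [(f"data09.run.stream.lb{i:04d}.RAW", 60 * 1024**2)
                   for i in range(120)]
for i, group in enumerate(plan_merges(lumiblock_files)):
    print(f"merge job {i}: {len(group)} input files")
```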
Calibration and Alignment Facility (CAF)
Per run:
• The express line is used for real-time processing with the initial calibration, verified by DQ shifters
• The calibration data is processed on the CAF using the initial calibrations; the new calibrations go into the offline DB
• The express line is processed again with the new calibrations and verified by DQ shifters
• If necessary, fixes are applied and the express line is processed once more
A buffer holds several days of data:
• Reconstruction of all the data is then triggered
• The results are archived on tape, made available at CERN, and replicated to the other clouds
Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)
ATLAS STEP09: a Functional and Performance Test, for all 4 experiments simultaneously
What we would like to test
• The full computing model
• Tape writing and reading simultaneously in the Tier-1's and the Tier-0
• Processing priorities and shares in the Tier-1's and Tier-2's
• Monitoring of all those activities
• Simultaneously with the other experiments (to test shares)
• All at nominal rates for 2 weeks: June 1 - 14
• Full shift schedule in place, as for cosmics data taking
• As little disruption as possible for detector commissioning
Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)
The Common Computing Readiness Challenge of last year (T0 -> T1s throughput plot, MB/s): subscriptions were injected every 4 hours and immediately honored; a 12 h backlog was fully recovered in 30 minutes; all experiments were in the game.
Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)
2. Detector data re-processing (in the Tier-1's)
• Each Tier-1 is responsible for re-processing its share
• Pre-stage the RAW data back from tape to disk (see the sketch after this list)
• Re-run the reconstruction (on average ~30 seconds per event)
• Output ESD, AOD and DPD archived to tape
• Copy the AOD and DPD to all other 9 Tier-1's
• Distribute the AOD and DPD over the Tier-2's of 'this' cloud
• Copy the ESD to 1 other (sister) Tier-1
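A minimal sketch of the "pre-stage, then re-process" pattern in the first bullets: request all RAW files from tape up front, poll until each lands on disk, and only then release it to a reconstruction job. The stager here is simulated and the file names are invented; the real system would go through the site's mass-storage interface.

```python
# Toy pre-stage-then-process loop; poll_stager() stands in for the real
# mass-storage system, and the file names are placeholders.
import random

raw_files = [f"run142193.RAW._{i:04d}" for i in range(20)]   # toy input dataset
pending = set(raw_files)          # bulk stage request issued for all files

def poll_stager(pending):
    """Simulate tape staging: a few more files land on disk at each poll."""
    return {f for f in pending if random.random() < 0.4}

while pending:
    newly_staged = poll_stager(pending)
    pending -= newly_staged
    for f in sorted(newly_staged):
        print(f"staged {f} -> submit RAW->ESD reconstruction job")
print("all files staged; ESD/AOD/DPD outputs go back to tape and out via DDM")
```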
Re-processing workflow (diagram); 'mAOD' and 'mDPD' here mean merged AOD and DPD files.
Spring09 re-processing campaign
• Total input data (RAW): 138 runs, 852 containers, 334,191 files, 520 TB
• https://twiki.cern.ch/twiki/pub/Atlas/DataPreparationReprocessing/reproMarch09_inputnew.txt
• Total output data (ESD, AOD, DPD, TAG, NTUP, etc.): 12,339 containers, 1,847,149 files, 133 TB
• Compare with 116.8 TB last time; the increase is due to extra runs, additional DPD formats, etc.
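Quick arithmetic on the campaign numbers above (average file sizes and the output-to-input volume ratio):

```python
# Averages implied by the Spring09 campaign totals quoted on this slide.
raw_tb, raw_files = 520, 334_191
out_tb, out_files = 133, 1_847_149

print(f"average RAW file size:    {raw_tb * 1e6 / raw_files:7.0f} MB")   # ~1.6 GB
print(f"average output file size: {out_tb * 1e6 / out_files:7.0f} MB")   # ~70 MB
print(f"output/input volume:      {out_tb / raw_tb:.0%}")                # ~26%
```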
Simplified re-processing for STEP09
• The Spring09 campaign is too complicated to repeat here
• Simplify by just running RAW -> ESD, using jumbo tasks
• RAW is staged from tape; ESD is archived back onto tape
• The volume is smaller than with real data
• Increase the Data Distribution Functional Test (FT) to match the missing AOD/DPD traffic
Re-processing targets
• Re-process at 5x the rate of nominal data taking
• Be aware: the ESD is much smaller for cosmics than for collision data
• ESD file size is ~140 MB instead of ~1 GB
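What "5x nominal" means in aggregate, using the 1.6 MB/event RAW size and the 200 Hz rate quoted earlier in this talk; the equal split over 10 Tier-1's is an illustrative assumption, not the real share table:

```python
# Aggregate re-processing rate at 5x nominal data taking; equal Tier-1 shares
# are assumed here purely for illustration.
NOMINAL_HZ, RAW_MB, N_TIER1 = 200, 1.6, 10
REPROCESS_FACTOR = 5

raw_read_mb_s = REPROCESS_FACTOR * NOMINAL_HZ * RAW_MB   # summed over all Tier-1s
print(f"aggregate RAW read from tape: {raw_read_mb_s:.0f} MB/s")
print(f"per Tier-1, assuming equal shares: {raw_read_mb_s / N_TIER1:.0f} MB/s")
print(f"cosmics ESD files: {140 / 1000:.0%} of the nominal 1 GB file size")
```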
Tier-1 -> Tier-1 volumes and rates
• Re-processed data is distributed like the original data from the Tier-0 (sketched below)
• ESD to 1 partner Tier-1
• AOD and DPD to all other 9 Tier-1's (and CERN), and further on to the Tier-2's
• The AOD and DPD load is simulated through the DDM Functional Test
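A sketch of this fan-out as a list of dataset subscriptions; the site names, the sister-site mapping and the dataset names are placeholders, not the real DDM topology:

```python
# Tier-1 -> Tier-1 fan-out for one re-processed run: ESD to the sister Tier-1,
# AOD and DPD to the nine other Tier-1s plus CERN; each receiving Tier-1 then
# serves the Tier-2s of its own cloud. All names below are invented.
TIER1S = [f"T1_{i:02d}" for i in range(10)]                     # placeholder sites
SISTER = {t1: TIER1S[(i + 1) % len(TIER1S)] for i, t1 in enumerate(TIER1S)}

def fan_out(source, run):
    subs = [(f"{run}.ESD", SISTER[source])]                     # ESD to sister T1
    for dest in [t1 for t1 in TIER1S if t1 != source] + ["CERN"]:
        subs += [(f"{run}.AOD", dest), (f"{run}.DPD", dest)]    # AOD+DPD everywhere
    return subs

subs = fan_out("T1_03", "repro.run142193")
for dataset, dest in subs[:4]:
    print(dataset, "->", dest)
print("total Tier-1-level subscriptions:", len(subs))
```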
Tier-1 -> Tier-2 volumes and rates
• The Computing Model foresaw 1 copy of AOD+DPD per cloud
• Tier-2 sites vary hugely in size and many clouds export more than 1 copy
Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)
G4 Monte Carlo Simulation Production
• G4 simulation takes ~1000 s/event; digi+reco takes ~20-40 s/event
• EVNT = 0.02 MB/event
• HITS = 2.0 MB/event
• RDO = 2.0 MB/event
• ESD = 1.0 MB/event
• AOD = 0.2 MB/event
• TAG = 0.01 MB/event
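Rough per-core throughput implied by these numbers (illustrative arithmetic only):

```python
# Per-core simulation throughput implied by the times and sizes above.
G4_S_PER_EVENT = 1000          # full G4 simulation time per event
HITS_MB_PER_EVENT = 2.0

events_per_core_day = 86400 / G4_S_PER_EVENT
hits_mb_per_core_day = events_per_core_day * HITS_MB_PER_EVENT
print(f"~{events_per_core_day:.0f} events/core/day, "
      f"~{hits_mb_per_core_day:.0f} MB of HITS/core/day")
print(f"~{1_000_000 / events_per_core_day:,.0f} cores needed for 1M events/day")
```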
MC Simulation Production statistics • Only limited by requests and disk space
G4 simulation volumes
• MC09 should have started during STEP09
• Runs exclusively on Tier-2 resources
• The rate will be lower because of the other activities
• Small HITS files produced in the Tier-2's are uploaded to the Tier-1
• They are merged into jumbo HITS files and written to tape at the Tier-1
• Merged MC08 data read back from tape will be used for reconstruction
• AOD (and some ESD) is written back to tape and distributed like data
Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)
4. User analysis
• Mainly done in the Tier-2's
• 50% of the capacity should be reserved for user analysis
• We already see at least 30% analysis activity at some sites
• In addition, some Tier-1 sites have analysis facilities
• We must make sure analysis does not disrupt the scheduled Tier-1 activities
• We will also use the HammerCloud analysis test framework
• It contains 4 different AOD analyses and can generate a constant flow of jobs (see the sketch after this list)
• It uses both the WMS and PanDA back-ends in EGEE
• Tier-2's should install the following shares:
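A toy illustration of the "constant flow of jobs" idea mentioned above: keep a fixed number of analysis jobs in flight per site and top the queue up as jobs finish. This is not HammerCloud code; the site names, the target of 50 jobs and the completion probability are all invented.

```python
# Keep a constant number of analysis jobs in flight per site (toy model).
import random

TARGET_IN_FLIGHT = 50                      # jobs we try to keep running per site
sites = {"SITE_A": [], "SITE_B": []}       # hypothetical Tier-2 analysis queues

def poll_and_top_up(site, in_flight):
    # pretend a random fraction of the running jobs finished since the last poll
    still_running = [j for j in in_flight if random.random() > 0.2]
    new_jobs = [f"{site}-job-{random.randrange(10**6)}"
                for _ in range(TARGET_IN_FLIGHT - len(still_running))]
    return still_running + new_jobs, len(new_jobs)

for cycle in range(3):                     # three polling cycles
    for site in sites:
        sites[site], submitted = poll_and_top_up(site, sites[site])
        print(f"cycle {cycle} {site}: submitted {submitted}, "
              f"in flight {len(sites[site])}")
```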
Putting it all together: Tier-1 volumes and rates for STEP09
• For CCIN2P3: ~10 TB for MCDISK, ~200 TB for DATADISK and ~55 TB on tape
• ~166 MB/s of data in and ~265 MB/s of data out (!?)
Tape usage for CCIN2P3
• Reading, 143 MB/s: RAW for re-processing; jumbo HITS for reconstruction
• Writing, 44 MB/s: RAW from the Tier-0; (merged) HITS from the Tier-2's; output from re-processing (ESD, AOD, DPD, ..); output from reconstruction (RDO, AOD, ..)
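Converting these rates into daily volumes and a rough count of busy tape drives, assuming ~80 MB/s sustained per drive (an assumption for illustration, not a CCIN2P3 figure):

```python
# Daily tape volumes and a rough drive count for the CCIN2P3 rates above;
# the 80 MB/s per-drive rate is an assumed round number.
READ_MB_S, WRITE_MB_S, DRIVE_MB_S = 143, 44, 80

for label, rate in (("read", READ_MB_S), ("write", WRITE_MB_S)):
    drives = -(-rate // DRIVE_MB_S)        # ceiling division
    print(f"{label}: {rate} MB/s ~ {rate * 86400 / 1e6:.1f} TB/day, "
          f">= {drives} drive(s) busy")
```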
What we ask of you ..
• That the sites check the numbers and the dates
• We know that CCIN2P3 cannot do automatic pre-staging
• Is enough cooling foreseen for the beginning of June?
• What is the capacity of the Tier-2's in the French cloud?
• That there is a person (or persons) keeping watch
• On overloaded systems, slowdowns, errors, ..
• On bandwidth saturation
• At least 1 person per site, Tier-1 and Tier-2
• We would like to collect names
• Help with gathering the information for the final report
• Post mortem July 9 - 10
What we offer you
• A twiki with detailed information
• The ATLAS meeting (also by phone) at 09:00
• The WLCG meeting (also by phone) at 15:00
• The operations meeting (also by phone) on Thursdays at 15:30
• A virtual meeting room on Skype (24/7)
• Several mailing lists
• Private email addresses
• Telephone numbers
• Our goodwill
Data Handling and Computation for Physics Analysis (diagram, les.robertson@cern.ch): the detector and event filter (selection & reconstruction) deliver raw data; reconstruction at the Tier-0 produces event summary data; event reprocessing and batch physics analysis at the Tier-1's extract analysis objects (by physics topic); event simulation runs at the Tier-2's; interactive physics analysis works from the processed data.
Storage Area’s @CERN T1 T1 T1 T1 afs Tape User MC data Detector data t0atlas t0merge default Re-processing data Scratch atlprod AOD AOD end-user analysis Detector data DPD DPD Group atldata CPUs CPUs CPUs CPUs CPUs CPUs CPUs CPUs CPUs CPUs CPUs CPUs physics group analysis Calibration data CAF T0 CPU CPU Xpress Stream atlcal calibration and alignment users space managers space