
ATLAS Distributed Computing

Presentation Transcript


  1. ATLAS Distributed Computing Kors Bos Annecy, 18 May 2009

  2. ATLAS Workflows (overview diagram) • Calibration & alignment and express-stream analysis on the CAF • Prompt reconstruction at the Tier-0 (CASTOR) • RAW re-processing and HITS reconstruction at the Tier-1's • Simulation and analysis at the Tier-2's

  3. At the Tier-0 • RAW, data from the detector: 1.6 MB/ev • ESD, Event Summary Data: 1.0 MB/ev • AOD, Analysis Object Data: 0.2 MB/ev • DPD, Derived Physics Data: 0.2 MB/ev • TAG, data tag: 0.01 MB/ev
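
A quick back-of-the-envelope conversion of these per-event sizes into sustained rates, assuming the nominal 200 Hz trigger rate quoted on slide 5 (the rate is the only number not on this slide):

```python
# Rough per-format throughput at a nominal 200 Hz trigger rate.
# Event sizes are taken from the slide above; 200 Hz is quoted on slide 5.
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2, "DPD": 0.2, "TAG": 0.01}
TRIGGER_RATE_HZ = 200

for fmt, size_mb in EVENT_SIZE_MB.items():
    rate_mb_s = size_mb * TRIGGER_RATE_HZ          # MB written per second
    per_day_tb = rate_mb_s * 86400 / 1e6           # integrated over one day
    print(f"{fmt:4s}: {rate_mb_s:6.1f} MB/s  ~ {per_day_tb:5.1f} TB/day")
```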

  4. Reality is more complicated

  5. From the detector: data streams, runs and RAW merging • A start/stop is between 2 luminosity blocks • ~30 seconds → one file • All files in a run → one dataset • 200 Hz for 30 s is 6000 events, but split between ~10 streams • Streams are unequal and some produce files that are too small • Small RAW files are merged into 2 GB files • Only merged files are written to tape and exported • Physics streams: egamma, muon, jet, Etmiss, tau, Bphys, minBias • Calibration streams: Inner Detector Calibration Stream (partial events only) and Muon Calibration Stream (partial events only, analyzed outside CERN) • Express line: full events, 10% of the data
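
The merging argument in numbers, as a minimal sketch; the even split over the ~10 streams is a simplifying assumption (in reality the streams are unequal, which makes some files even smaller):

```python
# Why small RAW files need merging before tape archival and export.
TRIGGER_RATE_HZ = 200
LUMI_BLOCK_S = 30            # ~30 s of data per file
N_STREAMS = 10               # physics + calibration streams (unequal in reality)
RAW_EVENT_MB = 1.6
TARGET_FILE_GB = 2.0

events_per_block = TRIGGER_RATE_HZ * LUMI_BLOCK_S        # ~6000 events
events_per_stream = events_per_block / N_STREAMS         # simplifying even-split assumption
file_gb = events_per_stream * RAW_EVENT_MB / 1024        # size of one per-stream RAW file

print(f"{events_per_block} events/block, ~{file_gb:.2f} GB per stream file")
print(f"-> merge ~{TARGET_FILE_GB / file_gb:.1f} files to reach {TARGET_FILE_GB} GB")
```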

  6. Calibration and Alignment Facility (CAF) • Per run: • The express line is used for real-time processing with the initial calibration, verified by DQ shifters • The calibration data are processed in the CAF using the initial calibrations; the new calibrations go into the offline DB • The express line is processed again with the new calibrations and verified by DQ shifters • If necessary, fixes are applied and the express line is processed once more • A buffer holds several days of data • Reconstruction of all data is then triggered • Results are archived on tape, made available at CERN, and replicated to other clouds
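
A minimal control-flow sketch of this calibrate-verify-reprocess loop; every function here is a hypothetical placeholder for the step named on the slide, not an actual ATLAS tool:

```python
# Hypothetical sketch of the per-run CAF calibration loop (all names are placeholders).

def reconstruct_express_stream(run, calib_version):
    # stands in for the real-time processing of the express line
    return {"run": run, "calib": calib_version}

def data_quality_ok(results):
    # stands in for the DQ shifters' sign-off; here: accept once re-derived constants exist
    return results["calib"] >= 2

def derive_new_calibration(previous_version):
    # stands in for processing the calibration streams in the CAF
    return previous_version + 1

def calibrate_run(run, max_passes=3):
    calib = 1                                          # initial calibration constants
    for _ in range(max_passes):
        results = reconstruct_express_stream(run, calib)
        if data_quality_ok(results):
            break                                      # good enough: stop iterating
        calib = derive_new_calibration(calib)          # new constants into the offline DB
    return calib                                       # then trigger bulk reconstruction

print("calibration version used for bulk reconstruction:", calibrate_run(run=90001))
```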

  7. ATLAS Clouds

  8. French Tier-2

  9. Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)

  10. ATLAS STEP09 • A functional and performance test for all 4 experiments simultaneously

  11. What we would like to test • The full computing model • Tape writing and reading simultaneously in the Tier-1's and the Tier-0 • Processing priorities and shares in the Tier-1's and -2's • Monitoring of all those activities • Simultaneously with the other experiments (to test the shares) • All at nominal rates for 2 weeks: June 1 - 14 • A full shift schedule in place, as for cosmics data taking • With as little disruption as possible to detector commissioning

  12. Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)

  13. Detector Data Distribution

  14. The Common Computing Readiness Challenge of last year • All experiments in the game • Subscriptions injected every 4 hours and immediately honored • T0→T1s throughput (MB/s): a 12 h backlog was fully recovered in 30 minutes

  15. Tier-0 → Tier-1 rates and volumes

  16. Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)

  17. 2. Detector data re-processing (in the Tier-1's) • Each Tier-1 is responsible for re-processing its share • Pre-stage the RAW data back from tape to disk • Re-run the reconstruction (on average ~30 s per event) • Archive the output ESD, AOD and DPD to tape • Copy the AOD and DPD to all other 9 Tier-1's • Distribute the AOD and DPD over the Tier-2's of 'this' cloud • Copy the ESD to 1 other (sister) Tier-1
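
A small sketch of the replication fan-out these bullets describe; the site list ordering, the choice of sister Tier-1 and the Tier-2 names are illustrative examples only, not the actual STEP09 assignments:

```python
# Illustrative replication plan for the outputs of one Tier-1's re-processing share.
TIER1S = ["ASGC", "BNL", "CCIN2P3", "CNAF", "FZK", "NDGF", "PIC", "RAL", "SARA", "TRIUMF"]

def replication_plan(home, sister, cloud_tier2s):
    """Destinations for each output format (the home site keeps the tape archive)."""
    other_tier1s = [s for s in TIER1S if s != home]
    return {
        "tape": [home],                                   # ESD/AOD/DPD archived at home
        "ESD":  [sister],                                 # one partner Tier-1
        "AOD":  other_tier1s + ["CERN"] + cloud_tier2s,   # all other Tier-1's, CERN, own cloud
        "DPD":  other_tier1s + ["CERN"] + cloud_tier2s,
    }

# Example call: CCIN2P3 with a hypothetical sister site and example French-cloud Tier-2's.
plan = replication_plan("CCIN2P3", sister="BNL", cloud_tier2s=["GRIF", "LAPP", "LPC"])
print(len(plan["AOD"]), "AOD destinations;", "ESD goes to", plan["ESD"][0])
```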

  18. Re-processing workflow • In this diagram mAOD/mDPD means merged AOD/DPD files

  19. Spring09 re-processing campaign • Total input data (RAW): 138 runs, 852 containers, 334,191 files, 520 TB • https://twiki.cern.ch/twiki/pub/Atlas/DataPreparationReprocessing/reproMarch09_inputnew.txt • Total output data (ESD, AOD, DPD, TAG, NTUP, etc.): 12,339 containers, 1,847,149 files, 133 TB • Compare with last time (116.8 TB); the increase is due to extra runs, new DPD formats, etc.

  20. Simplified re-processing for STEP09 • The Spring09 campaign was too complicated • Simplify by just running RAW → ESD • Using jumbo tasks • RAW staged from tape • ESD archived back onto tape • The volume is smaller than with real data • Increase the Data Distribution Functional Test (FT) to match the missing AOD/DPD traffic

  21. Re-processing targets • Re-processing at 5x the rate of nominal data taking • Be aware: the ESD is much smaller for cosmics than for collision data • ESD file size 140 MB instead of 1 GB
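
What the 5x target means in throughput, as a rough sketch: the 200 Hz nominal rate and the 1.6 MB RAW event size come from earlier slides, and the cosmics ESD size per event is scaled from the 140 MB vs 1 GB file sizes quoted here (assuming the same number of events per file):

```python
# Aggregate re-processing rates at 5x nominal data taking (all Tier-1's combined).
NOMINAL_RATE_HZ = 200
SPEEDUP = 5
RAW_EVENT_MB = 1.6
ESD_EVENT_MB_DATA = 1.0                    # from the Tier-0 slide
ESD_EVENT_MB_COSMICS = 1.0 * 140 / 1024    # scaled by the 140 MB vs 1 GB file sizes

rate_hz = NOMINAL_RATE_HZ * SPEEDUP
print(f"RAW read from tape      : {rate_hz * RAW_EVENT_MB:7.0f} MB/s")
print(f"ESD written (cosmics)   : {rate_hz * ESD_EVENT_MB_COSMICS:7.0f} MB/s")
print(f"ESD written (real data) : {rate_hz * ESD_EVENT_MB_DATA:7.0f} MB/s")
```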

  22. Tier-1 → Tier-1 volumes and rates • Re-processed data are distributed like the original data from the Tier-0 • ESD to 1 partner Tier-1 • AOD and DPD to all other 9 Tier-1's (and CERN) • and further to the Tier-2's • The AOD and DPD load is simulated through the DDM FT

  23. Tier-1 → Tier-2 volumes and rates • The Computing Model foresaw 1 copy of AOD+DPD per cloud • Tier-2 sites vary hugely in size and many clouds export more than 1 copy

  24. Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)

  25. G4 Monte Carlo Simulation Production • G4 simulation takes ~1000 s/event • digi+reco takes ~20-40 s/event • EVNT = 0.02 MB/event • HITS = 2.0 MB/event • RDO = 2.0 MB/event • ESD = 1.0 MB/event • AOD = 0.2 MB/event • TAG = 0.01 MB/event
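
To put the ~1000 s/event of full simulation in perspective, a rough throughput estimate for a hypothetical Tier-2 farm (the 1000-core farm size is an arbitrary example, not a number from the slides):

```python
# Simulation throughput implied by the per-event CPU times above.
G4_SIM_S_PER_EVENT = 1000      # full Geant4 simulation
HITS_MB_PER_EVENT = 2.0
CORES = 1000                   # hypothetical example farm size

events_per_day = CORES * 86400 / G4_SIM_S_PER_EVENT
hits_tb_per_day = events_per_day * HITS_MB_PER_EVENT / 1e6
print(f"{events_per_day:,.0f} simulated events/day -> {hits_tb_per_day:.2f} TB of HITS/day")
```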

  26. MC Simulation Production statistics • Only limited by requests and disk space

  27. G4 simulation volumes • MC09 should have started during STEP09 • Run exclusively on Tier-2 resources • The rate will be lower because of the other activities • Small HITS files produced in the Tier-2's are uploaded to the Tier-1 • There they are merged into jumbo HITS files and written to tape • Merged MC08 data from tape will be used for the reconstruction • AOD (and some ESD) is written back to tape and distributed like data
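
A toy sketch of the merge step described here: greedily group small Tier-2 HITS files into roughly fixed-size jumbo batches before tape archival at the Tier-1; the 2 GB target and the 100 MB input size are example values, not ATLAS parameters:

```python
# Illustrative grouping of small HITS files into jumbo merge batches.
def plan_merges(file_sizes_mb, target_mb=2048):
    """Greedily group input files into batches of roughly target_mb each."""
    batches, current, current_size = [], [], 0
    for index, size in enumerate(file_sizes_mb):
        current.append(index)
        current_size += size
        if current_size >= target_mb:       # batch is big enough: close it
            batches.append(current)
            current, current_size = [], 0
    if current:
        batches.append(current)             # leftover partial batch
    return batches

# e.g. 50 small HITS files of ~100 MB each -> a few ~2 GB merged files
print([len(batch) for batch in plan_merges([100] * 50)])
```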

  28. Activity areas • Detector data distribution • Detector data re-processing (in the Tier-1’s) • MC Simulation production (in the Tier-2’s) • User analysis (in the Tier-2’s)

  29. 5. User analysis • Mainly done in the Tier-2's • 50% of the capacity should be reserved for user analysis • We already see at least 30% analysis activity at some sites • In addition, some Tier-1 sites have analysis facilities • We must make sure this does not disrupt the scheduled Tier-1 activities • We will also use the HammerCloud analysis test framework • It contains 4 different AOD analyses • It can generate a constant flow of jobs • It uses both the WMS and PanDA back-ends in EGEE • Tier-2's should install the following shares:
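
A toy illustration of what "a constant flow of jobs" means in practice: keep topping up the number of running test jobs toward a fixed target. This is only in the spirit of HammerCloud; the submission call, the target of 20 jobs and the site name are made up for the example:

```python
# Toy job-flow generator in the spirit of (but not taken from) HammerCloud.
import random

ANALYSES = ["AOD_analysis_1", "AOD_analysis_2", "AOD_analysis_3", "AOD_analysis_4"]
TARGET_RUNNING = 20            # arbitrary example: jobs to keep in flight per site

def submit(analysis, site):
    # placeholder for a real WMS or PanDA submission
    print(f"submitting {analysis} to {site}")

def keep_flow(site, running_now):
    """Top up the queue so roughly TARGET_RUNNING test jobs stay active."""
    for _ in range(max(0, TARGET_RUNNING - running_now)):
        submit(random.choice(ANALYSES), site)

keep_flow("SOME-EXAMPLE-TIER2", running_now=17)   # would submit 3 more jobs
```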

  30. Putting it all together: Tier-1 volumes and rates for STEP09 • For CCIN2P3: • ~10 TB for MCDISK, ~200 TB for DATADISK and ~55 TB on tape • ~166 MB/s data in and 265 MB/s data out (!?)
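
Converting the quoted rates into integrated volumes over the 2-week STEP09 window (June 1 - 14) gives a rough cross-check of the disk figures above; the only assumption is that the rates are sustained for the full 14 days:

```python
# Integrated CCIN2P3 data volumes over the STEP09 window, from the quoted rates.
RATE_IN_MB_S = 166
RATE_OUT_MB_S = 265
DAYS = 14                       # June 1 - 14

def total_tb(rate_mb_s, days):
    return rate_mb_s * days * 86400 / 1e6

print(f"data in : {total_tb(RATE_IN_MB_S, DAYS):6.1f} TB over {DAYS} days")
print(f"data out: {total_tb(RATE_OUT_MB_S, DAYS):6.1f} TB over {DAYS} days")
```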

  31. Tape Usage For CCIN2P3 • Reading: 143 MB/s • RAW for re-processing • Jumbo HITS for reconstruction • Writing: 44 MB/s • RAW from the Tier-0 • (Merged) HITS from the Tier-2’s • output from re-processing (ESD, AOD, DPD, ..) • output from reconstruction (RDO, AOD, ..)

  32. What we ask of you • That the sites check the figures and the dates • we know that CCIN2P3 cannot do automatic pre-staging • is sufficient cooling foreseen for the beginning of June? • what is the capacity of the Tier-2's in the French cloud? • That there is a person (or people) keeping watch • on system overload, slowdowns, errors, .. • on bandwidth saturation • at least 1 person per site, Tier-1 and Tier-2 • we would like to collect names • Help with gathering information for the final report • Post mortem July 9 - 10

  33. What we offer you • A twiki with detailed information • The ATLAS meeting (also by phone) at 09:00 • The WLCG meeting (also by phone) at 15:00 • The operations meeting (also by phone) on Thursdays at 15:30 • The virtual meeting room on Skype (24/7) • Several mailing lists • Private email addresses • Telephone numbers • Our goodwill

  34. The End

  35. Data Handling and Computation for Physics Analysis (overview diagram, les.robertson@cern.ch) • detector → event filter (selection & reconstruction) → raw data • reconstruction (Tier-0) → event summary data • event reprocessing (Tier-1) and event simulation (Tier-2) → processed data • batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis

  36. Storage areas @ CERN (diagram) • t0atlas / t0merge: detector data for the Tier-0 • default: scratch • atlprod: MC and re-processing data • atldata: detector data (AOD, DPD) for end-user and physics group analysis • atlcal: calibration data and express stream, for calibration and alignment on the CAF • plus afs, tape, users space and managers space, served by the T0 and CAF CPU farms, with exports to the Tier-1's

  37. T1 Space Token Summary

  38. T2 Space Token Summary
