
ATLAS Common Computing Readiness Challenge II and Full Dress Rehearsal II: a plan (version 1)


Presentation Transcript


  1. ATLAS Common Computing Readiness Challenge II and Full Dress Rehearsal II: a plan (version 1)
  ATLAS T0/1/2 Jamboree, CERN, April 24
  Kors Bos, CERN/NIKHEF, ATLAS

  2. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  3. Schedule: May

  4. Schedule: June
  We have to adapt to detector data taking:
  • Monday-Wednesday: functional tests (data generator)
  • Thursday: analysis & changeover
  • Friday-Sunday: detector data

  5. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  6. -1- T0 processing and data distribution
  Monday - Thursday: data generator
  • Simulate running of 10 hours @ 200 Hz per day (nominal is 14 hours)
    • Run continuously at 40%
  • Distribution of data to T1s and T2s
    • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
    • Request T2 storage class ATLASDATADISK
    • Request T1 storage space for the full 4 weeks
  • Tests of: data distribution, distribution latency, T0-T1, T1-T2 and T1-T1 transfers
  Thursday - Sunday: detector data
  • Possibly uninterrupted data taking during the weekend
  • Distribution of data to T1s and T2s
    • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
    • Request T2 storage class ATLASDATADISK
    • Request T1 storage space for the full 4 weeks
  • Tests of: merging of small files, real T0 processing, data also to atldata at CERN, special requests

  7. ATLAS Data @ T0 (reminder of the Computing Model)
  • Raw data arrives on disk and is archived to tape
  • Initial processing provides ESD, AOD and NTUP
  • A fraction (10%) of RAW and ESD, and all AOD, is made available on disk at CERN in the atldata pool
  • RAW data is distributed by ratio over the T1s, to go to tape
  • AOD is copied to each T1 to remain on disk and/or copied to T2s
  • ESD follows the RAW to the T1, to remain on disk
    • A second ESD copy is sent to the paired T1
  • We may change this distribution for early running
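As a reading aid, the export policy above can be written down as a small table. The following Python sketch is purely illustrative (the dictionary layout and labels are not part of any ATLAS tool); the destinations and fractions are the ones listed on the slide.

# Illustrative sketch of the Tier-0 export policy described on the slide above.
# Destinations and fractions come from the slide text; the dictionary layout
# and labels are hypothetical.

T0_EXPORT_POLICY = {
    # data type -> list of (destination, fraction of the data)
    "RAW": [("CERN tape", 1.0),
            ("atldata disk at CERN", 0.1),
            ("T1 tape, shared by ratio over the T1s", 1.0)],
    "ESD": [("atldata disk at CERN", 0.1),
            ("T1 disk, following the RAW share", 1.0),
            ("paired ('sister') T1 disk, second copy", 1.0)],
    "AOD": [("atldata disk at CERN", 1.0),
            ("disk at every T1, and on to T2s", 1.0)],
}

if __name__ == "__main__":
    for data_type, destinations in T0_EXPORT_POLICY.items():
        print(data_type)
        for destination, fraction in destinations:
            print(f"  {fraction:4.0%} -> {destination}")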

  8. Tier-0 data flow for ATLAS DATA
  [Diagram: RAW, ESD and AOD flow from the t0atlas buffer at the Tier-0 to tape (ATLASDATATAPE) and disk (ATLASDATADISK) at the Tier-1s, on to the Tier-2s (ATLASDATADISK, and ATLASGRP for group analysis) and to Tier-3 (ATLASENDUSER) for end-user analysis.]

  9. Data sample per day (for when we run the generator for 1 day)
  10 hrs @ 200 Hz = 7.2 M events/day; RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB per event
  In the T0:
  • 20 TB/day RAW+ESD+AOD to tape
  • 1.2 TB/day RAW to disk (10%)
  • 0.7 TB/day ESD to disk (10%)
  • 1.4 TB/day AOD to disk
  The 5-day t0atlas buffer must be 100 TB (the arithmetic is sketched below).
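The per-day figures can be reproduced with a few lines of arithmetic. A minimal Python sketch, using only the rate and event sizes quoted on the slide:

# Back-of-the-envelope check of the per-day numbers quoted on the slide
# (10 h/day at 200 Hz; event sizes as given). Decimal units: 1 TB = 1e6 MB.

HOURS_PER_DAY = 10
RATE_HZ = 200
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}

events_per_day = HOURS_PER_DAY * 3600 * RATE_HZ            # 7.2 M events/day
tb_per_day = {k: events_per_day * size_mb / 1e6            # MB -> TB
              for k, size_mb in EVENT_SIZE_MB.items()}

to_tape = sum(tb_per_day.values())                          # ~20 TB/day RAW+ESD+AOD
raw_disk = 0.1 * tb_per_day["RAW"]                          # ~1.2 TB/day (10% of RAW)
esd_disk = 0.1 * tb_per_day["ESD"]                          # ~0.7 TB/day (10% of ESD)
aod_disk = tb_per_day["AOD"]                                # ~1.4 TB/day (all AOD)
t0_buffer = 5 * to_tape                                     # 5-day t0atlas buffer, ~100 TB

print(f"{events_per_day / 1e6:.1f} M events/day")
print(f"{to_tape:.1f} TB/day to tape; buffer for 5 days: ~{t0_buffer:.0f} TB")
print(f"disk: RAW {raw_disk:.1f}, ESD {esd_disk:.1f}, AOD {aod_disk:.1f} TB/day")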

  10. Tape & Disk Space Requirements (for when we run the generator for the 4 weeks of CCRC)
  10 hrs @ 200 Hz = 7.2 M events/day; CCRC is 4 weeks, i.e. 28 days; RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB per event
  In the T0:
  • 565 TB RAW+ESD+AOD to tape
  • 32 TB RAW to disk (10%)
  • 20 TB ESD to disk (10%)
  • 40 TB AOD to disk
  The atldata disk must be 92 TB for the month.
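Scaling the same per-day volumes to the 28 days of the challenge reproduces the totals above; a short continuation of the previous sketch:

# The per-day volumes from the previous sketch, scaled to 28 days of CCRC'08-2.

DAYS = 28
TB_PER_DAY = {"RAW": 11.52, "ESD": 7.2, "AOD": 1.44}   # exact per-day volumes

to_tape = DAYS * sum(TB_PER_DAY.values())              # RAW+ESD+AOD to tape (~565 TB on the slide)
raw_disk = round(DAYS * 0.1 * TB_PER_DAY["RAW"])       # ~32 TB (10% of RAW)
esd_disk = round(DAYS * 0.1 * TB_PER_DAY["ESD"])       # ~20 TB (10% of ESD)
aod_disk = round(DAYS * TB_PER_DAY["AOD"])             # ~40 TB (all AOD)
atldata = raw_disk + esd_disk + aod_disk               # ~92 TB of atldata disk for the month

print(f"tape: ~{to_tape:.0f} TB for the month")
print(f"atldata disk: ~{atldata:.0f} TB ({raw_disk} + {esd_disk} + {aod_disk})")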

  11. ATLAS Data @ T1
  • T1s are for data archive and re-processing
    • And for group analysis on ESD and AOD data
  • The RAW data share goes to tape @ T1
    • A fraction (10%) of that also goes to disk
  • Each T1 receives a copy of all AOD files
  • Each T1 receives a share of the ESD files
    • ESD datasets follow the RAW data
    • An extra copy of that also goes to the "sister" T1
    • BNL takes a full copy
    • In total 2.5 copies of all ESD files world-wide

  12. ATLAS Data @ T2
  • T2s are for Monte Carlo Simulation Production
    • ATLAS assumes there is no tape storage available
  • Also used for group analysis
    • Each physics group has its own space token ATLASGRP<name>, e.g. ATLASGRPHIGGS, ATLASGRPSUSY, ATLASGRPMINBIAS
    • Some initial volume for testing: 2 TB
  • T2s may request AOD datasets
    • Defined by the primary interest of the physics community
    • Another full copy of all AODs should be available in the cloud
  • Also for End-User Analysis
    • Accounted as T3 activity, not under ATLAS control
    • Storage space not accounted as ATLAS
    • But almost all T2s (and even T1s) need space for the token ATLASENDUSER
    • Some initial value for testing: 2 TB

  13. Nota Bene
  • We had many ATLASGRP<group> storage areas, but (almost) none were used
  • It seems, at this stage, that one for each VOMS group is over the top
    • Too much overhead to create so many small storage classes
  • For now, a catch-all: ATLASGRP
  • We may revert later when we better understand the usage

  14. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  15. -2- T1 re-processing
  • Not at full scale yet, but at all T1s at least
  • Subset of M5 data staged back from tape per dataset
    • 10 datasets of 250 files each, plus 1 dataset of 5000 files
    • Each file is ~2 GB, so the total data volume is ~5 TB; the big dataset is 10 TB
  • Conditions data on disk (140 files)
    • Each re-processing job opens ~35 of those files
  • M5 data file copied to the local disk of the worker node
  • Output ESD and AOD files
    • Kept on disk and archived on tape (T1D1 storage class)
    • ESD files copied to one or two other T1s
    • AOD files copied to all other T1s

  16. Resource requirements for the M5 re-processing
  • M5 RAW data will be distributed over the T1s
    • One dataset with 5000 files of 2 GB each
    • Pre-staging of 10 TB (50 cassettes)
  • Each job (1 file) takes ~30 minutes
    • We request 50 CPUs to be through in 2 days (see the sketch below)
  • Only (small) ESD output from re-processing, so minimal requirements for the T1D1 pool
  • A tape cache of 5 TB will require us to think, so a 5 TB requirement for the T1D0 pool
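A quick sanity check of these figures; the file counts, sizes and per-job time are the ones quoted on the two re-processing slides, the rest is arithmetic:

# Rough check of the M5 re-processing figures quoted above.

FILE_SIZE_GB = 2.0
SMALL_FILES = 10 * 250          # 10 datasets of 250 files each
BIG_FILES = 5000                # the single large dataset

small_tb = SMALL_FILES * FILE_SIZE_GB / 1000     # ~5 TB total for the small datasets
big_tb = BIG_FILES * FILE_SIZE_GB / 1000         # ~10 TB to pre-stage for the big one

JOB_MINUTES = 30                # one job processes one file
CPUS = 50
wall_days = BIG_FILES * JOB_MINUTES / 60 / CPUS / 24   # ~2 days on 50 CPUs

print(f"small datasets: ~{small_tb:.0f} TB, big dataset: ~{big_tb:.0f} TB")
print(f"{BIG_FILES} jobs x {JOB_MINUTES} min on {CPUS} CPUs -> ~{wall_days:.1f} days")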

  17. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  18. -3- T2 Simulation Production
  • Simulation of physics and background for FDR-2: need to produce ~30 M events
  • Simulation HITS (4 MB/ev), digitization RDO (2 MB/ev), reconstruction ESD (1.1 MB/ev), AOD (0.2 MB/ev)
  • Simulation is done at the T2
    • HITS uploaded to the T1 and kept on disk
  • In the T1: digitization; RDOs sent to BNL for mixing
  • In the T1: reconstruction into ESD and AOD
    • ESD and AOD archived to tape at the T1
    • ESD copied to one or two other T1s
    • AOD copied to each other T1

  19. Tape & Disk Space Requirements for the FDR-2 production
  HITS = 4 MB, RDO = 2 MB, ESD = 1.1 MB, AOD = 0.2 MB per event; 0.5 M events/day; FDR-2 production is 8 weeks, 30 M events
  In total:
  • 120 TB HITS
  • 60 TB RDO to BNL
  • 33 TB ESD
  • 6 TB AOD
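The totals follow directly from the per-event sizes and the ~30 M events; a minimal check:

# Quick check of the FDR-2 simulation volumes quoted on the slide.

EVENTS = 30e6                                     # ~30 M events over 8 weeks
SIZE_MB = {"HITS": 4.0, "RDO": 2.0, "ESD": 1.1, "AOD": 0.2}

for fmt, size_mb in SIZE_MB.items():
    print(f"{fmt}: {EVENTS * size_mb / 1e6:.0f} TB")   # 120 / 60 / 33 / 6 TB

# 0.5 M events/day for 8 weeks (56 days) gives ~28 M events,
# consistent with the ~30 M quoted above.
print(f"0.5 M/day x 56 days = {0.5 * 56:.0f} M events")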

  20. Data flow for Simulation Production
  [Diagram: HITS produced at the Tier-2s (ATLASMCDISK) are uploaded to the Tier-1 (ATLASMCDISK, ATLASMCTAPE); digitization and pile-up at the Tier-1 produce RDOs, which are sent to BNL for mixing into byte-stream (BS) data for the Tier-0; reconstruction at the Tier-1 produces ESD and AOD, which are archived to tape and copied to the other Tier-1s.]

  21. Storage Types @ T2 for simulation production. Additional storage types @ T1 for simulation production.

  22. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  23. -4- T0/1/2 Physics Group Analysis
  • Done at T0 & T1 & T2 … not at T3s
  • Production of primary Derived Physics Data files (DPDs)
    • DPDs are 10% of AODs in size … but there are 10x more of them (the arithmetic is sketched below)
  • Primary DPDs are produced from ESD and AOD at the T1s
  • Secondary DPDs are produced at the T1s and T2s
  • Other file types may also be produced (ntuples, histograms)
  • Jobs are always run by managers; data always goes to/from disk
  • Writable for group managers only, readable by all of ATLAS
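Taking the 10% / 10x figures above at face value, the aggregate primary DPD volume comes out roughly equal to the aggregate AOD volume. A minimal sketch, using the ~40 TB monthly AOD figure from slide 10 purely as an illustrative input:

# If each DPD is ~10% of an AOD in size but there are ~10x more of them,
# the total primary DPD volume is about the same as the total AOD volume.

DPD_SIZE_FRACTION = 0.1      # DPD size relative to an AOD (from the slide)
DPD_COUNT_FACTOR = 10        # roughly 10x more DPDs than AODs (from the slide)
AOD_VOLUME_TB = 40           # illustrative input: monthly AOD on disk, from slide 10

dpd_volume_tb = DPD_SIZE_FRACTION * DPD_COUNT_FACTOR * AOD_VOLUME_TB
print(f"primary DPDs: ~{dpd_volume_tb:.0f} TB (comparable to the AOD volume)")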

  24. Overall plan
  • T0 processing and data distribution
  • T1 data re-processing
  • T2 Simulation Production
  • T0/1/2 Physics Group Analysis
  • T0/1/2/3 End-User Analysis
  • Synchronously with data from detector commissioning
  • Fully rely on srmv2 everywhere
  • Test now at real scale (no data deletions)
  • Test the full show: shifts, communication, etc.

  25. -5- T0/1/2/3 End-User Analysis
  • Done at T0 & T1 & T2 & T3s
  • Users can run (CPU) anywhere there are ATLAS resources
    • But can only write where they have write permission (home institute)
  • Each site can decide how to implement this (T1D0, T0D1)
  • Data must be registered in the catalog
    • Non-registered data is really Tier-3 or laptop data
  • Longer discussion tomorrow in the ATLAS Jamboree

  26. Summary Table for a 10% T1

  27. Summary Table for a typical T2

  28. Detailed Planning
  • Functional tests using the data generator: first week, Monday through Thursday
  • T1-T1 tests for all sites (again): second week, Tuesday through Sunday
  • Throughput tests using the data generator: third week, Monday through Thursday
  • Contingency: fourth week, Monday through Sunday
  • Detector commissioning with cosmic rays: each week, Thursday through Sunday
  • Re-processing of M5 data: each week, Tuesday through Sunday
  • Clean-up: each Monday
  • Remove all test data: last weekend, May 31 & June 1
  • Full Dress Rehearsal: June 2 through 10

  29. Detailed Planning

  30. Metrics and Milestones • still to be defined

  31. References
  • CCRC and Space Tokens Twiki: https://twiki.cern.ch/twiki/bin/view/Atlas/SpaceTokens#CCRC08_2_Space_Token_and_Disk_Sp
  • ADC Ops. eLog (certificate protected): https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/
