ATLAS Common Computing Readiness Challenge II and Full Dress Rehearsal II: a plan (version 1) ATLAS T0/1/2 Jamboree, CERN, April 24 Kors Bos, CERN/NIKHEF, ATLAS
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
Schedule: May
Schedule: June So we have to adapt to detector data taking. • Monday-Wednesday: Functional tests (data generator) • Thursday: analysis & changeover • Friday-Sunday: Detector data
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
-1- T0 processing and data distribution • Monday – Thursday: Data Generator • Simulate running of 10 hours @ 200 Hz per day • nominal is 14 hours • Run continuously at 40% • Distribution of data to T1’s and T2’s • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape • Request T2 storage class ATLASDATADISK • Request T1 storage space for the full 4 weeks • Tests of: • Data distribution • Distribution latency • T0-T1, T1-T2, T1-T1 • Thursday – Sunday: Detector Data • Possibly uninterrupted data taking during the weekend • Distribution of data to T1’s and T2’s • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape • Request T2 storage class ATLASDATADISK • Request T1 storage space for the full 4 weeks • Tests of: • Merging of small files • Real T0 processing • Data also to atldata at CERN • Special requests
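A quick arithmetic check of the generator settings above (a sketch; only the 200 Hz nominal rate and the 10 hours/day are taken from the slide): spreading 10 hours of nominal-rate data over 24 hours corresponds to a sustained rate of about 83 Hz, i.e. roughly the 40% duty cycle quoted.

    # Back-of-the-envelope check of the "run continuously at 40%" statement above.
    # Only the 200 Hz nominal rate and 10 hours/day come from the slide; the rest
    # is plain arithmetic.
    NOMINAL_RATE_HZ = 200.0    # nominal trigger rate
    HOURS_PER_DAY = 10.0       # simulated data-taking hours per day (nominal is 14)

    events_per_day = NOMINAL_RATE_HZ * HOURS_PER_DAY * 3600    # 7.2 Mevents/day
    sustained_rate = events_per_day / (24 * 3600)              # ~83 Hz around the clock
    duty_cycle = sustained_rate / NOMINAL_RATE_HZ              # ~0.42, i.e. roughly 40%

    print(f"{events_per_day / 1e6:.1f} Mevents/day, sustained {sustained_rate:.0f} Hz "
          f"({duty_cycle:.0%} of nominal)")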
ATLAS Data @ T0: reminder of the Computing Model • Raw data arrives on disk and is archived to tape • initial processing provides ESD, AOD and NTUP • a fraction (10%) of RAW and ESD and all AOD is made available on disk at CERN in the atldata pool • RAW data is distributed by ratio over the T1’s to go to tape • AOD is copied to each T1 to remain on disk and/or copied to T2’s • ESD follows the RAW to the T1 to remain on disk • a second ESD copy is sent to the paired T1 • we may change this distribution for early running
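As an illustration of the distribution rules above, a minimal sketch estimating daily per-T1 volumes. The T1 shares used here are placeholders, not the real ATLAS shares; per-event sizes are the ones quoted on the later slides.

    # Illustrative estimate of the daily per-T1 volumes implied by the rules above.
    # The T1 shares below are placeholders, NOT the real ATLAS shares.
    RAW_MB, ESD_MB, AOD_MB = 1.6, 1.0, 0.2
    EVENTS_PER_DAY = 7.2e6                                  # 10 h at 200 Hz

    t1_shares = {"T1_A": 0.25, "T1_B": 0.15, "T1_C": 0.10}  # hypothetical shares

    for site, share in t1_shares.items():
        raw_tb = EVENTS_PER_DAY * RAW_MB * share / 1e6      # RAW share, to tape
        esd_tb = EVENTS_PER_DAY * ESD_MB * share / 1e6      # own ESD share; the paired
                                                            # T1's copy arrives on top
        aod_tb = EVENTS_PER_DAY * AOD_MB / 1e6              # every T1 gets all AOD
        print(f"{site}: RAW {raw_tb:.1f} TB/day, ESD {esd_tb:.1f} TB/day, "
              f"AOD {aod_tb:.1f} TB/day")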
Tier-0 Data flow for ATLAS DATA [diagram: RAW, ESD and AOD flow from the t0atlas buffer to tape (ATLASDATATAPE) and disk (ATLASDATADISK) at the Tier-1’s; AOD is exported to Tier-1 and Tier-2 ATLASDATADISK, group analysis output goes to ATLASGRP, and end-user analysis at Tier-3 uses ATLASENDUSER]
Data sample per day, for when we run the generator for 1 day: 10 hrs @ 200 Hz = 7.2 Mevents/day. In the T0: • 20 TB/day RAW+ESD+AOD to tape • 1.2 TB/day RAW to disk (10%) • 0.7 TB/day ESD to disk (10%) • 1.4 TB/day AOD to disk • The 5-day t0atlas buffer must be 100 TB. Per-event sizes: RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB
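The per-day numbers above follow directly from 7.2 Mevents/day and the per-event sizes; a minimal sketch reproducing them:

    # Reproduces the per-day numbers on this slide from the per-event sizes.
    RAW_MB, ESD_MB, AOD_MB = 1.6, 1.0, 0.2
    EVENTS = 7.2e6                                           # 10 h at 200 Hz

    to_tape   = EVENTS * (RAW_MB + ESD_MB + AOD_MB) / 1e6    # ~20 TB/day RAW+ESD+AOD
    raw_disk  = EVENTS * RAW_MB * 0.10 / 1e6                 # ~1.2 TB/day (10% of RAW)
    esd_disk  = EVENTS * ESD_MB * 0.10 / 1e6                 # ~0.7 TB/day (10% of ESD)
    aod_disk  = EVENTS * AOD_MB / 1e6                        # ~1.4 TB/day (all AOD)
    buffer_5d = 5 * to_tape                                  # ~100 TB t0atlas buffer

    print(f"tape {to_tape:.1f} TB/day, disk {raw_disk:.1f} + {esd_disk:.1f} + "
          f"{aod_disk:.1f} TB/day, 5-day buffer {buffer_5d:.0f} TB")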
Tape & Disk Space Requirements, for when we run the generator for the 4 weeks of CCRC: 10 hrs @ 200 Hz = 7.2 Mevents/day; CCRC is 4 weeks of 28 days. In the T0: • 565 TB RAW+ESD+AOD to tape • 32 TB RAW to disk (10%) • 20 TB ESD to disk (10%) • 40 TB AOD to disk • The atldata disk must be 92 TB for the month. Per-event sizes: RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB
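The same arithmetic scaled to the 28-day CCRC period reproduces the monthly totals and the 92 TB atldata figure (a sketch):

    # Daily generator volumes scaled to the 28 days of CCRC08-2.
    RAW_MB, ESD_MB, AOD_MB = 1.6, 1.0, 0.2
    EVENTS = 7.2e6 * 28                                      # 28 days of the generator

    tape_tb    = EVENTS * (RAW_MB + ESD_MB + AOD_MB) / 1e6   # ~565 TB to tape
    raw_disk   = EVENTS * RAW_MB * 0.10 / 1e6                # ~32 TB
    esd_disk   = EVENTS * ESD_MB * 0.10 / 1e6                # ~20 TB
    aod_disk   = EVENTS * AOD_MB / 1e6                       # ~40 TB
    atldata_tb = raw_disk + esd_disk + aod_disk              # ~92 TB atldata pool
    print(f"tape {tape_tb:.0f} TB, atldata {atldata_tb:.0f} TB for the month")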
ATLAS Data @ T1 • T1’s are for data archive and re-processing • And for group analysis on ESD and AOD data • The RAW data share goes to tape @ T1 • A fraction (10%) of that also goes to disk • Each T1 receives a copy of all AOD files • Each T1 receives a share of the ESD files • ESD data sets follow the RAW data • An extra copy of that also goes to the “sister” T1 • BNL takes a full copy • In total 2.5 copies of all ESD files world-wide
ATLAS Data @ T2 • T2’s are for Monte Carlo Simulation Production • ATLAS assumes there is no tape storage available • Also used for Group Analysis • Each physics group has its own space token ATLASGRP<name> • E.g. ATLASGRPHIGGS, ATLASGRPSUSY, ATLASGRPMINBIAS • Some initial volume for testing: 2 TB • T2’s may request AOD datasets • Defined by the primary interest of the physics community • Another full copy of all AOD’s should be available in the cloud • Also for End-User Analysis • Accounted as T3 activity, not under ATLAS control • Storage space not accounted to ATLAS • But almost all T2’s (and even T1’s) need space for the token ATLASENDUSER • Some initial value for testing: 2 TB
Nota Bene • We had many ATLASGRP<group> storage areas but (almost) none were used • It seems, at this stage, that one for each VOMS group is over the top • Too much overhead to create so many small storage classes • For now, a catch-all: ATLASGRP • We may revert to per-group tokens later when we see the usage better
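For orientation, the space tokens referred to across these slides can be summarized as a simple mapping; the descriptions below paraphrase the slides and are an illustrative summary, not an authoritative or exhaustive list.

    # Overview of the space tokens referred to in this plan (descriptions
    # paraphrase the slides; illustrative only).
    SPACE_TOKENS = {
        "ATLASDATATAPE": "T1 tape archive for detector RAW/ESD/AOD",
        "ATLASDATADISK": "T1/T2 disk for detector data (ESD, AOD, 10% of RAW)",
        "ATLASMCTAPE":   "T1 tape archive for simulation output",
        "ATLASMCDISK":   "T1/T2 disk for simulation HITS/RDO/ESD/AOD",
        "ATLASGRP":      "catch-all physics-group analysis space (DPD's etc.)",
        "ATLASENDUSER":  "end-user analysis space, accounted as T3 activity",
    }

    for token, purpose in SPACE_TOKENS.items():
        print(f"{token:<15} {purpose}")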
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
-2- T1 re-processing • Not at full scale yet, but at all T1’s at least • Subset of M5 data staged back from tape per dataset • 10 datasets of 250 files each, plus 1 dataset of 5000 files • Each file is ~2 GB; total data volume is ~5 TB, and the big dataset is 10 TB • Conditions data on disk (140 files) • Each re-processing job opens ~35 of those files • M5 data file copied to the local disk of the WN • Output ESD and AOD files • Kept on disk and archived on tape (T1D1 storage class) • ESD files copied to one or two other T1’s • AOD files copied to all other T1’s
Resource requirements for M5 re-processing • M5 RAW data will be distributed over the T1’s • One dataset with 5000 files of 2 GB each • Pre-staging of 10 TB (50 cassettes) • Each job (1 file) takes ~30 minutes • We request 50 CPUs to get through in 2 days • Only (small) ESD output from re-processing • So minimal requirements for the T1D1 pool • A tape cache of 5 TB will require us to think • So a 5 TB requirement for the T1D0 pool
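A quick check of the estimate above; only the file count, file size, job time and CPU request come from the slide.

    # Quick check of the M5 re-processing estimate.
    N_FILES = 5000          # files in the big M5 dataset
    FILE_GB = 2             # ~2 GB per RAW file
    JOB_MIN = 30            # one file per job, ~30 minutes each
    N_CPUS  = 50            # requested slots

    prestage_tb = N_FILES * FILE_GB / 1000                  # ~10 TB staged from tape
    wall_days   = N_FILES * JOB_MIN / 60 / 24 / N_CPUS      # ~2 days on 50 CPUs
    print(f"pre-stage {prestage_tb:.0f} TB, ~{wall_days:.1f} days on {N_CPUS} CPUs")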
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
-3- T2 Simulation Production • Simulation of physics and background for FDR-2 • Need to produce ~30M events • Simulation HITS (4 MB/ev), Digitization RDO (2 MB/ev) • Reconstruction ESD (1.1 MB/ev), AOD (0.2 MB/ev) • Simulation is done at the T2’s • HITS uploaded to T1 and kept on disk • In T1: digitization, RDOs sent to BNL for mixing • In T1: reconstruction of ESD, AOD • ESD, AOD archived to tape at the T1 • ESD copied to one or two other T1’s • AOD copied to each other T1
Tape & Disk Space Requirements for the FDR-2 production: HITS = 4 MB, RDO = 2 MB, ESD = 1.1 MB, AOD = 0.2 MB per event; 0.5 Mevents/day; FDR-2 production is 8 weeks, 30M events. In total: • 120 TB HITS • 60 TB RDO (to BNL) • 33 TB ESD • 6 TB AOD
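These totals follow from 30M events and the per-event sizes; a sketch reproducing them, plus a check that 30M events at 0.5 Mevents/day is roughly the 8-week window quoted.

    # Reproduces the FDR-2 production totals and checks the production window.
    EVENTS = 30e6
    SIZES_MB = {"HITS": 4.0, "RDO": 2.0, "ESD": 1.1, "AOD": 0.2}

    for fmt, mb in SIZES_MB.items():
        print(f"{fmt}: {EVENTS * mb / 1e6:.0f} TB")          # 120 / 60 / 33 / 6 TB

    days_needed = EVENTS / 0.5e6                             # ~60 days at 0.5 Mevents/day
    print(f"~{days_needed:.0f} days at 0.5 Mevents/day (roughly the 8 weeks quoted)")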
Data flow for Simulation Production [diagram: Simulation at Tier-2 writes HITS to ATLASMCDISK; HITS are uploaded to Tier-1 ATLASMCDISK and archived to ATLASMCTAPE; pile-up/digitization and reconstruction at the Tier-1 produce RDO, ESD and AOD; RDO is sent to BNL for mixing into byte-stream (BS) for the Tier-0, and ESD/AOD are archived to tape and copied to the other Tier-1’s]
Storage Types @ T2 for simulation production. Additional storage types @ T1 for simulation production.
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
-4- T0/1/2 Physics Group Analysis • done at T0 & T1 & T2 … not at T3’s • production of primary Derived Physics Data files (DPD’s) • DPD’s are 10% of the AOD’s in size … but there are 10× more of them • primary DPD’s are produced from ESD and AOD at the T1’s • secondary DPD’s are produced at T1’s and T2’s • also other file types may be produced (ntup’s, hist’s) • jobs are always run by group managers; data always goes to/from disk • writable by group managers only, readable by all of ATLAS
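One consequence of the DPD sizing rule above, shown as a small sketch: ten times as many files at 10% of the AOD size gives a primary-DPD volume comparable to the AOD volume itself (the 40 TB AOD figure is the earlier CCRC monthly disk estimate).

    # 10x more files at 10% of the AOD size: total primary-DPD volume is
    # comparable to the AOD volume.
    AOD_TB = 40                        # AOD on disk for the CCRC month (earlier slide)
    dpd_tb = AOD_TB * 0.10 * 10        # ~40 TB of primary DPD's
    print(f"~{dpd_tb:.0f} TB of primary DPD's for the same period")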
Overall plan • T0 processing and data distribution • T1 data re-processing • T2 Simulation Production • T0/1/2 Physics Group Analysis • T0/1/2/3 End-User Analysis • Synchronously with Data from Detector Commissioning • Fully rely on srmv2 everywhere • Test now at real scale (no data deletions) • Test the full show: shifts, communication, etc.
-5- T0/1/2/3 End-User Analysis • done at T0 & T1 & T2 & T3’s • users can run (CPU) anywhere where there are ATLAS resources • but can only write where they have write permission (home institute) • Each site can decide how to implement this (T1D0, T0D1) • Data must be registered in the catalog • non-registered data is really Tier-3 or laptop • longer discussion tomorrow in the ATLAS Jamboree
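A toy illustration of the run-anywhere / write-at-home rule above; the user and storage-area names are made up and this is not the real grid or DDM machinery.

    # Toy illustration: users can run anywhere, but write only where they have
    # permission. Names below are invented for the example.
    WRITE_PERMISSION = {"alice": {"NIKHEF_ATLASENDUSER"}, "bob": {"CERN_ATLASENDUSER"}}

    def can_write(user: str, storage_area: str) -> bool:
        return storage_area in WRITE_PERMISSION.get(user, set())

    print(can_write("alice", "NIKHEF_ATLASENDUSER"))   # True: her home institute
    print(can_write("alice", "CERN_ATLASENDUSER"))     # False: may run there, not write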
Detailed Planning • Functional Tests using data generator • First week Monday through Thursday • T1-T1 tests for all sites (again) • Second week Tuesday through Sunday • Throughput Tests using data generator • Third week Monday through Thursday • Contingency • Fourth week Monday through Sunday • Detector Commissioning with Cosmic Rays • Each week Thursday through Sunday • Reprocessing M5 data • Each week Tuesday through Sunday • Clean-up • each Monday • Remove all test data • last weekend: May 31 & June 1 • Full Dress Rehearsal • June 2 through 10
Metrics and Milestones • still to be defined
References • CCRC and Space Tokens Twiki: https://twiki.cern.ch/twiki/bin/view/Atlas/SpaceTokens#CCRC08_2_Space_Token_and_Disk_Sp • ADC Ops. eLog (certificate protected): https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/