ATLAS Computing Model – US Research Program Manpower
J. Shank
N.A. ATLAS Physics Workshop, Tucson, AZ, 21 Dec. 2004

Overview
• Updates to the Computing Model
  • The Tier hierarchy
  • The base numbers
  • Size estimates: T0, CAF, T1, T2
• US ATLAS Research Program Manpower
Computing Model
All Computing Model slides are from Roger Jones at the last software week: http://agenda.cern.ch/age?a036309
• http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/computing-model/Comp-Model-December15.doc
• Computing Model presented at the October Overview Week
  • Revisions since then concerning the Tier-2s
  • Revisions concerning the effect of pile-up and the luminosity profile
• There are (and will remain) many unknowns
  • We are starting to see serious consideration of calibration and alignment needs in the sub-detector communities, but there is a way to go!
  • Physics data access patterns MAY start to be seen from the final stage of DC2
    • Too late for the document
    • Unlikely to know the real patterns until 2007/2008!
  • Still uncertainties on the event sizes
    • RAW without pile-up is just over the 1.6 MB limit
    • ESD is (with only one tracking package) about 60% larger than nominal, 140% larger with pile-up
    • AOD is smaller than expected, but functionality will grow
  • With the advertised assumptions, we are at the limit of available disk
• The model must maintain as much flexibility as possible
• For review, we must present a single coherent model
Resource estimates
• These have been revised again
  • Luminosity profile for 2007-2010 assumed
  • More simulation (20% of the data rate)
  • Now only ~30 Tier-2s
    • We can count about 29 candidates
    • This means the average Tier-2 has grown, both because of simulation and because it represents a larger fraction of the total
  • The calibration needs presented in October have been used to update the CERN Analysis Facility resources
  • Input buffer added to the Tier-0
The System
[Tier-hierarchy diagram; 1 PC (2004) = ~1 kSpecInt2k. The recoverable figures are:]
• Detector → Event Builder: ~PB/s; Event Builder → Event Filter: 10 GB/s; Event Filter: ~7.5 MSI2k
• Event Filter → Tier-0: 320 MB/s; some data for calibration and monitoring go to the institutes, and calibrations flow back
• Tier-0 (Castor MSS): ~5 MSI2k, ~5 PB/year, no simulation; exports ~75 MB/s per Tier-1 for ATLAS (annual volumes are checked in the sketch below)
• 10 Tier-1s with MSS (e.g. US Regional Centre at BNL, UK Regional Centre at RAL, Dutch and French Regional Centres): ~2 MSI2k and ~2 PB/year per Tier-1; re-reconstruction, storage of simulated data, group analysis; 622 Mb/s links
• Tier-2 centres: ~200 kSI2k and ~200 TB/year each; 622 Mb/s links; each Tier-2 has ~20 physicists working on one or more channels, should have the full AOD, TAG and relevant Physics Group summary data, and the Tier-2s do the bulk of the simulation
• Tier-3s (~0.25 TIPS each): physics data caches, desktops and workstations; 100-1000 MB/s links
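The rates in the diagram imply the annual volumes it quotes. Below is a minimal back-of-envelope check, written as a Python sketch; the rates are taken from the diagram and the live time is the nominal 10^7 s/year used throughout the model, so only the unit conversion is added here.

```python
# Back-of-envelope check of the data-flow figures above (a sketch, not an
# official calculation). Rates are from the diagram; the live time is the
# nominal 1e7 s/year used throughout the model.
LIVE_SECONDS = 1e7            # nominal year: 200 days at 50% efficiency
EF_TO_T0_MB_S = 320.0         # Event Filter -> Tier-0 rate
T0_TO_T1_MB_S = 75.0          # Tier-0 export rate per Tier-1

raw_pb = EF_TO_T0_MB_S * LIVE_SECONDS / 1e9              # 1 PB = 1e9 MB
print(f"RAW into Tier-0:   ~{raw_pb:.1f} PB/year")       # ~3.2 PB/year
# First-pass ESD/AOD plus calibration data bring the Tier-0 archive
# up to the ~5 PB/year quoted in the diagram.

t1_import_pb = T0_TO_T1_MB_S * LIVE_SECONDS / 1e9
print(f"Import per Tier-1: ~{t1_import_pb:.2f} PB/year")  # ~0.75 PB/year
# The ~2 PB/year per Tier-1 in the diagram also covers reprocessed ESD/AOD
# and the simulated data that the Tier-1s store.
```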
Computing Resources
• Assumptions:
  • 200 days running in 2008 and 2009 at 50% efficiency (10^7 s live)
  • 100 days running in 2007 (5×10^6 s live) (see the live-time sketch below)
  • Events recorded are rate-limited in all cases – luminosity only affects data size and data-processing time
• Luminosity:
  • 0.5×10^33 cm^-2 s^-1 in 2007
  • 2×10^33 cm^-2 s^-1 in 2008 and 2009
  • 10^34 cm^-2 s^-1 (design luminosity) from 2010 onwards
• Hierarchy:
  • Tier-0 has RAW + calibration data + first-pass ESD
  • The CERN Analysis Facility has AOD, ESD and RAW samples
  • Tier-1s hold RAW data and derived samples and 'shadow' the ESD for another Tier-1
  • Tier-1s also house simulated data
  • Tier-1s provide reprocessing for their RAW and scheduled access to full ESD samples
  • Tier-2s provide access to AOD and group Derived Physics Datasets and carry the full simulation load
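A minimal sketch of the live-time arithmetic behind these assumptions. The ~200 Hz event-filter rate used for the event count is not stated on this slide; it is inferred from the 320 MB/s and 1.6 MB/event figures quoted earlier and is an assumption of the sketch.

```python
# Live time and recorded-event count implied by the running assumptions above
# (a sketch; the event-filter rate is inferred, not quoted on this slide).
SECONDS_PER_DAY = 86400

def live_seconds(days, efficiency=0.5):
    """Approximate live time for a running period."""
    return days * SECONDS_PER_DAY * efficiency

print(f"2007:      {live_seconds(100):.2e} s")   # ~4.3e6 s, rounded to 5e6 on the slide
print(f"2008/2009: {live_seconds(200):.2e} s")   # ~8.6e6 s, rounded to 1e7 on the slide

# Events recorded are rate-limited, so luminosity does not change the count.
EF_RATE_HZ = 320.0 / 1.6                         # 320 MB/s at 1.6 MB/event
print(f"Events per nominal year: ~{EF_RATE_HZ * 1e7:.1e}")   # ~2e9 events
```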
Processing
• Tier-0:
  • First-pass processing on the express/calibration streams
  • 24-48 hours later, process the full primary stream with reasonable calibrations
• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations (in steady state: with the same software version, to produce a coherent dataset)
  • Reprocess all resident RAW at year end with improved calibrations and software
The Input Numbers
• Nominal year: 10^7 s live
• Accelerator efficiency: 50%
Resource Summary (15 Dec. version)
Table 1: The estimated resources required for one full year of data taking in 2008 or 2009.
Amount of Simulation is a "free" parameter
[Resource comparison shown for two scenarios: simulation at 20% of the data volume vs. 100% of the data volume.]
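Because the simulation fraction is tunable, its effect on the Tier-2 load scales linearly, and the sketch below only illustrates that scaling for the two scenarios compared on the slide. The per-event simulation cost used here is a placeholder assumption, not a number from the slides.

```python
# Linear scaling of the simulation load with the "free" fraction parameter.
# SIM_KSI2K_S_PER_EVENT is a placeholder assumption for illustration only.
EVENTS_PER_YEAR = 2e9            # rate-limited real-data events (previous slides)
SIM_KSI2K_S_PER_EVENT = 100.0    # assumed full-simulation cost per event
WALL_SECONDS = 3.15e7            # simulation farms run year-round

for fraction in (0.2, 1.0):      # the two scenarios compared on the slide
    n_sim = fraction * EVENTS_PER_YEAR
    sustained_msi2k = n_sim * SIM_KSI2K_S_PER_EVENT / WALL_SECONDS / 1000.0
    print(f"simulate {fraction:.0%} of the data: {n_sim:.1e} events, "
          f"~{sustained_msi2k:.1f} MSI2k sustained across the Tier-2s")
```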
2008 T0 requirements
Understanding of the calibration load is evolving.

T0 Evolution – Total capacity
Note: detailed evolutions differ from the draft – revised and one bug fixed.

T0 Cost/Year Evolution
CERN Analysis Facility
• Small-sample chaotic reprocessing: 170 kSI2k
• Calibration: 530 kSI2k
• User analysis: ~1470 kSI2k – much increased
• This site does not share in the global simulation load
• The start-up balance would be very different, but we should try to respect the envelope
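For reference, the CAF CPU components quoted above sum as follows; this is a trivial sketch using exactly the numbers on the slide.

```python
# Sum of the CERN Analysis Facility CPU components quoted above (kSI2k).
caf_cpu_ksi2k = {
    "small-sample chaotic reprocessing": 170,
    "calibration": 530,
    "user analysis": 1470,
}
total = sum(caf_cpu_ksi2k.values())
print(f"CAF total: ~{total} kSI2k (~{total / 1000:.1f} MSI2k)")   # ~2170 kSI2k
```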
Analysis Facility Evolution

Analysis Facility Cost/Year Evolution
2008 Average T1 Requirements
• Typical Tier-1 Year-1 resources; this includes a '1 year, 1 pass' buffer
• ESD is 47% of disk and 33% of tape
• Current pledges are ~55% of this requirement (see the sketch below)
• Making event sizes bigger makes things worse!
• Estimate about 1800 kSI2k for each of 10 Tier-1s
• Central analysis (by groups, not users): ~1300 kSI2k
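A small sketch of the pledge shortfall noted above. Treating the ~55% pledge fraction as applying uniformly to the ~1800 kSI2k per-Tier-1 CPU estimate is an assumption for illustration; the pledges also cover disk and tape.

```python
# Gap between the average Tier-1 requirement and current pledges (sketch).
# The 55% pledge fraction is applied here to CPU only, as an illustration.
T1_REQUIRED_KSI2K = 1800          # quoted estimate per Tier-1
PLEDGE_FRACTION = 0.55            # "current pledges are ~55% of this requirement"

pledged = PLEDGE_FRACTION * T1_REQUIRED_KSI2K
shortfall = T1_REQUIRED_KSI2K - pledged
print(f"Pledged:   ~{pledged:.0f} kSI2k per Tier-1")
print(f"Shortfall: ~{shortfall:.0f} kSI2k per Tier-1, "
      f"or ~{shortfall * 10 / 1000:.1f} MSI2k across 10 Tier-1s")
```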
Single T1 Evolution

Single T1 Cost/Year Evolution
20-User Tier-2, 2008 Data Only
• User activity includes some reconstruction (algorithm development etc.)
• Also includes user simulation (increased)
• T2s also share the event simulation load (increased), but not the output data storage
20-user T2 Evolution

20-user T2 Cost Evolution
Overall 2008-only Resources ('One Full Year' Resources)
If a T2 supports private analysis, add about 1 TB and 1 kSI2k per user.

Overall 2008 Total Resources
If a T2 supports private analysis, add about 1.5 TB and 1.5 kSI2k per user (see the sketch below).
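The per-user increments above translate into the following extra capacity for the nominal 20-user Tier-2 used on the earlier slides; the increments themselves are exactly those quoted, the scaling is the only thing sketched.

```python
# Extra Tier-2 capacity needed if the site also supports private (per-user)
# analysis, using the per-user increments quoted on the two slides above.
def private_analysis_overhead(n_users, tb_per_user, ksi2k_per_user):
    """Disk (TB) and CPU (kSI2k) added on top of the group resources."""
    return n_users * tb_per_user, n_users * ksi2k_per_user

for label, tb, ksi2k in (("2008-only", 1.0, 1.0), ("2008 total", 1.5, 1.5)):
    disk, cpu = private_analysis_overhead(20, tb, ksi2k)   # nominal 20-user T2
    print(f"{label}: +{disk:.0f} TB disk, +{cpu:.0f} kSI2k CPU")
```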
Important points:
• Discussion on disk vs. tape storage at the Tier-1s
  • "Tape" in this discussion means low-access, slow, secure storage
• Storage of simulation
  • Assumed to be at the T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites
• Commissioning
  • These numbers are calculated for the steady state, but with the requirement of flexibility in the early stages
• The simulation fraction is an important tunable parameter in the T2 numbers!
Latencies
• On the input side of the T0, assume the following:
  • Primary stream – every physics event
    • Publications should be based on this; uniform processing
  • Calibration stream – calibration + copies of selected physics triggers
    • Needed to reduce the latency of processing the primary stream
  • Express stream – copies of high-pT events for 'excitement' and (with the calibration stream) for detector optimisation
    • Must be a small percentage of the total
• Express and calibration streams get priority in the T0
• New calibrations determine the latency for primary processing
  • The intention is to have primary processing within 48 hours
  • Significantly more would require a prohibitively large input buffer (see the buffer-size sketch below)
• Level of access to RAW?
  • Depends on the functionality of the ESD
  • Discussion of a small fraction of DRD – augmented RAW data
  • The software and processing model must support very flexible data formats
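The 48-hour primary-processing target directly sets the size of the Tier-0 input buffer. The sketch below uses the 320 MB/s EF→T0 rate from the networking slide that follows; buffer size is simply rate times latency.

```python
# Tier-0 input-buffer size implied by the primary-processing latency (sketch).
# The 320 MB/s rate is the EF -> Tier-0 figure from the networking slide.
EF_TO_T0_MB_S = 320.0

def input_buffer_tb(latency_hours, rate_mb_s=EF_TO_T0_MB_S):
    """Disk needed to hold raw data while it waits for new calibrations."""
    return rate_mb_s * latency_hours * 3600 / 1e6    # 1 TB = 1e6 MB

for hours in (24, 48, 96):
    print(f"{hours:3d} h latency -> ~{input_buffer_tb(hours):.0f} TB buffer")
# ~28 TB at 24 h and ~55 TB at 48 h; much beyond that, the buffer becomes
# the "prohibitively large" case noted above.
```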
Networking
• EF→T0: maximum 320 MB/s (450 MB/s with headroom)
• Off-site networking is now being calculated with David Foster
• Recent exercise with (almost) current numbers:
  • Traffic from the T0 to each Tier-1 is 75 MB/s – it will be more with overheads and contention (225 MB/s)
  • Significant traffic of ESD and AOD from reprocessing between T1s
    • 52 MB/s raw
    • ~150 MB/s with overheads and contention (see the sketch below)
• Dedicated networking test beyond DC2; plans in HLT
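The raw-to-provisioned ratios quoted above (75→225 MB/s and 52→~150 MB/s) both correspond to roughly a factor of three for overheads and contention; the sketch below just applies that factor.

```python
# Raw traffic vs. provisioned bandwidth, applying the ~x3 factor for
# overheads and contention implied by the numbers quoted above.
OVERHEAD_FACTOR = 3.0

flows_mb_s = {
    "T0 -> each Tier-1 (RAW/ESD/AOD export)": 75.0,
    "Tier-1 <-> Tier-1 (reprocessed ESD/AOD)": 52.0,
}
for name, raw in flows_mb_s.items():
    print(f"{name}: {raw:.0f} MB/s raw, "
          f"~{raw * OVERHEAD_FACTOR:.0f} MB/s provisioned")
```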
Conclusions and Timetable
• Computing Model documents required by 15th December
  • This is the last chance to alter things
  • We need to present a single coherent model
  • We need to maintain flexibility
  • Intend to produce a requirements and recommendations document
• Computing Model review in January 2005 (P. McBride)
  • We need to have serious inputs at this point
• Documents to the April RRBs
• MoU signatures in Summer 2005
• Computing & LCG TDR in June 2005
Calibration and Start-up Discussions
• Richard will present some comments from others on what they would like at start-up
  • Some would like a large e/mu second copy on disk for repeated reprocessing
• Be aware of the disk and CPU requirements (sketched below):
  • 10 Hz + 2 ESD versions retained = >0.75 PB on disk
  • The full sample would take 5 MSI2k to reprocess in a week
    • Requires scheduled activity or huge resources
    • If there are many reprocessings, you must either distribute them or work with smaller samples
• What were (are) we planning to provide?
  • At CERN:
    • 1.1 MSI2k in the T0 and Super T2 for calibration etc.
    • The T2 also has 0.8 MSI2k for user analysis
    • Super T2 with 0.75 PB disk, mainly AOD but could be more RAW+ESD to start
  • In the T1 cloud:
    • The T1 cloud has 10% of RAW on disk and 0.5 MSI2k for calibration
  • In the T2s:
    • 0.5 PB for RAW+ESD, which should allow small unscheduled activities
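A hedged sketch of the arithmetic behind the warning above. The 10 Hz rate, the two retained ESD versions, the >0.75 PB disk figure and the 5 MSI2k/week figure are from the slide; the per-event RAW and ESD sizes used to reproduce the disk number are illustrative assumptions (the ESD size in particular reflects a large start-up-era event).

```python
# Disk and CPU implied by the 10 Hz e/mu start-up sample (sketch).
# RAW_MB and ESD_MB are illustrative assumptions, not official sizes;
# the rates and totals they are checked against are from the slide.
LIVE_SECONDS = 1e7
RATE_HZ = 10.0
n_events = RATE_HZ * LIVE_SECONDS                    # 1e8 events

RAW_MB, ESD_MB, N_ESD = 1.6, 3.0, 2                  # assumed start-up-era sizes
disk_pb = n_events * (RAW_MB + N_ESD * ESD_MB) / 1e9
print(f"Disk for RAW + {N_ESD} ESD versions: ~{disk_pb:.2f} PB")   # ~0.76 PB

# The quoted 5 MSI2k for a one-week full reprocessing implies a per-event cost:
week_s = 7 * 86400
per_event = 5000 * week_s / n_events                 # kSI2k * s per event
print(f"Implied reconstruction cost: ~{per_event:.0f} kSI2k*s per event")  # ~30
```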
Reality check
[Snapshot of Tier-1 status shown.]
• Putting 10 Hz of e/mu on disk would require more than double the CERN disk
• We are already short of disk in the T1s (and the funding source is the same!)
• There is capacity in the T1s, so long as these sets are replaced with the steady-state sets as the year progresses
End of Computing Model talk.

U.S. Research Program Manpower
FY05 Software Manpower
• 5.75 FTE @ LBNL
  • Core/Framework
• 5 FTE @ ANL
  • Data Management
  • Event Store
• 5 FTE @ BNL
  • DB / Distributed Analysis / SW Infrastructure
• 1 FTE @ U. Pittsburgh
  • Detector Description
• 1 FTE @ Indiana U.
  • Improving end-user usability of Athena
Est. FY06 Software Manpower
• Maintain FY05 levels
• Move from PPDG to Program funds
  • Puts about a 1 FTE burden on the program
• Plus approximately 2 FTE at universities
• In the long term, the total program-funded effort at universities is expected to be about 7 FTE
FY07 and Beyond Software Manpower
• Reaching a plateau
• Maybe 1-2 more FTE at universities
• Obviously, manpower for physics analysis (students, post-docs) is going to have to come from the base program
  • We (project management) try to help get DOE/NSF base funding for all, but prospects have not been good
  • "Redirection" from the Tevatron is starting to happen, but it might not be enough for our needs in 2007