140 likes | 290 Views
ATLAS Computing Status and Plans. Richard P Mount SLAC National Accelerator Laboratory. Topics. Organization from March 1 Status – the last 12 months Plans – Preparation for Run 2 Vision of the Future. Software and Computing Organization – From March 1.
E N D
ATLAS ComputingStatus and Plans Richard P Mount SLAC National Accelerator Laboratory
Topics • Organization from March 1 • Status – the last 12 months • Plans – Preparation for Run 2 • Vision of the Future
Software and Computing Organization – From March 1 Software and Computing Coordination Richard Mount Eric Lancon (deputy) Software Rolf Seuster and Markus Elsing Distributed Computing (ADC) Simone Campana and Torre Wenaus • Major software effort underway preparing for Run 2: • Implementation Task Forces • Reorganization and rationalization of existing activities
CPU Usage March 2013 to January 2014 • Tier 1s: • Consistent above-pledge performance • Saturation most of the time MC Simulation User Analysis MC Reco Group Prod Group Analy • Tier 2s: • Consistent delivery of above-pledge and opportunistic resources • Saturation most of the time
High Level Trigger Farm Exploitation CERN T0 and CAF usage for grid jobs ATLAS HLT usage for grid jobs (bursts of over 15k jobs) • The HLT has about 10% of the total ATLAS CPU capacity • Its time-averaged availability for simulation is expected to be no more than 30%
Pending Jobs and Volume of Data Processed MC Simulation User Analysis MC Reco Group Prod Group Analy Must limit simulation to keep analysis turnround acceptable, always many pending requests, priority via physics coordination Analysis is the main driver of storage+network I/O capacity Total: > 1 Exabyte
Disk Space MC Production Primary (pinned) Default (pinned) Secondary (Dynamically Managed) Input Tier 1 Real Data Group Analysis User Analysis T1 and T2 disks are full, requiring regular deletion of less-recently-accessed data T1 dynamically managed space is currently too small (need to pin less data) MC Production Tier 2 Group Analysis Real Data User Analysis
Tier 1 Tape On track to saturate the 41 PB pledge Simulated Hits to be kept for ~1 year in future ESD no longer written in most cases Expect major growth of Group Data on tape. Raw Data Simulated Hits AOD ESD NTUP
Preparation for Run 2 • Guiding Principle: • Maximize physics capability while requiring resources that grow more slowly than LHC luminosity • Key disk and CPU efficiency improvements: • Improve reconstruction efficiency (target factor 2 to 3 in speed) • Improve full simulation efficiency • Implement the Integrated Simulation Framework supporting an optimal mix of full and fast simulation • Rationalize analysis workflow (less CPU/luminosity and less Disk/luminosity1for the same physics ) 1) Smaller data formats, fewer version of the largest datasets
Major improvements to analysis environment • Task forces • TF1, TF4: Define and implement new Event Data Model (xAOD); migrate to new vector/matrix library Eigen • TF2: Define and implement Reduction framework and train model • TF3: New Analysis framework and tools; generic tools for “Combined Performance” recommendations.
Preparation for Run 2 (continued) • Other key improvements: • Rucio – new scalable distributed data management system • ProdSys-2 (DEFT and JEDI) • More formalized and automated production management • Jobs automatically defined to meet the needs of computing resources • Ongoing developments • Federated Atlas Xrootd (and http) access • Potential optimization of disk use – test in DC14 • Radical reduction of pinned disk data • Test during 2014 • Stay clear of thrashing the tape system • “Event Server” technology to facilitate exploitation of opportunistic resources with unpredictable availability.
ATLAS Computing Longer Term Vision • Computing: • Major shifts in relative costs of CPU/Disk/Tape/Networks will continue • Need to be flexible in “store versus recompute” and “store locally versus get quickly or access directly from somewhere else” • Software: • Multi-threading (100s of threads) seems inevitable • Quality, intelligibility and supportability of software will be vital • Software for the Upgrade(s) • And Finally: • Beware of optimizing away our ability to discover the unexpected