Atlas Computing
Alessandro De Salvo <Alessandro.DeSalvo@roma1.infn.it>
Terzo workshop sul calcolo dell'INFN, 5-2004

Outline
• Computing model
• Activities in 2004
• Conclusions

A. De Salvo – Terzo workshop sul calcolo nell'INFN, 27-5-2004
Atlas Data Rates per year
• Nominal year: 10^7 s
• Accelerator efficiency: 50%
Processing times
• Reconstruction
  • Time/event for reconstruction now: 60 kSI2k·s
  • We could recover a factor of 4:
    • a factor of 2 from running only one default algorithm
    • a factor of 2 from optimization
  • Foreseen reference: 15 kSI2k·s/event
• Simulation
  • Time/event for simulation now: 400 kSI2k·s
  • We could recover a factor of 4:
    • a factor of 2 from optimization (work already in progress)
    • a factor of 2 on average from the mixture of different physics processes (and rapidity ranges)
  • Foreseen reference: 100 kSI2k·s/event
• Number of simulated events needed: 10^8 events/year (a rough scale estimate is sketched below)
• Generate samples about 3-6 times the size of their streamed AOD samples
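To give a feeling for the scale implied by these reference numbers, here is a back-of-the-envelope sketch in Python (my own illustration, not an official estimate); it uses the 10^8 events/year and 100 kSI2k·s/event figures quoted above and assumes the work is spread evenly over one calendar year, ignoring efficiency factors.

# Rough scale estimate for the yearly simulation load, using the reference
# figures quoted above. Assumption: the work is spread evenly over a full
# calendar year of wall-clock time; efficiency factors are ignored here.

SECONDS_PER_YEAR = 3.15e7            # ~one calendar year of wall-clock time
SIM_EVENTS_PER_YEAR = 1e8            # simulated events needed per year
SIM_KSI2K_S_PER_EVENT = 100          # foreseen reference simulation time

total_work = SIM_EVENTS_PER_YEAR * SIM_KSI2K_S_PER_EVENT      # kSI2k * seconds
sustained_cpu = total_work / SECONDS_PER_YEAR                 # kSI2k, continuous

print(f"total simulation work : {total_work:.2e} kSI2k*s/year")
print(f"sustained CPU needed  : {sustained_cpu:.0f} kSI2k")   # ~320 kSI2k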
Production/analysis model
• Central analysis
  • Central production of tuples and TAG collections from ESD
  • Estimated data reduction to 10% of the full AOD
  • About 720 GB/group/annum
  • 0.5 kSI2k per event (estimate); quasi-real-time processing: 9 MSI2k
• User analysis
  • Tuple/stream analysis
  • New selections
  • Each user will perform 1/N of the non-central MC simulation load
  • Analysis of working-group samples and AOD
  • Private simulations
  • Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
  • Assume this is all done at Tier-2s
• DC2 will provide very useful information in this domain
Computing centers in Atlas
• Tiers defined by capacity and level of service
• Tier-0 (CERN)
  • Hold a copy of all raw data on tape
  • Copy all raw data in real time to the Tier-1s (the second copy is also useful for later reprocessing)
  • Keep calibration data on disk
  • Run first-pass calibration/alignment and reconstruction
  • Distribute ESDs to the external Tier-1s (1/3 to each one of 6 Tier-1s)
• Tier-1s (at least 6)
  • Regional centers
  • Keep on disk 1/3 of the ESDs and a full copy of the AODs and TAGs
  • Keep on tape 1/6 of the raw data
  • Keep on disk 1/3 of the currently simulated ESDs and on tape 1/6 of previous versions
  • Provide facilities for physics-group-controlled ESD analysis
  • Calibration and/or reprocessing of real data (once per year)
• Tier-2s (about 4 per Tier-1)
  • Keep on disk a full copy of the TAGs and roughly one full AOD copy per four Tier-2s
  • Keep on disk a small selected sample of ESDs
  • Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
  • Run central simulation
(an illustrative summary of these storage fractions is sketched below)
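Purely as an illustration of the placement rules above (not an official configuration, and the key names are my own), the storage fractions can be collected into a small Python structure:

# Illustrative summary of the storage responsibilities listed above.
# Values are fractions of the corresponding full dataset held at each tier;
# key names are hypothetical and only for readability.
DATA_PLACEMENT = {
    "Tier-0": {
        "raw_on_tape": 1.0,             # full copy of all raw data
        "calibration_on_disk": 1.0,
    },
    "Tier-1": {                         # at least 6 regional centres
        "esd_on_disk": 1 / 3,
        "aod_on_disk": 1.0,
        "tag_on_disk": 1.0,
        "raw_on_tape": 1 / 6,
        "simulated_esd_on_disk": 1 / 3,
        "previous_simulated_esd_on_tape": 1 / 6,
    },
    "Tier-2": {                         # about 4 per Tier-1
        "tag_on_disk": 1.0,
        "aod_on_disk": 1 / 4,           # roughly one full AOD copy per four T2s
        "esd_on_disk": "small selected sample",
    },
}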
Tier-1 Requirements (R. Jones – Atlas Software Workshop, May 2004)
• Processing for physics groups: 1760 kSI2k
• Reconstruction: 588 kSI2k
Tier-2 Requirements (R. Jones – Atlas Software Workshop, May 2004)
• Simulation: 21 kSI2k
• Reconstruction: 2 kSI2k
• Users: 176 kSI2k
• Total: 199 kSI2k
Tier 0/1/2 sizes
• Efficiencies (LCG numbers; R. Jones, Atlas Software Workshop, May 2004):
  • Scheduled CPU activity: 85% efficient
  • Chaotic CPU activity: 60% efficient
  • Disk usage: 70% efficient
  • Tape assumed 100% efficient
(a minimal sketch of how these factors are applied follows below)
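A minimal sketch of how such efficiency factors are typically applied, dividing a required capacity by the relevant efficiency to obtain the installed capacity. The example numbers are the Tier-1 reconstruction and Tier-2 user figures quoted on the previous slides; whether those quoted figures already include the corrections is not stated here, so this is only illustrative.

# Efficiency factors quoted above (LCG numbers).
EFFICIENCY = {
    "scheduled_cpu": 0.85,   # scheduled CPU activity
    "chaotic_cpu": 0.60,     # chaotic (user) CPU activity
    "disk": 0.70,            # disk usage
    "tape": 1.00,            # tape assumed fully efficient
}

def installed_capacity(required, kind):
    """Capacity to install so that 'required' is effectively delivered."""
    return required / EFFICIENCY[kind]

# Illustrative examples using figures from the previous slides.
print(installed_capacity(588, "scheduled_cpu"))   # Tier-1 reconstruction, ~692 kSI2k
print(installed_capacity(176, "chaotic_cpu"))     # Tier-2 user analysis, ~293 kSI2k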
Atlas Computing System (diagram; R. Jones – Atlas Software Workshop, May 2004)
• PC (2004) = ~1 kSpecInt2k
• Event Builder (~PB/s) → 10 GB/s → Event Filter (~159 kSI2k) → 450 Mb/s → Tier 0 (T0: ~5 MSI2k, HPSS mass storage, ~9 PB/year/T1, no simulation; some data for calibration and monitoring go to the institutes and calibrations flow back)
• Tier 0 ships ~300 MB/s per Tier-1 per experiment to the regional centres (UK Regional Centre (RAL), US Regional Centre, Italian Regional Centre, French Regional Centre): ~7.7 MSI2k and ~2 PB/year per Tier-1, with HPSS mass storage
• 622 Mb/s links connect Tier-1s to the Tier-2 centres (~200 kSI2k and ~200 TB/year per Tier-2; the Italian Tier-2s shown are NA, RM1, LNF, MI)
• Each Tier-2 has ~25 physicists working on one or more channels, should have the full AOD, TAG & relevant Physics Group summary data, and does the bulk of the simulation
• A physics data cache feeds desktop workstations at 100-1000 MB/s
Atlas computing in 2004
• "Collaboration" activities
  • Data Challenge 2 (DC2)
    • May-August 2004
    • Real test of the computing model for the Computing TDR (end 2004)
    • Simulation, reconstruction, analysis & calibration
  • Combined test-beam activities
    • Combined test-beam operation concurrent with DC2 and using the same tools
• "Local" activities
  • Single muon simulation (Rome1, Naples)
  • Tau studies (Milan)
  • Higgs production (LNF)
  • Other ad-hoc productions
Goals in 2004
• DC2/test-beam
  • Computing model studies
  • Pile-up digitization in Athena
  • Deployment of the complete Event Data Model and Detector Description
  • Simulation of the full Atlas detector and of the 2004 Combined Test Beam
  • Test of the calibration and alignment procedures
  • Full use of Geant4, POOL and other LCG applications
  • Wide use of the GRID middleware and tools
  • Large-scale physics analysis
  • Run as much of the production as possible on the GRID
  • Test the integration of multiple GRIDs
• "Local" activities
  • Run local, ad-hoc productions using the LCG tools
DC2 timescale (slide from Gilbert Poulard)
• Milestones
  • September 03: Release 7
  • Mid-November 03: pre-production release
  • March 17th 04: Release 8 (production)
  • May 17th 04, June 23rd 04, July 15th 04, August 1st 04: DC2 production phases
• Preparation: put in place, understand & validate
  • Geant4; POOL; LCG applications
  • Event Data Model
  • Digitization; pile-up; byte-stream
  • Conversion of DC1 data to POOL; large-scale persistency tests and reconstruction
  • Testing and validation; run test-production; continuous testing of s/w components
  • Improvements to the Distribution/Validation Kit
  • Start final validation
  • Intensive test of the "Production System"
• Production
  • Event generation ready
  • Simulation ready
  • Data preparation, data transfer
  • Reconstruction ready
  • Tier 0 exercise
• Physics and computing model studies
  • Analysis (distributed)
  • Reprocessing
  • Alignment & calibration
Tiers in DC2
• More than 23 countries involved
DC2 tools
• Installation tools
  • Atlas software distribution kit
  • Validation suite
• Production system
  • Atlas production system interfaced to LCG, US Grid, NorduGrid and legacy systems (batch systems)
  • Tools for:
    • Production management
    • Data management
    • Cataloguing
    • Bookkeeping
    • Job submission
• GRID distributed analysis
  • ARDA domain: test services and implementations
Software installation
• Software installation and configuration via PACMAN
  • Full use of the Atlas Code Management Tool (CMT)
  • Relocatable, multi-release distribution
  • No root privileges needed to install
• GRID-enabled installation
  • Grid installation via submission of a job to the destination sites
  • Software validation tools, integrated with the GRID installation procedure
  • A site is marked as validated after the installed software is checked with the validation tools
• Distribution format
  • Pacman packages (tarballs)
• Kit creation
  • Building scripts (Deployment package)
  • Built in about 3 hours after the release is built
• Kit requirements
  • RedHat 7.3
  • >= 512 MB of RAM
  • Approx. 4 GB of disk space, plus 2 GB during the installation phase, for a full installation of a single release
• Kit installation
  • pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease
• Documentation (building, installing and using)
  • http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
Atlas Production System components
• Production database
  • Oracle-based
  • Holds the definitions of the job transformations
  • Holds data on the job life cycle
• Supervisor (Windmill)
  • Consumes jobs from the production database
  • Dispatches the work to the executors
  • Collects information on the job life cycle
  • Interacts with the DMS for data registration and movement among the systems
• Executors
  • One for each Grid flavour and legacy system:
    • LCG (Lexor)
    • NorduGrid (Dulcinea)
    • US Grid (Capone)
    • LSF
  • Communicate with the supervisor
  • Execute the jobs on the specific subsystems: flavour-neutral job definitions are specialized for the specific needs
  • Submit to the GRID/legacy system
  • Provide access to GRID-flavour-specific tools
• Data Management System (Don Quijote)
  • Global cataloguing system
  • Allows global data management
  • Common interface on top of the system-specific facilities
(an illustrative sketch of the supervisor/executor interaction follows below)
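The supervisor/executor split can be pictured with a short sketch. This is an illustrative Python outline of my own (the real Windmill, Lexor, Dulcinea and Capone components do not expose this exact interface), showing only the interaction pattern: the supervisor consumes flavour-neutral job definitions, an executor specializes and submits them to its Grid or batch system, and status information flows back.

# Illustrative sketch of the supervisor/executor interaction described above.
# Class and method names are hypothetical, not the real Windmill/Lexor/
# Dulcinea/Capone interfaces.
from abc import ABC, abstractmethod


class Executor(ABC):
    """One executor per Grid flavour or legacy batch system."""

    @abstractmethod
    def specialize(self, job_definition: dict) -> dict:
        """Turn a flavour-neutral job definition into a flavour-specific job."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit the job to the underlying system and return its identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Report the current job state back to the supervisor."""


class Supervisor:
    """Consumes jobs from the production database and dispatches them."""

    def __init__(self, production_db, executor: Executor):
        self.db = production_db      # assumed to offer the two calls used below
        self.executor = executor

    def run_once(self):
        for job_definition in self.db.fetch_pending_jobs():
            job = self.executor.specialize(job_definition)
            job_id = self.executor.submit(job)
            self.db.record_submission(job_definition, job_id)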
Atlas Production System architecture (diagram)
• Job description: a Task (= [job]*) corresponds to a Dataset (= [partition]*), and a Job to a Partition; tasks and jobs carry transformation definitions, transformation info, location hints, physics signatures, release version signatures and job run info, with some human intervention
• Several supervisors communicate with the executors via Jabber
• Executors: LCG executor (via the Resource Broker), NorduGrid executor, US Grid executor (via Chimera), LSF executor (local batch)
• The Data Management System works alongside the job description/production database
DC2 status
• DC2 first phase started May 3rd
  • Test of the production system
  • Start of the event generation/simulation tests
• Full production should start next week
  • Full use of the 3 GRIDs and legacy systems
• DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system, interfaced to the production DB and the production systems
Atlas Computing & INFN (1)
• Coordinators & managers
  • D. Barberis (Genova): initially a member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
  • G. Cataldi (Lecce): new coordinator of the OO muon reconstruction program, Moore
  • S. Falciano (Roma1): TDAQ/LVL2 coordinator
  • A. Farilla (Roma3): initially Moore coordinator and SCASI scientific secretary, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
  • L. Luminari (Roma1): INFN representative in the ICB and contact person for activities related to the computing model in Italy
  • A. Nisati (Roma1): representing LVL1 simulation and Chair of the TDAQ Institute Board
  • L. Perini (Milano): chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
  • G. Polesello (Pavia): ATLAS Physics Coordinator
  • A. Rimoldi (Pavia): ATLAS Simulation Coordinator and member of the Software Project Management Board
  • V. Vercesi (Pavia): PESA Coordinator and member of the Computing Management Board
Atlas Computing & INFN (2)
• Atlas INFN sites LCG-compliant for DC2
  • Tier-1
    • CNAF (G. Negri)
  • Tier-2
    • Frascati (M. Ferrer)
    • Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
    • Naples (G. Carlino, A. Doria, L. Merola)
    • Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)
• Activities
  • Development of the LCG interface to the Atlas Production Tool (F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa)
  • Participation in DC2 using the GRID middleware (May - July 2004)
  • Local productions with GRID tools
  • Atlas VO management (A. De Salvo)
  • Atlas code distribution (A. De Salvo)
    • Atlas code distribution model (PACMAN-based) fully deployed
    • The current installation system/procedure makes it easy for the Atlas software to coexist with other experiments' environments
  • Atlas distribution kit validation (A. De Salvo)
  • Transformations for DC2 (A. De Salvo)
Conclusions
• The first real test of the Atlas computing model is starting
  • DC2 tests started at the beginning of May
  • "Real" production starting in June
  • Will give important information for the Computing TDR
• Very intensive use of the GRIDs
  • Atlas Production System interfaced to LCG, NG and US Grid (GRID3)
  • Global data management system
• Getting closer to the real experiment's computing model