Atlas Computing
Alessandro De Salvo <Alessandro.DeSalvo@roma1.infn.it>
Terzo workshop sul calcolo dell'INFN, 5-2004

Outline
• Computing model
• Activities in 2004
• Conclusions

A. De Salvo – Terzo workshop sul calcolo nell'INFN, 27-5-2004
Atlas Data Rates per year
• Nominal year: 10^7 s
• Accelerator efficiency: 50%
Processing times
• Reconstruction
  • Time/event for reconstruction now: 60 kSI2k·s
  • We could recover a factor of 4:
    • a factor of 2 from running only one default algorithm
    • a factor of 2 from optimization
  • Foreseen reference: 15 kSI2k·s/event
• Simulation
  • Time/event for simulation now: 400 kSI2k·s
  • We could recover a factor of 4:
    • a factor of 2 from optimization (work already in progress)
    • a factor of 2 on average from the mixture of different physics processes (and rapidity ranges)
  • Foreseen reference: 100 kSI2k·s/event
• Number of simulated events needed: 10^8 events/year (a rough scale estimate is sketched below)
• Generate samples about 3-6 times the size of their streamed AOD samples
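To give a feeling for the scale implied by these reference numbers, here is a back-of-the-envelope sketch in Python (my own illustration, not an official estimate); it uses the 10^8 events/year and 100 kSI2k·s/event figures quoted above and assumes the work is spread evenly over one calendar year, ignoring efficiency factors.

# Rough scale estimate for the yearly simulation load, using the reference
# figures quoted above. Assumption: the work is spread evenly over a full
# calendar year of wall-clock time; efficiency factors are ignored here.

SECONDS_PER_YEAR = 3.15e7            # ~one calendar year of wall-clock time
SIM_EVENTS_PER_YEAR = 1e8            # simulated events needed per year
SIM_KSI2K_S_PER_EVENT = 100          # foreseen reference simulation time

total_work = SIM_EVENTS_PER_YEAR * SIM_KSI2K_S_PER_EVENT      # kSI2k * seconds
sustained_cpu = total_work / SECONDS_PER_YEAR                 # kSI2k, continuous

print(f"total simulation work : {total_work:.2e} kSI2k*s/year")
print(f"sustained CPU needed  : {sustained_cpu:.0f} kSI2k")   # ~320 kSI2k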
Production/analysis model
• Central analysis
  • Central production of tuples and TAG collections from ESD
  • Estimated data reduction to 10% of the full AOD
  • About 720 GB/group/annum
  • 0.5 kSI2k per event (estimate); quasi-real-time processing: 9 MSI2k
• User analysis
  • Tuple/stream analysis
  • New selections
  • Each user will perform 1/N of the non-central MC simulation load
  • Analysis of working-group samples and AOD
  • Private simulations
  • Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
  • Assume this is all done at Tier-2s
• DC2 will provide very useful information in this domain
Computing centers in Atlas
• Tiers defined by capacity and level of service
• Tier-0 (CERN)
  • Hold a copy of all raw data on tape
  • Copy all raw data in real time to the Tier-1s (the second copy is also useful for later reprocessing)
  • Keep calibration data on disk
  • Run first-pass calibration/alignment and reconstruction
  • Distribute ESDs to the external Tier-1s (1/3 to each one of 6 Tier-1s)
• Tier-1s (at least 6)
  • Regional centers
  • Keep on disk 1/3 of the ESDs and a full copy of the AODs and TAGs
  • Keep on tape 1/6 of the raw data
  • Keep on disk 1/3 of the currently simulated ESDs and on tape 1/6 of previous versions
  • Provide facilities for physics-group-controlled ESD analysis
  • Calibration and/or reprocessing of real data (once per year)
• Tier-2s (about 4 per Tier-1)
  • Keep on disk a full copy of the TAGs and roughly one full AOD copy per four Tier-2s
  • Keep on disk a small selected sample of ESDs
  • Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
  • Run central simulation
(an illustrative summary of these storage fractions is sketched below)
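Purely as an illustration of the placement rules above (not an official configuration, and the key names are my own), the storage fractions can be collected into a small Python structure:

# Illustrative summary of the storage responsibilities listed above.
# Values are fractions of the corresponding full dataset held at each tier;
# key names are hypothetical and only for readability.
DATA_PLACEMENT = {
    "Tier-0": {
        "raw_on_tape": 1.0,             # full copy of all raw data
        "calibration_on_disk": 1.0,
    },
    "Tier-1": {                         # at least 6 regional centres
        "esd_on_disk": 1 / 3,
        "aod_on_disk": 1.0,
        "tag_on_disk": 1.0,
        "raw_on_tape": 1 / 6,
        "simulated_esd_on_disk": 1 / 3,
        "previous_simulated_esd_on_tape": 1 / 6,
    },
    "Tier-2": {                         # about 4 per Tier-1
        "tag_on_disk": 1.0,
        "aod_on_disk": 1 / 4,           # roughly one full AOD copy per four T2s
        "esd_on_disk": "small selected sample",
    },
}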
Tier-1 Requirements (R. Jones – Atlas Software Workshop, May 2004)
• Processing for physics groups: 1760 kSI2k
• Reconstruction: 588 kSI2k
Tier-2 Requirements (R. Jones – Atlas Software Workshop, May 2004)
• Simulation: 21 kSI2k
• Reconstruction: 2 kSI2k
• Users: 176 kSI2k
• Total: 199 kSI2k
Tier 0/1/2 sizes
• Efficiencies (LCG numbers; R. Jones, Atlas Software Workshop, May 2004):
  • Scheduled CPU activity: 85% efficient
  • Chaotic CPU activity: 60% efficient
  • Disk usage: 70% efficient
  • Tape assumed 100% efficient
(a minimal sketch of how these factors are applied follows below)
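A minimal sketch of how such efficiency factors are typically applied, dividing a required capacity by the relevant efficiency to obtain the installed capacity. The example numbers are the Tier-1 reconstruction and Tier-2 user figures quoted on the previous slides; whether those quoted figures already include the corrections is not stated here, so this is only illustrative.

# Efficiency factors quoted above (LCG numbers).
EFFICIENCY = {
    "scheduled_cpu": 0.85,   # scheduled CPU activity
    "chaotic_cpu": 0.60,     # chaotic (user) CPU activity
    "disk": 0.70,            # disk usage
    "tape": 1.00,            # tape assumed fully efficient
}

def installed_capacity(required, kind):
    """Capacity to install so that 'required' is effectively delivered."""
    return required / EFFICIENCY[kind]

# Illustrative examples using figures from the previous slides.
print(installed_capacity(588, "scheduled_cpu"))   # Tier-1 reconstruction, ~692 kSI2k
print(installed_capacity(176, "chaotic_cpu"))     # Tier-2 user analysis, ~293 kSI2k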
Atlas Computing System (diagram; R. Jones – Atlas Software Workshop, May 2004)
• PC (2004) = ~1 kSpecInt2k
• Event Builder (~PB/s) → 10 GB/s → Event Filter (~159 kSI2k) → 450 Mb/s → Tier 0 (T0: ~5 MSI2k, HPSS mass storage, ~9 PB/year/T1, no simulation; some data for calibration and monitoring go to the institutes and calibrations flow back)
• Tier 0 ships ~300 MB/s per Tier-1 per experiment to the regional centres (UK Regional Centre (RAL), US Regional Centre, Italian Regional Centre, French Regional Centre): ~7.7 MSI2k and ~2 PB/year per Tier-1, with HPSS mass storage
• 622 Mb/s links connect Tier-1s to the Tier-2 centres (~200 kSI2k and ~200 TB/year per Tier-2; the Italian Tier-2s shown are NA, RM1, LNF, MI)
• Each Tier-2 has ~25 physicists working on one or more channels, should have the full AOD, TAG & relevant Physics Group summary data, and does the bulk of the simulation
• A physics data cache feeds desktop workstations at 100-1000 MB/s
Atlas computing in 2004
• "Collaboration" activities
  • Data Challenge 2 (DC2)
    • May-August 2004
    • Real test of the computing model for the Computing TDR (end 2004)
    • Simulation, reconstruction, analysis & calibration
  • Combined test-beam activities
    • Combined test-beam operation concurrent with DC2 and using the same tools
• "Local" activities
  • Single muon simulation (Rome1, Naples)
  • Tau studies (Milan)
  • Higgs production (LNF)
  • Other ad-hoc productions
Goals in 2004
• DC2/test-beam
  • Computing model studies
  • Pile-up digitization in Athena
  • Deployment of the complete Event Data Model and Detector Description
  • Simulation of the full Atlas detector and of the 2004 Combined Test Beam
  • Test of the calibration and alignment procedures
  • Full use of Geant4, POOL and other LCG applications
  • Wide use of the GRID middleware and tools
  • Large-scale physics analysis
  • Run as much of the production as possible on the GRID
  • Test the integration of multiple GRIDs
• "Local" activities
  • Run local, ad-hoc productions using the LCG tools
DC2 timescale (slide from Gilbert Poulard)
• Milestones
  • September 03: Release 7
  • Mid-November 03: pre-production release
  • March 17th 04: Release 8 (production)
  • May 17th 04, June 23rd 04, July 15th 04, August 1st 04: DC2 production phases
• Preparation: put in place, understand & validate
  • Geant4; POOL; LCG applications
  • Event Data Model
  • Digitization; pile-up; byte-stream
  • Conversion of DC1 data to POOL; large-scale persistency tests and reconstruction
  • Testing and validation; run test-production; continuous testing of s/w components
  • Improvements to the Distribution/Validation Kit
  • Start final validation
  • Intensive test of the "Production System"
• Production
  • Event generation ready
  • Simulation ready
  • Data preparation, data transfer
  • Reconstruction ready
  • Tier 0 exercise
• Physics and computing model studies
  • Analysis (distributed)
  • Reprocessing
  • Alignment & calibration
Tiers in DC2
• More than 23 countries involved
DC2 tools
• Installation tools
  • Atlas software distribution kit
  • Validation suite
• Production system
  • Atlas production system interfaced to LCG, US Grid, NorduGrid and legacy systems (batch systems)
  • Tools for:
    • Production management
    • Data management
    • Cataloguing
    • Bookkeeping
    • Job submission
• GRID distributed analysis
  • ARDA domain: test services and implementations
Software installation
• Software installation and configuration via PACMAN
  • Full use of the Atlas Code Management Tool (CMT)
  • Relocatable, multi-release distribution
  • No root privileges needed to install
• GRID-enabled installation
  • Grid installation via submission of a job to the destination sites
  • Software validation tools, integrated with the GRID installation procedure
  • A site is marked as validated after the installed software is checked with the validation tools
• Distribution format
  • Pacman packages (tarballs)
• Kit creation
  • Building scripts (Deployment package)
  • Built in about 3 hours after the release is built
• Kit requirements
  • RedHat 7.3
  • >= 512 MB of RAM
  • Approx. 4 GB of disk space, plus 2 GB during the installation phase, for a full installation of a single release
• Kit installation
  • pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease
• Documentation (building, installing and using)
  • http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
Atlas Production System components
• Production database
  • Oracle-based
  • Holds the definitions of the job transformations
  • Holds data on the job life cycle
• Supervisor (Windmill)
  • Consumes jobs from the production database
  • Dispatches the work to the executors
  • Collects information on the job life cycle
  • Interacts with the DMS for data registration and movement among the systems
• Executors
  • One for each Grid flavour and legacy system:
    • LCG (Lexor)
    • NorduGrid (Dulcinea)
    • US Grid (Capone)
    • LSF
  • Communicate with the supervisor
  • Execute the jobs on the specific subsystems: flavour-neutral job definitions are specialized for the specific needs
  • Submit to the GRID/legacy system
  • Provide access to GRID-flavour-specific tools
• Data Management System (Don Quijote)
  • Global cataloguing system
  • Allows global data management
  • Common interface on top of the system-specific facilities
(an illustrative sketch of the supervisor/executor interaction follows below)
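The supervisor/executor split can be pictured with a short sketch. This is an illustrative Python outline of my own (the real Windmill, Lexor, Dulcinea and Capone components do not expose this exact interface), showing only the interaction pattern: the supervisor consumes flavour-neutral job definitions, an executor specializes and submits them to its Grid or batch system, and status information flows back.

# Illustrative sketch of the supervisor/executor interaction described above.
# Class and method names are hypothetical, not the real Windmill/Lexor/
# Dulcinea/Capone interfaces.
from abc import ABC, abstractmethod


class Executor(ABC):
    """One executor per Grid flavour or legacy batch system."""

    @abstractmethod
    def specialize(self, job_definition: dict) -> dict:
        """Turn a flavour-neutral job definition into a flavour-specific job."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit the job to the underlying system and return its identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Report the current job state back to the supervisor."""


class Supervisor:
    """Consumes jobs from the production database and dispatches them."""

    def __init__(self, production_db, executor: Executor):
        self.db = production_db      # assumed to offer the two calls used below
        self.executor = executor

    def run_once(self):
        for job_definition in self.db.fetch_pending_jobs():
            job = self.executor.specialize(job_definition)
            job_id = self.executor.submit(job)
            self.db.record_submission(job_definition, job_id)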
Atlas Production System architecture (diagram)
• Job description: a Task (= [job]*) corresponds to a Dataset (= [partition]*), and a Job to a Partition; tasks and jobs carry transformation definitions, transformation info, location hints, physics signatures, release version signatures and job run info, with some human intervention
• Several supervisors communicate with the executors via Jabber
• Executors: LCG executor (via the Resource Broker), NorduGrid executor, US Grid executor (via Chimera), LSF executor (local batch)
• The Data Management System works alongside the job description/production database
DC2 status
• DC2 first phase started May 3rd
  • Test of the production system
  • Start of the event generation/simulation tests
• Full production should start next week
  • Full use of the 3 GRIDs and legacy systems
• DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system, interfaced to the production DB and the production systems
Atlas Computing & INFN (1)
• Coordinators & managers
  • D. Barberis (Genova): initially a member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
  • G. Cataldi (Lecce): new coordinator of the OO muon reconstruction program, Moore
  • S. Falciano (Roma1): TDAQ/LVL2 coordinator
  • A. Farilla (Roma3): initially Moore coordinator and SCASI scientific secretary, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
  • L. Luminari (Roma1): INFN representative in the ICB and contact person for activities related to the computing model in Italy
  • A. Nisati (Roma1): representing LVL1 simulation and Chair of the TDAQ Institute Board
  • L. Perini (Milano): chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
  • G. Polesello (Pavia): ATLAS Physics Coordinator
  • A. Rimoldi (Pavia): ATLAS Simulation Coordinator and member of the Software Project Management Board
  • V. Vercesi (Pavia): PESA Coordinator and member of the Computing Management Board
Atlas Computing & INFN (2)
• Atlas INFN sites LCG-compliant for DC2
  • Tier-1
    • CNAF (G. Negri)
  • Tier-2
    • Frascati (M. Ferrer)
    • Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
    • Naples (G. Carlino, A. Doria, L. Merola)
    • Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)
• Activities
  • Development of the LCG interface to the Atlas Production Tool (F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa)
  • Participation in DC2 using the GRID middleware (May - July 2004)
  • Local productions with GRID tools
  • Atlas VO management (A. De Salvo)
  • Atlas code distribution (A. De Salvo)
    • Atlas code distribution model (PACMAN-based) fully deployed
    • The current installation system/procedure makes it easy for the Atlas software to coexist with other experiments' environments
  • Atlas distribution kit validation (A. De Salvo)
  • Transformations for DC2 (A. De Salvo)
Conclusions
• The first real test of the Atlas computing model is starting
  • DC2 tests started at the beginning of May
  • "Real" production starting in June
  • Will give important information for the Computing TDR
• Very intensive use of the GRIDs
  • Atlas Production System interfaced to LCG, NG and US Grid (GRID3)
  • Global data management system
• Getting closer to the real experiment's computing model