The ALICE Tier-2’s in Italy
Roberto Barbera(*), Univ. of Catania and INFN
Workshop CCR INFN 2006, Otranto, 08.06.2006
(*) Many thanks to A. Dainese, D. Di Bari, S. Lusso, and M. Masera for providing slides and information for this presentation.
Outline
• The ALICE computing model and its parameters
• ALICE and the Grid(s)
  • Layout
  • Implementation
  • Recent results
• ALICE Tier-2’s in Italy
  • Catania
  • Torino
  • Bari
  • LNL-PD
• Summary and conclusions
The ALICE computing model (1/2)
• pp
  • Quasi-online data distribution and first reconstruction at T0
  • Further reconstructions at T1’s
• AA
  • Calibration, alignment and pilot reconstructions during data taking
  • Data distribution and first reconstruction at T0 during the four months after the AA run
  • Further reconstructions at T1’s
• One copy of RAW kept at T0 and one distributed across the T1’s
The ALICE computing model (2/2)
• T0
  • First-pass reconstruction; storage of one copy of RAW, calibration data and first-pass ESD’s
• T1
  • Reconstructions and scheduled analysis; storage of the second, collective copy of RAW and one copy of all data to be kept; disk replicas of ESD’s and AOD’s
• T2
  • Simulation and end-user analysis; disk replicas of ESD’s and AOD’s
Parameters of the ALICE computing model
ALICE & the Grid(s)
[Diagram: the ALICE computing framework (AliRoot on top of ROOT) submits user jobs to the central ALICE Task Queue; ALICE agents & daemons pull work from the Task Queue and dispatch it to resources on LCG, OSG and NorduGrid, with file bookkeeping in the ALICE Central Catalogue. A toy sketch of this pull model follows.]
Legend: TQ = Task Queue (central job DB); CAT = Central Catalogue
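To make the pull-scheduling idea in the diagram concrete, here is a minimal, self-contained sketch; every class and name in it is invented for illustration (the real AliEn Task Queue is a central service, not a local object):

#include <iostream>
#include <optional>
#include <queue>
#include <string>

// Toy illustration of AliEn-style pull scheduling: jobs wait in a
// central Task Queue (TQ); a JobAgent running at a site pulls work
// only when it has a free slot matching the job's requirements.
struct Job { std::string name; long maxRamMB; };

class TaskQueue {                    // stands in for the central TQ / job DB
  std::queue<Job> pending_;
public:
  void submit(const Job& j) { pending_.push(j); }
  std::optional<Job> pull(long freeRamMB) {   // agent asks: "anything for me?"
    if (pending_.empty() || pending_.front().maxRamMB > freeRamMB)
      return std::nullopt;           // nothing matching this slot
    Job j = pending_.front();
    pending_.pop();
    return j;
  }
};

int main() {
  TaskQueue tq;                                 // central service (mocked)
  tq.submit({"PbPb-simulation", 2000});         // user submits via the framework
  if (auto j = tq.pull(/*freeRamMB=*/4000))     // JobAgent on a worker node pulls
    std::cout << "JobAgent runs " << j->name << "\n";
}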
Implementation: the “VO-Box”
[Diagram: a job request goes from the ALICE Task Queue to the VO-Box at each LCG site; the VO-Box services (JobAgent submission through the LCG RB to the LCG CE and its WN’s, PackMan for software installation, an interface to the LCG SE; other labelled components: SCA, SA) handle the site-local work. Output files are registered with their LFN in the ALICE File Catalogue and with their SURL in the LCG LFC. A toy sketch of this registration step follows.]
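The double registration step in the diagram can be sketched as follows; the catalogues are modelled as local maps purely for illustration (the real AliEn and LFC catalogues are remote services), and the LFN/SURL values are invented examples:

#include <iostream>
#include <map>
#include <string>

int main() {
  // The ALICE File Catalogue maps logical names to physical locations;
  // the LCG LFC holds the site-side SURL registration (simplified here).
  std::map<std::string, std::string> aliceCatalogue; // LFN  -> SURL
  std::map<std::string, std::string> lfc;            // GUID -> SURL

  const std::string lfn  = "/alice/sim/2006/run123/galice.root";      // example LFN
  const std::string surl = "srm://se.example.infn.it/alice/abc123";   // example SURL

  aliceCatalogue[lfn] = surl;   // the "LFN registration" arrow in the diagram
  lfc["guid-abc123"]  = surl;   // the "SURL registration" arrow in the diagram

  std::cout << lfn << " -> " << aliceCatalogue[lfn] << "\n";
}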
Who does what?
• Configure, submit and track jobs: user interface with massive production support; job DB (production and user); user and role management
• Install software on sites: package managers
• Distribute and execute jobs: Workload Management System (broker, L&B, …); Computing Element software; Information Services; interactive analysis jobs (PROOF)
• Store and catalogue data: data catalogues (file, replica, metadata, local, …); Storage Element software
• Move data around: file transfer services and schedulers
• Access data files: I/O services; file management (SRM); Xrootd
• Monitor all that stuff: transport infrastructure; sensors; web presentation (MonALISA)
• …and on top of that: enforce security!
(Several tasks are provided by a mix of AliEn and LCG services, marked “MIXED” on the original slide.)
Some statistics and results for SC3/PDC05
• In the last two months of 2005:
  • 22,500 jobs (Pb+Pb and p+p)
  • Average CPU time: 8 hours
  • Data volume produced: 20 TB (90% in CASTOR2 at CERN, 10% at remote sites)
• Resource centres participating (22 in total):
  • 4 T1: CERN, CNAF, GridKa, CCIN2P3
  • 18 T2: Bari, Clermont (FR), GSI (D), Houston (USA), ITEP (RUS), JINR (RUS), KNU (UKR), Muenster (D), NIHAM (RO), OSC (USA), PNPI (RUS), SPbSU (RUS), Prague (CZ), RMKI (HU), SARA (NL), Sejong (SK), Torino, UiB (NO)
• Job share per site:
  • T1: CERN 19%, CNAF 17% (CPU 20%), GridKa 31%, CCIN2P3 22%
  • T2: 11% in total
• AliRoot failure rate: 2.5%
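A quick back-of-the-envelope check of these figures (our own estimate, not a number from the production reports): 22,500 jobs × 8 h = 180,000 CPU-hours, i.e. 180,000 / (24 × 365) ≈ 20.5 CPU-years delivered in the two months.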
Job execution profile during SC3
[Plot: up to 2,450 concurrent jobs, 25% more than the entire lxbatch capacity at CERN. The negative slope was due to an AliEn problem during output retrieval, fixed in the following release.]
Use of INFN Grid by LHC experiments: jobs/VO (Sep 2005 – Dec 2005)
Memento: VO = Virtual Organization (i.e. an experiment)
• ~811,000 jobs in total; ~388,000 jobs excluding INFN-T1
• ALICE: 8% of the total number of jobs on the national grid
Use of INFN Grid by LHC experiments: CPU/VO (Sep 2005 – Dec 2005)
• Total CPU time: ~358 years, 7 months, 11 days; excluding INFN-T1: ~98 years, 2 months, 18 days
• ALICE: 14% of the CPU time outside the T1
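Combining the two figures above (again our own estimate): 0.14 × 98.2 CPU-years ≈ 13.7 CPU-years of computing delivered to ALICE outside the T1 in the period.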
ALICE jobs per site
Warning: job agents and real jobs are accounted in the same way.
ALICE Tier-2’s in Italy
• Four candidates: Bari, Catania, LNL-PD, and Torino (T2 projects available at http://www.to.infn.it/~masera/TIER2/).
• The team of ALICE referees, with representatives of the INFN Management Board, visited all Tier-2 candidates between 10/2005 and 02/2006.
• Referees’ decision, communicated at a meeting in Rome on 10/03/2006:
  • Catania and Torino approved;
  • Bari and LNL-PD “incubated” (kept on “life support” until real ALICE needs are proven by real tests of the computing model in production mode).
Network connectivity of the ALICE Tier-2’s
Catania (1/5) – Computing room
• Present installation and future expansion shown on the floor plan
• Space available for installations: ~160 m²
Catania (2/5) – Infrastructure
• High-density system and traditional system
Catania (3/5) – CPU
• 150 kSI2k
• SuperMicro 1U servers with two dual-core AMD Opteron 275 CPUs and 4 GB RAM
• IBM LS20 blades with two dual-core AMD Opteron 280 CPUs and 4 GB RAM (expected by June)
• LSF 6.1 as LRMS
Catania (4/5) – Storage
• 21+ TB with GPFS
• FC-to-SATA systems plus more traditional DAS with EIDE-to-SCSI controllers
• Filesystem: GPFS
Catania (5/5) – Statistics
[Plot: last month’s activity.]
Torino (1/5) – Computing Room
Torino (2/5) – Present installation
• Present solutions: IBM blade servers and 1U dual-processor servers
• Guidelines for the future:
  • Minimize space
  • Minimize power consumption
Torino (3/5) – Resources
• CPU
  • 38 Intel Xeon CPUs @ 2.40 GHz
  • 12 Intel Xeon CPUs @ 3.06 GHz
  • 45 Intel dual-processor servers, ≤4 years old (14 of them blades)
• Disk
  • ~6 TB dedicated to ALICE
  • 2 TB shared among various VO’s (Classic SE)
  • 1 dCache SE with an internal disk of ~80 GB for tests
  • ~15 TB of disk space for ALICE to be commissioned soon: a StorageTek FLX210 with 3 FLC200 expansions
• Filesystem
  • Ext3 for the Classic SE; not yet defined for the new storage system
  • Tests with xrootd for local and remote access (through a proxy) are scheduled (see the sketch after this list)
• LRMS
  • Torque/Maui, the default one coming with the INFN Grid release
(Callouts on the slide mark part of the resources as “Open to all VO’s” and part as “Dedicated to ALICE (at the moment)”.)
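For reference, the kind of remote-access test mentioned above can be run from any ROOT session; the redirector host, file path and histogram name below are invented placeholders, not the actual Torino configuration:

// xrootd_access_test.C -- minimal ROOT macro: open a file through an
// xrootd server and read one object. Run with: root -l xrootd_access_test.C
#include <cstdio>
#include "TFile.h"
#include "TH1.h"

void xrootd_access_test()
{
  // TFile::Open() dispatches to the xrootd client for root:// URLs
  TFile *f = TFile::Open("root://xrootd.example.to.infn.it//alice/test/hists.root");
  if (!f || f->IsZombie()) {
    printf("could not open remote file\n");
    return;
  }
  TH1 *h = (TH1 *)f->Get("hPt");   // read an example histogram
  if (h) printf("entries: %g\n", h->GetEntries());
  f->Close();
}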
Torino (4/5) – Resources
• Future evolution:
  • Many nodes (~20, the most recent ones) are being migrated from the ALICE farm to the LCG farm, exploiting the forthcoming upgrade to gLite 3.0;
  • New WN’s (80 cores, 130 kSI2k), recently bought, will be installed and configured very soon.
• Networking:
  • All WN’s are in a hidden LAN (only outbound connectivity is allowed); NAT is done by an Extreme Networks switch. Almost all connections are Gigabit Ethernet.
• Monitoring:
  • MRTG and Nagios for local control of the farm.
Torino (5/5) – Usage
[Plots: number of jobs as seen by the central ALICE monitoring on LCG, and number of jobs as seen by the local scheduler.]
Bari (1/2)
• Bari is a Tier-2 candidate for both ALICE and CMS.
• Bari also supports other VO’s.
• Priorities are assigned to the various VO’s in proportion to their respective budgets for acquiring resources.
• In the last two years Bari has provided resources to ALICE for both PDC04 and SC3, and will do so for SC4.
Bari (2/2)
• One 2-CPU 700 MHz PIII (aligrid1.ba.infn.it) – 40 GB HD
• One 2-CPU 1 GHz PIII (alicegrid2.ba.infn.it) – 160 GB HD
• Three 2-CPU Intel Xeon 1.8 GHz (alicegrid4 – alicegrid6, VOBOX) – 3 × 80 GB HDs
• One 2-CPU Intel Xeon 1.8 GHz (alicegrid3.ba.infn.it, SE for PDC04) – with 0.7 TB of data
• One 2-CPU Intel Xeon 2.4 GHz (alicegrid5.ba.infn.it, SE for Finuda) – with 1.5 TB of disk space
• Three 2-CPU Intel Xeon 2.4 GHz – 80 GB HD
• One 2-CPU Intel Xeon 2.4 GHz (alicegrid7.ba.infn.it) – 80 GB HD – software repository + Quattor installation server
• One machine with two dual-core Opteron 275 CPUs – 120 GB HD
• Three 2-CPU Intel Xeon 2.8 GHz – 80 GB HD
• One 2-CPU Intel Xeon 3.0 GHz EM64T – 2 arrays × 2.5 TB (5 TB total), to be configured with xrootd for SC4
ALICE jobs at Bari (monitored by MonALISA)
LNL-PD
• Background:
  • LNL-PD is an approved Tier-2 for CMS;
  • Many years’ experience in running a T2 prototype for CMS.
• Size of the existing Tier-2 for CMS:
  • CPU: ~200 kSI2k (almost all dual-core blades)
  • Storage: EIDE-to-SCSI DAS with 3ware controllers + Storage Area Network
  • LRMS: LSF
  • Monitoring: Ganglia (local) + GridICE
ALICE at LNL-PD
• ALICE activities already done:
  • ALICE VO-box installed in 02/2006
  • Site testing with small productions: OK
  • Large ALICE production in April–May via LCG
• Future activities foreseen for the rest of 2006:
  • Participation in PDC06 (~10 kSI2k of dedicated resources + the possibility to use CMS resources, if/when available)
  • Installation of an ALICE storage system with xrootd (~1 TB at the beginning)
ALICE jobs at LNL-PD (monitored by GridICE), 15 April 2006 – 15 May 2006
Common issues
• Need for a common solution for the infrastructure (to benefit from economies of scale).
• Need for an affordable, reliable, and scalable solution for the storage.
• Need for a better organization of distributed support for Tier-2’s.
• Although new technologies (blades with low-power CPU’s) help a bit, power consumption at Tier-2 sites is becoming increasingly important from an economic point of view. Strict guidelines and a dedicated budget should be set up centrally by the INFN Management.
The future: PDC06 (June 2006)
• Check of the distributed computing model:
  • From raw data to ESD (a minimal sketch of this step follows)
  • Data transfers among sites
  • Calibration and alignment
  • Analysis
• The SC3 experience has helped a lot to improve AliEn (current version 2.10).
• Intense development of AliRoot to include calibration and alignment code for all sub-detectors and to reduce the percentage of run-time failures. Huge effort by the Italian groups at many sites.
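As a reminder of what the raw-data-to-ESD step looks like in practice, here is a minimal reconstruction macro in the style of an AliRoot rec.C of this period; it assumes the AliReconstruction steering class available in AliRoot at the time, and the input file name is a placeholder:

// rec.C -- minimal sketch of a raw-data -> ESD reconstruction pass.
// Run inside an AliRoot environment: aliroot -b -q rec.C
void rec()
{
  AliReconstruction reco;      // steering class for event reconstruction
  reco.SetInput("raw.root");   // raw data as input instead of simulated digits
  reco.Run();                  // writes the event summary data (AliESDs.root)
}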
Resource ramp-up at INFN Tier-2’s
Summary and conclusions
• The ALICE computing model has been finalized and is now ready to face the forthcoming data from the LHC.
• INFN has identified the first official Tier-2’s for ALICE.
• For both the design and the day-by-day operation of an LHC Tier-2, a strong collaboration among the experiments, the INFN Grid Project, the INFN CCR, and the Computing & Network Services at the various INFN departments is of vital importance.