INFN-T1 site report
Andrea Chierici, on behalf of INFN-T1 staff
HEPiX Fall 2013
Outline
• Network
• Farming
• Storage
• Common services
WAN Connectivity (Cisco 7600 + Nexus)
• LHC OPN: RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF
• LHC ONE: IN2P3, SARA
• General IP via GARR Bo1
• 10 Gb/s dedicated CNAF-FNAL link for CDF (Data Preservation)
• 10 Gb/s for General IP connectivity, going to 20 Gb/s (Q3-Q4 2013)
• 20 Gb physical link (2x10 Gb) shared for LHCOPN and LHCONE towards the T1 resources, going to LHCOPN/ONE 40 Gb/s (Q3-Q4 2013)
Farming and Storage current connection model
• WAN (LHCOPN, Internet) enters through the Cisco 7600 router, connected at 10 Gb/s to the bd8810 and Nexus 7018 core switches
• Disk servers connected at 2x10 Gb/s (up to 4x10 Gb/s); old resources (2009-2010) at 4x1 Gb/s
• Farming switches with 20 worker nodes per switch
• Core switches and routers are fully redundant (power, CPU, fabrics)
• Every switch is connected with load sharing on different port modules
• Core switches and routers have a strict SLA (next solar day) for maintenance
Computing resources
• 195K HS06, 17K job slots
• 2013 tender installed in summer: AMD CPUs, 16 job slots each
• Whole farm upgraded to SL6
  • Per-VO and per-node approach: some CEs upgraded and serving only some VOs
  • Older Nehalem nodes got a significant boost switching to SL6 (and activating hyper-threading too)
New CPU tender
• 2014 tender delayed until the beginning of 2014
  • Will probably also cover 2015 needs
  • Taking into account TCO (energy consumption), not only sales price
• 10 Gbit WN connectivity
  • 5 MB/s per job (minimum) required
  • A 1 Gbit link is not enough to face the traffic generated by modern multi-core CPUs (see the quick estimate after this list)
  • Network bonding is hard to configure
• Blade servers are attractive
  • Cheaper 10 Gbit network infrastructure
  • Cooling optimization, OPEX reduction
  • BUT: higher street price
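As a quick sanity check of the 10 Gbit requirement, the arithmetic below combines the 5 MB/s per job figure from the slide with an assumed slot count for a current-generation WN (the 32 slots are illustrative, not a measured value):

```python
# Back-of-the-envelope bandwidth check for one worker node.
MB_PER_JOB = 5     # minimum required per job (from the slide)
JOB_SLOTS = 32     # assumed slot count for a modern multi-core WN

needed_mbit = MB_PER_JOB * JOB_SLOTS * 8   # MB/s -> Mbit/s
print(f"Required per WN: {needed_mbit} Mbit/s")            # 1280 Mbit/s
print(f"1 Gbit link sufficient?  {needed_mbit <= 1000}")   # False
print(f"10 Gbit link sufficient? {needed_mbit <= 10000}")  # True
```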
Monitoring & Accounting (1)
• Rewritten our local resource accounting and monitoring portal
• Old system was completely home-made
  • Monitoring and accounting were separate things
  • Adding/removing queues on LSF meant editing lines in the monitoring system code
  • Hard to maintain: >4000 lines of Perl code
Monitoring & Accounting (2)
• New system: monitoring and accounting share the same database
• Scalable and based on open-source software (+ a few Python lines)
  • Graphite (http://graphite.readthedocs.org): time-series-oriented database
  • Django web app to plot on-demand graphs (a minimal feeding sketch follows)
• lsfmonacct module released on GitHub
• Automatic queue management
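For illustration only, a minimal sketch of pushing per-queue numbers into Graphite over the plaintext carbon protocol; the host, metric names and values are hypothetical and this is not the actual lsfmonacct code:

```python
"""Minimal sketch: push per-queue LSF job counts into Graphite (carbon).

Hypothetical host and metric names; just illustrates the plaintext
protocol that a local carbon-cache accepts on its default port.
"""
import socket
import time

CARBON_HOST = "graphite.example.org"   # placeholder carbon-cache host
CARBON_PORT = 2003                     # default plaintext listener port

def send_metrics(samples):
    """samples: iterable of (metric_path, value) tuples."""
    now = int(time.time())
    payload = "".join(f"{path} {value} {now}\n" for path, value in samples)
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as s:
        s.sendall(payload.encode("ascii"))

if __name__ == "__main__":
    # Illustrative values, e.g. parsed from `bqueues` output.
    send_metrics([
        ("farm.lsf.queues.atlas.running", 4200),
        ("farm.lsf.queues.atlas.pending", 310),
        ("farm.lsf.queues.cms.running", 3800),
    ])
```

Graphite stores each series, and the Django web app then renders the plots on demand.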
Monitoring & Accounting (3)
Monitoring & Accounting (4)
Issues
• Grid accounting problems starting from April 2013
  • Subtle bugs affecting the log-parsing stage on the CEs (DGAS urcollector) and causing it to skip data
• WNoDeS issue when upgrading to SL6
  • Code maturity problems: addressed quickly, now ready for production
  • BaBar and CDF will be using it rather soon
  • Potentially the whole farm can be used with WNoDeS
New activities
• Investigation of Grid Engine as an alternative batch system ongoing
• Testing Zabbix as a platform for monitoring computing resources
  • Possible alternative to Nagios + Lemon (see the API sketch after this list)
• Dynamic update of WNs to deal mainly with kernel/cvmfs/gpfs upgrades
• Evaluating APEL as an alternative to DGAS for the grid accounting system
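As an example of the kind of check Zabbix is being evaluated for, the sketch below queries host availability through the Zabbix JSON-RPC API; the URL and credentials are placeholders and this is not a production setup:

```python
"""Sketch: query host availability via the Zabbix JSON-RPC API."""
import json
import urllib.request

ZABBIX_URL = "http://zabbix.example.org/zabbix/api_jsonrpc.php"  # placeholder

def zabbix_call(method, params, auth=None, req_id=1):
    """POST a JSON-RPC request and return its 'result' field."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    if auth:
        payload["auth"] = auth
    req = urllib.request.Request(
        ZABBIX_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

# Placeholder credentials; user.login returns an auth token for later calls.
token = zabbix_call("user.login", {"user": "monitor", "password": "secret"})
hosts = zabbix_call("host.get", {"output": ["name", "available"]}, auth=token)
for h in hosts:
    # The API returns numeric fields as strings; "1" means available.
    print(h["name"], "available" if h["available"] == "1" else "not available")
```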
Storage Resources
• Disk space: 15.3 PB-N (net) on-line
  • 7 EMC2 CX3-80 + 1 EMC2 CX4-960 (~2 PB) + 100 servers (2x1 Gb/s connections)
  • 7 DDN S2A 9950 + 1 DDN SFA 10K + 1 DDN SFA 12K (~11.3 PB) + ~80 servers (10 Gb/s)
  • Installation of the latest system (DDN SFA 12K, 1.9 PB-N) was completed this summer
  • ~1.8 PB-N expansion foreseen before the Christmas break
  • Aggregate bandwidth: 70 GB/s
• Tape library SL8500: ~16 PB on-line with 20 T10KB drives and 13 T10KC drives (3 additional drives were added during summer 2013)
  • 8800 x 1 TB tapes, ~100 MB/s bandwidth per drive
  • 1200 x 5 TB tapes, ~200 MB/s bandwidth per drive
  • Drives interconnected to library and servers via a dedicated SAN (TAN); 13 Tivoli Storage Manager HSM nodes access the shared drives
  • 1 Tivoli Storage Manager (TSM) server common to all GEMSS instances
  • A tender for an additional 470 x 5 TB tapes is under way
• All storage systems and disk servers on SAN (4 Gb/s or 8 Gb/s)
Storage Configuration
• All disk space is partitioned into ~10 GPFS clusters served by ~170 servers
  • One cluster for each main (LHC) experiment
  • GPFS deployed on the SAN implements a full HA system
  • System scalable to tens of PBs and able to serve thousands of concurrent processes with an aggregate bandwidth of tens of GB/s
• GPFS coupled with TSM offers a complete HSM solution: GEMSS
• Access to storage granted through standard interfaces (POSIX, SRM, Xrootd and soon WebDAV)
• FS directly mounted on WNs (see the sketch after this list)
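Because the GPFS filesystems are mounted directly on the WNs, a job can read data with plain POSIX I/O; a minimal sketch follows, where the mount point and file path are hypothetical examples:

```python
"""Sketch: plain POSIX access to a GPFS-mounted filesystem from a WN."""
import os

DATA_FILE = "/storage/gpfs_example/dataset/file.root"  # hypothetical path

# /proc/mounts lists the GPFS filesystems visible on this worker node.
with open("/proc/mounts") as mounts:
    gpfs_mounts = [line.split()[1] for line in mounts if " gpfs " in line]
print("GPFS mount points on this WN:", gpfs_mounts)

# No SRM or Xrootd client needed inside the job: ordinary file I/O works.
if os.path.exists(DATA_FILE):
    with open(DATA_FILE, "rb") as f:
        header = f.read(64)
    print(f"Read {len(header)} bytes via POSIX from {DATA_FILE}")
```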
Storage research activities
• Studies on more flexible and user-friendly methods for accessing storage over WAN
  • Storage federation implementation
  • Cloud-like approach
• We developed an integration between the GEMSS storage system and Xrootd to match the requirements of CMS and ALICE, using ad-hoc Xrootd modifications
  • The CMS modification was validated by the official Xrootd integration build
  • This integration is currently in production (a client-side sketch follows)
• Another alternative approach for storage federations, based on http/webdav (ATLAS use case), is under investigation
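A minimal client-side sketch of reading through an Xrootd federation with the XRootD Python bindings, assuming they are installed on the client; the redirector host and file path are invented placeholders, not our actual endpoints:

```python
"""Sketch: read a file through an Xrootd federation redirector (pyxrootd)."""
from XRootD import client
from XRootD.client.flags import OpenFlags

# Placeholder federation URL: redirector host and LFN are invented.
URL = "root://xrootd-redirector.example.org//store/data/example/file.root"

with client.File() as f:
    status, _ = f.open(URL, OpenFlags.READ)
    if not status.ok:
        raise RuntimeError(f"open failed: {status.message}")
    # Read the first kB of the file through whichever site serves it.
    status, data = f.read(offset=0, size=1024)
    print(f"Read {len(data)} bytes through the federation")
```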
LTDP
• Long Term Data Preservation (LTDP) for the CDF experiment
• FNAL-CNAF data copy mechanism is completed
• Copy of the data will follow this timetable:
  • end 2013 - early 2014 → all data and MC user-level n-tuples (2.1 PB)
  • mid 2014 → all raw data (1.9 PB) + databases
• Bandwidth of 10 Gb/s reserved on the transatlantic link CNAF ↔ FNAL (see the rough estimate after this list)
• "Code preservation" issue still to be addressed
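An order-of-magnitude check of the copy times on the reserved link, ignoring protocol overhead and contention on the transatlantic path:

```python
# Rough copy-time estimate for the CDF datasets over the reserved link.
NTUPLES_PB = 2.1   # user-level n-tuples (PB), from the timetable above
RAW_PB = 1.9       # raw data (PB)
LINK_GBPS = 10     # reserved CNAF-FNAL bandwidth

def days_to_copy(petabytes, gbps):
    """Days needed at full line rate, no overhead."""
    seconds = petabytes * 1e15 * 8 / (gbps * 1e9)
    return seconds / 86400

print(f"n-tuples: ~{days_to_copy(NTUPLES_PB, LINK_GBPS):.0f} days at line rate")  # ~19
print(f"raw data: ~{days_to_copy(RAW_PB, LINK_GBPS):.0f} days at line rate")      # ~18
```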
Installation and configuration tools
• Currently Quattor is the tool used at INFN-T1
• Investigation done on an alternative installation and management tool (study carried out by the storage group)
• Integration between two tools:
  • Cobbler, for the installation phase
  • Puppet, for server provisioning and management operations
• Results of the investigation demonstrate that Cobbler + Puppet is a viable and valid alternative
  • Currently used within CNAF OpenLAB (a query sketch follows)
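A small sketch of querying a Cobbler server's XML-RPC API from Python, e.g. to see which install profiles and systems it manages; the hostname is a placeholder and the method names should be verified against the installed Cobbler version:

```python
"""Sketch: list profiles and systems from Cobbler's read-only XML-RPC API."""
import xmlrpc.client

# Placeholder host; Cobbler exposes its API at /cobbler_api by default.
server = xmlrpc.client.ServerProxy("http://cobbler.example.org/cobbler_api")

print("Cobbler version:", server.version())
print("Install profiles:", [p["name"] for p in server.get_profiles()])
print("Managed systems:", [s["name"] for s in server.get_systems()])
```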
Grid Middleware status
• EMI-3 update status: Argus, BDII, CREAM CE, UI, WN, StoRM
  • Some UIs still at SL5 (will be upgraded soon)
• EMI-1 phasing out (only FTS remains)
• VOBOX updated to the WLCG release