This report provides an overview of the current status of INFN-T1's infrastructure, network, data management and storage, farming, and projects and activities.
INFN-T1 site report
Andrea Chierici, on behalf of INFN-T1 staff
Outline
• Infrastructure
• Network
• Data management & Storage
• Farming
• Projects and activities
Current status (infrastructure)
• Working with 2 power lines, but still only 1 under UPS
  • Fix of the second line delayed
• Tender open for a rainwater storage tank
• Rerouted the water pipes that run below the Tier 1 floor
• Installed a surveillance system with 12 cameras and environment monitoring
• Set up access control to the data center
  • No turnstile
Current status (network)
• LHCOPN + LHCONE shared physical link: now upgraded to 2x 100 Gb/s
• Upgraded LHCOPN: the dedicated link to CERN is now 2x 100 Gb/s
• A next-generation firewall has been installed on the General IP link
  • Palo Alto PA-5250
Network diagram (figure): General IP link (2x10 Gb/s) through the Palo Alto PA-5250 next-generation firewall, with a Cisco 7600 serving desk resources; LHC OPN/ONE entering on two 100 Gb/s links; two Nexus 9516 core switches joined by a VPC link; a Nexus 7018 (3x40 Gb/s to each core, 6x40 Gb/s in total) serving the old "single-homed" resources. Storage disk servers are connected at 2x 100 Gb/s, the most recent computing resources at 4x40 Gb/s, and every disk server or farming switch is connected to both core switches.
Storage resources
• Disks:
  • 2019 pledge: 39 PB
  • Currently used: 30 PB
• Tapes:
  • 2019 pledge: 89 PB
  • Currently used: 60.6 PB
  • Currently installed: 69 PB
  • Extendible up to 84 PB
• The tender for the new library has been published
  • The library is foreseen to be up and running by Fall 2019
2018 Storage installation
• Installation of the second part of the 2017-2018 tender completed in Feb. 2019
• 3 OceanStor 18000 V5 systems, ~11.52 PB of usable space
  • 804x 6 TB NL-SAS disks per system
  • 12x 900 GB SSDs per system
  • 4 controllers per system
    • 4x FDR IB (2x 56 Gb/s) per controller
• 12 servers (2x FDR IB, 2x 100 GbE)
Storage in production
• 2 DDN SFA 12K (FDR IB): 10240 TB
• 4 DELL MD3860f (FC16): 2304 TB
• 2 DELL MD3820f (FC16): SSDs for metadata
• Huawei OS6800 V5 (FC16): 5521 TB
• Huawei OS18000 V5 (FDR IB): 19320 TB
• Total (on-line): 37385 TB
• Pledge 2019: 38721 TB
• Delta: -1336 TB
• A kind of "thin provisioning" via quotas for the ~40 collaborations living on the shared file system (a minimal quota sketch follows)
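A minimal sketch of how per-collaboration quotas can implement this kind of "thin provisioning" on a shared GPFS/Spectrum Scale file system. The device name, fileset names and quota figures are illustrative assumptions, not CNAF's actual values; the only real command used is mmsetquota.

```python
#!/usr/bin/env python3
"""Sketch: "thin provisioning" via per-collaboration fileset quotas on a
shared GPFS file system. Names and numbers below are hypothetical."""
import subprocess

FILESYSTEM = "gpfs_data"  # hypothetical GPFS device name

# collaboration fileset -> (block soft limit, block hard limit), illustrative
QUOTAS = {
    "atlas": ("900T", "1000T"),
    "cms":   ("900T", "1000T"),
    "virgo": ("300T", "350T"),
}

for fileset, (soft, hard) in QUOTAS.items():
    # mmsetquota <device>:<fileset> --block <soft>:<hard>
    # (Spectrum Scale admin command, run on a node with admin privileges)
    cmd = ["mmsetquota", f"{FILESYSTEM}:{fileset}", "--block", f"{soft}:{hard}"]
    subprocess.run(cmd, check=True)
```

The sum of the hard limits can exceed the physically installed space, which is what makes the shared file system behave like thin-provisioned storage.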
Running job trend (plot of concurrently running jobs over time)
Computing resources
• Farm power: approx. 410 kHS06
• The 2017 tender's power consumption is 22 kW vs. a farm average of 13 kW
• The CINECA partition has too many cores to be used at 100%; job slots reduced to 62
• 2019 tender not out yet: 30 kHS06
• Migration to CentOS 7 complete
  • If a VO does not support CentOS 7, Singularity can be used to run on SL6 (see the sketch below)
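A minimal sketch of running an SL6 payload on a CentOS 7 worker node through Singularity. The container image (an OSG-style EL6 image published on CVMFS), the bind mounts and the payload command are assumptions, not the exact setup used at INFN-T1.

```python
#!/usr/bin/env python3
"""Sketch: execute an SL6 job payload inside a Singularity container on a
CentOS 7 host. Image path, bind mounts and payload are hypothetical."""
import subprocess

SL6_IMAGE = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el6:latest"
payload = ["sh", "-c", "cat /etc/redhat-release && ./run_vo_job.sh"]

cmd = [
    "singularity", "exec",
    "--bind", "/cvmfs",    # make CVMFS visible inside the container
    "--bind", "/storage",  # hypothetical data area exported to the job
    SL6_IMAGE,
] + payload

subprocess.run(cmd, check=True)
```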
Deployment of Condor pilot
• A small test instance is running (a minimal health-check sketch follows)
  • Used to prepare the Puppet configuration classes
• New hardware purchased at the end of 2018 will be used to install the production CE and Condor central manager
• The final solution will be a mix of real and virtual machines
• This activity is a priority of the farming group
• LSF licenses expired on 31 Dec 2018
  • LSF is still usable, but no updates or patches can be applied after that date
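A minimal sketch of checking the HTCondor test instance with the htcondor Python bindings: count the execute-node (startd) ads and list queued jobs on the CE schedd. The host names are placeholders, not the real CNAF machines, and this is not part of the actual deployment procedure described above.

```python
#!/usr/bin/env python3
"""Sketch: quick health check of a small HTCondor test pool via the
htcondor Python bindings. Host names are hypothetical."""
import htcondor

# Hypothetical central manager of the test pool
collector = htcondor.Collector("condor-cm-test.cr.cnaf.infn.it")

# Slots advertised by the execute nodes of the test pool
startds = collector.locateAll(htcondor.DaemonTypes.Startd)
print(f"{len(startds)} startd ads in the test pool")

# Jobs queued on the (hypothetical) test CE schedd
schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd, "ce-test.cr.cnaf.infn.it")
schedd = htcondor.Schedd(schedd_ad)
jobs = schedd.query(projection=["ClusterId", "ProcId", "JobStatus"])
print(f"{len(jobs)} jobs queued on the test CE")
```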
Puppet status
• INFN-T1 is running Puppet v5
• All local modules are compatible
• Next step is to update the common modules (coming from PuppetLabs)
• Once all modules are updated, Foreman will be updated to the latest version
• Tests to migrate to Puppet v6 will begin immediately after
Cloud@CNAF
• Extending the existing cloud instance (SDDS) to T1 resources
• Since the SDDS area is separate, most of the services are duplicated: 2 logical regions (see the client-side sketch below)
• The testbed is working with basic services
• All configurations are tested and automated through Puppet
• Now moving to production with a small fraction of resources, using basic services
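A minimal client-side sketch of what the "2 logical regions" layout looks like if the cloud is OpenStack-based: the same cloud entry exposes one region for the SDDS area and one for the T1 resources. The cloud and region names are assumptions, taken from nothing more than the slide's wording.

```python
#!/usr/bin/env python3
"""Sketch: list instances in two logical regions of the same cloud with
openstacksdk. Cloud name and region names are hypothetical and would be
defined in the client's clouds.yaml."""
import openstack

for region in ("SDDS", "Tier1"):
    conn = openstack.connect(cloud="cnaf", region_name=region)
    servers = list(conn.compute.servers())
    print(f"region {region}: {len(servers)} instances")
```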
HPC farm
• 2 clusters
  • Older: 25 nodes with dual Intel v3 CPUs, InfiniBand interconnect, GPFS shared file system, 4 Nvidia K40 and 4 Nvidia V100 GPUs
  • Newer: 20 nodes with dual Intel v4 CPUs, Omni-Path interconnect, GPFS shared file system, 4 Nvidia V100 GPUs
• Used mainly by the LHC VOs, the CERN accelerator physics group and local INFN users
CDF LTDP
• CNAF maintains the CDF Run 2 dataset (4 PB), collected during 2001-2011 and stored on tape
• The 140 TB of CDF data lost during the 2017 flood have been successfully re-transferred from FNAL to CNAF via the GridFTP protocol
• The "Sequential Access via Metadata" (SAM) data-handling tool, developed at FNAL, has been installed on a dedicated SL6 server for CDF data management
  • The SAM station performs real-time validation against the checksums stored in an Oracle database (a hedged validation sketch follows)
  • The Oracle CDF database also stores information about specific dataset locations and metadata
• Recent tests showed that analysis jobs using software installed on CVMFS and requesting delivery of files stored on CNAF tapes work properly
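A minimal sketch of the kind of checksum validation the SAM station performs: recompute the checksum of a staged file and compare it with the value recorded in the catalogue. Adler-32 is assumed here, the file path is hypothetical, and the real catalogue (the CDF Oracle database) is mocked by a dict; the actual SAM/Oracle machinery is not shown.

```python
#!/usr/bin/env python3
"""Sketch: validate staged files against catalogued checksums.
Paths, checksum values and the use of Adler-32 are assumptions."""
import zlib

# Mock of the catalogue entries (file path -> expected Adler-32, hex string)
CATALOGUE = {"/storage/cdf/run2/raw/file0001.root": "0f1d2c3b"}

def adler32(path, chunk=1 << 20):
    """Stream the file so multi-GB CDF files need not fit in memory."""
    value = 1
    with open(path, "rb") as f:
        while data := f.read(chunk):
            value = zlib.adler32(data, value)
    return f"{value & 0xffffffff:08x}"

for path, expected in CATALOGUE.items():
    status = "OK" if adler32(path) == expected else "MISMATCH"
    print(f"{status}  {path}")
```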
ISMS area
• CNAF obtained ISO 27001 certification in 2017
• A systematic approach to managing sensitive information so that it remains secure (in the sense of confidentiality, integrity and availability); it covers people, processes and IT systems by applying a risk-management process
• 2 racks currently implement the ISMS
• Several scientific collaborations are interested:
  • Harmony (big data in hematology)
  • AAC (the largest Italian organization for cancer research)
  • IRST Meldola (cancer research)