This report provides an overview of the computing and storage infrastructure at INFN-CNAF in Bologna, Italy. It includes details on the location, capacity, access, and participation in various projects. The report also discusses the current layout, CPU farms, storage systems, and future plans for upgrades and improvements.
INFN T1 & T2 report Luca dell’Agnello INFN-CNAF
Introduction • Location: INFN-CNAF, Bologna (Italy) • one of the main nodes of the GARR network • Hall in the basement (-2nd floor): ~ 1000 m2 of total space • Easily accessible by lorries from the road • Not suitable for office use (remote control) • Computing facility for the INFN HENP community • Participating in the LCG, EGEE and INFNGRID projects • Multi-experiment Tier1 (LHC experiments, VIRGO, CDF, BABAR, AMS, ARGO, MAGIC, PAMELA, …) • Resources assigned to experiments on a yearly basis
The new layout (floor-plan diagram): racks shown in zones 12 and 13 include the total required space; the same holds for the CPU island. The plan also shows the disk island ("Isola HD"), zones 1-3 and a maximum of 36 racks per zone.
Farm • 1 LSF farm with ~ 700 hosts (~ 1400 CPUs, ~ 2800 CPU slots) • ~ 1600 KSI2k total • At least one queue per VO defined in LSF • Scheduling based on fairshare (no dedicated resources; see the sketch below) • Cumulative CPU time history (10 days) • LHC VOs share: 15% each • New tender (800 KSI2k) completed • 165 bi-processor WNs (dual-core Woodcrest) • Delivered KSI2k should be larger • Test phase (expected commissioning February 2007) • Issues and plans • Support for lcg-info-dynamic-scheduler (for LSF) • Info for job priorities not (completely?) published • Upgrade to SL(C) 4.4 • 64-bit OS support • Tests with XEN for virtual WNs
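A minimal sketch of the fairshare idea referred to above: the next job is dispatched from the VO whose recent usage is furthest below its target share (15% each for the LHC VOs), rather than reserving dedicated resources per VO. This is illustrative only, not LSF's actual scheduling formula; the usage figures are hypothetical.

```python
# Illustrative fairshare sketch; not LSF's real algorithm. Target shares follow
# the slide (15% per LHC VO); the usage figures are made up.
target_share = {"alice": 0.15, "atlas": 0.15, "cms": 0.15, "lhcb": 0.15, "other": 0.40}
cpu_used = {"alice": 120.0, "atlas": 310.0, "cms": 95.0, "lhcb": 40.0, "other": 260.0}  # KSI2k-days

total_used = sum(cpu_used.values())

def fairshare_deficit(vo: str) -> float:
    """Positive when a VO has consumed less than its target share."""
    return target_share[vo] - cpu_used[vo] / total_used

# Dispatch the next pending job from the VO with the largest deficit.
next_vo = max(target_share, key=fairshare_deficit)
print(f"next job dispatched from: {next_vo}")
```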
Farm usage (plots): KSI2k-days during 2006 for ALICE, ATLAS, CMS and LHCb; declared vs. available CPU slots over the last month and last Sunday (gap on 20/01/07 due to a failure of the monitoring system).
Storage & mass storage (SAN diagram: 192 FC ports, 2 x 2 Gbps interlink connections between switch pairs A1/A2 and B1/B2, RAID5 arrays) • Total disk space at Tier1: ~ 600 TB raw (~ 500 TB net disk space) organized in a SAN • Completely allocated • New storage (400 TB raw) delivered – production in February • Mainly based on SATA-FC technology (for general use) • Good performance and scalability • RAID 5 + hot spare • Possibility to have redundant paths to the data • Access via srm, gridftp, rfio, native GPFS, (NFS) • HSM based on CASTOR (v2) • StorageTek L5500 library (up to 20 drives, 1 PB online) • 6 LTO-2 drives (20-30 MB/s) with 1300 tapes • 7 (+3) 9940B drives (25-30 MB/s) with 1350 (+600) tapes • The CASTOR file system hides the tape level • Access to HSM • Native access protocol: rfio • srm interface for the grid fabric available (rfio/gridftp)
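A back-of-the-envelope check of the tape figures above, assuming the nominal native cartridge capacity of 200 GB for both LTO-2 and 9940B media (an assumption, not stated on the slide) and using the per-drive rates and the ~5000-cartridge library figure quoted elsewhere in this report.

```python
# Rough arithmetic on the slide's tape figures. 200 GB/cartridge (native,
# uncompressed) for both LTO-2 and 9940B is an assumption.
cartridge_gb = 200
lto2_tapes = 1300
t9940b_tapes = 1350 + 600

online_tb = (lto2_tapes + t9940b_tapes) * cartridge_gb / 1000
print(f"currently mounted cartridges: ~{online_tb:.0f} TB")              # ~650 TB

slots = 5000  # cartridge count quoted for the full library on the 2010 slide
print(f"fully populated library: ~{slots * cartridge_gb / 1e6:.1f} PB")  # ~1 PB online

# Aggregate nominal drive throughput, using midpoints of the quoted ranges.
aggregate_mb_s = 6 * 25 + (7 + 3) * 27.5
print(f"aggregate tape bandwidth: ~{aggregate_mb_s:.0f} MB/s")
```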
Tape library evolution • New library needed to fulfil the LHC (and also non-LHC) experiments' requests • L5500 library limited to 1 PB online • Possible solutions: • SUN SL8500 • 10000 slots (currently for 500 GB tapes, 1 TB expected in 18 months) and up to 80 T10000 drives • IBM 3584 • 7000 slots (currently for 500 GB tapes) • Time issue for the tender • Preparing tender specs now, hw to be commissioned in Q4 2007 • Issue of maintenance of the present L5500 after 2008 • Move data on 9940B media to newer tapes? • Move 9940B drives to the new library (only if SL8500)?
CASTOR • CNAF storage for LCG presently based (mainly) on CASTOR • Migration to CASTOR2 completed last September • Many issues solved • A single stager for all supported VOs • Stager instance installed for testing (e.g. SRM 2.2) and for tape repack operations • ~ 160 TB of net disk space • 25 disk servers (rfio/gridftp) + 12 tape servers • Monitoring/notification: what we have does not seem to be enough, testing LEMON • Need to clearly monitor performance • In production next February • Evolution • Increase the number of disk servers (currently 1 per 12 TB) where necessary (hw delivered, to be installed) • Differentiation of disk pools • Disk buffer for data import from CERN • Disk buffer for WN access • SRM 2.2 installation (test-bed ready for installation) • Test disk servers with SL(C)4
GPFS • CASTOR is not the only storage solution at CNAF: GPFS is also in use (mainly by non-LHC experiments) • ~ 125 TB of net disk space served • 12 disk servers • Tests: 320 MB/s effective I/O • High availability on disk servers • Tests for native-mode use ongoing (in collaboration with some experiments) • Candidate disk-only solution (D1T0) • srm 2.2 end points for GPFS (StoRM) available for pre-production tests (besides the one for the SRM WG) • The issue of moving data efficiently back and forth from CASTOR is yet to be addressed • IBM support options under consideration • (I dare say) no major problems with GPFS so far
Storage classes implementation (at CNAF) • Disk0Tape1 will be (naturally!) CASTOR • Input buffer for raw data from CERN • Disk1Tape1 will be (probably!) CASTOR • Buffer for data transfers to and from T1s and T2s • Disk1Tape0: also investigating GPFS/StoRM • Open issue: efficiency of moving data to and from CASTOR (or a different backup system) • More tests needed (soon!) • Developing a detailed plan for this (now)
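The mapping above, restated as a minimal sketch; the "probable" and "candidate" backends are still under evaluation at CNAF, as noted in the bullets.

```python
# Storage-class -> backend mapping as described on the slide (D = disk copies,
# T = tape copies). Entries marked "probable"/"candidate" are not yet decided.
storage_classes = {
    "Disk0Tape1": "CASTOR",                    # input buffer for raw data from CERN
    "Disk1Tape1": "CASTOR (probable)",         # buffer for transfers to/from T1s and T2s
    "Disk1Tape0": "GPFS + StoRM (candidate)",  # disk-only; data movement to CASTOR still open
}

for storage_class, backend in storage_classes.items():
    print(f"{storage_class}: {backend}")
```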
Db service • Activity within the 3D project • One 2-node Oracle RAC allocated to LHCb, already in production with the LFC read-only replica. The LHCb condition database replica will be available in a few days (on the same cluster) • One 2-node Oracle RAC allocated to ATLAS (condition database) • One 3-node Oracle RAC already in place, will be allocated to Grid Services (FTS, LFC, VOMS) • One 2-node RAC dedicated to Streams throughput tests • CASTOR2: 3 single-instance DBs (nameserver, stager, dlf) plus one single instance for stager tests • Starting tests for a VOMS read-only replica in March • New Fibre Channel storage has just been delivered, we expect to put it in production in February. Once in production, the present clusters will be migrated to the new storage. This will imply a scheduled downtime of a few days • Squid server installed for CMS
LHCb LFC read-only replica (CERN → CNAF) • The CNAF slave is kept consistent with the CERN master at the database level through Oracle Streams replication (170 KB/s) • Plot: misalignment over 25000 seconds (7-hour test) • Latency 4-10 seconds • Except for the "red area" • Total: 120 entries/s
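A small arithmetic cross-check of the replication figures above, using only the numbers quoted on the slide.

```python
# Arithmetic on the quoted Streams replication figures; not a measurement.
stream_rate_kb_s = 170.0   # Oracle Streams throughput, CERN master -> CNAF slave
entry_rate = 120.0         # replicated LFC entries per second

print(f"average replicated entry: ~{stream_rate_kb_s / entry_rate:.1f} KB")  # ~1.4 KB
print(f"7-hour test window: {7 * 3600} s")  # matches the ~25000 s span of the plot
```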
WAN connectivity (diagram): CNAF T1 LAN connected through a CISCO 7600 and a Juniper router to the GARR/GEANT backbone; 10 Gbps default IP links, a 10 Gbps LHCOPN link, and a 10 Gbps path towards FZK.
FTS at CNAF • 2 FTS servers (web service + agents), 1 idle • Oracle DB • Full standard T1-T1 + T1-T2 + STAR channels • gLite FTS used by CMS within PhEDEx • Implementation and deployment done, now working on • valuable insight from Spring/Summer testing + CSA06 + SC4 • Problems related to CASTOR2 often observed • Also non-LHC VOs (ARGO, VIRGO) supported with dedicated channels • FTS production quality (hopefully during February 2007) • 3 dedicated servers in load balancing/fail-over • All agents will be configured on each server (but running distributed over the 3 servers) • The Oracle backend will be installed on an Oracle RAC
Inbound transfers in CSA06 • ~ 70 TB imported monthly • 1-1.5 TB (up to 6 TB) imported daily • Rates as high as 80-100 MB/s sustained for several hours, with peaks at 170 MB/s (hourly averages)
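The sustained average implied by these numbers, as a quick arithmetic sketch (slide figures only):

```python
# Average CSA06 import rate implied by the monthly volume quoted above.
tb_per_month = 70
days = 30

daily_tb = tb_per_month / days
avg_mb_s = daily_tb * 1e6 / 86400   # 1 TB = 1e6 MB (decimal units)

print(f"average daily import: ~{daily_tb:.1f} TB/day")   # ~2.3 TB/day
print(f"sustained average: ~{avg_mb_s:.0f} MB/s")         # ~27 MB/s vs. 170 MB/s hourly peaks
```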
Human resources • Tier1 functional units: Farming, Storage, Network, Security, Technical services • Currently 16 FTEs (7+2 staff) plus 4 FTEs dedicated to experiments (3 more to come) • Per-unit staffing shown in the organisation chart ranges from 1.5 to 7.5 FTEs, with 1-2 more to come in each unit • An interface to experiments is being organized • Most of the crucial duties (e.g. CASTOR, Oracle) are assigned to temporary people
Italian Tier2s (1) • INFN-CATANIA – ALICE (also supporting a plethora of VOs) • 4 FTEs • Farm with ~ 200 KSI2k, DPM SE with 20 TB of disk space • INFN-LNL – CMS (also supporting other LHC VOs and CDF) • Farm with ~ 200 KSI2k (1 CE), DPM SE (SRM) with 4 disk servers and 12 TB of disk space • Moving the SRM SE to dCache (March 2007) • A classical SE with 6 TB of disk is also still in production (to be dismissed) • An additional 20 TB of disk is being installed • CMS-specific services (e.g. squid, PhEDEx) available • INFN-Napoli – ATLAS (also supporting other VOs) • 3 FTEs • Farm with ~ 100 KSI2k (1 CE), DPM SE (SRM) with 3 disk servers and 28 TB of disk space
Italian Tier2s (2) • INFN-Roma – CMS • 2 FTEs • Farm with ~ 100 KSI2k (1 CE + 1 backup CE), DPM SE with ~ 10 TB of disk space • Moving to dCache • 3 additional NAS systems to be delivered • INFN-Roma – ATLAS • 2 FTEs • Farm with ~ 80 KSI2k (1 CE), DPM SE with ~ 8 TB of disk space • INFN-Torino – ALICE (also supporting LHC VOs, BABAR, BIOMED, ZEUS, NA48) • 4 FTEs • Farm with ~ 50 KSI2k (1 CE), classic SE with 2 TB of disk • An additional ~ 80 KSI2k and 50 TB of disk to be delivered (xrootd access)
Other sites • Other sites actively participating in challenges • Milano – ATLAS • Bari – CMS • Pisa – CMS
Services on the Italian grid • Grid services managed by ROC-IT… • WMS, BDII, VOMS, GridICE, HLR (for DGAS) • … and also with the contribution of Tier1 staff • LFC, FTS • ROC-IT also provides monitoring and support for the whole Italian grid
Accounting on the Italian grid • DGAS (Distributed Grid Accounting System) deployed on the whole Italian grid • At the Tier1 a local tool (Red Eye) is also used to feed LSF usage records into the HLR • DGAS relies on the "Maarten Litmaath patch" for LCG CEs (enabling the common job-record logfile) • All records registered to the HLR (where???) • Currently checking the installation and configuration at all sites • verifying that the expected information is in the unified log • cross-checking the correctness of the accounting by comparing the HLR data with the LRMS raw log information (see the sketch below) • Also testing the dgas2Apel procedure to feed DGAS data into the GOC APEL db • The present version does not send the DN with the records
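A minimal sketch of the HLR-vs-LRMS cross-check mentioned above. This is not the actual DGAS or Red Eye tooling; the file names and field names are hypothetical, and the per-user comparison is shown only to illustrate the idea.

```python
# Illustrative cross-check: compare per-user CPU totals registered in the HLR
# with the raw LSF accounting records. Inputs are hypothetical CSV exports.
import csv
from collections import defaultdict

def cpu_totals(path: str, user_field: str, cpu_field: str) -> dict:
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[user_field]] += float(row[cpu_field])
    return totals

hlr = cpu_totals("hlr_dump.csv", "user_dn", "cpu_time")    # hypothetical HLR export
lrms = cpu_totals("lsf_acct.csv", "user", "cpu_time")      # hypothetical LRMS log export

for user in sorted(set(hlr) | set(lrms)):
    delta = hlr.get(user, 0.0) - lrms.get(user, 0.0)
    if abs(delta) > 1.0:   # tolerance in CPU seconds, arbitrary
        print(f"mismatch for {user}: HLR-LRMS = {delta:+.1f} s")
```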
DGAS deployment on the Italian Grid (diagram: a DGAS concentrator and HLRs in Torino, Roma 1, Milano and Padova; CEs in Torino, Roma 1-2-3, Milano, Padova, Trieste, Ferrara, Bologna, Perugia, Firenze and Genova) • DGAS deployed at 43 sites (RPM+YAIM) • L1 HLRs at INFN-T1 and at 10 T2 sites • 2 of them registering data for small T3 sites • An L2 HLR at 1 site (Torino) collecting data for 4 sites (Torino, Padova, Roma1, Milano)
StoRM • StoRM is a storage resource manager for disk-based storage systems • It implements the SRM interface version 2.x • StoRM is designed to support guaranteed space reservation and direct access (native POSIX I/O calls), as well as other standard libraries (like RFIO) • StoRM takes advantage of high-performance parallel file systems; standard POSIX file systems are also supported • A modular architecture decouples the StoRM logic from the supported file system • Strong security framework with VOMS support (by L. Magnoni)
StoRM General Considerations 1/2 • File systems currently supported by StoRM • GPFS from IBM • XFS from SGI • Any other file system with a POSIX interface and ACL support • Light and flexible namespace structure • The namespace of the files managed by StoRM relies upon the underlying file systems • StoRM does not need to query any DB to know the physical location of a requested SURL (by L. Magnoni)
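A minimal sketch of the namespace idea described above: the physical location of a SURL is derived from a mapping rule onto the underlying file system, with no database lookup. The endpoint, storage-area roots and mapping rule below are hypothetical, not StoRM's actual configuration.

```python
# Illustrative SURL -> physical path resolution via a namespace rule.
from urllib.parse import urlparse

STORAGE_AREA_ROOT = {"atlas": "/gpfs/storage/atlas", "virgo": "/gpfs/storage/virgo"}

def surl_to_pfn(surl: str) -> str:
    """Map srm://host/path SURLs onto the storage-area root of the owning VO."""
    stfn = urlparse(surl).path                 # site file name part of the SURL
    vo = stfn.lstrip("/").split("/", 1)[0]
    return stfn.replace(f"/{vo}", STORAGE_AREA_ROOT[vo], 1)

print(surl_to_pfn("srm://storm.cnaf.infn.it/atlas/raw/run123/file.root"))
# -> /gpfs/storage/atlas/raw/run123/file.root
```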
StoRM General Considerations 2/2 • ACL usage • StoRM enforces ACL entries on physical files for the local user corresponding to the Grid credential • Standard grid applications (such as GridFTP, RFIO, etc.) can then access the storage on behalf of the user (see the sketch below) • Scalability and high availability • FE, DB and BE can be deployed on 3 different machines • StoRM is designed to be configured with n FEs and m BEs sharing a common DB, but more tests are needed to validate this scenario (by L. Magnoni)
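A sketch of the ACL-enforcement step, assuming the Grid credential has already been mapped to a local (pool) account and the file system supports POSIX ACLs. The mapping function and paths are hypothetical; setfacl is the standard Linux ACL tool.

```python
# Illustrative ACL enforcement: grant the mapped local account access to the
# physical file so that GridFTP/RFIO can then act on the user's behalf.
import subprocess

def map_grid_credential_to_local_user(dn: str, vo: str) -> str:
    """Hypothetical stand-in for the real DN/VOMS -> pool-account mapping."""
    return f"{vo}001"

def grant_access(dn: str, vo: str, pfn: str, perms: str = "rw") -> None:
    local_user = map_grid_credential_to_local_user(dn, vo)
    subprocess.run(["setfacl", "-m", f"u:{local_user}:{perms}", pfn], check=True)

grant_access("/C=IT/O=INFN/CN=Some User", "atlas", "/gpfs/storage/atlas/raw/file.root")
```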
StoRM Grid usage scenario • StoRM dynamically manages files and space in the storage system; applications can directly access the Storage Element (SE) during the computational process • File metadata are managed (and stored) by the underlying file system: no replication of metadata at the application level, that is the file system's job! In this way StoRM gains in performance • Data access is performed without interacting with an external service, with a great performance improvement (POSIX calls) • Otherwise, standard data access via an I/O server (such as RFIO) is also fully supported (by L. Magnoni)
StoRM status and SRM issues • Status • Migration to SRM v2.2 completed • All functions requested by the SRM WLCG usage agreement are implemented • New version of StoRM available • StoRM SRM tests • StoRM is involved in the interoperability tests made by the SRM-WG; the results are available here: http://sdm.lbl.gov/srm-tester/v22-progress.html • StoRM is also involved in other SRM tests made with the S2 test suite: http://gdrb02.cern.ch:25000/srms2test/scripts/protos/srm/2.2/basic/s2_logs/ (by L. Magnoni)
Target power, refrigeration and racks for 2010 • CPU + servers -> 2000 boxes at 500 Watt (max)/box -> 1 MWatt • CPU + servers -> 30 boxes/rack -> 70 racks; up to 20 kWatt/rack • Disk: 6300 boxes at 30 Watt/disk -> 200 kWatt • Disk -> 150 disks/rack -> 40 racks • Tapes: 10 PB requested • Now 0.5 TB/tape; assume 1 TB/tape for 2010 • Current library: 5000 cartridges (1 PB) -> new library (10 PB) -> 30 kWatt • Total power for computing resources in the room: 1.23 MWatt • Target parameters for the Tier1 2010 upgrade plan: • -> 1.5 MWatt of refrigeration power (20% for local chillers) • -> 3.0 MWatt total for chillers and services (now 1.2 MWatt) • -> 132 racks (70 CPU + 42 disk + 20% contingency) OK • Required space available • 1x UPS: 10 m2, 5500 kg • 1x cooling system: 16 m2, 6600 kg • 1x electric generator: 11 m2, 11000 kg (by M. Mazzucato)
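The arithmetic behind the power budget above, using only the slide's figures; small rounding differences (e.g. 189 kW quoted as ~200 kW) are expected.

```python
# Power and rack arithmetic for the 2010 target, using the slide's figures.
cpu_boxes, watt_per_box = 2000, 500
disk_boxes, watt_per_disk = 6300, 30
library_kw = 30

cpu_kw = cpu_boxes * watt_per_box / 1000      # 1000 kW
disk_kw = disk_boxes * watt_per_disk / 1000   # 189 kW, quoted as ~200 kW
total_kw = cpu_kw + disk_kw + library_kw
print(f"computing load in the room: ~{total_kw / 1000:.2f} MW")  # ~1.2 MW

print(f"CPU racks:  {cpu_boxes // 30} (30 boxes/rack)")     # 66, planned as 70
print(f"disk racks: {disk_boxes // 150} (150 disks/rack)")  # 42, quoted as ~40
```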