INFN: SC3 activities and SC4 planning
Mirco Mazzucato
On behalf of the Tier-1 and INFN SC team
CERN, Nov 15 2005
The INFN Grid strategy for SC
• Grid middleware services were conceived from the beginning as being of general use and having to satisfy the requirements of other sciences
  • Biology, Astrophysics, Earth Observation, ...
• Tight collaboration with CERN, as natural coordinator of the European projects for M/W and EU e-Infrastructure developments, and with other national projects (UK e-Science, ...), leveraging:
  • CERN's Tier-0 role and natural LCG coordination
  • CERN as an EU lab of excellence in S/W production (WEB, ...)
• Strong integration between the national and EU e-Infrastructures (EDG, EGEE/LCG, EGEE-II)
  • "Same" M/W, same services, integrated operation and management of common services provided by EU-funded project manpower
• National development of complementary or missing services, well integrated into the EGEE/LCG M/W Service Oriented Architecture
• Strong preference for common services over specific solutions, which are considered temporary when they duplicate functionality of general services
SC3 Throughput Phase Results
SC3 configuration and results: CNAF (1/4)
• Storage
  • Oct 2005: Data disk
    • 50 TB (Castor front-end)
    • WAN → disk performance: 125 MB/s (demonstrated, SC3)
    • SE Castor: Quattor
  • Oct 2005: Tape
    • 200 TB (4 9940B + 6 LTO-2 drives)
    • Drives shared with production
    • WAN → tape performance: mean sustained ~50 MB/s (SC3 throughput phase, July 2005); a rough transfer-time estimate based on these rates follows below
• Computing
  • Oct 2005: min 1200 kSI2K, max 1550 kSI2K (as the farm is shared)
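As a cross-check not taken from the slides, the demonstrated WAN rates above can be turned into back-of-the-envelope transfer times. The helper below is purely illustrative; only the 125 MB/s and ~50 MB/s figures come from the slide.

```python
# Illustrative back-of-the-envelope estimate (not part of the original slides):
# how long it takes to move a dataset at the sustained WAN rates quoted above.

def transfer_days(dataset_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move dataset_tb terabytes at rate_mb_per_s MB/s."""
    dataset_mb = dataset_tb * 1_000_000        # 1 TB = 10^6 MB (decimal units)
    seconds = dataset_mb / rate_mb_per_s
    return seconds / 86_400                    # seconds per day

if __name__ == "__main__":
    # Slide figures: 50 TB disk front-end at 125 MB/s, tape at ~50 MB/s sustained.
    print(f"50 TB at 125 MB/s (WAN->disk): {transfer_days(50, 125):.1f} days")
    print(f"50 TB at  50 MB/s (WAN->tape): {transfer_days(50, 50):.1f} days")
```

At the quoted rates this gives roughly 4.6 days to the disk front-end and about 11.6 days to tape for a 50 TB dataset, which is why the sustained (rather than peak) rate is the number that matters for the service challenges.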
SC3 configuration and results: CNAF (2/4) — LAN and WAN connectivity: T1 SC layout diagram (CNAF production network, 2 x 1 Gbps aggregated links and n x 1 Gbps links towards the 192.135.23/24 SC subnet, GARR 10 Gbps uplink from November, general Internet access and CNAF backdoor at 192.135.23.254)
SC3 configuration and results: CNAF (3/4)
Network
• Oct 2005:
  • 2 x 1 Gigabit Ethernet links CNAF–GARR, dedicated to SC traffic to/from CERN
  • Full capacity saturation in both directions demonstrated in July 2005 with two single memory-to-memory TCP sessions (a minimal throughput-test sketch follows below)
  • CERN–CNAF connection affected by sporadic loss, apparently related to contention with production traffic (lossless connectivity measured during August)
  • Tuning of Tier-1–Tier-2 connectivity for all the Tier-2 sites (1 Gigabit Ethernet uplink connections)
  • PROBLEMS and SOLUTIONS:
    • Catania and Bari: throughput penalties caused by border routers requiring hardware upgrades
    • Pisa: MPLS misconfiguration causing asymmetric throughput; configuration fixed
    • Torino: buggy Cisco IOS version causing high CPU utilization and packet loss; fixed with an IOS upgrade
• Nov 2005:
  • Ongoing upgrade to 10 Gigabit Ethernet, CNAF–GARR, dedicated to SC
  • Policy routing at the GARR access points to grant exclusive access to the CERN–CNAF connection by LHC traffic
  • Type of connectivity to INFN Tier-2 sites under discussion
  • Realization of a backup Tier-1–Tier-1 connection (Karlsruhe): ongoing discussion with GARR and DFN
  • Ongoing 10 Gigabit Ethernet LAN tests (Intel PRO/10GbE): 6.2 Gb/s UDP, 6.0 Gb/s TCP, 1 stream, memory-to-memory (with proper PCI and TCP stack configuration)
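The "memory-to-memory TCP session" tests above were run with the standard network test tools of the time; purely to illustrate what such a test measures, here is a minimal socket-based sketch (the port, buffer size and run time are hypothetical, and no disk I/O is involved so only the network path and TCP stack are exercised).

```python
# Minimal memory-to-memory TCP throughput sketch (illustrative only; the real SC
# tests used dedicated tools). Run "server" on one host, "client <host>" on the other.
import socket
import sys
import time

PORT = 5201                 # hypothetical port
CHUNK = 1 << 20             # 1 MiB buffer, kept entirely in memory
DURATION = 10               # seconds the client keeps transmitting

def server() -> None:
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while (data := conn.recv(CHUNK)):
            total += len(data)
        elapsed = time.time() - start
        print(f"received {total / 1e6:.0f} MB in {elapsed:.1f} s "
              f"-> {total * 8 / elapsed / 1e9:.2f} Gb/s")

def client(host: str) -> None:
    payload = b"\0" * CHUNK
    with socket.create_connection((host, PORT)) as conn:
        end = time.time() + DURATION
        while time.time() < end:
            conn.sendall(payload)

if __name__ == "__main__":
    client(sys.argv[2]) if sys.argv[1] == "client" else server()
```

Saturating a 1 Gb/s path with a single stream like this generally also requires the kind of TCP stack tuning (window sizes, buffer limits) mentioned in the 10 Gigabit Ethernet LAN tests above.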
SC3 configuration and results: CNAF (4/4)
• Software
  • Oct 2005:
    • SRM/Castor
    • FTS: ongoing installation of a second server (installation and configuration with Quattor); backend: Oracle DB on a different host
    • LFC: installation on a new host with a more reliable configuration (RAID-5, redundant power supply)
    • Farm middleware: LCG 2.6
    • Software in evolution. Good response from developers; bug fixing ongoing, more effort needed.
    • Debugging of FTS and PhEDEx (CMS) has required a duplication of effort
  • Nov 2005, File Transfer Service channels configured:
    • CNAF–Bari
    • CNAF–Catania
    • CNAF–Legnaro
    • CNAF–Milano
    • CNAF–Pisa
    • CNAF–Torino
    • FZK–CNAF (full testing to be done)
  • 2nd half of July: FTS channel tuning Tier-1–Tier-2 (number of concurrent gridftp sessions, number of parallel streams); a toy tuning model is sketched below
  • Starting from Dec 2005:
    • Evaluation of dCache and StoRM (for disk-only SRMs)
    • Possible upgrade to CASTOR v2
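The channel tuning mentioned above amounts to choosing, per T1–T2 channel, how many files are transferred concurrently and how many parallel GridFTP streams each transfer uses. The sketch below is not FTS code or its API; it is a hypothetical toy model showing how the two knobs combine into a total stream count and an expected aggregate rate under an assumed per-stream throughput.

```python
# Hypothetical toy model of FTS channel tuning (not the FTS API): how concurrent
# file transfers and parallel GridFTP streams per file combine on a channel.
from dataclasses import dataclass

@dataclass
class ChannelTuning:
    name: str                  # e.g. "CNAF-Bari" (channel names taken from the slides)
    concurrent_files: int      # simultaneous file transfers on the channel
    streams_per_file: int      # parallel GridFTP streams per transfer
    stream_rate_mb_s: float    # assumed per-stream throughput (illustrative value)

    def total_streams(self) -> int:
        return self.concurrent_files * self.streams_per_file

    def aggregate_rate_mb_s(self, link_cap_mb_s: float = 125.0) -> float:
        """Expected aggregate rate, capped by a 1 Gb/s uplink (~125 MB/s)."""
        return min(self.total_streams() * self.stream_rate_mb_s, link_cap_mb_s)

if __name__ == "__main__":
    # Illustrative settings only; the actual tuned values are not in the slides.
    for files, streams in [(5, 5), (10, 5), (10, 10)]:
        ch = ChannelTuning("CNAF-Bari", files, streams, stream_rate_mb_s=2.0)
        print(f"{ch.concurrent_files:>2} files x {ch.streams_per_file:>2} streams "
              f"= {ch.total_streams():>3} streams -> "
              f"~{ch.aggregate_rate_mb_s():.0f} MB/s")
```

The point of the exercise is visible even in the toy model: beyond a certain stream count the uplink capacity, not the stream configuration, becomes the limit, so adding sessions only increases contention.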
Tier-2 sites in SC3 (Nov 2005)
• Torino (ALICE):
  • FTS, LFC, dCache (LCG 2.6.0)
  • Storage space: 2 TB
• Milano (ATLAS):
  • FTS, LFC, DPM 1.3.7
  • Storage space: 5.29 TB
• Pisa (ATLAS/CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, LFC, DPM 1.3.5
  • Storage space: 5 TB available, 5 TB expected
• Legnaro (CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, DPM 1.3.7 (1 pool, 80 GB)
  • Storage space: 4 TB
• Bari (ATLAS/CMS):
  • FTS, PhEDEx, POOL file catalogue, PubDB, LFC, dCache, DPM
  • Storage space: 1.4 TB available, 4 TB expected
• Catania (ALICE):
  • DPM and Classic SE (storage space: 1.8 TB)
• LHCb:
  • CNAF
SC3 Service Phase Report
CMS, SC3 Phase 1 report (1/3)
• Objective: 10 or 50 TB per Tier-1 and ~5 TB per Tier-2 (source: L. Tuura)
CMS, SC3 Phase 1 report (2/3) — plots (source: L. Tuura)
CMS, SC3 Phase 1 report (3/3) — plots (source: L. Tuura)
LHCb, Phase 1 report (data moving)
• Less than 1 TB of stripped DSTs replicated. At INFN most of this data already existed, with only a few files missing from the dataset, so only a small fraction of the files had to be replicated from CERN (source: A. C. Smith)
• Tier-1-to-Tier-1 FTS channel configured (FZK–CNAF)
• Configuration of a full Tier-1-to-Tier-1 channel matrix under evaluation (for replication of stripped data)
ATLAS
• Production phase started on Nov 2
• CNAF: some problems experienced in August (power cut, networking and LFC client upgrade)
• 5932 files copied and registered at CNAF:
  • 89 "Failed replication" events
  • 14 "No replicas found"
ALICE
• CNAF and Torino in production
  • Small failure rates after the initial debugging phase
  • Overall job output produced: 5 TB
• Other sites are still being debugged:
  • CNAF: proxy renewal service did not exist; being debugged, ALICE part deployed and configured
  • Catania: LCG configuration problem being addressed, ALICE part deployed and configured
  • Bari: LCG and ALICE parts deployed and configured, being tested with a single job, to be opened in production
  • CNAF and Catania to be opened to production
• (Information source: P. Cerello, Oct 26 2005; plots for CNAF and Torino)
Tape pool utilization at CNAF (1/2) — plots for ALICE and ATLAS
Tape pool utilization at CNAF (2/2) — plots for CMS and LHCb
SC4 Planning
INFN Tier-1: long-term plans (Oct 2006)
• SC4: storage and computing resources will be shared with production
• Storage
  • Data disk:
    • Additional 400 TB (approx. 300 TB for LHC)
    • TOTAL: approx. 350 TB
  • Tape: up to 450 TB
• Computing
  • Additional 800 kSI2K
  • TOTAL: min 2000 kSI2K, max 2300 kSI2K
• Network
  • 10 Gigabit Ethernet CNAF–CERN (a rough link-capacity cross-check follows below)
  • 10 Gigabit Ethernet CNAF–INFN Tier-2 sites, and backup connection to Karlsruhe (?)
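Not from the slides, just a sanity check on the planned 10 Gigabit Ethernet CNAF–CERN link: 10 Gb/s corresponds to roughly 1.25 GB/s of nominal capacity, and the sketch below converts assumed sustained utilization fractions into TB/day, which is the kind of figure the SC4 disk and tape targets get compared against.

```python
# Illustrative sanity check (not part of the original slides): daily data volume
# a 10 Gb/s CNAF-CERN link can carry at various assumed sustained utilizations.

LINK_GBPS = 10.0                       # nominal 10 Gigabit Ethernet link

def tb_per_day(utilization: float) -> float:
    """TB/day moved at the given fraction of the nominal link rate."""
    bytes_per_s = LINK_GBPS * 1e9 / 8 * utilization
    return bytes_per_s * 86_400 / 1e12

if __name__ == "__main__":
    for u in (0.2, 0.5, 0.8):
        print(f"{u:.0%} of 10 Gb/s sustained -> {tb_per_day(u):.0f} TB/day")
```

Even at modest sustained utilization the link moves tens of TB per day, so for the SC4 volumes above the practical limits are more likely to come from the storage back-ends and transfer services than from the WAN itself.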
Tier-2 sites at INFN: SC4
• Currently 9 candidate Tier-2 sites:
  • In some cases one Tier-2 site hosts two experiments
  • Total: 12 Tier-2s (4 sites for each experiment: ATLAS, ALICE, CMS)
  • LHCb Tier-2: CNAF
• Ongoing work to understand:
  • Number of Tier-2 sites actually needed
  • Availability of local manpower and adequacy of local infrastructures
  • Capital expenditure:
    • Tier-2 overall infrastructure
    • Computing power and storage
  • Network connectivity: 1 Gigabit Ethernet for every Tier-2; average guaranteed bandwidth: 80% of link capacity
• Only after this preliminary analysis will INFN be ready for the MoU currently under definition at CERN