Tier1 Status Report
Martin Bly, RAL
27-28 April 2005
Topics
• Hardware
• Atlas DataStore
• Networking
• Batch services
• Storage
• Service Challenges
• Security
Hardware
• Approximately 550 CPU nodes
  • ~980 processors deployed in batch
  • Remainder are service nodes, servers etc.
• 220TB disk space across ~60 servers, ~120 arrays
• Decommissioning
  • Majority of the P3/600MHz systems decommissioned Jan 05
  • P3/1GHz systems to be decommissioned in July/Aug 05 after commissioning of the Year 4 procurement
  • Babar SUN systems decommissioned by end Feb 05
  • CDF IBM systems decommissioned and sent to Oxford, Liverpool, Glasgow and London
• Next procurement
  • 64-bit AMD or Intel CPU nodes – power and cooling considerations
  • Dual cores possibly too new
  • Infortrend arrays / SATA disks / SCSI connect
• Future
  • Evaluate new disk technologies, dual-core CPUs, etc.
Atlas DataStore
• Evaluating new disk systems for staging cache
  • FC-attached SATA arrays
  • Additional 4TB/server, 16TB total
  • Existing IBM/AIX servers
• Tape drives
  • Two additional 9940B drives, FC attached
  • 1 for ADS, 1 for test CASTOR installation
• Developments
  • Evaluating a test CASTOR installation
  • Stress testing ADS components to prepare for Service Challenges
  • Planning for a new robot
  • Considering next generation of tape drives
  • SC4 (2006) requires a step up in cache performance
• Ancillary network rationalised
Networking
• Planned upgrades to Tier1 production network
  • Started November 04
• Based on Nortel 5510-48T 'stacks' for large groups of CPU and disk server nodes (up to 8 units/stack, 384 ports)
• High-speed inter-unit backbone interconnect (40Gb/s bi-directional) within stacks
• Multiple 1Gb/s uplinks aggregated to form the backbone (see the sketch after this slide)
  • Currently 2 x 1Gb/s, max 4 x 1Gb/s
  • Upgrade to 10Gb/s uplinks and head node as cost falls
• Uplink configuration with links to separate units within each stack and the head switch provides resilience
• Ancillary links (APCs, disk arrays) on a separate network
• Connected to UKLight for SC2 (see Service Challenges below)
  • 2 x 1Gb/s links aggregated from the Tier1
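To put the uplink aggregation figures in context, here is a minimal arithmetic sketch converting link counts into approximate usable throughput. The 80% efficiency factor is an assumed allowance for protocol overhead, not a measured value.

```python
# Rough uplink capacity arithmetic for aggregated 1Gb/s links.
# 1 Gb/s is taken as 125 MB/s (decimal units); the efficiency factor is
# illustrative only, not a measurement from the Tier1 network.

RAW_MB_PER_GBIT = 125  # 1 Gb/s expressed in MB/s

def aggregate_capacity(links, gbit_per_link=1, efficiency=0.8):
    """Approximate usable MB/s for `links` aggregated uplinks."""
    return links * gbit_per_link * RAW_MB_PER_GBIT * efficiency

for links in (2, 4):
    print(f"{links} x 1Gb/s uplinks ~ {aggregate_capacity(links):.0f} MB/s usable")

# A single 10Gb/s uplink, for comparison with the planned upgrade:
print(f"1 x 10Gb/s uplink ~ {aggregate_capacity(1, gbit_per_link=10):.0f} MB/s usable")
```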
Batch Services
• Worker node configuration based on traditional-style batch workers with the LCG configuration on top
  • Running SL 3.0.3 with LCG-2_4_0
  • Provisioning by PXE/Kickstart
  • YUM/Yumit, Yaim, SURE, Nagios, Ganglia…
• All rack-mounted workers are dual purpose, accessed via a single batch system PBS server (Torque)
• Scheduler (MAUI) allocates resources for LCG, Babar and other experiments using Fair Share allocations from the User Board
  • Jobs are able to spill into allocations for other experiments, and from one 'side' to the other, when spare capacity is available, to make best use of the capacity
• Some issues with jobs that use excess memory (memory leaks) not being killed by MAUI or Torque – under investigation (a stop-gap check is sketched after this slide)
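As a purely hypothetical illustration of the kind of stop-gap check a site might run while the scheduler enforcement issue is investigated, the sketch below scans Linux /proc for processes whose resident memory exceeds a limit. The limit and the report-only behaviour are assumptions; this is not a Torque or MAUI feature.

```python
# Hypothetical memory watchdog for batch workers: scans /proc (Linux only)
# and reports processes over an assumed per-process RSS limit. Illustrative
# only - not part of Torque/MAUI, and a real site would act via the batch
# system rather than print.
import os

RSS_LIMIT_KB = 2 * 1024 * 1024  # assumed 2 GB per-process limit

def rss_kb(pid):
    """Return resident set size in kB for a PID, read from /proc/<pid>/status."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (FileNotFoundError, PermissionError):
        pass
    return 0

def over_limit_pids(limit_kb=RSS_LIMIT_KB):
    """List PIDs whose resident memory exceeds the limit."""
    return [int(entry) for entry in os.listdir("/proc")
            if entry.isdigit() and rss_kb(int(entry)) > limit_kb]

if __name__ == "__main__":
    print("Processes over limit:", over_limit_pids())
```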
Service Systems
• Service systems migrated to SL 3
  • Mail hub, NIS servers, UIs
  • Babar UIs configured as a DNS triplet (see the sketch after this slide)
• NFS / data servers
  • Customised RH 7.n
  • Driver issues
  • NFS performance of SL 3 uninspiring compared with 7.n
• dCache systems at SL 3
• LCG service nodes at SL 3, LCG-2_4_0
  • Need to migrate to LCG-2_4_0 or lose work
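A DNS triplet spreads interactive logins across three UI machines behind one alias. The sketch below shows the client-side idea under assumed conditions: the alias name is invented for the example (not the real RAL name) and must resolve to multiple addresses for the choice to matter.

```python
# Minimal sketch of picking one UI host from a round-robin DNS alias.
# The hostname is hypothetical; socket.gethostbyname_ex returns
# (canonical_name, alias_list, address_list).
import random
import socket

def pick_ui_host(alias="ui.example.ac.uk"):
    """Resolve a round-robin alias and choose one of its addresses at random."""
    _name, _aliases, addresses = socket.gethostbyname_ex(alias)
    return random.choice(addresses)

if __name__ == "__main__":
    print("Selected UI address:", pick_ui_host())
```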
Storage
• Moving from NFS to SRMs for data access
• dCache successfully deployed in production
  • Used by CMS, ATLAS…
  • See talk by Derek Ross
• Xrootd deployed in production
  • Used by Babar
  • Two 'redirector' systems handle requests
    • Selected by DNS pair
    • Hand off requests to the appropriate server (see the sketch after this slide)
  • Reduces NFS load on disk servers
• Load issues with Objectivity server
  • Two additional servers being commissioned
• Project to look at SL 4 for servers
  • 2.6 kernel, journaling file systems – ext3, XFS
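The redirector's job is essentially a lookup plus a hand-off: the client asks a lightweight front end where a file lives and then talks to that data server directly, so the redirector never carries the data. The toy sketch below illustrates only that idea; the catalogue contents and hostnames are invented, and this is not the Xrootd protocol itself.

```python
# Toy illustration of the redirector idea: map a requested path to the
# data server that holds it. Catalogue and hostnames are invented.

CATALOGUE = {
    "/store/run123/file001.root": "diskserver01.example.ac.uk",
    "/store/run123/file002.root": "diskserver02.example.ac.uk",
}

def redirect(path):
    """Return the data server a client should contact for `path`."""
    server = CATALOGUE.get(path)
    if server is None:
        raise LookupError("file not known to this redirector: " + path)
    return server

if __name__ == "__main__":
    print(redirect("/store/run123/file001.root"))
```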
Service Challenges I
• The Service Challenges are a programme of infrastructure trials designed to test the LCG fabric at increasing levels of stress/capacity in the run-up to LHC operation
• SC2 – March/April 05:
  • Aim: T0 -> T1s aggregate of >500MB/s sustained for 2 weeks
  • 2Gb/s link via UKLight to CERN
  • RAL sustained 80MB/s for two weeks to a dedicated (non-production) dCache
    • 11/13 gridftp servers
    • Limited by issues with the network
  • Internal testing reached 3.5Gb/s (~400MB/s) aggregate disk to disk
  • Aggregate to 7 participating sites: ~650MB/s (rate/volume arithmetic sketched after this slide)
• SC3 – July 05 – Tier1 expects:
  • CERN -> RAL at 150MB/s sustained for 1 month
  • T2s -> RAL (and RAL -> T2s?) at a yet-to-be-defined rate
    • Lancaster, Imperial …
  • Some on UKLight, some via SJ4
  • Production phase Sept–Dec 05
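For a feel for the SC2 numbers, here is a small back-of-the-envelope sketch turning the quoted sustained rates into data volumes over the two-week window, plus the Gb/s-to-MB/s conversion for the internal test. Decimal units (1 TB = 10^12 bytes) are assumed for simplicity.

```python
# SC2 arithmetic: sustained rate x duration = volume moved, in decimal TB.

SECONDS_PER_DAY = 86400

def volume_tb(rate_mb_s, days):
    """Total data moved, in TB, at a sustained rate over `days` days."""
    return rate_mb_s * 1e6 * days * SECONDS_PER_DAY / 1e12

# Rates quoted on the slide: the 500 MB/s aggregate target and RAL's 80 MB/s.
for label, rate in (("T0->T1 aggregate target", 500), ("RAL sustained", 80)):
    print(f"{label}: {rate} MB/s over 14 days ~ {volume_tb(rate, 14):.0f} TB")

# Conversion check for the internal disk-to-disk figure: 3.5 Gb/s in MB/s.
print(f"3.5 Gb/s ~ {3.5 * 1000 / 8:.0f} MB/s")
```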
Service Challenges II
• SC4 – April 06
  • CERN-RAL T0-T1 expects 220MB/s sustained for one month
  • RAL expects T2-T1 traffic at N x 100MB/s simultaneously
  • June 06 – Sept 06: production phase
• Longer term:
  • There is some as-yet-undefined T1 -> T1 capacity needed; this could add 50 to 100MB/s
  • CMS production will require 800MB/s combined and sustained from batch workers to the storage systems within the Tier1
  • At some point there will be a sustained double-rate test – 440MB/s T0-T1 and whatever is then needed for T2-T1
  • It is clear that the Tier1 will be able to keep a significant part of a 10Gb/s link busy continuously, probably from late 2006 (see the sketch after this slide)
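A rough check of the "significant part of a 10Gb/s link" claim: sum the quoted external flows and compare against the link capacity. The number of simultaneous T2 streams (four at 100MB/s here) is an assumption for illustration, not a figure from the slide, and the internal CMS worker-to-storage traffic is excluded because it does not cross the WAN link.

```python
# Sum of quoted external (WAN) flows vs a 10Gb/s link, decimal units.

LINK_CAPACITY_MB_S = 10 * 1000 / 8  # 10 Gb/s in MB/s

flows = {
    "T0->T1 (double-rate test)": 440,
    "T2<->T1 (assumed 4 x 100 MB/s)": 4 * 100,
    "T1->T1 (upper estimate)": 100,
}

total = sum(flows.values())
print(f"Combined external traffic: {total} MB/s")
print(f"Fraction of a 10Gb/s link: {total / LINK_CAPACITY_MB_S:.0%}")
```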
Security
• The Badguys™ are out there
• Users are vulnerable to losing authentication data anywhere
• Still some less-than-ideal practices
• All local privilege escalation exploits must be treated as high-priority must-fixes
• Continuing programme of locking down and hardening exposed services and systems
• You can only be more secure
• See talk by Roman Wartel