CASPUR Site Report
Andrei Maslennikov, Sector Leader - Systems
Orsay, April 2001
Topics briefly covered:
• Central computers
• Storage
• Tape-related systems
• Central services
• AFS, SSH...
• External support
• CASPUR and HEP
• Projects for year 2001
A.Maslennikov - Orsay 2001
Central computers
• Sun SMP - 3500/4500 - 22 processors - Solaris 7+
  - Parallel batch (GRD/Codine) on 8 (336 MHz/2 GB) and 14 (336 MHz/3.6 GB) CPUs
  - Not very popular, load < 40%
• Alpha SMP - 4100/ES40 - 48 processors - DU 4.0F+
  - Front-end: 4 CPU x 500 MHz/2 GB + 2 x 532 MHz/1 GB
  - Parallel batch (GRD/Codine): 20 CPU x 500 MHz/2 GB + 20 x 400 MHz/1 GB
  - Serial batch (GRD/Codine): 2 CPU x 667 MHz/1 GB
  - Systems very stable for the past year
  - Batch load: 100%
• IBM SMP - Power3 - 72 processors - AIX 4.3.3 ML8 / PSSP 3.2+
  - Front-end: 4 CPU x 375 MHz/4 GB
  - Parallel batch (GRD/Codine): 64 CPU x 375 MHz/16 GB - 4 nodes on Colony Switch
  - Serial batch: 4 CPU (2 x 375 MHz + 2 x 200 MHz)
  - Batch load: 100%
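As a quick consistency check on the slide above, the per-role CPU counts can be summed and compared with the stated processor totals. This is only a sketch: the dictionary layout and the name `tally` are illustrative choices, and the numbers are transcribed from the slide, not taken from any CASPUR inventory tool.

```python
# CPU counts per role, transcribed from the "Central computers" slide.
# The data layout and function name are illustrative, not from the source.
clusters = {
    "Sun SMP":   {"stated": 22, "parts": [8, 14]},            # two parallel batch pools
    "Alpha SMP": {"stated": 48, "parts": [4, 2, 20, 20, 2]},  # front-end, parallel, serial
    "IBM SMP":   {"stated": 72, "parts": [4, 64, 4]},         # front-end, parallel, serial
}


def tally(clusters):
    """Return the summed CPU count for each platform."""
    return {name: sum(info["parts"]) for name, info in clusters.items()}


totals = tally(clusters)
for name, total in totals.items():
    stated = clusters[name]["stated"]
    status = "matches" if total == stated else "differs from"
    print(f"{name}: {total} CPUs ({status} the stated {stated})")

print("All central CPUs:", sum(totals.values()))
```

The per-role figures add up to the stated totals on all three platforms.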
Storage
• Scratch Areas
  - Local RAID-0 scratch areas on Alphas and Sun (30-40 GB per node)
  - IBM 2102 FC Storage Server for SP3 with 560 GB - GPFS
• Large-File Network Data Areas (NFS)
  - Two Network Appliance Filers: F540 (150 GB/FE) and F760 (600 GB/GE)
  - F540: mainly used for tape staging and as temp space on the LAN - is being decommissioned
  - F760: dedicated to number-crunching nodes and not saturated - 1 TB is being added
• Small-File Network Data Areas (AFS)
  - some 1.4 TB of Dothill RAID-5 on switched Fibre Channel SAN
  - 4 servers on CASPUR LAN (4x Sun UltraSparc 440)
  - 70 GB of commodity disk on WAN (Bari and Lecce, 2x Intel/Linux RH 7.0++/OAFS)
• NAS and SAN
  - NAS: F760, AFS servers <=> Stager and main number-crunchers: being migrated to GE
  - SAN: 48 Brocade ports, 10 hosts, 6 disk systems, 6 tape drives: growing fast
  - In discussion: tests of 4 Dothill 7120 systems of 0.7 TB each as GPFS core on SP3
Tape-related systems
• Tape Drives and Robotics
  - 9740 STK Library (494 slots) is no longer sufficient to host both CASPUR and BABAR data
  - Just acquired a new LTO/FC 3584 system from IBM - with 300 slots and 4 drives
  - Choice also influenced by the work done at CERN (Baud, Collin, Curran):
    http://cscct.home.cern.ch/cscct/ultrium/index.htm
  - LTO: excellent streaming speeds - measured 15 MB/sec native
  - LTO: positioning slow (avg 100 sec vs 15 sec for 9840), IBM is working on it
  - LTO: 1 drive costs 1/4 of a 9840, 1/? of a 9940
  - Currently on FC: 9840 (bridged), DLT7000 (bridged), 4x LTO (native)
• Tape services
  - Automated ADSM backup for some 20 service hosts and Windows desktops
  - Automated AFS backup
  - Tape locking via Tape Dispatcher
  - Staging Servers for CASPUR and BABAR (since 1998):
    o fully portable (perl+mysql)
    o redundant data format
    o multitape supported
    o users handle only file names
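The trade-off between the LTO streaming speed and its slow positioning can be illustrated with a rough estimate of effective throughput per staged file. This is a minimal sketch under simple assumptions: one positioning operation per file, followed by streaming at the measured 15 MB/sec. The function name and the file sizes are illustrative. The 15-second case keeps the streaming rate fixed purely to isolate the effect of positioning; it is not a measured 9840 figure.

```python
def effective_rate_mb_s(file_size_mb, stream_mb_s=15.0, position_s=100.0):
    """Average MB/s for reading one file: positioning time plus streaming time.

    Defaults are the LTO figures quoted on the slide (15 MB/sec native,
    about 100 sec average positioning). The model itself is an assumption.
    """
    transfer_s = file_size_mb / stream_mb_s
    return file_size_mb / (position_s + transfer_s)


# Illustrative file sizes; these are not taken from the source.
for size_mb in (100, 1000, 10000):
    slow = effective_rate_mb_s(size_mb)                   # 100 sec positioning
    fast = effective_rate_mb_s(size_mb, position_s=15.0)  # 15 sec positioning, same streaming rate
    print(f"{size_mb:>6} MB file: {slow:5.2f} MB/s at 100 s positioning, "
          f"{fast:5.2f} MB/s at 15 s positioning")
```

Under this model the positioning penalty dominates for small files and fades for large ones, which is consistent with a stager that packs data onto tape so that users handle only file names.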
Central services
• All our central services are Linux-based:
  - Syscontrol, DNS, Web, Mail, License, Print, Remote Access, DB
  - Linux system tree - always up-to-date
  - CASPUR BigBox CD with OAFS and SSH/OAFS - always fresh
• During the last year:
  - Moved to uniform hardware: rack-mounted systems (VA Linux)
  - System disks of Syscontrol and DB hosts on Mylex DAC960 CTL (RAID-5)
  - Mail: implemented a commercial HA solution from SteelEye (LifeKeeper):
    o redundant heartbeat (serial and Ethernet)
    o RAID-5 spool and software on low-end Infortrend CTL with 2 host channels
    o pinging the mail service while halting the current host: only 5 packets lost
AFS, SSH...
• AFS
  - OAFS is a marvel (cheap servers possible).
  - Free enhanced OAFS client RPMs available at: /afs/caspur.it/project/openafs
  - Badly missing: AFS port for COMPAQ Tru64 5.1.
    o Transarc does nothing
    o OAFS port may be done at KTH. We are now trying to help them get the OS source
    o Anybody else interested?
  - Maintenance contract: IBM cannot make an offer for more than a year. We receive
    support free of charge, but hopefully this situation will be resolved soon.
• SSH
  - 1.2.x is dangerous
  - migrated urgently to openssh 2.3.0p1 (AFS-aware with direct authentication and watcher)
    on all architectures
External support
• Turnkey departmental solution
  - OAFS Cell on Linux
    o automated backup required (DLT or AIT autoloaders)
    o redundant disk when possible
    o user and space management tools
    o Clients: UNIX and Windows; Mac via AFS Gateway
    o Server normally also hosts many other services: web, mail, DNS, NIS, etc.
  - Organization of work
    o a locally trained person per cluster is a must
    o no root password for us is a must
    o remote-only support (notification mainly via e-mail)
    o max 20% of total FTE resources dedicated (mainly for initial set-up)
• Outside CASPUR:
  - 7 Clusters (number 8 just ordered) with about 80 nodes
  - Some 20 stand-alone machines (we are getting rid of these)
  - All kinds of hardware and all flavours of UNIX
CASPUR and HEP
• Day-to-day work for INFN:
  - Full-scale AFS system support (maintenance and hotline)
  - ASIS mirroring to 18 INFN Sections
  - SSH tree maintenance
  - Linux tree maintenance incl. bootable CDs at the latest patch level
• BABAR Cluster at CASPUR
  - 5 E450 Sun Servers with 6 TB of disk (Sun, COMPAQ, DotHill)
  - Linux/OAFS file server with backup
  - 10-host Sun MC Farm (Ultra 5)
  - 14-host Intel/Linux MC farm (rack-mounted + 1 TB of RAID IDE)
  - multitape stager on E450 - 2 STK 9840 drives
  - GRD/Codine on all nodes
• Other
  - Regular exchanges with CERN
  - Virgo (software)
Projects for year 2001
• Syscontrol DB
  - mysql now, migration to InterBase by the end of 2001
  - Hosts’ DB and Syslog event collector DB
  - Hooks for syscontrol applications
• Control and Monitoring
  - agent up and running on all Linux hosts
  - being ported to other architectures (encryption)
  - server integration with Syscontrol DB (event logs and configuration)
• Problem management
  - currently studying possible solutions; Razor is one of the options
• Console Server
  - planned for the second half of 2001
  - currently looking at the serial hardware
• Security
  - emphasis on host-based security
  - host security “index” is being developed to integrate with Syscontrol