Jefferson Lab Site Report
Kelvin Edwards
Thomas Jefferson National Accelerator Facility
Newport News, Virginia USA
Kelvin.Edwards@jlab.org
757-269-7770
http://cc.jlab.org
HEPiX - TRIUMF, Oct. 20, 2003
Central Computing
• Sun systems
  • Upgrade to Solaris 8 almost complete
• HP systems
  • All upgraded to HP-UX 11i
  • Moving away from HP for central services
• Linux systems
  • Still at Red Hat 7.2
  • Evaluating Red Hat 10 (Fedora 1)
• Windows 2000 Domain Upgrade
  • Implemented in May
  • Working on Group Policy issues
Central Computing (cont)
• Network Appliance
  • 2 recently upgraded to the FAS940 (~16k NFS ops/sec)
  • ~4.5TB online disk space (1.5TB home, 2TB group)
• Linux fileserver
  • 3Ware SATA system
  • 2TB scratch area (16 x 160GB Seagate SATA drives)
• Backups
  • QuickRestore
  • Seagate LTOs, Overland Tape Library
Scientific Computing
• JASMine & Auger (http://cc.jlab.org/scicomp)
  • JASMine: Mass Storage Tape + Disk Cache
  • Auger: Batch Farm Management & Monitoring
• Typical Day
  • 2-4 TB of INPUT data through the farm
  • Process 2000-5000 jobs
• Certificates used for all user authentication
• Tape drives
  • 6 9840s: migrating data to 9940Bs
  • 13 9940A: read only
  • 15 9940B: all data written to these tapes
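To put the "Typical Day" input volume in perspective, the daily figure can be turned into an average sustained rate. The sketch below is only illustrative: it assumes decimal units (1 TB = 10^12 bytes) and a load spread evenly over 24 hours, neither of which is stated on the slide, and the function name is made up for this example.

```python
# Average sustained input rate implied by a daily data volume.
# Assumptions (not from the slides): decimal units (1 TB = 10**12 bytes,
# 1 MB = 10**6 bytes) and a load spread evenly over a 24-hour day.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in TB into an average rate in MB/s."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    for volume_tb in (2, 4):  # the 2-4 TB range quoted on the slide
        rate = average_rate_mb_per_s(volume_tb)
        print(f"{volume_tb} TB/day is about {rate:.1f} MB/s on average")
```

Under these assumptions the range works out to roughly 23 to 46 MB/s averaged over the day; real peaks would be higher, since farm load is unlikely to be perfectly even.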
Scientific Computing (cont)
• Linux File Servers
  • 16 Data Movers
    • 10 Mylex eXtremeRAID 2000 RAID cards (RAID-5) (SCSI)
    • 6 Adaptec 2200S RAID cards (RAID-50) (U320 SCSI)
  • 32 Cache/Work File Servers
    • Mixture of Mylex and 3Ware cards
• Batch Farming: over 24000 SPECint95, LSF
  • 178 RH 7.2 Linux dual-processors (P2 750 to P4 2.66GHz)
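Combining the farm size above with the 2000 to 5000 jobs per day quoted on the previous slide gives a rough ceiling on average job length. This is a back-of-the-envelope sketch, not a measurement: it assumes each of the 178 nodes contributes two CPUs, every job uses exactly one CPU, and the farm is fully busy around the clock. The function name is hypothetical.

```python
# Upper bound on average job length for a fully busy farm.
# Assumptions (not from the slides): two CPUs per node, one CPU per job,
# and 100% utilisation over a 24-hour day.

NODES = 178          # dual-processor Linux nodes quoted on the slide
CPUS_PER_NODE = 2
HOURS_PER_DAY = 24


def max_average_job_hours(jobs_per_day: int,
                          nodes: int = NODES,
                          cpus_per_node: int = CPUS_PER_NODE) -> float:
    """CPU-hours available per day divided by the number of jobs run."""
    cpu_hours_per_day = nodes * cpus_per_node * HOURS_PER_DAY
    return cpu_hours_per_day / jobs_per_day


if __name__ == "__main__":
    for jobs in (2000, 5000):  # daily job counts quoted in the talk
        hours = max_average_job_hours(jobs)
        print(f"{jobs} jobs/day: at most {hours:.2f} CPU-hours per job")
```

With 8544 CPU-hours available per day, the bound is about 4.3 hours per job at 2000 jobs and about 1.7 hours at 5000 jobs; actual averages would be lower whenever the farm is not saturated.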
Noteworthy
• Kswapd failures -- Solved
  • Automount timeouts set to 60 seconds, NOT minutes
• Adaptec 2200S RAID cards
  • Instead of the MegaRAID cards
  • Not quite as fast, but acceptable
  • Timeout problem -- fix available
• Adaptec TOE (TCP Offload Engine)
  • Problems with RH7.2, custom kernel (XFS), and their driver
  • Anyone else using them? Good results?
Projects
• Windows
  • Standard builds (Server, IIS, desktop, laptop)
• Backup Software Upgrade
  • Reliaty (was QuickRestore)
• SSH v2 Internally
• Networks
  • Gigabit connection to our border router
  • VLANs for use on site
Projects (cont)
• JASMine
  • Rewrite disk cache
  • Support farm output caches
  • Policy-based file movement off-site
• Auger
  • Better file scheduling/pinning
Projects (cont)
• PPDG
  • SRM version 2
  • Replication
  • Replica Catalog web service interface
  • Remote Job submission
  • User and System JDLs
  • Batch web service integration with Auger