HEPiX Spring 2008 @ CERN - Summary Report
HEPSysMan @ RAL, 19-20 June 2008
Martin Bly
Overview
• Venue/Format/Themes
• CPU Benchmarking Working Group
• Storage and File Systems Working Group
• Scientific Linux
• Selected topics
Spring HEPiX 2008
• Venue: CERN, 5th to 9th May
  • Council Chamber
  • Very comfortable, good wireless network access
• Format
  • Sessions based on themes, with a morning 'plenary' by an invited speaker
  • ½ to 1 day per theme
• Agenda: http://indico.cern.ch/conferenceTimeTable.py?confId=27391
Themes
• LHC and Data Readiness
  • LHC overview
  • Trigger farms of the LHC experiments
  • LCG overview and status
  • CCRC
• Site Reports
• Storage technology
• CPU technology
• Data centre management, availability, and reliability
• Problem resolution, problem tracking, alarm systems
• System management
• Networking infrastructure and computer security
• Applications and operating systems
• HEPiX 'bazaar and think-tank'
  • General virtualisation
  • Grid topics (monitoring etc.)
  • Miscellaneous
Benchmarking Working Group
• WLCG MoUs are based on SI2K
  • SPEC2000 deprecated in favour of SPEC2006; SPEC2000 is no longer available or maintained
• Remit
  • Find a benchmark accepted by HEP and others, since many sites serve different communities
  • Review existing benchmarking practices (CERN, FZK, INFN, …)
• Last 6 months: set up a benchmarking test-bed with dedicated hardware at CERN and other sites
  • Covers a wide range of processors in a typical HEP configuration (2GB/core)
  • Run SPEC benchmarks with agreed flags
    • SL4/64-bit OS with benchmarks at 32-bit/gcc 3.4
    • Look at SL5, 64-bit, gcc4
  • Run a variety of 'standard candles' from LHC experiments' code to compare with SPEC
    • Provides scaling and recalibration of computing requirements (sketched below)
  • Looking at understanding the statistical treatment of experiment results
    • Recently uncovered different methodologies for random numbers!
• No major scaling problem with either SI2K or SI2K6
  • Should allow a smooth transition
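The 'standard candle' cross-check amounts to testing how well each SPEC unit predicts real experiment throughput across machines. A minimal sketch of that recalibration, assuming hypothetical per-machine SPEC scores and event rates (none of these numbers are the working group's measurements):

```python
# Hypothetical per-machine data: (name, SPECint2000, SPECint2006,
# standard-candle events/s). Illustrative values only.
machines = [
    ("node-a", 1500.0, 10.2, 4.1),
    ("node-b", 2100.0, 14.5, 5.8),
    ("node-c", 2900.0, 19.8, 8.0),
]

def slope(xs, ys):
    """Least-squares slope through the origin: ys ~ k * xs."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

si2k  = [m[1] for m in machines]
si2k6 = [m[2] for m in machines]
rate  = [m[3] for m in machines]

k_units = slope(si2k, si2k6)   # SI2K -> SI2K6 conversion factor
k_old   = slope(si2k, rate)    # events/s per SI2K
k_new   = slope(si2k6, rate)   # events/s per SI2K6

for name, s2k, s2k6, r in machines:
    print(f"{name}: predicted {k_old * s2k:.2f} (via SI2K) vs "
          f"{k_new * s2k6:.2f} (via SI2K6) vs measured {r:.2f} ev/s")
print(f"SI2K -> SI2K6 conversion factor ~ {k_units:.5f}")
```

If the per-machine residuals are small for both units, pledges and requirements can be recalibrated by a single multiplicative factor, which is what makes the reported 'smooth transition' possible.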
File Systems Working Group
• Started with a questionnaire about storage at the Tier-1s
• Followed up with a technology review and selection
  • Posix FS (TFA): Lustre, GPFS, AFS
  • SRM: CASTOR, dCache, DPM
  • Xrootd
• Performance comparison between the selected technologies
  • Testbed set up at CERN with 10 servers and 60 8-core clients on 1 Gb/s connections, 4-5 6TB
  • 480 simultaneous client tasks
  • 3 tests: writing, sequential read, pseudo-random read (sketched below)
• Most implementations were able to sustain wire speed in writes and sequential reads
• Significant performance advantage for Lustre in pseudo-random reads, but the test conditions must be clarified
  • The use case may favour Lustre's client-side caching
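The three test patterns are simple in outline: each client task streams a large file for the write and sequential-read phases, then reads the same blocks back in shuffled order for the pseudo-random phase. A minimal per-client sketch, with hypothetical mount point, file size, and block size:

```python
import os
import random
import time

PATH = "/mnt/testfs/testfile"   # hypothetical mount point
SIZE = 1 << 30                  # 1 GiB test file (illustrative)
BLOCK = 1 << 20                 # 1 MiB I/O blocks

def timed(label, nbytes, fn):
    """Run fn() and report its effective throughput."""
    t0 = time.time()
    fn()
    print(f"{label}: {nbytes / (time.time() - t0) / 1e6:.1f} MB/s")

def write():
    buf = os.urandom(BLOCK)     # incompressible payload
    with open(PATH, "wb") as f:
        for _ in range(SIZE // BLOCK):
            f.write(buf)

def seq_read():
    with open(PATH, "rb") as f:
        while f.read(BLOCK):
            pass

def random_read():
    offsets = [i * BLOCK for i in range(SIZE // BLOCK)]
    random.shuffle(offsets)     # pseudo-random access order
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)

timed("write", SIZE, write)
timed("sequential read", SIZE, seq_read)
timed("pseudo-random read", SIZE, random_read)
```

A real run has to defeat the client-side page cache (datasets much larger than client RAM, or caches dropped between phases), which is exactly why the Lustre client-caching result needs the clarified test conditions mentioned above.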
Scientific Linux
• Review of recent releases: SL5.1, SL4.6
  • Trying to release the 64-bit versions at the same time as the 32-bit versions
• SL3.0.1 to 3.0.8 declared obsolete
• Description of an issue where 'new' tags in version numbers make newer versions appear older to yum (demonstrated below)
• Working on automating the 'fastbugs' repositories
• Clarifying the policy on security errata
• Future:
  • SL3.0.9 to continue until October 2010
  • Planning SL4.7, SL5.2, SL6
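The yum problem is a consequence of RPM's segment-by-segment version comparison, in which a numeric segment always sorts newer than an alphabetic one. A minimal demonstration using the rpm Python bindings; the package triples and the 'new' tag are made up to show the class of problem, not the exact SL case:

```python
import rpm  # rpm Python bindings, available on SL/RHEL-family systems

# (epoch, version, release) triples for two builds of the same package.
# The re-tagged build is genuinely newer, but its release string starts
# with an alphabetic segment.
older = ("0", "1.2", "3")
newer = ("0", "1.2", "new.1")   # hypothetical 'new' tag

# rpmvercmp rules: comparing releases "new.1" vs "3", the alphabetic
# segment "new" loses to the numeric "3", so yum sees the tagged build
# as a downgrade and will not install it.
print(rpm.labelCompare(newer, older))  # -1: tagged build compares as older
```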
SL discussions
• Support for SL4?
  • RHEL4: full support for 3 years, deployment support for years 3-5, maintenance support for years 5-7
  • RHEL4 was released in Feb 2005, so it is now in deployment support
  • Critical that Grid middleware is available
  • DESY needs SL4 until Autumn 2011
  • CERN intends to introduce SL5 batch and UIs in Autumn 2008, so a WN gLite payload should be available
  • Some concern over experiment readiness
    • The compiler is the important factor rather than the actual version of SL
  • Encourage shorter deadlines with more flexibility on extending them – likely to get better buy-in from users
  • So suggest July 2010? Suggestion of October 2010, the same as SL3, to avoid a short-term migration
• XFS in SL?
  • In or out? Consensus is to have it in, using the usual kernel module system
  • Jan Iven hears from an unreliable source that back-ports of the latest version are coming
  • SL4 or SL5? SL4 contrib, SL5 standard
  • Does it work with 32-bit? Yes, the kernel is now less hostile
• Scientific Linux 6: should it be based on CentOS?
  • Still make the usual installer changes
  • Still add the RPMs we usually do
  • Use precompiled RPMs
  • Change/recompile RPMs where we feel the need (e.g. SL graphics)
• Kernel modules: adding the security repo during the install gets the correct kernel but incorrect modules
  • Can fix the installer, fix up afterwards with a script, or use dkms (see the sketch below)
  • Add dkms to the release and use it instead of kernel modules?
• Stop press: RHEL4 lifetime extended: 'full support' for 4 years…
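On the kernel-modules point, dkms avoids the kernel/module mismatch by rebuilding module source against whichever kernel is actually installed. A hedged sketch of the standard dkms lifecycle for a hypothetical back-ported module (the module name and version are made up):

```python
import subprocess

# Hypothetical out-of-tree module, e.g. a back-ported filesystem driver.
MODULE, VERSION = "xfs-backport", "1.0"

# Standard dkms lifecycle: register the source tree (expected under
# /usr/src/<module>-<version> with a dkms.conf), then build and install
# against the running kernel. With dkms in the release, this repeats
# automatically whenever a new kernel is installed.
for step in ("add", "build", "install"):
    subprocess.run(["dkms", step, "-m", MODULE, "-v", VERSION], check=True)
```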
Selected Topics I
• Well-attended talk by Sascha Brawer from Google, describing their technology and methods for handling very large datasets across distributed geographical locations
  • Based on truckloads of low-cost systems
  • Care about performance per $, not raw performance
  • In-house rack design, chassis-less PC-class motherboards, low-end storage
  • Many data centres around the world
  • Need to design software to cope with failures
Selected Topics II
• Several talks on experiences with Lustre
  • DESY: good description of setting it up
  • GSI: talk about production use
  • Lustre appears stable and reliable as a production distributed file system
  • Proof against various failure modes
• Sverre Jarp gave a review of the CERN OpenLab and what they are working on
  • Collaboration with HP, Intel, Oracle, …
Experience with Windows Vista at CERN
• Update on Vista activities at CERN: status, plans, etc.
• Using a readiness check to determine suitability; Vista is not the default (XP is)
• Now 300 machines (~5%) running Vista
• Notes on the introduction of SP1
  • Feb 2008: still preparing for the upgrade rollout; RFM removed in favour of pop-up nagging
• Vista SP1 improved performance over XP and standard Vista, but not by much in most cases
Virtualisation with Windows at CERN
• Review of virtualisation in IT services at CERN
  • 17 physical servers with 45 'clients' ranging across Windows Server variants and SLC4/5
  • Using Virtual Server 2005
• New: Hyper-V, part of Windows Server 2008, needs a 64-bit CPU
  • Supports 32/64-bit guests and large RAM (>32GB) in VMs
Remote Administration via Service Modules
• Work at GSI on using IPMI modules to administer remotely located server hardware
• Disadvantages of remote access using standard tools, not least that you need a running OS
• Discussion of the advantages of using IPMI modules for remote control
  • Changing BIOS settings, resets, installing, …
• Detailed description of capabilities (see the sketch below)
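To illustrate the 'no running OS needed' point: once the IPMI module is on the network, the usual operations can be scripted with a standard client such as ipmitool. A minimal sketch, with hypothetical BMC hostname and credentials:

```python
import subprocess

# Hypothetical BMC (service module) address and credentials.
BMC, USER, PASSWORD = "node42-ipmi.example.org", "admin", "secret"

def ipmi(*args):
    """Run an ipmitool command against the BMC over the LAN interface."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC,
           "-U", USER, "-P", PASSWORD, *args]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout

# These work even when the host OS is hung or not yet installed:
print(ipmi("chassis", "power", "status"))   # query power state
# ipmi("chassis", "power", "cycle")         # hard reset
# ipmi("sol", "activate")                   # serial-over-LAN console,
#                                           # e.g. for BIOS settings
```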