HEPiX Spring 2008 @ CERN - Summary Report

HEPiX Spring 2008 @ CERN - Summary Report. HEPSysMan @ RAL 19-20 June 2008 Martin Bly. Overview. Venue/Format/Themes CPU Benchmarking Working Group Storage and File Systems Working Group Scientific Linux Selected topics. Spring HEPiX 2008. Venue: CERN - 5th to 9th May


Presentation Transcript


  1. HEPiX Spring 2008 @ CERN - Summary Report HEPSysMan @ RAL 19-20 June 2008 Martin Bly

  2. Overview • Venue/Format/Themes • CPU Benchmarking Working Group • Storage and File Systems Working Group • Scientific Linux • Selected topics HEPiX Spring 2008 Report - HEPSysMan @ RAL

  3. Spring HEPiX 2008 • Venue: CERN - 5th to 9th May • Council Chamber • Very comfortable, good wireless network access • Format • Sessions based on themes with a morning ‘plenary’ by an invited speaker • ½ to 1 day per theme • Agenda: http://indico.cern.ch/conferenceTimeTable.py?confId=27391

  4. Themes • LHC and Data Readiness • LHC overview • Trigger farms of LHC experiments • LCG overview and status • CCRC • Site Reports • Storage technology • CPU technology • Data centre management, availability, and reliability • Problem resolution, problem tracking, alarm systems • System management • Networking infrastructure and computer security • Applications and Operating systems • HEPiX ‘bazaar and think-tank’ • General Virtualisation • Grid stuff (Monitoring etc.) • Miscellaneous

  5. Benchmarking Working Group • WLCG MoUs based on SI2K • SPEC2000 deprecated in favour of SPEC2006 • SPEC2000 no longer available or maintained • Remit • Find a benchmark accepted by HEP and others, as many sites serve different communities • Review of existing benchmarking practices (CERN, FZK, INFN, …) • Last 6 months: setup of benchmarking test-bed with dedicated HW at CERN and others • Covering a wide range of processors with typical HEP configuration (2GB/core) • Run SPEC benchmarks with agreed flags • SL4/64bit OS with benchmarks at 32-bit/gcc 3.4 • Look at SL5, 64-bit, gcc4 • Run a variety of ‘standard candles’ from LHC experiments’ code to compare with SPEC • Provides scaling and recalibration of computing requirements • Looking at understanding the statistical treatment of experiment results • Recently uncovered different methodologies for random numbers! • No major scaling problem with either SI2K or SI2K6 • Should allow a smooth transition
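The ‘scaling and recalibration’ step can be pictured as fitting a single conversion factor between old and new benchmark scores measured on the same machines. The following is a minimal sketch, assuming a purely proportional model; the function names and all scores are made-up placeholders, not working group results.

```python
def fit_scale(old_scores, new_scores):
    """Least-squares factor k minimising sum((new - k * old)**2), line through the origin."""
    if len(old_scores) != len(new_scores) or not old_scores:
        raise ValueError("need two non-empty score lists of equal length")
    numerator = sum(o * n for o, n in zip(old_scores, new_scores))
    denominator = sum(o * o for o in old_scores)
    return numerator / denominator


def max_relative_residual(old_scores, new_scores, k):
    """Largest relative deviation of any machine from the fitted proportional model."""
    return max(abs(n - k * o) / n for o, n in zip(old_scores, new_scores))


# Hypothetical per-machine scores (placeholders only): old benchmark vs new benchmark.
old = [1000.0, 1500.0, 2000.0]
new = [4.0, 6.0, 8.0]

k = fit_scale(old, new)
spread = max_relative_residual(old, new, k)
print(f"scale factor: {k:.6f}, worst relative residual: {spread:.3%}")
```

A small worst-case residual across a wide range of processors is what ‘no major scaling problem’ would look like in such a fit.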

  6. File Systems Working Group • Started with a questionnaire about storage at T1s • Followed up with a technology review and selection • Posix FS (TFA) : LUSTRE, GPFS, AFS • SRM : CASTOR, dCache, DPM • Xrootd • Performance comparison between selected technologies • Testbed setup at CERN with 10 servers and 60 8-core clients with 1 Gb/s connection, 4-5 6TB • 480 simultaneous client tasks • 3 tests : writing, sequential read, pseudo-random read • Most implementations able to sustain wire-speed in writes and sequential reads • Significant performance advantage for LUSTRE in pseudo-random reads but must clarify test conditions • Use case may favour LUSTRE’s client-side caching
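The testbed figures can be checked with simple arithmetic. This is a rough sketch, assuming one task per client core and that the 1 Gb/s figure applies to each server link; the slide does not state either explicitly.

```python
SERVERS = 10
CLIENTS = 60
CORES_PER_CLIENT = 8
LINK_GBIT_PER_S = 1.0  # assumption: 1 Gb/s per server link

# One task per core reproduces the 480 simultaneous client tasks quoted on the slide.
tasks = CLIENTS * CORES_PER_CLIENT

# Aggregate server-side wire speed in MB/s (1 Gb/s = 125 MB/s, decimal units).
aggregate_mbyte_per_s = SERVERS * LINK_GBIT_PER_S * 1000 / 8

# Fair share of that bandwidth per task if all tasks run at once.
per_task_mbyte_per_s = aggregate_mbyte_per_s / tasks

print(f"tasks: {tasks}")
print(f"aggregate wire speed: {aggregate_mbyte_per_s:.0f} MB/s")
print(f"per-task share: {per_task_mbyte_per_s:.2f} MB/s")
```

Under these assumptions, ‘sustaining wire-speed’ means the servers together deliver on the order of 1250 MB/s, or only a few MB/s per task.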

  7. Scientific Linux • Review of recent releases: SL5.1, SL4.6 • Trying to put the 64bit versions out at the same time as the 32bit versions • Obsolete 3.0.1 to 3.0.8 • Description of an issue where ‘new’ tags in version numbers make newer versions appear older to yum • Working on automating ‘fastbugs’ repositories • Clarifying policy on security errata • Future: • SL3.0.9 to continue till October 2010. • Planning on doing SL4.7, SL5.2, SL6.
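The slide gives no version strings for the yum issue. One plausible mechanism, offered as an assumption rather than the reported diagnosis, is RPM's segment-wise version comparison, in which a numeric segment always ranks newer than an alphabetic one. The sketch below is a simplified reimplementation of that rule (it ignores epochs, release fields, and the tilde operator), and the version strings are hypothetical.

```python
import re


def segments(version):
    """Split a version string into runs of ASCII digits or letters; separators are dropped."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


def vercmp(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal (simplified RPM-style rules)."""
    sa, sb = segments(a), segments(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x_num != y_num:
            # A numeric segment is always considered newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the version with extra segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


# Hypothetical package versions: the tagged build is meant to be the newer one,
# yet it compares as older because 'new' is alphabetic and '3' is numeric.
print(vercmp("1.2.new", "1.2.3"))
print(vercmp("1.10", "1.9"))
```

With rules like these, a textual tag in the version field can sort below a plain numeric version even when the tagged package was built later.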

  8. SL discussions • Support for SL4? • RHEL4: full support 3 years, deployment support 3-5 years, maintenance support for 5-7 years. • RHEL4 released Feb 2005, so now in deployment support • Critical that Grid middleware is available • DESY need SL4 to Autumn 2011 • CERN intending to introduce batch and UIs for SL5 in Autumn 2008, so WN gLite payload should be available • Some concern over experiment readiness • Compiler is the important factor rather than the actual version of SL • Encourage shorter deadlines with more flexibility on extending deadlines – likely to get better buy-in from users • So suggest July 2010? Suggestion of October 2010, same as SL3, to stop short-term migration. • XFS in SL? • In or out? Consensus is to have it in using the usual kernel module system. Jan Iven hears from an unreliable source that back-ports of the latest version are coming. SL4 or SL5? SL4 contrib, SL5 standard. Does it work with 32bit? Yes, kernel now less hostile. • Scientific Linux 6: Should it be based on CentOS? • Still do installer changes • Still add RPMs we usually do • Use precompiled RPMs • Change/recompile RPMs we feel the need to (SL graphics). • Kernel modules: Adding security repo during the install gets the correct kernel but incorrect modules. Can fix installer, fix up afterwards with a script, or use dkms. Add dkms to release, do it instead of kernel modules? • Stop Press: RHEL 4 lifetime extended: ‘full support’ for 4 years…
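The support-phase reasoning on this slide is date arithmetic and can be sketched as follows. Assumptions: the release day is taken as 1 February 2005 because the slide gives only the month, and for the ‘Stop Press’ extension only the full-support boundary is moved to 4 years, since the slide does not say whether the later phases shift.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years (safe here because the release date is not 29 February)."""
    return d.replace(year=d.year + years)


def support_phase(release, on, boundaries):
    """Return the name of the first phase whose end (in years after release) lies after 'on'."""
    for name, end_year in boundaries:
        if on < add_years(release, end_year):
            return name
    return "end of life"


RHEL4_RELEASE = date(2005, 2, 1)  # assumption: day of month is a placeholder

# Phase ends in years after release, as quoted on the slide.
ORIGINAL = [("full", 3), ("deployment", 5), ("maintenance", 7)]
# Assumption: the extension moves only the full-support boundary.
EXTENDED = [("full", 4), ("deployment", 5), ("maintenance", 7)]

MEETING = date(2008, 5, 5)      # first day of HEPiX Spring 2008
DESY_NEED = date(2011, 10, 1)   # placeholder day for 'Autumn 2011'

print(support_phase(RHEL4_RELEASE, MEETING, ORIGINAL))
print(support_phase(RHEL4_RELEASE, MEETING, EXTENDED))
print(support_phase(RHEL4_RELEASE, DESY_NEED, ORIGINAL))
```

Under the original schedule the meeting date falls in deployment support, matching the slide, while the extended schedule puts it back in full support.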

  9. Selected Topics I • Well attended talk by Sascha Brawer from Google, describing their technology and methods for handling very large datasets over distributed geographical locations • Based on truckloads of low cost systems • Care about performance per $ not raw performance • In house rack design, chassis-less PC-class motherboards, low end storage • Many data centres around the world • Need to design software to cope with failures

  10. Selected Topics II • Several talks on experiences with Lustre • DESY – good description of setting it up • GSI – talk about production use • Lustre appears stable and reliable as a production distributed file system • Proof against various failure modes • Sverre Jarp gave a review of the CERN OpenLab and what they are working on • Collaboration with HP, Intel, Oracle…

  11. Experience with Windows Vista at CERN • Update on Vista activities at CERN • status, plans etc. • Using readiness check to determine suitability, Vista not the default (XP). • Now 300 machines (~5%) running Vista. • Notes on introduction of SP1 • Feb 2008: still preparing for the upgrade rollout. RFM removed in favour of popup nagging. • Vista SP1 improved performance over XP or standard Vista, but not by much in most cases.

  12. Virtualisation with Windows at CERN • Review of virtualisation in IT services at CERN • 17 physical servers with 45 ‘clients’ ranging through Windows server variants and SLC4/5 • Using Virtual Server 2005 • New Hyper-V – part of Windows Server 2008, needs 64bit CPU • Supports 32/64bit guests, large RAM (>32GB) in VMs

  13. Remote Administration via Service Modules • Work at GSI on using IPMI modules to administer remotely located server hardware • Disadvantages of remote access using standard tools, not least that you need a running OS. • Discussion of advantages of using IPMI modules for remote control • changing BIOS settings, resets, installing… • Detailed description of capabilities.
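Out-of-band control of this kind is commonly scripted around a command-line IPMI client. The slide does not name the tooling used at GSI, so the sketch below assumes `ipmitool` over the `lanplus` interface; the host name, user name, and helper function are hypothetical. It only builds the command lines and does not contact any hardware.

```python
def ipmi_command(host, user, *action):
    """Build an ipmitool argv list; -E makes ipmitool read the password from IPMI_PASSWORD."""
    if not action:
        raise ValueError("an ipmitool subcommand is required")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *action]


BMC_HOST = "bmc-node042.example.org"  # hypothetical service-module address
BMC_USER = "admin"                    # hypothetical account name

# Query power state: works even when the server has no running OS.
status_cmd = ipmi_command(BMC_HOST, BMC_USER, "chassis", "power", "status")
# Ask for a network (PXE) boot on the next start, e.g. ahead of a reinstall.
pxe_cmd = ipmi_command(BMC_HOST, BMC_USER, "chassis", "bootdev", "pxe")
# Hard reset of the machine.
reset_cmd = ipmi_command(BMC_HOST, BMC_USER, "chassis", "power", "reset")

for cmd in (status_cmd, pxe_cmd, reset_cmd):
    print(" ".join(cmd))
```

On a machine with `ipmitool` installed and `IPMI_PASSWORD` set, each list could be passed to `subprocess.run(cmd, check=True)`.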
