HEPiX/HEPNT report Helge Meinhard, Alberto Pace, Denise Heagerty / CERN-IT Computing Seminar 05 November 2003
HEPiX/HEPNT Autumn 2003 (1) • Held 20 – 24 October at TRIUMF, Vancouver • Format: • Mon – Wed Site reports, HEPiX and HEPNT talks • Thu: Large Cluster SIG on security issues • Fri am: Parallel sessions on storage, security and Windows issues • Excellent organisation by Corrie Kost / TRIUMF • Weather not too tempting to skip sessions • Full details: http://www.triumf.ca/hepix2003/
HEPiX/HEPNT Autumn 2003 (2) • 76 participants, of which 11 from CERN • Barring, Durand, Heagerty, Iven, Kleinwort, Lopienski, Meinhard, Neilson, Pace, Silverman, D Smith • 59 talks, of which 19 from CERN • Vendor presence (Ibrix, Panasas, RedHat, Microsoft) • Friday pm: WestGrid • Next meetings: • Spring: May 24th to 28th in Edinburgh • Autumn: BNL expressing interest
Highlights • Unix-related (me) • Windows-related (Alberto Pace) • Security-related (Denise Heagerty)
Site reports: Hardware (1) • Major investments: Xeons, Solaris, IBM SP, Athlon MP • Disappointing experience with Hyper-Threading (HT) • Increasing interest: • Blades (e.g. WestGrid – 14 blades with 2 x Xeon 3.06 GHz each in 7U chassis) • AMD Opteron • US sites require cluster mgmt software with HW acquisitions
Site reports: Hardware (2) • Physical limits becoming ever more important • Floor space • UPS • Cooling power • Weight capacity per unit of floor space • Disk storage • Some reports of bad experience with IDE-based file servers • No clear tendency
Site reports: Software (1) • RedHat 6.x diminishing, but still in production use at many sites • Solaris 9 being rolled out • Multiple compilers needed on Linux (IN2P3: 6), but not considered a big problem • SLAC looking at Solaris/x86 • AFS not considered a problem at all • SLAC organising a ‘best practices’ workshop (complementing LISA and USENIX workshops) – see http://www.slac.stanford.edu/~alfw/OpenAFS_Best.pdf
Site reports: Software (2) • NFS in use at large scale • Kerberos 5: No clear preference for MIT vs. Heimdal vs. Microsoft; many home-grown scripts in use to keep them synchronised (see the sketch below) • Reports about migrating out of Remedy • DESY and GSI happy with SuSE and Debian (except for laptops) • Condor getting more popular, considered as LSF replacement; Sun GridEngine mentioned as well
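The home-grown synchronisation mentioned above typically amounts to scripts that compare the principal databases of the different KDCs. Below is a minimal, hypothetical Python sketch of such a check, assuming an MIT KDC queried via kadmin.local and a Heimdal KDC queried via kadmin -l; the exact commands, output formats and realm handling are assumptions and would need adapting to a real site.

```python
#!/usr/bin/env python
# Hypothetical sketch: list principals present in one KDC but missing from
# the other. Command names, output formats and realm handling are
# assumptions and would need adapting to the local installation.
import subprocess

def mit_principals():
    # MIT: 'kadmin.local -q listprincs' prints one principal@REALM per line
    out = subprocess.run(["kadmin.local", "-q", "listprincs"],
                         capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if "@" in line}

def heimdal_principals(realm):
    # Heimdal: "kadmin -l list '*'" prints principal names from the local DB;
    # append the realm so the two sets are comparable (assumption).
    out = subprocess.run(["kadmin", "-l", "list", "*"],
                         capture_output=True, text=True, check=True).stdout
    return {line.strip() + "@" + realm
            for line in out.splitlines() if line.strip()}

if __name__ == "__main__":
    mit = mit_principals()
    heimdal = heimdal_principals("EXAMPLE.ORG")  # hypothetical realm name
    for p in sorted(mit - heimdal):
        print("missing in Heimdal:", p)
    for p in sorted(heimdal - mit):
        print("missing in MIT:", p)
```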
CERN talks • Castor evolution (Durand) • Fabric mgmt tools (Kleinwort) • CVS status and tools (Lopienski) • Solaris service update (Lopienski) • Console management (Meinhard) • ADC tests and benchmarks (Iven) • New HEPiX scripts (Iven) • LCG deployment status and issues (Neilson) • LCG scalability issues (D Smith) • Windows and/or security related
RedHat support (1) • Tue: Talk by Don Langley/RedHat • Described the new support model and the technical features of RHEL 3, released the day after • RHEL releases every 12…18 months, with guaranteed support for 5 years • Yearly subscriptions (per machine) grant access to sources, binaries, updates, and to support (different levels) • Said that RedHat would be able to find the right model for HEP • Reactions: not everyone was convinced, no clear commitment to react to our needs, not at the right level
RedHat support (2) • Wed: Interactive discussion • Labs currently using RH wish to stay and go for RHEL; HEP-wide agreement preferred • High level of HEP-internal Linux support must be taken into account by RH • HEP- or site-wide licences much preferred over per-node • SLAC, FNAL and CERN to contact RedHat jointly in order to negotiate for HEP • Other HEP sites should be able to join if they so wish
Other highlights (1) • PDSF Host Database project (Shane Canon) • Inventory mgmt, purchase information, technical details, connectivity, … • Similar objectives to some combination of BIS/CDB, HMS, LanDB, … • Unix and AFS backup at FNAL (Jack Schmidt) • Investigated TSM, Veritas, Amanda, some smaller vendors • Decided to go for TiBS (True incremental Backup System, a Carnegie Mellon spin-off) – 1.6 TB in 5 hours • Large disk cache of backup data on server
Other highlights (2) • Mosix and PBS clustering at TRIUMF (Steve McDonald) • Challenge: provide interactive and batch services on a small budget • 7 dual-processor systems; three run OpenMosix all the time (one of them serving as head node), the rest run OpenPBS while batch jobs are present and migrate to the OpenMosix pool when idle (a sketch of such a switch-over follows below)
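As an illustration of the dual-purpose nodes described above, here is a minimal, hypothetical Python sketch of the switch-over logic: run periodically (e.g. from cron), it checks the OpenPBS queue with qstat and starts or stops the openMosix service accordingly. The qstat output parsing and the openmosix init-script path are assumptions, not the scripts actually used at TRIUMF.

```python
#!/usr/bin/env python
# Hypothetical sketch: repurpose a node between OpenPBS batch use and
# openMosix interactive use, depending on whether batch jobs exist.
# The qstat output format and the openmosix init-script path are
# assumptions and would need adapting to the local installation.
import subprocess

def pbs_has_jobs():
    """Return True if qstat reports any queued or running jobs."""
    out = subprocess.run(["qstat"], capture_output=True, text=True).stdout
    # qstat prints a header and one line per job; an empty queue typically
    # produces no output at all.
    lines = [l for l in out.splitlines() if l.strip()]
    return len(lines) > 2  # header + separator + at least one job line

def set_mosix(enabled):
    """Start or stop the openMosix service (assumed init-script name)."""
    action = "start" if enabled else "stop"
    subprocess.run(["/etc/init.d/openmosix", action], check=False)

if __name__ == "__main__":
    # Join the openMosix pool only when no batch work is pending.
    set_mosix(enabled=not pbs_has_jobs())
```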
Mass storage workshop • A meta-meeting… • Discussed what and how to discuss • Joining in by VRVS: FNAL, RAL, IN2P3, DESY, FZK, … • Launched forum for MSS and their interoperability • E-mail list: hep-forum-mss@cern.ch • Each site to report (to Don Petravick/FNAL) about capabilities and needs concerning WAN interfaces, security, monitoring and protocols, file transfer protocols, mgmt protocols, replica system • Next meeting: VRVS conference in December • Next HEPiX: LCSIG will be on storage
My personal comments • Excellent means of building relationships of trust between centres • No impression of cheating by anybody • Clear concrete steps towards sharing tools… • LAL using CERN printer wizard • CERN using SLAC console software • A lot of interest in ELFms • … and even when not sharing implementations, sharing ideas and information is very beneficial