Summary of the HEPiX Spring 2013 Meeting • Arne Wiebalck, Luca Mascetti, Luis Fernandez Alvarez • CERN ITTF, May 17, 2013
HEPiX – www.hepix.org • Global organization of service managers and support staff providing computing facilities for the HEP community • Participating sites include BNL, CERN, DESY, FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF … • Meetings are held twice per year • Spring: Europe, Autumn: U.S./Asia • Exchange of experiences, reports on recent work, work in progress & future plans • Usually no showing-off Alvarez, Mascetti, Wiebalck: Summary of the HEPiX Spring 2013 Meeting - 2
Outline • Miscellaneous, Site reports, Storage (Arne) • IT infrastructure, Computing (Luca) • Virtualization, Networking & Security (Luis)
HEPiX Spring 2013 • April 15-19 at CNAF, INFN, Bologna (IT) • Very well organized, pretty rich program • Network access: eduroam (Thanks to CS for last-minute support!) • 83 registered participants • Administrative hurdles (& illnesses) prevented better participation • Europe: 69, U.S./Canada: 8, Asia: 5, Australia: 1 (CERN: 15) • ~70 presentations from 40 institutes • 3 BoF sessions (OpenAFS/IPv6, CMDBuild, Energy efficiency) • Many offline discussions • Sponsors: WD, DDN, IBM, E4, and Univa
Next HEPiX Meetings • Autumn 2013 • U Michigan, Ann Arbor, MI, U.S. • Oct 28 – Nov 1, 2013 • Spring 2014 • LAPP, Annecy, France • May 19 – May 23, 2014 • Autumn 2014 • several options, not yet decided
Updates from the WGs • IPv6 • IPv4 address shortage will soon become a serious issue • Distributed testbed has been set up, more and more sites joining, constant testing (file transfer) • Tools & Software Survey, “problematic” applications identified • http://indico.cern.ch/contributionDisplay.py?contribId=35&sessionId=2&confId=220443 • Storage • WG terminated • Summary report at Ann Arbor meeting • Benchmarking • No new SPEC benchmark • Discrepancies between application and benchmark performance are becoming worrying (benchmarks are used for purchases) • Configuration Management • New WG led by Ben Jones (CERN) and Yves Kemp (DESY)
Some Trends • Batch system reviews everywhere • BNL, CERN, GridKa, NERSC, … • Univa GridEngine seems to take the lead • Worker nodes (WNs) with hyper-threading (HT) • Broad use of cloud services & virtualization • Private clouds almost everywhere (mostly OpenStack) • Idle VM detection (FNAL), EC2 spot pricing (BNL) • Puppet taking the lead for configuration management • But: no monoculture expected • Interest in Ceph for VM storage • ASGC, BNL, CERN, RAL, … • At an early stage everywhere
Site Reports (1) • Storage/File Systems • Lustre sites happy, GSI: 8PB, home-made access control • NFS on BlueArc (BNL: almost 1PB of disk space, home+scratch) • GlusterFS mentioned once • Tape • Mostly Sun SL8500s, some IBM, but also Spectra T-Finity (UiO) • FNAL encountered excessive write errors on new tapes: contaminated with debris during manufacturing. Solution: f/w upgrade and change of manufacturing process • Tape access optimization: BNL developed a tape scheduler in HPSS • Authentication • FNAL looking into consolidation of authentication setup: MIT Kerberos + CA, two separate AD domains. Plan to be presented at next HEPiX
Site Reports (2) • Software • SL5 still the most widely used OS for compute clusters (move to SL6 planned) • BNL has successfully used Oracle Ksplice (rebootless kernel patching) on their production clusters for about 2 years • Hardware • Dell systems dominate (PowerEdge R410, R510, R720, C6220, MD3260…), and not only in the U.S. • Infrastructure • NERSC computing facilities will be relocated from Oakland to the new CRT building in Berkeley. First systems will move 1Q2015, last will stay until 4Q2016 • Networking • Jumbo frames on LAN are being tried at several sites
Storage (1) • Track dominated by CERN presentations (7/11) • Mostly reported already in previous ITTF presentations (AFS, CASTOR/EOS, RAID optimizations), or to be covered in future ones (Ceph) • DPHEP initiative and its impact on HEPiX • Long-term data management is a site responsibility • Techniques and policies need cross-site coordination • BoF Session on “OpenAFS & IPv6” • Many sites regard AFS as one of their core services, value its robustness and plan to continue using it in the future • Various options to deal with the IPv6 situation were discussed, but the lack of support is not regarded as a burning issue (at least right now) • The need to gather more information was identified (use cases, traffic maps, prices for an implementation, …) to make an informed decision (before or at next HEPiX) • Peter van der Reest (DESY) and Arne Wiebalck (CERN) to follow up
Storage (2) • Storage Architecture at CNAF Tier1 • 11PB on disk; 16PB on tape • >10k processes at 20GB/s (LAN) • Few but big, dedicated, replicated storage systems • GPFS + TSM • Whole stack (DDN storage backend nodes, I/O servers, metadata servers, gridFTP servers, StoRM servers, HSM servers) replicated for each experiment • Manageability problems • Huge building blocks (compared to yearly growth) • Small config changes (can) affect performance • Storage re-balancing takes effort and (can) affect performance • “Slow disk” problem: faulty disks (can) affect performance • Evaluating alternatives • Multiport SAS arrays with s/w-RAID? • RAIN (simple EOS-like replication regarded as too expensive)? • EMC Isilon (NAS w/ IB interconnect) under investigation
Questions?