Summary of the HEPiX Autumn 2013 Meeting
Arne Wiebalck, Afroditi Xafi, Thomas Oulevey
CERN ITTF, November 22, 2013
Outline
• Miscellaneous
• Site reports
• Storage
• Basic IT Services
• Computing & Batch Systems
• IT facilities
• End User Services
• Clouds & Virtualisation
• Networking & Security
Speakers: Arne, Afroditi, Thomas
Wiebalck, Xafi, Oulevey: Summary of the HEPiX Autumn 2013 Meeting - 2
HEPiX – www.hepix.org
• Global organization of service managers and support staff providing computing facilities for the HEP community
• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF …
• Meetings are held twice per year
  • Spring: Europe, Autumn: U.S./Asia
• Exchange of experiences, reports on recent work, work in progress & future plans
• Usually no showing-off
Next HEPiX Meetings
• Spring 2014
  • LAPP, Annecy, France
  • May 19 – May 23, 2014
• Autumn 2014
  • University of Nebraska (NE), U.S.
  • Final approval needed, dates to be determined
• Spring 2015
  • U.K. discussed as an option
HEPiX Autumn 2013
• Oct 28 - Nov 1 at U Michigan, Ann Arbor (MI)
• Very well organized, pretty rich program
• Network access: eduroam (as in Bologna)
• 115 (!) registered participants
  • Europe: 48, U.S./Canada: 47, Asia: 3, Australia: 2 (CERN: 13)
  • Many first timers, several North American WLCG Tier-2 universities
  • DoE labs could mostly participate, only a few cancellations (ZFS)
  • 15 participants from 9 companies
• 65 presentations from 35 institutes
  • 26 hours of presentations
  • Many offline discussions
• Sponsors: WD, UMICH, DDN, NetApp, and Univa
Updates from the WGs (1)
• Storage
  • WG terminated, no summary as Andrei could not participate
• Batch
  • WG terminated, updates to Wiki will continue
• IPv6
  • Big ISPs move to IPv6 (CH: >10% of Google traffic already via IPv6)
  • CERN seems well prepared, some smaller labs have not even started
  • IPv6 support in batch systems?
  • A lot of testing ongoing, including the experiments, test bed growing
  • https://indico.cern.ch/getFile.py/access?contribId=26&sessionId=2&resId=1&materialId=slides&confId=247864
Updates from the WGs (2)
• Benchmarking
  • New SPEC CPU benchmark suite planned for Oct 2014
  • Plan is to start working with the experiments early (to identify apps to validate)
• Bit preservation
  • New working group led by CERN (German Cancio) and DESY (Dimitry Ozerov)
  • Follow-up on DPHEP presentation from J. Shiers during Bologna meeting
  • Focus on technical advice on bit preservation
  • https://indico.cern.ch/getFile.py/access?contribId=45&sessionId=3&resId=1&materialId=slides&confId=247864
• Configuration Management
  • No update (chairs could not participate)
• Energy efficiency
  • On hold for now, little feedback, no interest or no resources?
  • To be re-discussed in Annecy
Site reports (1)
• Configuration Management (Puppet) is a “hot topic”
  • Sites come from Rocks, Quattor, home-grown scripts, …
  • Interesting: master-less Puppet at FNAL
  • Other sites discuss similar topics as we do (workflow, secrets, …)
  • Little synergy in the community so far, WG activity needed!
• Batch system reviews ongoing
  • Univa GridEngine & HTCondor take the lead (SLURM did not survive testing at various sites)
  • IPv6 and job authentication remain open issues
• Broad use of cloud services & virtualization
  • Clouds move into production everywhere
  • Complete virtualization of services (e.g. AFS at UMICH)
Site reports (2)
• “Dropbox”-like service at GridKA
  • For 55’000 users from several universities (10GB quota)
  • Powerfolder was picked as their solution
• Lustre/Hadoop established at various sites
  • Lustre: GSI (10PB), IHEP (3PB), FNAL (0.2PB), JLAB, …
  • Hadoop: smaller sites, PB installations
• Interest in & investigations around Ceph
  • Mostly for OpenStack VMs, but also other use cases (RBD), backend for dCache, NFS replacement, CASTOR complement …
  • Most sites still at an early stage
Site reports (3)
• Scientific Linux 6
  • Many sites finished migration (of batch) to SL6: RAL, GridKA, INFN, …
  • Significantly improved performance on older systems
Storage (1)
• dCache update
  • Support for NFS v4.1/pNFS currently being tested (looks OK)
  • xroot and HTTP/WebDAV federations
  • Backend testing (DDN, Ceph)
• Summary of FNAL USCMS T1 storage investigation
  • Seeking solutions for online (2GB, POSIX) and nearline (1TB w/ tape)
  • Currently on BlueArc & dCache & Lustre & EOS
  • Goal: consolidation of storage solutions
  • Evaluated: the current systems plus NetApp, GPFS, Nexsan, SnapScale
  • Result: dCache for T1 production, EOS for LPC analysis, HNAS for home
Storage (2)
• Western Digital on disk drive technology
  • Giving insights into the difficulties of doing macroscopic mechanics on the nano-scale
  • Platter ‘non-flatness’ plus unequal lube distribution can cause problems
  • Heads usually fly at 10nm and “descend” to ~2nm for actual I/O (by thermal expansion!)
  • Introducing a new reliability metric (MPbF): disk failure rate dependent on load (not on power-on-hours)
  • http://indico.cern.ch/getFile.py/access?contribId=37&sessionId=3&resId=3&materialId=slides&confId=247864
• 3 presentations on AFS
  • OpenAFS status report
    • 1.6 released in Sep 2011, slow (server-side) uptake
    • Security advisories
  • YFS: new security, new Rx (WAN), IPv4/IPv6, limits removed, …
  • Summary of IPv6 investigations & survey, concluding that dual-stack seems to be the solution to the “IPv6/AFS issue”
Questions?
• “We built the first data centre with heaters!” (from Ulf Tigerstedt’s presentation on building the Kajaani DC)
• “Controlling a disk head is like flying a Jumbo 747 above a highway at a distance of less than 1 inch for 5 years!” (from Amit Chattopadhyay’s presentation on Disk Load Monitoring)