Computing & Networking User Group Meeting. Roy Whitney Andy Kowalski Sandy Philpott Chip Watson 17 June 2008. Users and JLab IT. Ed Brash is User Group Board of Directors’ representative on the IT Steering Committee. Physics Computing Committee (Sandy Philpott)
Computing & Networking User Group Meeting Roy Whitney Andy Kowalski Sandy Philpott Chip Watson 17 June 2008
Users and JLab IT • Ed Brash is User Group Board of Directors’ representative on the IT Steering Committee. • Physics Computing Committee (Sandy Philpott) • Helpdesk and CCPR requests and activities • Challenges • Constrained budget • Staffing • Aging infrastructure • Cyber Security
Computing and Networking Infrastructure Andy Kowalski
CNI Outline • Helpdesk • Computing • Wide Area Network • Cyber Security • Networking and Asset Management
Helpdesk • Hours: 8am-12pm M-F • Submit a CCPR via http://cc.jlab.org/ • Dial x7155 • Send email to helpdesk@jlab.org • Windows XP, Vista and RHEL5 Supported Desktops • Migrating older desktops • Mac Support?
Computing • Email Servers Upgraded • Dovecot IMAP Server (Indexing) • New File Server and IMAP Servers (Farm Nodes) • Servers Migrating to Virtual Machines • Printing • Centralized Access via jlabprt.jlab.org • Accounting Coming Soon • Video Conferencing (working on EVO)
Wide Area Network • Bandwidth • 10Gbps WAN and LAN backbone • Offsite Data Transfer Servers • scigw.jlab.org (bbftp) • qcdgw.jlab.org (bbcp)
Cyber Security Challenge • The threat: the sophistication and volume of attacks continue to increase. • Phishing Attacks • Spear Phishing/Whaling are now being observed at JLab. • Federal requirements, including DOE's, for meeting the cyber security challenges call for additional measures. • JLab uses a risk-based approach that supports achieving the mission while at the same time dealing with the threat.
Cyber Security • Managed Desktops • Skype Allowed From Managed Desktops On Certain Enclaves • Network Scanning • Intrusion Detection • PII/SUI (CUI) Management
Networking and IT Asset Management • Network Segmentation/Enclaves • Firewalls • Computer Registration • https://reggie.jlab.org/user/index.php • Managing IP Addresses • DHCP • Assigns all IP addresses (most static) • Integrated with registration • Automatic Port Configuration • Rolling out now • Uses registration database
Scientific Computing Chip Watson & Sandy Philpott
Farm Evolution Motivation • Capacity upgrades • Re-use of HPC clusters • Movement to Open Source • O/S upgrade • Change from LSF to PBS
Farm Evolution Timetable • Nov 07: Auger/PBS available – RHEL3 – 35 nodes • Jan 08: Fedora 8 (F8) available – 50 nodes • May 08: Friendly-user mode; IFARML4,5 • Jun 08: Production • F8 only; IFARML3 + 60 nodes from LSF IFARML alias • Jul 08: IFARML2 + 60 nodes from LSF • Aug 08: IFARML1 + 60 nodes from LSF • Sep 08: RHEL3/LSF->F8/PBS Migration complete • No renewal of LSF or RHEL for cluster nodes
Farm F8/PBS Differences • Code must be recompiled • 2.6 kernel • gcc 4 • Software installed locally via yum • cernlib • MySQL • Time limits: 1 day default, 3 days max • stdout/stderr to ~/farm_out • Email notification
Farm Future Plans • Additional nodes • from HPC clusters • CY08: ~120 4g nodes • CY09-10: ~60 6n nodes • Purchase as budgets allow • Support for 64-bit systems when feasible & needed
Storage Evolution • Deployment of Sun x4500 “thumpers” • Decommissioning of Panasas (old /work server) • Planned replacement of old cache nodes
Tape Library • Current STK “Powderhorn” silo is nearing end-of-life • Reaching capacity & running out of blank tapes • Doesn’t support upgrade to higher density cartridges • Is officially end-of-life December 2010 • Market trends • LTO (Linear Tape Open) Standard has proliferated since 2000 • LTO-4 is 4x density, capacity/$, and bandwidth of 9940b: 800 GB/tape, $100/TB, 120 MB/s • LTO-5, out next year, will double capacity, 1.5x bandwidth: 1600 GB/tape, 180 MB/s • LTO-6 will be out prior to the 12 GeV era: 3200 GB/tape, 270 MB/s
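The per-generation figures on this slide can be turned into a quick back-of-the-envelope comparison. The sketch below is illustrative only: it assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and sustained streaming at the quoted bandwidth, and the function names are made up for this example. It shows that because capacity doubles per generation while bandwidth grows by only 1.5x, the time to fill a single cartridge gets longer with each generation.

```python
# (capacity in GB per tape, bandwidth in MB/s), as quoted on the slide.
GENERATIONS = {
    "LTO-4": (800, 120),
    "LTO-5": (1600, 180),
    "LTO-6": (3200, 270),
}


def hours_to_fill(capacity_gb, bandwidth_mb_s):
    """Hours to write one full tape.

    Assumption: decimal units and uninterrupted streaming at full rate.
    """
    return capacity_gb * 1000 / bandwidth_mb_s / 3600


def media_cost_per_tape(capacity_gb, dollars_per_tb):
    """Media cost of one cartridge from a $/TB figure (decimal TB assumed)."""
    return capacity_gb * dollars_per_tb / 1000


for name, (gb, mb_s) in GENERATIONS.items():
    print(f"{name}: {hours_to_fill(gb, mb_s):.2f} h to fill one tape")

# LTO-4 at the quoted $100/TB
print(f"LTO-4 media: ${media_cost_per_tape(800, 100):.0f} per tape")
```

Under these assumptions an LTO-4 cartridge fills in roughly 1.85 hours and costs about $80 in media.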
Tape Library Replacement • Competitive procurement now in progress • Replace old system, support 10x growth over 5 years • Phase 1 in August • System integration, software evolution • Begin data transfers, re-use 9940b tapes • Tape swap through January • 2 PB capacity by November • DAQ to LTO-4 in January 2009 • Old silo gone in March 2009 • End result: break-even on cost by the end of 2009!
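Two of the targets above lend themselves to simple arithmetic. As a rough sketch, assuming decimal units (1 PB = 1,000,000 GB), LTO-4 cartridges at the 800 GB quoted earlier, and growth that compounds evenly each year (the slide does not say how the 10x is distributed), the helpers below estimate the cartridge count behind "2 PB capacity" and the yearly growth factor implied by "10x growth over 5 years". The function names are hypothetical.

```python
import math


def tapes_for_capacity(capacity_pb, gb_per_tape):
    """Cartridges needed for a target capacity, rounded up.

    Assumption: decimal units, 1 PB = 1,000,000 GB, no compression.
    """
    return math.ceil(capacity_pb * 1_000_000 / gb_per_tape)


def annual_growth_factor(total_factor, years):
    """Yearly multiplier if growth compounds evenly (an assumption)."""
    return total_factor ** (1 / years)


print(tapes_for_capacity(2, 800))             # cartridges for 2 PB on LTO-4
print(round(annual_growth_factor(10, 5), 3))  # yearly multiplier for 10x in 5 years
```

With these assumptions, 2 PB corresponds to 2500 LTO-4 cartridges, and 10x over 5 years is a bit under 60% growth per year.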
Long Term Planning • Continue to increase compute & storage capacity in the most cost-effective manner • Improve processes & planning • PAC submission process • 12 GeV Planning…
LQCD Computing • JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling • National LQCD Computing Project (2006-2009: BNL, FNAL, JLab; USQCD Collaboration) • LQCD II proposal 2010-2014 would double the hardware budget to enable key calculations • JLab Experimental Physics & LQCD computing share staff (operations & software development) & tape silo, providing efficiencies for both