
IT Briefing

This briefing covers Emory University's $10M IT hardware sourcing initiative (objectives, project scope, next steps, and service expectations), the new ELLIPSE High Performance Computing cluster, and updates from NetCom and CTS.


Presentation Transcript


  1. IT Briefing July 2007

  2. Core Website Redesign – John Mills • IT Sourcing – Huron Consulting • High Performance Computing Cluster – Keven Haynes • NetCom Updates – Paul Petersen • CTS Updates – Karen Jenkins • IT Briefing, July 19, 2007

  3. Core Website Redesign John Mills

  4. IT Hardware Initiative Discussion • Kevin McClean, Huron • John Scarbrough, Emory • David Wright, Emory

  5. Discussion Outline • Background & Objectives • Project Scope • Next Steps • Service Expectations & Concerns • Questions

  6. Background & Objectives Background: Emory-wide initiative $10M in annual spend reviewed Scope: PCs, “small” departmental servers, printers, peripherals, software Completed initial data analysis; identified opportunity Other Objectives: Maintain or improve product quality and service levels Cost savings Leverage Emory-wide IT spend Evaluate current contract (Expires 1/08) Evaluate IT Hardware suppliers / industry Evaluate PC market & potential options Assess potential for further IT consolidation

  7. Project Scope - Category Spend • Source: A/P & P-Card spend for the University and Hospital (April 06 – March 07) and the Clinic (FY06), plus supplier reporting

  8. Next Steps • Finalize supplier strategy / Determine suppliers to engage • Send introduction letter with core requirement to select suppliers to solicit proposals - 7/20 • Responses due: 8/03 • Analyze initial supplier proposals • Conduct supplier meetings to discuss proposals – Week of 8/13 • Determine need for additional supplier proposals and meetings • Finalize new agreement - 9/15

  9. Service Expectations & Concerns • All bundles must meet minimum recommendations set by DeskNet • Dedicated technical account manager / support engineer • On-site/local spares • Web-based ability to order parts / next-day delivery • Escalated entry into the support organization • Option to expedite delivery (for a set fee) • MAC addresses emailed to the requester at ship time • Load a pre-defined image on the system • Option to change boot order (PXE boot) • Quarterly review of the product roadmap • Evaluation of systems required prior to changing any bundle agreement • Consolidated packaging of systems

  10. IT Hardware Sourcing ? Questions

  11. ELLIPSE The New High Performance Computing Cluster Keven Haynes

  12. ELLIPSE: the Emory Life and Physical Sciences cluster.

  13. What does High Performance Computing (HPC) mean? • Computing used for scientific research • A.k.a. “supercomputing” • Highly calculation-intensive tasks (e.g., weather forecasting, molecular modeling, string matching)

  14. What is an HPC cluster? • A (large) collection of computers connected via a high-speed network or fabric • Sometimes acts as, or is viewed as, a single computer • Nodes sometimes share common storage • Nodes sometimes run identical instances of the same operating system • The definition of “cluster” is fluid

  15. What is an HPC cluster, again? • Uses multiple CPUs to distribute computational load and aggregate I/O • Computation runs in parallel • Not necessarily designed for fail-over, high availability (HA), or load balancing • Different from a grid • Work is managed via a “job scheduler”

  16. Our new cluster (overview): • 256 dual-core, dual-socket AMD Opteron-based compute nodes - 1024 cores total • 8 GB RAM/node, 2 GB RAM/core • 250 GB local storage per node • ~ 8 TB global storage (parallel file system) • Gigabit Ethernet, with separate management network • 11 additional servers
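As a quick sanity check, the headline figures above are internally consistent; here is a minimal back-of-the-envelope sketch in Python using only the counts from this slide (the derived ~64 TB of node-local scratch is an illustration, not a quoted figure).

```python
# Back-of-the-envelope totals for the cluster described on slide 16.
nodes = 256
sockets_per_node = 2          # dual-socket
cores_per_socket = 2          # dual-core Opterons
ram_per_node_gb = 8
local_disk_per_node_gb = 250

total_cores = nodes * sockets_per_node * cores_per_socket                  # 1024 cores
ram_per_core_gb = ram_per_node_gb / (sockets_per_node * cores_per_socket)  # 2 GB per core
node_local_scratch_tb = nodes * local_disk_per_node_gb / 1000              # ~64 TB spread across nodes

print(total_cores, ram_per_core_gb, node_local_scratch_tb)
```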

  17. Cluster diagram

  18. Cluster Picture

  19. Our cluster: Compute Nodes • 256 Sun x2200s • AMD Opteron 2218 processors • CentOS 4 Linux (whitebox Red Hat) • 8 GB DDR2 RAM (32 GB on the “fat” nodes) and a local 250 GB SATA drive • Single gigabit data connection to the switch • Global file system (IBRIX) mounted

  20. Our cluster: Networking • Separate data and management networks • Data network: Foundry BI-RX 16 • Management network: 9 Foundry stackables • MRV console switches • Why Ethernet? It is open, supported, easy, and cheap.

  21. Our cluster: Cluster-wide Storage • Global, parallel file system: IBRIX • Sun StorEdge 6140 with five trays of 16 15K rpm FC drives, connected via 4 Gb fibre links • Five Sun x4100 file-system servers: one IBRIX Fusion Manager and four segment servers, each with four bonded Ethernet connections

  22. The IBRIX file system • Looks like an ext3 file system because it is one (segmented ext3, not NFS v4) • Scales horizontally to thousands of servers and hundreds of petabytes • Efficient with both small and large I/O • Partial online operation and dynamic load balancing • Runs on any hardware (Linux only)

  23. The Scheduler: Sun Grid Engine • Users submit work to the cluster via SGE (the ‘qsub’ command) over ssh • SGE can manage up to 200,000 job submissions • Distributed Resource Management (DRM) • Policy-based resource allocation algorithms (queues)
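For readers unfamiliar with SGE, the sketch below shows one way a user might wrap job submission from a login node; it assumes only the standard qsub options (-N, -cwd, -q, -t), and the queue name, job name, and script path are hypothetical examples rather than anything documented in this briefing.

```python
# Minimal sketch of submitting an SGE array job via 'qsub'.
# The queue name, job name, and script path below are hypothetical.
import subprocess

def submit_array_job(script_path, tasks, queue="all.q"):
    """Submit `tasks` copies of a job script as an SGE array job; return qsub's reply."""
    cmd = [
        "qsub",
        "-N", "example_job",   # job name (hypothetical)
        "-cwd",                # run in the current working directory
        "-q", queue,           # target queue; SGE's policy-based allocation applies here
        "-t", f"1-{tasks}",    # array job with tasks numbered 1..tasks
        script_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_array_job("run_analysis.sh", tasks=100))
```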

  24. Cluster-based Work • Cluster is designed “Beowulf-style”, for high-throughput serial/batch processing • “Embarrassingly parallel” jobs work best (see the sketch below) • MPI-based parallel processing is possible, but difficult given the multi-core node architecture
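To illustrate what an “embarrassingly parallel” batch job looks like in practice, here is a minimal worker sketch, assuming it runs as one task of the array job submitted above (SGE exports SGE_TASK_ID to each array task); the input and output file names are hypothetical.

```python
# Sketch of an "embarrassingly parallel" array-job worker: each task processes
# its own slice of the input with no communication between tasks.
import os

def process_chunk(task_id, total_tasks, items):
    """Process every Nth item, offset by this task's (1-based) index."""
    return [item.upper() for i, item in enumerate(items) if i % total_tasks == task_id - 1]

if __name__ == "__main__":
    task_id = int(os.environ.get("SGE_TASK_ID", "1"))  # set by SGE for array jobs
    total_tasks = 100                                   # must match the '-t 1-100' submit range
    with open("input_items.txt") as fh:                 # hypothetical input on the global file system
        items = fh.read().splitlines()
    results = process_chunk(task_id, total_tasks, items)
    with open(f"results.{task_id}.txt", "w") as fh:     # hypothetical per-task output file
        fh.write("\n".join(results))
```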

  25. Applications • MATLAB • Geant4 • Genesis (Neuroscience) • Soon: iNquiry (BioInformatics) • GCC compilers (soon: PGI compilers) • More…

  26. Performance • Estimated ~3 teraflops (theoretical) at 80% efficiency • Achieved 2 GB/sec writes over the network • 10 minutes of cluster operation ≈ 7 days on a fast desktop • 8.5 hours of cluster time ≈ an entire year of 24-hour days on that desktop
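The two wall-clock comparisons are consistent with each other and with the 1024-core count; a quick check in Python (the “fast desktop” baseline is whatever the presenters measured, so this is illustrative only):

```python
# Rough check of the speedup implied by the slide's own figures.
cluster_minutes = 10
desktop_days = 7

speedup = desktop_days * 24 * 60 / cluster_minutes   # ~1008x, close to the 1024-core count
desktop_year_in_cluster_hours = 365 * 24 / speedup   # ~8.7 hours, matching the "8.5 hours" claim
print(f"speedup ~{speedup:.0f}x; one desktop-year ~{desktop_year_in_cluster_hours:.1f} cluster hours")
```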

  27. Project Status • Cluster went “live” July 1st • Converting billing arrangements from an annual fee to $/CPU-hour • Software installation, hardware replacement, and process development are ongoing • Much testing…

  28. Contact Info • ELLIPSE is managed by the HPC Group: • Keven Haynes, khaynes@emory.edu • Michael Smith, mdsmith@emory.edu • Ken Guyton, kenneth.m.guyton@emory.edu • Website soon…

  29. HPC ? Questions

  30. NetCom Updates Paul Petersen

  31. Agenda • Single Voice Platform • Phase I Complete • Phase II Starting • Backbone and Firewall • Firewall Status • Multicasting • Border Changes • Wireless • NATing • iPhones

  32. Single Voice Platform • Single Voice Platform • Name given to the project which consolidates Emory’s three phone switches to one • This project also sets Emory’s direction for VoIP/IP Telephony • Project began March 2006 with a formal RFQ process • Avaya was selected

  33. Single Voice Platform • Phase 1 – Consolidate TEC & ECLH switches • Upgrade to the latest Avaya switch • Upgrade to IP Connect (provides redundancy) • Consolidate the TEC & ECLH switch databases • Phase 1 completed on May 18th • Phase 2 – Convert the rest of EHC to SVP • Transition Nortel phones in EHC (EUH & WHSCAB) to Avaya • Approved and fully funded • Phase 3 – Convert the remaining Nortel phones to the new platform

  34. Firewall and Backbone • Firewall • ResNet Firewall – October 2006 • HIPAA Firewall – March 2007 • Academic Firewall – April 2007 • Admin Core/DMZ Firewall – Attempted May 6th • 5.4.eo5 Code • Premature Session Timeouts • Layer2 Pointer Crash (lab only) • ASIC Optimizations • Software Policy Lookups Crash (lab only) • SLU engine/ASIC Chip resets • Academic/ResNet Cluster Upgraded – July 12th • HIPAA Cluster Upgraded – July 19th

  35. Multicasting • Multicasting with Virtual Routing • Supported in version 3.5 of router code • NetCom has been testing Beta version for a month • Also provides Hitless Upgrades • Successfully imaged two workstations using Ghost and multicasting across two router hops with the College • Official version of 3.5 to be released this week • Tentatively scheduled to upgrade router core on August 1st.
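As context for the Ghost test above: multicast only crosses router hops once the routers forward group traffic (the multicast-with-virtual-routing support added in the 3.5 router code) and the sender's TTL exceeds the hop count. A minimal receiver sketch in Python, with a hypothetical group address and port:

```python
# Minimal multicast receiver sketch; the group address and port are hypothetical.
# A sender would also need IP_MULTICAST_TTL greater than the number of router
# hops (two, in the Ghost test described above) for its traffic to reach here.
import socket
import struct

GROUP = "239.1.2.3"   # hypothetical administratively scoped multicast group
PORT = 5007           # hypothetical UDP port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Join the group; this triggers an IGMP membership report that the upstream
# routers use to build multicast forwarding state.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, addr = sock.recvfrom(65535)
print(f"received {len(data)} bytes from {addr}")
```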

  36. Border Changes • Converging Emory’s Border Network • Merged Healthcare and University borders (4/25) • Converted Internet2 to 10 Gig and changed AS (6/26) • Moved Global Crossing to new border routers (7/10) • Moved Level3 to new border router & changed AS (7/17) • Next Steps: • Changes in the Global Crossing and Level3 contracts • Atlanta Internet Exchange (AIX)

  37. Wireless • NATing Wireless? • Proliferation of wireless devices • Strain on University IP address space • Downside – lose some tracking abilities • Testing with NetReg • Goal is to implement before the start of school • The iPhone • Update on the problem at Duke • WPA Enterprise / Guest Access • Official statement on support

  38. NetCom ? Questions

  39. CTS Updates Karen Jenkins

  40. HealthCare Exchange • 32 scheduled seminars – over 700 attendees • SMTP flip completed; GAL updated • Information on project website continuing to expand • Problems with beta users (Zantaz & VDT) • One outstanding Zantaz + VDT problem • Current Schedule • Pre-Pilot ~7/23 • Pilot ~8/6 • Production ~8/13

  41. CTS ? Questions
