IT Briefing July 2007
Core Website Redesign (John Mills) • IT Sourcing (Huron Consulting) • High Performance Computing Cluster (Keven Haynes) • NetCom Updates (Paul Petersen) • CTS Updates (Karen Jenkins)
IT Briefing, July 19, 2007
Core Website Redesign John Mills
IT Hardware Initiative Discussion Kevin McClean, Huron John Scarbrough, Emory David Wright, Emory
Discussion Outline • Background & Objectives • Project Scope • Next Steps • Service Expectations & Concerns • Questions
Background & Objectives • Background: Emory-wide initiative • $10M in annual spend reviewed • Scope: PCs, “small” departmental servers, printers, peripherals, software • Completed initial data analysis; identified opportunity • Other Objectives: Maintain or improve product quality and service levels • Cost savings • Leverage Emory-wide IT spend • Evaluate current contract (expires 1/08) • Evaluate IT hardware suppliers / industry • Evaluate PC market & potential options • Assess potential for further IT consolidation
Project Scope – Category Spend • Source: A/P & P-Card spend for the University and Hospital (April 06 – March 07) and the Clinic (FY06), plus supplier reporting
Next Steps • Finalize supplier strategy / determine suppliers to engage • Send introduction letter with core requirements to selected suppliers to solicit proposals – 7/20 • Responses due: 8/03 • Analyze initial supplier proposals • Conduct supplier meetings to discuss proposals – week of 8/13 • Determine need for additional supplier proposals and meetings • Finalize new agreement – 9/15
Service Expectations & Concerns • All bundles must meet minimum recommendations set by DeskNet • Dedicated technical account manager / support engineer • On-site/local spares • Web-based ability to order parts / next-day delivery • Escalated entry into the support organization • Option to expedite delivery (for a set fee) • MAC addresses emailed to the requester at shipment • Pre-defined image loaded on each system • Option to change boot order (PXE boot) • Quarterly review of product roadmap • Evaluation of systems required prior to changing any bundle agreement • Consolidated packaging of systems
IT Hardware Sourcing: Questions?
ELLIPSE The New High Performance Computing Cluster Keven Haynes
ELLIPSE: the Emory Life and Physical Sciences cluster.
What does High Performance Computing (HPC) mean? • Computing used for scientific research • A.k.a. “supercomputing” • Highly calculation-intensive tasks (e.g., weather forecasting, molecular modeling, string matching)
What is an HPC cluster? • A (large?) collection of computers connected via a high-speed network or fabric • Sometimes acts as / is viewed as one computer • Nodes sometimes share common storage • Nodes sometimes run identical instances of the same operating system • The definition of “cluster” is fluid
What is an HPC cluster, again? • Uses multiple CPUs to distribute computational load, aggregate I/O. • Computation runs in parallel. • Not necessarily designed for fail-over, High Availability (HA) or load-balancing • Different from a Grid • Work managed via a “job scheduler”
Our new cluster (overview): • 256 dual-core, dual-socket AMD Opteron-based compute nodes - 1024 cores total • 8 GB RAM/node, 2 GB RAM/core • 250 GB local storage per node • ~ 8 TB global storage (parallel file system) • Gigabit Ethernet, with separate management network • 11 additional servers
Our cluster: Compute Nodes • 256 Sun x2200s • AMD Opteron 2218 processors • CentOS 4 Linux (a “whitebox” Red Hat rebuild) • 8 GB DDR2 RAM (32 GB on the “fat” nodes), local 250 GB SATA drive • Single gigabit data connection to the switch • Global file system (IBRIX) mounted
Our cluster: Networking • Separate Data and Management networks • Data Network: Foundry BI-RX 16 • Management network: 9 Foundry stackables • MRV console switches • Why ethernet? Open, supported, easy, cheap.
Our cluster: Cluster-wide Storage • Global, parallel file system: IBRIX • Sun StorEdge 6140, five trays of 16 15K-rpm FC drives, connected via 4 Gb Fibre Channel links • Five Sun x4100 file-system servers: one IBRIX Fusion Manager and four segment servers, each with four bonded Ethernet connections
The IBRIX file system • Looks like an ext3 file system because it is one (not NFSv4): IBRIX segments ext3 across servers • Scales (horizontally) to thousands of servers, hundreds of petabytes • Efficient with both small and large I/O • Partial online operation, dynamic load balancing • Will run on any hardware (Linux only)
The Scheduler: Sun Grid Engine • Users submit work to the cluster via SGE (the ‘qsub’ command) and ssh; see the sketch below • SGE can manage up to 200,000 job submissions • Distributed Resource Management (DRM) • Policy-based resource allocation algorithms (queues)
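As a rough illustration of the qsub workflow (a minimal sketch only; the job name, resource request, and file names below are hypothetical, and ELLIPSE’s actual queues and policies may differ), an SGE batch job is just a shell script with #$ directives:

```bash
#!/bin/bash
# Minimal Sun Grid Engine batch job (illustrative sketch; ELLIPSE's real
# queue names, resource limits, and paths may differ).
#$ -N hello_ellipse    # job name, as shown by qstat
#$ -cwd                # run the job from the submission directory
#$ -j y                # merge stderr into stdout
#$ -l h_rt=00:10:00    # request 10 minutes of wall-clock time

echo "Running on $(hostname) at $(date)"

# Submit from a login node with:  qsub hello_ellipse.sh
# Check job status with:          qstat
```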
Cluster-based Work • The cluster is designed “Beowulf-style”, for high-throughput “serial/batch” processing • “Embarrassingly parallel” jobs fit best (see the array-job sketch below) • MPI-based parallel processing is possible, but difficult due to the multiple-core architecture
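For the “embarrassingly parallel” case, SGE array jobs are a natural fit. A hedged sketch (the input/output file layout, the analyze program, and the parallel-environment name in the comment are all hypothetical):

```bash
#!/bin/bash
# Illustrative SGE array job: 100 independent tasks, each processing its
# own input file (the "embarrassingly parallel" pattern).
#$ -N batch_analysis
#$ -cwd
#$ -j y
#$ -t 1-100                          # task IDs 1..100, scheduled independently

INPUT="input_${SGE_TASK_ID}.dat"     # each task reads its own input file
./analyze "$INPUT" > "result_${SGE_TASK_ID}.out"

# An MPI job would instead request a parallel environment, e.g.
#   #$ -pe <pe_name> 16
# and launch with mpirun, but as noted above that is harder to exploit
# well on this multi-core node layout.
```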
Applications • MATLAB • Geant4 • Genesis (neuroscience) • Coming soon: iNquiry (bioinformatics) • GCC compilers (PGI compilers coming soon) • More…
Performance • Estimated ~3 teraflops at 80% efficiency (theoretical) • Achieved 2 GB/sec writes over the network • 10 minutes of cluster operation ≈ 7 days on a fast desktop • 8.5 hours of cluster time ≈ an entire year of 24-hour days on that desktop (rough arithmetic below)
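A quick check that the last two bullets are consistent, assuming the ~7-desktop-days-per-10-cluster-minutes figure above:

```latex
8.5\,\text{h} = 51 \times 10\,\text{min}
\;\Rightarrow\;
51 \times 7\,\text{desktop-days} \approx 357\,\text{desktop-days} \approx 1\,\text{year}
```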
Project Status • Cluster went “live” July 1st • We are converting billing arrangements from annual to $/CPU-hour • Software installation, hardware replacement, developing processes • Much testing…
Contact Info • ELLIPSE is managed by the HPC Group: • Keven Haynes, khaynes@emory.edu • Michael Smith, mdsmith@emory.edu • Ken Guyton, kenneth.m.guyton@emory.edu • Website soon…
HPC: Questions?
NetCom Updates Paul Petersen
Agenda • Single Voice Platform • Phase I Complete • Phase II Starting • Backbone and Firewall • Firewall Status • Multicasting • Border Changes • Wireless • NATing • iPhones
Single Voice Platform • Name given to the project that consolidates Emory’s three phone switches into one • This project also sets Emory’s direction for VoIP/IP telephony • Project began March 2006 with a formal RFQ process • Avaya was selected
Single Voice Platform • Phase 1 – Consolidate TEC & ECLH switches • Upgrade to the latest Avaya switch • Upgrade to IP Connect (provides redundancy) • Consolidate the TEC & ECLH switch databases • Phase 1 completed on May 18th • Phase 2 – Convert the rest of EHC to SVP • Transition Nortel phones in EHC (EUH & WHSCAB) to Avaya • Approved and fully funded • Phase 3 – Convert the remainder of the Nortel phones to the new platform
Firewall and Backbone • Firewall • ResNet Firewall – October 2006 • HIPAA Firewall – March 2007 • Academic Firewall – April 2007 • Admin Core/DMZ Firewall – Attempted May 6th • 5.4.eo5 Code • Premature Session Timeouts • Layer2 Pointer Crash (lab only) • ASIC Optimizations • Software Policy Lookups Crash (lab only) • SLU engine/ASIC Chip resets • Academic/ResNet Cluster Upgraded – July 12th • HIPAA Cluster Upgraded – July 19th
Multicasting • Multicasting with Virtual Routing • Supported in version 3.5 of router code • NetCom has been testing Beta version for a month • Also provides Hitless Upgrades • Successfully imaged two workstations using Ghost and multicasting across two router hops with the College • Official version of 3.5 to be released this week • Tentatively scheduled to upgrade router core on August 1st.
Border Changes • Converging Emory’s border network • Merged Healthcare and University borders (4/25) • Converted Internet2 to 10-gig and changed AS (6/26) • Moved Global Crossing to new border routers (7/10) • Moved Level3 to new border router & changed AS (7/17) • Next Steps: • Changes in the Global Crossing and Level3 contracts • Atlanta Internet Exchange (AIX)
Wireless • NATing wireless? • Proliferation of wireless devices • Strain on University IP address space • Downside – lose some tracking abilities • Testing with NetReg • Goal would be to implement before the start of school • The iPhone • Update on the problem at Duke • WPA Enterprise / guest access • Official statement on support
NetCom: Questions?
CTS Updates Karen Jenkins
HealthCare Exchange • 32 scheduled seminars – over 700 attendees • SMTP flip completed; GAL updated • Information on project website continuing to expand • Problems with beta users (Zantaz & VDT) • One outstanding Zantaz + VDT problem • Current Schedule • Pre-Pilot ~7/23 • Pilot ~8/6 • Production ~8/13
CTS: Questions?