High-Performance Computing With Windows
Ryan Waite, General Program Manager
Windows Server HPC Group, Microsoft Corporation
Outline • Part 1: Overview • Why Microsoft has gotten into HPC • What our V1 product offers • Some future directions • Part 2: Drill-down • A few representative V1 features (for those who are interested)
Part 1 Overview
Evolving Tools Of The Scientific Process: 1. Observation 2. Hypothesis 3. Prediction 4. Validation
• Instruments: experiments done with a telescope by Galileo 400 years ago inaugurated the scientific method; the microscope, laser, x-ray, collider, and accelerator allowed peering further and deeper into matter
• HPC: automation and acceleration of the scientific and engineering process itself; digital instruments, data mining, simulation, experiment steering
The Next Challenge: Taking HPC Mainstream • Volume economics of industry-standard hardware and commercial software applications are rapidly bringing HPC capabilities to a broader range of users • But HPC is still only accessible to the few computational scientists who can master a domain science, program parallel, distributed algorithms, and use/manage a supercomputer • Microsoft HPC strategy: taking HPC to the mainstream • Enabling broad HPC adoption and making HPC a high-volume market in which everyone can have their own personal supercomputer • Enabling domain scientists who are not computer scientists to partake in the HPC revolution
Evidence Of Standardization And Commoditization • Clusters: over 70% of systems • Industry usage rising • x86 is leading (Pentium 41%, EM64T 16%, Opteron 11%) • GigE is gaining (50% of systems)
HPC Market Trends • Systems under $250K account for 97% of systems and 55% of revenue • 2005 systems and 2005 growth by segment: • Capability, Enterprise ($1M+): 981 systems, -3% • Divisional ($250K-$1M): 4,988 systems, 30% • Departmental ($50K-$250K): 21,733 systems, 36% • Workgroup (<$50K): 163,441 systems, 33% • Source: IDC, 2005
Top Challenges • Setup is painful • Takes a long time to get clusters up and running • Clusters are separate islands • Lack of integration into IT infrastructure • Job management • Lack of integration into end-user apps • Application availability • Limited ecosystem of applications that can exploit parallel processing capabilities
“Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems.” High-End Computing Revitalization Task Force, 2004 (Office of Science and Technology Policy, Executive Office of the President)
Windows Compute Cluster Server 2003 • Simplified cluster deployment, job submission, and status monitoring • Better integration with existing Windows infrastructure, allowing customers to leverage existing technology and skill sets • Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE
Windows Compute Cluster Server 2003 (architecture diagram): users submit jobs from a desktop app, the Job Manager UI, or the command line; administrators manage policy and reports through the Admin Console or the command line; the head node (job scheduler with DB/FS) dispatches work to the Node Manager on each compute node over a high-speed, low-latency interconnect.
Leveraging Existing Windows Infrastructure • Active Directory: integration with IT infrastructure, resource management, group policies • Windows security: Kerberos authentication, secure job execution, secure MPI • Compute cluster built-in tools: job scheduler, admin console, performance monitor, command-line interface • Microsoft enterprise management tools: Operations Manager, Windows Update Services, Systems Management Server, Remote Installation Services
CCS Key Features • Node deployment and administration • Task-based configuration for head and compute nodes • UI and command line-based node management • Monitoring with Performance Monitor (Perfmon), Microsoft Operations Manager (MOM), Server Performance Advisor (SPA), and 3rd-party tools • Integration with existing Windows and management infrastructure • Integrates with Active Directory, Windows security technologies, management, and deployment tools • Extensible job scheduler • 3rd-party extensibility at job submission and/or job assignment • Submit jobs from command line, UI, or directly from applications • Simple job management, similar to print queue management • Secure and performant MPI • User credentials secured in job scheduler and compute nodes • MPI stack based on MPICH2 reference implementation • Support for high performance interconnects through Winsock Direct • Integrated development environment • OpenMP support in Visual Studio, Standard Edition • Parallel debugger in Visual Studio, Professional Edition
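Because the MPI stack is based on the MPICH2 reference implementation, standard MPI code runs unchanged. The following minimal sketch uses only core MPI calls and should build against any MPICH2-derived stack; it is an illustration, not part of the product documentation.

```c
/* Minimal MPI sketch: each process reports its rank.
   Uses only standard MPI calls, so it should compile against
   any MPICH2-derived stack such as the CCS MPI implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count   */
    MPI_Get_processor_name(name, &name_len);

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                         /* shut down cleanly     */
    return 0;
}
```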
HPC Institutes • Cornell Theory Center, Ithaca, NY, U.S.A. • Southampton University, Southampton, UK • Nizhni Novgorod University, Nizhni Novgorod, Russia • National Center for Supercomputing Applications, IL, U.S.A. • University of Virginia, Charlottesville, VA, U.S.A. • Tokyo Institute of Technology, Tokyo, Japan • University of Utah, Salt Lake City, UT, U.S.A. • University of Tennessee, Knoxville, TN, U.S.A. • HLRS – University of Stuttgart, Stuttgart, Germany • Shanghai Jiao Tong University, Shanghai, PRC • TACC – University of Texas, Austin, TX, U.S.A.
An Example Of Porting To Windows: Weather Research and Forecasting (WRF) model • Large collaborative effort, led by NCAR, to develop a next-generation community model with a direct path to operations • Applications • Atmospheric research • Numerical weather prediction • Coupled modeling systems • Current release WRFV2.1.2 • ~1/3 million lines, Fortran 90 and some C, using MPI and OpenMP • Traditionally developed for Unix HPC systems • Two dynamical cores • Full range of physics options • Rapid community growth: more than 3,000 registered users • Operational capabilities • U.S. Air Force Weather Agency • National Centers for Environmental Prediction (NOAA) • KMA (Korea), IMD (India), CWB (Taiwan), IAF (Israel), WSI (U.S.)
WRF On Windows • Motivation • Extend the systems available to WRF users • Stability and consistency with respect to Linux • Take advantage of Microsoft and 3rd-party (e.g., Portland Group) development tools and environments • WRF ported under SUA and running on development AMD64 clusters using the Compute Cluster Pack • Of 360k lines, fewer than 750 changed to compile and link under SUA • Largest number of changes involved the WRF build mechanism (Makefiles, scripts) • Level of effort and nature of tasks were not unlike porting to any new version of UNIX • Details of the porting experience are described in a white paper available from Microsoft and at http://www.mmm.ucar.edu/wrf/WG2/wrf_port_notes.htm
An Example Of Application Integration With HPC: Scaling Excel (diagram): Excel “12” on desktops, Excel Services on servers, and Excel Services on Windows Compute Cluster Server 2003 on clusters.
Excel “12” and Excel Services (diagram): the Excel “12” client authors and publishes spreadsheets to Excel Services; a 100% thin browser client can view and interact with spreadsheets and open spreadsheet snapshots; custom applications reach Excel Services through web-services access.
Excel And Windows CCS • Customer requirements • Faster spreadsheet calculation • Free up client machines from long-running calculations • Time- and mission-critical calculations that must run • Parallel iterations on models • Example scenarios • Schedule overnight risk calculations • Farm out analytical library calculations • Scale out Monte Carlo iterations and parametric sweeps
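To illustrate the kind of scale-out Monte Carlo workload referred to above (not the Excel Services integration itself), here is a hedged MPI sketch in C that splits a simple Monte Carlo estimate across ranks; the iteration count and the quantity being estimated are invented for the example.

```c
/* Sketch: scale-out Monte Carlo estimate of pi across MPI ranks.
   Stands in for the kind of Monte Carlo/parametric sweep a cluster
   can absorb; it is not the Excel Services integration. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    const long iters_per_rank = 1000000;    /* invented workload size */
    long local_hits = 0, total_hits = 0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(12345u + (unsigned)rank);         /* different stream per rank */
    for (long i = 0; i < iters_per_rank; ++i) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            ++local_hits;
    }

    /* Combine the partial counts on rank 0. */
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~= %f\n", 4.0 * total_hits / (double)(iters_per_rank * size));

    MPI_Finalize();
    return 0;
}
```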
Evolution Of HPC (diagram): from manual, batch execution managed by the IT manager toward interactive computation and visualization backed by data services (SQL).
Cheap Cycles And Personal Supercomputing • IBM Cell processor: 256 Gflops today; 4-node personal cluster → 1 Tflops; 32-node personal cluster → Top100 • Microsoft Xbox: 3 custom PowerPCs + ATI graphics processor; 1 Tflops today for $300; 8-node personal cluster → “Top100” for $2,500 (ignoring all that you don’t get for $300) • Intel many-core chips: “100’s of cores on a chip in 2015” (Justin Rattner, Intel); at “4 cores” per Tflop → 25 Tflops/chip • The key challenge: how to program these things; concurrent programming will be an important area of investment for all of Microsoft (not just HPC)
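To make the programming challenge concrete: a loop of independent iterations can be spread across cores with OpenMP (which Visual Studio 2005 supports), but deciding what is safe to parallelize remains the programmer's job. A minimal sketch, with arbitrary array sizes:

```c
/* Sketch: using OpenMP to spread independent loop iterations over cores.
   The array size is arbitrary; the point is the programming model. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void)
{
    int i;

    /* Each iteration is independent, so the loop can be split across cores. */
    #pragma omp parallel for
    for (i = 0; i < N; ++i) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
        c[i] = a[i] + b[i];
    }

    printf("c[%d] = %f (computed on up to %d threads)\n",
           N - 1, c[N - 1], omp_get_max_threads());
    return 0;
}
```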
“Grid Computing” • A catch-all marketing term • Desktop cycle-stealing • Managed HPC clusters • Internet access to giant, distributed repositories • Virtualization of data center IT resources • Out-sourcing to “utility data centers” • “Software as a service” • Parallel databases
HPC Grids And Web Services • Compute grid • Forest of clusters • Coordinated scheduling of resources • Data grid • Distributed storage facilities • Coordinated management of data • Web services • Glue for heterogeneous platforms/applications/systems • Cross- and intra-organization integration • Standards-based distributed computing • Interoperability and composability
Part 2 Drill-Down
Technologies • Platform • Windows Server 2003 SP1 64-bit Edition • x64 processors (Intel EM64T and AMD Opteron) • Ethernet, Ethernet over RDMA, and InfiniBand support • Administration • Prescriptive, simplified cluster setup and administration • Scripted, image-based compute node management • Active Directory-based security • Scalable job scheduling and resource management • Development • MPICH-2 from Argonne National Labs with performance and security enhancements • Cluster scheduler programmable via Web Services and DCOM • Visual Studio 2005: OpenMP, parallel debugger • Partner-delivered Fortran compilers and numerical libraries
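The development pieces listed above are commonly combined: MPI between nodes and OpenMP within a node, the same structure used by codes such as WRF. A hedged hybrid sketch follows; the work decomposition is illustrative, not tuned.

```c
/* Sketch: hybrid parallelism -- MPI between processes, OpenMP within each.
   The summed series and slice sizes are invented for the example. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    double local_sum = 0.0, global_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each MPI rank handles a slice of 0..999999; OpenMP splits the slice
       across the cores of that rank's node. */
    const int total = 1000000;
    const int chunk = total / size;
    const int begin = rank * chunk;
    const int end   = (rank == size - 1) ? total : begin + chunk;

    #pragma omp parallel for reduction(+:local_sum)
    for (i = begin; i < end; ++i)
        local_sum += 1.0 / (1.0 + (double)i);

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f (%d ranks)\n", global_sum, size);

    MPI_Finalize();
    return 0;
}
```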
Head Node Installation • Head node installs only on x64 • Windows Server 2003 Compute Cluster Edition • Windows Server 2003 SP1 Standard and Enterprise • Windows Server 2003 R2 • Installation • Leverages appliance-like functionality • Scripted installation • Warnings if the system is misconfigured • To Do list to assist with final configuration • Walkthrough • Windows Server 2003 is installed on the head node • System may have been pre-installed using OPK • User launches Compute Cluster Kit setup • To Do list starts up, guiding the user through the next steps • User joins an Active Directory domain • User installs IP-over-IB drivers for InfiniBand cards if not pre-installed • Wizard assists with multi-NIC routing and configuration • Remote Installation Service is configured for imaging compute nodes
Compute Node Installation • Automated installation • Remote Installation Service provides a simple imaging solution • May use third-party system imaging tools for compute nodes • Requires a private network • Walkthrough • User racks up compute nodes • Starts the Add Node wizard • Powers up a group of compute nodes • Compute nodes PXE boot • RIS and installation scripts will • Install the operating system: W2K3 SP1 • Install drivers • Join the appropriate domain • Install compute cluster software (CD2) • Join the cluster • Exiting the wizard turns off RIS
Node Management • Not building a new systems management paradigm • Leveraging Windows infrastructure for simple management • MMC, Perfmon, Event Viewer, Remote Desktop • Can integrate with enterprise management infrastructure, such as Microsoft Operations Manager • Compute Cluster MMC snap-in • Supports specific actions • Pause Node • Resume Node • Open CD Drive • Reboot Node • Execute Command • Remote Desktop Connection • Start PerfMon • Delete • Properties • Can operate on multiple nodes at once
Job/Task Conceptual Model (diagram): a job contains one or more tasks; a serial job has a single task running one process; a parallel MPI job has tasks whose processes communicate via IPC; a parameter sweep job runs many independent tasks; a task flow job runs tasks with dependencies between them.
Job Scheduler Stack (diagram): jobs/tasks are submitted from the client node, pass through admission and allocation on the head node, and are activated on the compute nodes.
Job Scheduler • Job scheduler provides two features: Ordering and allocation • Job ordering • Priority-based first-come, first-serve (FCFS) • Backfill supported for jobs with time limits • Resource allocation • License-aware scheduling through plug-ins • Parallel application node allocation policies • Extensible • Core engine based on embedded SQL engine • Resource and job descriptions are based on XML • 3rd parties can extend by plugging into submission and execution phases to implement queuing and licensing policies • Job submission • Jobs submitted via UI, API, command line, or web service • Security • Jobs on compute nodes execute in the security account of the submitting user, allowing secure access to networked resources • Cleanup • Jobs executed in Job Objects on compute nodes, facilitating cleanup
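As a simplified, hypothetical illustration of priority-based FCFS ordering (not the scheduler's actual code), the sketch below sorts jobs by priority and then by submission time; real backfill additionally lets short, time-limited jobs run in scheduling gaps without delaying the job at the head of the queue.

```c
/* Simplified sketch of priority-based FCFS job ordering.
   Invented struct and data; not the CCS scheduler's implementation. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int  id;
    int  priority;     /* larger value = more urgent (assumption) */
    long submit_time;  /* seconds since some epoch                */
} Job;

/* Order by priority (descending), then by submission time (ascending). */
static int compare_jobs(const void *a, const void *b)
{
    const Job *ja = (const Job *)a, *jb = (const Job *)b;
    if (ja->priority != jb->priority)
        return jb->priority - ja->priority;
    return (ja->submit_time > jb->submit_time) - (ja->submit_time < jb->submit_time);
}

int main(void)
{
    Job queue[] = {
        { 1, 2, 100 }, { 2, 5, 140 }, { 3, 2, 90 }, { 4, 5, 120 },
    };
    size_t n = sizeof queue / sizeof queue[0];

    qsort(queue, n, sizeof queue[0], compare_jobs);

    for (size_t i = 0; i < n; ++i)
        printf("run job %d (priority %d, submitted at %ld)\n",
               queue[i].id, queue[i].priority, queue[i].submit_time);
    return 0;
}
```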
Queue Management • Job Management model similar to print queue management • Leverage familiar user paradigm • Queue management operations • Delete • Change properties • Priority • Run time • # of CPUs • Preferred nodes • CPUs per node • All in one • License parameters • Uniform attributes • Notification
Networking • Focusing on industry-standard interconnect technologies • MPI implementation tuned to Winsock • Automatic RDMA support through Winsock Direct (SAN provider required from IHV) • Gigabit Ethernet • Expected to be the mainstream choice • RDMA + GigE offers compelling latency • InfiniBand • Emerging as a leading high-end solution • Engaged with all IB vendors • OpenIB group developing a Windows IB stack • Planning to support IB in WHQL
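The practical point of Winsock Direct is that applications keep using the ordinary Winsock API; when an IHV's SAN provider is installed, eligible traffic is switched to RDMA transparently. A minimal Winsock client sketch follows; the host name and port are placeholders.

```c
/* Minimal Winsock TCP client sketch. An application written against this
   ordinary API can be accelerated transparently by Winsock Direct when a
   SAN provider is installed; nothing RDMA-specific appears in the code. */
#include <winsock2.h>
#include <ws2tcpip.h>
#include <stdio.h>

#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    struct addrinfo hints = {0}, *res = NULL;
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
    /* Placeholder host and port for the example. */
    if (getaddrinfo("headnode.example.com", "5000", &hints, &res) != 0) {
        WSACleanup();
        return 1;
    }

    SOCKET s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (s != INVALID_SOCKET &&
        connect(s, res->ai_addr, (int)res->ai_addrlen) == 0) {
        const char msg[] = "hello";
        send(s, msg, (int)sizeof msg, 0);
    }

    if (s != INVALID_SOCKET)
        closesocket(s);
    freeaddrinfo(res);
    WSACleanup();
    return 0;
}
```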
Resources • Microsoft HPC web site(evaluation copies available) • http://www.microsoft.com/hpc/ • Microsoft Windows Compute Cluster Server 2003 community site • http://www.windowshpc.net/ • Windows Server x64 information • http://www.microsoft.com/64bit/ • http://www.microsoft.com/x64/ • Windows Server System information • http://www.microsoft.com/wss/
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.