MS MPI & SDK Improvements
Agenda • General Windows HPC Server 2008 Development Topics • Scheduler API • v2 MS MPI Features • MPI Debugging • Porting Gotchas • Tracing
A Developer’s View of Cluster Connectivity • Developer computer on a cluster node (non-production) • Developer computer on a corporate network [Diagrams: each layout shows a developer computer, the SDK, a head node, and compute nodes]
Job Scheduler Stack [Diagram: layers Jobs/Tasks, Admission, Allocation, and Activation, spanning the client node, head node, and compute node]
Client-side perspective • From the end user’s perspective, a Windows compute cluster is another network resource • In this case, a compute engine…
Problem areas • But the compute engine is not so easy to interact with… • Main problem areas: • WCCS (Windows Compute Cluster Server) is a batch-oriented system, with minimal user interaction • Application deployment is a problem across many compute nodes • Data presents a problem: distribution, aggregation, & visualization
Solution • Create a custom, client-side WCCS front-end for your application • Client-side app can: • Install HPC app & data • Simplify job submission • Provide execution feedback • Collect & display results • Clean up HPC app
Overview of using the API with C#
    using Microsoft.Hpc.Scheduler;

    class Program
    {
        static void Main()
        {
            // Connect to the cluster
            IScheduler store = new Scheduler();
            store.Connect("localhost");

            // Create a job
            ISchedulerJob job = store.CreateJob();
            job.AutoCalculateMax = true;
            job.AutoCalculateMin = true;

            // Create a task (parametric sweep over 1..10)
            ISchedulerTask task = job.CreateTask();
            task.CommandLine = "ping 127.0.0.1 -n *";
            task.IsParametric = true;
            task.StartValue = 1;
            task.EndValue = 10;
            task.IncrementValue = 1;
            task.MinimumNumberOfCores = 1;
            task.MaximumNumberOfCores = 1;

            // Add task to job
            job.AddTask(task);

            // Submit job for execution
            store.SubmitJob(job, @"hpc\Administrator", "p@ssw0rd");
        }
    }
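The parametric task above runs once per sweep index, with the `*` in the command line standing in for that index. The C sketch below only illustrates the substitution for the slide's values (start 1, end 10, increment 1). `expand_sweep` and `print_sweep` are hypothetical helpers, not part of the Scheduler API; the scheduler performs the real expansion itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `tmpl` into `out`, replacing every '*' with the
   decimal sweep index. Returns 0 on success, -1 if `out` is too small. */
int expand_sweep(const char *tmpl, int index, char *out, size_t out_size)
{
    size_t used = 0;
    assert(tmpl != NULL && out != NULL && out_size > 0);

    /* The output is never shorter than the template, so fail early. */
    if (strlen(tmpl) >= out_size)
        return -1;

    for (const char *p = tmpl; *p != '\0'; ++p) {
        if (*p == '*') {
            int n = snprintf(out + used, out_size - used, "%d", index);
            if (n < 0 || (size_t)n >= out_size - used)
                return -1;          /* index digits did not fit */
            used += (size_t)n;
        } else {
            if (used + 1 >= out_size)
                return -1;          /* no room for char plus terminator */
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return 0;
}

/* Print the command line of each sweep instance, one per line. */
void print_sweep(const char *tmpl, int start, int end, int increment)
{
    char cmd[256];
    assert(increment > 0);
    for (int i = start; i <= end; i += increment) {
        if (expand_sweep(tmpl, i, cmd, sizeof cmd) == 0)
            printf("%s\n", cmd);
    }
}
```

Calling `print_sweep("ping 127.0.0.1 -n *", 1, 10, 1)` lists the ten command lines the sweep would produce, from `-n 1` through `-n 10`.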
Scheduler API demo
V2 MS MPI Features • MPI.NET • New Managed MPI Library API • In Collaboration with Indiana University • MPI Tracing via ETW • Event Tracing for Windows • Enables MPI Message Monitoring/Profiling • Performance Improvements • Shared Memory Mode Enhancements • RDMA via Network Direct • New Low-Latency Network Interface • 4 Networking Paths Now Supported
About MPI • MPI is a standard specification with many implementations • MPICH and MPICH2 reference implementations from Argonne • MS MPI is based on (and compatible with) MPICH2 • Other implementations include LAM-MPI, OpenMPI, MPI-Pro, WMPI • Why did the MS HPC team choose MPI? • MPI has emerged as the de facto standard for parallel programming • MPI consists of 3 parts • Full-featured API of 160+ functions • Secure process launch and communication runtime • Command-line tool (mpiexec) to launch jobs
Fundamental MPI Features
Programming with MPI • Communicators • Groups of processes used for communications • MPI_COMM_WORLD is your friend • Rank (a process’s ID within a communicator) • Target communications • Segregate work • Collective Operations • Collect and reduce data in a single call • sum, min, max, and/or, etc. • Fine control of comms and buffers if you like • MPI and derived data types
Launching Jobs • mpiexec arguments • # of processors required • Names of specific compute nodes to use • Launch and working directories • Environment variables to set for this job • Global values (for all compute nodes, not just the launch node) • Point to files of command line arguments • env MPICH_NETMASK to control network used for this MPI job
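Rank is what lets each process pick its own share of the work (the "Segregate work" bullet). A minimal sketch in plain C, with no MPI calls so that it compiles anywhere: `rank_range` and `struct work_range` are hypothetical names, and in a real MPI program `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size` on `MPI_COMM_WORLD`.

```c
#include <assert.h>

/* Half-open range [begin, end) of work items owned by one rank. */
struct work_range {
    long begin;
    long end;
};

/* Split n_items as evenly as possible across `size` ranks.
   The first (n_items % size) ranks each get one extra item. */
struct work_range rank_range(long n_items, int rank, int size)
{
    struct work_range r;
    long base, extra;

    assert(n_items >= 0 && size > 0 && rank >= 0 && rank < size);

    base = n_items / size;
    extra = n_items % size;

    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

Each rank then loops only over its own `[begin, end)` range, and a collective operation combines the per-rank results.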
Example: Calculate Pi • PI: 3.1428947295916889 • Error: 0.0013020760018958 [Figure: plot with both axes running to 1, x-axis divided into n intervals]
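The slide does not show the integrand or the value of n. The sketch below assumes the classic approach used in many MPI tutorials: approximate pi as the integral of 4/(1+x^2) over [0, 1] with the midpoint rule on n equal intervals. `midpoint_pi` is a hypothetical name.

```c
#include <assert.h>

/* Midpoint-rule estimate of the integral of 4/(1+x^2) on [0, 1],
   which equals pi. Assumed integrand; the slide does not state it. */
double midpoint_pi(int n)
{
    double h, sum = 0.0;

    assert(n > 0);
    h = 1.0 / n;                       /* width of one interval */

    for (int i = 0; i < n; ++i) {
        double x = h * (i + 0.5);      /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}
```

With a small n the estimate is visibly off, which is the point of the "Error" figure on the slide; the error shrinks as n grows.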
Example: Calculate Pi in Parallel • PI: 3.1419630237914182 • Error: 0.0003703702016251 [Figure: plot with both axes running to 1, x-axis divided into n intervals]
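One common way to parallelize the calculation is to deal the intervals out to the ranks in round-robin order and add up the partial sums. The sketch below simulates that decomposition in plain C under the same assumed integrand, 4/(1+x^2). `partial_pi` and `simulated_parallel_pi` are hypothetical names; in an MPI program each rank would call `partial_pi` once with its own rank, and a reduction such as `MPI_Reduce` with `MPI_SUM` would replace the outer loop.

```c
#include <assert.h>

/* Partial sum for one rank: intervals rank, rank+size, rank+2*size, ... */
double partial_pi(int n, int rank, int size)
{
    double h, sum = 0.0;

    assert(n > 0 && size > 0 && rank >= 0 && rank < size);
    h = 1.0 / n;

    for (int i = rank; i < n; i += size) {
        double x = h * (i + 0.5);      /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Stand-in for the reduction step: add up every rank's partial result. */
double simulated_parallel_pi(int n, int size)
{
    double total = 0.0;

    for (int rank = 0; rank < size; ++rank)
        total += partial_pi(n, rank, size);
    return total;
}
```

Because every interval is assigned to exactly one rank, the combined result matches the sequential midpoint rule up to floating-point rounding.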
Traditional MPI “Hello World”
    /* C Example */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* starts MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }
MPI.NET “Hello World”
http://www.osl.iu.edu/research/mpi.net
    using System;
    using MPI;

    class MPIHello
    {
        static void Main(string[] args)
        {
            using (new MPI.Environment(ref args))
            {
                // MPI program goes here!
                Console.WriteLine("Hello World");
            }
        }
    }
Ring around the network…
    using System;
    using MPI;

    class Ring
    {
        static void Main(string[] args)
        {
            using (new MPI.Environment(ref args))
            {
                Communicator comm = Communicator.world;
                if (comm.Rank == 0)
                {
                    // program for rank 0
                }
                else // not rank 0
                {
                    // program for all other ranks
                }
            }
        }
    }
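The skeleton above leaves the ring logic open. Assuming the usual ring pattern, where rank 0 starts a message and every rank passes it to its successor until it returns to rank 0, the only arithmetic needed is the neighbor calculation. A plain C sketch with hypothetical helper names:

```c
#include <assert.h>

/* Rank that receives from `rank` in a ring of `size` processes. */
int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Rank that sends to `rank` in a ring of `size` processes.
   Adding `size` before the modulo keeps the result non-negative. */
int ring_prev(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + size - 1) % size;
}
```

Under that assumption, rank 0 would send to `ring_next(0, size)` and then receive from `ring_prev(0, size)`, while every other rank would receive first and then send.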
MPI & MPI.Net demo
Debugging • Parallel apps are hard to debug: • Non-interactive • Long-running • Unrepeatable behavior • Concurrent! • Best practices: • Develop sequential version first (with numerous test cases) • Develop parallel version guided by simplicity & confidence in correctness • Profile and optimize as necessary • If possible, meet performance goals by adding hardware (vs. sacrificing productivity & correctness with more complex solutions).
Debugging options in MPI • Debugging is a fact of life • Options: • Print-style debugging • Source-level debugger (e.g. Visual Studio) • Tracing • Recommendations: • Debug locally first (by running mpiexec on your dev machine) • Then deploy and debug on cluster • Source-level debugger works locally for small, shorter-duration cases • Otherwise turn to print-style / tracing…
Print-style debugging • Via _tprintf / cout statements • Leave in place by taking advantage of conditional compilation (#ifdef) • NOTE: myRank and host are globals initialized at startup (MPI_Init / gethostname) • _DEBUG is predefined by Visual Studio when building the Debug version
    #ifdef _DEBUG
    cout << myRank << " (" << host << "): Broadcasting" << endl << flush;
    #endif

    /* all processes */
    MPI_Bcast(params, 2, MPI_INT, 0 /*master*/, MPI_COMM_WORLD);
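If the same rank-and-host prefix is printed in many places, it can be wrapped in a macro that compiles to nothing in Release builds. The following is a C sketch of that idea, not code from the presentation: `format_debug_line` and `DEBUG_LOG` are hypothetical names, and the output goes to stderr rather than cout.

```c
#include <assert.h>
#include <stdio.h>

/* Format "<rank> (<host>): <message>" into buf.
   Returns the number of characters written, or -1 if it did not fit. */
int format_debug_line(char *buf, size_t size, int rank,
                      const char *host, const char *msg)
{
    int n;

    assert(buf != NULL && size > 0 && host != NULL && msg != NULL);
    n = snprintf(buf, size, "%d (%s): %s", rank, host, msg);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Expands to a flushed stderr line in Debug builds, to nothing otherwise. */
#ifdef _DEBUG
#define DEBUG_LOG(rank, host, msg)                                        \
    do {                                                                  \
        char line_[256];                                                  \
        if (format_debug_line(line_, sizeof line_, (rank), (host),        \
                              (msg)) >= 0) {                              \
            fprintf(stderr, "%s\n", line_);                               \
            fflush(stderr);                                               \
        }                                                                 \
    } while (0)
#else
#define DEBUG_LOG(rank, host, msg) ((void)0)
#endif
```

A call such as `DEBUG_LOG(myRank, host, "Broadcasting");` can then stay in the source permanently. Flushing after each line matters because output from many processes is interleaved and a crash can otherwise lose buffered text.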
Visual Studio’s source-level debugger • Visual Studio provides support for MPI-based debugging • You can debug multiple processes running locally on your dev machine • You can debug multiple processes running remotely on cluster • Overview: • Install remote debugging support (installed by default with Visual Studio) • Configure Visual Studio • Set breakpoints • F5 — run with debugging!
Configuring Visual Studio • Configure project properties for “MPI Cluster Debugger” • Configure VS to stop all processes when any breakpoint is reached
Set breakpoints and run! • Press F5 to start • When a breakpoint is hit, all processes stop • Use toolbar buttons to step • Press F5 to continue: all processes run until another breakpoint is hit (so press F5 N times for all N processes to hit the breakpoint)
Additional tools • Allinea is developing a debugger plug-in for Visual Studio: • DDTLite: http://www.allinea.com • The Portland Group offers a debugger for WCCS • PGDBG: http://www.pgroup.com/resources/mpitools.htm • Supports OpenMP debugging; MPI debugging is under development
MPI Debugging demo
Tracing • The MPI standard was developed with tracing in mind • For every routine of the form MPI_xxx: • There must exist an equivalent routine with the name PMPI_xxx • The user must be able to replace MPI_xxx with their own version • Example: • Trace every call to MPI_Send by writing your own…
    int MPIAPI MPI_Send(void *buf, int count, MPI_Datatype datatype,
                        int dest, int tag, MPI_Comm comm)
    {
        ... ; // pre-process
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        ... ; // post-process
        return rc;
    }
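To show what the pre-process and post-process steps might contain, here is a plain C analogue of the wrapper pattern that counts calls and elements sent. It deliberately avoids mpi.h so it compiles on its own: `demo_send` stands in for the user-supplied `MPI_Send`, `demo_psend` stands in for `PMPI_Send`, and both are hypothetical names with simplified signatures, as is `struct send_stats`.

```c
#include <assert.h>
#include <stddef.h>

/* Counters filled in by the wrapper. */
struct send_stats {
    long calls;      /* number of send attempts */
    long elements;   /* elements sent by successful calls */
    long failures;   /* calls that returned an error */
};

static struct send_stats g_send_stats;

/* Stand-in for PMPI_Send (hypothetical): does no real communication,
   returns 0 on success and 1 for an invalid count. */
static int demo_psend(const void *buf, int count, int dest, int tag)
{
    (void)buf;
    (void)dest;
    (void)tag;
    return count >= 0 ? 0 : 1;
}

/* Stand-in for the user's replacement MPI_Send (hypothetical). */
int demo_send(const void *buf, int count, int dest, int tag)
{
    int rc;

    assert(dest >= 0);

    /* pre-process: record that a send was attempted */
    g_send_stats.calls += 1;

    rc = demo_psend(buf, count, dest, tag);

    /* post-process: record the outcome */
    if (rc == 0)
        g_send_stats.elements += count;
    else
        g_send_stats.failures += 1;

    return rc;
}

/* Read-only access to the counters, e.g. for a report at shutdown. */
const struct send_stats *get_send_stats(void)
{
    return &g_send_stats;
}
```

In a real MPI build the same two steps would go where the slide shows the `...` placeholders, and the wrapper would forward to `PMPI_Send` with the full argument list.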
Tracing Necessities • A couple of essentials to keep in mind as you plan tracing activities on your HPC Server 2008 cluster: • Run trace jobs with a user account in either the Cluster Administrator or Performance Log Users groups. • This is a security measure of the Event Tracing for Windows subsystem. • Run all trace jobs as “Exclusive” in the HPC Server 2008 Scheduler. This ensures a single user will be running on the compute nodes for the duration of each trace job thereby avoiding confusion/conflict of the trace data.
Devs can't tune what they can't see: MS-MPI integrated with Event Tracing for Windows • Single, time-correlated log of: OS, driver, MPI, and app events on each compute node • CCS-specific additions • High-precision, cross-node clock correlation data produced during the job’s execution • Potential Feature: Logs collected from multiple compute nodes into a record of parallel app execution • Dual purpose: • Performance Analysis • Application Trouble-Shooting • Trace Data Display • Visual Studio & Windows ETW tools • Vampir & Jumpshot Viewers for Windows [Diagram labels: Trace Control & Clock Sync; MS-MPI; mpiexec.exe -trace args; logman.exe; Windows ETW Infrastructure; Trace Log Files; Convert to text; Live feed; Consolidate trace files at end of job?]
Jumpshot MPI Trace Viewer • Getting Jumpshot: • Argonne National Lab • Download slog2rte (“Slog-2 Runtime Environment”) • Replace clog2TOslog2.jar with v2.40 from MPE2-1.0.3p1*** • http://www-unix.mcs.anl.gov/perfvis/download/index.htm [Diagram: .clog2 converted to .slog2]
Intel Trace Analyzer MPI Trace Viewer • ETW based • OTF standard format • aka “Vampir” • Intel® Cluster Toolkit • Being ported to Windows • Q3 2008
Tracing Your Applications • Add the “-trace” argument to the command which launches your MS-MPI application: • mpiexec -trace MyApp.exe arg1 arg2 … argN • Results are written on each node on which the job is running (in the profile folder of the user who submitted the job) • Naming Convention: • mpi_trace_{JobID}.{TaskID}.{TaskInstanceID}.etl
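Scripts that collect trace files need to build or recognize these names. A small C sketch of the naming convention above: `trace_file_name` and `looks_like_trace_file` are hypothetical helpers, and the sketch assumes the three IDs are plain decimal integers, as in the `mpi_trace_42.1.0.etl` example used later in the deck.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "mpi_trace_{JobID}.{TaskID}.{TaskInstanceID}.etl" into buf.
   Returns 0 on success, -1 if the name did not fit. */
int trace_file_name(char *buf, size_t size,
                    int job_id, int task_id, int task_instance_id)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "mpi_trace_%d.%d.%d.etl",
                 job_id, task_id, task_instance_id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Returns 1 if `name` starts with "mpi_trace_" and ends with ".etl". */
int looks_like_trace_file(const char *name)
{
    static const char prefix[] = "mpi_trace_";
    static const char suffix[] = ".etl";
    size_t plen = sizeof prefix - 1;
    size_t slen = sizeof suffix - 1;
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    return len >= plen + slen
        && strncmp(name, prefix, plen) == 0
        && strcmp(name + len - slen, suffix) == 0;
}
```

The suffix check also filters out derived files such as the `.etl.info` and `.etl.txt` outputs produced by later steps.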
Create the CPU clock synchronization data • Clock correlation data needs to be created for *each* processor core used in the job • Use mpicsync to create the clock correlation data: • mpicsync mpi_trace_42.1.0.etl • Results in clock synchronization data for each core: • mpi_trace_42.1.0.etl.info • Somewhat tedious… is there a faster way of accomplishing this?
Tracing… • The following command will generate the synchronization data on all nodes: • mpiexec -cores 1 -wdir %%USERPROFILE%% mpicsync mpi_trace_%CCP_JOBID%.%CCP_TASKID%.%CCP_TASKINSTANCEID%.etl • Use the HPC Server 2008 version of the ETW formatting tool, tracefmt.exe, to both format the event log as text and apply the clock corrections (using mpiexec): • tracefmt mpi_trace_42.1.0.etl -nosummary -hires -tmf "%CCP_HOME%\bin\mpitrace.tmf" -o mpi_trace_42.1.0.etl.txt
Tracing… • Use the HPC Server 2008 OTF translator, etl2otf.exe, to both format the event log as an Open Trace Format file and apply the clock corrections: • etl2otf mpi_trace_42.1.0.etl • Use the HPC Server 2008 CLOG translator, etl2clog.exe, to both format the event log as a CLOG Event file for use with Argonne National Lab’s Jumpshot viewer and apply the clock corrections: • etl2clog mpi_trace_42.1.0.etl
Release Schedule • Beta 1 – Nov 2007 • Beta 2 – May 2008 • RC1 – June 2008 • RTM – Q3, 2008 [Timeline: Beta 1, Nov 2007; Beta 2, Spring 2008; RTM, Late Summer 2008]
Resources • Microsoft HPC Web site – download Beta 1 Today! • http://www.microsoft.com/hpc • http://blogs.msdn.com/philpenn • http://blogs.technet.com/gmarchetti • http://connect.microsoft.com • http://code.msdn.microsoft.com/hpc • Windows HPC Community site • http://www.windowshpc.net • Windows Server x64 information • http://www.microsoft.com/x64/ • Windows Server System information • http://www.microsoft.com/windowsserver • Get the Facts Web site • http://www.microsoft.com/getthefacts
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.