
OS, Message Passing, and Runtime Tools Panel

Rajeev Thakur, Mathematics and Computer Science Division, Argonne National Laboratory. Topics: burning issues in operating systems, what is in MPI-2, and which of the runtime tools really work.


Presentation Transcript


  1. OS, Message Passing, and Runtime Tools Panel. Rajeev Thakur, Mathematics and Computer Science Division, Argonne National Laboratory

  2. Outline
     • Burning issues in operating systems
     • What is in MPI-2?
     • Which of the runtime tools really work?

  3. Burning Issues in OS
     • What will be the impact of Linux and open source on large supercomputing platforms?
     • Large impact
     • Everyone is experimenting
     • Mainstream vendors are exploring use of Linux in ~10 TF systems
     • Large systems in labs, e.g., Cplant and Chiba City
     • Many medium-size systems, which are growing, e.g., Putchong’s cluster in Thailand is up to 72 nodes
     • Raised expectations for faster turnaround on problems

  4. Burning Issues in OS
     • Is Linux only for small and medium scale clusters?
     • That is the question we are all trying to answer
     • For distributed-memory systems, the network must scale. Linux does not get in the way.
     • For large clusters we need cluster management tools, scalable scientific software, and middleware
     • For shared-memory systems, on the other hand, the OS must scale

  5. Burning Issues in OS
     • Will we need a version of Linux specially designed for large supercomputers?
     • To be determined
     • Module system a step in the right direction

  6. What is in MPI-2?
     • Extensions to the message-passing model
     • Dynamic process management
     • One-sided operations
     • Parallel I/O
     • Making MPI more robust and convenient
     • C++ and Fortran 90 bindings
     • External interfaces
     • Extended collective operations
     • Language interoperability
     • MPI interaction with threads

  7. What is in MPI-2?
     • Will the one-sided operations be widely used?
     • One-sided ops. are a popular programming model on Cray T3D and T3E
     • Note that simple put/get/barrier is enabled by hardware support on T3D/T3E
     • MPI-2 one-sided ops. expose complex memory consistency issues (like MPI-1 exposed buffering issues)
     • MPI-2 one-sided ops. are designed such that they can be implemented efficiently even where there is no hardware support for one-sided ops.
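
The memory consistency point on slide 7 can be made concrete with a toy model. The sketch below is not MPI and is not taken from the talk: `Window`, `put`, `local_read`, and `fence` are hypothetical names, and everything runs in a single Python process. It assumes a fence-style scheme in which a put is only guaranteed to have landed in the target's memory once the closing synchronization call returns. The model defers every put until the fence, which is one behavior such a scheme permits. A real implementation may deliver data earlier, and in MPI-2 reading the target location between the two fences would not give a defined result.

```python
# Toy, single-process model of fence-synchronized one-sided writes.
# NOT MPI: Window, put, local_read, and fence are hypothetical names
# chosen for illustration only.

class Window:
    """Memory that one simulated process exposes to the others."""

    def __init__(self, size):
        self.memory = [0] * size   # what the owning process sees locally
        self._pending = []         # puts issued during the current epoch

    def put(self, offset, value):
        # Record the remote write; this model applies it only at the fence.
        self._pending.append((offset, value))

    def local_read(self, offset):
        # The owning process reads its own memory as it currently stands.
        return self.memory[offset]

    def fence(self):
        # Close the epoch: every put issued since the last fence lands now.
        for offset, value in self._pending:
            self.memory[offset] = value
        self._pending.clear()


def demo():
    win = Window(4)
    win.fence()                    # open an access epoch
    win.put(2, 42)                 # origin writes into the target's window
    before = win.local_read(2)     # the put has not landed yet in this model
    win.fence()                    # close the epoch; the put completes
    after = win.local_read(2)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this model the owner only sees the new value after the second `fence()`. That gap between issuing a put and being allowed to rely on it is the kind of rule a programmer must keep in mind, much as MPI-1 users had to reason about when a send buffer could be reused.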

  8. What is in MPI-2?
     • one-sided ops…
     • expected to support libraries (e.g., Global Arrays)
     • Example application: FLASH (Univ. of Chicago)
     • Original code uses PARAMESH library, which uses Cray shmem (put/get/barrier)
     • Straightforward port to MPI-1 by using MMPI library, which translates Cray shmem to MPI-1
     • Now have PARAMESH directly using MPI-1
     • Want to investigate implementing PARAMESH with MPI-2 one-sided ops.

  9. What is in MPI-2?
     • MPI-2 one-sided ops. implementation status
     • Fujitsu, NEC, Hitachi have it
     • HP has released a subset
     • Compaq, IBM in progress
     • Coming in MPICH...

  10. What is in MPI-2?
      • Do new technologies such as VIA, ST, and InfiniBand have any impact on the one-sided specification?
      • They have an impact on the implementation of both MPI-1 and MPI-2
      • We are redesigning MPICH internals, one reason for which is to use these new technologies effectively
      • Not clear, but not likely, that they have an impact on the one-sided specification

  11. What is in MPI-2?
      • Will the client/server/dynamic-process operations change the way large production supercomputing machines are used?
      • In the short run, no; large scheduled batch jobs will rule for a while.
      • In the long run, dynamic scheduling, distributed resource discovery, and adaptive algorithms can use MPI-2 dynamic process ops.

  12. Which of the Runtime Tools Really Work?
      • Do we need large-scale interactive debuggers to work on thousands of nodes?
      • Yes, at least on 100 nodes
      • Some bugs, however, reveal themselves only on thousands of nodes
      • How far does TotalView scale?

  13. Which of the Runtime Tools Really Work?
      • Will the open-source software movement lead to improved runtime tools for large supercomputers?
      • Yes, it already has. Examples: MPI implementations, PBS, PVFS, ROMIO, others
      • Vendors may make tools open source

  14. Which of the Runtime Tools Really Work?
      • What new runtime tools can users expect to help improve the usability of large supercomputers?
      • One example: Jumpshot, a performance visualization tool developed at Argonne, is being designed to support multi-gigabyte trace files (collaboration with IBM and Livermore)

  15. OS, Message Passing, and Runtime Tools Panel. Rajeev Thakur, Mathematics and Computer Science Division, Argonne National Laboratory
