Clustered Systems for Massive Parallelism

Clustered Systems for Massive Parallelism. N. Xiong Georgia State University. Review and Introduction. Design Objectives of Clusters and MPPs Cluster and MPP System Architectures Design Principles of Clustered Systems Multiple Job Scheduling and Management


Presentation Transcript


  1. Clustered Systems for Massive Parallelism • N. Xiong, Georgia State University

  2. Review and Introduction

  3. Chapter 04 Main Contents • Design Objectives of Clusters and MPPs • Cluster and MPP System Architectures • Design Principles of Clustered Systems • Multiple Job Scheduling and Management • Virtual Clustering and Resource Provisioning • Homework Problems

  4. Design Objectives of Clustered Systems • Scalability • Packaging • Control • Homogeneity • Security

  5. Design Objectives of Clustered Systems

  6. Fundamental Cluster Design Issues • Scalable Performance • Single System Image • Availability Support • Cluster Job Management • Internode Communication • Fault Tolerance and Recovery • Growth of Servers in HPC and HTC Systems

  7. Resource-Sharing in Cluster Systems

  8. An Idealized Cluster Architecture • Conventional databases and OLTP monitors offer users a desktop environment • Supports parallel programming based on standard languages and communication libraries • A user-interface subsystem combines the advantages of the Web interface and the windows GUI

  9. Node Architectures and System Packaging • Two types of cluster nodes • compute nodes • service nodes

  10. Compute Node Examples

  11. Modular Packaging of IBM BlueGene/L System

  12. Cluster System Interconnects

  13. High-Bandwidth Interconnects

  14. An InfiniBand Cluster Interconnection Network

  15. High-bandwidth Interconnects in Top-500 Systems

  16. Hardware, Software, and Middleware Support

  17. Design Principles of Clusters • Single-System-Image (SSI) Features • Single System • Single Control • Symmetry • Location Transparency

  18. Design Principles of Clusters • Single-System-Image Layers • Application Software Layer • Hardware or Kernel Layer • Middleware Layer

  19. Design Principles of Clusters • Single-System-Image Composition • Single Entry Point • Single File Hierarchy • Single I/O, Networking, and Memory Space • Other Desired SSI Features

  20. Single Entry Point

  21. Single File Hierarchy • It is persistent. • It is fault tolerant to some degree. • Network File System (NFS) and Andrew File System (AFS).

  22. Single File Hierarchy

  23. Single I/O, Networking, and Memory Space • Single Input/Output • Single Networking • Single Point of Control • Single Memory Space

  24. Single I/O, Networking, and Memory Space

  25. An Example

  26. Other Desired SSI Features • Single Job Management System • Single User Interface • Single Process Space

  27. Middleware Support for SSI Clustering

  28. High Availability Through Redundancy • Reliability • Availability • Serviceability

  29. Availability and Failure Rate
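The transcript keeps only the title of this slide. As a reminder of the standard relation between the two quantities, steady-state availability is commonly defined as MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. The sketch below assumes that definition; the function names and the sample figures are illustrative and do not come from the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    Assumes the common definition MTTF / (MTTF + MTTR).
    """
    if mttf_hours < 0 or mttr_hours < 0 or (mttf_hours + mttr_hours) == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float, hours_per_year: float = 365 * 24) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - avail) * hours_per_year


if __name__ == "__main__":
    # Hypothetical node: fails on average every 999 hours, takes 1 hour to repair.
    a = availability(999.0, 1.0)
    print(f"availability = {a:.3f}")
    print(f"downtime     = {downtime_hours_per_year(a):.2f} hours/year")
```

Under this definition, shortening the repair time raises availability just as effectively as lengthening the time between failures, which is why redundancy and fast failover matter in cluster design.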

  30. Availability Values of Several Representative Systems

  31. Redundancy Techniques

  32. Fault-Tolerant Cluster Configurations • Hot Standby • Mutual Takeover • Fault-Tolerance

  33. Recovery Schemes • Backward recovery • Forward recovery: in real-time systems

  34. Checkpointing and Recovery Techniques • Kernel, Library, and Application Levels • Checkpoint Overheads • Choosing an Optimal Checkpoint Interval
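The slide names the problem of choosing an optimal checkpoint interval but the transcript does not preserve the formula used in the lecture. One widely cited first-order estimate is Young's approximation, interval ≈ sqrt(2 × checkpoint cost × MTTF). The sketch below uses that approximation as a stand-in; it may differ from the expression on the original slide, and the function name and sample numbers are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mttf_s: float) -> float:
    """Young's first-order estimate of the checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds.
    mttf_s: mean time to failure of the system, in seconds.
    Returns the suggested compute time between checkpoints, in seconds.
    """
    if checkpoint_cost_s <= 0 or mttf_s <= 0:
        raise ValueError("checkpoint cost and MTTF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mttf_s)


if __name__ == "__main__":
    # Hypothetical job: a checkpoint takes 50 s, failures arrive every 10,000 s on average.
    print(young_checkpoint_interval(50.0, 10_000.0))
```

The trade-off is the one the slide points at: checkpointing too often wastes time on overhead, while checkpointing too rarely loses more work on each rollback.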

  35. Checkpointing Parallel Programs

  36. Cluster Job Scheduling and Management • Cluster Job Management Issues • A user server • A job scheduler • A resource manager

  37. Cluster Job Types • Serial jobs • Parallel jobs • Interactive jobs • Batch jobs • Foreign jobs

  38. Multi-Job Scheduling Schemes

  39. Share Cluster Nodes • Dedicated Mode • Space Sharing • Time Sharing

  40. Migration Scheme Issues • Node Availability • Migration Overhead • Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
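The recruitment threshold defined on this slide can be illustrated with a small sketch: a workstation is treated as idle, and therefore eligible to receive migrated jobs, once it has gone unused for at least the threshold. The `Workstation` type, its fields, and the `idle_nodes` helper are hypothetical names introduced for this example, not part of any real cluster manager.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Workstation:
    name: str
    last_activity_s: float  # time of the owner's last activity, in seconds


def idle_nodes(
    workstations: Iterable[Workstation],
    now_s: float,
    recruitment_threshold_s: float,
) -> List[str]:
    """Return names of workstations unused for at least the recruitment threshold."""
    return [
        w.name
        for w in workstations
        if now_s - w.last_activity_s >= recruitment_threshold_s
    ]


if __name__ == "__main__":
    pool = [
        Workstation("ws1", last_activity_s=0.0),    # unused for 600 s
        Workstation("ws2", last_activity_s=550.0),  # unused for 50 s
    ]
    # With a 300 s threshold, only ws1 counts as idle at t = 600 s.
    print(idle_nodes(pool, now_s=600.0, recruitment_threshold_s=300.0))
```

A lower threshold recruits nodes sooner but raises the chance that the owner returns and the job has to migrate again, which ties this parameter to the migration overhead listed on the same slide.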

  41. Virtual Clustering and Resource Provisioning

  42. Five Virtual Cluster Research Projects

  43. Live VM Migration and Cluster Management

  44. Effect by Live Migration

  45. Dynamic Virtual Resource Provisioning

  46. Autonomic Adaptation of Virtual Environments

  47. Some References and Further Reading

  48. Homework Problems

  49. Homework Problems
