Clustered Systems for Massive Parallelism. N. Xiong, Georgia State University. Review and Introduction: • Design Objectives of Clusters and MPPs • Cluster and MPP System Architectures • Design Principles of Clustered Systems • Multiple Job Scheduling and Management
Chapter 04 Main Contents • Design Objectives of Clusters and MPPs • Cluster and MPP System Architectures • Design Principles of Clustered Systems • Multiple Job Scheduling and Management • Virtual Clustering and Resource Provisioning • Homework Problems
Design Objectives of Clustered Systems • Scalability • Packaging • Control • Homogeneity • Security
Fundamental Cluster Design Issues • Scalable Performance • Single System Image • Availability Support • Cluster Job Management • Internode Communication • Fault Tolerance and Recovery • Growth of Servers in HPC and HTC Systems
An Idealized Cluster Architecture • Conventional databases and OLTP monitors offer users a desktop environment in which to use the cluster • Supports parallel programming based on standard languages and communication libraries • A user-interface subsystem combines the advantages of the Web interface and the Windows GUI
Node Architectures and System Packaging • Two types of cluster nodes: compute nodes and service nodes
Design Principles of Clusters • Single-System-Image (SSI) Features • Single System • Single Control • Symmetry • Location Transparent
Design Principles of Clusters • Single-System-Image Layers • Application Software Layer • Hardware or Kernel Layer • Middleware Layer
Design Principles of Clusters • Single-System-Image Composition • Single Entry Point • Single File Hierarchy • Single I/O, Networking, and Memory Space • Other Desired SSI Features
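One common way to provide a single entry point is a front end that exposes one host name and maps each new login to some back-end node. The sketch below assumes plain round-robin assignment; the class name `EntryPoint` and the node names are hypothetical, and a real cluster might instead use DNS round-robin or load-aware selection.

```python
from itertools import cycle


class EntryPoint:
    """Hypothetical front end: one public host name, many back-end nodes."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("cluster needs at least one node")
        self._next = cycle(nodes)  # round-robin iterator over node names
        self.sessions = {}         # user -> node that serves the login

    def login(self, user):
        # Every user connects to the same name; the front end picks the node.
        node = next(self._next)
        self.sessions[user] = node
        return node


ep = EntryPoint(["node1", "node2", "node3"])
print([ep.login(u) for u in ["alice", "bob", "carol", "dave"]])
# ['node1', 'node2', 'node3', 'node1']
```

Users never see which node they landed on, which is the point of the single-entry-point abstraction.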
Single File Hierarchy • It is persistent. • It is fault tolerant to some degree. • Examples: Network File System (NFS) and Andrew File System (AFS).
Single I/O, Networking, and Memory Space • Single Input/Output • Single Networking • Single Point of Control • Single Memory Space
Other Desired SSI Features • Single Job Management System • Single User Interface • Single Process Space
High Availability Through Redundancy • Reliability • Availability • Serviceability
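Availability is commonly quantified as MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. The sketch below computes it and shows how redundancy helps, assuming n identical copies that fail independently (a simplifying assumption; real failures are often correlated). The MTTF and MTTR values are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def yearly_downtime_hours(avail: float) -> float:
    return (1.0 - avail) * 365 * 24


def parallel_availability(avail: float, n: int) -> float:
    # Assumes independent failures: the service is down only if all n copies are down.
    return 1.0 - (1.0 - avail) ** n


# Hypothetical numbers: one failure every 1,000 hours, 2 hours to repair.
a = availability(1000.0, 2.0)
print(f"availability = {a:.4%}, downtime = {yearly_downtime_hours(a):.1f} h/year")
# availability = 99.8004%, downtime = 17.5 h/year
print(f"with 2 redundant copies: {parallel_availability(a, 2):.6%}")
```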
Fault-Tolerant Cluster Configurations • Hot Standby • Mutual Takeover • Fault-Tolerance
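In a hot-standby configuration, the standby server typically detects failure of the primary through missed heartbeats. The following is a minimal sketch of that detection logic only, assuming a fixed heartbeat timeout; the class name `HotStandbyPair` and the timeout value are hypothetical, and mutual takeover or full fault-tolerant setups are not modeled.

```python
import time


class HotStandbyPair:
    """Hypothetical primary/standby pair driven by heartbeats."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.active = "primary"
        self.last_beat = time.monotonic()

    def heartbeat(self, now=None):
        # The primary calls this periodically while it is healthy.
        self.last_beat = time.monotonic() if now is None else now

    def check(self, now=None):
        # The standby takes over if heartbeats stop for longer than timeout_s.
        now = time.monotonic() if now is None else now
        if self.active == "primary" and now - self.last_beat > self.timeout_s:
            self.active = "standby"
        return self.active


# Explicit timestamps make the example deterministic.
pair = HotStandbyPair(timeout_s=3.0)
pair.heartbeat(now=10.0)
print(pair.check(now=12.0))  # primary (2 s since last heartbeat)
print(pair.check(now=14.0))  # standby (4 s > 3 s timeout)
```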
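Recovery Schemes • Backward recovery: roll back to a previously saved checkpoint and resume from there • Forward recovery: used in real-time systems, where rollback is not acceptable and a valid state must be rebuilt from error-diagnosis information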
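Backward recovery can be illustrated with an in-memory checkpoint: save a copy of the state periodically and restore it when a failure is detected. This is a minimal sketch under the assumption that the whole state fits in memory and can be deep-copied; the `Checkpointed` class is hypothetical, and real systems write checkpoints to stable storage.

```python
import copy


class Checkpointed:
    """Backward recovery: restore the last saved state after a failure."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


job = Checkpointed({"step": 0, "total": 0})
for step in range(1, 6):
    job.state["step"] = step
    job.state["total"] += step
    if step == 3:
        job.checkpoint()  # saved state: step 3, total 6
# Suppose a failure is detected after step 5: discard work done since the checkpoint.
job.rollback()
print(job.state)  # {'step': 3, 'total': 6}
```

Steps 4 and 5 are lost and must be recomputed, which is the cost that checkpoint-interval tuning tries to balance.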
Checkpointing and Recovery Techniques • Kernel, Library, and Application Levels • Checkpoint Overheads • Choosing an Optimal Checkpoint Interval
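One widely used rule for the checkpoint interval is Young's first-order approximation, T_opt ≈ sqrt(2 × C × MTTF), where C is the time to write one checkpoint. It is shown here as a substitute illustration, not necessarily the formula these slides use, and it assumes C is much smaller than MTTF and failures arrive at a roughly constant rate. The sketch also evaluates the first-order waste model (checkpoint overhead plus, on average, half an interval of lost work per failure) to show that T_opt minimizes it. The cost and MTTF values are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mttf_s: float) -> float:
    """Young's first-order approximation: T_opt ~ sqrt(2 * C * MTTF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mttf_s)


def expected_waste(interval_s: float, checkpoint_cost_s: float, mttf_s: float) -> float:
    # Fraction of time lost: checkpoint cost per interval plus, on average,
    # half an interval of recomputation per failure (first-order model).
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mttf_s)


# Hypothetical values: 60 s to write a checkpoint, one failure per day.
C, M = 60.0, 24 * 3600.0
t_opt = young_interval(C, M)
print(f"checkpoint every {t_opt:.0f} s (about {t_opt / 60:.0f} min)")
for t in (0.5 * t_opt, t_opt, 2.0 * t_opt):
    print(f"T = {t:7.0f} s  waste = {expected_waste(t, C, M):.4f}")
```

Checkpointing more often raises the overhead term C/T; checkpointing less often raises the expected recomputation term T/(2 × MTTF).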
Cluster Job Scheduling and Management • Cluster Job Management Issues • A user server • A job scheduler • A resource manager
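The three parts of a job management system can be sketched as three small cooperating objects: a user server that queues submissions, a resource manager that tracks free nodes, and a job scheduler that starts queued jobs when nodes are available. This is a minimal illustration assuming strict first-come, first-served scheduling with no backfilling; all class names and job sizes are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of nodes requested


class ResourceManager:
    """Tracks how many nodes are free."""

    def __init__(self, total_nodes: int):
        self.free = total_nodes

    def allocate(self, n: int) -> bool:
        if n <= self.free:
            self.free -= n
            return True
        return False

    def release(self, n: int) -> None:
        self.free += n


class UserServer:
    """Accepts submissions and keeps them in arrival order."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job: Job) -> None:
        self.queue.append(job)


class JobScheduler:
    """Strict FCFS: stop at the first queued job that does not fit."""

    def __init__(self, server: UserServer, rm: ResourceManager):
        self.server, self.rm = server, rm
        self.running = []

    def schedule(self):
        started = []
        while self.server.queue and self.rm.allocate(self.server.queue[0].nodes):
            job = self.server.queue.popleft()
            self.running.append(job)
            started.append(job.name)
        return started

    def finish(self, name: str) -> None:
        job = next(j for j in self.running if j.name == name)
        self.running.remove(job)
        self.rm.release(job.nodes)


server, rm = UserServer(), ResourceManager(total_nodes=8)
sched = JobScheduler(server, rm)
for job in [Job("A", 4), Job("B", 6), Job("C", 2)]:
    server.submit(job)
print(sched.schedule())  # ['A']  (B needs 6 of 4 free nodes; C waits behind B)
sched.finish("A")
print(sched.schedule())  # ['B', 'C']
```

Note that C could have run in the first round; strict FCFS leaves those nodes idle, which is what backfilling schedulers try to avoid.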
Cluster Job Types • Serial jobs • Parallel jobs • Interactive jobs • Batch jobs • Foreign jobs
Sharing Cluster Nodes • Dedicated Mode • Space Sharing • Time Sharing
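The difference between dedicated mode and time sharing shows up in job completion times. The sketch below compares running jobs one after another on a resource with round-robin time slices on the same resource. It assumes zero context-switch overhead and CPU-bound jobs, and the job names and lengths are hypothetical; space sharing (splitting nodes among jobs) is not modeled here.

```python
from collections import deque


def dedicated(jobs: dict) -> dict:
    """Dedicated mode: jobs run one after another with the whole resource."""
    clock, finish = 0, {}
    for name, need in jobs.items():
        clock += need
        finish[name] = clock
    return finish


def time_share(jobs: dict, quantum: int) -> dict:
    """Round-robin time sharing; jobs maps name -> remaining CPU time units."""
    queue = deque(jobs.items())
    clock, finish = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining:
            queue.append((name, remaining))  # back of the line for another slice
        else:
            finish[name] = clock
    return finish


jobs = {"long": 6, "short": 2}
print(dedicated(jobs))       # {'long': 6, 'short': 8}
print(time_share(jobs, 1))   # {'short': 4, 'long': 8}
```

Time sharing lets the short job finish at 4 instead of 8, while the long job still ends at 8 because no overhead is modeled.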
Migration Scheme Issues • Node Availability • Migration Overhead • Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
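Applying a recruitment threshold amounts to comparing each workstation's idle time against a fixed limit. This minimal sketch assumes the cluster keeps a last-activity timestamp per workstation (in seconds); the function name, workstation names, and threshold value are hypothetical.

```python
def idle_nodes(last_activity: dict, now: float, threshold_s: float) -> list:
    """Return workstations unused for at least threshold_s seconds."""
    return sorted(n for n, t in last_activity.items() if now - t >= threshold_s)


activity = {"ws1": 0.0, "ws2": 250.0, "ws3": 100.0}
print(idle_nodes(activity, now=400.0, threshold_s=300.0))  # ['ws1', 'ws3']
```

A short threshold recruits workstations sooner but raises the chance that an owner returns and forces a job to migrate away; a long threshold wastes idle cycles.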