
2.3 Design Principles of Computer Clusters


amir-clark

Presentation Transcript


  1. 2.3 Design Principles of Computer Clusters
     2.3.1 Single-System Image Features
     • A single-system image (SSI) means the illusion of a single system, single control, symmetry, and transparency.
     • Single system: users view the entire cluster as one system that has multiple processors.
     • Single control: logically, a user or system user utilizes services from one place with a single interface.
     • Symmetry: all cluster services and functionalities are symmetric to all nodes and all users, except those protected by access rights.
     • Location transparency: the user is not aware of the whereabouts of the physical device that eventually provides a service.
     • Cluster nodes
       • home node
       • local node
       • remote nodes
     • The illusion of an SSI can be obtained at several layers: the application software layer, the hardware or kernel layer, and the middleware layer.
     Ch. 2-2 Computer Clusters

  2. Single Entry Point
     • The single entry point enables users to log in to a cluster as one virtual host.
     • The system transparently distributes the user's login and connection requests to different physical hosts to balance the load.
     • Realizing a single entry point in a cluster of computers
       • Fig. 2.13
     • Single file hierarchy
       • From the viewpoint of any process, files can reside on three types of locations in a cluster, as shown in Fig. 2.14.
       • Stable storage requires two properties: it must be persistent and fault-tolerant.
       • Stable storage (global files) could be implemented as one centralized, large RAID disk, but it could also be distributed using the local disks of cluster nodes.
     • Single I/O space over distributed RAID for I/O-centric clusters
       • Fig. 2.16
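The load-balancing behavior of a single entry point can be illustrated with a minimal sketch. The slide does not say which balancing policy is used, so the least-loaded policy below is an assumption, and the class name `SingleEntryPoint` and the host names are hypothetical.

```python
class SingleEntryPoint:
    """Toy dispatcher: one virtual host in front of several physical hosts."""

    def __init__(self, hosts):
        # Active session count per physical host (host names are hypothetical).
        self.sessions = {host: 0 for host in hosts}

    def login(self, user):
        # Assumed policy: pick the least-loaded host; ties break by host name.
        host = min(self.sessions, key=lambda h: (self.sessions[h], h))
        self.sessions[host] += 1
        return host


# Users always address the same virtual host; the dispatcher spreads them out.
cluster = SingleEntryPoint(["node1", "node2", "node3"])
assignments = [cluster.login(f"user{i}") for i in range(6)]
print(assignments)
```

With three hosts and six logins, each physical host ends up with two sessions, while every user only ever addressed the one virtual host.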

  3. RAID
     2.3.2 High Availability through Redundancy
     • When designing robust, highly available systems, three terms are often used together: reliability, availability, and serviceability (RAS).
       • Reliability: measures how long a system can operate without a breakdown.
       • Availability: the percentage of time that a system is available to the user.
       • Serviceability: how easy it is to service the system, including maintenance, repair, and upgrades.

  4. Availability and Failure Rate
     • Availability = MTTF / (MTTF + MTTR)
       • MTTF: mean time to failure
       • MTTR: mean time to repair
     • Planned vs. unplanned failures
     • Transient vs. permanent failures
     • Partial vs. total failures
     • Single point of failure in an SMP and in clusters of computers: Fig. 2.19
     • Redundancy techniques
       • Table 2.5: Availability of Computer System Types
     • Isolated redundancy
       • When a component (the primary component) fails, the service it provided is taken over by another component (the backup component).
       • The primary and the backup components should be isolated from each other.
       • Benefits
         • A component designed with isolated redundancy is not a single point of failure.
         • A failed component can be repaired while the rest of the system is still working.
         • The primary and the backup components can mutually test and debug each other.
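As a worked example of the availability formula, the sketch below evaluates it for illustrative numbers. The MTTF and MTTR values are assumptions chosen for the arithmetic, not figures from the slides or from Table 2.5, and the helper names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours_per_year(mttf_hours, mttr_hours):
    """Fraction of time unavailable, scaled to one year."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


# Assumed values: the system fails on average every 1000 h and takes 10 h to repair.
a = availability(1000, 10)
print(f"availability = {a:.4f}")  # 0.9901
print(f"downtime per year = {expected_downtime_hours_per_year(1000, 10):.1f} h")  # 86.7 h
```

The formula shows two ways to raise availability: increase MTTF (fail less often) or decrease MTTR (recover faster). Redundancy mainly targets the second, since a backup component takes over while the failed one is repaired.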

  5. N-Version Programming to Enhance Software Reliability
     • The software is implemented by N isolated teams who may not even know the others exist.
     • Different teams are asked to implement the software using different algorithms, programming languages, environment tools, and even platforms.
     • In a fault-tolerant system, the N versions all run simultaneously and their results are constantly compared. If the results differ, the system is notified that a fault has occurred.
     2.3.3 Fault-Tolerant Cluster Configurations
     • Three ascending levels of availability
       • Hot standby server clusters
       • Active-takeover clusters
       • Failover clusters
         • Failover must provide several functions: failure diagnosis, failure notification, and failure recovery.
     • Recovery scheme
       • Backward recovery
         • Checkpoint
         • Rollback
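Two ideas from this slide can be sketched in a few lines: comparing the results of N versions, and backward recovery by checkpoint and rollback. Both are minimal illustrations under stated assumptions. The slide only says that differing results signal a fault; returning the majority result is an added assumption. All function and class names here are hypothetical, and the versions run sequentially rather than simultaneously.

```python
import copy
from collections import Counter


def run_n_versions(versions, x):
    """Run every version on the same input and compare the results."""
    results = [version(x) for version in versions]
    counts = Counter(results)
    fault_detected = len(counts) > 1  # any disagreement signals a fault
    majority_result, _ = counts.most_common(1)[0]  # assumption: majority wins
    return majority_result, fault_detected


# Three independently written versions of "sum of 1..n".
def sum_loop(n):
    return sum(range(1, n + 1))


def sum_formula(n):
    return n * (n + 1) // 2


def sum_buggy(n):
    return sum(range(1, n))  # deliberate off-by-one bug for the demo


class CheckpointedState:
    """Backward recovery: save a known-good state, restore it after a fault."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


result, fault = run_n_versions([sum_loop, sum_formula, sum_buggy], 10)
print(result, fault)  # 55 True

job = CheckpointedState({"step": 0})
job.state["step"] = 5
job.checkpoint()       # state at step 5 is saved
job.state["step"] = 9  # work done after the checkpoint
job.rollback()         # a fault occurs; return to the saved state
print(job.state)       # {'step': 5}
```

The rollback loses the work done between the last checkpoint and the fault, which is why the checkpoint interval is a trade-off between checkpointing overhead and lost work.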

  6. 2.4 Cluster Job and Resource Management
     2.4.1 Cluster Job Scheduling Methods
     • Cluster jobs may be scheduled to run at a specific time (calendar scheduling) or when a particular event happens (event scheduling).
     • Table 2.6: Job Scheduling Issues and Schemes for Cluster Nodes
     • Space sharing
       • Multiple jobs can run on disjoint partitions of nodes simultaneously.
       • At most one process is assigned to a node at a time.
       • Job scheduling by tiling over cluster nodes: Fig. 2.22
     • Time sharing
       • Independent scheduling (local scheduling)
       • Gang scheduling
         • The gang scheduling scheme schedules all processes of a parallel job together.
         • When one process is active, all processes are active.
       • Competition with foreign jobs
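Gang scheduling can be pictured as a table of time slots by nodes in which every process of a job sits in the same time slot. The sketch below builds such a table with a first-fit packing rule. The packing rule, the function name `gang_schedule`, and the job sizes are assumptions for illustration, not the scheme from Table 2.6 or Fig. 2.22.

```python
def gang_schedule(jobs, num_nodes):
    """Place each job's processes in a single time slot.

    jobs: dict mapping job name -> number of processes.
    Returns a list of time slots; each slot maps node index -> job name.
    """
    slots = []
    for name, nprocs in jobs.items():
        if nprocs > num_nodes:
            raise ValueError(f"job {name} needs more nodes than the cluster has")
        # First fit: reuse the earliest slot with enough free nodes.
        for slot in slots:
            if num_nodes - len(slot) >= nprocs:
                break
        else:
            slot = {}
            slots.append(slot)
        # Nodes fill left to right, so len(slot) is the next free node index.
        start = len(slot)
        for node in range(start, start + nprocs):
            slot[node] = name
    return slots


schedule = gang_schedule({"A": 4, "B": 2, "C": 2}, num_nodes=4)
for t, slot in enumerate(schedule):
    print(f"slot {t}:", [slot.get(node, "-") for node in range(4)])
```

In this example job A occupies all four nodes in slot 0, and jobs B and C share slot 1 on disjoint nodes. Within a slot each node runs at most one process, and all processes of a job are active together.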

  7. 2.4.2 Cluster Job Management Systems
     • A job management system (JMS) should have three parts:
       • user server
       • job scheduler
       • resource manager: resource allocation and monitoring, scheduling policy enforcement, and accounting information collection
     • JMS administration
     • Cluster job types
     • Characteristics of a cluster workload
       • Workload characteristics based on experience with the NAS benchmarks; see p. 108
     • Migration schemes
       • Node availability
       • Migration overhead
       • Recruitment threshold
         • The recruitment threshold is the amount of time a workstation stays unused before the cluster considers it an idle node.
     2.4.3 Load Sharing Facility for Cluster Computing
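The recruitment threshold amounts to a simple idle-time test per workstation. The sketch below is a minimal illustration; the function name, the workstation names, and the use of seconds as the time unit are assumptions.

```python
def idle_nodes(last_activity, now, recruitment_threshold):
    """Return workstations unused for at least the recruitment threshold.

    last_activity: dict mapping workstation name -> time of last user activity.
    All times are in seconds on the same clock (an assumption of this sketch).
    """
    return sorted(
        name
        for name, last_seen in last_activity.items()
        if now - last_seen >= recruitment_threshold
    )


# Hypothetical snapshot taken at t = 900 s with a 300 s threshold.
last_activity = {"ws1": 0, "ws2": 500, "ws3": 890}
recruited = idle_nodes(last_activity, now=900, recruitment_threshold=300)
print(recruited)  # ['ws1', 'ws2']
```

A short threshold recruits more nodes but raises the chance that the owner returns soon and the job has to migrate away, which is where the migration overhead listed above comes in.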
