
Cluster Computing: An Introduction



  1. Cluster Computing: An Introduction 金仲達 (Chung-Ta King), Department of Computer Science, National Tsing Hua University, king@cs.nthu.edu.tw

  2. Clusters Have Arrived

  3. What is a Cluster? • A collection of independent computer systems working together as if they were a single system • Coupled through a scalable, high-bandwidth, low-latency interconnect • The nodes can exist in a single cabinet or be separated and connected via a network • Faster, closer connection than a network (LAN) • Looser connection than a symmetric multiprocessor

  4. Outline • Motivations of Cluster Computing • Cluster Classifications • Cluster Architecture & its Components • Cluster Middleware • Representative Cluster Systems • Task Forces on Cluster Computing • Resources and Conclusions

  5. Motivations of Cluster Computing

  6. How to Run Applications Faster? • There are three ways to improve performance: • Work harder • Work smarter • Get help • Computer analogy: • Use faster hardware: e.g. reduce the time per instruction (clock cycle) • Use optimized algorithms and techniques • Use multiple computers to solve the problem => the techniques of parallel processing are mature and can be exploited commercially

  7. Motivation for Using Clusters • Performance of workstations and PCs is rapidly improving • Communications bandwidth between computers is increasing • Vast numbers of under-utilized workstations with a huge number of unused processor cycles • Organizations are reluctant to buy large, high performance computers, due to the high cost and short useful life span

  8. Motivation for Using Clusters • Workstation clusters are thus a cheap and readily available approach to high performance computing • Clusters are easier to integrate into existing networks • Development tools for workstations are mature • Threads, PVM, MPI, DSM, C, C++, Java, etc. • Using clusters as a distributed compute resource is cost effective: the system can grow incrementally • Individual node performance can be improved by adding resources (new memory blocks/disks) • New nodes can be added or removed • Clusters of clusters and metacomputing

  9. Key Benefits of Clusters • High performance: running cluster-enabled programs • Scalability: adding servers to the cluster, adding more clusters to the network as the need arises, or adding CPUs to an SMP • High throughput • System availability (HA): clusters offer inherently high system availability due to the redundancy of hardware, operating systems, and applications • Cost effectiveness

  10. Why Cluster Now?

  11. Hardware and Software Trends • Important advances have taken place in the last five years • Network performance increased with reduced cost • Workstation performance improved • The average number of transistors on a chip grows about 40% per year • Clock frequency growth rate is about 30% per year • Expect 700-MHz processors with 100M transistors in early 2000 • Availability of powerful and stable operating systems (Linux, FreeBSD) with source code access

  12. Why Clusters NOW? • Clusters gained momentum when three technologies converged: • Very high performance microprocessors • workstation performance = yesterday's supercomputers • High speed communication • Standard tools for parallel/distributed computing and their growing popularity • Time to market => performance • Internet services: huge demand for scalable, available, dedicated Internet servers • big I/O, big compute

  13. Efficient Communication • The key enabling technology: from killer micro to killer switch • Single-chip building block for scalable networks • high bandwidth • low latency • very reliable • Challenges for clusters • greater routing delay and less than complete reliability • constraints on where the network connects into the node • UNIX has a rigid device and scheduling interface

  14. Putting Them Together ... • Building block = complete computers (HW & SW) shipped in 100,000s: Killer micro, Killer DRAM, Killer disk, Killer OS, Killer packaging, Killer investment • Leverage billion-$-per-year investment • Interconnecting building blocks => Killer Net • High bandwidth • Low latency • Reliable • Commodity (ATM, Gigabit Ethernet, Myrinet)

  15. Windows of Opportunity • The resources available in the average cluster offer a number of research opportunities, such as • Parallel processing: use multiple computers to build an MPP/DSM-like system for parallel computing • Network RAM: use the memory associated with each workstation as an aggregate DRAM cache • Software RAID: use the arrays of workstation disks to provide cheap, highly available, and scalable file storage • Multipath communication: use the multiple networks for parallel data transfer between nodes

  16. Windows of Opportunity • Most high-end scalable WWW servers are clusters • end services (data, web, enhanced information services, reliability) • Network mediation services are also cluster-based • Inktomi traffic server, etc. • Clustered proxy caches, clustered firewalls, etc. • => These object web applications are increasingly compute intensive • => These applications are an increasing part of “scientific computing”

  17. Classification of Cluster Computers

  18. Cluster Classification 1 • Based on Focus (in Market) • High performance (HP) clusters • Grand challenge applications • High availability (HA) clusters • Mission critical applications

  19. HA Clusters

  20. Cluster Classification 2 • Based on Workstation/PC Ownership • Dedicated clusters • Non-dedicated clusters • Adaptive parallel computing • Can be used for CPU cycle stealing

  21. Cluster Classification 3 • Based on Node Architecture • Clusters of PCs (CoPs) • Clusters of Workstations (COWs) • Clusters of SMPs (CLUMPs)

  22. Cluster Classification 4 • Based on Node Components Architecture & Configuration: • Homogeneous clusters • All nodes have a similar configuration • Heterogeneous clusters • Nodes are based on different processors and run different OSes

  23. Cluster Classification 5 • Based on Levels of Clustering: • Group clusters (# nodes: 2-99) • A set of dedicated/non-dedicated computers, mainly connected by a SAN such as Myrinet • Departmental clusters (# nodes: 99-999) • Organizational clusters (# nodes: many 100s) • Internet-wide clusters = global clusters (# nodes: 1000s to many millions) • Metacomputing

  24. Clusters and Their Commodity Components

  25. Cluster Computer Architecture

  26. Cluster Components...1a Nodes • Multiple high performance components: • PCs • Workstations • SMPs (CLUMPs) • Distributed HPC systems leading to metacomputing • They can be based on different architectures and run different OSes

  27. Cluster Components...1b Processors • There are many (CISC/RISC/VLIW/Vector...) • Intel: Pentiums, Xeon, Merced • Sun: SPARC, UltraSPARC • HP PA • IBM RS6000/PowerPC • SGI MIPS • Digital Alphas • Integrating memory, processing, and networking into a single chip • IRAM (CPU & Mem): http://iram.cs.berkeley.edu • Alpha 21364 (CPU, Memory Controller, NI)

  28. Cluster Components…2 OS • State of the art OS: • Tend to be modular: can easily be extended and new subsystems can be added without modifying the underlying OS structure • Multithreading has added a new dimension to parallel processing • Popular OSes used on cluster nodes: • Linux (Beowulf) • Microsoft NT (Illinois HPVM) • Sun Solaris (Berkeley NOW) • IBM AIX (IBM SP2) • .....
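
The multithreading point above can be made concrete with a short sketch. The following is a minimal, hypothetical POSIX-threads example in C (not taken from the slides): it splits an array sum across a few threads on a single cluster node, the kind of node-level parallelism a modern modular OS supports.

    /* Minimal POSIX threads sketch: each thread sums one slice of an array.
     * Illustrative only; error handling omitted.  Build with: gcc -pthread */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000

    static double data[N];
    static double partial[NTHREADS];

    static void *worker(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + (N / NTHREADS);
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i];
        partial[id] = sum;          /* each thread writes only its own slot */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("sum = %f\n", total);
        return 0;
    }

Because each thread writes only its own partial slot, no locking is needed in this pattern; the main thread combines the results after joining.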

  29. Cluster Components…3 High Performance Networks • Ethernet (10 Mbps) • Fast Ethernet (100 Mbps) • Gigabit Ethernet (1 Gbps) • SCI (Dolphin; ~12 usec MPI latency) • ATM • Myrinet (1.2 Gbps) • Digital Memory Channel • FDDI

  30. Cluster Components…4 Network Interfaces • Dedicated processing power and storage embedded in the network interface • An I/O card today • Tomorrow on chip? [Figure: Myricom NIC on the Myrinet network (160 MB/s), attached to a Sun Ultra 170 node (processor, memory, cache) over the 50 MB/s S-Bus I/O bus]

  31. Cluster Components…4 Network Interfaces • Network interface card • Myrinet has its own NIC • User-level access support: VIA • The Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip

  32. Cluster Components…5 Communication Software • Traditional OS-supported facilities (but heavyweight due to protocol processing) • Sockets (TCP/IP), Pipes, etc. • Lightweight protocols (user-level): minimal interface into the OS • Users transmit directly into and receive directly from the network without OS intervention • Communication protection domains established by the interface card and the OS • Treat message loss as an infrequent case • Active Messages (Berkeley), Fast Messages (Illinois), ...
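
For contrast with the lightweight user-level protocols listed above, the sketch below shows the traditional, kernel-mediated socket path in C. The peer address and port are placeholders; the point is only that every send crosses the OS socket layer and TCP/IP stack, which is exactly the overhead that Active Messages and Fast Messages try to avoid.

    /* Traditional kernel-mediated path: a TCP send goes through the socket
     * layer and protocol stack on every message.  Host and port are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5000);                       /* placeholder port */
        inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr); /* placeholder node */
        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
            perror("connect");
            return 1;
        }
        const char msg[] = "hello cluster";
        write(fd, msg, sizeof(msg));   /* each write crosses the kernel boundary */
        close(fd);
        return 0;
    }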

  33. Cluster Components…6a Cluster Middleware • Resides between the OS and applications and offers an infrastructure for supporting: • Single System Image (SSI) • System Availability (SA) • SSI makes a collection of computers appear as a single machine (globalized view of system resources) • SA supports checkpointing, process migration, etc.

  34. Cluster Components…6b Middleware Components • Hardware • DEC Memory Channel, DSM (Alewife, DASH), SMP techniques • OS/gluing layers • Solaris MC, Unixware, GLUnix • Applications and subsystems • System management and electronic forms • Runtime systems (software DSM, PFS, etc.) • Resource management and scheduling (RMS): • CODINE, LSF, PBS, NQS, etc.

  35. Cluster Components…7a Programming Environments • Threads (PCs, SMPs, NOW, ...) • POSIX Threads • Java Threads • MPI • Linux, NT, and many supercomputers • PVM • Software DSMs (Shmem)
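
A minimal MPI example in C illustrates the message-passing style these environments support; it is a generic sketch, not code from the slides. Each process launched across the cluster reports its rank within the job.

    /* Minimal MPI sketch: every process in the cluster job prints its rank. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total processes in job  */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

It would typically be compiled with an MPI wrapper such as mpicc and launched with something like mpirun -np 4 ./hello, though the exact launcher depends on the MPI implementation installed on the cluster.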

  36. Cluster Components…7b Development Tools? • Compilers • C/C++/Java • RAD (rapid application development) tools: GUI-based tools for parallel processing modeling • Debuggers • Performance monitoring and analysis tools • Visualization tools

  37. Cluster Components…8 Applications • Sequential • Parallel/distributed (cluster-aware applications) • Grand challenge applications • Weather forecasting • Quantum chemistry • Molecular biology modeling • Engineering analysis (CAD/CAM) • ... • Web servers, data mining

  38. Cluster Middleware and Single System Image

  39. Middleware Design Goals • Complete transparency • Let users see a single cluster system • Single entry point, ftp, telnet, software loading, ... • Scalable performance • Easy growth of the cluster • No change of API and automatic load distribution • Enhanced availability • Automatic recovery from failures • Employ checkpointing and fault tolerance technologies • Handle consistency of data when replicated

  40. Single System Image (SSI) • A single system image is the illusion, created by software or hardware, that a collection of computers appears as a single computing resource • Benefits: • Transparent use of system resources • Improved reliability and higher availability • Simplified system management • Reduction in the risk of operator errors • Users need not be aware of the underlying system architecture to use these machines effectively

  41. Desired SSI Services • Single entry point • telnet cluster.my_institute.edu • telnet node1.cluster.my_institute.edu • Single file hierarchy: AFS, Solaris MC Proxy • Single control point: manage from a single GUI • Single virtual networking • Single memory space: DSM • Single job management: GLUnix, Codine, LSF • Single user interface: like a workstation/PC windowing environment

  42. SSI Levels • Single system image support can exist at different levels within a system, and one level can be built on another: • Application and subsystem level • Operating system kernel level • Hardware level

  43. Availability Support Functions • Single I/O space (SIO): • Any node can access any peripheral or disk device without knowledge of its physical location • Single process space (SPS): • Any process can create processes on any node, and they can communicate through signals, pipes, etc., as if they were on a single node • Checkpointing and process migration: • Save the process state and intermediate results in memory or on disk; migrate processes for load balancing • Reduction in the risk of operator errors
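
As a rough illustration of application-level checkpointing (one of several possible approaches; the file name and checkpoint interval here are arbitrary choices for illustration), the C sketch below periodically writes the loop state to disk so a restarted or migrated process can resume from the last checkpoint rather than from the beginning.

    /* Application-level checkpointing sketch: periodically save the loop state
     * to disk so the job can resume after a node failure or migration. */
    #include <stdio.h>

    struct state { long iter; double result; };

    int main(void) {
        struct state s = {0, 0.0};
        FILE *f = fopen("ckpt.dat", "rb");   /* resume if a checkpoint exists */
        if (f) { fread(&s, sizeof(s), 1, f); fclose(f); }

        for (; s.iter < 1000000; s.iter++) {
            if (s.iter % 100000 == 0) {      /* checkpoint every 100k steps */
                f = fopen("ckpt.dat", "wb");
                fwrite(&s, sizeof(s), 1, f);
                fclose(f);
            }
            s.result += 1.0 / (s.iter + 1);  /* stand-in for real work */
        }
        printf("done: %f\n", s.result);
        return 0;
    }

Checkpointing before each batch of work means that, at worst, the work done since the last checkpoint is re-executed after a restart; real middleware (and process migration) captures the full process state transparently rather than relying on the application to do this.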

  44. Relationship among Middleware Modules

  45. Strategies for SSI • Build as a layer on top of the existing OS (e.g. GLUnix) • Benefits: • Makes the system quickly portable, tracks vendor software upgrades, and reduces development time • New systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. GLUnix/Solaris-MC • Build SSI at the kernel level (true cluster OS) • Good, but can't leverage vendor OS improvements • e.g. Unixware and MOSIX (built using BSD Unix)

  46. Representative Cluster Systems

  47. Research Projects on Clusters • Beowulf: CalTech, JPL, and NASA • Condor: University of Wisconsin-Madison • DQS (Distributed Queuing System): Florida State U. • HPVM (High Performance Virtual Machine): UIUC & UCSB • Gardens: Queensland U. of Technology, Australia • NOW (Network of Workstations): UC Berkeley • PRM (Prospero Resource Manager): USC

  48. Commercial Cluster Software • Codine (Computing in Distributed Network Environment): GENIAS GmbH, Germany • LoadLeveler: IBM Corp. • LSF (Load Sharing Facility): Platform Computing • NQE (Network Queuing Environment): Craysoft • RWPC: Real World Computing Partnership, Japan • Unixware: SCO • Solaris-MC: Sun Microsystems

  49. Comparison of 4 Cluster Systems

  50. Task Forces on Cluster Computing
