
Clustering and Networking on Linux

Kumaran Kalyanasundaram, SGI Technology Center



  1. Clustering and Networking on Linux Kumaran Kalyanasundaram SGI Technology Center

  2. Agenda • Types of clusters • Examples • Important technologies • Available products • Cluster interconnects • Speeds and feeds • Compute clusters • Hardware layouts • Programming considerations

  3. Types of Clusters: Functional View • Availability clusters: For “mission-critical” apps • RAS (reliability, availability, serviceability) features are essential requirements for all clusters • Throughput clusters: Run multiple jobs on nodes in the cluster • Mostly batch-style apps that are not “cluster-aware” • Scheduling, load-balancing • Capability clusters: Run a cluster-aware (HPC/MPI) job on multiple nodes

  4. Clustering for High Availability A collection of systems in a cluster lends itself well to providing significantly higher availability than a standalone system. • If one fails, move to the other! • Significantly higher availability with moderate cost overhead • All systems actively engaged in the cluster workload
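A rough way to quantify the availability gain: if a service can run on any of several nodes, it is down only when all of them are down at once. The minimal sketch below assumes independent node failures and instantaneous failover, neither of which holds exactly in practice; the function name is hypothetical.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a service that can run on any of `nodes` nodes.

    Assumes independent node failures and instantaneous failover,
    both idealizations of a real HA cluster.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


print(f"{cluster_availability(0.99, 1):.4f}")  # 0.9900
print(f"{cluster_availability(0.99, 2):.4f}")  # 0.9999
```

Under these assumptions, pairing two 99% nodes yields 99.99%, a hundredfold cut in expected downtime. Correlated failures (shared power, shared storage, a common software fault) erode that gain, which is why single points of failure matter.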

  5. What is High Availability? • When a system or service is available almost all the time! • Resiliency from any single point of failure • Availability at or above 99.9% • Accumulated unplanned outages of no more than about 8.8 hrs / year • Minimized downtime: services made available well before the broken component gets fixed • Can hide planned downtime as well
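The 99.9% target converts to a yearly downtime budget with one line of arithmetic: 0.1% of the 8,760 hours in a non-leap year is 8.76 hours. A small sketch of the conversion (the helper name is hypothetical):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years


def max_downtime_hours(availability_percent: float) -> float:
    """Yearly downtime budget, in hours, implied by an availability target."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {max_downtime_hours(target):.2f} h/year")
```

Each extra "nine" divides the budget by ten, which is why the jump from 99.9% to 99.99% (under an hour per year) usually requires automated failover rather than manual repair.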

  6. The High-Availability Environment • Member nodes of the HA cluster • Services/applications to be made highly available • Resources they depend upon • Primary node for each of these applications • Designated alternate node(s) in case of a failure on the primary node
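One way to picture these elements is as a per-service record naming its resources, its primary node and an ordered list of alternates. The sketch below is a hypothetical model (the class, service, resource and node names are invented) and implements only the simplest placement policy: run the service on the first healthy node in preference order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HAService:
    """One highly available service and where it may run (hypothetical model)."""
    name: str
    resources: list[str]                 # e.g. volumes, file systems, IP addresses
    primary: str                         # preferred node
    alternates: list[str] = field(default_factory=list)  # in failover order

    def placement(self, healthy_nodes: set[str]) -> str | None:
        """Return the most preferred healthy node, or None if none is available."""
        for node in [self.primary, *self.alternates]:
            if node in healthy_nodes:
                return node
        return None


web = HAService("web", ["vol1", "/export/www", "192.168.1.10"], "serverA", ["serverB"])
print(web.placement({"serverA", "serverB"}))  # serverA
print(web.placement({"serverB"}))             # serverB
print(web.placement(set()))                   # None
```

More elaborate policies, such as load-aware placement or failing back to the primary once it recovers, fit the same shape.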

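  7. Elements of HA Infrastructure (HW): Replicated Data Configuration. Diagram: Server A and Server B on a public network (Ethernet, FDDI, ATM), linked by a heartbeat, each with its own copy of the data (A, B) and changes replicated between them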

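  8. Elements of HA Infrastructure (HW): Dual-Hosted Storage Configuration. Diagram: Server A and Server B on a public network (Ethernet, FDDI, ATM), with heartbeat and reset connections between them, both attached through Fibre Channel loops (Loop A, Loop B) to a Fibre RAID with Controller A and Controller B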

  9. Elements of HA Infrastructure (SW) • Heartbeat: short message exchange to monitor system health • HA Framework monitors common system-level resources • Volumes, file systems, network interfaces • Application-specific agents monitor application health • Cluster management tools
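At its core, a heartbeat monitor remembers when each peer was last heard from and flags peers that have been silent longer than a timeout. The sketch below assumes heartbeats arrive through some transport that is not shown (network, serial line or disk) and uses an injectable clock so the logic can be exercised without waiting; the class and node names are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks when each peer node was last heard from."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock                   # injectable so tests can fake time
        self.last_seen: dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record that a heartbeat message arrived from `node`."""
        self.last_seen[node] = self.clock()

    def suspected(self) -> list[str]:
        """Nodes silent for longer than the timeout, in sorted order."""
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


fake_now = [0.0]
mon = HeartbeatMonitor(timeout_s=5.0, clock=lambda: fake_now[0])
mon.beat("serverA")
mon.beat("serverB")
fake_now[0] = 3.0
mon.beat("serverB")          # serverB keeps talking, serverA goes quiet
fake_now[0] = 6.0
print(mon.suspected())       # ['serverA']
```

A node that has never sent a heartbeat is not tracked at all; a real framework would also seed the table from the cluster membership list.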

  10. In case of a failure • Failure detected by: • Storage processors within the storage system • Internode communication failure • Monitors detecting resource / application malfunction

  11. In case of a failure: Recovery Steps • Failure notification to administrator • Storage access attempt via alternate path • Service and other necessary resource failover based on predefined failover policies • I/O fencing to prevent any possibility of data corruption from the failing node
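The sketch below strings together three of these steps for a node failure: notification, fencing and policy-driven service failover (the alternate-path storage retry is left out). It fences before restarting anything, a common ordering because a half-dead node that still holds the shared disks could otherwise write alongside its replacement. The function, the `fence` callback and the node names are hypothetical placeholders for site-specific mechanisms.

```python
from __future__ import annotations

from typing import Callable


def recover(failed_node: str,
            placement: dict[str, str],
            policy: dict[str, list[str]],
            healthy_nodes: set[str],
            fence: Callable[[str], bool],
            log: list[str]) -> dict[str, str]:
    """Move services off `failed_node`; returns the new service -> node map."""
    log.append(f"notify: {failed_node} failed")
    # Fence first: the failed node must not touch shared storage again.
    if not fence(failed_node):
        log.append(f"abort: could not fence {failed_node}")
        return dict(placement)
    log.append(f"fenced: {failed_node}")
    new_placement = dict(placement)
    for service, node in placement.items():
        if node != failed_node:
            continue
        # Predefined policy: first healthy node in the service's preference list.
        target = next((n for n in policy[service]
                       if n != failed_node and n in healthy_nodes), None)
        if target is None:
            del new_placement[service]
            log.append(f"offline: {service} has no healthy alternate")
        else:
            new_placement[service] = target
            log.append(f"failover: {service} -> {target}")
    return new_placement


log: list[str] = []
placement = {"db": "serverA", "web": "serverB"}
policy = {"db": ["serverA", "serverB"], "web": ["serverB", "serverA"]}
print(recover("serverA", placement, policy, {"serverB"},
              fence=lambda node: True, log=log))   # {'db': 'serverB', 'web': 'serverB'}
print(log)
```

If fencing cannot be confirmed, the sketch refuses to fail over and leaves the services where they were, trading availability for data safety.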

  12. Issues to consider • Application cleanup and restart time contribute directly to service downtime • False failovers due to timeouts; mechanisms to avoid them: • Conservative timeouts • Multiple monitoring methods • Shared vs. non-shared data access
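Conservative timeouts and multiple monitoring methods can be combined into one decision rule. The hypothetical sketch below declares a peer failed only after a set number of consecutive missed heartbeats, and then only if a majority of independent checks also report it down; the check names (a disk heartbeat, a serial link, a ping) are invented examples.

```python
from __future__ import annotations


class FailureDetector:
    """Declares a peer failed only on sustained, corroborated evidence."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed   # conservative: ride out transient misses
        self.missed = 0                # consecutive heartbeats missed so far

    def heartbeat_result(self, received: bool) -> None:
        # Any heartbeat resets the count, so only consecutive misses add up.
        self.missed = 0 if received else self.missed + 1

    def should_fail_over(self, other_checks: dict[str, bool]) -> bool:
        """other_checks maps an independent check's name to True if it sees the peer down."""
        if self.missed < self.max_missed:
            return False
        if not other_checks:
            return True                # heartbeat is the only evidence available
        down_votes = sum(other_checks.values())
        # Require a strict majority of the independent checks to agree.
        return 2 * down_votes > len(other_checks)


det = FailureDetector(max_missed=3)
for received in (False, False, True, False, False, False):
    det.heartbeat_result(received)
print(det.missed)   # 3
print(det.should_fail_over({"disk_heartbeat": True, "serial_link": True, "ping": False}))   # True
print(det.should_fail_over({"disk_heartbeat": False, "serial_link": True, "ping": False}))  # False
```

Raising `max_missed` lowers the false-failover rate at the cost of slower detection, so the timeout directly adds to service downtime.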

  13. Linux and High Availability • Today: • Linux is very popular in the Internet server space • Availability needs of a different flavor • Software packages addressing HA needs in this segment: • Watchdog (www.wizard.de) • RSF-1 from RSI (www.rsi.co.uk) • Understudy from PolyServe (www.polyserve.com) • Some work going on in the Linux community related to Heartbeat, etc. • Red Hat 6.1

  14. Linux and High Availability As Linux matures, it is expected to make its way into database servers, file servers, enterprise applications and more. Important Contributing Technologies • High Availability Framework • Enterprise-class HA features • Cluster management: Single System View • Journalled Filesystem • SGI’s XFS going open-source!

  15. Clustered, Journalled Filesystem • Seamless sharing of data • No failover of data required • Near-local file system performance • Direct data channels between disks and nodes • Clustered XFS (CXFS): • A shareable high-performance XFS file system • CXFS sits on top of XFS: Fast XFS features • A resilient file system • Failure of a node in the cluster does not prevent access to the disks from other nodes

  16. Internet Server Farm. Diagram: the Internet connects through a load-balancing switch to web server farms, download servers and email server farms

  17. Internet Server Farm: Manageability of a Cluster • Load-sharing switch • Hybrid HA and Throughput solutions • Currently minimal feedback from backend to switch • Efforts in place to provide active feedback to switch • E.g. Cisco’s Dynamic Feedback Protocol • Performance monitoring • Cluster management: Single entity view
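Feedback matters because a switch distributing requests blindly (for example round-robin) cannot see how loaded each backend is. The sketch below shows the general idea of selection driven by backend reports, using a weighted least-connections rule; it is not an implementation of Cisco's Dynamic Feedback Protocol, and the class name, server names and weight scale are assumptions.

```python
from __future__ import annotations


class FeedbackBalancer:
    """Weighted least-connections selection driven by backend-reported weights."""

    def __init__(self) -> None:
        self.weight: dict[str, float] = {}   # last weight reported by each backend
        self.active: dict[str, int] = {}     # connections currently assigned

    def report(self, server: str, weight: float) -> None:
        """Backend feedback: higher weight means more spare capacity; 0 drains it."""
        self.weight[server] = weight
        self.active.setdefault(server, 0)

    def pick(self) -> str:
        """Choose the backend with the lowest active/weight ratio (ties by name)."""
        candidates = [s for s, w in self.weight.items() if w > 0]
        if not candidates:
            raise RuntimeError("no backend is accepting connections")
        best = min(candidates, key=lambda s: (self.active[s] / self.weight[s], s))
        self.active[best] += 1
        return best

    def done(self, server: str) -> None:
        """A connection on `server` has closed."""
        self.active[server] -= 1


lb = FeedbackBalancer()
lb.report("web1", 1.0)
lb.report("web2", 3.0)   # web2 reports three times the spare capacity
picks = [lb.pick() for _ in range(8)]
print(picks.count("web1"), picks.count("web2"))  # 2 6
```

Reporting a weight of 0 lets a backend drain itself before maintenance, one of the manageability gains active feedback is meant to provide.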

  18. Throughput Clusters: Computational Workload • Engineering/Marketing scenario analysis servers • EDA server farms • Render Farms • Important Technologies: • Cluster-wide resource management • Performance monitoring • Accounting tools • Cluster management: Single System View

  19. Throughput Clusters: Cluster Resource Management Software • Software to optimally distribute independent jobs on independent servers in a cluster • a.k.a. job scheduler, load sharing software … • Portable Batch System (PBS) • Open-source software • Developed at NASA • Jobs submitted specifying resource requirements • Jobs run when resources are available, subject to constraints on maximum resource usage • Unified interface to all computing resources
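The submit-with-requirements, run-when-resources-free model can be illustrated with a toy first-come, first-served scheduler. This is a hypothetical sketch of the concept, not PBS's scheduling algorithm or interface: jobs request only a node count, and a job that does not fit blocks the jobs behind it (strict FIFO, no backfilling).

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int                      # resource requirement declared at submission


class FifoScheduler:
    """Toy batch scheduler: strict first-come, first-served, no backfilling."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self.free = total_nodes
        self.queue: deque[Job] = deque()
        self.running: dict[str, Job] = {}

    def submit(self, job: Job) -> list[str]:
        """Queue a job and return the names of any jobs that start now."""
        if job.nodes > self.total_nodes:
            # Maximum-resource constraint: such a job could never run.
            raise ValueError(f"{job.name} requests more nodes than exist")
        self.queue.append(job)
        return self._dispatch()

    def finish(self, name: str) -> list[str]:
        """Release a finished job's nodes and start whatever now fits."""
        self.free += self.running.pop(name).nodes
        return self._dispatch()

    def _dispatch(self) -> list[str]:
        started = []
        # Start jobs in submission order while the head of the queue fits.
        while self.queue and self.queue[0].nodes <= self.free:
            job = self.queue.popleft()
            self.free -= job.nodes
            self.running[job.name] = job
            started.append(job.name)
        return started


sched = FifoScheduler(total_nodes=8)
print(sched.submit(Job("render1", 4)))   # ['render1']
print(sched.submit(Job("eda1", 6)))      # [] (only 4 nodes free)
print(sched.submit(Job("render2", 2)))   # [] (waits behind eda1)
print(sched.finish("render1"))           # ['eda1', 'render2']
```

Production schedulers typically add backfilling, which would let `render2` start early because it fits in the nodes left idle while `eda1` waits.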

  20. Linux Capability Clusters • Beowulf: A Beowulf parallel computer is a cluster of standard computer components, dedicated to use for parallel computation and running software that allows the system to be viewed as a unified parallel computer system. • Coarse-grain parallel computation and communication-based applications • Popular with academic and research communities • Breaking into commercial environments

  21. Beowulf background • Mileposts • 1994: Started with 486s and Ethernet • 1997: Caltech achieves 11 Gflops at SC’97 (140 CPUs) • 1999: Amerada Hess replaces SP2 with 32P Beowulf cluster of Pentium IIIs • 1999: SGI shows 132p Beowulf cluster at Supercomputing 99 • Ohio Supercomputing deploys 132p Beowulf cluster from SGI

  22. Linux Capability Clusters • Motivation: Solving huge problems using commodity technologies • Recent popularity because of technology availability: • Linux Operating System • Price/performance hardware • Killer Microprocessors • Killer Network Interface Cards (NIC) • Drivers and Usable Protocols • System and Job Management Software • Parallel Algorithm Maturation • Cluster Application Availability

  23. Cluster-friendly applications • Weather analysis: MM5, ARPS • Engineering analysis: CFD, NVH, Crash • Bioinformatics and Sciences • GAMESS • ZEUS-MP: Pure Hydro • AS-PCG: 2D Navier-Stokes • Cactus: 3-D Einstein GR Equations • QMC: Quantum Monte Carlo • MILC: QCD

  24. Linux Capability Clusters • Important technologies: • Parallel programming environment • MPI: Widely supported, highly detailed specification of a standard C and Fortran interface for message-passing parallel programming • Parallel Debugger • TotalView from Etnus • Fast interconnects • Commodity or special-purpose NICs • OS-bypass implementations of protocols
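In the message-passing style that MPI standardizes, every process (rank) works on its own slice of the data, and partial results are combined by messages, as a reduction does. To stay runnable without an MPI installation, the sketch below mimics that pattern with Python threads standing in for ranks and a queue standing in for messages to rank 0; it illustrates the programming model only and is not how an MPI library is implemented or called.

```python
import queue
import threading


def parallel_sum(n: int, size: int) -> int:
    """Sum range(n) with `size` simulated ranks and a reduce to rank 0."""
    if size < 1:
        raise ValueError("size must be at least 1")
    to_root: queue.Queue = queue.Queue()   # stands in for messages to rank 0
    result: dict = {}

    def rank_main(rank: int) -> None:
        # Cyclic decomposition: rank r handles r, r + size, r + 2*size, ...
        partial = sum(range(rank, n, size))
        if rank == 0:
            total = partial
            for _ in range(size - 1):        # one incoming message per other rank
                total += to_root.get()
            result["total"] = total
        else:
            to_root.put(partial)             # "send" the partial sum to rank 0

    ranks = [threading.Thread(target=rank_main, args=(r,)) for r in range(size)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return result["total"]


print(parallel_sum(1000, 4))   # 499500
```

With the mpi4py bindings, each real process would compute its `partial` the same way and call `comm.reduce(partial, op=MPI.SUM, root=0)` instead of the hand-written queue exchange.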

  25. SGI™ 1400 Beowulf Cluster @ OSC. Configuration diagram: the user community connects to a head node; 32 servers communicate via MPI over a Myrinet OS-bypass interconnect

  26. SGI Beowulf Cluster: Sample Config

  27. Interconnect considerations • Latency: • Key measure: round-trip time measured using the API (e.g. MPI), not hardware latency • Bandwidth: measured using the API • CPU overhead: • How much of the API/protocol is implemented in the NIC
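Measuring latency as round-trip time through the API is usually done with a ping-pong test: one side sends a small message, the peer echoes it back, and the elapsed time is averaged over many iterations. The sketch below applies that method to a queue pair between two threads, purely to show the structure of the measurement; an interconnect benchmark would use MPI send/receive calls between nodes instead. The function name and iteration count are arbitrary choices.

```python
import queue
import threading
import time


def ping_pong_rtt(iterations: int = 1000, payload: bytes = b"x" * 8) -> float:
    """Average round-trip time, in seconds, of `payload` over a queue pair."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    ping: queue.Queue = queue.Queue()
    pong: queue.Queue = queue.Queue()

    def echo() -> None:
        # Peer side: bounce every message straight back (+1 for the warm-up).
        for _ in range(iterations + 1):
            pong.put(ping.get())

    peer = threading.Thread(target=echo)
    peer.start()
    ping.put(payload)                 # warm-up round trip, not timed
    pong.get()

    start = time.perf_counter()
    for _ in range(iterations):
        ping.put(payload)
        pong.get()                    # wait for the echo before sending again
    elapsed = time.perf_counter() - start
    peer.join()
    return elapsed / iterations


rtt = ping_pong_rtt()
print(f"average round trip: {rtt * 1e6:.1f} us (one-way estimate {rtt * 0.5e6:.1f} us)")
```

Benchmarks often report half the round trip as the one-way latency, so it is worth checking which convention a quoted latency figure uses.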

  28. Interconnect Technology: API to Adapter • Cluster API Support, Protocols, Network Drivers and Interfaces: • LAM/MPI / TCP-IP / Fast Ethernet • MPICH / GM / Myrinet • MPI/Pro / VIA / Giganet • SGI-MPI / ST / GSN: Currently SGI MIPS only • GSN: Gigabyte System Network • ST: Scheduled Transfer (Protocol)

  29. Cluster Interconnects • Network Hardware • Fast Ethernet • Gigabit Ethernet • Myrinet • Giganet™ • GSN • Choice of network is highly application dependent

  30. Scheduled Transfer Protocol (STP) • Transaction-based communication protocol for low-latency system area networking • Extension of DMA to the network • Low CPU utilization and low latency • Data link layer independent • Standard specifies encapsulation for Ethernet, GSN, ATM ...

  31. Cluster Interconnect Comparison: 1 Gbps Range

  32. Myrinet • High-performance packet-switching interconnect technology for clusters • Bandwidth up to 1.1 Gb/s • Myricom-supported GM is provided for NT/Linux/Solaris/Tru64/VxWorks • Supports MPI • Low CPU overhead and low latency (MPI: 18 µs)
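Latency and bandwidth figures combine in a simple first-order model, T(n) = α + n/β, where α is the fixed per-message cost and β the bandwidth. The sketch below plugs in the Myrinet numbers above (α = 18 µs, β = 1.1 Gb/s) for illustration only; it assumes the 18 µs figure can be treated as the fixed per-message cost and that peak bandwidth is reachable, and it ignores protocol overhead and contention.

```python
ALPHA_S = 18e-6             # fixed per-message cost, seconds (slide's MPI latency)
BETA_BPS = 1.1e9 / 8        # bandwidth in bytes/s (1.1 Gb/s)


def transfer_time(n_bytes: float) -> float:
    """First-order model: fixed latency plus size divided by bandwidth."""
    return ALPHA_S + n_bytes / BETA_BPS


def effective_bandwidth(n_bytes: float) -> float:
    """Bytes per second actually delivered for a message of this size."""
    return n_bytes / transfer_time(n_bytes)


# Message size at which half of peak bandwidth is reached: n = alpha * beta.
half_power = ALPHA_S * BETA_BPS
print(f"n_1/2 = {half_power:.0f} bytes")
for n in (64, 4096, 1 << 20):
    print(f"{n:>8} B: {effective_bandwidth(n) / 1e6:7.1f} MB/s")
```

With these numbers, half of peak bandwidth is reached only at messages of about 2.5 KB; small messages are dominated by latency, which is why the API-level latency figure matters so much for fine-grained codes.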

  33. Really Fast Interconnects! API to Adapter • MPI/ST/GSN • The technology used on the ASCI Blue Mountain and other ultra-high-end clusters • ST Protocol: Lightweight ANSI standard protocol specialized for high-performance clusters • We are working hard to bring ST to Linux and Gigabit Ethernet.

  34. What Is GSN? • ANSI standard interconnect • Highest-bandwidth and lowest-latency interconnect standard • Gigabyte-per-second links, switches, adapters • Provides full-duplex 6.4 Gbps (800 MB/s) of error-free, flow-controlled data in each direction • Multi-vendor, multi-protocol interoperability • IBM, Compaq, and others to provide full connectivity

  35. Compute Cluster Components: Nodes • Node Width: ?-way SMP • Special Service Nodes: • I/O Nodes • Visualization • Front End • Cluster Management • Node Qualities: • Form Factor • Remote Monitoring • I/O Performance

  36. Thin Node Cluster. Diagram: four nodes, each with two CPUs and I/O on an internal bus or crossbar (X), connected through an external switch

  37. Scalable Hybrid Architecture: Fat Node Cluster. Diagram: four nodes, each with four CPUs and I/O on an internal bus, crossbar or scalable interconnect (X), connected through an external switch

  38. Scalable Hybrid Architecture: • Scalable Internal Interconnect • Scalable External Network • Switch • Multiple External Network Devices • Cluster API: MPI • Use the internal interconnect only for communication within a node • Support use of multiple external network devices • Multiple threads communicating increases message-passing bandwidth
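Keeping intra-node traffic on the internal interconnect and spreading inter-node traffic across several external devices amounts to a per-message routing decision. The sketch below is hypothetical: the rank-to-node map, the device names (`nic0`, `nic1`) and the round-robin policy are assumptions chosen for illustration, not a description of SGI's MPI implementation.

```python
from __future__ import annotations

from itertools import cycle


class HybridRouter:
    """Chooses a transport for each message between two MPI-style ranks."""

    def __init__(self, rank_to_node: dict[int, str], nics: list[str]) -> None:
        if not nics:
            raise ValueError("need at least one external network device")
        self.rank_to_node = rank_to_node
        self._next_nic = cycle(nics)     # rotate inter-node traffic across NICs

    def route(self, src: int, dst: int) -> str:
        if self.rank_to_node[src] == self.rank_to_node[dst]:
            return "shared-memory"        # same fat node: internal interconnect
        return next(self._next_nic)       # different nodes: next external device


# Two 4-CPU nodes, one rank per CPU.
router = HybridRouter({r: "node0" if r < 4 else "node1" for r in range(8)},
                      ["nic0", "nic1"])
print(router.route(0, 3))                       # shared-memory
print([router.route(0, 5) for _ in range(3)])   # ['nic0', 'nic1', 'nic0']
```

Round-robin is only one possible policy; with several threads sending at once, each could instead be bound to its own device so that the external links are driven in parallel.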

  39. Fat Node Advantages • Larger Shared Memory environment: • More Applications • Higher Performance Potential • Shared Memory Latencies/Bandwidths • Add parallelism to supporting applications • Easier System Administration • Less complex, Fatter network: • Fewer wires, higher bandwidth • Requires MPI mixed-mode support

  40. Hybrid Programming Models • Parallelism Granularity and Implementation Layers for Hybrid MPI/OpenMP • OpenMP -> MPI -> MPI/OpenMP • Automatic Parallelization: Based on OpenMP Library • OpenMP: Loop Level: Fine Grain • OpenMP: Threaded Implementations: Coarse Grain • MPI: Coarse Grain • MPI/OpenMP: Coarse Grain/Fine Grain
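The hybrid MPI/OpenMP model splits work at two levels: coarse-grained blocks across processes (typically one per node), then fine-grained, loop-level parallelism among threads inside each node. The sketch below shows only the index arithmetic of that two-level decomposition, with a thread pool standing in for OpenMP threads and a plain loop standing in for MPI ranks; the helper names are hypothetical, and in CPython the threads illustrate structure rather than speed up arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n: int, parts: int, index: int) -> range:
    """Block `index` of `parts` nearly equal contiguous blocks of range(n)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


def hybrid_sum_of_squares(n: int, ranks: int, threads: int) -> int:
    if ranks < 1 or threads < 1:
        raise ValueError("ranks and threads must be at least 1")
    total = 0
    for rank in range(ranks):                        # coarse grain: one block per "MPI rank"
        outer = block_range(n, ranks, rank)
        inner = [block_range(len(outer), threads, t) for t in range(threads)]
        chunks = [outer[r.start:r.stop] for r in inner]
        with ThreadPoolExecutor(max_workers=threads) as pool:  # fine grain: "OpenMP threads"
            total += sum(pool.map(lambda c: sum(i * i for i in c), chunks))
    return total


print(hybrid_sum_of_squares(100, ranks=3, threads=4))   # 328350
```

In a real hybrid code the ranks run concurrently on separate nodes and their totals are combined with an MPI reduction; the loop here visits them one after another only to keep the example self-contained.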

  41. SGI Advanced Cluster Environment (ACE): Comprehensive compute cluster package • Programming Environment • Load Sharing and Scheduling Tools • Administration Tools - Single System View • Performance Management Tools • Interconnect drivers • Cluster-wide accounting • Shared File System • High Availability Framework • Professional/Managed Services

  42. Summary • Linux clusters provide • A wide range of computing options • High Availability • Throughput • Capacity • Flexibility • Price/performance • Expandability • ‘Best solution’ requires integration of commodity systems, open source solutions and specific value-add components
