Sun Clusters, Ira Pramanick, Sun Microsystems, Inc.

Presentation Transcript


  1. Sun Clusters, Ira Pramanick, Sun Microsystems, Inc.

  2. Outline • Today’s vs. tomorrow’s clusters • How they are used today and how this will change • Characteristics of future clusters • Clusters as general-purpose platforms • How they will be delivered • Sun’s Full Moon architecture • Summary & conclusions

  3. Clustering Today [diagram: cluster nodes behind two IP switches on the LAN/WAN] • Mostly for HA • Little sharing of resources • Exposed topology • Hard to use • Layered on OS • Reactive solution

  4. Clustering Tomorrow [diagram: global networking, global storage, LAN/WAN, central console]

  5. Sun Full Moon architecture • Turns clusters into general-purpose platforms • Cluster-wide file systems, devices, networking • Cluster-wide load-balancing and resource management • Integrated solution • HW, system SW, storage, applications, support/service • Embedded in Solaris 8 • Builds on existing Sun Cluster line • Sun Cluster 2.2 -> Sun Cluster 3.0

  6. Characteristics of tomorrow’s clusters • High-availability • Cluster-wide resource sharing: files, devices, LAN • Flexibility & Scalability • Close integration with the OS • Load-balancing & Application management • Global system management • Integration of all parts: HW, SW, applications, support, HA guarantees

  7. High Availability • End-to-end application availability • What matters: Applications as seen by network clients are highly-available • Enable Service Level Agreements • Failures will happen • SW, HW, operator errors, unplanned maintenance, etc. • Mask failures from applications as much as possible • Mask application failures from clients
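One way to picture "mask application failures from clients" is a client-side retry against a logical service address, so that a restart or failover behind that address stays invisible to the caller. This is only a minimal sketch; the service name, port, and retry policy below are invented for illustration and are not part of Sun Cluster.

```python
import socket
import time

# Hypothetical logical service address; the cluster decides which node answers.
SERVICE_ADDR = ("db.cluster.example.com", 5432)

def call_service(payload: bytes, retries: int = 5, delay: float = 2.0) -> bytes:
    """Send a request; retry so a node failover stays invisible to the client."""
    for attempt in range(retries):
        try:
            with socket.create_connection(SERVICE_ADDR, timeout=5) as sock:
                sock.sendall(payload)
                return sock.recv(4096)
        except OSError:
            # The service may be restarting on another node; back off and retry.
            time.sleep(delay)
    raise RuntimeError("service unavailable after failover retries")
```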

  8. High Availability... • No single point of failure • Use multiple components for HA & scalability • Need strong HA foundation integrated into OS • Node group membership, with quorum • Well-defined failure boundaries--no shared memory • Communication integrated with membership • Storage fencing • Transparently restartable services
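The quorum idea behind node group membership can be sketched very simply: a partition may continue only if it holds a strict majority of the configured votes, which prevents both halves of a split cluster from claiming the storage. The vote counts below are illustrative and this is not Sun Cluster's actual quorum algorithm.

```python
def has_quorum(live_votes: int, total_votes: int) -> bool:
    """A partition keeps running only with a strict majority of configured votes."""
    return live_votes > total_votes // 2

# Example: 4 nodes with one vote each, plus a quorum device worth one vote.
total = 4 + 1
print(has_quorum(live_votes=3, total_votes=total))  # True: majority partition survives
print(has_quorum(live_votes=2, total_votes=total))  # False: minority partition is fenced off
```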

  9. High Availability... • Applications are the key • Most applications are not cluster-aware • Mask most errors from applications • Restart when node fails, with no recompile • Provide support for cluster-aware apps • Cluster APIs, fast communication • Disaster recovery • Campus-separation and geographical data replication
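Restarting a non-cluster-aware application "with no recompile" can be pictured as a small supervisor that simply relaunches the unmodified start command when the process dies, on whichever node now hosts the service. The command and polling interval are placeholders, not the Sun Cluster agent framework.

```python
import subprocess
import time

# Placeholder start command for an application that knows nothing about the cluster.
START_CMD = ["/opt/app/bin/server", "--config", "/etc/app.conf"]

def supervise(cmd, backoff_seconds: float = 5.0) -> None:
    """Relaunch the unmodified application whenever its process exits."""
    while True:
        proc = subprocess.Popen(cmd)
        proc.wait()                    # blocks until the application dies
        time.sleep(backoff_seconds)    # back off, then restart with no recompile
```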

  10. Resource Sharing • What is important to applications? • Ability to run on any node in cluster • Uniform global access to all storage and network • Standard system APIs • What to hide? • Hardware topology, disk interconnect, LAN adapters, hardwired physical names

  11. Resource Sharing... • What is needed? • Cluster-wide access to existing file systems, volumes, devices, tapes • Cluster-wide access to LAN/WAN • Standard OS APIs: no application rewrite/recompile • Use SMP model • Apps run on machine (not “CPU 5, board 3, bus 2”) • Logical resource names independent of actual path
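The "logical resource names independent of actual path" point can be illustrated with a small name map: applications open a cluster-wide logical name, and a per-node table resolves it to whatever physical path exists locally. The node names and device paths below are invented for the example.

```python
# Hypothetical per-node resolution of cluster-wide logical device names.
LOGICAL_TO_PHYSICAL = {
    "node1": {"/global/oracle-data": "/dev/dsk/c1t0d0s6"},
    "node2": {"/global/oracle-data": "/dev/dsk/c3t2d0s6"},
}

def resolve(node: str, logical_name: str) -> str:
    """Applications use the logical name; the cluster picks the local physical path."""
    return LOGICAL_TO_PHYSICAL[node][logical_name]

print(resolve("node2", "/global/oracle-data"))  # /dev/dsk/c3t2d0s6
```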

  12. Resource Sharing... • Cluster-wide location-independent resource access • Run applications on any node • Failover/switchover apps to any node • Global job/work queues, print queues, etc. • Change/maintain hardware topology without affecting applications • Does not require a fully-connected SAN • The main interconnect can provide access through software support
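A global work queue can be sketched as one shared queue that any node drains, so work is never tied to the node that submitted it. The in-process queue below is only a stand-in for a cluster-wide service; the job and node names are invented.

```python
import queue

# Stand-in for a cluster-wide job queue visible from every node.
global_jobs: "queue.Queue[str]" = queue.Queue()

def submit(job: str) -> None:
    global_jobs.put(job)

def worker(node: str) -> None:
    """Any node may pick up any job because storage and network access are global."""
    while not global_jobs.empty():
        print(f"{node} runs {global_jobs.get()}")

submit("nightly-report")
submit("index-rebuild")
worker("node3")
```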

  13. Flexibility • Business needs change all the time • Therefore, platform must be flexible • System must be dynamic -- all done on-line • Resources can be added and removed • Dynamic reconfiguration of each node • Hot-plug in and out of IO, CPUs, memory, storage, etc. • Dynamic reconfiguration between nodes • More nodes, load-balancing, application reconfiguration

  14. Scalability • Cluster SMP nodes • Choose nodes as big as needed to scale the application • Leave expansion room within nodes too • Don't rely on clustering alone to scale applications • Interconnect speed is slower than backplane speed • Few applications are cluster-aware • Clustering a large number of small nodes is like herding chickens

  15. Close integration with the OS • Currently: multi-CPU SMP support lives in the OS • It would not make sense anywhere else • Next step: cluster support in the OS • The next dimension of OS support: across nodes • Clustering will become part of the OS • Not a loosely-integrated layer

  16. Advantages of OS integration • Ease of use • Same administration model, commands, installation • Availability • Integrated heartbeat, membership, fencing, etc. • Performance • In-kernel support, inter-node/process messaging, etc. • Leverage • All OS features/support available for clustering

  17. Load-balancing • Load-balancing done at various levels • Built-in network load-balancing • For example, incoming http requests; TCP/IP bandwidth • Transactions at middleware level • Global job queues • All nodes have access to all storage and network • Therefore any node can be eligible to perform the work
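Built-in network load-balancing of incoming requests can be pictured as a round-robin spread of connections across the nodes hosting a service; real implementations also weigh node load and health, which this sketch omits. The node names are illustrative.

```python
import itertools

SERVICE_NODES = ["node1", "node2", "node3", "node4"]  # nodes hosting the http service
_next_node = itertools.cycle(SERVICE_NODES)

def dispatch(request_id: int) -> str:
    """Round-robin each incoming request to the next eligible node."""
    node = next(_next_node)
    print(f"request {request_id} -> {node}")
    return node

for r in range(6):
    dispatch(r)
```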

  18. Resource management • Cluster-wide resource management • CPU, network, interconnect, IO bandwidth • Cluster-wide application priorities • Global resource requirements guaranteed locally • Need per-node resource management • High availability is not just making sure an application is started • Must guarantee the resources needed to finish the job
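The point that HA must also guarantee resources can be read as an admission check: before (re)starting an application on a node, verify that the node can still honor the application's CPU and memory reservation. The capacity figures and units below are illustrative, not a Sun Cluster mechanism.

```python
# Hypothetical per-node capacity: CPU shares and MB of memory.
NODE_CAPACITY = {"cpu": 16, "mem_mb": 32768}

def can_place(existing: list[dict], new_app: dict) -> bool:
    """Admit the application only if its reservation still fits on this node."""
    used_cpu = sum(a["cpu"] for a in existing)
    used_mem = sum(a["mem_mb"] for a in existing)
    return (used_cpu + new_app["cpu"] <= NODE_CAPACITY["cpu"]
            and used_mem + new_app["mem_mb"] <= NODE_CAPACITY["mem_mb"])

running = [{"cpu": 8, "mem_mb": 16384}]
print(can_place(running, {"cpu": 4, "mem_mb": 8192}))   # True: reservation fits
print(can_place(running, {"cpu": 12, "mem_mb": 8192}))  # False: would overcommit CPU
```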

  19. Global cluster management • System management • Perform administrative functions once • Maintain same model as single node • Same tools/commands as base OS--minimize retraining • Hide complexity • Most administrative operations should not deal with HW topology • But still enable low-level diagnostics and management
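"Perform administrative functions once" can be sketched as a wrapper that fans a single command out to every node. The ssh transport and node list below are assumptions made for the sketch; they are not the Sun Cluster management tools.

```python
import subprocess

CLUSTER_NODES = ["node1", "node2", "node3", "node4"]  # assumed node names

def cluster_exec(command: str) -> None:
    """Run one administrative command once and apply it to every node."""
    for node in CLUSTER_NODES:
        # ssh is only a stand-in transport for this sketch.
        subprocess.run(["ssh", node, command], check=True)

# Example (requires reachable nodes): cluster_exec("uptime")
```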

  20. A Total Clustering Solution • Integration of all components: applications, middleware, cluster OS software, servers, storage, cluster interconnect, system management, service and support, HA guarantee practice

  21. Roadmap • Sun Cluster 2.2: currently shipping • Solaris 2.6, Solaris 7, Solaris 8 3/00 • 4 nodes • Year 2000 compliant • Choice of servers, storage, interconnects, topologies, networks • 10 km separation • Sun Cluster 3.0 • External Alpha 6/99, Beta Q1 CY‘00, GA 2H CY‘00 • 8 nodes • Extensive set of new features: cluster file system, global devices, network load-balancing, new APIs (RGM), diskless application failover, SyMON integration

  22. Wide Range of Applications • Agents developed, sold, and supported by Sun • Databases (Oracle, Sybase, Informix, Informix XPS), SAP • Netscape (http, news, mail, LDAP), Lotus Notes • NFS, DNS, Tivoli • Sold and supported by 3rd parties • IBM DB2 and DB2 PE, BEA Tuxedo • Agents developed and supported by Sun Professional Services • A large list, including many in-house applications • Toolkit for agent development • Application management API, training, Sun PS support

  23. Embedded in Solaris 8 • Full Moon clustering • Global file system • Global devices • Global storage • Global networking • Built-in load balancing • Global resource management • Global application management • Cluster APIs • Single management console • Dynamic domains • Wide range of HW

  24. Summary • Clusters as general-purpose platforms • Shift from reactive to proactive clustering solution • Clusters must be built on a strong foundation • Embed into a solid operating system • Full Moon -- bakes clustering technology into Solaris • Make clusters easy to use • Hide complexity, hardware details • Must be an integrated solution • From platform, service/support, to HA guarantees
