Sun Clusters • Ira Pramanick • Sun Microsystems, Inc.
Outline • Today’s vs. tomorrow’s clusters • How they are used today and how this will change • Characteristics of future clusters • Clusters as general-purpose platforms • How they will be delivered • Sun’s Full Moon architecture • Summary & conclusions
Clustering Today (diagram: clients on a LAN/WAN reaching the cluster through redundant IP switches) • Mostly for HA • Little sharing of resources • Exposed topology • Hard to use • Layered on the OS • A reactive solution
Clustering Tomorrow (diagram: global networking and global storage spanning all nodes, reached over the LAN/WAN and administered from a single central console)
Sun Full Moon architecture • Turns clusters into general-purpose platforms • Cluster-wide file systems, devices, networking • Cluster-wide load-balancing and resource management • Integrated solution • HW, system SW, storage, applications, support/service • Embedded in Solaris 8 • Builds on existing Sun Cluster line • Sun Cluster 2.2 -> Sun Cluster 3.0
Characteristics of tomorrow’s clusters • High-availability • Cluster-wide resource sharing: files, devices, LAN • Flexibility & Scalability • Close integration with the OS • Load-balancing & Application management • Global system management • Integration of all parts: HW, SW, applications, support, HA guarantees
High Availability • End-to-end application availability • What matters: applications as seen by network clients are highly available • Enable Service Level Agreements • Failures will happen • SW, HW, operator errors, unplanned maintenance, etc. • Mask failures from applications as much as possible • Mask application failures from clients
High Availability... • No single point of failure • Use multiple components for HA & scalability • Need strong HA foundation integrated into OS • Node group membership, with quorum • Well-defined failure boundaries--no shared memory • Communication integrated with membership • Storage fencing • Transparently restartable services
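The quorum rule in the membership bullet above is easy to illustrate. Below is a minimal sketch in C with invented names (it is not the Sun Cluster API): a partition may stay up only if it holds a strict majority of the configured votes, which guarantees that a split cluster can never form two active halves.

```c
/* Minimal sketch of a majority-quorum decision (illustrative names,
 * not the Sun Cluster API). A partition may continue only if it holds
 * a strict majority of all configured votes, so at most one partition
 * of a split cluster can ever stay up. */
#include <stdio.h>

static int has_quorum(int votes_present, int votes_configured)
{
    return votes_present > votes_configured / 2;
}

int main(void)
{
    /* A 4-node cluster split 2/2: neither side has quorum, which is
     * why a tie-breaking quorum device (e.g., a shared disk) helps. */
    printf("2 of 4 votes: %s\n", has_quorum(2, 4) ? "quorum" : "no quorum");
    printf("3 of 4 votes: %s\n", has_quorum(3, 4) ? "quorum" : "no quorum");
    return 0;
}
```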
High Availability... • Applications are the key • Most applications are not cluster-aware • Mask most errors from applications • Restart when node fails, with no recompile • Provide support for cluster-aware apps • Cluster APIs, fast communication • Disaster recovery • Campus-separation and geographical data replication
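How can a framework restart an application that is not cluster-aware, with no recompile? A hedged sketch: supervise the unmodified binary as a child process and re-exec it whenever it exits. The wrapper below is illustrative only (/usr/bin/true stands in for a real application binary); it is not Sun's agent implementation.

```c
/* Hedged sketch of a restart wrapper for a non-cluster-aware app:
 * run the unmodified binary as a child and restart it when it exits.
 * /usr/bin/true is a placeholder for a real application binary. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: exec the application unchanged -- no recompile. */
            execlp("/usr/bin/true", "app", (char *)NULL);
            _exit(127);                  /* reached only if exec fails */
        }
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        int status;
        waitpid(pid, &status, 0);        /* wait for the app to exit */
        fprintf(stderr, "app exited; restarting\n");
        sleep(1);                        /* back off before restarting */
    }
}
```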
Resource Sharing • What is important to applications? • Ability to run on any node in cluster • Uniform global access to all storage and network • Standard system APIs • What to hide? • Hardware topology, disk interconnect, LAN adapters, hardwired physical names
Resource Sharing... • What is needed? • Cluster-wide access to existing file systems, volumes, devices, tapes • Cluster-wide access to LAN/WAN • Standard OS APIs: no application rewrite/recompile • Use SMP model • Apps run on machine (not “CPU 5, board 3, bus 2”) • Logical resource names independent of actual path
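The "logical resource names" idea can be sketched as a level of indirection: the application opens a stable logical name, and the cluster resolves it to whichever physical path is currently healthy. The table entries and device paths below are invented for illustration; the actual global device namespace is more elaborate.

```c
/* Sketch of logical-name indirection: the application opens a stable
 * logical name; the cluster maps it to whichever physical path is
 * currently healthy. All names and paths below are invented. */
#include <stdio.h>
#include <string.h>

struct path_entry {
    const char *logical;    /* stable name the application sees */
    const char *physical;   /* one physical path backing it */
    int         healthy;    /* cleared when the path fails */
};

static struct path_entry table[] = {
    { "/global/dev/data1", "/dev/dsk/c1t0d0s2", 1 },
    { "/global/dev/data1", "/dev/dsk/c2t0d0s2", 1 },   /* alternate path */
};

static const char *resolve(const char *logical)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].healthy && strcmp(table[i].logical, logical) == 0)
            return table[i].physical;
    return "(no healthy path)";
}

int main(void)
{
    printf("data1 -> %s\n", resolve("/global/dev/data1"));
    table[0].healthy = 0;               /* simulate a path failure */
    printf("data1 -> %s\n", resolve("/global/dev/data1"));
    return 0;
}
```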
Resource Sharing... • Cluster-wide location-independent resource access • Run applications on any node • Failover/switchover apps to any node • Global job/work queues, print queues, etc. (sketched below) • Change/maintain hardware topology without affecting applications • But a fully-connected SAN is not required • The main interconnect can be used through software support
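A global work queue follows from the same property: since every node sees the same storage, any node may take the next job. The sketch below keeps the queue in memory so the example is self-contained; a real cluster would keep it on shared storage with proper locking, and the job names are made up.

```c
/* Sketch of a global job queue: any node may dequeue the next job
 * because all nodes see the same storage. Kept in memory here for a
 * self-contained example; a real cluster would use shared, locked
 * storage. */
#include <stdio.h>
#include <string.h>

#define MAX_JOBS 8

static char queue[MAX_JOBS][32];
static int head = 0, tail = 0;

static void enqueue(const char *job)
{
    if (tail - head < MAX_JOBS) {
        strncpy(queue[tail % MAX_JOBS], job, sizeof queue[0] - 1);
        tail++;
    }
}

static const char *dequeue(void)
{
    return head < tail ? queue[head++ % MAX_JOBS] : NULL;
}

int main(void)
{
    enqueue("print quarterly-report");
    enqueue("run payroll-batch");
    /* Whichever node polls first runs the job; "node3" here. */
    const char *job;
    while ((job = dequeue()) != NULL)
        printf("node3 runs: %s\n", job);
    return 0;
}
```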
Flexibility • Business needs change all the time • Therefore, platform must be flexible • System must be dynamic -- all done on-line • Resources can be added and removed • Dynamic reconfiguration of each node • Hot-plug in and out of IO, CPUs, memory, storage, etc. • Dynamic reconfiguration between nodes • More nodes, load-balancing, application reconfiguration
Scalability • Cluster SMP nodes • Choose nodes as big as needed to scale the application • Need expansion room within nodes too • Don't use clustering exclusively to scale applications • Interconnect speed is slower than backplane speed • Few applications are cluster-aware • Clustering a large number of small nodes is like herding chickens
Close integration with OS • Currently: multi-CPU SMP support lives in the OS • It would not make sense anywhere else • Next step: cluster support in the OS • The next dimension of OS support: across nodes • Clustering will become part of the OS • Not a loosely-integrated layer
Advantages of OS integration • Ease of use • Same administration model, commands, installation • Availability • Integrated heartbeat, membership, fencing, etc. • Performance • In-kernel support, inter-node/process messaging, etc. • Leverage • All OS features/support available for clustering
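The "integrated heartbeat" bullet can be made concrete with a timeout-based failure detector: each node records when a peer's heartbeat last arrived and suspects the peer after a quiet period. This is a simplified user-level sketch with an invented timeout and peer list; the real mechanism runs in-kernel and fences a suspect node before any takeover.

```c
/* Sketch of heartbeat-based failure detection: track when each peer's
 * heartbeat last arrived and suspect the peer after a quiet period.
 * Timeout and peers are invented; the real mechanism runs in-kernel
 * and fences a suspect node before any takeover. */
#include <stdio.h>
#include <time.h>

#define HEARTBEAT_TIMEOUT 5            /* seconds of silence allowed */

struct peer { const char *name; time_t last_beat; };

static int peer_alive(const struct peer *p, time_t now)
{
    return (now - p->last_beat) <= HEARTBEAT_TIMEOUT;
}

int main(void)
{
    time_t now = time(NULL);
    struct peer peers[] = {
        { "node2", now - 1  },         /* heartbeat 1 s ago: fine */
        { "node3", now - 30 },         /* silent for 30 s: suspect */
    };
    for (int i = 0; i < 2; i++)
        printf("%s: %s\n", peers[i].name,
               peer_alive(&peers[i], now) ? "alive" : "suspected failed");
    return 0;
}
```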
Load-balancing • Load-balancing done at various levels • Built-in network load-balancing • For example, incoming http requests; TCP/IP bandwidth • Transactions at middleware level • Global job queues • All nodes have access to all storage and network • Therefore any node can be eligible to perform the work
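Built-in network load-balancing can be as simple as rotating new connections across eligible nodes, since any node can serve the work once storage and networking are global. The sketch below shows plain round-robin over placeholder node names; it is an assumption-laden simplification, and real balancing would also weigh load and node health.

```c
/* Sketch of round-robin connection spreading, as a global networking
 * layer might do for incoming http requests. Node names are
 * placeholders; real balancing also weighs load and node health. */
#include <stdio.h>

static const char *nodes[] = { "node1", "node2", "node3", "node4" };

/* Rotate through the nodes; any node can take the request because
 * storage and networking are global. */
static const char *next_node(void)
{
    static int cursor = 0;
    const char *n = nodes[cursor];
    cursor = (cursor + 1) % (int)(sizeof nodes / sizeof nodes[0]);
    return n;
}

int main(void)
{
    for (int req = 1; req <= 6; req++)
        printf("request %d -> %s\n", req, next_node());
    return 0;
}
```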
Resource management • Cluster-wide resource management • CPU, network, interconnect, IO bandwidth • Cluster-wide application priorities • Global resource requirements guaranteed locally • Need per-node resource management • High availability is not just making sure an application is started • Must also guarantee the resources to finish the job
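Guaranteeing "the resources to finish the job" implies an admission check on the failover target: before accepting an application, verify that its declared CPU and memory needs still fit on the node. The structures and numbers below are invented to illustrate the idea, not Sun's resource manager.

```c
/* Sketch of per-node admission control: a failover target accepts an
 * application only if the app's declared needs still fit, so restarted
 * work is guaranteed the resources to finish. Numbers are invented. */
#include <stdio.h>

struct node_caps { double cpus_free; long mem_free_mb; };
struct app_needs { double cpus;      long mem_mb; };

static int admit(const struct node_caps *n, const struct app_needs *a)
{
    return n->cpus_free >= a->cpus && n->mem_free_mb >= a->mem_mb;
}

int main(void)
{
    struct node_caps node = { 1.5, 2048 };   /* spare CPU and memory */
    struct app_needs db   = { 2.0, 1024 };   /* too CPU-hungry: reject */
    struct app_needs web  = { 0.5,  256 };   /* fits: admit */

    printf("db:  %s\n", admit(&node, &db)  ? "admit" : "reject");
    printf("web: %s\n", admit(&node, &web) ? "admit" : "reject");
    return 0;
}
```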
Global cluster management • System management • Perform administrative functions once • Maintain same model as single node • Same tools/commands as base OS--minimize retraining • Hide complexity • Most administrative operations should not deal with HW topology • But still enable low-level diagnostics and management
A Total Clustering Solution (diagram: servers, storage, cluster interconnect, cluster OS software, middleware, applications, system management, service and support, HA guarantee practice) • Integration of all components
Roadmap • Sun Cluster 2.2: currently shipping • Solaris 2.6, Solaris 7, Solaris 8 3/00 • 4 nodes • Year 2000 compliant • Choice of servers, storage, interconnects, topologies, networks • 10 km separation • Sun Cluster 3.0 • External Alpha 6/99, Beta Q1 CY'00, GA 2H CY'00 • 8 nodes • Extensive set of new features: cluster file system, global devices, network load-balancing, new APIs (RGM), diskless application failover, SyMON integration
Wide Range of Applications • Agents developed, sold, and supported by Sun • Databases (Oracle, Sybase, Informix, Informix XPS), SAP • Netscape (http, news, mail, LDAP), Lotus Notes • NFS, DNS, Tivoli • Sold and supported by 3rd parties • IBM DB2 and DB2 PE, BEA Tuxedo • Agents developed and supported by Sun Professional Services • A large list, including many in-house applications • Toolkit for agent development • Application management API, training, Sun PS support
Embedded in Solaris 8 (diagram: Full Moon clustering layered into Solaris 8) • Built-in load balancing • Single management console • Global networking • Global resource management • Global application management • Cluster APIs • Wide range of HW • Global storage • Dynamic domains • Global file system • Global devices
Summary • Clusters as general-purpose platforms • Shift from reactive to proactive clustering solution • Clusters must be built on a strong foundation • Embed into a solid operating system • Full Moon -- bakes clustering technology into Solaris • Make clusters easy to use • Hide complexity, hardware details • Must be an integrated solution • From platform, service/support, to HA guarantees