Clustering

Types of Clustering

Objectives At the end of this module the student will understand the following tasks and concepts. • What clustering is and why you would want it • Clustering options • Differences between various types of clustering; advantages and disadvantages • Factors to consider when choosing a cluster type

What is a cluster? • My definition • Multiple systems performing a single function • Black box

Why Cluster? • Performance • Availability • Recoverability

Features • Speedup • Faster response times • Transactions finish faster • Scaleup • More work done • More capacity, more concurrent transactions • Scalability
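
The two measures on this slide can be written as simple ratios. The sketch below assumes the usual definitions: speedup compares elapsed time for the same workload, and scaleup compares the work completed in the same elapsed time. The function names and sample figures are illustrative and do not come from the slides.

```python
def speedup(time_one_node: float, time_n_nodes: float) -> float:
    """Same workload: how many times faster it finishes on the larger system."""
    return time_one_node / time_n_nodes


def scaleup(work_one_node: float, work_n_nodes: float) -> float:
    """Same elapsed time: how many times more work the larger system completes."""
    return work_n_nodes / work_one_node


# Hypothetical figures: a batch job that takes 120 s on one node and 40 s on
# four nodes, and a system that completes 500 transactions per minute on one
# node and 1800 on four.
batch_speedup = speedup(120.0, 40.0)
oltp_scaleup = scaleup(500.0, 1800.0)
```

A value equal to the number of nodes added would be linear scaling; real clusters usually fall short of that because of coordination overhead.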

Single Node Scaling • Scales to multiple CPUs • Doesn’t scale beyond one node • Multiple single points of failure (Diagram: users, one server, database)

Cluster Definitions • Shared Nothing (Federated) • Replicated Site • Shared Disk • Failover • Active/Passive • Active/Active • Shared Everything

Shared Nothing Cluster • Only one CPU is connected to a disk • May have shared memory • MPP Systems are Shared Nothing • Other vendors have “Shared Nothing” clusters

Federated (Shared Nothing) Cluster • Distributed database (separate database on each machine) • Data is spread across nodes; each machine has part of the data • Function is spread across nodes • Two-Phase Commit (Diagram: two servers, each with its own database, exchanging the two-phase commit messages 1. “Got it?” 2. “Got it!” 3. “Good!”)
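
Because each node in a federated cluster holds only part of the data, a transaction that touches several nodes needs a two-phase commit. The following is a minimal in-memory sketch of that protocol, not any vendor's implementation; the `Participant` class and `two_phase_commit` function are hypothetical names, and real systems also handle timeouts, logging, and coordinator failure.

```python
class Participant:
    """One node in the federated cluster, holding its own part of the data."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # assumption: stands in for local checks
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Return True if the transaction committed on every participant."""
    # Phase 1 (prepare): ask every node to vote.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every node voted yes.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one node voted no, so roll back everywhere.
    for p in participants:
        p.rollback()
    return False
```

The point of the two phases is that no node commits until every node has promised it can, so the separate databases never disagree about the outcome.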

Replicated System • Data replicated at the server (network) level or at the storage (SAN) level • Multiple copies of the same database • Most common implementation is Active/Passive • Failover between nodes (Diagram: an active node and a passive node, each with its own database, linked by server-level or storage-level replication)

Shared Disk Cluster • Shared file system • Multiple systems attached to the same disk • All nodes must have access to data • Only one database instance; only one node has “ownership” of the shared disk • Synchronization between systems; if one node fails, the other takes over

Cluster Interconnect • Most Shared Disk clusters require some form of Cluster Interconnect • Network – e.g. Gigabit Ethernet • Specialized – e.g. InfiniBand, Myrinet • Most clusters implement a “heartbeat” between cluster nodes to monitor node health • Multiple nodes require a switch • Usually separated from the LAN • Some shared disk clusters implement a “heartbeat” mechanism to a quorum disk via the SAN in addition to/instead of network heartbeat • Oracle RAC implements Cache Fusion across the interconnect • Extra network traffic increases the throughput requirements • UDP implementation requires a separate network
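
The heartbeat idea can be illustrated with a small monitor that records when each node was last heard from and reports any node that has been silent for too long. This is a sketch under stated assumptions: the `HeartbeatMonitor` class is hypothetical, timestamps are passed in explicitly rather than read from a clock, and the timeout value is arbitrary.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each cluster node."""

    def __init__(self, timeout_seconds: float):
        self.timeout = timeout_seconds
        self.last_seen = {}  # node name -> time of last heartbeat

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat arrives over the interconnect.
        self.last_seen[node] = now

    def failed_nodes(self, now: float):
        # A node is presumed failed once it has been silent longer than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record("node_a", now=0.0)
monitor.record("node_b", now=0.0)
monitor.record("node_a", now=4.0)          # node_b has gone quiet
suspects = monitor.failed_nodes(now=5.0)   # only node_b exceeds the timeout
```

A missed heartbeat only shows that a node cannot be heard, not that it has stopped, which is one reason some clusters add a quorum disk as a second check.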

Failover Cluster • One system is a standby system for another • Only one system doing work at a time • Pseudo-Shared Disk • Limited scalability in active/passive mode

Failover Clustering • Fault tolerant systems; highly available • Basic failover clusters don’t scale beyond two nodes (Diagram: users, two servers, database storage)

Active/Passive vs. Active/Active • Both are failover only • Active/Passive • One node is active • The other is passive until failover • Active/Active • Still uses active/passive technology • 2 separate databases • One is active on node A and passive on node B • The second database is active on node B and passive on node A. • Separate applications and user connections to each of the different databases

Active/Passive • Node A is active • Node B is passive until/unless Node A fails • Only one Oracle license is required (Diagram: Node A, Node B)

Active/Passive • If Node A fails … (Diagram: Node A marked failed with an X; Node B)

Active/Passive • Node B becomes active • Node A is dead (definitely passive!) until repaired and then “failed back” if necessary. (Diagram: Node A marked failed with an X; Node B)

Active/Active • Application Group A and User Group A are active on Node A • Application Group B and User Group B are active on Node B • Each node serves as failover for the other. • 2 separate databases. Both nodes are not accessing the same data at the same time. • Oracle license required on each node (Diagram: Node A runs Application A for User Group A and is the passive failover for B; Node B runs Application B for User Group B and is the passive failover for A)
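
An Active/Active pair can be modeled as two Active/Passive arrangements pointing in opposite directions. The sketch below uses hypothetical names (`serving_node`, `placement`, `db_a`, `db_b`) and makes one simplifying assumption: a database returns to its primary node as soon as that node is healthy again, whereas in practice failback is a deliberate step.

```python
from typing import Dict, Optional, Set, Tuple


def serving_node(primary: str, standby: str, failed: Set[str]) -> Optional[str]:
    """Return the node currently running a database, or None if both are down."""
    if primary not in failed:
        return primary
    if standby not in failed:
        return standby
    return None


# Two separate databases, each active on one node and passive on the other.
DATABASES: Dict[str, Tuple[str, str]] = {
    "db_a": ("node_a", "node_b"),  # (primary, standby)
    "db_b": ("node_b", "node_a"),
}


def placement(failed: Set[str]) -> Dict[str, Optional[str]]:
    """Map each database to the node serving it, given the set of failed nodes."""
    return {
        db: serving_node(primary, standby, failed)
        for db, (primary, standby) in DATABASES.items()
    }
```

With no failures each node serves its own database; if Node A fails, Node B carries both, which is why each node needs enough spare capacity for the combined load.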

Switchover vs. Failover • Many cluster systems utilize the concept of Service Groups • Service Groups allow granular control of individual software packages (e.g. individual Oracle instances) • An individual group can be manually moved to another server without affecting other service groups – a “switchover” versus a “failover” • Adds greater management flexibility
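
The difference between the two operations can be shown with a toy model of service groups. The `Cluster` class and the group names are hypothetical; the sketch only captures the scope of each operation (one group moved on request versus every group on a failed node moved at once).

```python
class Cluster:
    """Toy model: each service group runs on exactly one node."""

    def __init__(self, placement):
        self.placement = dict(placement)  # service group -> node

    def switchover(self, group: str, target: str) -> None:
        """Planned, manual move of one service group; other groups are untouched."""
        self.placement[group] = target

    def failover(self, failed_node: str, target: str):
        """Unplanned: every group on the failed node moves to the target node."""
        moved = sorted(g for g, n in self.placement.items() if n == failed_node)
        for group in moved:
            self.placement[group] = target
        return moved


cluster = Cluster({"oracle_1": "node_a", "oracle_2": "node_a", "web": "node_b"})
cluster.switchover("oracle_1", "node_b")          # e.g. before patching node_a
moved_groups = cluster.failover("node_a", "node_b")
```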

N-to-1 Failover Configuration • Node D is a dedicated failover node for failures on Node A, B, and C • Extends number of active nodes • A problem is that once the failed node is available, the Service Groups on Node D (failover node) must fail back to the original server to restore High Availability (Diagram: Nodes A, B, and C each run three application/user groups, A to C, D to F, and G to I; Node C is marked failed with an X, its groups G, H, and I fail over to Node D and later fail back)

N + 1 Failover Configuration • Node D is a dedicated failover node for failures on Node A, B, and C • Extends number of active nodes • Once Node C is restored, it becomes the failover node, leaving Node D in production. (Diagram: Nodes A, B, and C each run three application/user groups, A to C, D to F, and G to I; Node C is marked failed with an X and its groups G, H, and I fail over to Node D)
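
N-to-1 and N + 1 behave identically when a node fails; they differ in what happens once the failed node is repaired. The sketch below makes that explicit. All function and node names are hypothetical, and each function returns the new placement together with the node that now acts as the spare.

```python
def fail_node(placement, failed, spare):
    """Both designs: move every service group from the failed node to the spare."""
    return {g: (spare if n == failed else n) for g, n in placement.items()}


def restore_n_to_1(placement, restored, spare):
    """N-to-1: groups fail back to the repaired node; the spare stays the spare."""
    new_placement = {g: (restored if n == spare else n) for g, n in placement.items()}
    return new_placement, spare


def restore_n_plus_1(placement, restored, spare):
    """N + 1: groups stay where they are; the repaired node becomes the new spare."""
    return dict(placement), restored


start = {"app_a": "node_a", "app_d": "node_b", "app_g": "node_c"}
after_failure = fail_node(start, failed="node_c", spare="node_d")
n_to_1_result = restore_n_to_1(after_failure, restored="node_c", spare="node_d")
n_plus_1_result = restore_n_plus_1(after_failure, restored="node_c", spare="node_d")
```

In the N-to-1 case the failback is a second interruption for the affected groups; N + 1 avoids it by letting the spare role move to whichever node was most recently repaired.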

N-to-N Failover Configuration • Node C fails, and its Service Groups are re-distributed across surviving nodes • Optimal solution for > 2 nodes • Implemented on third party failover clusters and Oracle RAC (Diagram: Nodes A, B, C, and D each run three application/user groups, A to C, D to F, G to I, and J to L; Node C is marked failed with an X and its groups G, H, and I fail over to the other nodes)
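
Redistribution in an N-to-N configuration can be sketched as assigning each orphaned service group to a surviving node. The placement policy here (pick the least-loaded survivor, break ties by node name) is an assumption chosen for illustration; real cluster software uses its own, usually configurable, rules. The `redistribute` function and the group names are hypothetical.

```python
def redistribute(placement, nodes, failed):
    """Reassign each group on the failed node to the least-loaded surviving node."""
    survivors = sorted(n for n in nodes if n != failed)
    # Count the groups already running on each surviving node.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node in load:
            load[node] += 1

    new_placement = dict(placement)
    orphaned = sorted(g for g, n in placement.items() if n == failed)
    for group in orphaned:
        # min() returns the first least-loaded node in name order.
        target = min(survivors, key=lambda n: load[n])
        new_placement[group] = target
        load[target] += 1
    return new_placement


nodes = ["node_a", "node_b", "node_c", "node_d"]
before = {
    "A": "node_a", "B": "node_a", "C": "node_a",
    "D": "node_b", "E": "node_b", "F": "node_b",
    "G": "node_c", "H": "node_c", "I": "node_c",
    "J": "node_d", "K": "node_d", "L": "node_d",
}
after = redistribute(before, nodes, failed="node_c")
```

With the layout from the slide, the three groups from Node C end up one per surviving node, so no single node absorbs the whole extra load.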

Third Party Clusters • Support for extended cluster nodes – up to 32 nodes for vendor clustering • Supports N + 1 and N-to-N failover clustering • Integrated with hardware and/or software replication for long distance “clusters”

Clustering Solutions from Oracle • Oracle Fail Safe • Oracle Data Guard • Advanced Replication • Shared Nothing Cluster • Oracle Parallel Server • Real Application Clusters (RAC)

Oracle Fail Safe • MS Clustering enabled • Two servers, one disk subsystem • Switches in the event of a hardware failure • Requires recovery

Standby Database • Copy of Database (usually remote) • Kept up to date with Archive Logs • Oracle 8i feature • Oracle 9i-10g version of a standby database is Data Guard

Oracle Data Guard • Mirrored Server • Physical Standby • Archive Logs are applied to the remote database • Failover occurs in the event of a failure • Logical Standby • LogMiner technology is used to generate SQL • Standby Database can also be used for read-only reporting • Advantages • Safe from user failure • Can be in different location • No recovery required
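
The physical standby idea, keeping a remote copy current by applying archived logs in order, can be illustrated with a toy model. This is not how Oracle implements log apply; the `Standby` class, the sequence numbers, and the key/value changes are all hypothetical and only show why logs must be applied in sequence with no gaps.

```python
class Standby:
    """Toy standby database that replays archived logs in sequence order."""

    def __init__(self):
        self.data = {}
        self.applied_through = 0  # sequence number of the last applied log

    def apply(self, archived_logs) -> None:
        """archived_logs maps a sequence number to a list of (key, value) changes."""
        for seq in sorted(archived_logs):
            if seq <= self.applied_through:
                continue  # already applied, skip
            if seq != self.applied_through + 1:
                break     # gap in the sequence: wait for the missing log
            for key, value in archived_logs[seq]:
                self.data[key] = value
            self.applied_through = seq


standby = Standby()
# Log 3 has not arrived yet, so log 4 must wait.
standby.apply({1: [("x", 1)], 2: [("x", 2), ("y", 5)], 4: [("y", 9)]})
state_with_gap = (standby.applied_through, dict(standby.data))
# Once log 3 arrives, logs 3 and 4 are both applied.
standby.apply({3: [("z", 0)], 4: [("y", 9)]})
```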

Advanced Replication • Uses Updatable-Snapshots • Replicates to another system • Systems stay in sync

Oracle Parallel Server • Shared disk cluster product • Loosely Coupled • Scalable performance • No downtime in the event of a system failure • Replaced by RAC in 9i

True Shared Disk Server (RAC) • ONE database • Separate multiple instances (processes & memory) • All nodes can access data simultaneously • Shared Everything Cluster • Transparent Application Failover • Oracle license required on each node • Highest level of cluster functionality (Diagram: Node A, Node B)

Factors to Consider for Clustering • Which do you need most? • High Availability – Failover Clusters, Synchronous Replication, Data Guard • Performance scalability – Active/Active failover clusters, N-to-N failover clusters • Both – Oracle RAC • Administration complexity • Failover clusters – relatively low • Oracle RAC – relatively high • Substantially less complex for 10g RAC than 9i RAC • Local or long distance? • Local – Failover, RAC • Remote – Federated database, Replication, Standby database/Data Guard • Oracle license costs • Active/Passive failover clusters – active nodes only • Active/Active failover clusters, RAC – per node

Review • What type of commit is required for a Federated (shared nothing) cluster? • What is the difference in how the database is kept up-to-date in Oracle Data Guard vs. Advanced Replication? • What is the difference between N-to-1 failover clusters and N + 1 failover clusters? • How many databases are there in an 8 node Oracle RAC cluster?

Summary • Types of clusters: • Shared Nothing Clusters • Federated databases • Replication • Shared Disk Clusters • Failover • Oracle RAC • Failover Clusters • Active/Passive • Active/Active • N-to-1 • N + 1 • N-to-N • Shared Everything Clusters • Oracle RAC • Choosing a cluster type involves trade-offs in functionality, costs, and administration complexity
