Multi-Site Clustering with Windows Server 2008 Enterprise. Symon Perriman Program Manager Microsoft Corporation WSV316 . Multi-Site Clustering. Benefits Deployment Replication Networking Faster Failover Quorum Best Practices. Benefits of a Multi-Site Cluster.
Multi-Site Clustering with Windows Server 2008 Enterprise • Symon Perriman • Program Manager • Microsoft Corporation • WSV316
Multi-Site Clustering • Benefits • Deployment • Replication • Networking • Faster Failover • Quorum • Best Practices
Benefits of a Multi-Site Cluster • Protects Against Loss of an Entire Datacenter • Power outage, fires, hurricanes, floods, earthquakes, terrorism • Automates Failover • Reduced downtime • Lower complexity of disaster recovery plan • Reduces Administrative Overhead • Automatically synchronize application and cluster changes • Easier to keep consistent than unclustered servers • What is the primary reason why disaster recovery solutions fail? Dependence on People
Multi-Site Clustering Checklist • http://technet.microsoft.com/en-us/library/dd197546.aspx • Organized multi-site cluster deployment guide
Multi-Site Clustering Basics • 2+ physically separate sites • 1+ node at each site • Storage at each site with data replication • Application moves during a failover [Diagram: Site A and Site B, each with its own SAN]
Redundancy Everywhere • 2 or more computers (nodes) • 2 NICs • 3rd NIC for iSCSI • HBA • Fibre Channel (FC) • Serial Attached SCSI (SAS) • Multipath IO (MPIO) • Redundant Storage Interconnects • Replicated Storage • OS, Service or Application HA Roles
Mix and Match Hardware • You Can Use Any Hardware Configuration if • Each component has a Windows Server 2008 / R2 logo • Servers, Storage, HBAs, MPIO, etc… • It passes Validate • It’s That Simple! • Connect your Windows Server 2008 / R2 logo’d hardware • Pass every test in Validate • It is now supported! • If you make a change, just run Validate again • Details: http://go.microsoft.com/fwlink/?LinkID=119949
FCCP • Failover Cluster Configuration Program • Windows Server 2008 / R2 • Buy validated solutions • “Validated by Microsoft Failover Cluster Configuration Program” • Not required for Microsoft support, must be logo’d • More information: http://www.microsoft.com/windowsserver2008/en/us/clustering-program.aspx
Demo: Introduction to Multi-Site Clustering
Cluster Validation and Replication • Multi-Site clusters are not required to pass the Storage tests to be supported • Validation guide and policy: • http://go.microsoft.com/fwlink/?LinkID=119949
Why is Replication Needed? • Loss of a site won’t cause complete data loss • Data must exist on other site after a failover • Different storage needs than local clusters • Multiple storage arrays, independent on each site • Nodes usually access local site’s storage first [Diagram: changes are made on Site A and replicated to the replica on Site B]
Replication Solutions • Replication Levels • Hardware (block level) storage-based replication • Software (file system level) host-based replication • Application-based replication • Exchange Server 2007 CCR • Replication Types • Synchronous • Asynchronous • A data replication mechanism between sites is needed
Synchronous Replication • Host receives “write complete” response from the storage after the data is successfully written on both storage devices [Diagram labels: Write Request, Primary Storage, Replication, Secondary Storage, Acknowledgement, Write Complete]
Asynchronous Replication • Host receives “write complete” response from the storage after the data is successfully written to the primary storage device [Diagram labels: Write Request, Primary Storage, Write Complete, Replication, Secondary Storage]
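The two replication types differ only in when the host gets its acknowledgement. A minimal Python sketch of that difference follows. The names `Storage`, `write_sync`, `write_async` and `drain` are hypothetical, and an in-memory list stands in for each storage array; this is an illustration, not any vendor's replication engine.

```python
from collections import deque


class Storage:
    """Hypothetical in-memory stand-in for a storage array."""

    def __init__(self):
        self.blocks = []


def write_sync(primary, secondary, data):
    """Synchronous: acknowledge only after BOTH arrays hold the data."""
    primary.blocks.append(data)
    secondary.blocks.append(data)  # replication completes before the ack
    return "write complete"


def write_async(primary, pending, data):
    """Asynchronous: acknowledge as soon as the PRIMARY holds the data."""
    primary.blocks.append(data)
    pending.append(data)  # queued; replicated some time after the ack
    return "write complete"


def drain(pending, secondary):
    """Background replication: ship queued writes to the secondary."""
    while pending:
        secondary.blocks.append(pending.popleft())


primary, secondary, pending = Storage(), Storage(), deque()
write_sync(primary, secondary, "A")
write_async(primary, pending, "B")

# Writes that were acknowledged but would be missing on the secondary
# if the primary site were lost at this instant.
at_risk = list(pending)

drain(pending, secondary)
```

In this model the asynchronous write "B" is at risk until the background replication runs, which is the trade-off against the extra latency a synchronous write pays on every request.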
What About DFS-Replication? • DFS-R performs replication on file close • Some file types stay open for a very long time • VHDs for Virtual Machines • Databases for SQL Server • Data could be lost during a failover if it had not yet replicated • Using DFS-R to replicate the cluster disk’s data in a multi-site Failover Cluster is not supported
Resource Dependencies • Resource Group: the group determines the smallest unit of failover • Resource Dependencies: establish start order timing [Diagram: Workload Resource (example: File Server) “depends on” the resources beneath it: Network Name Resource, IP Address Resources*, Disk Resource, Custom Resource (manages replication)]
Network Considerations • Cluster nodes can reside in different subnets (2008/R2) • No need to connect nodes with VLANs [Diagram: Site A and Site B joined by a Public Network and a Separate Network; addresses shown: 10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1]
Stretching the Network • Longer distance means greater network latency • Too many missed health checks can cause false failover • Health check settings are fully configurable in 2008/R2 • Failover Clustering has NO DISTANCE & NO SUBNET LIMITATIONS • Check if your vendor’s hardware / replication has limitations • SameSubnetDelay (default = 1 second): how often heartbeats are sent • SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down • CrossSubnetDelay (default = 1 second): how often heartbeats are sent to nodes on dissimilar subnets • CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down • Command Line: Cluster.exe /prop • PowerShell (R2): Get-Cluster | fl *
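The delay and threshold settings combine into a rough detection time: heartbeat interval multiplied by the number of missed heartbeats tolerated. A small Python sketch of that arithmetic follows. The function name and the "relaxed" values are hypothetical illustrations, not recommended settings.

```python
def detection_time_seconds(delay_seconds, threshold):
    """Approximate time before an interface is considered down:
    heartbeat interval multiplied by the tolerated missed heartbeats."""
    return delay_seconds * threshold


# Defaults quoted on the slide: 1 second delay, 5 missed heartbeats.
default_cross_subnet = detection_time_seconds(1, 5)

# Hypothetical relaxed values for a high-latency WAN link (illustration only;
# check the limits your Windows Server version allows before changing them).
relaxed_cross_subnet = detection_time_seconds(2, 10)
```

Raising either value makes false failovers over a slow link less likely, at the cost of reacting more slowly to a real outage.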
Security Over the WAN • Improved Security • Prevent Clients from Connecting to Networks • Encrypt Intra-cluster Traffic • 0 = clear text • 1 = signed (default) • 2 = encrypted
Enhanced Dependencies – OR • Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up [Diagram: Network Name Resource depends on IP Address Resource A OR IP Address Resource B]
Resource Dependencies [Diagram: Workload Resource (example: File Server), Network Name Resource, IP Address Resources A OR IP Address Resources B, Disk Resource, Custom App (replication); one set of IP Address Resources comes online on site A, the other on site B]
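The OR dependency can be read as a small boolean expression over resource states. The Python sketch below evaluates such an expression. The tuple-based expression format and the function name `is_satisfied` are assumptions made for illustration and are not how the cluster service stores dependencies.

```python
def is_satisfied(expr, online):
    """Evaluate a dependency expression.

    expr is either a resource name (str) or a tuple (op, left, right)
    where op is "AND" or "OR". online maps resource names to True/False.
    """
    if isinstance(expr, str):
        return online.get(expr, False)
    op, left, right = expr
    if op == "OR":
        return is_satisfied(left, online) or is_satisfied(right, online)
    if op == "AND":
        return is_satisfied(left, online) and is_satisfied(right, online)
    raise ValueError(f"unknown operator: {op}")


# Network Name depends on IP Address A OR IP Address B.
network_name_deps = ("OR", "IP Address A", "IP Address B")

at_site_a = is_satisfied(network_name_deps,
                         {"IP Address A": True, "IP Address B": False})
at_site_b = is_satisfied(network_name_deps,
                         {"IP Address A": False, "IP Address B": True})
both_down = is_satisfied(network_name_deps,
                         {"IP Address A": False, "IP Address B": False})
```

With a plain AND dependency the Network Name could never come online in a multi-subnet cluster, because only the IP address that matches the current site's subnet can be online at any time.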
DNS Updates • Nodes in dissimilar subnets • Failover changes resource’s IP Address • Clients need that new IP Address from DNS to reconnect [Diagram: Site A and Site B with DNS Server 1 and DNS Server 2; addresses shown: FS = 10.10.10.111 and FS = 20.20.20.222; steps labeled Record Created, Record Updated, DNS Replication, Record Updated, Record Obtained]
Network Name Properties • RegisterAllProvidersIP (default = 0 for FALSE) • Determines if all IP Addresses for a Network Name will be registered by DNS • TRUE (1): IP Addresses can be online or offline and will still be registered • Ensure application is set to try all IP Addresses, so clients can come online quicker • HostRecordTTL (default = 1200 seconds) • Controls time the DNS record lives on client for a cluster network name • Shorter TTL: DNS records for clients updated sooner • Exchange Server 2007 recommends a value of five minutes (300 seconds)
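To see why a shorter HostRecordTTL helps, consider how long a client can keep resolving the old address after a cross-subnet failover. The Python sketch below is a deliberately simplified model: the function name and the `dns_replication_delay` parameter are hypothetical, and real resolver and DNS server caching can behave differently.

```python
def worst_case_stale_seconds(host_record_ttl, dns_replication_delay=0):
    """Simplified upper bound on how long a client may keep the old address.

    A client can fetch the old record from its DNS server just before the
    updated record arrives there, then cache it for the full TTL.
    """
    return dns_replication_delay + host_record_ttl


default_ttl = worst_case_stale_seconds(1200)  # default HostRecordTTL
short_ttl = worst_case_stale_seconds(300)     # five-minute TTL from the slide

# Hypothetical example: DNS replication between sites takes 15 minutes.
with_replication = worst_case_stale_seconds(300, dns_replication_delay=900)
```

In this model, lowering the TTL has limited effect if DNS replication between sites is slow, so both values are worth checking.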
Local Failover First • Local failover first • No change in IP Address • Cross-site failover for disaster recovery [Diagram: Site A and Site B with DNS Server 1 and DNS Server 2; addresses shown: FS = 10.10.10.111 and FS = 20.20.20.222]
Failover Order • Preferred Owners • Local failover first • Possible Owners Always Enforced • Resource will not start on non-possible owner • AntiAffinityClassNames • Groups with same AACN try to avoid moving to same node • http://msdn.microsoft.com/en-us/library/aa369651(VS.85).aspx
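The interplay of preferred and possible owners can be sketched as a simple selection function. This Python sketch is a simplified model under stated assumptions: the function name and node names are hypothetical, AntiAffinityClassNames is not modeled, and the real cluster service applies more rules than shown here.

```python
def pick_failover_target(preferred_owners, possible_owners, up_nodes, failed_node):
    """Return the first node that is up and is a possible owner, or None.

    Preferred owners are tried in list order; any remaining possible
    owners are tried afterwards.
    """
    candidates = [n for n in preferred_owners if n != failed_node]
    candidates += [n for n in possible_owners
                   if n not in candidates and n != failed_node]
    for node in candidates:
        # Possible owners are always enforced.
        if node in up_nodes and node in possible_owners:
            return node
    return None


# Hypothetical four-node cluster: A1, A2 at Site A; B1, B2 at Site B.
# Listing the Site A nodes first gives "local failover first".
nodes = ["A1", "A2", "B1", "B2"]

local = pick_failover_target(nodes, nodes, {"A2", "B1", "B2"}, "A1")
cross_site = pick_failover_target(nodes, nodes, {"B1", "B2"}, "A1")
restricted = pick_failover_target(nodes, ["A1", "A2"], {"B1", "B2"}, "A1")
```

Here `local` stays within Site A, `cross_site` moves to Site B only once Site A has no surviving node, and `restricted` finds no target because no surviving node is a possible owner.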
Virtual LAN (VLAN) • Deploying a VLAN minimizes client reconnection times • Can be harder to configure • Required for SQL & live migration [Diagram: Site A and Site B on one VLAN with DNS Server 1 and DNS Server 2; FS = 10.10.10.111 at both sites]
Demo: Multi-Site Clustering Groups and Settings
Quorum Overview • Majority is greater than 50% • Possible Voters: Nodes (1 each), Disk Witness (1 max), File Share Witness (1 max) • 4 Quorum Types: Node Majority, Node and Disk Majority, Node and File Share Majority, Disk Only (not recommended)
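The vote arithmetic behind "majority is greater than 50%" can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a cluster counts at most one witness vote, which follows from the four quorum types listed.

```python
def total_votes(nodes, disk_witness=False, file_share_witness=False):
    """Nodes get one vote each; a single witness adds one more vote."""
    if disk_witness and file_share_witness:
        raise ValueError("assumed: only one witness votes in a given quorum type")
    return nodes + int(disk_witness or file_share_witness)


def majority(votes):
    """Smallest vote count strictly greater than 50% of all votes."""
    return votes // 2 + 1


five_nodes = majority(total_votes(5))
four_nodes = majority(total_votes(4))
four_nodes_plus_fsw = majority(total_votes(4, file_share_witness=True))
```

All three cases need 3 votes. The difference is how many voters can be lost: a 4-node Node Majority cluster tolerates the loss of 1, while the same 4 nodes with a File Share Witness tolerate the loss of 2.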
Node and Disk Majority • Nodes get 1 vote each and Disk gets 1 vote • Loss of disk or node OK if majority is maintained • Do not use in multi-site clusters unless directed by vendor [Diagram: each node shown with a vote; the disk’s vote is marked “?”; Replicated Storage from vendor]
Node Majority • 5 Node Cluster: Majority = 3 • Majority in Primary Site • Cross site network connectivity broken! • Each node asks: can I communicate with a majority of the nodes in the cluster? • Yes: stay up • No: drop out of Cluster Membership [Diagram: Site A and Site B, each with its own SAN]
Node Majority • 5 Node Cluster: Majority = 3 • Majority in Primary Site • Disaster at Site A (the primary site) • Surviving nodes ask: can I communicate with a majority of the nodes in the cluster? • No: drop out of Cluster Membership • We are down! [Diagram: Site A and Site B, each with its own SAN]
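Both Node Majority scenarios can be replayed with a short Python sketch. The node names and the 3 + 2 split between sites are assumptions chosen to match "5 Node Cluster: Majority = 3" with the majority in the primary site.

```python
def surviving_partitions(partitions, total_votes):
    """Return the partitions (lists of node names) that hold a majority."""
    return [p for p in partitions if len(p) > total_votes / 2]


site_a = ["A1", "A2", "A3"]  # primary site holds the majority
site_b = ["B1", "B2"]

# Scenario 1: cross-site network connectivity is broken.
wan_break = surviving_partitions([site_a, site_b], total_votes=5)

# Scenario 2: disaster at the primary site; only Site B nodes remain.
primary_site_lost = surviving_partitions([site_b], total_votes=5)
```

In the first scenario only Site A keeps running. In the second no partition has a majority, so the cluster is down until quorum is forced or the primary site returns.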
Forcing Quorum • Always understand why quorum was lost • Used to bring cluster online without quorum • Cluster starts in a special “forced” state • Once majority achieved, no more “forced” state • Command line: • net start clussvc /forcequorum (or /fq) • PowerShell (R2): • Start-ClusterNode -FixQuorum (or -fq)
Multi-Site With File Share Witness • Complete resiliency and automatic recovery from the loss of any 1 site [Diagram: Site A and Site B, each with a SAN and replicated storage from vendor, connected over the WAN to a File Share Witness (\\Foo\Cluster1) at Site C]
Multi-Site With File Share Witness • Complete resiliency and automatic recovery from the loss of the File Share Witness [Diagram: Site A and Site B, each with a SAN and replicated storage from vendor, connected over the WAN to a File Share Witness (\\Foo\Cluster1) at Site C]
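The File Share Witness scenarios come down to the same vote count with one extra voter at a third site. The Python sketch below assumes a hypothetical 4-node cluster with 2 nodes per site; the function name is illustrative, and `holds_witness_vote` stands for whichever side currently has the witness vote.

```python
def keeps_quorum(nodes_present, total_nodes, holds_witness_vote):
    """Node and File Share Majority: nodes plus one witness vote."""
    total_votes = total_nodes + 1
    votes = nodes_present + (1 if holds_witness_vote else 0)
    return votes > total_votes / 2


# Hypothetical layout: 2 nodes at Site A, 2 at Site B, witness at Site C.
site_a_lost = keeps_quorum(2, 4, holds_witness_vote=True)    # Site B + witness
witness_lost = keeps_quorum(4, 4, holds_witness_vote=False)  # all nodes, no witness
split_without_witness = keeps_quorum(2, 4, holds_witness_vote=False)
```

The first two cases keep quorum, which matches the slides: the cluster survives the loss of any one site or of the witness. A site cut off from both the other site and the witness does not.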
FSW Considerations • Simple Windows File Server • Needs to be in the same forest • Running Windows Server® 2003, 2008 or 2008 R2 • Recommended to be at 3rd separate site • Single file server can serve as a witness for multiple clusters • Each cluster requires its own share • Can be clustered in a second cluster • FSW cannot be on a node in the same cluster • It is an additional voter for free (almost)
Demo: Quorum on a Multi-Site Cluster
Quorum Model Summary • No Majority: Disk Only • Not Recommended • Only use as directed by vendor • Node and Disk Majority • Only use as directed by vendor • Node Majority • Odd number of nodes • Node and File Share Majority • Best availability solution • Recommended for • Exchange Server 2007 CCR
Cluster your Branch Offices • Cluster several standalone File Servers from branch offices • Keep network traffic low • High-Availability for the files • Redundancy for the data [Diagram: clients primarily accessing applications in Site A; clients primarily accessing applications in Site B]
Multi-Site Across the Enterprise • More distributed cluster nodes & clusters give higher availability • Complete resiliency and automatic failover • Remember your quorum model • Loss of any single site should not bring down the cluster • File Share Witness • 1 File Server hosts all File Share Witnesses for multiple clusters • Make it highly-available • Separate site • Not a node in that same cluster [Diagram: Cluster 1 (Site 1, Site 2); Cluster 2 (Main Office, Branch 1, Branch 2); Cluster 3 (Many FSWs)]
Multi-Site Clustering Review • File Share Witness at Site C • 4, 6, 8… nodes + FSW = odd # votes • Local failover first (preferred owner) • Site failover second (possible owner) • AntiAffinityClassNames • Faster DNS Updates • Register all IPs for a Network Name • Shorten client’s DNS record TTL • Ensure application tries all IPs • Encrypt WAN traffic for security • Adjust health checks for latency • Configure ‘OR’ dependencies • Replicated Storage from vendor [Diagram: Site A and Site B, each with a SAN, connected over the WAN]
Session Summary • Multi-Site Failover Clustering has many benefits • Variety of hardware options & configurations • Redundancy is needed everywhere • Understand your replication needs • Compare VLANs with multiple subnets • Plan your quorum model & nodes before deployment • Follow the checklist and best practices • http://technet.microsoft.com/en-us/library/dd197546.aspx