
Cluster Extension for XP and EVA



Presentation Transcript


  1. Cluster Extension for XP and EVA, 2007 – Dankwart Medger, Trident Consulting S.L.

  2. CLX/Cluster overview

  3. Disaster Tolerant Design Considerations • Protection level (Distance): wide variety of interconnect options; regional or wide-area protection; supports local to global Disaster Tolerant solutions • Data Currency (Recovery Point Objective): synchronous or asynchronous options available; data consistency is always assured • Failover time (Recovery Time Objective): manual failover to the secondary site, or fully automated failover with geographically dispersed clusters on HP-UX, Solaris, AIX, Linux, Windows • Performance requirements: asynchronous Continuous Access provides minimum latency across extended distances; performance depends on bandwidth to the remote data center
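
The two recovery metrics on this slide are easy to confuse; below is a minimal sketch of how they are usually measured, with hypothetical timestamps that are not tied to any HP tool.

```python
from datetime import datetime

# Hypothetical event times for a single failover incident (illustrative only).
last_replicated_write = datetime(2007, 3, 1, 10, 58)   # last write safely on the remote array
failure_time          = datetime(2007, 3, 1, 11, 0)    # primary site goes down
service_restored      = datetime(2007, 3, 1, 11, 10)   # application back online at remote site

# Recovery Point Objective: how much data (measured in time) you can afford to lose.
# With synchronous replication this window is effectively zero; with asynchronous
# replication it grows with link latency and bandwidth.
rpo_actual = failure_time - last_replicated_write       # 2 minutes of lost writes

# Recovery Time Objective: how long the application may be unavailable.
rto_actual = service_restored - failure_time            # 10 minutes of downtime

print(f"data loss window (RPO): {rpo_actual}")
print(f"outage duration  (RTO): {rto_actual}")
```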

  4. Server Clustering • Purpose: protect against failures at the host level (server failure, some infrastructure failures) • Automated failover, incl. necessary arbitration • Local distances • Limits: does not protect against site disaster, storage failure, core infrastructure failure • A major disaster can mean full restore from tape; tapes should therefore be stored off-site

  5. Storage Replication • Purpose: a copy of your data in a remote site • In case of a major disaster on the primary site: no tape restore necessary, data still available on the remote site, operation can be resumed on the remote site • Long distances through FC extension technologies and async replication technologies • Limits: human intervention needed to resume operation on the remote site; a standby system is difficult to maintain

  6. The solution: Cluster Extension/Metrocluster combines the automated failover capabilities of a standard server cluster with the remote replication capabilities of the EVA and XP to build a failover cluster spanning two data centers. • Benefits • Fully automated application failover even in case of site or storage failure: no manual intervention; no server reboots, no presentation changes, no SAN changes • Intelligent failover decision based on status checking and user settings: no simple failover script • Integrated into the standard OS cluster solution: no change to how you manage your cluster today • Host IO limited to the local array: reducing intersite traffic, enabling long-distance, low-bandwidth setups

  7. Cluster Extension – the goal (slide is animated) • [diagram: two data centers running App A and App B, an Arbitrator Node*, and Continuous Access EVA/XP replication; failover is automated by CLX] • *type of arbitrator depends on cluster

  8. Automated failover solutions – availability for all major platforms

  9. Cluster extension for Windows

  10. CLX Cluster integration example: CLX for Windows File Share • All Physical Disk resources of one Resource Group depend on a CLX resource • Very smooth integration • [diagram: resource dependency tree with File Share, Physical Disk, Network Name and IP Address resources; example taken from CLX EVA]
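
A rough way to picture the dependency rule on this slide is the hypothetical Python model below (not MSCS or CLX code): a disk resource may only come online once the CLX resource it depends on is online and has done its replication checks.

```python
# Hypothetical model of a cluster resource group in which every Physical Disk
# resource depends on a CLX resource (as in the File Share example above).
class Resource:
    def __init__(self, name, depends_on=None):
        self.name = name
        self.depends_on = depends_on or []
        self.online = False

    def bring_online(self):
        # Dependencies must be online first -- this ordering is what lets CLX
        # run its replication checks before any disk is made available.
        for dep in self.depends_on:
            if not dep.online:
                dep.bring_online()
        print(f"bringing {self.name} online")
        self.online = True

clx   = Resource("CLX EVA resource")           # checks/fails over the DR Group
disk  = Resource("Physical Disk", [clx])       # depends on the CLX resource
ip    = Resource("IP Address")
name  = Resource("Network Name", [ip])
share = Resource("File Share", [disk, name])   # top of the dependency tree

share.bring_online()   # order: CLX resource, Physical Disk, IP Address, Network Name, File Share
```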

  11. CLX EVA Resource Parameters • DR Group for which the CLX resource is responsible: all dependent disk resources (Vdisks) must belong to that DR Group; this field must contain the full DR Group name, including the "\Data Replication\" folder, and is case sensitive • Further parameters: cluster node – data center location, EVA – data center location, SMA – data center location, failover behavior setting, data concurrence settings, SMI-S communication settings, pre/post exec scripts
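
Because the DR Group field must be the full, case-sensitive name including the \Data Replication\ folder, a small validation helper like the sketch below (hypothetical, not part of CLX) can catch the two most common configuration mistakes. The DR Group name in the example is made up.

```python
# Hypothetical sanity check for the CLX EVA "DR Group" parameter.
# Rule from the slide: the value must be the full DR Group name, including the
# "\Data Replication\" folder, and the comparison is case sensitive.
REQUIRED_FOLDER = "\\Data Replication\\"

def check_dr_group_name(value: str) -> list[str]:
    problems = []
    if REQUIRED_FOLDER not in value:
        if REQUIRED_FOLDER.lower() in value.lower():
            problems.append("folder present but wrong case (field is case sensitive)")
        else:
            problems.append("missing the \\Data Replication\\ folder prefix")
    return problems

# Example with a hypothetical DR Group name:
print(check_dr_group_name("\\Data replication\\DRG_SQL01"))
# -> ['folder present but wrong case (field is case sensitive)']
```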

  12. CLX XP Resource Parameters • Device Group managed by this CLX resource: all dependent disk resources must belong to that Device Group • Further parameters: XP arrays and Raidmanager Library instances, fence level settings, cluster node – data center location, failover behavior setting, CA resync setting, pre/post exec scripts

  13. Cluster Arbitration and CLX for Windows

  14. Local Microsoft Cluster – Shared Quorum Disk • Traditional MSCS uses a quorum • Shared application disks: store the application data • Shared quorum disk: keeps the quorum log, keeps a copy of the cluster configuration, propagates registry checkpoints, provides arbitration if LAN connectivity is lost

  15. Challenges with dispersed MSCS • Managing data disks • Check data disk pairs on failover • Allow data disk failover only if current and consistent • Managing the quorum disk (for a traditional shared quorum cluster) • Mirror the quorum disk to the remote disk array • Implement the quorum disk pair and keep the challenge/defense protocol working as if it were a single shared resource • Filter SCSI Reserve/Release/Reset and any necessary IO commands without performance impact • Prevent split-brain phenomena

  16. Majority Node Set Quorum (1) • New quorum mechanism introduced with Windows 2003 • Shared application disks: store the application data • Quorum data on a local disk: used to keep a copy of the cluster configuration, synchronized by the Cluster Service • No common quorum log and no common cluster configuration available => changes to the cluster configuration are only allowed when a majority of nodes is online and can communicate

  17. Majority Node Set Quorum (2) (slide is animated) • MNS arbitration rule: in case of a failure, the cluster will survive if a majority of nodes is still available • In case of a split-site situation, the site with the majority will survive • Only nodes which belong to the majority are allowed to keep up the cluster service and can run applications; all others will shut down the cluster service • The majority is defined as: (<number of nodes configured in the cluster> / 2) + 1, using integer division
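
The arbitration rule boils down to integer arithmetic. A minimal sketch (plain Python, not cluster service code) of which partition keeps running after a split:

```python
# Majority Node Set rule from the slide:
# majority = (<number of nodes configured in the cluster> // 2) + 1
def majority(configured_nodes: int) -> int:
    return configured_nodes // 2 + 1

def partition_survives(configured_nodes: int, reachable_nodes: int) -> bool:
    # A partition keeps the cluster service up only if it can see a majority.
    return reachable_nodes >= majority(configured_nodes)

# 4-node cluster split 2/2: neither side reaches 3 nodes, so both sides stop --
# exactly the situation the file share witness on the next slides addresses.
print(majority(4))                    # 3
print(partition_survives(4, 2))       # False

# 5-node cluster split 3/2: the 3-node site survives, the 2-node site stops.
print(partition_survives(5, 3), partition_survives(5, 2))   # True False
```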

  18. Majority Node Set Quorum (3) (slide is animated) • [animated diagram: node failure in an MNS cluster with App A, App B and local quorum copies]

  19. Majority Node Set Quorum (3) (slide is animated) • [animated diagram: split-site scenario with App A, App B and local quorum copies on both sites]

  20. Majority Node Set Quorum (4) – File Share Witness • What is it? A patch for Windows 2003 SP1 clusters provided by Microsoft (KB921181) • What does it do? Allows the use of a simple file share to provide a vote for an MNS quorum-based 2-node cluster; in addition to introducing the file share witness concept, the patch also introduces a configurable cluster heartbeat • What are the benefits? The "arbitrator" node is no longer a full cluster member; a simple file share can be used to provide this vote • No single-subnet requirement for the network connection to the arbitrator • One arbitrator can serve multiple clusters; however, you have to set up a separate share for each cluster • The arbitrator exposing the share can be a standalone server or a different OS architecture (e.g. a 32-bit Windows server providing a vote for an IA64 cluster)
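
Conceptually, the witness share supplies one extra vote to whichever partition can still reach and claim it. The toy sketch below illustrates that idea only; it is not Microsoft's actual locking protocol, and the share path is hypothetical.

```python
import os

# Toy model of a file share witness for a 2-node MNS cluster: each surviving
# partition tries to claim a witness file on the share, and the claim acts as
# the third vote. (Conceptual only -- the real cluster service uses its own
# protocol on the share.)
WITNESS_FILE = r"\\arbitrator\share1\witness.lock"   # hypothetical share path

def try_claim_witness(path: str) -> bool:
    try:
        # 'x' mode creates the file only if it does not already exist,
        # so at most one partition can succeed.
        with open(path, "x"):
            return True
    except (FileExistsError, OSError):
        return False

def partition_survives(own_nodes: int, total_nodes: int, has_witness_vote: bool) -> bool:
    votes = own_nodes + (1 if has_witness_vote else 0)
    total_votes = total_nodes + 1                     # cluster nodes + witness
    return votes >= total_votes // 2 + 1

# 2-node cluster split 1/1: the node that claims the witness holds 2 of 3 votes.
print(partition_survives(1, 2, has_witness_vote=True))    # True
print(partition_survives(1, 2, has_witness_vote=False))   # False
```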

  21. Majority Node Set Quorum (5) – File Share Witness (slide is animated) • [animated diagram: 2-node MNS cluster with file share witness; the surviving node gets the vote from \\arbitrator\share]

  22. Majority Node Set Quorum (6) – File Share Witness • One arbitrator serving two clusters, each with its own share • Cluster 1: MNS private property MNSFileShare = \\arbitrator\share1 • Cluster 2: MNS private property MNSFileShare = \\arbitrator\share2

  23. File Share Witness – Prerequisites • Cluster: Windows 2003 SP1 & R2 (x86, x64, IA64*, EE and DC); 2-node MNS quorum-based cluster; the property will be ignored for >2 node clusters • Arbitrator OS requirements: Windows 2003 SP1 or later; MS did not test earlier/other OS versions even though they should work; a server OS is recommended for availability and security • File share requirements: one file share for each cluster for which the arbitrator provides a vote; 5 MB per share is sufficient; the external share does not store the full state of the cluster configuration – it contains only data sufficient to help prevent split-brain syndrome and to help detect a partition-in-time; the Cluster Service account requires read/write permission; for highest availability, you might want to create a clustered file share/file server • *There is no Windows Server 2003 R2 release for IA64 (Itanium)

  24. File Share Witness/Arbitrator – What does it mean for CLX? • Remember: File Share Witness only works with 2-node clusters

  25. CLX XP Quorum Filter Service (QFS) • Component of CLX XP for Windows: required for Windows 2000, optional for Windows 2003 (which can also use MNS); QFS provides some benefits over MNS • Functionality: allows the use of a Microsoft shared quorum cluster across two data centers and XPs • Implements filter drivers that intercept quorum arbitration commands and uses additional CA pairs to make the cross-site decision • "External arbitrator" for automated failover even in case of a full site failure or split
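
The filter-driver behaviour can be pictured as a gate in front of the quorum reservation. The sketch below is only a conceptual illustration of that decision, not HP's driver code; the function, parameters and states are assumptions (PVOL/SVOL are the usual Continuous Access XP volume roles).

```python
# Very rough, conceptual model of the CLX XP Quorum Filter Service decision:
# a node's reservation of the quorum disk is only allowed through when the
# cross-site state (checked via the additional CA pairs) says this site may
# own the quorum. Names and states are illustrative, not HP's API.
def allow_quorum_reservation(local_quorum_state: str,
                             remote_site_reachable: bool,
                             arbitrator_grants_site: bool) -> bool:
    if local_quorum_state == "PVOL":          # local copy is the replication source
        return True
    if not remote_site_reachable:
        # Remote site down or link split: proceed only if the external
        # arbitrator awards ownership to this site, preventing split-brain.
        return arbitrator_grants_site
    return False                              # healthy SVOL side defers to the PVOL side

print(allow_quorum_reservation("PVOL", True, False))    # True  -- normal operation
print(allow_quorum_reservation("SVOL", False, True))    # True  -- site failure, arbitrator decides
print(allow_quorum_reservation("SVOL", False, False))   # False -- blocked to avoid split-brain
```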

  26. CLX XP on Windows – LAN split (slide is animated) • [animated diagram: quorum and application disk pairs with control devices CTRL1–CTRL3; on a LAN split, the quorum is reserved by the left node]

  27. CLX XP on Windows – site failure (slide is animated) • [animated diagram: quorum and application disk pairs with control devices CTRL1–CTRL3 and the External Arbitrator; after the site failure, the quorum reservation passes from the left node to the right node]

  28. Majority Node Set vs CLX XP Quorum Filter Service • [comparison table indicating the recommended quorum mechanism for new installs]

  29. Manual vs Automated failover – failover times

  30. Automated failover • Question: "How long does a CLX failover take?" • Answer: "Depends!" • A CLX failover is first of all still a cluster failover • There are components influencing the total application failover time which are outside the control of CLX: failure recognition, cluster arbitration, application startup • The CLX component of the total failover time also depends on many factors

  31. Factors affecting the automated failover times in a CLX for Windows cluster • Recognize failure • Cluster arbitration • CLX failover • Start application on other node (5 sec – 5 min)

  32. Manual vs. automated failover times • A typical example of a manual failover is a stretched cluster: a cluster stretched across two sites, but all nodes accessing the same array, which replicates the data to a remote partner • Even in case of a node failure, the primary storage array will be used (across a remote link if the node is in the remote data center) • A storage or site failure will bring down the cluster, requiring manual intervention to start the cluster from the remote array • Steps involved in case of a storage failure: notification of operator (15 min*), evaluation of the situation and necessary recovery steps (30 min*), shutdown of surviving cluster node (5 min*), replication failover (2 min*), start-up of surviving cluster nodes (10 min*) • The effort multiplies with the number of applications/clusters/arrays being affected • *times are just examples and will vary depending on the situation and setup; a full site disaster, for instance, might involve much more troubleshooting and evaluation time

  33. Manual vs. automated failover times – single cluster • Manual: notification of operator, evaluation of the situation and necessary recovery steps, shutdown, failover, start servers – Total = 62 min* • Automated (CLX): recognize failure, cluster arbitration, CLX failover, application startup – Total = 10 min* • *times are just examples and will vary depending on the situation and setup
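
The 62-minute total is simply the sum of the example step times from slide 32; a quick check with the same illustrative numbers (the 10-minute automated figure is likewise only the slide's example):

```python
# Example step times from the slides (illustrative only -- real values vary).
manual_steps_min = {
    "notification of operator": 15,
    "evaluation and recovery planning": 30,
    "shutdown surviving cluster node": 5,
    "replication failover": 2,
    "start up surviving cluster nodes": 10,
}
automated_total_min = 10   # slide example: recognize + arbitrate + CLX failover + app start

manual_total_min = sum(manual_steps_min.values())
print(f"manual failover: {manual_total_min} min")      # 62 min
print(f"automated (CLX): {automated_total_min} min")   # 10 min
```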

  34. Manual vs. automated failover times – multiple clusters • Manual (clusters 1–3 handled one after another): Total = 96 min* • Automated (CLX, clusters 1–3): Total = 10 min* • *times are just examples and will vary depending on the situation and setup
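
One way the 96-minute figure can be read from the same example numbers (an assumption about how the chart composes, not stated explicitly on the slide): operator notification and evaluation happen once, while shutdown, replication failover and server start repeat per cluster.

```python
# Assumed decomposition of the multi-cluster example (not stated on the slide):
# notification (15) and evaluation (30) are done once; shutdown (5) +
# replication failover (2) + server start (10) repeat for each cluster.
shared_once_min   = 15 + 30          # notification + evaluation
per_cluster_min   = 5 + 2 + 10       # shutdown + failover + start
clusters          = 3

manual_total_min    = shared_once_min + clusters * per_cluster_min
automated_total_min = 10             # slide example: CLX fails all clusters over in parallel

print(manual_total_min)      # 96 -- matches the slide's example total
print(automated_total_min)   # 10
```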

  35. Other advantages • CLX helps to avoid human mistakes: manual failover operations introduce the risk of making mistakes, failing over the wrong DR Group, etc. • CLX simplifies planned failover for maintenance: similar to a disaster failover, just faster; a manual failover still requires the same steps besides the notification and evaluation • Failback is as simple as a failover: once the primary site is restored, it's just another cluster failover; a manual failback is as complex and intrusive as a maintenance failover

  36. External information • Cluster Extension EVA • http://h18006.www1.hp.com/products/storage/software/ceeva/index.html • Cluster Extension XP • http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/cluster/index.html • Link to Metrocluster EVA • http://h71028.www7.hp.com/enterprise/cache/108988-0-0-0-121.html • Link to Metrocluster XP • http://h71028.www7.hp.com/enterprise/cache/4181-0-0-0-121.html • Disaster-Tolerant Solutions Library (Solutions for Serviceguard) • http://h71028.www7.hp.com/enterprise/cache/4190-0-0-0-121.html • Continental Cluster • http://h71028.www7.hp.com/enterprise/cache/4182-0-0-0-121.html • CA EVA • http://h18006.www1.hp.com/products/storage/software/conaccesseva/index.html • CA XP • http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/continuousaccess/index.html • CLX EVA migration whitepaper and Exchange replication whitepaper (incl. CLX) • http://h18006.www1.hp.com/storage/arraywhitepapers.html
