
Multi-Site Clustering for Hyper-V Disaster Recovery

Multi-Site Clustering for Hyper-V Disaster Recovery. Greg Shields, MVP, vExpert, Senior Partner, Concentrated Technology. www.ConcentratedTech.com, @ConcentratdGreg. About the speaker: over 15 years of Windows experience.


Presentation Transcript


  1. Multi-Site Clustering for Hyper-V Disaster Recovery. Greg Shields, MVP, vExpert, Senior Partner, Concentrated Technology. www.ConcentratedTech.com @ConcentratdGreg

  2. About the speaker Over 15 years of Windows experience • Administrator – Managed environments ranging from a few dozen to many thousands of users… • Consultant – Hands-on and Strategic… • Speaker – TechMentor, Tech Ed, Windows Connections, MMS, VMworld, ISACA, others… • Analyst/Author – Fourteen books and counting… • Columnist – TechNet Magazine, Redmond Magazine, Windows IT Pro Magazine, TechTarget Online, others… • All-around good guy…

  3. What Makes a Disaster? Which of the following would you consider a disaster? • It impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease • It interrupts the functionality of your datacenter for an extended period of time • It immediately ceases all processing on a server • It causes problems with a service, shutting down that service and preventing some action from occurring on the server • It causes a server or an entire rack of servers to inadvertently and rapidly power down

  4. What Makes a Disaster? Which of the following would you consider a disaster? Just a bad day…: • It immediately ceases all processing on a server • It causes problems with a service, shutting down that service and preventing some action from occurring on the server • It causes a server or an entire rack of servers to inadvertently and rapidly power down

  5. What Makes a Disaster? • Your decision to “declare a disaster” and move to “disaster ops” is a major one • The technologies used for disaster protection are different than those used for high-availability • More complex • More expensive • Failover and failback processes involve more thought • You might not be able to just “fail back” with a click of a button

  6. Multi-Site Hyper-V == Single-Site Hyper-V Multi-site Hyper-V looks very much the same as single-site Hyper-V • Microsoft has not done a good job of explaining this fact! • Some Hyper-V hosts • Some networking and storage • Virtual machines that Live Migrate around But there are some major differences too… • VMs can Live Migrate across sites • Sites typically have different subnet arrangements • Data in the primary site must be replicated to the DR site • Clients need to know where your servers go!

  7. Constructing Site-Proof Hyper-V: Three Things At a very high level, Hyper-V disaster recovery is three things • Storage mechanism • Replication mechanism • Target Servers & Cluster Once you have these three things, layering Hyper-V atop is easy.

  8. Constructing Site-Proof Hyper-V: Three Things • Replication Mechanism • Storage Device(s) • Target Servers

  9. Thing 1: A Storage Mechanism Typically, two SANs in two different locations • FibreChannel, iSCSI, FCoE, heck, JBOD • Replicated ≠ full data (not always) • Similar model or manufacturer • DR – not for everything! • Similarity, proper replication • Backup SAN doesn’t necessarily need to be of the same size or speed as the primary SAN • DR Environments: Where Old SANs Go To Die!

  10. Thing 2: A Replication Mechanism Replication between SANs must occur 1. Synchronously • Changes are made on one node at a time • Subsequent changes on primary SAN must wait for ACK from backup SAN 2. Asynchronously • Changes on backup SAN will eventually be written • Changes queued at primary SAN to be transferred at intervals

  11. Thing 2: A Replication Mechanism 1. Synchronously • Changes are made on one node at a time. Subsequent changes on primary SAN must wait for ACK from backup SAN.

  12. Thing 2: A Replication Mechanism 2. Asynchronously • Changes on backup SAN will eventually be written. Changes are queued at primary SAN to be transferred at intervals.
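
The difference between the two modes can be sketched as a toy model. This is not how any particular SAN implements replication; the class and method names below are hypothetical and exist only to show where a write can be stranded at the primary site.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary SAN replicating writes to a backup SAN."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []       # writes committed on the primary SAN
        self.backup = []        # writes committed on the backup SAN
        self.pending = deque()  # async only: writes queued for the next transfer

    def write(self, block):
        self.primary.append(block)
        if self.synchronous:
            # The write is acknowledged only once the backup has it too.
            self.backup.append(block)
        else:
            # The write is acknowledged at once; the backup catches up later.
            self.pending.append(block)

    def transfer(self):
        """Async only: ship everything queued since the last interval."""
        while self.pending:
            self.backup.append(self.pending.popleft())

    def lost_if_primary_fails(self):
        """Number of writes that exist only at the primary site right now."""
        return len(self.primary) - len(self.backup)


sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for block in ["a", "b", "c"]:
    sync_vol.write(block)
    async_vol.write(block)
async_vol.transfer()   # an interval elapses and the queue drains
async_vol.write("d")   # written after the last transfer

print(sync_vol.lost_if_primary_fails())   # 0
print(async_vol.lost_if_primary_fails())  # 1
```

The model leaves out the cost side: in the synchronous case every write would also wait on a round trip to the backup site.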

  13. Food for Thought Which would you choose? Why? Synchronous • Assures no loss of data • Requires a high-bandwidth and low-latency connection • Write and acknowledgement latencies impact performance • Requires shorter distances between storage devices Asynchronous • Potential for loss of data during a failure • Leverages smaller-bandwidth connections, more tolerant of latency • No performance impact • Potential to stretch across longer distances Your Recovery Point Objective makes this decision…
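
A minimal sketch of how the Recovery Point Objective can drive that choice. It assumes a simplification not stated in the slides: that the worst-case data loss under asynchronous replication is roughly one transfer interval. The function name and parameters are hypothetical.

```python
def pick_replication(rpo_seconds, async_interval_seconds):
    """Suggest a replication mode that can meet the given RPO.

    rpo_seconds: maximum tolerable data loss, in seconds of writes.
    async_interval_seconds: how often queued changes are shipped (assumed).
    """
    if rpo_seconds <= 0:
        # No data loss is tolerable, so every write must wait for the backup.
        return "synchronous"
    if async_interval_seconds <= rpo_seconds:
        # Worst case, everything since the last transfer is lost.
        return "asynchronous"
    return f"asynchronous, but shorten the interval to {rpo_seconds} s or less"


print(pick_replication(0, 300))
print(pick_replication(900, 300))
print(pick_replication(60, 300))
```

An RPO of zero forces synchronous replication, with its bandwidth, latency, and distance requirements; anything looser leaves room for asynchronous replication.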

  14. Thing 2½: Replication Processing Location There are also two locations for replication processing… 1. Storage Layer • Replication processing is handled by the SAN itself • Agents are often installed on virtual hosts or machines to ensure crash consistency • Easier to set up, fewer moving parts. More scalable • Concerns about crash consistency 2. OS / Application Layer • Replication processing is handled by software in the VM OS • This software also operates as the agent • More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear • Fewer concerns about crash consistency

  15. Thing 3: Target Servers and a Cluster • Finally, there are target servers and a cluster in the backup site.

  16. Clustering’s Sordid History • Windows NT 4.0: Microsoft Cluster Service “Wolfpack”; “As the corporate expert in Windows clustering, I recommend you don’t use Windows clustering” • Windows 2000: Greater availability, scalability. Still painful • Windows 2003: Added iSCSI storage to traditional FibreChannel; SCSI Resets still used as method of last resort (painful) • Windows 2008: Eliminated use of SCSI Resets; eliminated full-solution HCL requirement; added Cluster Validation Wizard and pre-cluster tests; clusters can now span subnets (ta-da!) • Windows 2008 R2: Improvements to Cluster Validation Wizard and Migration Wizard; additional cluster services; Cluster Shared Volumes (!) and Live Migration (!)

  17. So, What IS a Cluster?

  18. So, What IS a Cluster? Quorum Drive & Storage for Hyper-V VMs

  19. So, What IS a Multi-Site Cluster?

  20. Quorum: Clustering’s Most Confusing Configuration • Ever been to a Kiwanis meeting…? Different clubs – different rules • A cluster “exists” because it has quorum between its members. Quorum is achieved via a voting process • If a cluster “loses quorum”, the entire cluster shuts down and ceases to exist. It stays down until quorum is regained. This is different than resource failover • Multiple quorum models exist. Different clusters – different rules
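
The voting idea can be sketched in a few lines. This is a simplified illustration of a strict-majority rule, not the cluster service's actual logic; the function name is hypothetical.

```python
def has_quorum(votes_online, votes_total):
    """True when strictly more than half of all votes are present."""
    return 2 * votes_online > votes_total


# Five voters: three present is a majority, two is not.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False
# Four voters split two and two: neither half has a majority.
print(has_quorum(2, 4))  # False
```

The even split in the last line is why clusters with an even number of nodes usually add a tie-breaking vote such as a disk or file share witness.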

  21. Four Options for Quorum • Node and Disk Majority • Node Majority • Node and File Share Majority • No Majority: Disk Only

  22. Quorum in Multi-Site Clusters • Node and Disk Majority • Node Majority • Node and File Share Majority • No Majority: Disk Only Microsoft recommends using the Node and File Share Majority model for multi-site clusters • This model provides the best protection for a full-site outage • Protecting against a full-site outage requires a file share witness in a third geographic location

  23. Quorum in Multi-Site Clusters • Use the Node and File Share Majority quorum model • Prevents entire-site outage from impacting quorum. • Enables creation of multiple clusters if necessary. Third Site for Witness Server

  24. I Need a Third Site? Seriously? Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated… • What happens if you put the quorum’s file share in the primary site? • The secondary site might not automatically come online after a primary site failure • Votes in secondary site < Votes in primary site

  25. I Need a Third Site? Seriously? Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated… • What happens if you put the quorum’s file share in the secondary site? • A failure in the secondary site could cause the primary site to go down. • Votes in secondary site > votes in primary site. This problem gets even weirder as time passes and the number of servers changes in each site
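
The vote arithmetic behind the third-site recommendation can be worked through with a short sketch. It assumes two nodes in each site and one file share witness vote; the site labels and the function name are hypothetical.

```python
def survivors_keep_quorum(primary_nodes, secondary_nodes, witness_site, failed_site):
    """Does the rest of the cluster still hold a majority after a site fails?

    witness_site: "primary", "secondary", or "third"
    failed_site:  "primary" or "secondary"
    """
    votes = {"primary": primary_nodes, "secondary": secondary_nodes, "third": 0}
    votes[witness_site] += 1              # the file share witness adds one vote
    total = sum(votes.values())
    surviving = total - votes[failed_site]
    return 2 * surviving > total


for witness in ("primary", "secondary", "third"):
    for failed in ("primary", "secondary"):
        ok = survivors_keep_quorum(2, 2, witness, failed)
        print(f"witness in {witness:9} site, {failed:9} site fails -> quorum kept: {ok}")
```

With the witness in the primary site, losing the primary site leaves two of five votes, so the secondary site cannot come online on its own. With the witness in the secondary site, losing the secondary site takes the primary down too. Only the third-site witness survives either failure.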

  26. I Need a Third Site? Seriously? Third Site for Witness Server

  27. Multi-Site Cluster Tips/Tricks Manage Preferred Owners & Persistent Mode options • Make sure your servers failover to servers in the same site first • But also make sure they have options on failing over elsewhere
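
A minimal sketch of the ordering this tip describes: same-site hosts first, hosts in the other site kept as later options. It only models the list you would then enter as preferred owners; the host names, site names, and function name are hypothetical.

```python
def preferred_owner_order(vm_site, hosts):
    """Order candidate hosts so that hosts in the VM's own site come first.

    hosts: list of (host_name, site_name) tuples.
    """
    # sorted() is stable, so hosts keep their relative order within each site.
    ranked = sorted(hosts, key=lambda host: host[1] != vm_site)
    return [name for name, _site in ranked]


hosts = [
    ("HV-DR1", "backup"),
    ("HV1", "primary"),
    ("HV-DR2", "backup"),
    ("HV2", "primary"),
]
print(preferred_owner_order("primary", hosts))
# ['HV1', 'HV2', 'HV-DR1', 'HV-DR2']
```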

  28. Multi-Site Cluster Tips/Tricks Consider carefully the effects of Failback • Failback is a great solution for resetting after a failure • But Failback can be a massive problem-causer as well • Its effects are particularly pronounced in Multi-Site Clusters • Recommendation: Turn it off (until you’re ready)

  29. More Multi-Site Cluster Tips/Tricks Resist creating clusters that support other services • A Hyper-V cluster is a Hyper-V cluster is a Hyper-V cluster Use disk “dependencies” as Affinity/Anti-Affinity rules • Hyper-V all by itself doesn’t have an elegant way to affinitize • Setting disk dependencies against each other is a work-around Add Servers in Pairs • Ensures that a server loss won’t cause site split brain • This is less a problem with the File Share Witness configuration

  30. Multi-Site Cluster Tips/Tricks • Segregate traffic!!!

  31. Most Important! Ensure that networking remains available when VMs migrate from primary to backup site • Crossing subnets also means changing IP address, subnet mask, gateway, etc., at the new site • This is done automatically by using DHCP and dynamic DNS, or it must be updated manually • DNS replication is also a problem. Clients will require time to update their local cache • Consider reducing DNS TTL or clearing client cache • Clustering can span subnets! This is good, but only if you plan for it…
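
Why the DNS TTL matters can be shown with a toy client cache. This is a model of caching behaviour, not a real resolver; the class name, host name, addresses, and timings are all hypothetical.

```python
class ClientDnsCache:
    """Toy client-side cache that honours a record's TTL."""

    def __init__(self):
        self._entries = {}  # name -> (address, expires_at)

    def resolve(self, name, now, authoritative, ttl):
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still within TTL, possibly stale
        address = authoritative[name]
        self._entries[name] = (address, now + ttl)
        return address


name = "vm1.example.test"
dns = {name: "10.1.0.20"}  # address in the primary site's subnet

slow = ClientDnsCache()    # client that cached a one-hour TTL
fast = ClientDnsCache()    # client that cached a five-minute TTL
slow.resolve(name, now=0, authoritative=dns, ttl=3600)
fast.resolve(name, now=0, authoritative=dns, ttl=300)

dns[name] = "10.2.0.20"    # VM fails over and re-registers in the backup subnet

print(slow.resolve(name, now=600, authoritative=dns, ttl=3600))  # 10.1.0.20
print(fast.resolve(name, now=600, authoritative=dns, ttl=300))   # 10.2.0.20
```

Ten minutes after the failover, the client holding the one-hour record is still trying the old address, while the client holding the five-minute record has already picked up the new one.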

  32. Multi-Site Clustering for Hyper-V Disaster Recovery. Greg Shields, MVP, vExpert, Senior Partner, Concentrated Technology. www.ConcentratedTech.com @ConcentratdGreg

  33. Enjoy and share this material • Feel free to promote this material • Recommend your peers to pass certification • Blog, Tweet and share this material and your experience on Facebook • You’re an Expert? We will be happy to have you as a Backup Academy contributor. Apply here. Web: http://www.backupacademy.com E-mail: feedback@backupacademy.com Twitter: BckpAcademy Facebook: backup.academy
