VMware vCenter Server Fault Tolerance
John Browne / Adly Taibi / Cormac Hogan, Product Support Engineering, Rev E. VMware Confidential
Module 2 Lessons
• Lesson 1 – vCenter Server High Availability
• Lesson 2 – vCenter Server Distributed Resource Scheduler
• Lesson 3 – Fault Tolerance
• Lesson 4 – Enhanced vMotion Compatibility
• Lesson 5 – DPM - IPMI
• Lesson 6 – vApps
• Lesson 7 – Host Profiles
• Lesson 8 – Reliability, Availability, Serviceability (RAS)
• Lesson 9 – Web Access
• Lesson 10 – vCenter Update Manager
• Lesson 11 – Guided Consolidation
• Lesson 12 – Health Status
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
Understanding VMware Fault Tolerance
• The VMware Fault Tolerance (FT) feature creates a virtual machine configuration that can provide continuous availability.
• VMware FT is built on the ESX/ESXi 4.0 host platform and is provided using the Record/Replay functionality implemented in the VM monitor.
• VMware FT works by creating an identical copy of a virtual machine.
• One copy of the virtual machine, called the primary, is in the active state: receiving requests, serving information, and running applications.
• Another copy, called the secondary, receives the same input that the primary receives.
Understanding VMware Fault Tolerance (ctd)
• VMware FT provides a higher level of business continuity than HA.
• With FT, the secondary immediately comes online and all (or almost all) information about the state of the virtual machine is preserved.
• The state of the secondary machine is dependent on the latency and lag between the primary and secondary VMs.
• VMware FT does not require a virtual machine restart, and applications and data stored in memory do not need to be re-entered or reloaded.
Virtual Machine Record & Replay
• Record: logging the causes of non-determinism, i.e. input (network, user), asynchronous I/O (disk, devices), and CPU timer interrupts.
• Replay: deterministic delivery of the events previously logged.
• Result: repeatable VM execution (a toy illustration follows below).
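The principle can be shown with a toy example. This sketch is purely conceptual (VMware's implementation operates in the VM monitor on interrupts and device I/O, not in Python): recording every non-deterministic input to a log makes a later replay deterministic.

```python
import random

# Conceptual illustration only of record/replay: logging non-deterministic
# inputs during recording, then re-delivering them verbatim during replay,
# makes execution repeatable.

def run(workload, source, log, mode, steps=5):
    for step in range(len(log) if mode == "replay" else steps):
        if mode == "record":
            event = source()       # non-deterministic input (network, timer, ...)
            log.append(event)      # ...is logged as it is delivered
        else:
            event = log[step]      # deterministic delivery of the logged event
        workload(event)

recorded, log = [], []
run(recorded.append, lambda: random.randint(0, 99), log, "record")

replayed = []
run(replayed.append, None, log, "replay")
assert replayed == recorded        # result: repeatable execution
```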
Virtual Machine Record & Replay (ctd)
• For a given primary VM, FT runs a secondary VM on a different host.
• The secondary shares its virtual disks with the primary.
• The secondary VM is kept in "virtual lockstep" via logging information sent over a private network connection.
• Only the primary VM sends and receives network packets; the secondary is passive.
• If the primary host fails, the secondary VM takes over with no interruption to applications.
FT in the VMkernel
• The FT vmkernel module is called vmklogger.
• Log entries are put in the log buffer, which is flushed/filled asynchronously.
• Log entries are sent and received through a socket on a VMkernel NIC.
• There should be a dedicated VMkernel network for logging with FT Logging enabled.
Determining Node Failure
• FT does frequent heartbeating through multiple NICs to determine when the primary or backup host is down.
• The backup "goes live" and becomes the new primary if it declares the current primary dead.
• There must be a way to distinguish a crashed host from a network failure ("split-brain"). The method used is an atomic operation (rename) on the shared VMFS.
• Whenever the primary or backup believes the other host is down, it renames a common file. The winner of the rename "race" survives; the loser of the rename "race" commits suicide (see the sketch below).
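A minimal sketch of that tiebreaker, assuming a directory on the shared volume is visible to both hosts. The .ft-generation<N> file name pattern matches the vmkernel log excerpts shown in the troubleshooting lesson; everything else here is illustrative.

```python
import os

# Split-brain tiebreaker sketch: each side attempts an atomic rename of the
# shared generation file; on a shared filesystem exactly one caller succeeds.

def try_golive(shared_dir, generation):
    old = os.path.join(shared_dir, ".ft-generation%d" % generation)
    new = os.path.join(shared_dir, ".ft-generation%d" % (generation + 1))
    try:
        os.rename(old, new)    # atomic: only one host can win this race
        return True            # winner goes live as the new primary
    except OSError:
        return False           # loser must not go live ("Can *NOT* golive")
```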
Record/Replay and FT Requirements: ESX/HW
• CPUs: limited processors (AMD Barcelona+, Intel Penryn+); processors must be from the same family (no mixing and matching).
• Hardware Virtualization must be enabled in the BIOS.
• Hosts must be in an HA-enabled cluster.
• Storage: shared storage (FC, iSCSI, or NAS).
• Network: a minimum of 3 NICs for the various types of traffic (ESX Management/VMotion, VM traffic, FT Logging). GigE is required for VMotion and FT Logging.
• Minimize single points of failure in the environment, e.g. NIC teaming, multiple network switches, storage multipathing.
• Primary and secondary hosts must be running the same build of ESX. A few of these checks can be scripted, as sketched below.
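A rough, hedged sketch (not the VMware SiteSurvey tool covered later) of checking a few of these requirements programmatically, assuming pyVmomi and a vim.ClusterComputeResource object already retrieved from a vCenter connection.

```python
# Checks a few FT prerequisites for a cluster obtained via pyVmomi
# (the vSphere API Python bindings). Property paths are per the vSphere
# API reference; verify against your API version.

def check_ft_prereqs(cluster):
    problems = []
    if not cluster.configurationEx.dasConfig.enabled:
        problems.append("cluster is not HA-enabled")
    builds = {h.config.product.build for h in cluster.host}
    if len(builds) > 1:
        problems.append("hosts run different ESX builds: %s" % builds)
    models = {h.summary.hardware.cpuModel for h in cluster.host}
    if len(models) > 1:
        problems.append("mixed CPU models across hosts: %s" % models)
    return problems
```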
VMware Fault Tolerance and HA Work Together
• FT VMs run only in an HA cluster.
• Mission-critical VMs are protected by FT and HA; the remaining VMs are protected by HA alone.
• When a host fails: the FT secondary takes over, a new FT secondary is started by HA, and HA-only VMs are restarted.
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
Prerequisites for VMware Fault Tolerance
• For VMware FT to perform as expected, it must run in an environment that meets specific requirements.
• The primary and secondary fault tolerant virtual machines must be in a VMware HA cluster.
• The primary and secondary ESX/ESXi hosts should be of the same CPU model family.
• The primary and secondary virtual machines must not run on the same host; FT automatically places the secondary VM on a different host.
Prerequisites for VMware Fault Tolerance (ctd)
• Storage
• Virtual machine files must be stored on shared storage. Shared storage solutions include NFS, FC, and iSCSI.
• For virtual disks on VMFS-3, the virtual disks must be thick; they cannot be "thin" or sparsely allocated. Turning on VMware FT automatically converts the VM's disks to thick, eager-zeroed disks (see the sketch below).
• Virtual Raw Device Mapping (RDM) is supported; physical RDM is not supported.
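For reference, the thin/eager-zeroed flags are visible through the API. A hedged sketch assuming pyVmomi; per the vSphere API reference, FlatVer2BackingInfo exposes thinProvisioned and eagerlyScrub booleans, and an eager-zeroed thick disk reports eagerlyScrub=True.

```python
from pyVmomi import vim

# Print the provisioning flags for each virtual disk of a vim.VirtualMachine,
# so thin or lazy-zeroed disks can be spotted before turning FT on.

def show_disk_provisioning(vm):
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            b = dev.backing
            if isinstance(b, vim.vm.device.VirtualDisk.FlatVer2BackingInfo):
                print("%s: thin=%s eagerlyScrub=%s"
                      % (b.fileName, b.thinProvisioned, b.eagerlyScrub))
```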
Prerequisites for VMware Fault Tolerance (ctd)
• Networking
• Multiple gigabit Network Interface Cards (NICs) are required: a minimum of two VMkernel Gigabit NICs dedicated to VMware FT Logging and VMotion.
• The FT Logging interface is used for logging events from the primary virtual machine to the secondary virtual machine.
• For best performance, use a 10 Gbit NIC rather than a 1 Gbit NIC, and enable the use of jumbo frames. The NIC selection can be inspected via the API, as sketched below.
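Which VMkernel NICs a host has selected for FT Logging and VMotion can be listed through the host's VirtualNicManager. A hedged sketch assuming pyVmomi; the nicType strings below are per the vSphere API and should be verified for your version.

```python
# Lists the VMkernel NICs selected for each FT-relevant traffic type on a
# vim.HostSystem, via HostVirtualNicManager.QueryNetConfig.

def show_ft_vmknics(host):
    mgr = host.configManager.virtualNicManager
    for nic_type in ("faultToleranceLogging", "vmotion"):
        cfg = mgr.QueryNetConfig(nic_type)
        selected = list(cfg.selectedVnic) if cfg else []
        print("%s: %s" % (nic_type, selected))
```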
Prerequisites for VMware Fault Tolerance (ctd)
• Processor
• SMP virtual machines are not supported.
• The hosts' CPUs must be of the same model family. Supported processors include the following:
• Intel Core 2, also known as Merom
• Intel 45nm Core 2, also known as Penryn
• Intel Next Generation, also known as Nehalem
• AMD 2nd Generation Opteron, also known as the Rev E/F common feature set
• AMD 3rd Generation Opteron, also known as Greyhound
Prerequisites for VMware Fault Tolerance (ctd)
• Host BIOS
• VMware FT requires that Hardware Virtualization (HV) be turned on in the BIOS. The process for enabling HV varies among BIOSes.
• If HV is not enabled, attempting to power on the primary copy of a fault tolerant virtual machine produces the following error message:
• "Fault tolerance requires that Record/Replay is enabled for the virtual machine. Module Statelogger power on failed."
Prerequisites for VMware Fault Tolerance (ctd)
• If HV is enabled on the ESX/ESXi host that is hosting the primary copy of a fault tolerant virtual machine, but not on any other host in the cluster, the primary can be powered on successfully.
• After the primary is powered on, VMware FT automatically attempts to start the fault tolerant secondary. This fails after a brief delay and produces the following error message:
• "Secondary virtual machine could not be powered on as there are no compatible hosts that can accommodate it."
• The primary remains powered on in live mode, but fault tolerance is not established.
Prerequisites for VMware Fault Tolerance (ctd)
• Turn off power management (also known as power capping) in the BIOS. If power management is left enabled, the secondary host may enter lower-performance, power-saving modes. Such modes can leave the secondary virtual machine with insufficient CPU resources to complete, in a timely fashion, all the tasks completed on the primary.
• Turn off hyperthreading in the BIOS. If hyperthreading is left enabled and the secondary virtual machine is sharing a CPU with another demanding virtual machine, the secondary virtual machine may run too slowly to keep up with the tasks completed on the primary.
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
Setting Up Fault Tolerance
• To enable Fault Tolerance, connect the vSphere Client to the vCenter Server using an account with cluster administrator permissions.
• In the Hosts & Clusters view, select a virtual machine, then right-click > Fault Tolerance > Turn Fault Tolerance On.
• If the virtual machine is stored on thin-provisioned or lazy-zeroed disk(s), those disk files must be converted to thick eager-zeroed before FT can be enabled. When FT is turned on, a message informs the user of this requirement and of the fact that the conversion will be carried out.
• The specified virtual machine is marked as the primary, and a secondary is established on another host. FT is now enabled.
• The same operation can also be scripted through the vSphere API, as sketched below.
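A minimal sketch, assuming pyVmomi is installed and that the vSphere 4.x API call VirtualMachine.CreateSecondaryVM_Task backs the "Turn Fault Tolerance On" menu item (per the vSphere API reference). The vCenter host name, credentials, and VM name are hypothetical.

```python
from pyVim.connect import SmartConnect

# Connect to vCenter and turn FT on for one VM. TLS certificate handling is
# omitted for brevity; passing host=None lets vCenter place the secondary.

si = SmartConnect(host="vcenter.example.com",
                  user="administrator", pwd="secret")   # hypothetical
vm = si.content.searchIndex.FindByDnsName(None, "ftvm01.example.com", True)
task = vm.CreateSecondaryVM_Task(host=None)             # turn FT on
```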
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
Viewing Information about Fault Tolerant VMs
• Fault tolerant VMs have an additional Fault Tolerance pane on their Summary tab, which provides information about the Fault Tolerance setup and performance.
• Fault Tolerance Status: indicates the status of fault tolerance (Protected, or Not Protected/Disabled).
Viewing Information about Fault Tolerant VMs (ctd)
• Secondary Location: displays the ESX/ESXi host on which the secondary virtual machine is hosted.
• Total Secondary CPU: indicates all secondary CPU usage, displayed in MHz.
• Total Secondary Memory: indicates all secondary memory usage, displayed in MB.
• Secondary VM Lag Time: shows the current delay between the primary and secondary VMs.
• Log Bandwidth: shows the bandwidth consumed on the link by Record/Replay operations between the primary and secondary VMs. This value reflects FT operations only and is not the bandwidth used on the wire (i.e. with TCP/IP/Ethernet headers). These values are also exposed through the API, as sketched below.
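A hedged sketch of reading the same values programmatically, assuming pyVmomi and a vim.VirtualMachine object; the quickStats field names are taken from the vSphere 4.0 API reference and should be verified against your API version.

```python
# Prints the FT runtime state and the quick statistics that back the
# Fault Tolerance pane for a vim.VirtualMachine.

def show_ft_pane(vm):
    print("FT state:           %s" % vm.runtime.faultToleranceState)
    qs = vm.summary.quickStats
    print("Secondary lag time: %s ms" % qs.ftSecondaryLatency)
    print("Log bandwidth:      %s" % qs.ftLogBandwidth)
```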
FT Virtual Machine Files (screenshots: before the VM is FT-enabled vs. after the VM is FT-enabled)
Maps View of an FT VM (screenshot)
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
VMware FT Restrictions
• Many VMware Infrastructure features and third-party products are supported for use with VMware FT, but the following are not:
• Microsoft Cluster Service (MSCS): MSCS does its own failover and management, so conflicts may arise when VMware FT and MSCS solutions coexist.
• Nested Page Tables/Extended Page Tables (NPT/EPT): a restriction of the record/replay implementation that does not affect the user experience. Record/replay automatically disables NPT/EPT for the virtual machine, even though other virtual machines on the same host can continue to use these features.
• Paravirtualization: a restriction of the record/replay implementation. Record/replay does not work with paravirtualized guests.
• Hot-plugging devices: a restriction of the record/replay implementation. Users cannot hot-add or hot-remove devices.
• Automatic application of DRS recommendations: for this release, an FT virtual machine cannot be used with DRS, though manual VMotion is allowed.
Features not supported with VMware FT
• Symmetric multiprocessor (SMP) virtual machines.
• Storage VMotion.
• NPIV (N-Port ID Virtualization).
• NIC passthrough.
• Devices that do not have Record/Replay support, such as USB and sound.
• Some network interfaces for legacy network hardware, such as vlance. While some legacy drivers are not supported, VMware FT reverts to the supported vmxnet2 driver, thereby handling cases where vlance would otherwise be required.
• Virtual machine snapshots.
Fault Tolerance Best Practices
• Ratio of fault tolerant VMs to ESX/ESXi hosts: maintaining consistency between primary and secondary fault tolerant virtual machines makes significant use of disk and network resources.
• You should have no more than four to eight fault tolerant virtual machines (primaries or secondaries) on any single host.
• The number of fault tolerant virtual machines that can safely run on each host cannot be stated precisely, because it depends on the ESX/ESXi host, virtual machine size, and workload factors, all of which can vary widely.
Fault Tolerance Use Cases
• Several typical situations can benefit from the use of VMware FT, for example:
• Any application that needs to be available at all times, especially applications with long-lasting client connections that users want to maintain through a hardware failure.
• Custom applications that have no other way of doing clustering.
• Cases where high availability might be provided through MSCS, but MSCS is too complicated to configure and maintain.
Module 2-3 Lessons
• Lesson 1 – Understanding Fault Tolerance
• Lesson 2 – Prerequisites for Fault Tolerance
• Lesson 3 – Setting up Fault Tolerance
• Lesson 4 – Viewing Information about Fault Tolerant VMs
• Lesson 5 – Fault Tolerance Guidelines
• Lesson 6 – Troubleshooting Fault Tolerance
Primary vmware.log FT Startup Messages
Mar 04 15:40:41.556: vmx| MigrateStateUpdate: Transitioning from state 0 to 1.
Mar 04 15:40:41.557: vmx| Migrating to become primary
Mar 04 15:40:41.557: vmx| StateLogger_MigrateStart: VMotion srcIp 192.168.0.65, dstIp 192.168.0.55
Mar 04 15:40:41.557: vmx| StateLogger_MigrateStart: Logging srcIp 172.16.0.65, dstIp 172.16.0.55
...
Mar 04 15:40:49.538: vmx| VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_202/op/=start
Mar 04 15:40:49.539: vmx| VmxMigrateGetStartParam: mid=464539447b562 dstwid=4953
Mar 04 15:40:49.539: vmx| Received migrate 'start' request for mig id 1236210039633250, dest world id 4953.
Mar 04 15:40:49.541: vmx| MigrateStateUpdate: Transitioning from state 1 to 2.
Mar 04 15:40:49.817: vcpu-0| MigrateStateUpdate: Transitioning from state 2 to 3.
Mar 04 15:40:49.818: vcpu-0| Migrate: Preparing to suspend.
Mar 04 15:40:49.819: vcpu-0| Migrating a secondary VM
Mar 04 15:40:49.819: vcpu-0| CPT current = 0, requesting 1
Mar 04 15:40:49.819: vcpu-0| Migrate: VM stun started, waiting 8 seconds for go/no-go message.
...
Primary vmware.log FT Startup Messages (ctd)
Mar 04 15:40:49.852: vmx| Migrate_Open: Migrating to <192.168.0.55> with migration id 1236210039633250
Mar 04 15:40:49.852: vmx| Checkpointed in VMware ESX, 4.0.0 build-151628, build-151628, Linux Host
Mar 04 15:40:49.853: vmx| BusMemSample: checkpoint 3 initPercent 75 touched 98304
Mar 04 15:40:49.854: vmx| FT saving on primary to create new backup
Mar 04 15:40:49.889: vmx| Connection accepted, ft id 2487727458.
Mar 04 15:40:49.892: vmx| STATE LOGGING ENABLED (interponly 0 interpbt 0)
Mar 04 15:40:49.893: vmx| LOG data
...
Mar 04 15:40:50.275: vmx| Migrate: VM successfully stunned.
Mar 04 15:40:50.276: vmx| MigrateStateUpdate: Transitioning from state 3 to 4.
Mar 04 15:40:50.890: vmx| MigrateSetStateFinished: type=1 new state=5
Mar 04 15:40:50.890: vmx| MigrateStateUpdate: Transitioning from state 4 to 5.
Mar 04 15:40:50.891: vmx| StateLogger_MigrateSucceeded: Backup connected
Mar 04 15:40:50.891: vmx| Migrate: Attempting to continue running on the source.
Mar 04 15:40:50.893: vmx| CPT current = 3, requesting 6
...
Mar 04 15:40:50.915: vmx| Continue sync while logging or replaying 8428
Mar 04 15:40:50.924: vmx| Migrate: cleaning up migration state.
Mar 04 15:40:50.924: vmx| MigrateStateUpdate: Transitioning from state 5 to 0.
Migration Transition States - Primary
• MIGRATE_VMX_NONE (state 0): Base state; no migration currently in progress.
• MIGRATE_TO_VMX_READY (state 1): VMX has received a MIGRATE_TO message; waiting for the start message along with the world ID of the destination.
• MIGRATE_TO_VMX_PRECOPY (state 2): VMX has received a MIGRATE_START message; precopying data to the destination.
• MIGRATE_TO_VMX_CHECKPT (state 3): Precopy done; saving the checkpoint.
• MIGRATE_TO_VMX_WAIT_HANDSHAKE (state 4): Done saving the checkpoint; waiting for acknowledgement from the destination that the VMX started. Until the acknowledgement is received, the migration may still fail back to the source.
• MIGRATE_TO_VMX_FINISHED (state 5): Migration succeeded or failed. On success, the VMX process needs to power down and clean up; on failure, the VM continues running and is ready for the next migration operation after this state passes.
Migration Transition States - Secondary
• MIGRATE_VMX_NONE (state 0): Base state; no migration currently in progress.
• MIGRATE_FROM_VMX_INIT (state 7): VMX has received a MIGRATE_FROM message; getting ready to receive the VM.
• MIGRATE_FROM_VMX_WAITING (state 8): VMX is ready and waiting for the source to send VM data.
• MIGRATE_FROM_VMX_PRECOPY (state 9): Both memory and checkpoint data are being copied to the destination.
• MIGRATE_FROM_VMX_CHECKPT (state 10): Data was precopied; restoring the checkpoint.
• MIGRATE_FROM_VMX_FINISHED (state 11): Migration succeeded or failed. On success, the VMX process runs the migrated VM; after this state passes, the VMX is ready for the next migration operation. On failure, the VM powers down and cleans up.
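When reading "MigrateStateUpdate: Transitioning from state X to Y" lines in vmware.log (as in the excerpts above), a small lookup table transcribed from these two state lists can decode the transitions. The helper name is illustrative.

```python
import re

# State numbers and names transcribed from the two tables above.
MIGRATE_STATES = {
    0: "MIGRATE_VMX_NONE",
    1: "MIGRATE_TO_VMX_READY",
    2: "MIGRATE_TO_VMX_PRECOPY",
    3: "MIGRATE_TO_VMX_CHECKPT",
    4: "MIGRATE_TO_VMX_WAIT_HANDSHAKE",
    5: "MIGRATE_TO_VMX_FINISHED",
    7: "MIGRATE_FROM_VMX_INIT",
    8: "MIGRATE_FROM_VMX_WAITING",
    9: "MIGRATE_FROM_VMX_PRECOPY",
    10: "MIGRATE_FROM_VMX_CHECKPT",
    11: "MIGRATE_FROM_VMX_FINISHED",
}

def decode_transition(line):
    m = re.search(r"Transitioning from state (\d+) to (\d+)", line)
    if m:
        src, dst = (MIGRATE_STATES.get(int(g), "unknown") for g in m.groups())
        return "%s -> %s" % (src, dst)

# decode_transition("vmx| MigrateStateUpdate: Transitioning from state 3 to 4.")
# returns "MIGRATE_TO_VMX_CHECKPT -> MIGRATE_TO_VMX_WAIT_HANDSHAKE"
```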
FT Troubleshooting – Primary vmkernel logs
• Immediately following the FT migration, you will see messages like these on the ESX host. Note the migration ID and the StateLogger ID in case there are many FT VMs:
Primary:
Mar 4 10:51:35 prme-stft053 vmkernel: 0:16:24:12.912 cpu2:4281)VMotion: 2582: 1236192688557132 S: Stopping pre-copy: only 11178 pages were modified, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~122.213 MB/s)
Mar 4 10:51:35 prme-stft053 vmkernel: 0:16:24:12.917 cpu3:4280)VSCSI: 5850: handle 8193(vscsi0:0):Destroying Device for world 4281 (pendCom 0)
Mar 4 10:51:36 prme-stft053 vmkernel: 0:16:24:13.663 cpu7:4230)VMKStateLogger: 6856: 2316520524: accepting connection from secondary at 10.0.57.10
FT Troubleshooting – Secondary vmkernel logs
Secondary:
Mar 4 10:51:34 prme-stft057 vmkernel: 0:19:53:47.483 cpu2:4286)VMotion: 1805: 1236192688557132 D: Set ip address '192.168.57.10' worldlet affinity to recv World ID 4289
Mar 4 10:51:34 prme-stft057 vmkernel: 0:19:53:47.644 cpu7:4228)MigrateNet: vm 4228: 1096: Accepted connection from <192.168.53.10>
Mar 4 10:51:34 prme-stft057 vmkernel: 0:19:53:47.644 cpu7:4228)MigrateNet: vm 4228: 1110: dataSocket 0x4100b6092e60 send buffer size is 263536
Mar 4 10:51:35 prme-stft057 vmkernel: 0:19:53:49.427 cpu3:4289)VMotionRecv: 226: 1236192688557132 D: Estimated network bandwidth 100.872 MB/s during pre-copy
Mar 4 10:51:36 prme-stft057 vmkernel: 0:19:53:50.055 cpu7:4286)VSCSI: 3469: handle 8193(vscsi0:0):Creating Virtual Device for world 4287 (FSS handle 163860)
Mar 4 10:51:36 prme-stft057 vmkernel: 0:19:53:50.176 cpu7:4286)VMKStateLogger: 1949: 2316520524: Connected to primary
FT Troubleshooting – vmware.log
• The FT pair ID (logged by the StateLogger vmkernel module to identify the FT pair) is also found in the vmware.log file.
• This is an example from a secondary whose primary died:
Mar 03 20:03:56.457: vcpu-0| StateLoggerSetEndOfLog: BCnt: 344876494570 fSz: 0 bufPos 14775374
...
Mar 03 20:03:56.464: vmx| Preparing to go live
...
Mar 03 20:03:56.503: vmx| Done going live
Mar 03 20:03:56.503: vmx| Failover initiated via vmdb
Mar 03 20:03:56.504: vmx| Gone live because of Lost connection to primary.
Mar 03 20:03:56.506: vmx| Unstunning after golive
...
Mar 03 20:04:08.199: vmx| FT saving on primary to create new backup
Mar 03 20:04:08.203: vmx| Connection accepted, ft id 607078005.
FT Troubleshooting – Split Brain
• For support's purposes, the vmkernel log files will display messages similar to the following on the host running the VM that lost the race for the generation file (and thus did not go live):
Mar 4 10:52:45 prme-stft057 vmkernel: 0:19:54:58.861 cpu2:4291)VMKStateLogger: 7823: Rename of .ft-generation2 to .ft-generation3 failed: Not found
Mar 4 10:52:45 prme-stft057 vmkernel: 0:19:54:58.861 cpu2:4291)VMKStateLogger: 2792: 2316520524: Can *NOT* golive
• On the host running the VM that won the race and successfully renamed the file (and did go live), you will see a corresponding message:
Mar 4 10:52:45 prme-stft053 vmkernel: 0:16:25:22.150 cpu6:4283)VMKStateLogger: 2792: 2316520524: Can golive
• The other thing to note is the StateLogger ID if there are multiple FT-enabled VMs (a parsing sketch follows below).
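A small script can pull out the identifiers these slides say to note. A sketch in Python; the regular expressions are written against the sample log lines above and may need adjusting for other builds.

```python
import re

# Scans a vmkernel log for the migration ID, the StateLogger/FT pair ID,
# and the golive verdict, matching the formats of the sample lines above.

def scan_vmkernel_log(path):
    with open(path) as f:
        for line in f:
            m = re.search(r"VMotion: \d+: (\d+) [SD]:", line)
            if m:
                print("migration id: %s" % m.group(1))
            m = re.search(r"VMKStateLogger: \d+: (\d+): (Can (?:\*NOT\* )?golive)",
                          line)
            if m:
                print("FT pair id %s: %s" % (m.group(1), m.group(2)))
```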
VMware SiteSurvey Tool
• A new utility analyzes a cluster of ESX hosts and reports whether the configuration is suitable for FT. This includes checking for FT-compatible processors, shared storage, BIOS settings, etc.
• The utility is called VMware SiteSurvey, and a beta copy is available in the "Documents" tab.
• To use it, download the VMware SiteSurvey executable from that page and run it; this installs the utility on your local Windows machine.
Troubleshooting Fault Tolerance
• When attempting to power on a virtual machine with VMware FT enabled, an error message may appear in a pop-up dialog box:
• "Fault tolerance requires that Record/Replay is enabled for the virtual machine. Module Statelogger power on failed."
• What is a possible root cause?
Troubleshooting Fault Tolerance (ctd)
• After powering on a virtual machine with VMware FT enabled, an error message may appear in the Recent Tasks pane:
• "Secondary virtual machine could not be powered on as there are no compatible hosts that can accommodate it."
• What is a possible root cause?
Troubleshooting Fault Tolerance (ctd)
• When selecting a VM to enable Fault Tolerance, you find that the "Turn Fault Tolerance On" option is greyed out. What are the possible causes?
• The host on which the virtual machine resides is not part of a VMware HA cluster.
• The host on which the virtual machine resides does not have Hardware Virtualization turned on in the BIOS for the CPUs.
• The virtual machine's version does not support VMware Fault Tolerance; update the virtual machine to a more recent version.
• The virtual machine has snapshots; delete any snapshots.
Lesson 2-3 Summary
• vSphere 4.0 introduces a new feature called Fault Tolerance.
• It enhances the VM availability provided by VMware HA, in that there is no downtime for the VM when a hardware failure occurs on the ESX host.
• However, in this initial release, a number of restrictions are placed on the configuration of a VM that is to use FT.
Lesson 2-3 - Lab 1
• Lab 1 involves creating fault tolerant VMs:
• Create a fault tolerant VM
• Watch a fault tolerant VM fail over to another host
• Fault tolerant VM settings