Reliable and Effective Networking in ESX / ESXi
Vsevolod (Seva) Semouchin, Technical Account Manager
26 September 2009
Disclaimer
This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
“These features are representative of feature areas under development. Feature commitments are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery.”
Agenda
• HA Cluster Architecture
  • HA Cluster Positioning
  • HA Cluster Architecture Overview
  • HA Cluster and vCenter
  • Primary and Secondary Nodes
• Isolation Detection and Restart Mechanisms
HA Cluster is…
• … a failover cluster
• … an N-to-N cluster
• … a cluster that uses a heartbeat detection mechanism instead of quorum
  • May be split into two parts
  • Has longer timeouts
• … designed to work together with other VI components
  • VMFS protects against split-brain
  • DRS and VMotion move applications off a running node
• … independent of vCenter
AutoStart Architecture
[Diagram labels: AutoStart agent (monitor, manage); Sensor, Trigger, Rule (rules), Actuator; Resource Group; AutoStart-aware Process; Proxy Process; AutoStart-unaware Process]
VMware HA Architecture
[Diagram: a VC Server running vpxd, and two ESX hosts, each running VMs, vpxa, VMap, and AMAgent]
VMware HA and vCenter
[Diagram: vCenter, ESX Host, and VMap. vCenter manages VMs; VMware HA manages hosts.]
VMap (VM-aware process) is a proxy process that provides communication between vCenter and VMware HA. VMap supplies sensor values to HA, reports state changes to vCenter, and asks vCenter to perform actions on virtual machines, such as stopping and starting a VM.
HA and vCenter Data Flow
[Diagram: VC Server (vpxd) and ESX Host (vpxa, VMap, AMAgent)]
• Host Policy: Node Resources
• VM Policy: NodePolicy, ChangePolicy, AddVmToNode, DeleteVmFromNode
• Requests: StartVM, StopVM, NodePolicyRequest, NodeResouceRequest
• Notification: NodeFailure, NodeIsolation, SlotRequest, PrimariesChanged
Conclusions
VMware HA and vCenter are loosely coupled.
• When vCenter fails, HA can still restart VMs
  • The only functionality lost: HA cannot place VMs according to current resource consumption on the surviving hosts
• We can run vCenter in a VM and let HA restart it, in a cluster managed by that same vCenter.
  • This provides an even higher level of protection than running vCenter on a standalone physical computer
Primary Node
• Node that holds AutoStart database is the primary node
/opt/vmware/aam/config/agent_env.Linux [Vmkernel for ESXi]
/opt/vmware/aam/bin/Cli (ftcli on earlier versions)
AAM> listnode
    Node          Type         State
    ------------  -----------  --------------
    ha_node_01    Primary      Agent Running
    ha_node_02    Primary      Agent Running
    ha_node_03    Secondary    Agent Running
    ha_node_04    Primary      Agent Running
    ha_node_05    Primary      Agent Running
    ha_node_06    Secondary    Agent Running
    ha_node_07    Primary      Agent Running
    ha_node_08    Secondary    Agent Running
AAM>
Primary Node
• Up to 5 primary nodes. The limit is set to minimize traffic. Duties are:
  • Trigger and execute rules
  • Allow new nodes to join the cluster
• When all primary nodes fail, HA functionality is lost
• A node can be promoted to primary when
  • It is one of the first 5 nodes in the cluster
  • One of the primary nodes becomes isolated (that includes failure!)
  • One of the primary nodes is put into maintenance mode
  • One of the primary nodes is removed from the cluster by an administrator
• Promotion is virtually random (based on the Managed Object Reference)
Conclusions
• Watch the primary nodes when you
  • Want to build a stretched cluster (two sites)
  • Want to spread blade servers across two or more enclosures (chassis)
  • Want to spread rack-mount servers across two or more racks
• Use the following CLI commands:
  • /opt/vmware/aam/bin/Cli # or ftcli
  • listnode
  • demotenode
  • promotenode
• Or simply put no more than 4 servers in one site/chassis/rack
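The advice above can be turned into a small check. The following is a minimal sketch, assuming the listnode output has the three-column shape shown on the earlier slide. The CHASSIS mapping and all function names are hypothetical and are not part of the AAM tooling.

```python
from collections import Counter

# Sample output in the shape shown on the listnode slide.
LISTNODE_OUTPUT = """\
Node          Type         State
------------  -----------  --------------
ha_node_01    Primary      Agent Running
ha_node_02    Primary      Agent Running
ha_node_03    Secondary    Agent Running
ha_node_04    Primary      Agent Running
ha_node_05    Primary      Agent Running
ha_node_06    Secondary    Agent Running
ha_node_07    Primary      Agent Running
ha_node_08    Secondary    Agent Running
"""

# Hypothetical placement of nodes in two enclosures (an assumption).
CHASSIS = {
    f"ha_node_0{i}": ("chassis_a" if i <= 4 else "chassis_b")
    for i in range(1, 9)
}


def parse_listnode(text):
    """Return {node_name: node_type}, skipping header and separator rows."""
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] in ("Primary", "Secondary"):
            nodes[parts[0]] = parts[1]
    return nodes


def primaries_by_domain(nodes, domain_of):
    """Count primary nodes per failure domain (site, chassis, or rack)."""
    return Counter(
        domain_of[name] for name, kind in nodes.items() if kind == "Primary"
    )


def all_primaries_in_one_domain(nodes, domain_of):
    """True when losing a single domain would take out every primary node."""
    return len(primaries_by_domain(nodes, domain_of)) == 1


nodes = parse_listnode(LISTNODE_OUTPUT)
print(primaries_by_domain(nodes, CHASSIS))
print(all_primaries_in_one_domain(nodes, CHASSIS))
```

With this hypothetical placement the primaries are spread over both enclosures, so the loss of one chassis still leaves primary nodes available to drive restarts.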
Agenda
• HA Cluster Architecture
• Isolation Detection and Restart Mechanisms
  • Isolation Event
  • Cluster Split-Brain Situation
  • Cluster Networking
  • VM Shutdown and Restart
HA Node Isolation
• HA is a failover cluster.
  • Unplanned failover is caused by node failure
  • HA calls such an event “node isolation”
• Why “isolation” and not “failure”?
  • HA detects both node failure and isolation from lost heartbeats.
  • Surviving nodes cannot distinguish between failure and isolation of the “lost” node.
  • An isolated node can recognize that it is isolated and trigger special actions on the isolation detection event.
  • Node failure requires no special actions beyond those triggered by the node isolation event.
VMware HA Split Brain Situation
[Diagram: a VMware HA cluster of three hosts, each running an agent. Heartbeats travel over the Service Console network; the isolation detection address is the Service Console network gateway.]
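How a node might tell its own isolation apart from the failure of its peers can be sketched as a two-input decision. This is an illustrative model only: it assumes a node treats itself as isolated when it receives no heartbeats and also cannot reach the isolation detection address. The function and state names are hypothetical.

```python
def classify_self(heartbeats_received: bool, isolation_address_reachable: bool) -> str:
    """Classify this node's own view of the cluster (illustrative model)."""
    if heartbeats_received:
        # Peers are still heard on the heartbeat network.
        return "connected"
    if isolation_address_reachable:
        # No peers, but the gateway answers: the network is up,
        # so the other nodes are the ones that look lost.
        return "peers lost"
    # Neither peers nor the isolation address respond.
    return "isolated"


for hb in (True, False):
    for gw in (True, False):
        print(hb, gw, classify_self(hb, gw))
```

Only the last case would trigger the isolation response on the node itself. From the outside, the surviving nodes see the same lost heartbeat in every failure case.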
Split Brain Protection by VMFS
[Diagram: hosts A and B, each shown with a Network]
1. VMs are running on host B
2. Host A considers host B failed and tries to start the VMs
3. The VMFS lock prevents the VMs from being started twice
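The three steps can be mimicked with a toy lock table. This is only a sketch of the idea that one owner at a time may run a VM. Real VMFS locking is done on disk and is far more involved, and the class and method names here are hypothetical.

```python
class ToyDatastore:
    """Toy stand-in for per-VM file locks on shared storage."""

    def __init__(self):
        self._owner = {}  # vm name -> host currently holding the lock

    def try_power_on(self, vm: str, host: str) -> bool:
        """Take the lock if it is free; succeed only for the lock owner."""
        owner = self._owner.setdefault(vm, host)
        return owner == host

    def release(self, vm: str, host: str) -> None:
        """Drop the lock, for example when the VM is powered off on its host."""
        if self._owner.get(vm) == host:
            del self._owner[vm]


ds = ToyDatastore()
print(ds.try_power_on("vm1", "B"))  # step 1: the VM runs on host B
print(ds.try_power_on("vm1", "A"))  # step 2: host A tries to start it
ds.release("vm1", "B")              # the VM is powered off on host B
print(ds.try_power_on("vm1", "A"))  # host A can now start it
```

The second call fails because host B still holds the lock, which corresponds to step 3 above.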
Isolated Host Behavior
• Because the isolated host holds locks on all VMs running on it, those VMs cannot be restarted on other, connected hosts
• Powering off or shutting down VMs on the isolated host is currently the only way to move them from the isolated host to connected ones.
  • Turning off a VM on the isolated host can be treated as a cold migration.
• We do not need to shut down VMs when we are sure that the client connection is still working.
• We could VMotion our VMs when the client network is broken but the VMotion network is still alive.
• Currently only the cold migration option is implemented.
“Isolation response” vs “Restart priority”
On Isolated Host
• Stop VMs
• “Isolation response”
  • Power off
  • Shutdown
  • Leave power on
• Applicable to
  • Each single VM
  • Whole cluster
On Connected Host
• Start VMs
• “Restart priority”
  • High priority
  • Medium priority
  • Do not restart
• Applicable to
  • Each single VM
  • Whole cluster
VM Restart Behavior
• The HA cluster restarts a VM according to its “Restart priority”
• When a restart fails, the HA cluster repeats its attempts to restart the VM until:
  • The VM is shut down or powered off, which frees its lock
  • The formerly isolated host is no longer isolated and reconnects to the vCenter Server
  • The formerly isolated host subsequently fails
  • The VM is deregistered from the vCenter Server
Special Restart Case – VM Files Are Not Available
• When a VM configuration file is not available, vCenter can change the VM's state to “disconnected”
  • Use case: a cluster stretched over two sites, where one site (both servers and storage arrays) fails.
  • This is not an HA feature.
• When the configuration file becomes available again, HA can restart that VM once it detects a change in resource consumption on one of the surviving hosts.
  • Use case: after a storage failover you may power a dummy VM on and then off. This forces the HA cluster to restart the failed-over VMs
Restart Process
• High priority VMs are restarted first, then Medium priority VMs
  • Assign high priority to your vCenter VM
• The decision on which VM to start, and where, is made by a primary node according to its database.
  • No primary nodes, no restart
• Each time it restarts a VM, VMware HA requests information about node resource consumption from vCenter.
• HA sends commands to start and stop VMs to the local vCenter agent, vpxa
  • When vpxa fails, the node cannot carry out HA commands
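The ordering rule can be illustrated with a short sketch. It assumes only the three priority values named on the earlier slide (High, Medium, Do not restart). The VM names and the function are hypothetical, and the sketch does not model host selection or resource checks.

```python
# Lower rank restarts first; "do not restart" has no rank and is filtered out.
PRIORITY_RANK = {"high": 0, "medium": 1}


def restart_order(vms):
    """Return VM names in restart order from (name, priority) pairs."""
    candidates = [(name, prio) for name, prio in vms if prio in PRIORITY_RANK]
    # sorted() is stable, so VMs with equal priority keep their input order.
    ordered = sorted(candidates, key=lambda vm: PRIORITY_RANK[vm[1]])
    return [name for name, _ in ordered]


failed_vms = [
    ("web01", "medium"),
    ("vcenter", "high"),
    ("test01", "do not restart"),
    ("db01", "high"),
]
print(restart_order(failed_vms))
```

Giving the vCenter VM high priority, as the slide recommends, puts it in the first group to be restarted.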
HA Catastrophic Outage
[Diagram annotations]
• Service console network designed without single points of failure
• Turned off for maintenance
• Needs 1 minute to rebuild the spanning tree
• All nodes are isolated
How to Prevent Such an Outage
• Use the PortFast feature on Cisco switches
  • http://www.cisco.com/application/pdf/paws/10586/65.pdf
• Increase the isolation detection time
  • das.failuredetectiontime
• Use the second console network interface as a cluster private network.
  • Risk: HA may still consider a node connected when it is disconnected from the VM network
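The reasoning behind a longer detection time is simple arithmetic: the detection window has to outlast the network interruption. A minimal sketch, assuming das.failuredetectiontime is expressed in milliseconds; the sample values and the function name are illustrative only.

```python
def detection_outlasts_outage(failure_detection_ms: int,
                              outage_seconds: float,
                              margin_seconds: float = 0.0) -> bool:
    """True when the detection window is longer than the outage plus a margin."""
    return failure_detection_ms / 1000.0 > outage_seconds + margin_seconds


SPANNING_TREE_REBUILD_S = 60  # "1 minute" from the outage slide

# Illustrative settings, not recommendations.
for setting_ms in (15000, 60000, 90000):
    ok = detection_outlasts_outage(setting_ms, SPANNING_TREE_REBUILD_S)
    print(setting_ms, ok)
```

A window that merely equals the rebuild time does not pass the check, which is why the sketch has an optional margin argument.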
Cluster Networking Best Practices
[Diagram: ESX Server and vCenter, with networks labeled SC, VMkernel, and VMotion]
Advanced Parameter das.allowVmotionNetworks
• Applicable to ESXi only
• Value: “true” or “false”
• Allows one NIC to be shared by the VMotion and Management networks
• ESXi does not have a service console. A VMkernel port group should be used for vCenter and HA communication. Such a group is called a “Management” port group.
• During initialization, HA skips VMkernel port groups that have VMotion enabled
• This behavior can be overridden by setting das.allowVmotionNetworks = true
Advanced Parameter das.allowNetwork[n]
• Applicable to both ESX and ESXi
• Value: a character string containing a port group name
• In effect, this parameter disables networks
• Usage
  • Once this parameter is used, only the port groups whose names are declared are allowed for cluster communication.
  • The parameter is checked when a new host joins the cluster
Example
[Diagram: port groups Mgmt 1, Mgmt 2, and Mgmt 3, shown twice with VMotion and HA markers, illustrating the effect of setting das.allowVmotionNetworks to “true”]
We want to use port groups “Mgmt 1” and “Mgmt 2” for HA communications, but by default HA picks up only “Mgmt 2”.
Example
[Diagram: Mgmt 1 marked VMotion and HA; Mgmt 2 marked HA; Mgmt 3 marked VMotion]
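The combined effect of the two parameters on the example can be sketched as a filter over port groups. This is a model of the behavior described on the slides, not of HA's actual code. It assumes that Mgmt 1 and Mgmt 3 have VMotion enabled and Mgmt 2 does not, which is one reading of the diagrams. The PortGroup type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortGroup:
    name: str
    vmotion_enabled: bool


def ha_port_groups(port_groups, allow_vmotion_networks=False, allowed_names=()):
    """Return the names of port groups HA would use under this model.

    allow_vmotion_networks stands in for das.allowVmotionNetworks.
    allowed_names stands in for the das.allowNetwork[n] entries.
    """
    selected = []
    for pg in port_groups:
        if pg.vmotion_enabled and not allow_vmotion_networks:
            continue  # skipped by default because VMotion is enabled
        if allowed_names and pg.name not in allowed_names:
            continue  # once any name is declared, all others are excluded
        selected.append(pg.name)
    return selected


# Assumed reading of the example diagram.
GROUPS = [
    PortGroup("Mgmt 1", vmotion_enabled=True),
    PortGroup("Mgmt 2", vmotion_enabled=False),
    PortGroup("Mgmt 3", vmotion_enabled=True),
]

print(ha_port_groups(GROUPS))
print(ha_port_groups(GROUPS, allow_vmotion_networks=True))
print(ha_port_groups(GROUPS, allow_vmotion_networks=True,
                     allowed_names=("Mgmt 1", "Mgmt 2")))
```

Under these assumptions the default picks only Mgmt 2, allowing VMotion networks picks all three, and declaring the two names narrows the result to Mgmt 1 and Mgmt 2.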
VMotion Network Compatibility
[Diagram: two hosts. On SrvC 1 the HA addresses are 10.0.20.1 and 10.0.30.2; on SrvC 2 they are 10.0.10.1 and 10.0.10.2.]
The network compatibility check was introduced in ESX/ESXi 3.5 U2. The reason: on incompatible networks, an IP timeout rather than heartbeats has to be used to detect node isolation, which takes too long. The check can be overridden with the following advanced parameter (introduced in ESX/ESXi 3.5 U3):
das.bypassNetCompatCheck = “true”
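The slide does not say how the compatibility check is performed. One plausible reading is that the addresses two hosts use on the same console network must fall in the same subnet. The sketch below tests the addresses from the diagram under that assumption, with an assumed /24 prefix; the function name is hypothetical.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True when both addresses belong to the same network of the given prefix."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# Address pairs taken from the diagram; the /24 prefix is an assumption.
print(same_subnet("10.0.20.1", "10.0.30.2"))  # SrvC 1
print(same_subnet("10.0.10.1", "10.0.10.2"))  # SrvC 2
```

Under this reading the SrvC 1 pair is incompatible and the SrvC 2 pair is compatible.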
DRS Interaction
• You can use DRS anti-affinity rules to increase the availability of the application at the infrastructure level
• Use case
  • Some critical VMs or a load-balancing farm
• Solution: use anti-affinity rules to force those VMs onto different hosts
[Diagram: anti-affinity]
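What an anti-affinity rule asks for can be expressed as a check on a placement map. This sketch only detects violations and does not reproduce how DRS enforces a rule. The host names, VM names, and function name are hypothetical.

```python
def antiaffinity_violations(placement, rule_vms):
    """Return {host: [vms]} for hosts running more than one VM from the rule."""
    by_host = {}
    for vm in rule_vms:
        by_host.setdefault(placement[vm], []).append(vm)
    return {host: vms for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical load-balancing farm of three web servers.
placement = {"web1": "esx1", "web2": "esx1", "web3": "esx2", "db1": "esx1"}
farm = ["web1", "web2", "web3"]

print(antiaffinity_violations(placement, farm))
```

Here web1 and web2 share esx1, so a failure of that host would take out two members of the farm at once. This is the situation the rule is meant to prevent.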
Cluster Scalability
When you need to restart your application in…
Conclusion
VMware HA Cluster is a powerful tool for implementing high availability at the enterprise level
• Use it to make your vCenter Server highly available
• Design cluster networking properly
  • Avoid single points of failure
  • Think about network compatibility
• Create a proper restart policy for your VMs
• Leverage DRS to increase application availability at the infrastructure level
Thank you for coming. Rate your session and watch for the highest scores!