TA05 VMware Infrastructure 3 Networking – Advanced Configuration and Troubleshooting. Jean Lubatti, Product Support Engineer, VMware
Housekeeping • Please turn off your mobile phones, blackberries and laptops • Your feedback is valued: please fill in the session evaluation form (specific to that session) & hand it to the room monitor / the materials pickup area at registration • Each delegate to return their completed event evaluation form to the materials pickup area will be eligible for a free evaluation copy of VMware’s ESX 3i • Please leave the room between sessions, even if your next session is in the same room as you will need to be rescanned
Agenda • Components of the Networking Stack • Virtual NIC overview and troubleshooting • VSwitch overview • PortGroups overview • VLANs • VST, EST and VGT • The native VLAN • NIC Teaming • Port Id based, IP hash based • Reverse teaming • Beaconing and shotgun • Rolling Failover • VSwitch advanced options • Security settings • Notify switch • VMKernel Network Traffic • Command Line Utilities • Advanced Troubleshooting summary • Q & A
Virtual NICs overview • A virtual NIC is an emulated layer 2 device used to connect to the vSwitch • Each virtual NIC has its own MAC address and does address-based filtering • No need to implement a PHY (physical layer) • No auto-negotiation • Speed/Duplex/Link are irrelevant • Ignore the speed/duplex reported in the guest OS • Actual speed of operation depends on the CPU cycles available and the speed of the uplinks • Different types of virtual NICs • Virtual adapters for VMs • vlance, vmxnet, enhanced vmxnet (for ESX 3.5.0) and E1000 • vswif for the Service Console • vmknic for the VMkernel
Troubleshooting Virtual NICs • Check the VM configuration • Make sure the guest OS recognizes the virtual adapter and loads the appropriate driver • Use utilities like lspci, lsmod, Device Manager etc • Check the guest OS and VM logs for any obvious errors • MAC address conflict can occur only if • You manually set conflicting MAC addresses • After manually copying VMs, you choose not to regenerate a new UUID when prompted • Unplugging / replugging a vNIC changes the virtual port ID!
Troubleshooting Virtual NICs • It is possible to manually turn off advanced vNIC features • This may help troubleshooting • But do not jump to conclusions!
VSwitch overview • Software implementation of an ethernet switch • How is it similar to a physical switch? • Does MAC address based forwarding • Provides standard VLAN segmentation • Configurable • Uplink aggregation • How is it different? • Does not need to learn MAC addresses • It knows the MAC addresses of the virtual NICs connecting to it • Packets not destined for a VM are forwarded outside • Single tier topology • No need to participate in Spanning Tree Protocol • Can do rate limiting
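The forwarding behaviour described on this slide can be sketched in a few lines of Python. This is only an illustrative model, not ESX code: the `forward` function, the `UPLINK` label and the MAC-to-port dictionary are hypothetical names introduced here. It shows the key difference from a physical switch: the MAC table is filled when vNICs connect, not learned from traffic, and unknown destinations simply go outside.

```python
UPLINK = "uplink"                 # hypothetical label for "send to the physical network"
BROADCAST = "ff:ff:ff:ff:ff:ff"


def forward(registered, dst_mac, ingress):
    """Return the destinations for one frame.

    registered: dict of MAC -> virtual port id, filled when vNICs connect
                (the vSwitch knows these MACs, it does not learn them).
    ingress:    the virtual port id the frame came from, or UPLINK.
    """
    dst = dst_mac.lower()
    if dst == BROADCAST:
        # Deliver to every other local port; go outside only if it came from a VM.
        out = [port for port in registered.values() if port != ingress]
        if ingress != UPLINK:
            out.append(UPLINK)
        return out
    if dst in registered:
        return [registered[dst]]
    # Not a local VM: forward outside, but never send an uplink frame back out.
    return [] if ingress == UPLINK else [UPLINK]


ports = {"00:50:56:00:00:01": 1, "00:50:56:00:00:02": 2}
print(forward(ports, "00:50:56:00:00:02", 1))   # [2]
print(forward(ports, "00:11:22:33:44:55", 1))   # ['uplink']
```

Because a frame arriving from an uplink is never sent back out, the model also illustrates why a single vSwitch cannot create a loop.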
VSwitch overview: Spanning Tree Protocol • STP is a link management protocol that prevents network loops • Loops are not possible within the same vSwitch • A frame entering a vSwitch from the physical network is never forwarded back out to the physical network • Two vSwitches cannot be connected • Single level topology • Loops are not possible inside ESX without a layer 2 bridging VM [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, vSwitches and physical NICs (1000 Mbps, 100 Mbps, 1000 Mbps) connected to physical switches]
PortGroups overview • PortGroups are configuration templates for ports on the vSwitch • Efficient way to specify the type of network connectivity needed by a VM • PortGroups specify • VLAN Configuration • Teaming policy (can override vSwitch setting) • Layer 2 security policies (can override vSwitch setting) • Traffic shaping parameters (can override vSwitch setting) • PortGroups are not VLANs • PortGroups do not segment the vSwitch into separate broadcast domains unless they have different VLAN IDs
VLANs: Virtual Switch Tagging • Most commonly deployed configuration and recommended setup • The vSwitch does the tagging/untagging • Physical switch port should be a trunk port • Number of VLANs per VM is limited to the number of vNICs [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC and vSwitches; PortGroups on VLAN 104, VLAN 105 and VLAN 106; physical switch. Callouts: the vSwitch tags and strips the frames; 802.1Q tagged frames on the physical NIC]
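To make "tags and strips the frames" concrete, here is a minimal Python sketch of what an 802.1Q tag looks like on the wire. The function names `add_tag` and `strip_tag` are hypothetical; the layout itself (a 4-byte tag after the source MAC, TPID 0x8100, 12-bit VLAN ID) comes from the 802.1Q standard, not from the slides.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MACs (12 bytes)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    tci = (priority << 13) | vlan_id      # 3-bit priority, 1-bit DEI (0), 12-bit VLAN ID
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_tag(frame: bytes):
    """Return (vlan_id, untagged_frame), or (None, frame) if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame
    return tci & 0x0FFF, frame[:12] + frame[16:]


# Untagged frame as a VM would send it: dst MAC, src MAC, EtherType (IPv4), payload.
untagged = b"\xff" * 6 + bytes.fromhex("005056000001") + b"\x08\x00" + b"payload"
tagged = add_tag(untagged, 105)           # what leaves the physical NIC in VST mode
print(tagged[12:16].hex())                # 81000069
print(strip_tag(tagged)[0])               # 105
```

In VST mode this work happens in the vSwitch, so the guest only ever sees the untagged form.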
VLANs: External Switch Tagging • VLAN tagging and stripping is done by the physical switch • No configuration required on the ESX Server • The vSwitch does not tag or strip the frames • Number of VLANs supported is limited to the number of physical NICs on the ESX Server [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, vSwitches and physical NICs (100 Mbps, 1000 Mbps); physical switch with VLAN 105 and VLAN 106; rest of the network. Callouts: the vSwitch receives untagged frames; the physical switch is responsible for the tagging and stripping]
VLANs: Virtual Guest Tagging • PortGroup VLAN ID is set to 4095 • Tagging and stripping of VLAN IDs happens in the guest VM • 802.1Q software/driver in the VM • In VGT mode the guest can send/receive any VLAN tagged frame • Number of VLANs per guest is not limited to the number of vNICs • VMware does not ship an 802.1Q vmxnet driver • Windows: only with E1000 • Linux: 802.1Q module (8021q) [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC and vSwitches; PortGroup on VLAN 4095; physical switch. Callouts: VLAN tagging and stripping software/driver needed in the VM; the vSwitch does not tag or strip the frames]
VLANs: Native VLAN • The vSwitch won’t deliver untagged frames to the VM unless the PortGroup has no VLAN specified • Using the native VLAN is fully supported on ESX • However, it is important to remember which part of the network infrastructure is tagging and untagging the frames! • Default native VLAN is often VLAN 1 • If you have to use the default native VLAN in a VST configuration • Use a PortGroup with no VLAN ID set [Diagram: VM with VLAN ID 1; virtual switch, VLAN 1; frames not tagged; physical switch with native VLAN ID 1; physical machine with VLAN ID 1]
VLANs: Troubleshooting • Remember “who” should tag: ESX or the physical switch? • It cannot be both! • Trunk encapsulation should be set to 802.1Q • No ISL, LANE etc. • Trunking should be static and unconditional • No Dynamic Trunking Protocol (DTP) • Manually specify all the VLANs to be trunked • No VLAN Trunking Protocol (VTP) • Disallow unnecessary VLAN IDs on the physical switch port • ESX won’t spend time processing unnecessary broadcasts [Diagram callouts: the physical switch sees multiple VLAN IDs on the same port; configure the switch to expect frames with VLAN ID 105 and 106 on this port; the physical switch port needs to be configured as a trunk port]
NIC Teaming • Allows for multiple active NICs to be used in a teaming configuration • User can choose the policy for distribution of traffic across the NICs • Standby uplinks replace active uplinks when active uplinks fail to meet specified criteria [Diagram: VM ports 1 to 14 on a vSwitch with uplink ports A to F, divided into Active and Standby]
NIC Teaming: Failure criteria • Use vimsh • hostsvc/net/portgroup_set • Conservative defaults: • Speed > 10Mb • Duplex = full • Beacons received • Other possible settings • Percentage of errors
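As a rough illustration of how such criteria combine, the Python sketch below evaluates one uplink against the defaults listed on the slide (speed above 10 Mb, full duplex, beacons received). The `UplinkStatus` class and `meets_criteria` function are hypothetical and do not mirror the vimsh interface; the percentage-of-errors criterion is left out.

```python
from dataclasses import dataclass


@dataclass
class UplinkStatus:
    """Hypothetical snapshot of one physical NIC in a team."""
    name: str
    link_up: bool
    speed_mbps: int
    full_duplex: bool
    beacons_received: bool = True


def meets_criteria(uplink, min_speed_mbps=10, require_beacons=False):
    """Return True if the uplink passes every enabled check."""
    if not uplink.link_up:
        return False
    if uplink.speed_mbps <= min_speed_mbps:   # the slide says speed > 10 Mb
        return False
    if not uplink.full_duplex:
        return False
    if require_beacons and not uplink.beacons_received:
        return False
    return True


print(meets_criteria(UplinkStatus("vmnic0", True, 1000, True)))   # True
print(meets_criteria(UplinkStatus("vmnic1", True, 100, False)))   # False (half duplex)
```

An uplink that fails any enabled check would be the one a standby NIC replaces.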
NIC Teaming: PortGroup based Teaming Configuration • Teaming policy attributes can vary by PortGroup on a single vSwitch • Four load balancing policies • Originating Port ID based • Source MAC address based • IP hash based • Explicit failover order [Diagram: VM ports 1 to 14 grouped into PortGroups, each with its own Active/Standby assignment of uplink ports A to F]
NIC Teaming: Port ID (or MAC Hash) • Both policies rely on a given VM MAC address always using the same outgoing physical NIC • Port ID is the default and is recommended over MAC hash • Load balancing on a per vNIC basis • Both allow teaming across physical switches in the same broadcast domain • Both require the physical switch not to be aware of the teaming • The physical switch learns the MAC/switch port association • Inbound traffic is received on the same NIC • Power operations or connect operations on a vNIC will increment the port ID! [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to a physical switch]
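The slide does not say how a port ID is mapped to an uplink. The Python sketch below assumes a simple modulo over the active uplinks, purely to illustrate the two properties that matter here: a given virtual port always uses the same NIC, and a new port ID (after a power or connect operation) can move the VM to another NIC. The function name `uplink_for_port` is hypothetical.

```python
def uplink_for_port(port_id, active_uplinks):
    """Pick one uplink for a virtual port (assumed mapping: port ID modulo team size)."""
    if not active_uplinks:
        raise ValueError("no active uplinks in the team")
    return active_uplinks[port_id % len(active_uplinks)]


team = ["vmnic0", "vmnic1", "vmnic2"]
print(uplink_for_port(4, team))   # vmnic1, and it stays vmnic1 for every frame
print(uplink_for_port(5, team))   # vmnic2, e.g. after the vNIC was reconnected
```

Since each MAC address only ever appears behind one physical NIC, the physical switch needs no knowledge of the team.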
NIC Teaming: IP hash • Uplink chosen based on source and destination IP address • Load balancing on a per connection basis • Requires the physical switch to be aware of the teaming • Does not allow teaming across physical switches • Inbound traffic can be received on any one of the uplinks [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to a physical switch. Callouts: the switch sees VM2’s MAC address on all three ports; need to enable Link Aggregation on the physical switch ports]
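A small Python sketch can show why IP hash balances per connection rather than per vNIC. The slides do not give the actual hash, so the XOR-then-modulo below is an assumption chosen for illustration, and `uplink_for_flow` is a hypothetical name.

```python
import ipaddress


def uplink_for_flow(src_ip, dst_ip, uplinks):
    """Pick an uplink from the source/destination IPv4 pair (assumed XOR hash)."""
    h = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return uplinks[h % len(uplinks)]


team = ["vmnic0", "vmnic1", "vmnic2"]
# The same VM (10.0.0.5) talking to two different peers can use two different NICs.
print(uplink_for_flow("10.0.0.5", "10.0.0.20", team))   # vmnic2
print(uplink_for_flow("10.0.0.5", "10.0.0.21", team))   # vmnic1
```

This is why the physical switch sees the same VM MAC address on several ports and must treat those ports as one aggregated link.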
NIC Teaming: Reverse Teaming • VMs can receive duplicate broadcast/multicast packets • Reverse teaming eliminates this • Receive frames only from an uplink port we would have used to transmit • Optimizes local traffic on the vSwitch • Drop external frames with local source MAC addresses • If using port ID or MAC hash based teaming, don’t enable link aggregation on the physical switch [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to a physical switch]
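The two reverse teaming rules on this slide amount to a simple inbound filter. The Python sketch below is a hedged model of that filter only; `accept_from_uplink` and its parameters are hypothetical names, and the caller is assumed to already know which uplink the receiving port would transmit on.

```python
def accept_from_uplink(src_mac, arrival_uplink, transmit_uplink, local_macs):
    """Decide whether a frame arriving from the physical network reaches a VM port.

    transmit_uplink: the uplink this VM port would use to send (per teaming policy).
    local_macs:      set of lowercase MAC addresses of vNICs on this vSwitch.
    """
    if src_mac.lower() in local_macs:
        # A frame we sent ourselves has come back from outside: drop it.
        return False
    # Accept only on the uplink we would have used to transmit.
    return arrival_uplink == transmit_uplink


local = {"00:50:56:00:00:01", "00:50:56:00:00:02"}
# A broadcast from an external host arrives on both NICs of the team;
# the VM pinned to vmnic0 only gets the copy from vmnic0.
print(accept_from_uplink("00:11:22:33:44:55", "vmnic0", "vmnic0", local))   # True
print(accept_from_uplink("00:11:22:33:44:55", "vmnic1", "vmnic0", local))   # False
```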
NIC Teaming: Link redundancy • Failure detection • Link status • Beacon Probing • Rolling Failover • Equivalent to Failback set to No
NIC Teaming: Beacon Probing • Beacon probing attempts to detect failures which don’t result in a link state failure for the NIC • Broadcast frames sent from each NIC in the team should be seen by other NICs in the team (no IP hash!) [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to physical switches and a core switch / upstream infrastructure]
NIC Teaming: Beacon Probing and “shotgun” • Beacon probing attempts to detect failures which don’t result in a link state failure for the NIC • Broadcast frames sent from each NIC in the team should be seen by other NICs in the team (no IP hash!) • NICs not receiving beacons no longer meet the minimum criteria and are discarded • If all NICs are discarded, all NICs will be used! [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to physical switches and a core switch / upstream infrastructure]
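The “shotgun” fallback can be expressed as a short selection rule. The Python sketch below is an illustrative model rather than ESX logic; `usable_uplinks` and the `beacons_seen` mapping are hypothetical names.

```python
def usable_uplinks(team, beacons_seen):
    """Return the NICs to send on, given which NICs still receive beacons.

    team:         list of NIC names in the team.
    beacons_seen: dict of NIC name -> True if it received beacons from its peers.
    """
    healthy = [nic for nic in team if beacons_seen.get(nic, False)]
    # Shotgun: if every NIC was discarded, fall back to using all of them.
    return healthy if healthy else list(team)


team = ["vmnic0", "vmnic1", "vmnic2"]
print(usable_uplinks(team, {"vmnic0": True, "vmnic1": False, "vmnic2": True}))
# ['vmnic0', 'vmnic2']
print(usable_uplinks(team, {"vmnic0": False, "vmnic1": False, "vmnic2": False}))
# ['vmnic0', 'vmnic1', 'vmnic2']
```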
NIC Teaming: Rolling failover (3.0.X) and Failback (3.5.0) • For it to have any effect, rolling failover requires at least one standby NIC • Does not make sense with IP hash teaming • Named differently in 3.0.X and 3.5.X • Example scenarios: • Service Console PortGroup • HA • VMkernel PortGroup • iSCSI/NAS • Use link state tracking as an alternative [Diagram labels: Active NIC, Standby NIC; Switch goes down; New Active NIC, New standby NIC; Switch comes back; Isolated! But STP still blocks the uplink!]
NIC Teaming: Troubleshooting • The switch ports should have consistent VLAN configuration • Multi-switch configurations • Make sure the NICs are in the same broadcast domain • Do not use IP hash based teaming policy across multiple physical switches • Link Aggregation needs to be enabled on the switch ports for IP hash based teaming • Configure physical switch LA to be static and unconditional • No support for PAgP or LACP negotiation
NIC Teaming: Tips • Use port ID based NIC teaming in a multi-switch configuration • Use different types of NICs in a team. E.g. • Intel and Broadcom • Onboard and PCI card • For faster failovers • Disable Link Auto-negotiation • Follow STP recommendations • Use standby adapters and rolling failover when availability is an absolute must • Beaconing • Upgrade to 3.0.2 • Use Link State Tracking as an alternative • Not needed on fat tree topology [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC and a vSwitch; physical NICs: onboard Intel and Broadcom PCI card; physical switch; rest of the network]
vSwitch advanced options: Security settings • Promiscuous Mode • If allowed, guest receives all frames on the vSwitch • Some applications need promiscuous mode • Network sniffers • Intrusion detection systems • MAC Address Change • If allowed, malicious guests can spoof MAC addresses • Forged Transmits • If allowed, malicious guests can spoof MAC addresses or cause MAC flooding • Security settings should reflect application requirements • Some applications might need to forge or change MAC addresses • E.g.: Microsoft NLB in unicast mode works by forging MAC addresses
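The difference between MAC Address Change and Forged Transmits is easier to see as two separate checks. The Python sketch below is a simplified model under stated assumptions: "initial MAC" means the address configured for the vNIC, "effective MAC" means the address the guest has set on the adapter, and both function names are hypothetical.

```python
def allow_transmit(frame_src_mac, effective_mac, forged_transmits_allowed):
    """Outbound check: may the guest send a frame with this source MAC?"""
    return forged_transmits_allowed or frame_src_mac.lower() == effective_mac.lower()


def allow_receive(initial_mac, effective_mac, mac_changes_allowed):
    """Inbound check: does the port keep receiving after the guest changed its MAC?"""
    return mac_changes_allowed or initial_mac.lower() == effective_mac.lower()


vm_mac = "00:50:56:00:00:01"
cluster_mac = "02:bf:0a:00:00:64"   # illustrative shared address, as a unicast NLB-style app would use

# With both settings rejected, an application that forges or changes MACs stops working.
print(allow_transmit(cluster_mac, vm_mac, forged_transmits_allowed=False))   # False
print(allow_receive(vm_mac, cluster_mac, mac_changes_allowed=False))         # False
```

This is the trade-off the slide points at: rejecting both settings blocks spoofing, but also blocks applications that legitimately forge or change MAC addresses.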
vSwitch advanced option: Notify Switch • Client MAC address is notified to the switch via a RARP packet • Allows the physical switch to learn the MAC address of the client immediately • Why RARP? • L2 broadcast reaches all switches • L3 information not required • Switch notified whenever • New client comes into existence • MAC address changes • Teaming status changes • Settings should reflect application requirements [Diagram: ESX Server with virtual machines, virtual NICs, a VMkernel NIC, a vSwitch and physical NICs connected to a physical switch. Callouts: RARP packet; the switch learns the MAC address and updates its tables]
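To show why RARP suits this purpose, the Python sketch below builds a generic RARP request as a broadcast Ethernet frame. The frame layout follows the RARP standard (EtherType 0x8035, opcode 3); which fields ESX actually fills in is not stated in the slides, so the zeroed IP fields and the `rarp_notification` function are assumptions for illustration.

```python
import struct

ETHERTYPE_RARP = 0x8035
RARP_REQUEST = 3   # "request reverse" opcode


def rarp_notification(mac: str) -> bytes:
    """Build a broadcast RARP frame that advertises `mac` as its source."""
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    # Ethernet header: broadcast destination, client MAC as source.
    ethernet = b"\xff" * 6 + hw + struct.pack("!H", ETHERTYPE_RARP)
    # RARP body: Ethernet (1) / IPv4 (0x0800), address lengths 6 and 4, opcode.
    header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, RARP_REQUEST)
    # Sender and target hardware address are the client MAC; no IP (L3) data needed.
    body = hw + bytes(4) + hw + bytes(4)
    return ethernet + header + body


frame = rarp_notification("00:50:56:00:00:01")
print(len(frame))            # 42
print(frame[6:12].hex())     # 005056000001, the MAC every switch on the path learns
```

Every switch that floods this broadcast records the source MAC against the port it arrived on, which is all the notification needs to achieve.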
VMkernel Network Traffic • VMkernel TCP/IP stack routing table determines packet flow • Put IP storage and VMotion on separate subnets for isolation • Else traffic will go through the same vmknic: no isolation • If multiple vmknics in a subnet are connected to the same vSwitch • Outgoing traffic is seen only on one vmknic • Only limited load balancing based on IP hash • VLAN segmentation won’t help isolate outgoing traffic between the vmknics [Diagram: ESX Server; iSCSI, VMotion and NFS traffic; VMkernel TCP/IP stack; VMkernel TCP/IP routing table; VMkernel NICs (vmknics); vSwitches; physical NICs]
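The effect of the routing table on isolation can be illustrated with a longest-prefix-match lookup. The Python sketch below is a generic model, not the VMkernel implementation; `pick_vmknic`, the route list and the interface names `vmk0`/`vmk1` are hypothetical.

```python
import ipaddress


def pick_vmknic(routes, dst_ip):
    """Return the interface of the most specific route that matches dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    best = None
    for network, vmknic in routes:
        net = ipaddress.ip_network(network)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, vmknic)
    return best[1] if best else None


# Separate subnets: IP storage and VMotion leave through different vmknics.
routes = [("10.0.1.0/24", "vmk0"), ("10.0.2.0/24", "vmk1"), ("0.0.0.0/0", "vmk0")]
print(pick_vmknic(routes, "10.0.1.50"))   # vmk0 (e.g. the iSCSI target)
print(pick_vmknic(routes, "10.0.2.50"))   # vmk1 (e.g. the VMotion peer)

# Same subnet: only one route matches, so both kinds of traffic share vmk0.
shared = [("10.0.1.0/24", "vmk0"), ("0.0.0.0/0", "vmk0")]
print(pick_vmknic(shared, "10.0.1.60"))   # vmk0
```

The lookup only considers the destination address, which is why VLAN segmentation alone does not separate outgoing traffic between vmknics in the same subnet.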
Vmkernel Traffic: Troubleshooting • cat /proc/vmware/net/tcpip/ifconfig • Use vmkping • Ping uses Service Console TCP/IP stack • Vmkping uses VMKernel TCP/IP stack
Command Line Utilities • esxcfg-vswitch • esxcfg-nics • esxcfg-vswif • esxcfg-vmknic
Command Line Utilities: vimsh • Shell interface • Low-level interface to VI • Use tab for completion • Powerful command line interface
Advanced troubleshooting: Key principles • Always remember which equipment is supposed to do the VLAN tagging • Always remember how an L2 infrastructure works: a given MAC should only be advertised/used at a single point of the infrastructure • Always remember what the failure criteria on a NIC are, and how ESX can respond to the failure • Rule out one layer after the other • Several aggregation types are possible • Several types of VLAN tagging are possible (even if VST is preferred) • Several types of physical NICs are supported and use different drivers • Several virtual NICs are available • Virtual NIC features can be individually disabled • Failover can be fine-tuned
Advanced troubleshooting: Check the network hint • Every NIC collects a trace of the type of traffic seen on it • The hint is purely informational • Wildly different hints on two cards in the same vSwitch, especially with EST, usually indicate that the two cards are not in the same broadcast domain • Can also be obtained on the command line (see vimsh)
Advanced Troubleshooting: Collecting Network Traces on the vSwitch • Run tcpdump/wireshark/netmon inside a VM or in the Service Console • Traffic visibility depends on the PortGroup policy settings • Allow Promiscuous Mode • VLAN segmentation rules apply • Use VGT by setting VLAN ID to 4095 • Traffic between VMs on the vSwitch is captured
Q&A Session ID: TA05 VI3 Networking: Advanced Configurations and Troubleshooting • Jean Lubatti, VMware • Special thanks to: • Srinivas Neginhal, VMware • Emiliano Turra, VMware