Scalable Address Resolution for Data Center and Cloud Computing: Problem Statements
Linda Dunbar (ldunbar@huawei.com), Sue Hares (shares@huawei.com)
Scenarios
• Data center with a large number of virtual machines
• Cloud computing service (Infrastructure as a Service) built on top of a large number of virtual hosts
Load balancing requires hosts in one Layer 2
• Most traffic through a data center has an asymmetric pattern: much more data flows out of the data center than into it.
• Traffic into the data center is mostly requests; traffic out of the data center is mostly replies.
• A good load-balancing design sends only the incoming traffic through the Load Balancer, which distributes the requests across a cluster of servers (hosts). Replies from the servers can go directly out without passing through the Load Balancer.
• All servers in the cluster share the same IP address, so when any of them replies directly to the host that made the request, external hosts see the same IP address regardless of which server handled the request.
[Diagram: Load Balancer in front of a cluster of servers behind L2 ToR switches.]
Redundancy and virtual server failure recovery require both Active and Standby servers (VMs) on the same Layer 2
• For redundant servers (VMs) serving the same applications, both the Active and the Standby server need to be on the same Layer 2.
• The Active and Standby usually share the same IP address, and they need to exchange keep-alive messages. The only way to achieve this is via Layer 2 keep-alives, because the Active and Standby have different MAC addresses.
• VRRP requires the Active and Standby routers to be in the same subnet (same Layer 2) as all the hosts.
[Diagram: Active and Standby servers behind ToR switches, exchanging Layer 2 keep-alives.]
VM mobility requires VMs on one Layer 2
• VMware's vMotion requires servers to be on the same Layer 2 in order to move a VM from one server to another, because:
• A VM (host) must keep the same IP address when it moves from one server to another.
• In general, hosts send ARP requests only for target IP addresses within their own subnet; for targets in other subnets, hosts forward data frames to the local default gateway. If a VM were moved from Subnet A to Subnet B, it could not even reach its default gateway, and it could not find the MAC addresses of the other hosts in Subnet A.
[Diagram: Subnet A and Subnet B, each with ToR switches connecting racks of >1000 VMs. If a VM is moved from Subnet A to Subnet B, the default gateway in Subnet B cannot handle the IP address of a VM from Subnet A.]
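The same-subnet decision described above can be sketched in a few lines. This is a minimal illustration; the addresses and function name are made up for the example:

```python
# Minimal sketch of the host-side forwarding decision described above:
# a host ARPs directly for targets inside its own subnet and otherwise
# sends frames to its default gateway. Addresses are illustrative.
import ipaddress

def arp_target(own_ip: str, prefix_len: int, dst_ip: str, gateway_ip: str) -> str:
    """Return the IP address the host must resolve via ARP to reach dst_ip."""
    subnet = ipaddress.ip_network(f"{own_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(dst_ip) in subnet:
        return dst_ip        # same subnet: ARP for the destination itself
    return gateway_ip        # other subnet: ARP for the default gateway

# A VM configured for Subnet A (10.1.0.0/16) that is moved into Subnet B
# keeps ARPing for its old gateway 10.1.0.1, which nothing in Subnet B answers.
print(arp_target("10.1.2.3", 16, "10.1.9.9", "10.1.0.1"))  # 10.1.9.9 (direct)
print(arp_target("10.1.2.3", 16, "10.2.9.9", "10.1.0.1"))  # 10.1.0.1 (via gateway)
```

This is why the moved VM is stranded: its subnet mask tells it to ARP for addresses that are no longer reachable at Layer 2.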
What starts to hurt when the number of virtual hosts in one data center goes beyond tens or hundreds of thousands?
Why Layer 2 networks are usually kept small (usually < 200 hosts):
• Servers/hosts and their applications behave unpredictably.
• They come from a variety of vendors; some frequently send ARP and other broadcast messages.
• Typical low-cost Layer 2 switches lack sophisticated features to block broadcast frames, or policies to limit flooding and broadcast storms.
• Hosts (applications) age out MAC-to-target-IP mappings very frequently, usually within minutes.
• Hosts frequently send gratuitous ARP
• when they switch over (active to standby)
• when they hit a software glitch
• Our IT department has to deliberately divide Layer 2 into smaller segments to confine broadcast storms to a smaller number of nodes (hosts & switches).
[Diagram: Layer 3 above Layer 2 ToR switches, with <50 servers/hosts per segment.] Without VMs, traditional Layer 2 network size is limited by the number of servers the switches can connect to.
The volume of ARP requests in a data center can become too great a burden
• As virtual machines are added to a data center, the number of hosts within one data center can easily exceed 20-30K.
• Handling 1000-2000 ARP requests per second is close to the upper limit for any host or ARP server. With more than 20K hosts in one Layer 2 domain, the volume of ARP broadcasts, plus other broadcasts such as DHCP, can be too much for hosts (or a dedicated ARP server) to handle.
• Servers/VMs can send a lot of ARP. For Microsoft Windows (XP and Server 2003), the default ARP cache policy is to discard entries that have not been used within two minutes, and for cache entries that are in use, to retransmit an ARP request every 10 minutes.
[Diagram: ToR switches connecting racks of >1000 VMs each: one Layer 2 with tens of thousands of hosts.]
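A back-of-the-envelope estimate makes the scale concrete. The 10-minute re-ARP timer is the Windows default cited above; the number of in-use cache entries per host is an assumption for illustration:

```python
# Rough aggregate ARP load in one flat Layer 2 domain. The 10-minute
# re-ARP interval is the Windows XP / Server 2003 default cited above;
# the in-use peer count per host is an assumed figure for illustration.
hosts = 20_000               # hosts in one Layer 2 domain
in_use_entries = 30          # assumed: ARP cache entries in active use per host
rearp_interval_s = 600       # in-use entries are re-ARPed every 10 minutes

arp_broadcasts_per_s = hosts * in_use_entries / rearp_interval_s
print(arp_broadcasts_per_s)  # 1000.0

# Every ARP request is broadcast, so each of the 20K hosts must receive and
# inspect all of them -- already at the ~1000-2000/s ceiling cited above,
# before gratuitous ARP, DHCP, and other broadcasts are even counted.
```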
MAC per VNIC (VM) requires a big FDB: MAC addresses can run into the hundreds of thousands
• The hypervisor requires a MAC address per VNIC (virtual machine);
• A MAC per VNIC implies many MAC addresses;
• The result is FDB overflow, flooding, and poor performance;
• Our IT department says this is the real problem in the data center.
Example hierarchy: 25 access bridges per core bridge, 25 racks per access bridge, 25 blades per rack, 50 VNICs per blade.
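Multiplying out the example hierarchy above shows why the FDB becomes the bottleneck:

```python
# MAC addresses visible at the core bridge in the example hierarchy above
# (all per-level figures are taken directly from the slide).
access_bridges_per_core = 25
racks_per_access_bridge = 25
blades_per_rack = 25
vnics_per_blade = 50

macs_at_core = (access_bridges_per_core * racks_per_access_bridge
                * blades_per_rack * vnics_per_blade)
print(macs_at_core)  # 781250
```

Over three quarters of a million MAC addresses converge on the core bridge in this example.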
Current status of bridge FDB sizes:
• Typical bridges support on the order of 16K to 32K MAC addresses, with some supporting 64K;
• With external memory (TCAM), bridges can support 512K to 1M MAC addresses;
• Failure to support enough MAC addresses results in increased flooding and poor performance;
• Operators may need to upgrade to higher-capacity bridges (i.e., more FDB entries), OR
• If high-capacity bridges are already in use (e.g., in the core), performance may be compromised by excessive flooding.
When the FDB doesn't have enough space for all the hosts' MACs
• When the number of hosts (MAC addresses) exceeds the switch's FDB size, learned MAC addresses age out of the FDB faster. This increases the chance that the FDB has no entry for a received packet's destination address, which causes the packet to be flooded.
• When the spanning tree topology changes (e.g., when a link fails), a bridge clears its cached station location information, because a topology change could alter the spanning tree and packets from a given source may then arrive on a different port. As a result, during periods of network convergence, network capacity drops significantly as the bridges fall back to flooding for all hosts.
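The overflow behavior described above can be illustrated with a toy FDB model. The capacity and host counts are illustrative, and real bridges age entries by timer rather than strict least-recently-used eviction:

```python
# Toy model of FDB overflow: with more active MACs than FDB entries,
# the bridge keeps evicting the least recently seen addresses, so later
# frames to those destinations miss in the table and must be flooded.
from collections import OrderedDict

class FDB:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.table = OrderedDict()           # MAC -> port, oldest first

    def learn(self, mac: str, port: int):
        self.table[mac] = port
        self.table.move_to_end(mac)          # mark as most recently seen
        if len(self.table) > self.capacity:  # over capacity: evict oldest
            self.table.popitem(last=False)

    def lookup(self, mac: str):
        return self.table.get(mac)           # None => unknown DA: flood

fdb = FDB(capacity=16_000)                   # a typical bridge FDB size
for i in range(20_000):                      # but 20K hosts are active
    fdb.learn(f"host-{i}", port=i % 48)

flooded = sum(fdb.lookup(f"host-{i}") is None for i in range(20_000))
print(flooded)  # 4000 destinations are now unknown -> frames to them flood
```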
When there are tens of thousands of VMs in one data center:
• It is necessary to confine the ARP and broadcast storms initiated by (unpredictable) servers and applications to a limited zone.
• It is necessary to have a way to keep Layer 2 switches from having to learn tens of thousands of MAC addresses.
• It is necessary to reduce the number of unknown destination addresses arriving at any Layer 2 switch, to prevent a large amount of unknown-unicast flooding within one Layer 2.
Why VLANs (or smaller subnets) alone are not enough
• A subnet (VLAN) can partition one Layer 2 network into many virtual Layer 2 networks. All broadcast messages are confined within one subnet (VLAN).
• Subnets (VLANs) worked well when each server served a single host: the server did not receive broadcast messages from hosts in other subnets (VLANs).
• When one physical server supports >100 virtual machines, i.e. >100 hosts, the virtual hosts on one server are most likely on different subnets (VLANs). If 50 subnets (VLANs) are enabled on the switch port to the server, the server has to handle all the ARP broadcasts on all 50 subnets (VLANs). The amount of ARP each server must process is still too high.
• As virtual hosts are added to or deleted from a server, the switch port to the server may end up with more VLANs enabled than the number of subnets actually active on the server, causing even more ARP to be sent to the server.
• For a Cloud Computing service (explained in later slides), the number of virtual hosts and virtual subnets can be very high. It might not be possible to limit the number of virtual hosts in each subnet.
[Diagram: without VMs, each server sees traffic on only one VLAN (#10), and only traffic on that VLAN is allowed through its port; with VMs, 50 or more VLANs may be enabled on the port, so the server receives a large volume of broadcast messages.]
Conclusion for Scenario 1:
• RFC 826 is not enough. A more scalable address resolution protocol is needed for data centers with tens of thousands of virtual machines.
• Current solutions are not enough or do not fit:
• VRRP
• ARP server for ATM (done in IETF)
• VPLS with BGP auto-discovery
• IEEE 802.1ah
• IEEE 802.1aq
• TRILL
• BEHAVE
• See the chart we made
• Challenge:
• Need a way to establish domains within one data center, so that hosts still see each other as being on one Layer 2, but their MAC addresses are contained within the domain, i.e. not learned by switches and hosts in other domains.
Scenarios • One Data Center with Virtual Machines implemented • Cloud Computing Service (Infrastructure as a Service)
Infrastructure as a Service: service model
• A client can request a Virtual Subnet (with a group of VMs) to run their applications. A client can also request multiple Virtual Subnets and define policy among them.
• Another client may request a Virtual Network with multiple Virtual Subnets; the client defines the policy among their Virtual Subnets.
• The Service Provider owns the network infrastructure and the physical data center infrastructure.
[Diagram: Virtual Subnets (closed user groups) of VMs running applications, offered as a service over a network infrastructure spanning LA, New York, and Chicago.]
Virtual Subnets
• Most likely, it is IP/MPLS networks that interconnect the multiple data center sites.
[Diagram: an IP/MPLS network interconnecting all the data centers.]
Virtual Subnet
• A Cloud Computing service needs the network to provide virtual subnets and virtual hosts.
• Virtual hosts within one virtual subnet can span different sites, due to customer requirements or resource allocation.
• Some virtual subnets can only be connected by private networks (Layer 2 and/or Layer 3).
Identity to physical IP/MAC address resolution
• Virtual hosts come and go, but each service provider's IP/MAC address space is not unlimited.
• Therefore, it is important to have a scalable address resolution protocol that maps a virtual host's identity to physical IP/MAC addresses.
Virtual host identity
• A virtual host's identity has to be associated with its Virtual Network.
• Similar to the IP-to-Ethernet-MAC ARP (RFC 826), a Cloud Computing service needs "Virtual Host Identity" to IP/MAC address resolution.
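One way to picture the resolution service the slides call for is a directory keyed by (virtual network, virtual host identity). Everything here is an illustrative assumption; the slides do not define a concrete protocol:

```python
# Sketch of directory-based resolution: a (virtual network, host identity)
# key is resolved to the physical IP/MAC where the host currently runs.
# All identifiers and fields below are assumptions for illustration only.
directory = {}   # (vnet, host_id) -> (physical IP, physical MAC)

def register(vnet: str, host_id: str, phys_ip: str, phys_mac: str) -> None:
    """Record, or update after a VM move, the host's current location."""
    directory[(vnet, host_id)] = (phys_ip, phys_mac)

def resolve(vnet: str, host_id: str):
    """Scoped unicast lookup replacing broadcast ARP (cf. RFC 826)."""
    return directory.get((vnet, host_id))

register("tenant-7:subnet-a", "vm-42", "192.0.2.10", "02:aa:bb:cc:dd:01")
print(resolve("tenant-7:subnet-a", "vm-42"))   # ('192.0.2.10', '02:aa:bb:cc:dd:01')
print(resolve("tenant-7:subnet-a", "vm-99"))   # None (not registered)
```

Because queries are scoped by virtual network and answered by unicast, neither the broadcast load nor the MAC learning burden grows with the total number of hosts in the data center.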
Could we do it all in Layer 2 and ignore IP/MPLS, or vice versa?
• The answer is NO, because:
• A client may request redundancy across two geographic locations, wanting two VMs in one Virtual Subnet to be at two locations → most likely it is IP/MPLS networks that interconnect the locations.
• The two redundant VMs may share the same IP address, one Active and one Standby. The Active and Standby need keep-alive messages between them; the only way to achieve this is via Layer 2 keep-alives, because the Active and Standby have different MAC addresses → requires the two VMs to be on the same Layer 2.
• It is desirable for all hosts (VMs) of one Virtual Subnet to be in one Layer 2 network, for efficient multicast and broadcast among them.
• A client may request that hosts (VMs) in one Virtual Subnet be at different locations, for faster response to their applications → requires IP/MPLS to interconnect.
• Hosts can be added to one Virtual Subnet at different times. A newly added host may have to be placed at a different site due to computing & storage resource availability → requires IP/MPLS to interconnect.
Conclusion for Scenario 2:
• Need a more scalable address resolution protocol for a Layer 2 network that spans multiple locations.
• Need a way to keep the MAC addresses at each site from being learned by the other sites. This allows traditional Layer 2 switches, which have limited forwarding-table space, to function properly, and it minimizes unknown-unicast flooding by those switches.
Proposal to IETF
• Create a new IETF working group (http://trac.tools.ietf.org/bof/trac/wiki/BarBofsIETF78Arp222)
• To develop scalable address resolution protocols for Cloud Computing services with a large number of virtual subnets & virtual hosts, and for data centers with a large number of virtual hosts, including:
• Proper identities for virtual subnets and virtual hosts
• Address resolution protocols to resolve physical IP/MAC addresses from virtual host identities
• To develop network solutions that allow virtual hosts to see each other as if they are on one subnet, while the network interconnecting them can be anything (Layer 2 and/or Layer 3).
• To develop solutions for scoping broadcast messages, including ARP and DHCP, so that broadcast storms are confined to smaller zones.
• To develop solutions for handling multicast messages among virtual hosts in one Virtual Subnet that spans multiple locations.