Designing the Data Center with Nexus - Deploying OTV in the Datacenter
Agenda
• OTV introduction
• Typical OTV deployment models
• Path Optimization
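Data Center Layer 2 Extension Requirements
• Business requirements
  • Disaster Avoidance
  • Business Continuance
  • Workload mobility
• Multi-site data centers
  • Disaster recovery centers, for example three centers across two locations
  • The original data center is constrained by its early design (floor space, power, cooling, performance capacity), so new data centers are needed to expand flexibly
  • Building several physically dispersed data centers gives higher reliability and lets user traffic be shared across the data centers for better access performance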
Traditional Layer 2 Extension
• EoMPLS
• Dark Fiber
• VPLS
Overlay Transport Virtualization (OTV)
OTV is a "MAC in IP" technique to extend Layer 2 domains over any transport.
• O, Overlay: a solution that is independent of the infrastructure technology and services, flexible over various interconnect facilities
• T, Transport: transporting services for Layer 2 Ethernet and IP traffic
• V, Virtualization: provides virtual stateless multi-access connections
OTV Control Plane: MAC Address Advertisements (Multicast-Enabled Transport)
• Every time an Edge Device learns a new MAC address, the OTV control plane advertises it together with its associated VLAN IDs and IP next hop.
• The IP next hops are the addresses of the Edge Devices through which these MAC addresses are reachable in the core.
• A single OTV update can contain multiple MAC addresses for different VLANs.
• A single update reaches all neighbors, as it is encapsulated in the same ASM multicast group used for neighbor discovery.
[Figure: an OTV update sent by one Edge Device is replicated by the core and reaches the other Edge Devices; sites West (IP A), East (IP B), and South-East (IP C).]
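The advertisement behaviour described on this slide can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `EdgeDevice` class, its method names, and the dictionary used as an "update" are hypothetical and are not OTV data structures; the loop over neighbors stands in for the core replicating one multicast update.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeDevice:
    """Toy OTV edge device; class and method names are illustrative only."""
    name: str
    join_ip: str
    neighbors: list = field(default_factory=list)
    # (vlan, mac) -> IP next hop of the edge device that advertised it
    remote_macs: dict = field(default_factory=dict)

    def learn_local(self, entries):
        """Advertise newly learned (vlan, mac) pairs in a single update."""
        update = {"next_hop": self.join_ip, "entries": list(entries)}
        # Stand-in for the core replicating one update to every neighbor.
        for neighbor in self.neighbors:
            neighbor.receive_update(update)

    def receive_update(self, update):
        """Install every advertised MAC with the sender as IP next hop."""
        for vlan, mac in update["entries"]:
            self.remote_macs[(vlan, mac)] = update["next_hop"]


west = EdgeDevice("West", "IP A")
east = EdgeDevice("East", "IP B")
south_east = EdgeDevice("South-East", "IP C")
west.neighbors = [east, south_east]

# One update carries MAC addresses for two different VLANs.
west.learn_local([(100, "MAC 1"), (200, "MAC 2")])
print(east.remote_macs)
```

After the single update, both East and South-East hold the same two entries pointing at IP A.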
OTV Data Plane: Inter-Site Packet Flow
1. Layer 2 lookup on the destination MAC. MAC 3 is reachable through IP B.
2. The Edge Device encapsulates the frame.
3. The transport delivers the packet to the Edge Device on site East.
4. The Edge Device on site East receives and decapsulates the packet.
5. Layer 2 lookup on the original frame. MAC 3 is a local MAC.
6. The frame is delivered to the destination.
[Figure: a frame from MAC 1 (West Site, Edge Device IP A) to MAC 3 (East Site, Edge Device IP B) is encapsulated with outer addresses IP A and IP B, carried across the transport infrastructure, and decapsulated at site East.]
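The inter-site flow can be illustrated with a minimal Python sketch. It assumes a frame and a packet are plain dictionaries and that each site's MAC table maps a MAC either to the string "local" or to a remote IP next hop; the function names are hypothetical and the real encapsulation format is not modelled.

```python
def forward_from_site(frame, mac_table, local_ip):
    """Sending side: Layer 2 lookup, then encapsulate if the MAC is remote."""
    next_hop = mac_table[frame["dst_mac"]]
    if next_hop == "local":
        return frame  # stays inside the site, never touches the overlay
    return {"outer_src": local_ip, "outer_dst": next_hop, "inner": frame}


def receive_from_overlay(packet, mac_table):
    """Receiving side: decapsulate, look up the original frame, deliver."""
    frame = packet["inner"]
    if mac_table.get(frame["dst_mac"]) != "local":
        return None  # destination is not a local MAC; nothing to deliver
    return frame


west_table = {"MAC 1": "local", "MAC 3": "IP B"}
east_table = {"MAC 1": "IP A", "MAC 3": "local"}

frame = {"src_mac": "MAC 1", "dst_mac": "MAC 3", "payload": "hello"}
packet = forward_from_site(frame, west_table, local_ip="IP A")
delivered = receive_from_overlay(packet, east_table)
print(packet["outer_src"], packet["outer_dst"], delivered["dst_mac"])
```

The original frame arrives unchanged at site East; only the outer addresses (IP A, IP B) are visible to the transport.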
OTV Data Plane: Multicast Data, Multicast State Creation
1. The multicast receivers for the multicast group "Gs" on the East site send IGMP reports to join the multicast group.
2. The Edge Device (ED) snoops these IGMP reports, but it doesn't forward them.
3. Upon snooping the IGMP reports, the ED does two things:
   3.1 Announces the receivers in a Group-Membership Update (GM-Update) to all EDs.
   3.2 Sends an IGMPv3 report to join the (IP A, Gd) group, the SSM group in the core.
4. On reception of the GM-Update, the source ED adds the overlay interface to the appropriate multicast Outbound Interface List (OIL).
• It is important to clarify that the edge devices join the core multicast groups as hosts, not as routers!
[Figure: read from right to left. The receiver in site East (ED IP B) sends an IGMP report to join Gs; the East ED snoops it, sends the GM-Update, and joins the SSM tree for Gd in the multicast-enabled transport; the ED in site West (IP A), where the source sits, receives the GM-Update and updates its OIL.]
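A small Python sketch of the state-creation sequence follows. It is a simplification under assumptions: the `EdgeDevice` class, the `has_source` flag, and the way the core group "Gd" is passed in as a parameter are all hypothetical; how OTV actually maps a site group to a core group is not modelled.

```python
class EdgeDevice:
    """Toy edge device for multicast state creation; names are illustrative."""

    def __init__(self, join_ip, has_source=False):
        self.join_ip = join_ip
        self.has_source = has_source  # assumption: the ED knows a source is in its site
        self.peers = []
        self.oil = {}         # site group -> set of outbound interfaces
        self.core_joins = []  # (source ED IP, core group) joined as a host

    def snoop_igmp_report(self, site_group, source_ed_ip, core_group):
        # The snooped report is not forwarded; a GM-Update goes to all EDs instead.
        for peer in self.peers:
            peer.receive_gm_update(site_group)
        # Join the core SSM group as a host, not as a router.
        self.core_joins.append((source_ed_ip, core_group))

    def receive_gm_update(self, site_group):
        # Only the ED in front of the source adds the overlay to its OIL.
        if self.has_source:
            self.oil.setdefault(site_group, set()).add("overlay")


west = EdgeDevice("IP A", has_source=True)
east = EdgeDevice("IP B")
east.peers = [west]

east.snoop_igmp_report("Gs", source_ed_ip="IP A", core_group="Gd")
print(west.oil, east.core_joins)
```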
OTV Data Plane: Multicast Data, Multicast Packet Flow
[Figure: over a multicast-enabled transport, the source in site West sends (IPs, Gs) traffic; the West Edge Device (IP A) performs a lookup and encapsulates it as (IP A, Gd); the transport replicates the packet; the Edge Devices in sites East (IP B) and South (IP C) decapsulate it and deliver (IPs, Gs) to their receivers.]
OTV Control Plane: Neighbor Discovery (Unicast-Only Transport)
• One of the OTV Edge Devices (ED) is configured as an Adjacency Server (AS). A redundant pair may be configured.
• All EDs are configured to register to the AS: they send their site-id and IP address.
• The AS builds a list of neighbor IP addresses, the overlay Neighbor List (oNL).
• The AS unicasts the oNL to every neighbor.
• Each node unicasts hellos and updates to every neighbor in the oNL.
[Figure: Adjacency Server mode over a unicast-only transport. The oNL holds Site 1, IP A; Site 2, IP B; Site 3, IP C; Site 4, IP D; Site 5, IP E.]
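The registration and oNL distribution can be sketched as follows. This is a minimal illustration, assuming the oNL is a simple site-id to IP mapping; the `AdjacencyServer` class and the `hello_targets` helper are hypothetical names, and timers, redundancy, and message formats are left out.

```python
class AdjacencyServer:
    """Toy adjacency server that keeps the overlay Neighbor List (oNL)."""

    def __init__(self):
        self.onl = {}  # site-id -> IP address of the registered edge device

    def register(self, site_id, ip):
        """An edge device registers with its site-id and IP address."""
        self.onl[site_id] = ip

    def current_onl(self):
        """Copy of the oNL, as it would be unicast to every neighbor."""
        return dict(self.onl)


def hello_targets(onl, own_ip):
    """Every neighbor in the oNL except the node itself gets unicast hellos."""
    return sorted(ip for ip in onl.values() if ip != own_ip)


server = AdjacencyServer()
for site_id, ip in [("Site 1", "IP A"), ("Site 2", "IP B"), ("Site 3", "IP C"),
                    ("Site 4", "IP D"), ("Site 5", "IP E")]:
    server.register(site_id, ip)

targets = hello_targets(server.current_onl(), own_ip="IP A")
print(targets)
```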
OTV Encapsulation Considerations
• OTV adds a 42-byte IP encapsulation
• The OTV shim header contains the VLAN ID, overlay number, and CoS
• The OTV Edge Devices do NOT fragment or reassemble packets. A packet that fails the MTU check is dropped by the Forwarding Engine
• Make sure that [x B + 42 B] < DCI MTU, where x = size of the original packet
[Figure: the 42-byte encapsulation placed in front of the original frame: DMAC (6 B), SMAC (6 B), Ether Type (2 B), IP Header (20 B), OTV Shim (8 B). Other labels: 802.1Q tag (4 B) with VLAN ID and CoS, ToS, CRC.]
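The 42-byte figure is just the sum of the header fields shown on the slide, and the MTU rule is simple arithmetic. The Python sketch below works it through. The function names are hypothetical, and treating an exact fit as acceptable is an assumption (the slide writes the rule with a strict inequality).

```python
# Header sizes in bytes, as labelled on the slide.
OUTER_DMAC = 6
OUTER_SMAC = 6
ETHER_TYPE = 2
IP_HEADER = 20
OTV_SHIM = 8

OTV_OVERHEAD = OUTER_DMAC + OUTER_SMAC + ETHER_TYPE + IP_HEADER + OTV_SHIM


def encapsulated_size(original_size):
    """Size of the packet on the DCI after OTV encapsulation."""
    return original_size + OTV_OVERHEAD


def is_dropped(original_size, dci_mtu):
    """No fragmentation is performed, so an oversized packet is dropped.

    Assumption: a packet that exactly fills the MTU is still forwarded.
    """
    return encapsulated_size(original_size) > dci_mtu


print(OTV_OVERHEAD)                # 42
print(encapsulated_size(1500))     # 1542
print(is_dropped(1500, 1500))      # True
print(is_dropped(1500, 9000))      # False
```

A 1500-byte packet therefore needs more than 1500 bytes of MTU on the DCI path, which is why the transport MTU has to be raised before extending VLANs.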
OTV Automated Multi-homing: Per-VLAN Load Balancing
• The detection of multi-homing is fully automated and does not require additional protocols or configuration
• The Edge Devices within a site discover each other over the "otv site vlan"
• In each site OTV elects one of the Edge Devices to be the Authoritative Edge Device (AED) for a subset of the extended VLANs
• In a dual-homed site the VLANs are split into odd and even VLANs
• The AED:
  • forwards traffic to and from the overlay
  • advertises MAC addresses for any given site/VLAN
[Figure: two dual-homed sites connected across the transport (IP A, IP B), with both Edge Devices in each site acting as AED for their share of the VLANs.]
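The odd/even split can be pictured with a short Python sketch. It is an illustration only: the function name is hypothetical, and which of the two devices ends up with the even VLANs is an assumption made here by sorting the device names.

```python
def elect_aed(extended_vlans, edge_devices):
    """Assign each extended VLAN to one edge device of the site.

    Assumption: devices are ordered by name and VLAN ID modulo the number
    of devices picks the AED, which yields an odd/even split for two devices.
    """
    ordered = sorted(edge_devices)
    return {vlan: ordered[vlan % len(ordered)] for vlan in extended_vlans}


aed_by_vlan = elect_aed([10, 11, 12, 13], ["ED-1", "ED-2"])
print(aed_by_vlan)
```

Each VLAN has exactly one AED, so only one device in the site forwards that VLAN to and from the overlay.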
OTV Layer 2 Fault Isolation
• STP isolation (no configuration required)
  • No BPDUs are forwarded across the overlay
  • STP remains local to each site
  • Edge device internal interfaces behave as any other switchport
• Unknown unicast isolation (no configuration required)
  • No unknown unicast frames are flooded onto the overlay
  • The assumption is that end stations are not silent
  • Option for selective unknown unicast flooding (for certain applications)
• Proxy ARP cache for remote-site hosts (on by default)
  • On an ARP request for a remote host, the request is forwarded through OTV and the initial ARP reply is generated by that host
  • The OTV edge device snoops ARP replies and caches the data
  • Subsequent ARP replies are proxied by the local OTV edge device using the ARP cache
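The proxy ARP behaviour amounts to a cache that is filled by snooping and consulted on later requests. A minimal Python sketch, assuming a plain IP-to-MAC dictionary and hypothetical names (`ArpProxy`, `handle_request`, `snoop_reply`); cache aging is not modelled, and the addresses are made up for the example.

```python
class ArpProxy:
    """Toy ARP cache on an OTV edge device; names are illustrative only."""

    def __init__(self):
        self.cache = {}  # remote host IP -> MAC learned from snooped replies

    def handle_request(self, target_ip):
        """Answer locally on a cache hit, otherwise send the request over OTV."""
        if target_ip in self.cache:
            return ("proxy-reply", self.cache[target_ip])
        return ("forward-over-otv", None)

    def snoop_reply(self, sender_ip, sender_mac):
        """Cache the data of an ARP reply seen coming back from the remote site."""
        self.cache[sender_ip] = sender_mac


proxy = ArpProxy()
first = proxy.handle_request("10.10.10.8")        # cache miss
proxy.snoop_reply("10.10.10.8", "0050.56aa.bbcc")  # reply from the remote host
second = proxy.handle_request("10.10.10.8")       # served from the cache
print(first, second)
```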
MAC Mobility
Legend: local MAC = blue, remote MAC = red
1. The server moves, taking MAC X from site West to site East.
2. The AED in site East detects that MAC X is now local.
3. The AED advertises MAC X with a metric of zero.
4. The EDs in site West see the MAC X advertisement with a better metric from site East and change MAC X to a remote MAC address.
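The effect of the metric-zero advertisement can be shown with a tiny Python sketch. It assumes that the lowest metric wins and that the entry held before the move carried a metric of 1; that starting value and all names here are assumptions for illustration, not values taken from the slide.

```python
def best_entry(entries):
    """Pick the entry with the best (lowest) metric for a MAC address."""
    return min(entries, key=lambda entry: entry["metric"])


# View of an edge device in site West before the server moves.
west_view = [{"mac": "MAC X", "location": "local", "metric": 1}]
before = best_entry(west_view)["location"]

# The AED in site East advertises MAC X with a metric of zero.
west_view.append({"mac": "MAC X", "location": "remote via East", "metric": 0})
after = best_entry(west_view)["location"]

print(before, "->", after)
```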
OTV VDC Models
• Two different deployment models are considered for the OTV VDC:
  • OTV Appliance on a Stick
  • Inline OTV Appliance
• No difference in OTV functionality between the two models
• The Inline OTV Appliance requires availability of Core downstream links
[Figure: both models show the OTV VDC with its Join Interface and Internal Interface next to the SVIs at the L2/L3 boundary. Labels: Common Uplinks to Transport for Layer 3 and DCI; Dedicated Uplink for DCI; Uplinks to the Layer 3 Transport.]
OTV Edge Device at the Aggregation
• OTV at the Aggregation with the L2-L3 boundary
• Recommended for Greenfield
• DC Core performs only the Layer 3 role
• ARP, STP, and unknown unicast domains are isolated between PODs
• Inter-DC or intra-DC LAN extension provided by OTV
• Ideal for a single aggregation block topology
[Figure: Core, Aggregation, and Access layers; each aggregation switch hosts SVIs and an OTV VDC with Join Interface, Internal Interface, and virtual Overlay Interface; vPC toward the access layer; OTV to remote sites.]
OTV Edge Device at the Core
OTV at the DC Core with the L2-L3 boundary at the Aggregation
Option 1: Dedicated devices to perform OTV
• Easy deployment for Brownfield
• Physical devices, or VDCs carved out from the Nexus 7000, deployed in the core
• Separate infrastructure to provide Layer 2 extension and Layer 3 connectivity services
• VLANs extended from the Aggregation layer
• Recommended to use separate physical links for L2 and L3 traffic
• Loop-free hub-and-spoke Layer 2 topology
[Figure: dedicated uplinks for DCI and dedicated uplinks for Layer 3; OTV devices connect to remote sites; vPC or VSS at the Aggregation and Access layers.]
OTV Edge Device at the Core
OTV at the DC Core with the L2-L3 boundary at the Aggregation
Option 2: Common devices for DCI and Layer 3
• Easy deployment for Brownfield
• DC Core devices perform Layer 3 and OTV functions
• HSRP localization at each POD
• VLANs extended from the Aggregation layer
• Recommended to use separate physical links for L2 and L3 traffic
• Loop-free hub-and-spoke Layer 2 topology
• STP and L2 broadcast domains are not isolated between PODs
[Figure: common uplinks for DCI and Layer 3; OTV in the Core connects to remote sites; the vPC links toward the core carry only the OTV extended VLANs; vPC or VSS at the Aggregation and Access layers.]
Deploy OTV at the Core
• OTV at the DC Core with the L2-L3 boundary at the Core
• Easy deployment for Brownfield
• L2-L3 boundary in the DC core
• DC Core devices perform L2, L3, and OTV functions
• Requires a dedicated OTV VDC in the core Nexus
• OTV deployed in the DC core to provide LAN extension services to remote sites
• Intra-DC LAN extension provided by bridging through the Core
• VSS/vPC recommended to create an STP loop-free topology
• Storm control between PODs
OTV VDC: Two possible approaches
[Figure: two topologies, each with a DCI Edge Layer (N7K1-VDCB, N7K2-VDCB, one of them the AED) above an Aggregation Layer (N7K1-VDCA, N7K2-VDCA); one topology is marked with a warning.]
• Single vPC layer at the Aggregation
• Provides a good level of resiliency with the minimum number of ports
• DCI traffic is always forwarded directly to the OTV AED device (mac-address-table)
• Only the AED forwards traffic to and from the OTV overlay
• DCI traffic hashed to the non-AED OTV edge device has to traverse the vPC peer-link between the two DCI edge switches
Path Optimization
Egress Routing Localization: OTV Solution
• The approach is to use the same HSRP group in all sites and therefore provide the same default gateway MAC address.
• Each site behaves as if it were the only one, and provides optimal egress routing of traffic locally.
• OTV achieves Edge Routing Localization by filtering the HSRP hello messages between the sites, therefore limiting the "view" of what other routers are present within the VLAN.
• ARP requests are intercepted at the OTV edge to ensure the replies come from the local active gateway.
[Figure: sites West and East each have their own active gateway at the L2/L3 boundary; FHRP hellos are filtered between the sites and ARP traffic is kept local.]
Filtering Configuration for HSRP Localization
To be applied in the OTV VDC. The 224.0.0.102 entries and the 0000.0c9f.f000 entry cover HSRPv2.

Step 1: Filter HSRP packets in the OTV VDC (VACL option or port ACL option)

VACL option:
  ip access-list hsrp
    10 permit udp any 224.0.0.2/32 eq 1985
    20 permit udp any 224.0.0.102/32 eq 1985
  ip access-list all-ips
    10 permit ip any any
  vlan access-map hsrp-localize 10
    match ip address hsrp
    action drop
  vlan access-map hsrp-localize 20
    match ip address all-ips
    action forward
  vlan filter hsrp-localize vlan-list <OTV-VLANs>

Port ACL option:
  ip access-list otv-hsrp-filter
    10 deny udp any 224.0.0.2/32 eq 1985
    20 deny udp any 224.0.0.102/32 eq 1985
    30 permit ip any any
  interface x/y
    description [OTV internal interface]
    ip port access-group otv-hsrp-filter in

Step 2: Filter HSRP virtual MAC (VIP MAC) advertisements in OTV
  mac-list hsrp-vmac seq 10 deny 0000.0c07.ac00 ffff.ffff.ff00
  mac-list hsrp-vmac seq 20 deny 0000.0c9f.f000 ffff.ffff.f000
  mac-list hsrp-vmac seq 30 permit 0000.0000.0000 0000.0000.0000
  route-map hsrp-filter permit 10
    match mac-list hsrp-vmac
  otv-isis default
    vpn overlay<#>
      redistribute filter route-map hsrp-filter
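To see what the mac-list in Step 2 matches, the prefix and mask logic can be reproduced in Python. This is only a sketch of first-match evaluation with a bitwise mask; it is not how the switch implements the filter, and the function names are hypothetical.

```python
def mac_to_int(mac):
    """Convert a dotted MAC such as 0000.0c07.ac00 to an integer."""
    return int(mac.replace(".", ""), 16)


def matches(mac, prefix, mask):
    """True when the MAC equals the prefix on every bit set in the mask."""
    mask_bits = mac_to_int(mask)
    return (mac_to_int(mac) & mask_bits) == (mac_to_int(prefix) & mask_bits)


# Same entries as the mac-list on the slide, evaluated in order.
HSRP_VMAC_LIST = [
    ("deny", "0000.0c07.ac00", "ffff.ffff.ff00"),    # HSRPv1 virtual MACs
    ("deny", "0000.0c9f.f000", "ffff.ffff.f000"),    # HSRPv2 virtual MACs
    ("permit", "0000.0000.0000", "0000.0000.0000"),  # everything else
]


def is_advertised(mac):
    """First matching entry decides whether the MAC is advertised in OTV."""
    for action, prefix, mask in HSRP_VMAC_LIST:
        if matches(mac, prefix, mask):
            return action == "permit"
    return False


print(is_advertised("0000.0c07.ac0a"))  # HSRPv1 group 10 virtual MAC
print(is_advertised("0000.0c9f.f123"))  # HSRPv2 virtual MAC
print(is_advertised("0050.56aa.bbcc"))  # ordinary host MAC (made-up example)
```

The two HSRP virtual MACs are filtered, while the ordinary host MAC is still advertised to the other sites.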
Distributed Workload Mobility: Outbound Traffic with Services
• FHRP localization is not possible, because request and reply need to pass through the same service device pair
• Source NAT (SNAT) for symmetric flow
• Traffic incurs DCI latency
[Figure: firewall and load balancer pairs behind N7K1-VDCA/N7K2-VDCA and N7K3-VDCA/N7K4-VDCA, connected over the DCI, shown before and after an LD vMotion; state is created on the original service devices.]
Distributed Workload Mobility: Inbound Traffic using RHI
• Route Health Injection (RHI) uses the ACE load balancer to inject a /32 host route once the virtual machine moves
[Figure: load balancers behind N7K1-VDCA/N7K2-VDCA and N7K3-VDCA/N7K4-VDCA, connected over the DCI, shown before and after an LD vMotion; the /32 route is injected at the new location.]
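Why a /32 host route redirects inbound traffic comes down to longest-prefix matching, which the Python sketch below illustrates with the standard `ipaddress` module. The prefixes and the next-hop labels "DC West" and "DC East" are made-up values for the example, and the route table is a plain dictionary rather than anything a router or the ACE actually exposes.

```python
import ipaddress


def best_next_hop(routes, destination):
    """Return the next hop of the longest prefix that contains the destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    network, next_hop = max(candidates, key=lambda item: item[0].prefixlen)
    return next_hop


# Before the move only the subnet route exists (hypothetical values).
routes = {"10.10.10.0/24": "DC West"}
before = best_next_hop(routes, "10.10.10.1")

# After the move a /32 host route is injected from the new site.
routes["10.10.10.1/32"] = "DC East"
after = best_next_hop(routes, "10.10.10.1")
other_host = best_next_hop(routes, "10.10.10.2")

print(before, after, other_host)
```

Only the moved host follows the /32 route; the rest of the subnet still follows the /24.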
Path Optimization: Ingress Routing Optimization with LISP
Terms:
• End-point host ID (EID)
• Route Locator (RLOC)
• Ingress Tunnel Router (ITR)
• Egress Tunnel Router (ETR)
Packet flow:
1. The ITR consults the directory to get the Route Locator (RLOC) for the destination End-point ID (EID)
2. The ITR IP-in-IP encapsulates the traffic to send it to the RLOC address
3. The ETRs receive and decapsulate the traffic
Properties:
• Granular reachability information for hosts in the extended subnet
• If a host moves, its mapping is updated
• No end-host state in routing tables; the core routes only on RLOCs
[Figure: a packet with IP_DA = 10.10.10.1 reaches the ITR, is encapsulated with IP_DA = A, crosses the core, and is decapsulated by the ETR. RLOCs A, B, C, D front Pod A through Pod N; EIDs 10.10.10.1 through .8 sit in the extended subnet 10.10.10.0/24, which is extended with OTV.]
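The lookup-then-encapsulate idea can be sketched in Python. This is a toy model under assumptions: the mapping directory is a plain dictionary from EID to RLOC, packets are dictionaries, and the function names are hypothetical; it does not follow the LISP packet format or the mapping-system protocol.

```python
def itr_encapsulate(packet, mapping_db):
    """ITR: look up the RLOC for the destination EID and wrap the packet."""
    rloc = mapping_db[packet["ip_da"]]
    return {"ip_da": rloc, "inner": packet}


def etr_decapsulate(encapsulated):
    """ETR: strip the outer header and hand back the original packet."""
    return encapsulated["inner"]


# EID -> RLOC directory; the core only needs routes to the RLOCs.
mapping_db = {"10.10.10.1": "A"}

packet = {"ip_da": "10.10.10.1", "payload": "request"}
in_core = itr_encapsulate(packet, mapping_db)
delivered = etr_decapsulate(in_core)

# If the host moves, only its mapping is updated.
mapping_db["10.10.10.1"] = "B"
after_move = itr_encapsulate(packet, mapping_db)

print(in_core["ip_da"], after_move["ip_da"], delivered["ip_da"])
```

The host keeps its address (the EID) across the move; only the locator in the directory changes.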
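OTV Applications in Enterprise Networks
• Departments are spread across locations and VLANs must be assigned per department
• Mobile working across the campus
• Network migration
• A group's backbone network provides Layer 2 channels to its subsidiary units
• And so on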
Challenges with LAN Extensions: Real Problems Solved by OTV
• Extensions over any transport (IP, MPLS)
• Failure boundary preservation
• Site independence / isolation
• Optimal bandwidth utilization (no head-end replication)
• Resiliency / multihoming
• Built-in end-to-end loop prevention
• Multisite connectivity (inter-DC and intra-DC)
• Scalability
  • VLANs, sites, MACs
  • ARP, broadcasts/floods
• Operations simplicity: only 5 CLI commands
[Figure: LAN extension between the North Data Center and the South Data Center, with separate fault domains preserved.]
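Current Limitations of OTV
• IETF draft; not yet a formal standard
• Convergence time (3s-30s)
• The number of supported sites is currently small, so it is not suited to deployment at the aggregation layer
• SVI limitation
• Per-VLAN AED traffic load balancing is currently an issue
• The backbone currently must support multicast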