Marrying OpenStack and Bare-Metal Cloud Pete Lumbis & David Iles May 2018
Our Speakers • Pete Lumbis – Technical Marketing Engineer, Cumulus Networks • David Iles – Senior Director, Ethernet Switching, Mellanox Technologies
Multi-Tenant Networks • Multi-tenant is more challenging • Possibly competing tenants (mixing their traffic is really bad) • Shared infrastructure • Multiple security policies • A single-tenant private cloud is easy • Single ownership • Single security policy
Multi-Tenant Networks • VLANs • Isolate traffic with network tags (802.1q) over shared physical links • Only Layer 3 routers can forward traffic between VLANs • ML2 can provision VLANs on compute nodes or on physical switches [Diagram: ML2 VLAN provisioning]
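As a rough sketch of what ML2 VLAN provisioning looks like from the API side, the snippet below uses openstacksdk to create a VLAN provider network. The cloud name, physical network label, segmentation ID, and subnet are hypothetical values chosen for illustration, not taken from the talk (provider attributes normally require admin credentials).

```python
# Minimal sketch, assuming an openstacksdk environment with a cloud named
# "mycloud" and an ML2 physical network label "physnet1" (both hypothetical).
import openstack

conn = openstack.connect(cloud="mycloud")

# Ask Neutron (ML2) for a VLAN-backed tenant network on segment 101.
network = conn.network.create_network(
    name="tenant-a-net",
    provider_network_type="vlan",
    provider_physical_network="physnet1",
    provider_segmentation_id=101,
)

# Attach a subnet so instances on this VLAN get addresses.
conn.network.create_subnet(
    network_id=network.id,
    ip_version=4,
    cidr="10.1.1.0/24",
    name="tenant-a-subnet",
)

print(f"Created {network.name} on VLAN {network.provider_segmentation_id}")
```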
VLANs Are Not Your Friend • Pros: • Solve multi-tenancy networking • Cons: • Single network path • Low scalability • Scale-up networking, not scale-out • Limited VLAN range (a 12-bit tag, so roughly 4,000 usable IDs at best) • Large blast radius [Diagram: a single server NIC failure impacts every device in the shared VLAN domain]
Beyond VLANs: VxLAN • Same L2 extension and isolation as VLANs (via the VxLAN ID) • Operates over an L3 network • Small blast radius • Enables scale-out networking • Extremely high-bandwidth network • Smaller ML2 footprint [Diagram: ML2 VxLAN provisioning]
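To make "L2 extension over an L3 network" concrete, the sketch below builds a VxLAN-encapsulated frame with scapy: the tenant's original Ethernet frame rides inside UDP/IP between two VTEPs. Scapy, the addresses, and the VNI are my own illustrative choices, not something from the talk.

```python
# Illustration only: show the VXLAN header stack (outer Ether/IP/UDP/VXLAN,
# inner Ether/IP). All addresses and the VNI below are made up.
from scapy.all import Ether, IP, UDP, VXLAN  # scapy >= 2.4 ships a VXLAN layer

# Outer headers: VTEP-to-VTEP, plain routed IP plus UDP port 4789.
outer = (
    Ether()
    / IP(src="10.0.0.21", dst="10.0.0.22")   # source and destination VTEPs
    / UDP(sport=49152, dport=4789)           # 4789 = IANA-assigned VXLAN port
    / VXLAN(vni=10101)                       # 24-bit segment ID (~16M segments)
)

# Inner frame: the tenant's original L2 traffic, carried unchanged.
inner = (
    Ether(src="52:54:00:aa:aa:aa", dst="52:54:00:bb:bb:bb")
    / IP(src="192.168.1.10", dst="192.168.1.20")
)

packet = outer / inner
packet.show()  # prints the full Ether/IP/UDP/VXLAN/Ether/IP stack
```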
Compute Based VxLAN • VxLAN runs from compute node to compute node • No VxLAN-based switches • BGP to the server for simplicity • Same config and troubleshooting for both network and compute • No Layer 2 • BGP advertises the compute endpoint (VTEP) • ML2 provisions VxLANs per tenant [Diagram: ML2 VxLAN provisioning]
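The "BGP to the server" piece is usually just a small FRR config on each compute node that advertises its VTEP loopback into the fabric. The sketch below renders such a config from Python the way an automation tool might; the ASN, interface names, and addresses are hypothetical, and this is one plausible layout rather than the speakers' exact design.

```python
# Minimal sketch: render a per-host FRR "BGP unnumbered" config that
# advertises the host's VTEP loopback. Values below are placeholders.

def frr_bgp_config(asn: int, vtep_ip: str, uplinks: list[str]) -> str:
    """Build an FRR BGP stanza for a compute node with unnumbered uplinks."""
    lines = [f"router bgp {asn}", f" bgp router-id {vtep_ip}"]
    for iface in uplinks:
        # BGP unnumbered: peer over the interface, no per-link IPs to manage.
        lines.append(f" neighbor {iface} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        f"  network {vtep_ip}/32",  # advertise the VTEP loopback to the fabric
        " exit-address-family",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(frr_bgp_config(asn=65101, vtep_ip="10.0.0.11", uplinks=["eth1", "eth2"]))
```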
Compute Based VxLAN – The Ugly • VxLAN-capable NICs required • Most servers have them, but not all • Performance hit without NIC offload • BGP to the server is scary • It isn't really, but not everyone likes the idea • Network teams don't want server folks touching their network • Server folks don't want to learn BGP • Complicated solution for smaller deployments • Difficult for Ironic (bare metal)
Network Design Recap • VLAN Based • Bad, don’t do it • Even at a few racks • VxLAN Compute + Network Based • Great, but still requires network ML2 • Loss of a switch == loss of a lot of Neutron state • Still requires VxLAN NICs • VxLAN Compute Only • Best solution if you are okay with BGP on servers • Still requires VxLAN NICs • Not the easiest for Ironic
Network Design Needs • Stability • It isn’t cloud if one host brings down the others • Simplicity • I don’t need the Avengers to run the infrastructure • Flexibility • Works for bare metal and virtual workloads • Scalability • Maybe not 1000s but 100s of tenants can be required • Affordable • VxLAN NICs may not be an option
The Answer… • VLAN-based compute • Wait… • + VxLAN-based network • But it's not Hierarchical Port Binding • HPB has ML2 manage both VxLANs and VLANs • Here, ML2 only drives compute VLANs • The network is pre-provisioned • Offload VxLAN to the network • No need for special NICs • Localized VLANs easily and safely scale to 100s of tenants • Larger scale requires more complex solutions
VLAN + VxLAN • VxLAN tunnels are pre-configured between all switches • Compute-facing switch ports are configured as VLAN trunks • ML2 provisions VLANs on compute nodes as needed • Every VLAN is mapped to the same VxLAN tunnel (VNI) on every switch (see the sketch below) [Diagram: ML2 provisions compute VLANs; pre-built VxLANs carry them between switches]
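One simple way to pre-provision the VLAN-to-VNI mapping is a fixed offset, with one VXLAN interface per VLAN bridged on every switch. The sketch below generates the corresponding Linux/iproute2 commands; the offset, bridge name, and VTEP address are assumptions for illustration, not the speakers' exact scheme.

```python
# Illustration only: map each tenant VLAN to a VNI with a fixed offset and
# emit the commands an automation tool might run on each switch.

VNI_OFFSET = 10_000          # e.g. VLAN 101 -> VNI 10101 (hypothetical scheme)
VTEP_IP = "10.0.0.21"        # this switch's loopback/VTEP address (hypothetical)


def vlan_to_vni(vlan_id: int) -> int:
    return VNI_OFFSET + vlan_id


def provision_commands(vlan_id: int) -> list[str]:
    vni = vlan_to_vni(vlan_id)
    dev = f"vni{vni}"
    return [
        # One VXLAN netdev per VLAN, attached to the bridge and tagged.
        f"ip link add {dev} type vxlan id {vni} local {VTEP_IP} dstport 4789 nolearning",
        f"ip link set {dev} master bridge up",
        f"bridge vlan add dev {dev} vid {vlan_id} pvid untagged",
    ]


if __name__ == "__main__":
    for vlan in (101, 102, 103):
        print(f"# VLAN {vlan} -> VNI {vlan_to_vni(vlan)}")
        print("\n".join(provision_commands(vlan)))
```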
VxLAN Without a Controller? • Switches need VxLAN knowledge • WHO else is a VxLAN endpoint (VTEP) • WHAT VxLAN tunnels exist on each VTEP • WHERE MAC addresses live • Option 1: VxLAN controllers • OpenStack ML2 • OpenContrail/Tungsten Fabric • OpenDaylight • Option 2: EVPN • Switches exchange the data without a controller • Relies on extensions to BGP • Multi-vendor (Cumulus, Cisco, Arista, Juniper) [Diagram: a controller pushing VxLAN information vs. switches exchanging it directly]
EVPN: A Closer Look • Switches build BGP relationships with each other • Each switch learns MAC addresses from its servers, just like a normal switch • MAC information is exchanged via BGP ("come to me to reach MAC A") • Data is sent via VxLAN based on the BGP information • Since BGP is used, there are no shared L2 failure domains [Diagram: switches A and B exchange MAC reachability over BGP, then carry server A to server B traffic inside VxLAN]
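As a conceptual model of what an EVPN MAC/IP (type-2) advertisement carries and how the receiving switch uses it, the sketch below populates a MAC table from "routes" instead of data-plane flooding and looks up the remote VTEP for a destination MAC. This is an illustration of the control-plane idea only, not a BGP implementation, and all values are hypothetical.

```python
# Conceptual sketch: model EVPN type-2 (MAC/IP) route content and the
# resulting MAC -> remote VTEP lookup on a receiving switch.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EvpnMacRoute:
    mac: str        # MAC address learned by the advertising switch
    vni: int        # VXLAN ID (L2 segment) the MAC belongs to
    vtep_ip: str    # advertising switch's VTEP ("come to me to reach this MAC")


class MacTable:
    """MAC table populated from received EVPN routes instead of flooding."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[int, str], EvpnMacRoute] = {}

    def learn(self, route: EvpnMacRoute) -> None:
        self._routes[(route.vni, route.mac)] = route

    def next_hop(self, vni: int, mac: str) -> Optional[str]:
        route = self._routes.get((vni, mac))
        return route.vtep_ip if route else None


if __name__ == "__main__":
    table = MacTable()
    # Switch B advertises server B's MAC over BGP EVPN (values made up).
    table.learn(EvpnMacRoute(mac="52:54:00:bb:bb:bb", vni=10101, vtep_ip="10.0.0.22"))
    # Switch A now knows to VXLAN-encapsulate traffic for that MAC toward 10.0.0.22.
    print(table.next_hop(10101, "52:54:00:bb:bb:bb"))
```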
BaGPipe – A Potential Future • Neutron work on BGP integration, called BaGPipe • Two goals: • Inter-DC VPNs • Layer 3 to the compute node • Layer 3 to the compute can be done today with Free Range Routing (FRR) + Neutron networking • This is the design described earlier • BaGPipe would have Neutron control both BGP and VxLAN • In today's solution, Neutron only controls VxLAN • Nearly identical to running EVPN on the server • Extremely early days for BaGPipe, but the value is clear
How do we make it work with next-gen workloads? Machine learning and NVMe fabrics • Next-generation storage: • All flash • PCIe-attached NVMe drives • RDMA over Ethernet (RoCE) • Machine learning applications: • GPU accelerated • PCIe-attached GPUs • RDMA over Ethernet (RoCE) • Both must run over the overlay network: RoCE + VXLAN [Diagram: ML2 VxLAN provisioning]
EVPN Gotchas – what to check in a switch platform: • License-free features: BGP, VXLAN, ZTP, EVPN • VXLAN routing in hardware (no loopback cables) • VTEP scale – many switches max out at 128 VTEPs • RoCE over VXLAN • NVMe over Fabrics • Machine learning workloads