SDN Performance & Architecture Evaluation Vijay Seshadri Cloud Platform Engineering (CPE), Symantec Jason Venner Mirantis
Agenda • 1. CPE Overview • 2. SDN Objectives and Test Plan • 3. Test Setup and Framework • 4. Test Results • 5. Key Findings/Insights
CPE Overview • CPE Charter • Build a consolidated cloud infrastructure that offers platform services to host Symantec cloud applications • Symantec cloud infrastructure is already hosting diverse (security, data management) workloads • Analytics – Reputation-based security, managed security services • Storage – Consumer and Enterprise backup/archival • Network – Hosted email & web security • We are building an OpenStack-based platform that provides additional storage and analytics services • Secure multi-tenancy is a core objective for all services
CPE Platform Architecture [diagram: CLIs, cloud applications, scripts and a web portal access the platform through a REST/JSON API. Components are grouped as Core Services and Supporting Services, spanning Compute, Networking, Storage, Big Data and Messaging. Labeled components: Identity & Access (Keystone), Compute (Nova), SDN (Neutron), Image (Glance), Object Store, Batch Analytics, Stream Processing, Msg Queue, K/V Store, Mem Cache, SQL, Load Balancing, DNS, Email Relay, SSL, Monitoring, Logging, Metering, User Mgmt, Authn, Roles, Tenancy, Quotas, Deployment]
SDN Objectives • Provide secure multi-tenancy using strong network isolation • Policy-driven network access control within (and across) projects/domains • Provide the ability for product engineering teams to define a network topology via REST APIs • Associate network objects dynamically with VMs and projects (tenants) • Create and manage network access control policies within and across projects • Enable easier integration of applications on partner infrastructure • Interconnect OpenStack with bare-metal storage/analytics services • East-West traffic needs to transit the underlay network • Support software-driven network functions • LBaaS, DNSaaS, etc.
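The REST-driven topology and policy objectives map onto the Neutron v2.0 API. A minimal sketch, assuming a hypothetical endpoint `NEUTRON_URL` and an already-issued Keystone token `TOKEN`, builds (without sending) the POST requests for a network, a subnet and one "allow specific" security-group rule. The helper names are illustrative and not part of any SDK.

```python
import json
import urllib.request

# Hypothetical endpoint and token: substitute a real Neutron endpoint and Keystone token.
NEUTRON_URL = "http://neutron.example.com:9696"
TOKEN = "example-token"


def neutron_request(path, body):
    """Build (but do not send) a JSON POST request against the Neutron v2.0 API."""
    return urllib.request.Request(
        NEUTRON_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="POST",
    )


def create_network(name):
    return neutron_request("/v2.0/networks",
                           {"network": {"name": name, "admin_state_up": True}})


def create_subnet(network_id, cidr):
    return neutron_request("/v2.0/subnets",
                           {"subnet": {"network_id": network_id,
                                       "ip_version": 4, "cidr": cidr}})


def allow_tcp_ingress(security_group_id, port, remote_prefix):
    """One 'allow specific' rule: TCP to `port` from `remote_prefix` into the group."""
    return neutron_request("/v2.0/security-group-rules",
                           {"security_group_rule": {
                               "security_group_id": security_group_id,
                               "direction": "ingress",
                               "ethertype": "IPv4",
                               "protocol": "tcp",
                               "port_range_min": port,
                               "port_range_max": port,
                               "remote_ip_prefix": remote_prefix}})
```

Passing one of these requests to `urllib.request.urlopen` would send it; in practice a client library such as openstacksdk would usually wrap these calls.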
Test Plan • Secure Multi-Tenancy • Test network isolation under various configurations • Same subnet • Same network, different subnets • Different networks, different tenants • Different networks, identical IP addresses • Test enforcement of network policies • “Deny All” works and is the default • “Allow Specific” works • Removal of an “Allow Specific” works • Data Plane Performance • OpenStack Internal • N x N inter-VM communication • Client-server TCP mesh using iperf3
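The three policy checks can be expressed as a tiny model of allowlist semantics. This is a hypothetical sketch (`AllowRule`, `Policy` and `run_policy_checks` are illustrative names, not NSX, Contrail or Neutron code) of what the tests assert: nothing passes until a rule allows it, a rule admits only its own source range and port, and removing the rule restores the deny.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """An 'allow specific' rule: TCP traffic from `source` (a CIDR) to `port`."""
    source: str
    port: int


class Policy:
    """Default-deny policy: traffic passes only if some allow rule matches it."""

    def __init__(self):
        self.rules = set()

    def allow(self, rule):
        self.rules.add(rule)

    def remove(self, rule):
        self.rules.discard(rule)

    def permits(self, src_ip, port):
        addr = ipaddress.ip_address(src_ip)
        return any(port == r.port and addr in ipaddress.ip_network(r.source)
                   for r in self.rules)


def run_policy_checks():
    """Run the three policy checks from the test plan, in order."""
    p = Policy()
    rule = AllowRule("10.0.1.0/24", 5201)
    results = {"deny_all_default": not p.permits("10.0.1.7", 5201)}
    p.allow(rule)
    # The rule must admit its own range and nothing else.
    results["allow_specific"] = (p.permits("10.0.1.7", 5201)
                                 and not p.permits("10.0.2.7", 5201))
    p.remove(rule)
    results["removal"] = not p.permits("10.0.1.7", 5201)
    return results
```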
Test Plan – Cont’d • Egress/Ingress • Simulate data ingress/egress to non-OpenStack services (e.g., a bare-metal Hadoop cluster) • Control Plane Scalability • Where does the solution break? Verify correct operation at the limit • Test rate of creation and resource limits for Neutron objects • Number of network ports (vNICs) • Number of networks and subnets • Number of routers • Number of active flows • Number of connections through the external gateway (ingress/egress)
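For the control-plane tests, a small harness can time object creation and probe for the breaking point. The sketch below assumes `create` is a hypothetical callable that creates one Neutron object (for example, a thin wrapper around a port-create API call) and raises when the system refuses; it is not the harness used in this evaluation.

```python
import time


def measure_creation_rate(create, count):
    """Call create(i) for i in range(count), count >= 1.

    Returns (objects per second, slowest single call in seconds).
    """
    latencies = []
    start = time.perf_counter()
    for i in range(count):
        t0 = time.perf_counter()
        create(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length interval
    return count / elapsed, max(latencies)


def find_limit(create, max_objects):
    """Keep creating until create() raises or max_objects is reached; return the number created."""
    for i in range(max_objects):
        try:
            create(i)
        except Exception:
            return i
    return max_objects
```

Running `find_limit` with a generous `max_objects` answers "where does it break?", while calling `measure_creation_rate` at several existing object counts shows whether creation slows as the deployment grows.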
Overview of SDN solutions • OpenStack is about networking • OVS/VLAN • 4K limit on VLANs (12-bit VLAN ID) • Many VLANs spanning many TORs are operationally challenging, especially in rapidly changing environments • No failure isolation domain • Overlay • Simple physical network: each TOR is an L2 island • L3 between the TORs • Packet encapsulation for VM traffic • Controllers orchestrate a tunnel mesh for VM traffic
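The scaling argument is simple arithmetic. As a sketch: the 802.1Q VLAN ID is 12 bits with two reserved values, while VXLAN, used here only as one common overlay encapsulation (the deck does not name the encapsulations tested), carries a 24-bit segment ID. The tunnel count assumes, for illustration, a full mesh between hypervisors; real controllers may set tunnels up on demand.

```python
def usable_vlan_ids():
    # 802.1Q VLAN ID is 12 bits; IDs 0 and 4095 are reserved.
    return 2 ** 12 - 2


def vxlan_segments():
    # VXLAN (RFC 7348) carries a 24-bit VXLAN Network Identifier.
    return 2 ** 24


def full_mesh_tunnels(hypervisors):
    # One bidirectional tunnel per unordered pair of hypervisors.
    return hypervisors * (hypervisors - 1) // 2
```

A 100-hypervisor full mesh already needs 4,950 tunnels, which is why the overlay relies on controllers rather than manual configuration.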
Test Framework Overview • We build a pool of VMs, each placed in a specific rack • We use a control network to launch test cases and collect results
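One way to drive the N x N iperf3 mesh from the control network is to generate the server and client commands up front. This is a sketch under assumptions: VM names are placeholders for addresses, and because an iperf3 server runs one test at a time, each VM runs one server per peer on a port offset from iperf3's default 5201. How the commands are dispatched (for example over SSH) is left out.

```python
import itertools

BASE_PORT = 5201  # iperf3's default port; each peer gets its own offset


def mesh_plan(vms, seconds=60):
    """Return (server_cmds, client_cmds) as lists of (vm, command) for an N x N TCP mesh.

    The server on VM s that serves client c listens on BASE_PORT + index(c),
    so every (server VM, port) pair is unique.
    """
    index = {vm: i for i, vm in enumerate(vms)}
    servers, clients = [], []
    for client, server in itertools.permutations(vms, 2):
        port = BASE_PORT + index[client]
        servers.append((server, f"iperf3 -s -p {port}"))
        clients.append((client, f"iperf3 -c {server} -p {port} -t {seconds} -J"))
    return servers, clients
```

The `-J` flag makes each iperf3 client emit JSON, which is convenient to collect over the control network.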
NSX Test Setup: OpenStack Internal [diagram: 3 OpenStack controllers; NSX services: 4 NSX controllers, 4 NSX gateways]
NSX Test Setup: Ingress/Egress [diagram: 3 OpenStack controllers; 4 NSX controllers; 20 NSX gateways (not hypervisors); 20 outside hosts as traffic source/sink]
Contrail Test Setup: OpenStack Internal [diagram: 1 OpenStack controller; 3 servers used as controllers; 4 servers with 2x10G ports; TOR ports repurposed for the MX routers]
Contrail Test Setup: Ingress/Egress [diagram: 1 OpenStack controller; hypervisors; 20 outside hosts as traffic source/sink; 4 servers with 2x10G ports used]
Multi-tenancy Test Results • Connectivity within a subnet • Both solutions proved isolation • Contrail 1.04 was not “deny all” by default • Same network, different subnets • Both solutions proved isolation
Multi-tenancy Test Results • Different networks in different projects • Both solutions proved isolation • Contrail used floating IPs • Different networks, overlapping IP addresses • Both solutions proved isolation
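Overlapping addresses stay isolated because overlay forwarding is keyed by the virtual network as well as the IP address. The sketch below is a hypothetical controller table (network IDs and hypervisor names are made up) showing that the same `10.0.0.5` resolves differently in each network and not at all from a third one.

```python
import ipaddress

# Hypothetical forwarding state: (virtual network ID, VM IP) -> hypervisor hosting that VM.
FORWARDING = {
    (5001, ipaddress.ip_address("10.0.0.5")): "hv-rack1-07",
    (5002, ipaddress.ip_address("10.0.0.5")): "hv-rack3-12",
}


def next_hop(vnid, dst_ip):
    """Resolve a destination inside one virtual network; other networks are invisible."""
    return FORWARDING.get((vnid, ipaddress.ip_address(dst_ip)))
```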
Data Plane Test Results: OpenStack Internal • Both overlay solutions added 4% payload overhead • This is in addition to the 3% underlay frame overhead • Both solutions ran close to ‘virtual’ wire speed • We hit 75 Gb/sec per TOR out of 80 Gb/sec • Peak across the TORs: 225 Gb/sec out of 240 Gb/sec • Traversing SDN routers had little impact on peak performance • Neutron OVS/VLAN required jumbo frames to hit wire speed
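The overhead figures can be turned into a rough payload ceiling. Assuming (this is not stated in the deck) that the 4% overlay and 3% underlay overheads compound multiplicatively, an 80 Gb/sec TOR leaves about 74.7 Gb/sec of payload and three TORs (240 Gb/sec) about 224 Gb/sec, in the same range as the measured 75 and 225 Gb/sec.

```python
def payload_ceiling_gbps(link_gbps, overlay_overhead=0.04, underlay_overhead=0.03):
    """Payload throughput left after two per-packet overheads, assumed to compound."""
    return link_gbps / ((1 + overlay_overhead) * (1 + underlay_overhead))


def efficiency(measured_gbps, link_gbps):
    """Measured throughput as a fraction of raw link capacity."""
    return measured_gbps / link_gbps
```

Both measured results work out to 93.75% of raw capacity. Because the deck does not say exactly how the overhead percentages were measured, treat this as a consistency check rather than a derivation.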
Data Plane Test Results: Ingress/Egress • On a per-VM basis we ran at virtual wire speed to/from our external hosts • Bugs in OVS 2.1 limited the performance of individual connections under high concurrency (>10,000) • Saturation of our TOR-to-spine connection was the principal performance limit • Gateway traffic required 2 passes through our gateway TOR, making it the one busy TOR • The gateway TOR saturated at 37 Gb/sec of data payload • Tested up to 100,000 concurrent connections • OVS performance fell off after 10,000 • vRouter was able to scale to 100,000
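The deck does not say what generated the concurrent connections. As an assumption, a minimal asyncio load generator could open many TCP connections at once and hold them open while sending data; the sketch runs against a local sink so it is self-contained, and all function names are hypothetical. Scaling it toward 100,000 connections would also need raised file-descriptor limits and several source addresses, since one source IP talking to one destination IP and port is bounded by the ephemeral port range (fewer than 30,000 ports with common Linux defaults).

```python
import asyncio


async def sink(reader, writer):
    """Server side: read and discard data until the client closes the connection."""
    while await reader.read(65536):
        pass
    writer.close()
    await writer.wait_closed()


async def hold_connections(host, port, count, payload=b"x" * 1024):
    """Open `count` TCP connections at once, send `payload` on each, then close them all.

    Returns the number of connections that were open simultaneously.
    """
    async def open_one():
        _, writer = await asyncio.open_connection(host, port)
        writer.write(payload)
        await writer.drain()
        return writer

    writers = await asyncio.gather(*(open_one() for _ in range(count)))
    for writer in writers:
        writer.close()
        await writer.wait_closed()
    return len(writers)


async def local_demo(count):
    """Run the load generator against a sink on an ephemeral localhost port."""
    server = await asyncio.start_server(sink, "127.0.0.1", 0, backlog=count)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await hold_connections("127.0.0.1", port, count)
```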
Key Findings: OpenStack configuration • Building a 100+ node OpenStack cluster requires extensive config tuning • Host OS, MySQL, DB indexes and RabbitMQ tuning • Horizontal scaling of API servers • API service thread counts, connection pools and queues • Python REST servers are not ideal for handling tens of thousands of requests per second • Keystone configuration • Memory cache for tokens • Short-lived tokens with regular purge • The Havana Neutron implementation does not meet our scalability goals • 8+ minutes to bring up a network port on a network with a few hundred VMs • Long delays with the query APIs when there are a few thousand VMs
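Short-lived tokens with a regular purge keep the token store small. The class below is a hypothetical in-memory illustration of that policy, with a fixed TTL and an explicit purge; it is not Keystone's token backend, which in this era offered memcached and SQL drivers among others, the latter purged with `keystone-manage token_flush`.

```python
import time


class TokenCache:
    """Illustrative in-memory token cache with a fixed TTL and explicit purge."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._tokens = {}   # token id -> (payload, expiry time)

    def put(self, token_id, payload):
        self._tokens[token_id] = (payload, self.clock() + self.ttl)

    def get(self, token_id):
        entry = self._tokens.get(token_id)
        if entry is None or entry[1] <= self.clock():
            return None  # unknown or expired: the caller must re-authenticate
        return entry[0]

    def purge(self):
        """Drop expired tokens; return how many were removed."""
        now = self.clock()
        expired = [t for t, (_, exp) in self._tokens.items() if exp <= now]
        for t in expired:
            del self._tokens[t]
        return len(expired)

    def __len__(self):
        return len(self._tokens)
```

Note that an expired token still occupies memory until `purge` runs, which is why the slide pairs short lifetimes with a regular purge.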
Key Findings: VM Network Performance • Tune your VM kernels • With the default Ubuntu 12.04 or CentOS 6.5 kernel settings for network buffer space and TCP window sizes we see 1.8 Gb/sec/VM with a 1500-byte MTU • With standard 10G tuning for rmem/wmem and interface txqueuelen
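Buffer sizing follows from the bandwidth-delay product: a single TCP stream cannot exceed its window divided by the round-trip time. The sketch computes both and emits example `sysctl` lines for the `rmem`/`wmem` limits; the 0.5 ms RTT and the 16 MB ceiling used below are assumptions for illustration, not the values used in these tests.

```python
def bdp_bytes(link_gbps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return round(link_gbps * 1e9 * (rtt_ms / 1000) / 8)


def window_limited_gbps(window_bytes, rtt_ms):
    """Upper bound on one TCP stream's throughput for a given window and RTT."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e9


def sysctl_lines(max_buffer_bytes):
    """Example sysctl settings raising the socket buffer ceilings (min, default, max)."""
    return [
        f"net.core.rmem_max = {max_buffer_bytes}",
        f"net.core.wmem_max = {max_buffer_bytes}",
        f"net.ipv4.tcp_rmem = 4096 87380 {max_buffer_bytes}",
        f"net.ipv4.tcp_wmem = 4096 65536 {max_buffer_bytes}",
    ]
```

For a 10 Gb/sec link at 0.5 ms RTT the product is 625,000 bytes, so a ceiling such as 16 MB leaves headroom. The interface queue length is set separately, for example with `ip link set dev eth0 txqueuelen 10000` (interface name and value are placeholders).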
Conclusion • SDN is a core capability for us to offer a secure multi-tenant cloud platform • Overlay solutions provide strong network isolation and access control • Our use case requires extensive traffic into and out of the SDN zone • NSX requires host servers and additional network configuration • Contrail uses MPLS-enabled routers, more integrated with the underlay infrastructure • Both overlay solutions met our short-term performance and scalability goals • We will continue to evaluate the SDN space for solutions that meet our long-term goals