This presentation explores optimizing NFV performance with real-time requirements, tuning OpenStack settings, addressing bottlenecks, and future enhancements. Learn the strategies to enhance SBC performance and ensure low latency in Telco NFV environments.
SECRETS FOR APPROACHING BARE-METAL PERFORMANCE WITH REAL-TIME NFV
Souvik Dey, Principal Software Engineer
Suyash Karmarkar, Principal Software Engineer
OpenStack Summit, Sydney, Nov 7th 2017 (lightning talk)
Agenda
• What is an SBC
• Performance testing of an SBC NFV
• Performance requirements of an SBC NFV
• Performance bottlenecks
• Performance gains by tuning
• Guest-level tunings
• OpenStack tunings to address bottlenecks (CPU, memory)
• Networking choices for enterprise and carrier workloads: Virtio, SR-IOV, OVS-DPDK
• Future/roadmap items
An SBC is a compute-, network- and I/O-intensive NFV. The SBC sits at the border of networks and acts as an interworking element, demarcation point, centralized routing database, firewall, and traffic cop.
Performance Requirements of an SBC NFV
• Guaranteed: ensure application response time.
• Low latency and jitter: pre-defined constraints dictate throughput and capacity for a given VM configuration.
• Deterministic: real-time communication demands predictable performance.
• Optimized: tuning OpenStack parameters to reduce latency has a positive impact on throughput and capacity.
• Zero packet loss: so the quality of real-time traffic is maintained.
Performance Bottlenecks in OpenStack
The major attributes that govern performance and deterministic behavior:
• CPU: sharing with variable VNF loads. The virtual CPUs in the guest VM run as QEMU threads on the compute host, where they are treated as normal processes. These threads can be scheduled on any physical core, which increases cache misses and hampers performance. Features like CPU pinning help reduce the hit.
• Memory: small memory pages coming from different sockets. Virtual memory can be allocated from any NUMA node; when the memory and the CPU/NIC are on different NUMA nodes, data must traverse the QPI links, increasing I/O latency. TLB misses caused by small kernel memory page sizes also increase hypervisor overhead. NUMA awareness and hugepages help minimize these effects.
• Network: throughput and latency for small packets. Network traffic arriving at the compute host's physical NICs must be copied to the tap devices by the emulator threads before being passed to the guest. This increases network latency and induces packet drops. SR-IOV and OVS-DPDK help here.
• Hypervisor/BIOS settings: overhead, eliminating interrupts, preventing preemption. Any interrupt raised by the guest to the host results in VM entry and exit calls, increasing hypervisor overhead. Host OS tuning helps reduce this overhead.
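As a quick sanity check on a compute host, the NUMA layout and hugepage state described above can be inspected with standard Linux sysfs and procfs interfaces (a minimal sketch; the PCI address in the last command is a placeholder):

```shell
# Hugepage state on the host: total/free hugepages and default page size
grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo

# NUMA nodes present on this host
ls -d /sys/devices/system/node/node*

# Which NUMA node a given PCI NIC sits on (0000:01:00.0 is a placeholder)
# cat /sys/bus/pci/devices/0000:01:00.0/numa_node
```

If a NIC's `numa_node` differs from the node the guest's pinned CPUs and memory live on, traffic pays the cross-QPI penalty described above.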
Performance tuning for the VNF (guest) • Isolate cores for fast-path traffic, slow-path traffic, and OAM. • Use poll-mode drivers for network traffic • DPDK • PF_RING • Use hugepages for DPDK threads • Do proper sizing of the VNF based on workload.
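For example, a DPDK-based fast path inside the guest is typically started with EAL options that pin the poll-mode threads to the isolated cores and back them with hugepages (a sketch under assumptions: `my_fastpath` is a hypothetical application name, and the core numbers are placeholders to be matched to the guest's vCPU layout):

```shell
# Reserve 2 MB hugepages inside the guest for the DPDK poll-mode threads
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mount -t hugetlbfs nodev /dev/hugepages

# Run the fast path on isolated cores 2-5, leaving cores 0-1
# for slow-path and OAM work ("my_fastpath" is hypothetical)
./my_fastpath -l 2-5 --socket-mem 512 -- --ports 0,1
```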
Performance Gains with Config Changes for an Optimized NFV
• Enable CPU pinning
• Configure libvirt to expose the host CPU features to the guest
• Enable the ComputeFilter Nova scheduler filter
• Remove CPU overcommit
• Set the CPU topology of the guest
• Segregate real-time and non-real-time workloads onto different computes using host aggregates
• Isolate host processes from running on the pinned CPUs
• Enable NUMA awareness
• Enable hugepages on the host for guest memory
• Extend the Nova scheduler with the NUMA topology filter
• Remove memory overcommit
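Most of these tunings are applied per flavor through Nova extra specs; a hedged sketch (the flavor name `sbc.large` is a placeholder, and exact behavior can vary by OpenStack release):

```shell
# Pin vCPUs to dedicated host cores, back guest RAM with hugepages,
# and confine the guest to a single NUMA node
openstack flavor set sbc.large \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=large \
  --property hw:numa_nodes=1
```

On the compute host side, `vcpu_pin_set` in nova.conf keeps host processes off the cores reserved for guests, and `cpu_mode = host-passthrough` in the `[libvirt]` section exposes the host CPU features to the guest.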
Networks in OpenStack
• VNF with Open vSwitch (kernel datapath): up to 50 kpps
• VNF with OVS-DPDK (DPDK datapath): up to 4 Mpps per socket (limited by lack of NUMA awareness)
• VNF with SR-IOV (Single-Root I/O Virtualization): up to 21 Mpps per core
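For reference, SR-IOV virtual functions must be created on the host NIC before Neutron can hand them to a VNF (a sketch; the interface name `ens1f0` and the VF count are placeholders, and the NIC driver must support SR-IOV):

```shell
# Create 8 virtual functions on the physical NIC (name is a placeholder)
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs

# Verify the VFs appeared as PCI devices
lspci | grep -i "virtual function"
```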
Host Tunables for Performance: Kernel Configuration
• Kernel tuning: the tuned "cpu-partitioning" profile will also tune the kernel to
• Remove read-copy-update (RCU) work from isolated CPUs
• Reduce the timer tick on isolated CPUs (when busy) from 1000/second to 1/second
• For best-performing zero-packet-loss operation, also use the "isolcpus" kernel boot parameter
• Disable KSM (kernel same-page merging)
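The host-side tuning above is commonly applied as follows (a sketch; the isolated core list 4-23 is a placeholder that must match the host topology and the cores given to pinned guests):

```shell
# Activate the tuned profile; the isolated cores are listed in
# /etc/tuned/cpu-partitioning-variables.conf (e.g. isolated_cores=4-23)
tuned-adm profile cpu-partitioning

# For zero packet loss, also isolate the cores at boot via GRUB
# (append to GRUB_CMDLINE_LINUX in /etc/default/grub, then rebuild grub.cfg):
#   isolcpus=4-23 nohz_full=4-23 rcu_nocbs=4-23

# Disable kernel same-page merging
echo 0 > /sys/kernel/mm/ksm/run
```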
Future/Roadmap Items
• Configuring the txqueuelen of tap devices with the OVS ML2 plugin:
• https://blueprints.launchpad.net/neutron/+spec/txqueuelen-configuration-on-tap
• Isolating emulator threads onto different cores than the vCPU-pinned cores:
• https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
• SR-IOV trusted VFs:
• https://blueprints.launchpad.net/nova/+spec/sriov-trusted-vfs
• Accelerated devices (GPU/FPGA/QAT) and smart NICs:
• https://blueprints.launchpad.net/horizon/+spec/pci-stats-in-horizon
• https://blueprints.launchpad.net/nova/+spec/pci-extra-info
• SR-IOV NUMA awareness:
• https://blueprints.launchpad.net/nova/+spec/reserve-numa-with-pci
Q & A
More details: https://www.openstack.org/summit/sydney-2017/summit-schedule/events/20538/secrets-for-approaching-bare-metal-performance-with-real-time-virtual-network-functions-in-openstack
Thank You Contact: skarmarkar@sonusnet.com sodey@sonusnet.com