Design of High Performance Virtual Network Functions for 5G and Beyond
Bheemarjuna Reddy Tamma
Dept. of Computer Science & Engineering, IIT Hyderabad
1st International Conference on Software Defined Networking (ICSDN), August 9-10, 2019
Outline
• Importance of NFV and its drawbacks
• Mechanisms to improve performance of VNFs
  • DPDK
  • OpenNetVM
  • Use case: ONVM-5G
• Noisy neighbour problem in Cloud platforms
• Mechanisms to mitigate performance interference due to noisy neighbours
  • Intel RDT (Resource Director Technology)
• Efficient placement and selection of VNFs
• Research directions
• Conclusions
Network Functions Virtualization (NFV)
• Physical NFs: vendor-specific hardware appliances
• Virtual NFs: software instances on shared commodity servers
  • Easy to deploy/manage; agility, scalability, dynamic orchestration
  • Reduction in CapEx & OpEx for service providers
  • AT&T: "55% of network functions have been virtualized"
Fig.: Classic approach vs. NFV approach
Service Function Chaining (SFC)
• ISPs and Telcos offer a diverse set of services to users
• Traffic of each service is required to pass through, and be processed by, an ordered set of NFs or SFs called a Service Function Chain (SFC)
• Each SFC request has specific requirements such as throughput and end-to-end latency
• SDN and NFV provide the flexibility and agility to deploy SFCs as VNFs on VMs/containers on Cloud platforms
Fig. 1: SFCs in SGi-LAN of 4G
ETSI NFV MANO Framework
• NFV MANO (Network Functions Virtualization Management and Orchestration) is an architectural framework for managing and orchestrating network services (NSs) and VNF packages
• Composed of three main components:
  • NFV Orchestrator (NFVO): on-boarding and life-cycle management of network services
  • VNF Manager (VNFM): life-cycle, fault, and performance management of VNFs
  • Virtualized Infrastructure Manager (VIM): manages the NFV Infrastructure and collects performance measurements and events
https://www.etsi.org/technologies/nfv
Traditional Linux (OS-Level) Packet Processing
• NIC uses DMA to copy packet data into a kernel buffer
• An interrupt is raised when packets arrive, and packet data is copied from kernel space to user space
• A system call is used to transmit a packet from user space
• Disadvantage: when millions of packets arrive at the NIC per second, the user program gets interrupted millions of times per second
Fig.: Packet path from NIC (DMA into kernel space) up to the user-space application
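For contrast with the user-space approaches that follow, here is a minimal sketch (not from the talk) of this kernel-mediated receive path using a raw AF_PACKET socket; every recvfrom() below is a system call that copies the packet from a kernel buffer into user space.

```c
/* Minimal kernel-path receive loop (illustration only; needs CAP_NET_RAW). */
#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <arpa/inet.h>        /* htons */
#include <unistd.h>

int main(void) {
    /* Raw socket: the kernel delivers whole Ethernet frames to us */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char buf[2048];
    for (;;) {
        /* Blocks until the kernel's interrupt-driven RX path has a
         * frame, then copies it across the kernel/user boundary. */
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        printf("received %zd-byte frame\n", n);
    }
    close(fd);
    return 0;
}
```

At millions of packets per second, the per-packet interrupt and copy in this path dominate, which motivates the user-space designs below.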
Virtualization Overheads in Cloud Platforms
• Network overhead (packet delivery) is one of the most critical concerns with virtualization technologies
• Packets arriving at the NIC are copied into the Host OS and the Hypervisor
• A virtual switch then determines which VM is the recipient of the packet and notifies the corresponding virtual NIC
• The memory page containing the packet is then either copied or granted to the Guest OS, and finally the data is copied to the user-space application
• These repeated memory copies cause significant overhead and prevent VNFs from achieving wire-speed throughput
User Space Packet Processing
• NIC DMAs packet data directly into user-space buffers
• Instead of interrupts, polling is used to detect when packets arrive
• Instead of system calls, regular function calls are used to send packet data
• Examples: netmap, PF_RING, DPDK
Fig.: Packet path from NIC directly into user space, bypassing the kernel
Data Plane Development Kit (DPDK)
• A framework for fast packet processing in data-plane applications on multicore Intel CPUs
• A poll-mode driver reads packets from the NIC
• Kernel-bypass functionality copies packets directly into user-space memory
• Packets are stored in huge pages to minimize TLB misses
• Many NFV platforms are built on top of DPDK, e.g., OpenNetVM, ClickOS, and BESS (a minimal receive loop is sketched below)
https://blog.selectel.com/introduction-dpdk-architecture-principles/
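As a hedged illustration of the two points above (no interrupts and no system calls on the data path), here is a minimal DPDK poll-mode receive loop; port and queue setup are abbreviated, and constants such as BURST_SIZE are illustrative rather than tuned values.

```c
/* Minimal DPDK poll-mode RX loop (sketch). Assumes port 0 has been
 * configured and started with one RX queue backed by a huge-page
 * mbuf pool (setup omitted for brevity). */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32   /* illustrative burst size */

int main(int argc, char **argv) {
    /* EAL init maps huge pages and probes the NICs */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* ... rte_eth_dev_configure()/rte_eth_rx_queue_setup()/
     *     rte_eth_dev_start() for port 0 elided ... */

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        /* Busy-poll: a plain function call, no interrupt, no syscall.
         * The NIC has already DMAed the frames into huge-page memory;
         * we receive only pointers (mbufs) to them. */
        uint16_t nb = rte_eth_rx_burst(0 /*port*/, 0 /*queue*/,
                                       bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb; i++)
            rte_pktmbuf_free(bufs[i]);   /* process, then release */
    }
    return 0;
}
```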
OpenNetVM: A High Performance NFV Platform
• NF Manager (built on DPDK) runs in user space: it polls for packets, manages the different NFs, and reserves huge pages for packet buffers
• Shared memory: buffers are reserved in huge pages shared by the manager and all NFs for faster packet processing
• The NFlib API is used for communication between an NF and the NF Manager
• NFs run inside Docker containers; each NF has its own ring to rx/tx packet descriptors (a handler sketch follows)
• Zero-copy data transfer to and between NFs
• No interrupts, using DPDK's poll-mode driver
• Scalable: multiple Rx and Tx threads in the manager; NUMA-aware processing
• NFs start in 0.5 sec; throughput of 68 Gbps with 6 cores; base forwarding latency < 10 microseconds
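To make the NFlib model concrete, here is a hedged sketch of an NF packet handler in the style of openNetVM's example NFs; exact signatures and constants vary across openNetVM versions, so treat this as illustrative, not the definitive API. The packet itself never moves: only the small descriptor is updated and passed between rings.

```c
/* Sketch of an openNetVM NF packet handler, modeled on the project's
 * example NFs (signatures differ across versions). */
#include <rte_mbuf.h>
#include "onvm_nflib.h"       /* NFlib: shared-memory rings, manager API */
#include "onvm_pkt_helper.h"

#define DST_SERVICE_ID 2      /* illustrative: service ID of the next NF */

static int
packet_handler(struct rte_mbuf *pkt, struct onvm_pkt_meta *meta) {
    (void)pkt;  /* payload stays in shared huge-page memory: zero-copy */

    /* Write only the descriptor; the manager's Tx threads move it
     * onto the destination NF's ring. */
    meta->destination = DST_SERVICE_ID;
    meta->action = ONVM_NF_ACTION_TONF;   /* hand off to another NF */
    return 0;
}
```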
OpenNetVM: System Architecture http://sdnfv.github.io/onvm/
OpenNetVM: How does it eliminate/hide Overheads? http://sdnfv.github.io/onvm/
ONVM-5G: A Framework for Realization of 5G Core in a Box Using OpenNetVM
5G: Service Based Architecture
• Service-based representation: Service Based Interfaces (SBIs) over an HTTP/2.0 message bus for interactions in the control plane of the 5G Core
• Client-server based architecture
• Each NF registers with a central NF Repository Function (NRF)
• Stateless, cacheable, layered-system communication
Implementation of 5G Core NFs
• We have implemented the control-plane NFs (AMF, AuSF, UDM, and SMF) and the user-plane NF (UPF) using OpenNetVM. These NFs support:
  • UE registration
  • Create Session and Modify Session
  • Uplink/downlink data transfer
  • UE de-registration
• All NFs are given predetermined service IDs; communication between NFs within a physical server happens via these service IDs
• An NF knows the service IDs of the other NFs with which it communicates
• Each exchanged packet carries a type field whose value determines the next destination and next action (a dispatch sketch follows)
• We consider two deployment scenarios:
  • Centralized CP and UP on a single server
  • Distributed CP and UP on two different servers
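The following hypothetical sketch illustrates the type-field dispatch described above; the message layout, type values, and service IDs are all invented for illustration and are not the actual ONVM-5G code.

```c
/* Hypothetical type-field dispatch between 5GC NFs.
 * All names and values here are illustrative placeholders. */
#include <stdint.h>

enum msg_type {                 /* illustrative message types */
    MSG_REGISTRATION_REQ = 1,
    MSG_AUTH_RESP        = 2,
    MSG_SESSION_CREATE   = 3,
};

struct nf_msg {                 /* header carried in every packet */
    uint8_t  type;              /* drives routing and processing  */
    uint32_t ue_id;
    /* ... payload ... */
};

/* Service IDs fixed at deployment time (illustrative values). */
enum { AMF_SID = 1, AUSF_SID = 2, SMF_SID = 3 };

/* Choose the next NF (by service ID) from the type field. */
static int next_service_id(const struct nf_msg *m) {
    switch (m->type) {
    case MSG_REGISTRATION_REQ: return AUSF_SID; /* AMF -> AuSF  */
    case MSG_AUTH_RESP:        return AMF_SID;  /* AuSF -> AMF  */
    case MSG_SESSION_CREATE:   return SMF_SID;  /* AMF -> SMF   */
    default:                   return -1;       /* unknown: drop */
    }
}
```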
Centralized CP and UP Implementation of 5G Core
• All 5GC NFs run on the same server and get the advantage of zero-copy data transfer, which reduces latency
• More suitable for URLLC slices, as it provides minimal latency
• pktgen is used to generate packets at the RAN
Distributed CP and UP Implementation of 5G Core
• CP and UP NFs run on different physical servers
• More suitable for eMBB slices
Experimental Setup
• Server specs: Intel Xeon Gold 6126 (2.60 GHz, 48 cores) with two 1G NICs, running Ubuntu 16.04
• In ONVM-5G, each NF runs inside a Docker container and is pinned to a dedicated CPU core
• Each Docker container runs in privileged mode to ensure it has access to shared memory and the NIC (see the sketch below)
• CP traffic (UE registration, create-session, and de-registration traffic) is generated by a RAN simulator
• We measured the time taken to perform each of the above activities while varying the number of users from 10 to 100
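For illustration, an NF container could be launched roughly as follows; the image name, core number, and huge-page mount are placeholders, not the talk's actual configuration.

```
# Hypothetical launch of one NF container: privileged (for access to
# huge-page shared memory and the NIC) and pinned to a dedicated core.
docker run -d --privileged --cpuset-cpus=4 \
    -v /mnt/huge:/mnt/huge \
    onvm-5g/amf:latest          # image name is a placeholder
```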
Experimental Results
(Plots: latency of UE Registration, Create Session, and UE de-registration vs. number of users)
• Latency is drastically reduced in ONVM-5G
• Packets are copied directly into user space and all NFs run in user space, so the overheads of interrupts and context switching are avoided
• Since all NFs run on the same server, zero-copy packet transfer plays a major role in reducing latency further
Offering Performance Guarantees is Challenging
• To save CapEx/OpEx and reduce communication latency, VNFs are consolidated on a limited number of compute/server nodes
• VNF performance can be affected by other co-located VNFs on the server!
• VNF performance interference causes throughput degradation ranging from 12% to 50% as more VNFs are consolidated on the same server
• Many possible causes of performance degradation:
  • Contention for shared resources such as cache and memory bandwidth
  • OS scheduling methods, I/O bottlenecks, ...
Source: Intel
Noisy Neighbour Problem
• Placing NFs on dedicated cores does not by itself provide performance isolation
• High utilization of shared resources by a few NFs on a node results in high contention, which degrades overall performance
• A single LLC cache miss adds a latency of about 100 ns
• Tools like CAT (Cache Allocation Technology) can be used to mitigate this
Source: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/intel-rdt-infrastructure-paper.pdf
Impact of a Co-located VNF (Motivational Results)
• Significant degradation in response time for Snort when it runs along with other VNFs on the same node
• Mitigating interference from noisy neighbours is of paramount importance for running critical network functions on NFV platforms and meeting SLAs
Venkatarami Reddy, Gaurav Garg, Bheemarjuna Reddy Tamma, and Antony Franklin, "Interference Aware Network Function Selection Algorithm for Next Generation Networks," in Proc. of 3rd Workshop on Performance Issues in Virtualized Environments and Software Defined Networking (PVE-SDN 2019, co-located with IEEE NetSoft), France, June 2019.
Solutions to Mitigate VNF Interference
• Intel Resource Director Technology (RDT), applied at the platform level
  • Monitoring: Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM), and more: passively monitor resource usage to identify QoS and performance bottlenecks
  • Allocation: Cache Allocation Technology (CAT), Code and Data Prioritization (CDP), Memory Bandwidth Allocation (MBA), and more: carefully allocate resources to achieve better QoS and ensure performance guarantees
• Interference-aware VNF placement in Cloud platforms
• Interference-aware VNF selection for steering SFC requests
Intel Resource Director Technology (RDT) Source: https://www.intel.in/content/www/in/en/architecture-and-technology/resource-director-technology.html
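As a concrete illustration of the monitoring and allocation split above, Intel's open-source pqos utility (from the intel-cmt-cat package) drives RDT from the command line; the core numbers and cache-way bitmask below are placeholders chosen for the example.

```
# Monitor LLC occupancy and memory bandwidth on cores 0-5 (CMT/MBM)
pqos -m "all:0-5"

# CAT: restrict class of service (COS) 1 to four LLC ways (illustrative mask)
pqos -e "llc:1=0x000f"

# Associate the noisy neighbour's cores (e.g., 4-5) with COS 1
pqos -a "llc:1=4-5"
```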
Need for Smart VNF Placement within a Cloud Platform
• Fast packet-processing NFV platforms like OpenNetVM run NFs on dedicated CPU cores, treating all CPU cores as equal
• Most NF placement strategies in the literature focus on finding the right server, ignoring the placement of NFs within the server for an SFC
• Randomly selecting a CPU core for an NF can drop the throughput of the Service Function Chain for the following reasons (a core-selection sketch follows):
  • Cross-NUMA-node memory access within the SFC
  • Intra-node resource contention (LLC and memory bandwidth)
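Below is a hypothetical sketch of NUMA- and contention-aware core selection for the next NF of a chain; the data structures, weights, and telemetry sources are invented for illustration.

```c
/* Hypothetical NUMA/contention-aware core selection for the next NF
 * in an SFC. Structures and weights are illustrative only. */

struct core_state {
    int    node;           /* NUMA node the core belongs to        */
    int    busy;           /* already hosts a pinned NF?           */
    double llc_occupancy;  /* e.g., from CMT, in MB                */
    double mem_bw;         /* e.g., from MBM, in GB/s              */
};

/* Lower score = better core. Penalize (1) crossing NUMA nodes
 * relative to the chain's previous NF and (2) shared-resource
 * contention observed on that core. */
static double score(const struct core_state *c, int prev_nf_node) {
    double s = 0.0;
    if (c->node != prev_nf_node)
        s += 100.0;                    /* cross-node memory access  */
    s += 2.0 * c->llc_occupancy;       /* LLC pressure              */
    s += 1.0 * c->mem_bw;              /* memory-bandwidth pressure */
    return s;
}

static int pick_core(const struct core_state *cores, int n, int prev_nf_node) {
    int best = -1;
    double best_s = 1e300;
    for (int i = 0; i < n; i++) {
        if (cores[i].busy)
            continue;                  /* one NF per dedicated core */
        double s = score(&cores[i], prev_nf_node);
        if (s < best_s) { best_s = s; best = i; }
    }
    return best;                       /* -1 if no free core */
}
```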
Research Directions
• Maintaining SLAs while consolidating a variety of network-centric workloads on Cloud platforms
• Scheduling NFs (microservices) running on the same core (balancing fairness while meeting QoS)
• Smart placement and selection of VNFs for SFC provisioning on Cloud platforms
• VNF migration and fault management for ensuring reliability of NFs deployed on Cloud platforms
References
[1] Q. Zhang et al., "Adaptive Interference-Aware VNF Placement for Service-Customized 5G Network Slices," in Proc. of IEEE INFOCOM, 2019.
[2] C. Zhang et al., "L4-L7 Service Function Chaining Solution Architecture," Open Networking Foundation, ONF TS-027, 2015.
[3] Chaobing Zeng et al., "Demystifying the Performance Interference of Co-located Virtual Network Functions," in Proc. of IEEE INFOCOM, pages 765–773, 2018.
[4] Shundan Jiao et al., "Joint Virtual Network Function Selection and Traffic Steering in Telecom Networks," in Proc. of IEEE GLOBECOM, pages 1–7, 2017.
[5] Abdelhamid Alleg et al., "Delay-Aware VNF Placement and Chaining Based on a Flexible Resource Allocation Approach," in Proc. of IEEE CNSM, pages 1–7, 2017.
[6] A. Tootoonchian et al., "ResQ: Enabling SLOs in Network Function Virtualization," in Proc. of 15th USENIX NSDI, 2018.
Email: tbr@iith.ac.in
Homepage: http://www.iith.ac.in/~tbr
Google Scholar Profile: http://goo.gl/JdgRB
NeWSLab: https://newslab.iith.ac.in/
Shared Resource Contention: Last Level Cache (LLC)
• The LLC is shared to make the best use of the platform's resources
• However, certain types of applications can cause noise by consuming more than their share and slowing down others
• Applications that are streaming in nature can cause excessive LLC evictions
• Sometimes sharing is bad... Noisy Neighbour!
Source: https://www.intel.in/content/www/in/en/architecture-and-technology/resource-director-technology.html
An Example of VNF Selection for Steering an SFC Request
• VNF selection can be done through different algorithms that aim to optimize either resource utilization or QoS
• Multiple instances of the same VNF are deployed at different server nodes to achieve reliability and to load-balance traffic from different locations
• Maximizing traffic throughput with SLA guarantees when steering SFC requests is an open problem
Fig.: An example of steering an SFC request