Improving the Performance of the Linux Network Subsystem

King Fahd University of Petroleum and Minerals (KFUPM), Information and Computer Science Department. Dr. K. Salah. April 22, 2007. Dhahran, Saudi Arabia

Agenda • Introduction • Receive-livelock Phenomenon • Existing Schemes • Previous Work: Why a Hybrid Scheme? • Problem Statement • Project Objectives • Equipment • Project Phases and Scheduling • Benefits and Utilization • Budget • Summary

Introduction • High-speed network devices are widely deployed • Gigabit Ethernet technology supports raw bandwidths of 1 Gb/s and 10 Gb/s • The network performance bottleneck has shifted to servers and end hosts • The increase in bandwidth can degrade OS performance because of the interrupt overhead caused by incoming gigabit traffic. • Because interrupt handling has higher priority than other processing, this can lead to the receive-livelock phenomenon

Typical Architecture Model

Packet Arrival Rate - Slow

Figure: host system with applications and protocol stack, receiving network traffic at a slow arrival rate.

Packet Arrival Rate - Fast

Figure: the same host system receiving network traffic at a fast arrival rate, with two points marked X.

Receive-livelock Phenomenon

Figure: throughput vs. offered load, with curves labeled Ideal, Acceptable, and Livelock, and the MLFRR (maximum loss-free receive rate) marked (Source: K. K. Ramakrishnan, 1993).

Existing Schemes • Normal Interruption • Interrupt Disabling and Enabling • Polling • Pure Polling vs. NAPI Polling • Interrupt Coalescing (IC) • Hybrid Scheme

Interrupt Disabling and Enabling • The idea of the pure interrupt disable-enable scheme is to keep interrupts for incoming packets disabled as long as the kernel’s protocol stack has packets to process, i.e., the protocol buffer is not empty. • When the buffer becomes empty, interrupts are re-enabled. • Packets that arrive while interrupts are disabled are DMA’d quietly into the protocol buffer without incurring any interrupt overhead.

Polling • Interrupts for incoming packets are disabled altogether, eliminating interrupt overhead completely. • The OS periodically polls host memory (i.e., the protocol processing buffer or the DMA Rx ring) to find packets to process. • In general, exhaustive polling is rarely implemented. Polling with a quota is the usual case: at most a fixed number of packets is processed in each poll, in order to leave some CPU power for application processing. • Polling has two drawbacks. • First, polls can be unsuccessful because packets are not guaranteed to be present in host memory at all times, so CPU power is wasted. • Second, incoming packets are not processed immediately; they are queued until they are polled. • Selecting the polling period is crucial. • Very frequent polling can be detrimental to performance, as significant overhead is incurred at each poll. • On the other hand, if polling is performed infrequently, packets may encounter long delays.

Pure Polling vs. NAPI Polling

Shortcomings of NAPI • Rotten Packets • When NAPI re-enables interrupts, one or more packets may sneak in during this transition and go undetected until a fresh packet arrives. These packets are known as “rotten packets”. • Poor Performance with CPU-bound Applications • NAPI has been reported to perform poorly on hosts heavily loaded with CPU-bound applications. This is because polling is scheduled using Linux softIRQs: CPU-bound user applications compete with softIRQs for the CPU, so softIRQs (and hence NAPI) get less chance to run.

Interrupt Coalescing • Most network adapters (NICs) support interrupt coalescing. • In IC, the NIC generates a single interrupt for a group of incoming packets. • This contrasts with normal interruption mode, in which the NIC generates an interrupt for every incoming packet. • Two schemes mitigate the rate of interrupts: • Count-based IC • The NIC generates an interrupt when a predefined number of packets has been received. • Time-based IC • The NIC waits a predefined time period before generating an interrupt; multiple packets can be received during this period.

Hybrid Scheme • A combination of • Interrupt Disabling and Enabling & • Polling

Why?

Problem Statement • In this research we intend: • to implement a novel hybrid interrupt-handling scheme that improves the performance of the Linux networking subsystem and overcomes the shortcomings of NAPI; • to prove experimentally that our proposed scheme outperforms NAPI under different system configurations and load conditions.

Project Objectives • Devise a novel scheme for the Linux platform to enhance packet reception on links at Gigabit speeds. • The scheme is expected to outperform NAPI, as currently implemented in the latest Linux 2.6 kernels, in terms of latency, throughput, and CPU availability. • The novel scheme should provide a proper mechanism to measure and forecast the traffic rate. • The novel scheme should also work for hosts with single and multiple interfaces. • More importantly, the scheme should work for SMP (Symmetric Multi-Processing) architectures, where the host’s motherboard has multiple processors.

Project Objectives (cont’d) • Find solutions to the shortcomings and open issues of NAPI (other than latency, throughput, and CPU availability). These shortcomings include rotten packets and poor network performance when the system is heavily loaded with CPU-bound applications. • Devise a novel generic benchmark for Linux hosts to measure and find the switching point (cliff point).

Project Objectives (cont’d) • Develop an experimental testbed to examine and compare the performance of the modified Linux version against the latest Linux NAPI. • The experiments take into account numerous test conditions and variables: • Linux hosts with single and multiple network interfaces • Different types of input traffic (bursty, constant, Poisson) • Different packet sizes • Various types of system load, including CPU-bound and I/O-bound applications • Hosts with single and multiple processors (i.e., SMP) • The experiments should follow the testing and benchmarking guidelines laid out in RFC 2544.

Experimental Equipment

Project Phases and Scheduling • Phase I (six months) • This is primarily a Linux network stack redesign and modification phase. • Phase II (twelve months) • This phase covers the testbed and experimental setup, as well as the performance evaluation of NAPI and our proposed hybrid scheme. • Phase III (six months) • This phase evaluates the performance of our hybrid scheme on hosts with SMP support.

Phase I • Devise an appropriate technique to measure the traffic arrival rate in real time. This task includes the following subtasks: • Perform an extensive review of techniques to measure and forecast the traffic arrival rate. Devise a forecasting technique that meets the following requirements: (1) computationally simple and optimized, with minimal overhead and operations; (2) accurate, i.e., comparable to the actual data rate; (3) stable, i.e., ignoring short traffic spikes; and (4) responsive, i.e., following changes in the actual traffic rate. • Examine the effectiveness of the proposed forecasting technique and compare it with other techniques proposed in the literature. The technique must be appropriate for different types of traffic, including bursty traffic with empirical packet sizes. Discrete Event Simulation (DES) will be used to assess the performance and effectiveness of our proposed technique. • Plot, analyze, and compare the performance of the proposed technique for forecasting the traffic arrival rate. • Determine (using simulation and fine tuning of parameters) the minimum and maximum values (i.e., the confidence interval) of the forecasted traffic rate. These values will serve as the lower and upper thresholds around the cliff point, which the hybrid scheme will use to switch between interrupt disable-enable and polling. They will also prevent frequent oscillation between the two schemes, thereby minimizing the overall overhead.

Phase I – cont’d • Thoroughly understand the Linux kernel and the complex NAPI code. This requires the following subtasks: • Perform an extensive review and study of the Linux 2.6 network stack (NAPI) and the NIC drivers. • Set up a source-code browser such as cscope or kscope to navigate the actual Linux code and understand it thoroughly. • Identify exactly which code needs to be changed in both the Linux kernel and the network driver. • Identify how the code must differ to support single-processor and multiprocessor (SMP) hosts. • Investigate known open issues or shortcomings of NAPI (other than the expected latency at low traffic rates) and critique the solutions proposed in the literature. • These shortcomings include rotten packets and poor network performance under heavy CPU-bound load. • More importantly, investigate how our proposed hybrid scheme will resolve these known open issues.

Phase II • Modify, test, and recompile the Linux 2.6 code to implement our proposed hybrid scheme and the technique to forecast the traffic arrival rate. In addition, the code has to include solutions to rotten packets and to the poor performance of the network stack on a system heavily loaded with CPU-bound applications. • Learn how to use the IXIA 400T traffic generator/analyzer, and configure a simple experiment of generating and receiving packets. • Identify the proper cliff point for the system. This can be accomplished only by determining the interrupt overhead and the protocol processing time, both of which will be determined by measurement. • Using IXIA or some other technique, devise a generic and useful way to measure interrupt overhead, and determine its distribution. • Using IXIA or some other technique, devise a way to measure protocol processing time at the OS level, and determine the distribution of the kernel’s protocol processing time.

Phase II – cont’d • Using the IXIA 400T and a PC running Linux 2.6 with NAPI enabled, measure and plot the following performance metrics: • Packet forwarding latency • Packet forwarding throughput • CPU utilization during packet forwarding • These experiments will consider the following configurations and conditions: • Different packet sizes • Traffic distribution: Poisson vs. bursty • Traffic reception and transfer on a single NIC • Traffic reception and transfer on multiple NICs • Using the IXIA 400T and a PC running our proposed hybrid scheme, perform the same performance measurements as in Task 7 and Task 8. • Plot and compare the performance of NAPI and our proposed hybrid scheme, and draw proper conclusions. • Compare and evaluate the performance of our solutions to the NAPI shortcomings of rotten packets and poor network performance under CPU-bound applications, considering the conditions and configurations of Task 7 and Task 8.

Phase III • Examine the performance impact studied in the previous tasks (Tasks 6-11) under Linux SMP support on a dual-processor motherboard. • Compare SMP performance with the performance when using only a single processor. This is a large phase, as six tasks are to be carried out again. Note that, according to RFC 2544 recommendations, a test has to be repeated at least 20 times to obtain a reported value for a single performance point, and the reported value must be the average of these 20 recorded values. The recommendations and guidelines also state that the test has to run for at least 20 minutes to obtain one single reported value. • Ensure that the novel scheme preserves the order of packets, i.e., that no packet re-ordering is needed. • Prepare and deliver the final report.

Work Plan

Personnel Requirements • The project team will consist of the principal investigator (PI) and two graduate students (PhD or MS degree candidates). • The graduate students will be computer science/engineering graduates and will work under the supervision and guidance of the PI.

Benefits and Utilization • contribute to the advancement of open-source operating systems (such as Linux) by providing an improved version whose networking subsystem suits Gigabit network traffic. • This will lead to better Linux-based routers, firewalls, servers, and proxies. • utilize the previous theoretical work of [24] to devise a new hybrid interrupt-handling scheme that improves the networking performance of Linux or other operating systems. • provide adequate solutions to the NAPI shortcomings of the current Linux 2.6 networking subsystem.

Benefits and Utilization -- cont’d • prove and demonstrate that the proposed hybrid scheme is a significant performance improvement over current versions under many different configurations and load conditions. • provide a computationally optimized algorithm to forecast the traffic arrival rate, with no or minimal impact on Linux performance. • provide a generic methodology and benchmark to identify the switching point. • The research community at large can benefit substantially from the experimental work in terms of methodology, testbed, experimental setup, and configuration. The experimental methodology and techniques can be employed to conduct performance comparisons of similar systems.

Benefits and Utilization -- cont’d • major beneficiaries may include almost all Saudi companies, as well as governmental and non-governmental institutions, that show keen interest in using Linux. • GbE deployment • Wide popularity of Linux • will benefit KFUPM in general and the Department of Information and Computer Science in particular. • It is anticipated that a modified version of Linux that best suits Gigabit traffic will carry the name of KFUPM and the ICS department. • KFUPM can be seen as an active contributor to open-source code and the open-source community. • results of general interest to the research community will be published at key international conferences, such as those of IEEE and ACM. It is also anticipated that this research will lead to publications in reputable refereed journals. • KFUPM currently has no network traffic generators or analyzers. • Such a project can lay the ground for further research and development by making such equipment available for research. • The IT center at the university can also use such equipment for diagnosing and troubleshooting network problems related to performance bottlenecks.

Budget

Summary • In this research we intend to improve the performance of the Linux networking subsystem and overcome the shortcomings of NAPI. • The project will be of great benefit to the research and open-source communities, KFUPM, and the public at large.
