A scalable multithreaded L7-filter design for multi-core servers

A scalable multithreaded L7-filter design for multi-core servers Authors:Danhua Guo、 Guangdeng Liao、Laxmi N. Bhuyan、Bin Liu、Jianxun Jason Ding Conf. :The 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '08) Presenter : JHAO-YAN JIAN Date : 2010/11/10

Introduction • Traditional packet classifications make thedecision based on packet header information. But manyapplications, such asP2P and HTTP, hide their applicationcharacteristics in the payload. • The original L7-filter is a sequential DPI(Deep packet Inspection) program that identifies protocol information in a given connection. • Traditional single core server is insufficient to satisfy DPI functionality. (high speed networks, such as 10 Gigabit Ethernet) • In spite of its enhanced processing power, efficient core utilization in a multi-core architecture remains a challenge.

Introduction • Network traffic in original L7-filter is captured by Netfilter, which consists of a set of hooks inside the Linux kernel that allows kernel modules to traverse the network stack. • Inside the network stack of the kernel, a series of operations are executed to establish a connection buffer based on 5-tuple connection information in the packet header. • Operations : TCP/IP packets checksum verification, TCP/IP reassembling, IP refragmentation, etc . • After such a preprocessing stage. L7-filter starts to match all the application layer data of the arriving packets in the same connection against the protocol database in a sequential fashion.

Decoupling Linux L7-filter operations • Previous research from both academia and industry have demonstrated that the performance of L7-filter is bounded by the cost of pattern matching. • Therefore, the authors have developed a decoupled model to separate the packet arrival handling and focus on optimizing the pattern matching operations at the application layer. • Toparallelize the L7-filter operations based on a user space version.

Modeling Single-Threaded L7-filter • choose libnids as a user space module. • Libnids reads tcpdump trace files and simulates kernel network stack behaviors in user space. • Libnids offers IP defragmentation, TCP stream assembly and TCP port scan detection. • The original online L7-filter is substituted by a combination of a Preprocessing Thread(P T) and a Matching Thread(M T) . • At any point of processing, a connection can only have one of the three statuses: • 1 ) MATCHED or 2) NO_MATCH • 3) NO_MATCH_YET.

Modeling Single-Threaded L7-filter 4 3 2 1 1+2 1

Parallelizing L7-filter at Connection Level • Once more MTs are created, each MT executes on a connection buffer basis. When a new packet is reassembled for a connection, randomly selecting a non-empty runqueue of a thread introduces additional cache over head by copying packets of the same connection to different cores. • In addition, it also wastes the thread resources. • we believe dispatching an independent thread to a dedicated core saves the cost of scheduling overhead and reduces cache misses introduced by live migrations of unbalanced work loads.

Parallelizing L7-filter at Connection Level 3 4 3 1 2

Parallelizing L7-filter at Connection Level

Experiment Platform • This server system has two CPU sockets, each embeds a quad-core Xeon X5355 2.66GHz processors, and 16GB of 667MHz DDR2 SDRAM. Each socket has two 4MB shared L2 caches. • To Use Linux kernel 2.6.18 as default OS.

Throughput and Core Utilization • With 7 concurrent threads, the system throughput increases by 51% compared to the naive OS scheduling. The system scales near linearly ( a speedup of 6.5X when 7 threads are applied.) to the number of MTs.

Cache Performance

A Life-of-Packet Analysis

A scalable multithreaded L7-filter design for multi-core servers

A scalable multithreaded L7-filter design for multi-core servers

Presentation Transcript

Hoard: A Scalable Memory Allocator for Multithreaded Applications

Multi-Core Design Automation Challenges

A Bridging Model for Multi-Core Computing

MT-MPI: Multithreaded MPI for Many-Core Environments

Scalable Multi-core Model Checking Fairness Enhanced Systems

Multithreaded Clustering for Multi-level Hypergraph Partitioning

NetSlices : Scalable Multi-Core Packet Processing in User-Space

Hoard: A Scalable Memory Allocator for Multithreaded Applications

Scalable Multimedia Servers

A Profiler for a Multi-Core Multi-FPGA System

Scalable Multi-core Sonar Beamforming with Computational Process Networks

Architecture Design of a Scalable Single-Chip Multi-Processor

Hoard: A Scalable Memory Allocator for Multithreaded Applications

A Comparison of Load Balancing Techniques for Scalable Web Servers

Hoard: A Scalable Memory Allocator for Multithreaded Applications

Multi-core/Cell Game Engine Design

Hoard: A Scalable Memory Allocator for Multithreaded Applications

Hoard: A Scalable Memory Allocator for Multithreaded Applications

Scalable Synchronization Algorithms in Multi-core Processors

Scalable Memory Management for Multithreaded Applications

Scalable Data Servers for Large Multivariate Volume Visualization

Filter Design