A Resource Efficient Content Inspection System for Next Generation Smart NICs
Karthikeyan Sabhanatarajan, Ann Gordon-Ross*
The Energy Efficient Internet Project
High-performance Computing & Simulation Research Lab
ECE Department, University of Florida, Gainesville
This work was supported by the U.S. National Science Foundation.
* Also affiliated with the NSF Center for High Performance Reconfigurable Computing
Introduction
• The Internet has grown at an alarming rate: 305% between 2000 and 2008.
Introduction
• Edge devices are left idle 75% of the time, with power management features disabled to maintain network connectivity.
Introduction
• A solution to save power on idle devices is power proxying.
• The idle PC is allowed to sleep.
• The PC delegates responsibility for handling network traffic to the NIC.
• Additionally, NICs can enhance network security through network intrusion detection.
Introduction
• Next-generation interfaces, also known as Smart NICs (SNICs), are expected to take on increased network responsibility.
• Key requirement: packet inspection
  • Packet header inspection
  • Content inspection
• This presentation focuses on content inspection.
• Content inspection is the process of searching the payload of a packet for occurrences of a known set of patterns called signatures.
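As a point of reference, content inspection can be sketched in software as a brute-force scan of the payload for every signature. This only illustrates the definition above; it is not the TCAM-based method of this presentation. The function name `inspect_payload` and the sample payload and signatures are hypothetical.

```python
def inspect_payload(payload, signatures):
    """Return (offset, signature) for every signature occurrence in payload.

    Brute-force reference scan: try every signature at every byte offset.
    """
    hits = []
    for offset in range(len(payload)):
        for signature in signatures:
            # Skip empty signatures, which would trivially match everywhere.
            if signature and payload.startswith(signature, offset):
                hits.append((offset, signature))
    return hits


# Hypothetical payload and signature set, for illustration only.
print(inspect_payload(b"xxABCDEFGHyy", [b"ABCD", b"EFGH", b"ZZZZ"]))
```

The cost of this scan grows with both payload length and signature count, which is the scaling problem the hardware approaches in the following slides try to avoid.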
Motivation
Existing methodologies
• Software: Boyer-Moore, Aho-Corasick, Wu-Manber
• Hardware: FPGAs, TCAMs, Bloom filters
• Software techniques cannot support high-speed links with large signature sets.
• Auxiliary data structures, such as SRAM tables, are used to store pattern combinations that help determine a pattern match.
• FPGAs: exploit parallelism, but price, area, and power are prohibitive for wide-scale deployment.
• TCAMs: a popular option with O(1) lookup performance, but existing implementations have prohibitive energy, price, and auxiliary data structure requirements.
• Bloom filters: energy efficient with moderate throughput, but false positives require further inspection of the payload, and parallelism limits restrict scalability.
Background – TCAM Methodology
• Sample signature: ABCDEFGHABCDJKLMEFG
• When w = 4, the signature is split into w-byte blocks: ABCD | EFGH | ABCD | JKLM | EFG*. The first block is the prefix pattern and the remaining blocks are suffix patterns; * marks a don't-care byte padding the last block.
• TCAM contents (w = 4): ABCD, EFGH, JKLM, EFG*
• TCAMs are attractive candidates for pattern matching due to their inherent simplicity, short lookup time, high throughput, high density, and scalability.
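The splitting step on this slide can be sketched as follows. This is a minimal illustration, assuming signatures are plain strings that do not themselves contain the `*` character; the function names `split_signature` and `single_tcam_entries` are hypothetical and not taken from the presentation.

```python
def split_signature(signature, w):
    """Split a signature into w-byte blocks, padding the last one with '*'."""
    if not signature:
        raise ValueError("signature must not be empty")
    blocks = [signature[i:i + w] for i in range(0, len(signature), w)]
    blocks[-1] = blocks[-1].ljust(w, "*")  # '*' stands for a don't-care byte
    return blocks


def single_tcam_entries(signatures, w):
    """Collect the distinct blocks that a single (non-partitioned) TCAM stores."""
    entries = []
    for signature in signatures:
        for block in split_signature(signature, w):
            if block not in entries:  # identical blocks share one TCAM entry
                entries.append(block)
    return entries


blocks = split_signature("ABCDEFGHABCDJKLMEFG", 4)
print("prefix pattern:", blocks[0])
print("suffix patterns:", blocks[1:])
print("TCAM entries:", single_tcam_entries(["ABCDEFGHABCDJKLMEFG"], 4))
```

Because the block ABCD occurs twice in the sample signature, the single TCAM stores it only once.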
Background – TCAM Methodology
• Proposed by Lakshman et al.
Figure: the payload ABCDEFGHJKLMEFGUI is shifted past a TCAM (w = 4) holding ABCD, EFGH, JKLM, and EFG*, backed by auxiliary SRAM structures.
• Auxiliary SRAM structures: Combined Pattern Table, Matched Index, Matching Table, Partial Hit List.
• Between them, these structures store the type of each matched pattern (prefix or suffix), record the index of the constructed prefix pattern, and store the valid combinations of all possible prefix and suffix entries.
• The auxiliary SRAM structures contain several pattern permutations to identify valid patterns, which requires O(N^2) space.
• Gao et al. reduced this requirement to O(N log N) by storing address permutations.
Proposed Solution
TCAM techniques:
• Are the simplest and fastest technique, with O(1) lookup.
• Can match future link speeds of 10 Gbps.
• Are highly scalable, with no parallelism limits.
• Can accommodate signatures of varying length and different signature set sizes with ease.
However, they suffer from:
• Increased energy consumption
• Prohibitive price
• Increased auxiliary data structure requirements
These drawbacks make them unsuitable for wide-scale deployment in SNICs.
Proposed Solution
We propose a hybrid TCAM-based solution. Our technique:
• Improves energy efficiency through a partitioned architecture.
• Further reduces power consumption through caching that exploits network locality.
• Reduces the auxiliary data structure requirement using a Bloom filter or software techniques.
• Meets the throughput requirements of high-speed links such as 1 Gbps and 10 Gbps with ease.
• Is more suitable for wide-scale deployment due to high energy efficiency and reduced memory requirements.
Hybrid TCAM Methodology
• Partition the single TCAM into a prefix TCAM (PTCAM) and a suffix TCAM (STCAM).
• Store signatures in the PTCAM and STCAM accordingly.
• The signature is then expressed as a permutation of PTCAM and STCAM addresses.
• This permutation is then stored in a Bloom filter or in software.
Figure: with w = 4, the signature ABCDEFGHABCDJKLMEFG is stored as ABCD in the PTCAM and as EFGH, ABCD, JKLM, EFG* in the STCAM, and is expressed as P0 S0 S1 S2 S3.
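The partitioning described on this slide can be sketched as follows. This is a minimal illustration under the assumption that the first w-byte block of each signature goes to the PTCAM and every later block goes to the STCAM; the function names and the "P0"/"S0" address labels follow the slide's notation but are otherwise hypothetical.

```python
def split_signature(signature, w):
    """Split a signature into w-byte blocks, padding the last one with '*'."""
    blocks = [signature[i:i + w] for i in range(0, len(signature), w)]
    blocks[-1] = blocks[-1].ljust(w, "*")  # '*' stands for a don't-care byte
    return blocks


def partition_signatures(signatures, w):
    """Build PTCAM/STCAM contents and express each signature as addresses."""
    ptcam, stcam, encoded = [], [], []
    for signature in signatures:
        blocks = split_signature(signature, w)
        if blocks[0] not in ptcam:      # first block is the prefix pattern
            ptcam.append(blocks[0])
        addresses = ["P%d" % ptcam.index(blocks[0])]
        for block in blocks[1:]:        # remaining blocks are suffix patterns
            if block not in stcam:
                stcam.append(block)
            addresses.append("S%d" % stcam.index(block))
        encoded.append(addresses)
    return ptcam, stcam, encoded


ptcam, stcam, encoded = partition_signatures(["ABCDEFGHABCDJKLMEFG"], 4)
print("PTCAM:", ptcam)
print("STCAM:", stcam)
print("signature as addresses:", " ".join(encoded[0]))
```

Note that ABCD ends up in both TCAMs, which is why partitioning gives up some of the sharing a single TCAM gets for free.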
Exploiting Signature Locality
• Our experiments indicate that there is sufficient locality in network traces.
• To reduce unwanted switching, we exploit this property and introduce a cache between the PTCAM and STCAM.
Hybrid TCAM Methodology
Figure: the PTCAM (ABCD) and the STCAM (EFGH, ABCD, JKLM, EFG*), each with w = 4, with a suffix cache ($) and cache controller ($ Ctrl) placed between them.
Hybrid TCAM Methodology
Figure: the payload ABCDEFGHJKLMEFGUI is left-shifted past the PTCAM and STCAM (w = 4); hit, miss, enable, and pause signals connect the TCAMs, the suffix cache ($), the cache controller ($ Ctrl), the activator, the enabler, and a right-shifting enable buffer (1 0 0 0, positions 0 to w-1).
• The payload is fed to the inspection system and shifted at a rate of 1 byte/clock.
• The cache is activated (w-1) clock cycles after a TCAM hit.
• A cache miss pauses shifting to allow the suffix TCAM to be searched for the pattern.
• The cache controller ($ Ctrl) updates the suffix cache.
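The flow on this slide (prefix hit, then suffix cache lookup, then STCAM search on a miss) can be approximated with a small behavioral sketch. Several things here are assumptions rather than details from the presentation: the (w-1)-cycle activation delay is modeled only by looking up the block that lines up w bytes after the prefix hit, a cache miss is charged a single pause cycle, and the cache holds (STCAM entry, address) pairs with random replacement. All function names are hypothetical.

```python
import random


def matches(entry, window):
    """True if a TCAM entry matches the window; '*' is a don't-care byte."""
    return len(entry) == len(window) and all(
        e == "*" or e == c for e, c in zip(entry, window))


def tcam_search(entries, window):
    """Return the address of the first matching entry, or None."""
    for address, entry in enumerate(entries):
        if matches(entry, window):
            return address
    return None


def inspect(payload, ptcam, stcam, w, cache_size, rng):
    """Shift the payload one byte per clock and follow suffix blocks on a hit."""
    cache = []  # (stcam_entry, stcam_address) pairs
    stats = {"cycles": 0, "cache_hits": 0, "stcam_accesses": 0}
    chains = []
    for pos in range(len(payload) - w + 1):
        stats["cycles"] += 1  # one shift per clock
        p_addr = tcam_search(ptcam, payload[pos:pos + w])
        if p_addr is None:
            continue
        chain = ["P%d" % p_addr]
        nxt = pos + w  # the next block lines up w bytes after the prefix
        while nxt + w <= len(payload):
            window = payload[nxt:nxt + w]
            s_addr = next((a for e, a in cache if matches(e, window)), None)
            if s_addr is not None:
                stats["cache_hits"] += 1
            else:
                stats["cycles"] += 1  # assumed one-cycle pause on a miss
                stats["stcam_accesses"] += 1
                s_addr = tcam_search(stcam, window)
                if s_addr is None:
                    break  # no suffix pattern here; stop following this chain
                entry = (stcam[s_addr], s_addr)
                if len(cache) < cache_size:
                    cache.append(entry)
                elif cache_size > 0:
                    cache[rng.randrange(cache_size)] = entry  # random victim
            chain.append("S%d" % s_addr)
            nxt += w
        chains.append((pos, chain))
    return chains, stats


chains, stats = inspect("ABCDEFGHJKLMEFGUI", ["ABCD"],
                        ["EFGH", "ABCD", "JKLM", "EFG*"], 4, 4,
                        random.Random(0))
print(chains, stats)
```

The address chains produced here are only candidates; whether a chain corresponds to a valid signature still has to be verified separately.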
Hybrid TCAM Methodology
Figure: the payload ABCDEFGHJKLMEFGUI is left-shifted past the PTCAM and STCAM (w = 4), with the suffix cache ($), cache controller ($ Ctrl), activator, enabler, and enable buffer as before.
• The matched PTCAM and STCAM addresses (e.g., P1, S1) are shifted out to the Bloom filter or software unit, which verifies the combination.
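One way the verification step could look in software is a small Bloom filter keyed on the address combination. This is a generic Bloom filter sketch, not the architecture evaluated in this presentation (the Bloom filter variant is listed as future work); the class name, the bit-array size, the number of hashes, and the use of SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, item)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


valid_combinations = BloomFilter()
valid_combinations.add("P0 S0 S1 S2 S3")  # address permutation of a signature
print("P0 S0 S1 S2 S3" in valid_combinations)
```

A stored combination is always reported as present; a combination that was never stored is usually rejected but can occasionally be accepted, which is the false-positive behavior noted on the Motivation slide.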
Hybrid TCAM Methodology
Figure: the same datapath with match-address outputs from the PTCAM and STCAM feeding the left-shifted address stream (P1, S1, ...).
Contention Resolution
• A contention resolution unit handles contention between identical PTCAM and STCAM patterns.
• Preference is given to a PTCAM match over an STCAM match.
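The preference rule can be written down in a few lines. This is only a sketch of the stated priority (PTCAM over STCAM); the function name `resolve_contention` and the use of `None` for "no match" are hypothetical.

```python
def resolve_contention(ptcam_address, stcam_address):
    """Pick the address to report when both TCAMs may match the same window.

    Each argument is a match address, or None if that TCAM did not match.
    """
    if ptcam_address is not None:
        return "P%d" % ptcam_address  # a PTCAM match always takes priority
    if stcam_address is not None:
        return "S%d" % stcam_address
    return None


# ABCD is stored in both TCAMs in the running example, so both can hit.
print(resolve_contention(0, 1))
```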
Experimental Setup
• Packet traces: malicious traces from MIT-LL and from the Capture the Flag contest at DEFCON.
• No power proxying traces are available; this remains ongoing research.
• A custom C-based simulator behaviorally simulates the entire system.
• SNORT and ClamAV are used as signature sets.
• Packets are reassembled and fed to the simulator.
• STCAM accesses are saved to analyze the effect of caching.
• TCAM energy consumption is obtained from the TCAM modeling tool of Agarwal et al.
Results – Signature Distribution
ClamAV and SNORT rule sets:
• SNORT: smaller patterns (70% <= 4 bytes)
• ClamAV: medium-sized patterns (72% <30 bytes & >100 bytes)
Results – Effect of Partitioning on Size
• Partitioning circumvents natural TCAM compression.
• However, the increase in TCAM size is negligible.
Results – EDP Reduction
• Partitioning reduces the energy-delay product (EDP).
• Two smaller TCAMs are faster than one single large TCAM.
• EDP savings are higher for widths of 8 and 16 bytes.
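For readers unfamiliar with the metric, EDP is the product of the energy per access and the access delay, so a design that lowers both is rewarded twice. The sketch below shows the arithmetic only; the energy and delay values are made-up placeholders, not measurements from this work, and the function names are hypothetical.

```python
def edp(energy_nj, delay_ns):
    """Energy-delay product of one access (lower is better)."""
    return energy_nj * delay_ns


def reduction_percent(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Placeholder numbers for illustration only (not results from the slides).
single_tcam = edp(energy_nj=10.0, delay_ns=2.0)
partitioned = edp(energy_nj=5.0, delay_ns=1.0)
print("EDP reduction: %.1f%%" % reduction_percent(single_tcam, partitioned))
```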
Results – Energy Savings
• Energy reduction for a partitioned system compared to a non-partitioned system versus TCAM width for real-time traffic traces.
• Energy savings range from 6% to 69% (SNORT) and 6% to 87% (ClamAV).
• Smaller TCAM widths give greater energy savings.
• Larger TCAM accesses use more "don't care" bits.
Results – Effect of Caching: Hit Rate
• Caching is analyzed for an STCAM width of 4 bytes.
• Hit rates range from 28% to 88% for cache sizes of only 40 to 60 entries.
• A cache containing 40 to 60 entries represents only 0.002% to 0.004%, respectively, of the STCAM entries.
Results – Effect of Caching: Energy Savings
• Energy savings for a partitioned TCAM system (w = 4) with a suffix cache compared to a partitioned TCAM system with no suffix cache, for a varying number of cache entries.
• 13% to 64% additional savings.
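The caching results rest on replaying saved STCAM accesses through a small cache. A minimal sketch of such a replay, assuming a fully associative cache with the random replacement policy named in the conclusion, is shown below; the function name `replay_hit_rate`, the trace format (a list of STCAM addresses), and the sample trace are hypothetical.

```python
import random


def replay_hit_rate(trace, cache_size, seed=0):
    """Replay a trace of STCAM addresses and return the cache hit rate."""
    rng = random.Random(seed)  # seeded so that a replay is repeatable
    cache = []
    hits = 0
    for address in trace:
        if address in cache:
            hits += 1  # served from the cache; the STCAM stays idle
        elif cache_size > 0:
            if len(cache) < cache_size:
                cache.append(address)
            else:
                cache[rng.randrange(cache_size)] = address  # random victim
    return hits / len(trace) if trace else 0.0


# Hypothetical trace with strong locality, for illustration only.
trace = [3, 3, 7, 3, 7, 7, 3, 12, 3, 7]
for size in (1, 2, 4):
    print(size, replay_hit_rate(trace, size))
```

Every hit in this replay corresponds to an STCAM access that would be avoided, which is where the additional energy savings come from.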
Conclusion
• Developed an energy-efficient, partitioned TCAM-based content inspection system for SNICs.
• The system is energy and throughput aware.
• Energy-delay product improvements of up to 62% compared to previous non-partitioned TCAM systems.
• Up to 87% energy savings (average) compared to a non-partitioned TCAM system.
• A simple cache with a random replacement policy further reduces the energy consumption by up to 64% compared to a partitioned TCAM system without a cache.
• Caching incurs a throughput reduction of 5.5%.
Future Work
• Evaluating the proposed Bloom filter based architecture
• Improved caching techniques
• Attack robustness to counter maliciously engineered packets
• A pipelined architecture to hide cache misses and improve throughput