Intrusion Detection Processor with Packet Content Matching JC Ho ECE 594
Topics • Background • Algorithm and Data Structure • Memory Architecture • Processor Design
String Matching Algorithms • Boyer-Moore • Good for single-pattern • Wu-Manber • Best average-case performance • Aho-Corasick • O(n) worst-case performance
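As a point of reference for the Aho-Corasick bullet, the following is a minimal software sketch of the textbook algorithm using plain dictionaries, not the compressed hardware data structure described later in the slides. The function names `build_automaton` and `search` are illustrative assumptions.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of byte patterns."""
    goto = [{}]      # goto[state][byte] -> next state
    fail = [0]       # failure link per state (root is state 0)
    out = [set()]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(pat)
    # Breadth-first pass: failure links of shallower states are final
    # before they are used to compute deeper ones.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(data, automaton):
    """Scan data once and return (start_offset, pattern) for every hit."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Each input byte is consumed exactly once, and failure transitions are bounded by the depth gained on earlier bytes, which is where the linear worst-case bound comes from.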
Data Structure for Aho-Corasick • Unoptimized • 1028 bytes per node, 53 MB • Bitmap Compression • 41 bytes per node, 2.8 MB • Path Compression • 20 bytes per node (average), 1.1 MB • Data structure size is reduced without rules
Adaptation • Aho-Corasick with Bitmap Compression • Separation of signature and rules database in different storage units • Smaller next node, failure, and rules pointers • 24 bits each • Result • 41 bytes per node • Same performance • Node layout: 32-byte bitmap, next node pointer, failure pointer, rules pointer
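The 41-byte figure follows from a 32-byte bitmap plus three 24-bit (3-byte) pointers. A minimal sketch of packing and unpacking such a node is shown here; the field order matches the order listed on the slide, but the little-endian byte order and the helper names `pack_node` and `unpack_node` are assumptions.

```python
BITMAP_BYTES = 32          # one bit per possible byte value (256 bits)
PTR_BYTES = 3              # 24-bit pointers
NODE_BYTES = BITMAP_BYTES + 3 * PTR_BYTES

def pack_node(bitmap, next_ptr, fail_ptr, rules_ptr):
    """Serialize one node; byte order is an assumption, not from the slides."""
    if len(bitmap) != BITMAP_BYTES:
        raise ValueError("bitmap must be 32 bytes")
    fields = bytes(bitmap)
    for ptr in (next_ptr, fail_ptr, rules_ptr):
        if not 0 <= ptr < (1 << 24):
            raise ValueError("pointer does not fit in 24 bits")
        fields += ptr.to_bytes(PTR_BYTES, "little")
    return fields

def unpack_node(raw):
    """Inverse of pack_node: returns (bitmap, next_ptr, fail_ptr, rules_ptr)."""
    bitmap = raw[:BITMAP_BYTES]
    ptrs = [
        int.from_bytes(raw[off:off + PTR_BYTES], "little")
        for off in range(BITMAP_BYTES, NODE_BYTES, PTR_BYTES)
    ]
    return (bitmap, *ptrs)
```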
Considerations • Complete or partial match • [Figure: packets containing a complete signature, a partial signature, a partial signature, and no match]
Considerations—Cont. • Case 1: No match • Failure pointers eventually go to the root • Tag as safe
Considerations—Cont. • Case 2: Complete signature • Easy to handle • Start from the beginning of packet • Failure pointers go back to the root • Mark root node visited • Beginning of signature eventually goes down the right path • Traverse entire path and tag as full match
Considerations—Cont. • Case 3: Partial signature • Similar to Case 2 • Beginning of the signature eventually goes down the right path • Mark root node visited • When end of packet is reached, tag as partial match
Considerations—Cont. • Case 4: Partial signature • Very different from cases 2 and 3 • Needs to start from the middle of the data structure • Needs to find the first instance of the first byte in the data structure • Traverse the path of the signature to reach the leaf, mark as partial match since root is not visited
Considerations—Cont. • Result • Case 4 can be the general case • Cases 1, 2, and 3 are special situations of case 4 • Start from the middle of the data structure every time for each packet • Cases 1, 2, and 3 will eventually be redirected back to the root and will operate as if they started from the root
Memory Architecture • Guarantee worst-case performance • On-chip storage for data structure • Similar to cache design • Wide word reference • For ASIC design, memory reference can use node addressable scheme to reduce pointer size further
Memory Architecture—Cont. • Node size = Line width • 64 bytes in theory • 41 bytes in reality • Remaining bytes are not constructed • [Figure: line layout marked at bytes 0, 40, and 63; Address 23:6]
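With one node per 64-byte line, the low 6 address bits are always zero, so a line can be selected from the upper bits alone. The sketch below reads the figure label "Address 23:6" as meaning that bits 23 down to 6 of a 24-bit byte address select the line; that reading, and the helper names, are assumptions.

```python
LINE_BYTES = 64
OFFSET_BITS = 6                      # log2(64)
INDEX_BITS = 24 - OFFSET_BITS        # address bits 23:6 (assumed reading)

def line_index(byte_address):
    """Drop the 6 offset bits and keep the 18 line-select bits."""
    return (byte_address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def node_to_byte_address(node_index):
    """Node-addressable scheme: a pointer stores the node index only."""
    return node_index << OFFSET_BITS
```

Storing `node_index` rather than the full byte address is one way a node-addressable scheme could shrink each pointer by 6 bits.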
Processor Design • Preprocessing • Load Data • Effective Memory Address Resolution • Address Check • Signature Storage Unit Access • Bitmap Processing • Next Node Address Calculation • Data Check • Match Check • Next Round Preparation • Post-processing
Processor Design—Preprocessing • Multiple packets are buffered • Contents are loaded to queues on-chip • Each byte of the content is accessed sequentially • Head and tail pointers required for enqueue and dequeue • Start and end pointers required to indicate start and end of packet
Processor Design—Preprocessing Cont. • Packets are assumed to be independent • Data from the same packet always occupies the same queue • Number of queues is proportional to number of stages in data path • Size of queues can be inversely proportional to number of queues
Processor Design—Core • Load data • A counter determines from which queue data is loaded • 1 byte is loaded from a different queue each cycle • No data dependency in the data path • Counter value is passed to the pipeline register along with data byte to keep track of queue
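The queue and counter behavior described in the preprocessing and load-data slides can be sketched in software as follows. Start and end pointers are modeled as per-byte flags, and the class and function names are hypothetical; this is an illustration of the round-robin idea, not the hardware design.

```python
from collections import deque

class PacketQueue:
    """One on-chip queue; each entry carries start/end-of-packet markers."""
    def __init__(self):
        self.items = deque()

    def enqueue_packet(self, payload):
        last = len(payload) - 1
        for i, b in enumerate(payload):
            self.items.append((b, i == 0, i == last))

    def dequeue(self):
        return self.items.popleft() if self.items else None

def round_robin_load(queues, cycles):
    """Each cycle, load one byte from the queue selected by the counter."""
    issued = []
    counter = 0
    for _ in range(cycles):
        entry = queues[counter].dequeue()
        if entry is not None:
            byte, is_start, is_end = entry
            # The counter value travels with the byte to identify its queue.
            issued.append((counter, byte, is_start, is_end))
        counter = (counter + 1) % len(queues)
    return issued
```

Because consecutive cycles draw from different queues, and therefore from independent packets, the next-node address for a given packet is only needed again after a full rotation through the queues.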
Processor Design—Core Cont. • Effective memory address resolution • Check start pointer to determine whether this is the starting byte of a packet • Starting byte of a packet • Use byte to index into a table to find the address of the first instance of this byte in data structure • Reset all flags associated with this queue • Not the starting byte • Use the next node address computed from previous byte
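A minimal sketch of the address-resolution step is shown below. The lookup table is modeled as a dictionary from byte value to node address, and falling back to the root when a starting byte does not appear in the data structure is an assumption; the slides do not say what happens in that case.

```python
ROOT = 0  # assumed address of the root node

def effective_address(byte, is_start, first_instance, prev_next_node, flags):
    """Pick the node address to fetch for this byte.

    first_instance: table mapping a byte value to the address of its first
                    occurrence in the data structure (hypothetical model).
    flags:          per-queue flag dictionary, cleared at packet start.
    """
    if is_start:
        flags.clear()                          # reset all flags for this queue
        return first_instance.get(byte, ROOT)  # fallback to root is assumed
    return prev_next_node                      # address computed from previous byte
```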
Processor Design—Core Cont. • Address check • Determine if effective address is root node • Set root flag (RF)
Processor Design—Core Cont. • Signature storage unit access • Bitmap loaded into 8 bitmap registers (BMR0-7), each 32 bits • Next node pointer loaded to next node register (NNR), 24 bits • Failure pointer loaded to failure register (FR) • Rules pointer loaded to rules register (RR)
Processor Design—Core Cont. • Bitmap processing • 8 independent popcount units to count the 1’s in BMR0-7 • Bits 0-4 of current data byte are used to load a bit from each BMR • Bits 5-7 of current data byte are used to select the proper bit and load value of this BMR to PCR • Check if bit is 1 and set BMF (flag) value
Processor Design—Core Cont. • Next node address calculation • If (BMF = 0) next node address = FR • If (BMF = 1) • Perform popcount on PCR to the proper bit (based on bits 0-4 of current byte) • Sum all popcount values up to proper bit • Next node address = (this sum * node size) + NNR • Use saturated add • Value is stored back to NNR
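The bitmap processing and next node address steps can be sketched together. Several details here are assumptions rather than statements from the slides: bit 0 is taken as the least significant bit of each register, the popcount excludes the selected bit itself (so the first child sits at NNR), the node size defaults to the 64-byte line width, and saturation clamps to the 24-bit pointer range.

```python
NODE_SIZE = 64               # assumed: one node per 64-byte line
PTR_MAX = (1 << 24) - 1      # largest 24-bit pointer value

def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def next_node_address(bmr, nnr, fr, byte, node_size=NODE_SIZE):
    """Return (next node address, BMF) for one data byte.

    bmr: list of eight 32-bit bitmap registers BMR0-7
    nnr: next node register (address of this node's first child)
    fr:  failure register
    """
    pos = byte & 0x1F            # bits 0-4: bit position inside a register
    sel = (byte >> 5) & 0x7      # bits 5-7: which register feeds PCR
    pcr = bmr[sel]
    bmf = (pcr >> pos) & 1
    if not bmf:
        return fr, 0             # no child for this byte: follow failure pointer
    # Children are stored contiguously, so the child's slot is the number of
    # set bits below the selected one across the whole 256-bit bitmap.
    below = sum(popcount(r) for r in bmr[:sel])
    below += popcount(pcr & ((1 << pos) - 1))
    return min(nnr + below * node_size, PTR_MAX), 1   # saturated add
```

For example, with children for byte values 0x05, 0x41, and 0x61, the byte 0x41 has one set bit below it, so its child is one node past NNR.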
Processor Design—Core Cont. • Data check • Check end pointer to determine if current byte is end of packet • Set end flag (EF) • Check NNR value to determine if leaf node is reached • Set match flag (MF)
Processor Design—Core Cont. • Match check • Case 2: If (RF = 1) and (MF = 1) • Set complete match flag (CMF) • Case 4: If (RF = 0) and (MF = 1) • Set partial match flag (PMF) • Case 3: If (EF = 1) and (current node != root node) and (NNR != FR) • Set PMF
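The three conditions above translate directly into flag logic. This sketch evaluates each condition independently, exactly as listed; whether the hardware gives a complete match priority when more than one condition holds is not stated in the slides, so no priority is assumed. The function name is hypothetical.

```python
def match_check(rf, mf, ef, at_root, nnr, fr):
    """Return (CMF, PMF) from the root, match, and end flags.

    rf: root flag, mf: match flag, ef: end-of-packet flag
    at_root: True if the current node is the root node
    nnr, fr: next node and failure register values
    """
    cmf = bool(rf and mf)                           # Case 2: complete match
    pmf_case4 = bool((not rf) and mf)               # Case 4: leaf reached, root never visited
    pmf_case3 = bool(ef and (not at_root) and nnr != fr)  # Case 3: packet ended mid-signature
    return cmf, (pmf_case4 or pmf_case3)
```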
Processor Design—Core Cont. • Next round preparation • Route NNR value back to load data stage • If (CMF = 1) • Set flush flag (FF) to signal to preprocessing unit to load new packet to this queue • If ignore flag (IF) is set • Ignore processing result • Reset CMF, PMF, EF
Processor Design—Post-processing • If (CMF = 1) or (PMF = 1) • Use RR value to access rules database • Perform actions according to rule • If (EF = 1) and (CMF = 0) and (PMF = 0) • Release packet to router • If (FF = 1) • Set IF to invalidate subsequent data from this queue • Reset FF
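The post-processing decisions can be summarized as a small function that returns the actions to take. The action names are hypothetical labels for illustration; the slides do not define an interface to the rules database or the router.

```python
def post_process(cmf, pmf, ef, ff, rr):
    """Map the flags to a list of (action, argument) pairs."""
    actions = []
    if cmf or pmf:
        # A complete or partial match: consult the rules database via RR.
        actions.append(("lookup_rule", rr))
    if ef and not cmf and not pmf:
        # End of packet with no match of either kind: packet is safe.
        actions.append(("release_packet", None))
    if ff:
        # Flush requested: ignore the rest of this queue's data, then reset FF.
        actions.append(("set_ignore_flag", None))
        actions.append(("reset_flush_flag", None))
    return actions
```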
Preliminary Results • 2MB signature storage unit • 3.6 ns access time using CACTI • Assume storage unit access is critical path • Translate to 250 MHz conservatively • Support up to 2Gbps
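The throughput figure can be checked with a short calculation, assuming (as in the load-data stage) that one byte is processed per clock cycle.

```python
access_time_ns = 3.6                       # CACTI estimate from the slide
max_clock_mhz = 1e3 / access_time_ns       # about 277.8 MHz if access is the critical path
clock_mhz = 250                            # conservative clock chosen on the slide

bytes_per_cycle = 1                        # assumption: one byte per cycle
throughput_gbps = clock_mhz * 1e6 * bytes_per_cycle * 8 / 1e9

print(f"max clock: {max_clock_mhz:.1f} MHz, throughput: {throughput_gbps:.1f} Gbps")
```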
Conclusion • Algorithm is optimized for hardware implementation • Memory requirements can be met by current technology • Implementation is feasible