Exact pattern matching on resource-limited network devices Chien-Chung Su 2002/12/10
Outline • Problem definition • Resource-limited network devices • Introduction of SEBMH • Disadvantages of SEBMH • Adaptive bucket management • Conclusion
Problem definition • Given • P : pattern(s) • T : text • General action • Find all occurrences of P in T
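As a baseline for the problem statement, a naive matcher reports every starting offset of each pattern in the text. The sketch below is illustrative only. The function name `find_all` is hypothetical, and patterns and text are assumed to be Python `bytes`. It is not SEBMH, which exists to avoid this per-pattern rescan of the text.

```python
def find_all(patterns: list[bytes], text: bytes) -> dict[bytes, list[int]]:
    """Naive exact matching: map each pattern to every offset where it occurs in text."""
    hits: dict[bytes, list[int]] = {}
    for p in patterns:
        positions = []
        start = text.find(p)
        while start != -1:
            positions.append(start)
            # Resume one byte later so overlapping occurrences are also reported.
            start = text.find(p, start + 1)
        hits[p] = positions
    return hits


print(find_all([b"ab", b"b"], b"abab"))  # {b'ab': [0, 2], b'b': [1, 3]}
```

The cost is one full scan of T per pattern, which is what multi-pattern schemes such as SEBMH try to avoid.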
Research on exact pattern matching • The exact matching problem is essentially solved for typical word-processing applications. • The story changes radically for other specific applications: • DNA and protein search • Relation between search performance and database size • Network intrusion detection
Resource-limited network devices • Special issues • Security issues • Check whether P occurs in T • Limited resources • Try to break the tradeoff between speed and space • Characteristics • Network-related pattern matching • Patterns change occasionally • Texts change frequently • Solutions • Dynamic hash function • Adaptive bucket management
SEBMH • [Figure: input mask, global shift table, hash-link-list structure of ASCII patterns, hash-link-list structure of non-ASCII patterns]
Disadvantages of SEBMH • Because the hash function is static, performance still depends on the pattern set. • Dynamic hash function • In the general pattern-matching problem, the global shift values approach 1 as the number of patterns grows. • Classify the patterns to reduce this effect
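The drift of global shift values toward 1 can be shown with a generic set-wise Horspool shift table, where each byte's shift is the minimum over all patterns. This is a minimal sketch under several assumptions:

- the window length is the shortest pattern;
- the alphabet is one byte;
- the names `global_shift_table` and `mean_shift` are hypothetical.

It is not SEBMH's own table.

```python
def global_shift_table(patterns: list[bytes]) -> list[int]:
    """Horspool-style bad-character shifts merged over a whole pattern set."""
    m = min(len(p) for p in patterns)      # window length: shortest pattern (assumed)
    shift = [m] * 256                      # a byte absent from every window shifts by m
    for p in patterns:
        # Only the first m bytes of each pattern are considered; the last
        # window position (m - 1) is skipped, as in single-pattern Horspool.
        for j in range(m - 1):
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
    return shift


def mean_shift(shift: list[int]) -> float:
    """Average shift over all 256 byte values (a crude efficiency indicator)."""
    return sum(shift) / len(shift)


one = global_shift_table([b"abcdefgh"])
# 256 patterns built so that every byte value appears at window position m - 2.
many = global_shift_table([bytes((i + 3 * j) % 256 for j in range(8)) for i in range(256)])
print(mean_shift(one), mean_shift(many))
```

With the single 8-byte pattern, the mean shift is 7.890625. With the 256 patterns, every byte's shift collapses to 1, so the scan advances one byte at a time. This is the degradation that motivates classifying the patterns.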
How to improve • Pattern classifier • Approximate perfect hash function • Adaptive bucket management
Approximate hash function (1) • Step 1. Sort the target patterns of the class by KEY • Step 2. Distribute the target patterns of the class equally over the buckets (AVG_P = average number of patterns per bucket): n = 0; while (patterns remain) { for (i = 0; i < AVG_P && patterns remain; i++) { 1. dispatch the current pattern into bucket(n); 2. get the next pattern; } n++; } • Step 3. Handle the exception condition: for (i = 1; i < BUCKET_NUM; i++) { if (patterns with the same KEY fall into both bucket(i-1) and bucket(i)) 1. group these patterns into bucket(i-1) or bucket(i) }
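Steps 1 to 3 can be sketched in Python as below. This rests on three assumptions:

- `KEY` is modeled by a caller-supplied `key` function (hypothetical; the slides do not define it).
- `AVG_P` is taken as ⌈P / BUCKET_NUM⌉.
- In Step 3, a key that straddles two buckets is always moved back into bucket(i-1), which is one of the two options the slide allows.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def distribute(patterns: list[T], bucket_num: int,
               key: Callable[[T], Any]) -> list[list[T]]:
    """Approximate perfect hashing sketch: balanced buckets, no key split across buckets."""
    ordered = sorted(patterns, key=key)                      # Step 1: sort by KEY
    avg_p = -(-len(ordered) // bucket_num)                   # AVG_P = ceil(P / BUCKET_NUM)
    buckets = [ordered[n * avg_p:(n + 1) * avg_p]            # Step 2: fill buckets in order
               for n in range(bucket_num)]
    prev = 0                                                 # last non-empty bucket before i
    for i in range(1, bucket_num):                           # Step 3: fix straddling keys
        if not buckets[prev]:
            prev = i
            continue
        boundary = key(buckets[prev][-1])
        while buckets[i] and key(buckets[i][0]) == boundary:
            buckets[prev].append(buckets[i].pop(0))          # group into the earlier bucket
        if buckets[i]:
            prev = i
    return buckets
```

Consider `distribute(["aa", "ab", "ac", "ad", "ba", "bb"], 3, key=lambda s: s[0])`. It returns `[["aa", "ab", "ac", "ad"], [], ["ba", "bb"]]`. Keeping all "a" patterns together empties the middle bucket, so the result is balanced only approximately.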
Adaptive bucket management • Assumptions • Resources are limited • The total number of buckets is fixed • Step 1: classify the patterns • For example (classified by some feature) • Class A • Class B • Class C
Adaptive bucket management • Step 2: allocate buckets • For example • Traffic distribution • Class A: 50% • Class B: 30% • Class C: 20% • Policy • SEBMH(Class A) should get more buckets at this time • The Set-Exclusive table will be more effective • buckets ↑, patterns per bucket ↓, efficacy of the Set-Exclusive table ↑ • buckets ↓, Set-Exclusive table utilization ↑
How to allocate buckets • Communism • Fair • Greedy
Basic assumptions • Assumptions • Φ: matching time for one pattern • B: total number of buckets • P: total number of patterns • C: number of classes • Bi: number of buckets for class i • Pi: number of patterns for class i • Di: traffic distribution of class i • Known • P1 + P2 + … + PC = P • D1 + D2 + … + DC = 1 • Problem • Find a sequence (B1, B2, …, BC) such that • B1 + B2 + … + BC = B • the average matching time is small enough
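The slide leaves out the formula for the objective. One plausible model is the following, and it is only an assumption here. The Pi patterns of class i are spread evenly over its Bi buckets, and a class-i packet is checked against one bucket. That gives an average matching time T = Φ · Σ Di · Pi / Bi.

The sketch below evaluates this model and finds the best (B1, …, BC) by exhaustive search, which is only practical for small B and C. The names `avg_matching_time`, `compositions` and `best_allocation` are hypothetical.

```python
from itertools import combinations


def avg_matching_time(phi: float, P, D, B) -> float:
    """Assumed cost model: T = phi * sum_i D_i * P_i / B_i."""
    return phi * sum(d * p / b for p, d, b in zip(P, D, B))


def compositions(total: int, parts: int):
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0, *cuts, total)
        yield tuple(bounds[k + 1] - bounds[k] for k in range(parts))


def best_allocation(phi: float, P, D, B: int) -> tuple[int, ...]:
    """Exhaustive search for (B_1, ..., B_C) with B_1 + ... + B_C = B minimising T."""
    return min(compositions(B, len(P)),
               key=lambda Bs: avg_matching_time(phi, P, D, Bs))


P, D = (5, 5, 20), (0.5, 0.3, 0.2)
best = best_allocation(1.0, P, D, 10)
print(best, avg_matching_time(1.0, P, D, best))       # ABM with the best split
print(avg_matching_time(1.0, (30,), (1.0,), (10,)))   # no ABM: one class, all buckets
```

Take the Greedy Method example (D = 0.5, 0.3, 0.2; P = 5, 5, 20; B = 10). The search returns (3, 3, 4) with T ≈ 2.33 Φ. Letting all 30 patterns share all 10 buckets instead gives Φ · P / B = 3 Φ.

When Di is proportional to Pi (for example P = 15, 9, 6), the optimum is also 3 Φ. Under this model, that is in line with the conclusion that ABM helps only when the pattern and traffic distributions are inversely related.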
Communism Method: ABM is not applied • Without ABM • No classifier is needed • Average matching time : • Other overheads • Overhead of approximate perfect hashing • The efficacy of the Global-Shift table is limited • The efficacy of the Set-Exclusive table is limited
Fair Method: at least one solution • For example • Traffic distribution • Class A: 50% • Class B: 30% • Class C: 20% • With ABM in Fair Method • Average matching time : • Example:
Greedy Method: we can find better solutions • For example • Traffic and pattern distributions • Class A: traffic 50%, patterns 5 • Class B: traffic 30%, patterns 5 • Class C: traffic 20%, patterns 20 • With ABM in Greedy Method • Average matching time : • Example
Objective • Observe how the optimal solutions are distributed • Derive an algorithm for finding the solution from these observations
Traffic dist. proportional to pattern dist. • [Figures: Bucket = 30, Bucket = 10]
Traffic dist. inversely proportional to pattern dist. • [Figures: Bucket = 30, Bucket = 10]
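Conclusion • ABM is effective only when the pattern and traffic distributions are inversely proportional; this can serve as a guideline for training the classifier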
Greedy Algorithm (tentative) • Step 1: get the Bi from the fair method • Step 2: borrow 1 bucket from each class • bonus_bucket = number of classes • Step 3: dispatch the bonus buckets • Bonus_i = floor(bonus_bucket × (Pi / P)) • Step 4: dispatch the remaining buckets • Add the buckets one at a time, each to the class that gives the best solution
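The four steps can be sketched in Python as follows. Two pieces are assumptions, not taken from the slides:

- The fair method of Step 1 is modeled as splitting B in proportion to the traffic weights Di, with largest-remainder rounding.
- The "best solution" in Step 4 is judged with an assumed average-matching-time model, T / Φ = Σ Di · Pi / Bi (Φ does not affect the choice).

All function names are hypothetical.

```python
def avg_time(P: list[int], D: list[int], B: list[int]) -> float:
    """Assumed model T / phi = sum_i D_i * P_i / B_i, with D given as integer weights."""
    return sum(d * p / b for p, d, b in zip(P, D, B)) / sum(D)


def fair_allocation(D: list[int], B: int) -> list[int]:
    """Hypothetical fair method: B_i proportional to D_i, largest-remainder rounding."""
    total = sum(D)
    alloc = [d * B // total for d in D]
    remainders = [d * B % total for d in D]
    order = sorted(range(len(D)), key=lambda i: remainders[i], reverse=True)
    for i in order[:B - sum(alloc)]:
        alloc[i] += 1
    return alloc


def greedy_allocation(P: list[int], D: list[int], B: int) -> list[int]:
    alloc = fair_allocation(D, B)                      # Step 1: start from the fair B_i
    if min(alloc) < 2:
        raise ValueError("each class needs at least 2 buckets to lend one")
    alloc = [b - 1 for b in alloc]                     # Step 2: borrow 1 bucket per class
    bonus_bucket = len(P)                              #         bonus_bucket = # of classes
    total_p = sum(P)
    for i, p in enumerate(P):                          # Step 3: floor(bonus_bucket * P_i / P)
        alloc[i] += bonus_bucket * p // total_p
    for _ in range(B - sum(alloc)):                    # Step 4: leftovers, one at a time
        best = min(range(len(P)),
                   key=lambda i: avg_time(P, D, alloc[:i] + [alloc[i] + 1] + alloc[i + 1:]))
        alloc[best] += 1
    return alloc


P, D = [5, 5, 20], [50, 30, 20]    # patterns and traffic (%) from the Greedy Method example
print(greedy_allocation(P, D, 10))
```

On the Greedy Method example this returns [4, 2, 4], which is T = 2.375 Φ under the assumed model. The fair split [5, 3, 2] gives 3 Φ. Being greedy, it is not guaranteed to be optimal: [3, 3, 4] scores about 2.33 Φ under the same model.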
How to classify patterns (1) • Goals for the classifier • High priority • Reduce how often ABM is performed • Low priority • Enhance the efficacy of ABM
How to classify patterns (2) • Reduce how often ABM is performed • Conditions under which ABM need not be performed for specific classes • …….(1) • …….(2)
How to classify patterns (3) • Expected effect of and • ↑ • ↓ • ↑ • ↓
How to classify patterns (4) • Enhance the efficacy of ABM • Aim for • Pi increasing • Di decreasing
How to classify patterns (5) • Operators • Combination • Directly combine two classes in the same domain • Sibling aggregation • Combine two classes in different domains • [Figure: class tree rooted at "patterns", with TCP, UDP, Other, …; leaf classes include HTTP, FTP, TFTP, ICMP] • Objective • Build a class tree whose traffic distribution is stable • Constraint • Many patterns in the same class that share the same prefix should form an independent class
How to classify patterns (6) • Mathematical model for training the classifier • Merge two classes when • the conditions on the means hold • the conditions on the variances hold • the symbols have the same meanings as before • k (≥ 1) is a coefficient that balances • Resource [k ↑] • Performance [k ↓]
How to classify patterns (7) • Conditions on the means
How to classify patterns (8) • Conditions on the variances
Classifier • Advantages • Reduces the impact of the complex approximate perfect hash function • Eliminates unnecessary pattern matching
Classifier behavior • Input packet → does it belong to any class? • NO: bypass • YES: dispatch the input packet to the corresponding handler
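The classifier's behavior can be sketched as a first-match dispatch table. A packet that matches a class goes to that class's handler (presumably that class's SEBMH instance in this design). Anything else bypasses pattern matching entirely. The `Packet` fields, the class predicates and the handlers below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Packet:
    protocol: str        # hypothetical field used by the class predicates
    payload: bytes


Rule = tuple[str, Callable[[Packet], bool], Callable[[Packet], object]]


def classify_and_dispatch(packet: Packet, rules: list[Rule]) -> Optional[tuple[str, object]]:
    """Return (class name, handler result), or None when the packet is bypassed."""
    for name, belongs, handler in rules:
        if belongs(packet):                    # YES: packet belongs to this class
            return name, handler(packet)       # dispatch to the class's handler
    return None                                # NO: bypass, no pattern matching


rules: list[Rule] = [
    # Placeholder predicates and handlers; a real handler would run the class's matcher.
    ("HTTP", lambda pk: pk.protocol == "tcp" and pk.payload.startswith(b"GET"),
     lambda pk: b"/etc/passwd" in pk.payload),
    ("TFTP", lambda pk: pk.protocol == "udp", lambda pk: False),
]
print(classify_and_dispatch(Packet("tcp", b"GET /etc/passwd"), rules))
print(classify_and_dispatch(Packet("icmp", b""), rules))
```

Returning `None` only for the bypass path keeps "no class matched" distinct from "a class matched but found nothing".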