
Exact pattern matching on resource-limited network devices

Chien-Chung Su, 2002/12/10


Presentation Transcript


  1. Exact pattern matching on resource-limited network devices Chien-Chung Su 2002/12/10

  2. Outline • Problem definition • Resource-limited network devices • Introduction of SEBMH • Disadvantages of SEBMH • Adaptive bucket management • Conclusion

  3. Problem definition • Given • P : pattern(s) • T : text • General action • Find all occurrences of P in T

  4. Research for exact pattern matching • The exact matching problem is considered solved for typical word-processing applications. • The story changes radically for other, more specific applications. • DNA and protein search • Relation between search performance and database size • Network intrusion detection

  5. Resource-limited network devices • Special issues • Security issues • Check whether P occurs in T • Resource-limited • Try to break the tradeoff between speed and space • Characteristics • Network-related pattern matching • Patterns change occasionally • Texts change frequently • Solutions • Dynamic hash function • Adaptive bucket management

  6. SEBMH • [Figure: SEBMH components: Input, Mask, Global Shift Table, Hash-Link-List Structure of ASCII Patterns, Hash-Link-List Structure of non-ASCII Patterns]

  7. Set-Exclusive table

  8. Disadvantages of SEBMH • Because the hash function is static, performance still depends on the pattern set. • Dynamic hash function • In the general pattern matching problem, the global shift values approach 1 as the number of patterns grows. • Classify the patterns to reduce this effect
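
The shrinking shift can be illustrated with a minimal sketch. The transcript does not show how SEBMH builds its Global Shift Table, so the code below assumes a generic Horspool-style bad-character table shared by the whole pattern set, with the window length set to the shortest pattern. The names `global_shift_table` and `average_shift` are hypothetical.

```python
def global_shift_table(patterns):
    """Horspool-style shift table shared by a whole pattern set (assumed model).

    The search window is as long as the shortest pattern; for each character
    the table keeps the smallest safe shift over all patterns.
    """
    m = min(len(p) for p in patterns)      # window length
    default = m                            # shift for characters never seen
    table = {}
    for p in patterns:
        # Only the first m-1 characters of each pattern limit the shift.
        for j, ch in enumerate(p[:m - 1]):
            shift = m - 1 - j
            table[ch] = min(table.get(ch, default), shift)
    return table, default


def average_shift(patterns, alphabet):
    """Mean shift over an alphabet, assuming uniformly distributed text."""
    table, default = global_shift_table(patterns)
    return sum(table.get(c, default) for c in alphabet) / len(alphabet)


one = average_shift(["abcd"], "abcd")            # 2.5
two = average_shift(["abcd", "cabd"], "abcd")    # 2.0
```

Adding a second pattern already lowers the average shift from 2.5 to 2.0 on this toy alphabet; with many patterns most entries fall to 1, which is the motivation for splitting the patterns into classes.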

  9. How to improve • Pattern classifier • Approximate perfect hash function • Adaptive bucket management

  10. Approximate hash function (1) • Step 1. Sort the class's target patterns by KEY • Step 2. Distribute the patterns evenly across the buckets: n = 0; while (patterns remain) { for (i = 0; i < AVG_P and patterns remain; i++) { dispatch the current pattern into bucket(n); get the next pattern; } n++; } • Step 3. Handle the exception condition: for (i = 1; i < BUCKET_NUM; i++) { if (the last key in bucket(i-1) equals the first key in bucket(i)) group the patterns with that key into bucket(i-1) or bucket(i); }
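
The three steps can be sketched in Python as follows. This is an illustration under assumptions, not the author's implementation: the KEY is taken to be the first two characters of a pattern, Steps 2 and 3 are folded into a single pass (a bucket is closed only once it holds the average number of patterns and the next key differs), and lookup is a binary search over the first key of each bucket. The names `build_buckets` and `lookup` are hypothetical.

```python
from bisect import bisect_right


def build_buckets(patterns, bucket_num, key=lambda p: p[:2]):
    """Split patterns into at most bucket_num key ranges of similar size."""
    ordered = sorted(patterns, key=key)            # Step 1: sort by KEY
    avg = -(-len(ordered) // bucket_num)           # ceil(len / bucket_num)
    buckets = [[]]
    for p in ordered:
        cur = buckets[-1]
        # Step 2 + Step 3: start a new bucket only when the current one is
        # full AND the key changes, so one key never spans two buckets.
        if len(cur) >= avg and key(cur[-1]) != key(p):
            cur = []
            buckets.append(cur)
        cur.append(p)
    buckets = [b for b in buckets if b]
    boundaries = [key(b[0]) for b in buckets]      # first key of each bucket
    return buckets, boundaries


def lookup(boundaries, k):
    """Index of the bucket whose key range contains k."""
    return max(bisect_right(boundaries, k) - 1, 0)


patterns = ["GET /a", "GET /b", "POST /", "PUT /x", "HEAD /", "GETX"]
buckets, boundaries = build_buckets(patterns, bucket_num=3)
```

Because the three "GE" patterns share one key, the first bucket grows past the average size rather than splitting the key, which is the exception case of Step 3.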

  11. Approximate hash function (2)

  12. Adaptive bucket management • Assumption • Resource is limited • Total bucket number is fixed • Step 1 : classify the patterns • For example (feature is a factor) • Class A • Class B • Class C

  13. Adaptive bucket management • Step 2 : allocate buckets • For example • Traffic distribution • Class A : 50% • Class B : 30% • Class C : 20% • Policy • SEBMH(Class A) could get more buckets at this time • Set-Exclusive table will be more effective • bucket ↑, pattern per bucket ↓, efficacy of set-exclusive table ↑ • bucket ↓, set-exclusive utilization ↑

  14. How to allocate buckets • Communism • Fair • Greedy

  15. Basic assumption • Assumption • Φ : matching time for one pattern • B : total number of buckets • P : total number of patterns • C : number of classes • Bi : number of buckets for class i • Pi : number of patterns in class i • Di : traffic distribution of class i • Known • P1 + P2 + … + Pc = P • D1 + D2 + … + Dc = 1 • Problem • Find a sequence (B1, B2, …, Bc) such that • B1 + B2 + … + Bc = B • the average matching time is small enough

  16. Communism Method: ABM is not applied • Without ABM • No classifier is needed • Average matching time : • Other overheads • Overheads of approximate perfect hashing • Efficacy of Global-Shift table is not obvious • Efficacy of Set-Exclusive table is not obvious

  17. Fair Method: At least one solution • For example • Traffic distribution • Class A : 50% • Class B : 30% • Class C : 20% • With ABM in Fair Method • Average matching time : • Example:

  18. Greedy Method: We can find better solutions • For example • Traffic distribution and pattern distribution • Class A : traffic 50%, patterns 5 • Class B : traffic 30%, patterns 5 • Class C : traffic 20%, patterns 20 • With ABM in Greedy Method • Average matching time : • Example
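
The three methods can be compared numerically on the example above. The formulas for the average matching time did not survive in this transcript, so the sketch below rests on an assumed cost model: a lookup in class i costs Φ · Pi / Bi (patterns per bucket), weighted by the traffic share Di. B = 10 is borrowed from the bucket counts used in the experiment slides, the fair method is assumed to allocate buckets in proportion to traffic, and `avg_time` and `best_allocation` are hypothetical names.

```python
from itertools import product


def avg_time(buckets, patterns, traffic, phi=1.0):
    """ASSUMED model: sum over classes of D_i * phi * P_i / B_i."""
    return sum(d * phi * p / b for b, p, d in zip(buckets, patterns, traffic))


def best_allocation(total, patterns, traffic):
    """Try every (B_1..B_c) with B_i >= 1 and B_1 + ... + B_c = total."""
    c = len(patterns)
    best = None
    for head in product(range(1, total), repeat=c - 1):
        last = total - sum(head)
        if last < 1:
            continue
        alloc = head + (last,)
        t = avg_time(alloc, patterns, traffic)
        if best is None or t < best[1]:
            best = (alloc, t)
    return best


patterns = [5, 5, 20]          # P_A, P_B, P_C from the slide
traffic = [0.5, 0.3, 0.2]      # D_A, D_B, D_C from the slide
B = 10                         # assumed total bucket count

communism = sum(patterns) / B                      # one shared pool: P / B
fair = avg_time((5, 3, 2), patterns, traffic)      # B_i proportional to D_i
best_alloc, best_time = best_allocation(B, patterns, traffic)
```

Under this assumed model the fair allocation costs the same as the communism method (3.0 Φ), while the exhaustive search finds (3, 3, 4) at about 2.33 Φ, which is the kind of better solution the greedy method aims for.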

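  19. 20021112 Experiment report

  20. Objective • Observe how the optimal solutions are distributed • Try to derive an algorithm for finding the solution from these observations

  21. Traffic dist. directly proportional to pattern dist. • [Plots: Bucket = 30; Bucket = 10]

  22. Traffic dist. inversely proportional to pattern dist. • [Plots: Bucket = 30; Bucket = 10]

  23. Conclusion • An improvement appears only when the pattern distribution and the traffic distribution are inversely proportional; this can serve as a reference for training the classifier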

  24. Greedy Algorithm (temp) • Step 1 : get the Bi from the fair method • Step 2 : borrow 1 bucket from each class • bonus_bucket = number of classes • Step 3 : dispatch the bonus buckets • Bonusi = floor (bonus_bucket * (Pi / P)) • Step 4 : dispatch the remaining buckets • Add the buckets one by one, each time to the class that gives the best solution
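
A minimal Python sketch of these four steps follows. Two things are assumptions rather than content of the slides: the fair method is taken to mean Bi proportional to Di, and "best solution" is judged with an assumed cost model in which class i costs Φ · Pi / Bi per lookup, weighted by Di. All function names are hypothetical.

```python
def avg_time(buckets, patterns, traffic, phi=1.0):
    """ASSUMED cost model: sum over classes of D_i * phi * P_i / B_i."""
    return sum(d * phi * p / b for b, p, d in zip(buckets, patterns, traffic))


def fair_allocation(total, traffic):
    """ASSUMED fair method: buckets proportional to traffic, at least one each."""
    alloc = [max(1, round(total * d)) for d in traffic]
    # Absorb rounding drift in the busiest class so the total stays exact.
    alloc[traffic.index(max(traffic))] += total - sum(alloc)
    return alloc


def greedy_allocation(total, patterns, traffic):
    alloc = fair_allocation(total, traffic)                 # Step 1
    lend = [1 if b > 1 else 0 for b in alloc]               # Step 2
    alloc = [b - l for b, l in zip(alloc, lend)]
    bonus = sum(lend)
    p_total = sum(patterns)
    shares = [bonus * p // p_total for p in patterns]       # Step 3 (floor)
    alloc = [b + s for b, s in zip(alloc, shares)]
    for _ in range(bonus - sum(shares)):                    # Step 4
        # Give the next bucket to the class that lowers the cost the most.
        i = min(
            range(len(alloc)),
            key=lambda k: avg_time(
                alloc[:k] + [alloc[k] + 1] + alloc[k + 1:], patterns, traffic
            ),
        )
        alloc[i] += 1
    return alloc


patterns = [5, 5, 20]
traffic = [0.5, 0.3, 0.2]
result = greedy_allocation(10, patterns, traffic)
```

With 10 buckets, 5/5/20 patterns, and 50/30/20% traffic, the sketch moves from the fair allocation [5, 3, 2] to [4, 2, 4], lowering the assumed cost from 3.0 Φ to 2.375 Φ. A class that holds only one bucket is not asked to lend it, so every class keeps at least one bucket.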

  25. How to classify patterns (1) • The goals the classifier should achieve • High priority • Reduce how often ABM has to be performed • Low priority • Enhance the efficacy of ABM

  26. How to classify patterns (2) • Reduce how often ABM has to be performed • Conditions under which ABM should not be performed for specific classes • …….(1) • …….(2)

  27. How to classify patterns (3) • Expected effect of (symbol lost) and (symbol lost) • ↑ • ↓ • ↑ • ↓

  28. How to classify patterns (4) • Enhance the efficacy of ABM • Try to arrange that • Pi increases • Di decreases

  29. How to classify patterns (5) • Operators • Combination • Directly combine two classes in the same domain • Sibling aggregation • Combine two classes in different domains • [Figure: classification tree rooted at "patterns", with branches TCP, UDP, Other and leaves including HTTP, FTP, TFTP, ICMP, ….] • Objective • Build the tree so that the traffic distribution over its classes is stable • Constraint • A large group of patterns sharing the same prefix within one class should become an independent class

  30. How to classify patterns (6) • Mathematical model for training the classifier • Merge two classes when • Conditions of means hold • Conditions of variances hold • The symbols have the same meanings as before • k (≥ 1) is a coefficient that balances • Resource [ k ↑ ] • Performance [ k ↓ ]

  31. How to classify patterns (7) • Conditions of means

  32. How to classify patterns (7) • Conditions of variances

  33. Classifier • Advantages • Reduces the impact of the complex approximate perfect hash function • Eliminates unnecessary pattern matching

  34. Classifier behavior • Input packet • Does it belong to any class? • NO : bypass • YES : dispatch the input packet to the corresponding handler
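
The flow on this slide can be sketched as a small dispatcher. Everything concrete here is assumed for illustration: packets are dictionaries with `proto` and `payload` fields, the class is simply the protocol name, and each handler is a naive substring matcher standing in for the per-class SEBMH instance.

```python
def make_dispatcher(classify, handlers):
    """Return a function that routes a packet to its class handler or bypasses it."""
    def dispatch(packet):
        cls = classify(packet)
        handler = handlers.get(cls)
        if handler is None:
            return None                    # NO: bypass, no matching is done
        return handler(packet["payload"])  # YES: run only this class's matcher
    return dispatch


def classify_by_proto(packet):
    """Hypothetical classifier: the class is the packet's protocol name."""
    return packet.get("proto")


HTTP_PATTERNS = ("GET /etc/passwd", "cmd.exe")   # illustrative patterns only

handlers = {
    # Naive matcher standing in for the per-class pattern matching engine.
    "HTTP": lambda payload: [p for p in HTTP_PATTERNS if p in payload],
}

dispatch = make_dispatcher(classify_by_proto, handlers)
hit = dispatch({"proto": "HTTP", "payload": "GET /etc/passwd HTTP/1.0"})
skipped = dispatch({"proto": "NTP", "payload": "cmd.exe"})
```

The second packet is bypassed even though its payload contains a known pattern, because no class claims it; this is how the classifier avoids pattern matching that is not required.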

  35. Next Experiments

  36. Conclusion
