
USING DATA MINING TO DISCOVER SIGNATURES IN NETWORK-BASED INTRUSION DETECTION


Presentation Transcript


1. USING DATA MINING TO DISCOVER SIGNATURES IN NETWORK-BASED INTRUSION DETECTION. Hong Han, Xian-liang Lu, Li-Yong Ren. Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002. Speaker: Li-Chin Huang

2. Outline • Introduction • SigSniffer—System architecture • Signature Mining • Conclusion • Comment

3. Introduction – Intrusion Detection System (IDS). An intrusion is any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource. -- R. Heady, G. Luger et al., 1990

4. Introduction – Intrusion Detection System (IDS). Two kinds of analysis approaches in IDS: • Misuse detection: uses patterns of well-known attacks or weak spots of the system to identify intrusions; unable to detect new, previously unseen intrusions. • Anomaly detection: establishes normal usage patterns using statistical measures on system features (e.g. CPU and I/O); selecting those features relies on experience.

5. Introduction – example. A rule of Snort: alert tcp any 110 -> $HOME_NET any (msg:"LURHQ-01-Virus-Possible Incoming QAZ Worm"; content:"|71 61 7a 77 73 78 2e 68 73 71|";) A packet coming in from TCP port 110 that contains the string (signature) 71 61 7a 77 73 78 2e 68 73 71 means the QAZ worm is attempting to penetrate.
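The content bytes in the rule are just ASCII written in hex. A minimal sketch (not from the paper) that decodes them:

```python
# Decode the hex signature from the Snort rule above into readable ASCII.
sig_hex = "71 61 7a 77 73 78 2e 68 73 71"
sig_bytes = bytes(int(b, 16) for b in sig_hex.split())
print(sig_bytes.decode("ascii"))  # -> 'qazwsx.hsq', the QAZ worm's marker string
```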

6. The structure of SigSniffer • SigSniffer has five parts: • Packet Sensor: captures packets for the Signature Miner • Signature Miner: finds candidate signatures in the packets • Signature Set: presents the candidate signatures to analysts for further analysis • Associated Signature Miner: continues mining the candidate signatures and finds associations among them • Rule Set: the signature associations are candidate rules, which are sent to the Rule Set

7. The structure of SigSniffer (figure: system architecture diagram)

8. Experiment • Two sets of training data are collected: 1. abnormal packets (the outgoing packets of the attack tool); 2. normal packets. • Every signature generated by Signature Apriori becomes an attribute, and the sample records are classified into two classes: attack or non-attack. • ID3 is then used to classify the possible signatures.

9. Experiments and results (figure)

10. The results of detection by Snort (figure)

11. Conclusion • Presents the Signature Apriori (SA) algorithm • Mines signatures from packet content

12. Comment • The bottleneck of Apriori: candidate generation • Huge candidate sets: 10^4 frequent 1-itemsets generate about 10^7 candidate 2-itemsets; to discover a frequent pattern of size 100, e.g. {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates. • Multiple scans of the database: Apriori needs n + 1 scans, where n is the length of the longest pattern. • Alternatives: FP-tree, DHP, and Inverted Hashing and Pruning (IHP)
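A quick back-of-the-envelope check of these figures (a sketch; the slide's 10^7 is the order of magnitude of C(10^4, 2)):

```python
from math import comb

# Joining 10^4 frequent 1-itemsets yields C(10^4, 2) candidate 2-itemsets.
print(comb(10**4, 2))  # 49995000, on the order of 10^7

# A frequent pattern of size 100 has 2^100 - 1 nonempty sub-patterns,
# all of which Apriori would have to enumerate as candidates.
print(2**100)          # 1267650600228229401496703205376, about 1.27e30
```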

13. Signature Apriori. The main difference between Apriori and Signature Apriori: 1. In Apriori, a transaction is a composite (unordered set) of actions, e.g. (PasswordFails, ExecutionDenied). 2. In Signature Apriori, a transaction is a permutation (ordered sequence) of actions, so (PasswordFails, ExecutionDenied) and (ExecutionDenied, PasswordFails) are distinct.

14. Mining candidate signatures from Packet 1, Packet 2, Packet 3, …, Packet n: Step 1: find the set M(1, sup) of frequent length-1 signatures. Step 2: Ci = NewCandidateSig (candidate generation). Step 3: find all M(i, sup), i = 1, 2, …, length. (1) From M(1, sup), compute M(2, sup): e.g. {a, b, c} gives the candidates {ab, ba, ac, ca, bc, cb}. (2) For M(i, sup) with i > 1, overlapping signatures are joined: e.g. M(3, 0.8) = {'hel', 'elw', 'mhe', 'ooo', 'ddd'} gives the candidates {helw, mhel}, as sketched below.
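Here is a minimal sketch of the join step as I reconstruct it from the example above (the paper's exact NewCandidateSig pseudocode is in the figure on the next slide; this is an interpretation, not a copy): two frequent length-k signatures join into a length-(k+1) candidate when the (k-1)-suffix of one equals the (k-1)-prefix of the other.

```python
def new_candidate_sig(frequent):
    """Join frequent length-k signatures into length-(k+1) candidates.

    p and q join when p's (k-1)-suffix equals q's (k-1)-prefix,
    so byte order is preserved (unlike plain Apriori's set join).
    """
    candidates = set()
    for p in frequent:
        for q in frequent:
            if p != q and p[1:] == q[:-1]:
                candidates.add(p + q[-1])
    return candidates

# Length 1 -> 2: every ordered pair survives, matching the slide.
print(new_candidate_sig({"a", "b", "c"}))  # {'ab','ba','ac','ca','bc','cb'}

# Length 3 -> 4: reproduces the slide's example.
print(new_candidate_sig({"hel", "elw", "mhe", "ooo", "ddd"}))  # {'helw', 'mhel'}
```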

15. Algorithm Signature Apriori (figure: pseudocode)

16.–17. (figure-only slides; content not recoverable)

18. Intrusion Detection System (IDS) • Purpose: to effectively detect intrusions against hosts coming from internal and external networks. • Classification: IDSs are generally divided into host-based and network-based systems. • A host-based IDS integrates closely with the operating system and applications on the host server, so it can detect many attack patterns that a network-based IDS cannot (e.g. web-page defacement, OS kernel tampering). • A network-based IDS monitors connection state and the contents of packets transmitted on the network.

19. Introduction – Anomaly detection • Difficulties: • it relies on intuition and experience in selecting the measures • it can still fail to detect future intrusions

20. Sample of training data with signatures as attributes (Table 1)

21. The Apriori Algorithm — Example (figure): starting from database D of action sets keyed by SID, scan D to count the candidate 1-itemsets C1 and keep the frequent ones as L1; join L1 with itself to form C2 and scan D again for L2; join L2 to form C3 and scan D for L3.
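As a generic reconstruction of the loop in the figure (not the paper's code), an Apriori sketch over action sets keyed by SID; the toy database below is an assumption chosen to match the A1/A3 example on the next slide:

```python
def apriori(transactions, min_support):
    """Return every frequent itemset with its support count.

    Each level: one scan of D to count candidates Ck, keep the frequent
    ones as Lk, then join Lk with itself to form C(k+1).
    (The subset-pruning optimization is omitted for brevity.)
    """
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}  # C1
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        lk = {c: v for c, v in counts.items() if v / n >= min_support}  # Lk
        frequent.update(lk)
        level = {a | b for a in lk for b in lk if len(a | b) == k + 1}  # C(k+1)
        k += 1
    return frequent

# Hypothetical database D: one action set per SID.
D = [{"A1", "A3", "A4"}, {"A2", "A3", "A5"}, {"A1", "A2", "A3", "A5"}, {"A2", "A5"}]
for itemset, count in sorted(apriori(D, 0.5).items(), key=lambda x: -x[1]):
    print(set(itemset), count)
```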

22. Mining Association Rule — Example. Min. support 50%, min. confidence 50%. For the rule A1 => A3: support(A1 => A3) = support({A1, A3}), the fraction of records in database D (action sets keyed by SID) that contain both A1 and A3; confidence(A1 => A3) = support({A1, A3}) / support({A1}).
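Completing the computation that the transcript cuts off, with the same hypothetical four-record database (an assumption; the slide's actual D is not legible here):

```python
D = [{"A1", "A3", "A4"}, {"A2", "A3", "A5"}, {"A1", "A2", "A3", "A5"}, {"A2", "A5"}]

def support(itemset):
    # Fraction of records that contain every action in the itemset.
    return sum(1 for t in D if itemset <= t) / len(D)

# Rule A1 => A3
sup = support({"A1", "A3"})       # 2/4 = 0.50  >= min. support 50%
conf = sup / support({"A1"})      # 0.50 / 0.50 = 1.00 >= min. confidence 50%
print(f"support = {sup:.0%}, confidence = {conf:.0%}")
```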

23. ID3 (example) (figure: a decision tree that tests Signature1, Signature2 and Signature3 at its internal nodes, with yes/no branches leading to Is_attack leaves)

24. Training data set (ID3 example), with attributes PasswordFails, SessionCPU, SessionOutput, ProgramResourceExhaustion and class label Is_attack. This follows an example from Quinlan's ID3.

25. Information Gain (ID3) • Assume there are two classes, P and N • Let the set of examples S contain p elements of class P (is_attack = yes) and n elements of class N (is_attack = no) • The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as I(p, n) = -(p/(p+n))·log2(p/(p+n)) - (n/(p+n))·log2(n/(p+n))
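The same formula as runnable code (a sketch, using the usual convention that 0·log2 0 = 0):

```python
from math import log2

def info(p, n):
    """I(p, n): expected bits needed to classify an example as P or N."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)

print(round(info(9, 5), 3))  # 0.94, the value used on the following slides
```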

26. Information Gain (ID3) (figure): an attribute A (PasswordFails, SessionCPU, SessionOutput or ProgramResourceExhaustion) partitions the example set S into subsets S1, S2, S3, S4, where subset Si contains pi examples of class P (is_attack = yes) and ni examples of class N (is_attack = no). The expected information of the split is E(A) = Σi (pi + ni)/(p + n) · I(pi, ni).

27. Information Gain (ID3) – Example (figure): the same partition applied to the training table, splitting on each attribute in turn to obtain the subset counts (pi, ni).

28. Information Gain (ID3) – Example • Class positive: Is_attack = "yes"; class negative: Is_attack = "no" • I(p, n) = I(9, 5) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.940 • Compute the entropy for PasswordFails: E(PasswordFails) = 0.693 • Hence Gain(PasswordFails) = I(p, n) - E(PasswordFails) = 0.940 - 0.693 = 0.247 • Similarly: Gain(SessionCPU) = 0.029, Gain(SessionOutput) = 0.151, Gain(ProgramResourceExhaustion) = 0.048
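These numbers can be reproduced. The slides follow Quinlan's classic 14-example data (9 positive, 5 negative); assuming PasswordFails splits it into the standard subsets (2+, 3-), (4+, 0-) and (3+, 2-) (an assumption, since the table itself is not legible in this transcript), the gain comes out to 0.247:

```python
from math import log2

def info(p, n):
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)

# (p_i, n_i) per PasswordFails branch; counts assumed from Quinlan's example.
subsets = [(2, 3), (4, 0), (3, 2)]
p, n = 9, 5

e = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in subsets)
print(round(info(p, n), 3))      # 0.94
print(round(e, 3))               # 0.694 (the slide rounds to 0.693)
print(round(info(p, n) - e, 3))  # 0.247 = Gain(PasswordFails)
```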

29. Output: A Decision Tree for "is_attack". Gain(PasswordFails) = 0.247, Gain(SessionCPU) = 0.029, Gain(SessionOutput) = 0.151, Gain(ProgramResourceExhaustion) = 0.048. PasswordFails has the highest gain, so it becomes the root, with branches <=30, 31..40 and >40 (figure).

30. Output: A Decision Tree for "is_attack" (figure): the tree grows a level, testing SessionOutput (yes/no branches) under the PasswordFails branches.

31. Output: A Decision Tree for "is_attack" (figure): another level, testing ProgramResourceExhaustion (yes/no branches) under the SessionOutput nodes.

32. Output: A Decision Tree for "is_attack" (figure): the final level tests SessionCPU (high/medium/low), and each path ends in a leaf, Is_attack = yes or Is_attack = no.

33. Extracting Classification Rules from Trees (figure: the completed tree from slide 32). Each root-to-leaf path yields one rule, e.g.: If PasswordFails = "<=30" and SessionOutput = "no" and ProgramResourceExhaustion = "no" and SessionCPU = "high" then Is_attack = "no".
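A minimal sketch of that extraction (the nested-dict tree below is illustrative, not the slides' exact tree): walk each root-to-leaf path and emit the conjunction of tests along it.

```python
# A node is either a leaf label ("yes"/"no") or (attribute, {branch: subtree}).
tree = ("PasswordFails", {
    "<=30": ("SessionOutput", {
        "no": ("ProgramResourceExhaustion", {
            "no": ("SessionCPU", {"high": "no", "medium": "yes", "low": "yes"}),
            "yes": "yes",
        }),
        "yes": "yes",
    }),
    "31..40": "yes",
    ">40": ("SessionOutput", {"no": "yes", "yes": "no"}),
})

def extract_rules(node, conditions=()):
    if isinstance(node, str):  # leaf: one finished rule
        body = " and ".join(f'{attr} = "{val}"' for attr, val in conditions)
        print(f'If {body} then Is_attack = "{node}"')
    else:
        attribute, branches = node
        for value, child in branches.items():
            extract_rules(child, conditions + ((attribute, value),))

extract_rules(tree)
# The first rule printed matches the slide:
# If PasswordFails = "<=30" and SessionOutput = "no" and
#   ProgramResourceExhaustion = "no" and SessionCPU = "high" then Is_attack = "no"
```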

34. FP-Tree (Frequent-Pattern growth). Step 1: compute L, the actions in descending order of support count: L = [I2:7, I1:6, I3:6, I4:2, I5:2]. Step 2: reorder each List_of_Action into L's order. Step 3: construct the FP-tree by inserting the reordered transactions one at a time (figure: the tree after SID=100, giving the path I2:1 → I1:1 → I5:1, and after SID=200, which raises I2 to 2 and adds I4:1).

35. FP-Tree (Frequent-Pattern growth) (figure: construction continued for SID=300 through SID=900; each insertion shares the longest common prefix with an existing path, and the root's I2 count grows from 3 to 7).

36. FP-Tree (Frequent-Pattern growth) (figure: the completed tree, rooted at null{}, with main path I2:7 → I1:4 and further branches through I3:2, I4:1 and I5:1, plus a separate I1:2 → I3:2 branch).
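Finally, a construction sketch for the tree in slides 34-36. The support counts L = [I2:7, I1:6, I3:6, I4:2, I5:2] match the classic nine-transaction example from Han & Kamber's textbook, so those transactions are assumed here:

```python
# Nine transactions (SID 100..900) from the standard textbook example.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
    ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
    ["I1", "I2", "I3"],
]

# Step 1: support counts in descending order (ties broken by item name).
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1
order = sorted(counts, key=lambda i: (-counts[i], i))
print(order)  # ['I2', 'I1', 'I3', 'I4', 'I5'], i.e. L on slide 34

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

# Steps 2 and 3: reorder each transaction, then insert it along a
# shared-prefix path, incrementing the count of every node on the path.
root = Node(None)
for t in transactions:
    node = root
    for item in sorted(t, key=order.index):
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)  # prints the finished tree of slide 36 (I2:7 -> I1:4 -> ..., I1:2 -> I3:2)
```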
