Information-Theoretic Measures for Anomaly Detection Wenke Lee and Dong Xiang (North Carolina State University) IEEE Security and Privacy, 2001 Speaker: Chang Huan Wu, 2009/4/14
Outline • Introduction • Information-Theoretic Measures • Case Studies • Conclusions
Introduction (1/2) • Misuse detection • Uses the “signatures” of known attacks • Anomaly detection • Uses established normal profiles • The basic premise for anomaly detection: there is regularity in audit data that is consistent with normal behavior and thus distinct from abnormal behavior
Introduction (2/2) • Most anomaly detection models are built solely on “expert” knowledge or intuition • This work provides theoretical foundations as well as useful tools that can facilitate the IDS development process and improve the effectiveness of ID technologies
Information-Theoretic Measures (1/7) • Entropy: H(X) = −Σx P(x) · log2 P(x) • Use entropy as a measure of the regularity of audit data: the smaller the entropy, the more regular the dataset
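As a quick illustration (my sketch, not code from the paper), entropy can be estimated directly from event frequencies; the event names and streams below are made up:

import math
from collections import Counter

def entropy(events):
    """Shannon entropy (bits) of a stream of audit events."""
    counts = Counter(events)
    total = len(events)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["open", "read", "read", "close"] * 25))  # repetitive stream -> 1.5 bits
print(entropy(list(range(100))))                        # all-distinct stream -> log2(100) = 6.64 bits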
Information-Theoretic Measures (2/7) • Conditional Entropy: H(X|Y) = −Σx,y P(x, y) · log2 P(x|y) • Let X be a collection of sequences where each is (e1, e2, …, en−1, en) and each ei is an audit event; let Y be the collection of prefix subsequences where each is (e1, e2, …, ek), k < n • H(X|Y) tells us how much uncertainty remains for the rest of the audit events in a sequence x after we have seen its prefix y
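A minimal estimator sketch (mine, not the paper's code): taking X to be the length-(k+1) subsequences and Y their length-k prefixes, H(X|Y) reduces to the uncertainty of the next event given the preceding k events:

import math
from collections import Counter

def conditional_entropy(sequences, k):
    """H(next event | preceding k events), estimated over all sequences (bits)."""
    joint, context = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - k):
            ctx = tuple(seq[i:i + k])
            joint[(ctx, seq[i + k])] += 1
            context[ctx] += 1
    total = sum(joint.values())
    # H(X|Y) = -sum over (x, y) of P(x, y) * log2 P(x|y)
    return -sum((c / total) * math.log2(c / context[ctx])
                for (ctx, _), c in joint.items())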
Information-Theoretic Measures (3/7) • Relative Entropy: relEntropy(p|q) = Σx p(x) · log2(p(x)/q(x)) • Relative entropy measures the distance between the regularities of two datasets, e.g., a training dataset and a testing dataset
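The same idea in code (a hedged sketch; it assumes every event in the first dataset also appears in the second, otherwise the distance is infinite):

import math
from collections import Counter

def relative_entropy(p_events, q_events):
    """relEntropy(p|q): KL divergence between two datasets' event distributions (bits)."""
    p, q = Counter(p_events), Counter(q_events)
    n_p, n_q = len(p_events), len(q_events)
    return sum((c / n_p) * math.log2((c / n_p) / (q[e] / n_q))
               for e, c in p.items())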
Information-Theoretic Measures (4/7) • When we use conditional entropy to measure the regularity of sequential dependencies, we can use relative conditional entropy to measure the distance between two audit datasets
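For reference, the standard definition, consistent with the measures above:

relCondEntropy(p|q) = Σx,y p(x, y) · log2( p(x|y) / q(x|y) )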
Information-Theoretic Measures (5/7) • Intrusion detection can be cast as a classification problem • When constructing a classifier, a classification algorithm needs to search for features with high information gain • When the dataset is partitioned according to the values of such a feature, the subsets have lower entropy
Information-Theoretic Measures (6/7) • Information Gain: Gain(X, A) = H(X) − Σv∈Values(A) (|Xv|/|X|) · H(Xv), where Xv is the subset of X whose feature A has value v
Information Gain • Worked example: 16 records, 4 in one class and 12 in the other; candidate features are age, gender, and household income • H(X) = −((4/16)·log2(4/16) + (12/16)·log2(12/16)) = 0.8113 • E(age) = (6/16)·H(age < 35) + (10/16)·H(age > 35) = 0.7946 • Gain(age) = H(X) − E(age) = 0.0167 • Gain(gender) = 0.0972 • Gain(household income) = 0.0177 • Gender has the highest gain, so it would be chosen as the splitting feature
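The arithmetic can be checked with a short script. The per-partition class counts for age below are my assumptions (1 of 6 records positive for age < 35, 3 of 10 for age > 35), chosen to reproduce the slide's E(age) = 0.7946:

import math

def entropy_from_counts(counts):
    """Shannon entropy (bits) of a class distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(parent_counts, partitions):
    """Gain = H(parent) - expected entropy of the partitions."""
    total = sum(parent_counts)
    expected = sum(sum(p) / total * entropy_from_counts(p) for p in partitions)
    return entropy_from_counts(parent_counts) - expected

print(entropy_from_counts([4, 12]))                 # H(X) = 0.8113
print(information_gain([4, 12], [[1, 5], [3, 7]]))  # Gain(age) = 0.0167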
Information-Theoretic Measures (7/7) • Intuitively, the more information we have, the better the detection performance • There is always a cost for any gain • We can define information cost as the average time for processing an audit record and checking against the detection model
UNM sendmail System Call Data (1/6) • University of New Mexico (UNM) sendmail system call data • Each trace contains the consecutive system calls made by the run-time processes • Used the first 80% of the traces as the training data and the last 20% as part of the testing data
UNM sendmail System Call Data (2/6) • H(length-n sequences | their length-(n−1) prefixes) • Measures the regularity of how the first n−1 system calls determine the n-th system call => Conditional entropy drops as the sequence length increases (see the sketch below)
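This trend can be reproduced with the conditional_entropy() sketch from the measures section above; the trace below is a made-up stand-in for the UNM data:

traces = [["open", "read", "mmap", "read", "close"] * 20]  # hypothetical periodic trace
for n in range(2, 7):
    print(n, conditional_entropy(traces, n - 1))  # drops toward 0 as n grows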
UNM sendmail System Call Data (3/6) • For normal data, the trend of misclassification rate coincides with the trend of conditional entropy
UNM sendmail System Call Data (4/6) • Misclassification rates for the intrusion traces are much higher • This suggests that we can use the range of the misclassification rate as an indicator of whether a given trace is normal or an intrusion (see the sketch below)
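A hedged sketch of that detection rule (the predictor and threshold are my simplifications, not the paper's exact classifier): learn the most frequent n-th call for each length-(n−1) context, then flag traces whose misclassification rate leaves the range observed on normal data:

from collections import Counter, defaultdict

def train_predictor(traces, k):
    """Map each length-k context to its most frequent next system call."""
    nxt = defaultdict(Counter)
    for t in traces:
        for i in range(len(t) - k):
            nxt[tuple(t[i:i + k])][t[i + k]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in nxt.items()}

def misclassification_rate(model, trace, k):
    """Fraction of positions where the predicted next call is wrong."""
    wrong = total = 0
    for i in range(len(trace) - k):
        total += 1
        if model.get(tuple(trace[i:i + k])) != trace[i + k]:
            wrong += 1
    return wrong / total if total else 0.0

THRESHOLD = 0.3  # assumed value; the paper derives the normal range empirically

def is_intrusion(model, trace, k):
    return misclassification_rate(model, trace, k) > THRESHOLD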
UNM sendmail System Call Data (5/6) • When the training and testing normal datasets differ more, the misclassification rate on the testing normal data is also higher
UNM sendmail System Call Data (6/6) • The cost is a linear function of the sequence length • Length ↑, accuracy ↑, but cost ↑ as well
MIT Lincoln Lab sendmail BSM Data (1/6) • BSM data developed and distributed by MIT Lincoln Lab for the 1999 DARPA evaluation • Each audit record corresponds to a system call made by sendmail • Contains additional information (e.g., user and group IDs, the object name)
MIT Lincoln Lab sendmail BSM Data (2/6) • UNM data: (s1, s2, …, sl) • BSM data: • so: (s1_o1, s2_o2, …, sl_ol) • s-o: (s1, o1, s2, o2, …, sl, ol) • s: system call, o: object name (system, user, or other)
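The two encodings are easy to construct; a small sketch with made-up call and object values:

def encode_so(calls, objs):
    """'so' encoding: fuse each system call with its object category."""
    return [f"{s}_{o}" for s, o in zip(calls, objs)]

def encode_s_o(calls, objs):
    """'s-o' encoding: interleave system calls and object categories."""
    out = []
    for s, o in zip(calls, objs):
        out.extend([s, o])
    return out

calls = ["open", "read", "close"]
objs = ["user", "user", "system"]
print(encode_so(calls, objs))   # ['open_user', 'read_user', 'close_system']
print(encode_s_o(calls, objs))  # ['open', 'user', 'read', 'user', 'close', 'system']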
MIT Lincoln Lab sendmail BSM Data (3/6) • Conditional entropy drops as sequence length increases
MIT Lincoln Lab sendmail BSM Data (4/6) • For in-bound mails, the testing data have clearly higher misclassification rates than the training data
MIT Lincoln Lab sendmail BSM Data (5/6) • Out-bound mails have much smaller relative conditional entropy than in-bound mails
MIT Lincoln Lab sendmail BSM Data (6/6) • Although performance with the object name is slightly better, once cost is considered it is actually better to use the system call name only
MIT Lincoln Lab Network Data (1/4) • tcpdump data developed and distributed by MIT Lincoln Lab for the 1998 DARPA evaluation • Each record describes a connection using the following features: timestamp, duration, source port, source host, service…
MIT Lincoln Lab Network Data (2/4) • Destination host was used for partitioning the data into per-host subsets
MIT Lincoln Lab Network Data (3/4) • Intrusion datasets have much higher misclassification rates • Models built from the (more) partitioned datasets perform much better
MIT Lincoln Lab Network Data (4/4) • Conditional entropy decreases as the window size grows
Conclusion • Proposed information-theoretic measures (entropy, conditional entropy, relative entropy, information gain, and information cost) for anomaly detection
Comments • Provides theoretical foundations and backs claims with quantitative results • Plentiful experimental results