Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
Data Mining for Intrusion Detection
• Connection records, e.g. (telnet, 10, 3, ...) and (ftp, 10, 20, ...), go through feature construction.
• Existing connections are labeled to form the training data.
• An inductive learner builds the intrusion detection model from the training data.
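The pipeline above can be sketched in a few lines. The field names, label names, and the scikit-learn decision tree (a stand-in for the RIPPER rule learner used later in the experiments) are illustrative assumptions, not the authors' actual feature set.

```python
# Minimal sketch of the mining pipeline; field and label names are illustrative only,
# and a decision tree stands in for the RIPPER rule learner used in the experiments.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Connection records after feature construction (service name plus constructed counts).
records = [
    {"service": "telnet", "count": 10, "srv_count": 3},
    {"service": "ftp",    "count": 10, "srv_count": 20},
]
labels = ["normal", "guess_passwd"]  # labels assigned to the existing connections

vec = DictVectorizer(sparse=False)   # turn symbolic + numeric features into vectors
X = vec.fit_transform(records)

# The inductive learner builds the intrusion detection model from the training data.
model = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(model.predict(vec.transform([{"service": "ftp", "count": 10, "srv_count": 20}])))
```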
Some interesting requirements ...
• New types of intrusions are constantly invented by hackers.
• The most recent examples are the coordinated attacks on many e-business websites in 2000.
• Hackers tend to use new types of intrusions that the intrusion detection system is unaware of or weak at detecting.
• Data mining for intrusion detection is a very data-intensive process: very large data, evolving patterns, real-time detection.
Question
• When new types of intrusions are invented, can we quickly adapt our existing model to detect these new intrusions before they cause more damage?
• If we don't have a solution, the new attack will cause significant damage.
• For this kind of problem, having a solution that is not completely satisfactory is better than having no solution.
Naive Approach - Complete Re-training
• Merge the existing training data with the new data.
• Run the inductive learner on the merged training data to build a NEW intrusion detection model.
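A sketch of the naive strategy under a hypothetical toy setup (arrays, labels, and the decision-tree learner are assumptions): everything, old plus new data, is merged and the learner is re-run from scratch, which is exactly what makes this slow on very large data.

```python
# Naive complete re-training (hypothetical toy arrays): merge old and new data,
# then re-run the learner on everything from scratch.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for the inductive learner

X_existing = np.array([[0.0, 1.0], [1.0, 0.0]])
y_existing = np.array(["normal", "smurf"])
X_new = np.array([[0.5, 0.5]])
y_new = np.array(["new_attack"])

X_merged = np.vstack([X_existing, X_new])
y_merged = np.concatenate([y_existing, y_new])
new_ids_model = DecisionTreeClassifier(random_state=0).fit(X_merged, y_merged)
```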
Problem with the Naive Approach
• Since the data (existing plus new) will be very large, it takes a long time to compute a detection model.
• By the time the model is constructed, the new attack will probably have already done significant damage to our system.
New Approach
• Train a NEW model from the new data with the learner, then combine it with the existing model into a combined model.
• Key point: we compute a model only from the data on the new types of intrusions.
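A hedged sketch of this step: only the (small) new data is used to train the NEW model, so training is fast, while the existing model is kept as-is. The toy data, variable names, and decision-tree learner are assumptions; how the two models are combined is covered on the next slide.

```python
# New-approach sketch: train the NEW model from the new-intrusion data only
# (the existing model is kept unchanged and reused).
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for the learner

X_new = np.array([[0.5, 0.5], [0.1, 0.9]])
y_new = np.array(["new_attack", "normal"])

h2_new_model = DecisionTreeClassifier(random_state=0).fit(X_new, y_new)  # small, fast job
```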
How do we label connections?
• A new connection is first classified by the existing model.
• If the connection type is recognized, it is labeled as normal or as one of the previously known intrusion types.
• If the connection type is unrecognized, the connection is passed to the NEW model, which labels it as normal or as one of the new intrusion types (see the sketch below).
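The routing on this slide can be written as a small function. It assumes the existing model emits an explicit "anomaly" label for unrecognized connections (per the next slide) and that both models expose a scikit-learn-style predict interface; both are assumptions for illustration.

```python
# Two-stage labeling sketch: the existing model H1 handles recognized connection types;
# anything it flags as "anomaly" is passed on to the NEW model H2.
def label_connection(x, h1, h2):
    label = h1.predict([x])[0]
    if label != "anomaly":
        return label              # normal or a previously known intrusion type
    return h2.predict([x])[0]     # normal or one of the new intrusion types
```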
Basic Idea
• The existing model is built to identify THREE classes:
• normal,
• some type of intrusion,
• and anomaly: a connection that is neither normal nor any known type of intrusion.
• For anomaly detection we use the artificial anomaly generation method (Fan et al., ICDM 2001).
Anomaly Detection
• Generate "artificial anomalies" from the training data, similar to "near misses".
• Artificial anomalies are data points that are different from the training data.
• The algorithm concentrates on feature values that are infrequent in the training data, as in the sketch below.
• Distribution-based Artificial Anomaly (Fan et al., ICDM 2001)
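A simplified sketch of the idea behind distribution-based artificial anomaly generation. The actual algorithm in Fan et al. (ICDM 2001) is more involved, so treat this as an illustration of "perturb a real record toward infrequent feature values", not the published procedure.

```python
# Simplified artificial-anomaly sketch: copy a real training record and replace one
# feature with a value that is infrequent for that feature, yielding a "near miss".
import random
from collections import Counter

def artificial_anomalies(records, n, seed=0):
    rng = random.Random(seed)
    features = list(records[0].keys())
    value_counts = {f: Counter(r[f] for r in records) for f in features}
    anomalies = []
    for _ in range(n):
        rec = dict(rng.choice(records))                           # start from a real record
        f = rng.choice(features)
        rarest = min(value_counts[f], key=value_counts[f].get)    # an infrequent value
        rec[f] = rarest
        anomalies.append(rec)                                     # label these as "anomaly"
    return anomalies
```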
Four Configurations
• H1(x): existing model.
• H2(x): new model.
• The configurations differ in how H2(x) is computed, how H1(x) and H2(x) are combined, and how a connection is processed and classified.
Experiment
• 1998 DARPA Intrusion Detection Evaluation Dataset
• 22 different types of intrusions.
Experiment
• A sequence introduces intrusions into the training data to simulate new intrusions being invented and launched by hackers.
• There are 22! unique sequences; we randomly chose 3 unique sequences.
• The results are averaged.
• Learner: RIPPER with unordered rule sets.
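The sequence setup can be sketched as sampling random orderings of the 22 intrusion types; the type names here are placeholders, not the DARPA attack names.

```python
# Sketch of drawing intrusion-introduction orders: 22 types give 22! possible sequences,
# of which 3 are sampled at random and the results averaged. Names are placeholders.
import random

intrusion_types = [f"attack_{i}" for i in range(22)]
sequences = [random.sample(intrusion_types, len(intrusion_types)) for _ in range(3)]
```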
Measurements
• All results are on the new intrusion types.
• Precision: if I catch a potential thief, what is the probability that it is a real thief?
• Recall: what is the probability that real thieves are detected?
• Anomaly detection rate: fraction of new intrusions classified as anomaly.
• Other: fraction classified as other types of intrusions (a sketch of these measurements follows).
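One way these measurements could be computed over connections whose true type is the new intrusion; the label conventions ("anomaly", "normal", exact type names) and the per-type framing are assumptions, not the authors' evaluation code.

```python
# Measurement sketch over connections whose true label is the new intrusion type.
def metrics_on_new_type(pred, truth, new_type):
    caught = [t for p, t in zip(pred, truth) if p == new_type]   # flagged as the new type
    actual = [p for p, t in zip(pred, truth) if t == new_type]   # really the new type
    precision = sum(t == new_type for t in caught) / len(caught) if caught else 0.0
    recall = sum(p == new_type for p in actual) / len(actual) if actual else 0.0
    anomaly_rate = sum(p == "anomaly" for p in actual) / len(actual) if actual else 0.0
    other = [p for p in actual if p not in (new_type, "anomaly", "normal")]
    other_rate = len(other) / len(actual) if actual else 0.0
    return precision, recall, anomaly_rate, other_rate
```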
Summary of results
• The most accurate is Configuration 1, where
• the new model is trained from normal data and the new intrusion type, and
• everything the old model predicts as normal or anomaly is examined by the new model.
• Reason:
• The existing model's precision in detecting normal connections influences the combined model's accuracy.
• The new data is limited in amount, so the artificial anomalies generated from the new data are limited as well.
Related Work (incomplete list)
• Anomaly Detection:
• SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events.
• Forrest et al. use the absence of subsequences to measure abnormality.
• Lane and Brodley employ a similar approach but use incremental learning to update stored sequences of UNIX shell commands.
• Ghosh and Schwarzbard use neural networks to learn a profile of normality and a distance function to detect abnormality.
• Generating Artificial Data:
• Nigam et al. assign labels to unlabelled data using a classifier trained from labeled data.
• Chang and Lippman applied voice transformation techniques to add artificial training talkers to increase variability.
• Multiple Classifiers:
• Asker and Maclin, "Ensembles as a Sequence of Classifiers".
Summary and Future Work
• Proposed a two-step, two-classifier approach for efficient training and fast model deployment.
• Empirically tested it in the intrusion detection domain.
• Need to test whether it works well for other domains.