
North Carolina State University Columbia University Florida Institute of Technology

A Data Mining Approach for Building Cost-Sensitive and Light Intrusion Detection Models. PI Meeting, July 2000.


Presentation Transcript


  1. A Data Mining Approach for Building Cost-Sensitive and Light Intrusion Detection Models • PI Meeting, July 2000 • North Carolina State University, Columbia University, Florida Institute of Technology

  2. Overview • Project description • Progress report: • correlation • cost-sensitive modeling • anomaly detection • collaboration with industry • Plan of work for 2000-2001

  3. New Ideas/Hypotheses • High-volume automated attacks can overwhelm an IDS and its staff. • Use cost-sensitive data mining algorithms to construct ID models that consider cost factors: • damage cost, response cost, operational cost, etc. • Multiple specialized and light ID models can be dynamically activated/configured at run time. • Cost-effectiveness as the guiding principle and multi-model correlation as the architectural approach.

  4. Impact • A better understanding of the cost factors, cost models, and cost metrics related to intrusion detection. • Modeling techniques and deployment strategies for cost-effective IDSs. • “Clustering” techniques for grouping intrusions and building specialized and light models. • An architecture for dynamically activating, configuring, and correlating ID models.

  5. Correlation: Model and Issues • “Good” base models: data sources and modeling techniques. • The combined model: the correlation algorithms and network topology. • (Figure labels: “across sources”, “across time/sources”.)

  6. Correlation: Approaches • Extend previous work in JAM • A sequence of time-stamped records • each is composed of signals from multiple sensors (network topology information embedded); • Apply data mining techniques to learn how to correlate the signals to generate a combined sensor: • link analysis, sequence analysis, machine learning (classification), etc.

  7. Correlation: Integrating NM and ID Signals • A stream of measures (anomaly reports) on MIB variables of network elements and a stream of ID signals: • Better coverage; • Early sensing of attacks. • Normal measures of network traffic and parameter values of ID signatures: • $S = f(N, A)$; if $A$ is invariant, then $S = g(N)$. • Automatic parameter adjustment: $S_1 = g(N_1)$.

  8. Cost Factors of IDSs • Attack taxonomy: result/target/technique • Development cost • Damage cost (DCost) • The amount of damage when ID is not available or is ineffective. • Response cost (RCost) • The cost of acting upon an alarm of potential intrusion. • Operational cost (OpCost) • The cost of processing and analyzing audit data; • Mainly the computational costs of the features.

  9. Cost Models of IDSs • The total cost of an IDS over a set of events: • $\mathrm{CumulativeCost}(E) = \sum_{e \in E} \big(\mathrm{CCost}(e) + \mathrm{OpCost}(e)\big)$ • $\mathrm{CCost}(e)$, the consequential cost, depends on the prediction made on event $e$.
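A minimal Python sketch of the cumulative cost model, assuming each event already carries its consequential cost (the CCost table itself is not reproduced here) and approximating OpCost(e) as the summed computational cost of the features used to classify e, as the cost-factor slide suggests. The `Event` class, the function names, and any feature costs passed in are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    ccost: float  # consequential cost; depends on the prediction made for this event
    features_used: List[str] = field(default_factory=list)  # features computed to classify it


def op_cost(event: Event, feature_cost: Dict[str, float]) -> float:
    """OpCost(e), approximated as the summed computational cost of the distinct features used."""
    return sum(feature_cost[f] for f in set(event.features_used))


def cumulative_cost(events: List[Event], feature_cost: Dict[str, float]) -> float:
    """CumulativeCost(E) = sum over e in E of (CCost(e) + OpCost(e))."""
    return sum(e.ccost + op_cost(e, feature_cost) for e in events)
```

With illustrative feature costs of 1 for `duration` and 10 for `num_failed_logins`, an event with CCost 0 that uses only `duration` contributes 1, and an event with CCost 5 that uses both contributes 16.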

  10. Consequential Cost (CCost) • For event e:

  11. Cost-sensitive Modeling: Objectives • Reducing operational costs: • Use cheap features in ID models. • Reducing consequential costs: • Do not respond to an intrusion if RCost > DCost.

  12. Cost-sensitive Modeling: Approaches • Reducing operational costs: • A multiple-model approach: • Build multiple rule-sets, each with features of different cost levels; • Use cheaper rule-sets first, and costlier ones only when needed for the required accuracy. • Feature-Cost-Sensitive Rule Induction: • Search heuristic considers information gain AND feature cost.
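The multiple-model approach can be sketched as a cascade: rule sets are tried from cheapest to costliest, and evaluation stops at the first one whose prediction is confident enough, so costly features are computed only when needed. This is a hypothetical sketch, not the project's implementation; the `RuleSet` interface, the confidence scores, and the `min_confidence` cutoff are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (label, confidence) returned by a rule set; this interface is hypothetical.
Prediction = Tuple[str, float]


@dataclass
class RuleSet:
    feature_cost: float  # OpCost of the extra features this level needs
    predict: Callable[[Dict[str, object]], Prediction]


def cascade_predict(rule_sets: List[RuleSet], record: Dict[str, object],
                    min_confidence: float = 0.9) -> Tuple[str, float]:
    """Try rule sets from cheapest to costliest; stop once one is confident enough.

    Returns the predicted label and the operational cost actually spent.
    """
    label, spent = "normal", 0.0
    for rs in rule_sets:  # assumed already sorted by feature_cost, cheapest first
        spent += rs.feature_cost  # features at this level must now be computed
        label, confidence = rs.predict(record)
        if confidence >= min_confidence:
            break  # cheaper model is good enough; skip the costlier ones
    return label, spent
```

A confidence cutoff is one simple way to decide that a cheaper rule set is “good enough”; other stopping criteria (for example, per-rule precision) would fit the same loop.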

  13. Cost-sensitive Modeling: Approaches (continued) • Reducing consequential costs: • MetaCost: • Purposely re-label intrusions with RCost > DCost as normal. • Post-Detection decision: • Action depends on comparison of RCost and DCost.
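Both consequential-cost strategies reduce to comparing RCost with DCost for the predicted attack class. The sketch below assumes per-class cost tables (`dcost`, `rcost`) supplied by the caller; the function names are hypothetical, and the relabeling helper implements only the rule stated on the slide, not Domingos's full MetaCost procedure (which relabels training examples using class-probability estimates from bagged models).

```python
from typing import Dict, Hashable, List, Tuple

Example = Tuple[Hashable, str]  # (features, label); "normal" marks non-intrusions


def relabel_for_training(examples: List[Example], dcost: Dict[str, float],
                         rcost: Dict[str, float]) -> List[Example]:
    """Re-label intrusions whose response cost exceeds their damage cost as normal."""
    return [(x, "normal" if y != "normal" and rcost[y] > dcost[y] else y)
            for x, y in examples]


def respond(predicted: str, dcost: Dict[str, float], rcost: Dict[str, float]) -> bool:
    """Post-detection decision: act on an alarm only if DCost >= RCost."""
    return predicted != "normal" and dcost[predicted] >= rcost[predicted]
```

The first function changes what the model learns, while the second leaves the model alone and filters its alarms at run time.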

  14. Latest Results • OpCost • Compare the multiple-model approach with the single-model approach; • rdc% (reduction): (single - multiple)/single; • range: 57% to 79%.
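The rdc% figure is a relative reduction. As a worked sketch (the function name is hypothetical), a single-model OpCost of 100 and a multiple-model OpCost of 30 would give rdc% = 70%; these numbers are illustrative only, not results from the experiments.

```python
def rdc_percent(single_cost: float, multiple_cost: float) -> float:
    """Reduction of the multiple-model cost relative to the single-model cost, in percent."""
    if single_cost <= 0:
        raise ValueError("single-model cost must be positive")
    return 100.0 * (single_cost - multiple_cost) / single_cost
```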

  15. Latest Results (continued) • CCost using a post-detection cost-sensitive decision module • rdc% range: 75% to 95%; • Compared with the single model: slightly better rdc%; • Compared with cost-insensitive models: 25% higher rdc%.

  16. Anomaly Detection • Unsupervised Training Methods • Build models over noisy (not clean) data • Artificial Anomalies • Improves performance of anomaly detection methods. • Combining misuse and anomaly detection.

  17. AD over Noisy Data • Builds normal models over data containing some anomalies. • Motivating Assumptions: • Intrusions are extremely rare compared to normal data. • Intrusions are quantitatively different.

  18. Approach Overview • Mixture Model • Normal Component • Anomalous Component • Build Probabilistic Model of Data • Max Likelihood test for detection.

  19. Mixture Model of Anomalies • Assume a generative model: the data are generated with a probability distribution D. • Each element originates from one of two components: • M, the Majority Distribution ($x \in M$); • A, the Anomalous Distribution ($x \in A$). • Thus: $D = (1-\lambda)M + \lambda A$, where $\lambda$ is the probability that an element is anomalous.

  20. Modeling Probability Distributions • Train probability distributions over the current sets M and A. • $P_M(X)$ = probability distribution for the majority • $P_A(X)$ = probability distribution for the anomalies • Any probability modeling method can be used: Naïve Bayes, Max Entropy, etc.

  21. Detecting Anomalies • Likelihood of a partition of the set of all elements D into M and A: $L(D) = \prod_{x_i \in D} P_D(x_i) = \Big((1-\lambda)^{|M|} \prod_{x_i \in M} P_M(x_i)\Big)\Big(\lambda^{|A|} \prod_{x_j \in A} P_A(x_j)\Big)$ • Log likelihood (for computational reasons): $LL(D) = \log L(D)$
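Writing λ for the mixing weight of the anomalous component and taking logarithms, the product form of the likelihood splits into sums over M and A (this follows directly from the log of a product):

$$
LL(D) = |M|\log(1-\lambda) + \sum_{x_i \in M} \log P_M(x_i) + |A|\log\lambda + \sum_{x_j \in A} \log P_A(x_j)
$$

Moving one element $x_t$ from M to A lowers $|M|$ by one, raises $|A|$ by one, adds $\log\lambda + \log P_A(x_t)$, and requires re-estimating $P_M$ over the remaining elements of M.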

  22. Algorithm for Detection • Assume all elements are normal ($M_0 = D$, $A_0 = \emptyset$). • Compute $P_D(X)$. • Using $P_D(X)$, compute $LL(D)$. • For each element, compute the difference in $LL(D)$ if it is removed from M and inserted into A. • If the difference is large enough, declare the element an anomaly.

  23. Evaluating $x_t$ • $M_{t+1} = M_t \setminus \{x_t\}$ • $A_{t+1} = A_t \cup \{x_t\}$ • Recompute $P_{M_{t+1}}$ and $P_{A_{t+1}}$ (efficiently). • If $LL_{t+1} - LL_t > \text{threshold}$, $x_t$ is an anomaly; • Otherwise $x_t$ is normal.
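A minimal runnable sketch of the detection loop, under stated assumptions: records are discrete symbols (for example, tuples of categorical features), P_M is a Laplace-smoothed categorical estimate standing in for the Naïve Bayes or maximum-entropy models mentioned earlier, P_A is uniform over the observed values, and the defaults for λ (`lam`) and the threshold are illustrative. For clarity it recomputes LL from the counts at each step rather than updating it incrementally. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def _log_lik(m_counts: Counter, n_m: int, n_a: int, vocab_size: int,
             lam: float, alpha: float) -> float:
    """LL(D) for a partition into M (value counts m_counts, size n_m) and A (size n_a)."""
    denom = n_m + alpha * vocab_size
    # Sum of log P_M(x) over M, with P_M a Laplace-smoothed estimate from M itself.
    ll_m = sum(c * math.log((c + alpha) / denom) for c in m_counts.values() if c > 0)
    ll = n_m * math.log(1.0 - lam) + ll_m
    # Uniform anomaly distribution: every element of A has P_A = 1 / vocab_size.
    ll += n_a * (math.log(lam) - math.log(vocab_size))
    return ll


def detect_anomalies(data: Sequence[Hashable], lam: float = 0.05,
                     threshold: float = 0.5, alpha: float = 1.0) -> List[int]:
    """Return the indices of elements whose move from M to A raises LL by > threshold."""
    vocab_size = len(set(data))
    m_counts = Counter(data)  # M_0 = D
    n_m, n_a = len(data), 0   # A_0 is empty
    ll = _log_lik(m_counts, n_m, n_a, vocab_size, lam, alpha)
    anomalies: List[int] = []
    for t, x in enumerate(data):
        m_counts[x] -= 1  # tentatively move x_t from M to A
        new_ll = _log_lik(m_counts, n_m - 1, n_a + 1, vocab_size, lam, alpha)
        if new_ll - ll > threshold:
            n_m, n_a, ll = n_m - 1, n_a + 1, new_ll  # keep x_t in A
            anomalies.append(t)
        else:
            m_counts[x] += 1  # revert: x_t stays in M as normal
    return anomalies
```

On `["a"] * 95 + ["b"] * 4 + ["z"]`, only the single `"z"` (index 99) is flagged: moving it to A raises LL by about 0.86, while moving an `"a"` or a `"b"` lowers it.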

  24. Experiments • Two sets of experiments: • Measured performance against comparison methods, all trained over noisy data. • Measured performance when trained over noisy data, against comparison methods trained over clean data.

  25. AD Using Artificial Anomalies • Generate abnormal behavior artificially • assume the given normal data are representative • “near misses” of normal behavior are considered abnormal • change the value of only one feature in an instance of normal behavior • sparsely represented values are sampled more frequently • “near misses” help define a tight boundary enclosing the normal behavior
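A minimal sketch of near-miss generation, assuming records are tuples of categorical feature values. The specific sampling rule is an assumption: for each feature, a value seen c times receives (max_count - c) near misses, which is one way to make sparsely represented values sampled more frequently. Each near miss copies a random normal record and changes exactly one feature; results that reproduce an observed normal record are discarded. The function name is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def generate_near_misses(normal: Sequence[Record], seed: int = 0) -> List[Record]:
    """Create artificial anomalies by changing exactly one feature of a normal record."""
    rng = random.Random(seed)
    normal_set = set(normal)
    anomalies: List[Record] = []
    for f in range(len(normal[0])):
        counts = Counter(r[f] for r in normal)
        max_count = max(counts.values())
        for value, count in counts.items():
            # Normal records whose feature f differs from `value` can receive it.
            donors = [r for r in normal if r[f] != value]
            if not donors:
                continue
            # Assumed rule: sparser values get more near misses.
            for _ in range(max_count - count):
                base = rng.choice(donors)
                near_miss = base[:f] + (value,) + base[f + 1:]
                # A near miss identical to an observed normal record is not anomalous.
                if near_miss not in normal_set:
                    anomalies.append(near_miss)
    return anomalies
```

The generated records would then be labeled as anomalies and added to the normal data before training a rule learner.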

  26. Experimental Results • Learning algorithm: RIPPER rule learner. • Data: 1998/99 DARPA evaluation • U2R, R2L, DOS, PRB: 22 “clusters” • Training data: normal and artificial anomalies • Results • Overall hit rate: 94.26% (correctly classified as normal or intrusion) • Overall false alarm rate: 2.02% • 100% detection: buffer_overflow, guess_passwd, phf, back • 0% detection: perl, spy, teardrop, ipsweep, nmap • 50+% detection: 13 out of 22 intrusion subclasses

  27. Combining Anomaly And Misuse Detection • Training data: normal, artificially generated anomalies, known intrusions • The learned model can predict normal, anomaly, or known intrusion subclass • Experiments were performed on increasing subsets of known intrusion subclasses in the training data (simulating intrusions being identified over time).

  28. Combining Anomaly And Misuse Detection (continued) • Suppose phf, pod, teardrop, spy, and smurf are unknown (absent from the training data). • Anomaly detection rate: phf = 25%, pod = 100%, teardrop = 93.91%, spy = 50%, smurf = 100% • Overall false alarm rate: 0.20% • The false alarm rate dropped from 2.02% to 0.20% when some known attacks were included in training.

  29. Collaboration with Industry • RST Inc. • Anomaly detection on NT systems • NFR Inc. • real-time IDS • SAS Institute • off-line ID (funded by SAS) • Aprisma (Cabletron) • Integrating ID with NM (funded by Aprisma) • HRL Labs • ID in wireless networks (funded by HRL)

  30. Plan for 2000-2001 • Dynamic cost-sensitive modeling and deployment • work with industry for realistic cost analysis and real-time testing • Anomaly detection • improve existing algorithms using feedback from evaluation • Correlation • develop/evaluate algorithms for integrating data/evidence from multiple sources
