
Learning to Detect Computer Intrusions with (Extremely) Few False Alarms




1. Learning to Detect Computer Intrusions with (Extremely) Few False Alarms
Jude Shavlik and Mark Shavlik

2. Two Basic Approaches for Intrusion Detection Systems (IDS)
• Pattern Matching
  • If packet contains “site exec” and … then sound alarm
  • Famous example: SNORT.org
  • Weakness: don’t (yet) have patterns for new attacks
• Anomaly Detection
  • Usually based on statistics measured during normal behavior
  • Weakness: does anomaly = intrusion?
• Both approaches often suffer from too many false alarms
  • Admins ignore an IDS when flooded by false alarms

3. How to Get Training Examples for Machine Learning?
• Ideally, get measurements during
  • Normal operation vs.
  • Intrusions
• However, it is hard to define the space of possible intrusions
• Instead, learn from “positive examples only”
  • Learn what’s normal and define all else as anomalous

4. Behavior-Based Intrusion Detection
• Need to go beyond looking solely at external network traffic and log files
  • File-access patterns
  • Typing behavior
  • Choice of programs run
  • …
• Like the human immune system, continually monitor and notice “foreign” behavior

5. Our General Approach
• Identify the ≈unique characteristics of each user’s/server’s behavior
• Every second, measure 100’s of Windows 2000 properties
  • in/out network traffic, programs running, keys pressed, kernel usage, etc.
• Predict Prob( normal | measurements )
• Raise an alarm if recent measurements seem unlikely for this user/server

6. Goal: Choose a “Feature Space” that Widely Separates the User from the General Population
• Choose a separate set of “features” for each user
[Figure: in the space of possible measurements in the chosen space, the specific user occupies a region well separated from the general population]

7. What We’re Measuring (in Windows 2000)
• Performance Monitor (Perfmon) data
  • File bytes written per second
  • TCP/IP/UDP/ICMP segments sent per second
  • System calls per second
  • # of processes, threads, events, …
• Event-Log entries
• Programs running, CPU usage, working-set size
  • MS Office, Wordpad, Notepad
  • Browsers: IE, Netscape
  • Program development tools, …
• Keystroke and mouse events

8. Temporal Aggregates
• Actual value measured
• Average of the previous 10 values
• Average of the previous 100 values
• Difference between current value and previous value
• Difference between current value and average of the last 10
• Difference between current value and average of the last 100
• Difference between the averages of the previous 10 and previous 100
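A minimal sketch of computing these seven aggregates over a per-second measurement stream (illustrative only; the function and variable names are not from the paper):

    from collections import deque

    def temporal_aggregates(stream):
        # Derive the seven per-feature aggregates listed above from a
        # time-ordered stream of raw, once-per-second measurements.
        last10, last100 = deque(maxlen=10), deque(maxlen=100)
        prev = None
        for x in stream:
            avg10 = sum(last10) / len(last10) if last10 else x
            avg100 = sum(last100) / len(last100) if last100 else x
            yield (x,                                    # actual value measured
                   avg10,                                # average of previous 10
                   avg100,                               # average of previous 100
                   x - prev if prev is not None else 0,  # current minus previous
                   x - avg10,                            # current minus last-10 average
                   x - avg100,                           # current minus last-100 average
                   avg10 - avg100)                       # last-10 vs. last-100 averages
            last10.append(x)
            last100.append(x)
            prev = x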

9. Using (Naïve) Bayesian Networks
• Learning network structure: too CPU-intensive
  • Plus, naïve Bayes frequently works best
• Test-set results (naïve Bayes)
  • 59.2% of intrusions detected
  • About 2 false alarms per day per user
• This paper’s approach
  • 93.6% detected
  • 0.3 false alarms per day per user
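For reference, a naïve Bayes baseline scores a set of measurements roughly as below (a sketch under the usual conditional-independence assumption; prob_normal is a hypothetical per-feature probability table learned from training data, not an interface from the paper):

    import math

    def naive_bayes_score(measurements, prob_normal):
        # Sum the log probability of each measurement under the user's
        # "normal" model, treating the features as independent.
        # A very low total suggests the behavior is anomalous.
        return sum(math.log(max(prob_normal(i, x), 1e-9))
                   for i, x in enumerate(measurements))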

10. Our Intrusion-Detection Template
[Figure: timeline, in seconds, showing the last W (window width) measurements]
• If score(current measurements) > T, then raise a “mini” alarm
• If the # of “mini” alarms in the window > N, then predict an intrusion
• Use the tuning set to choose good per-user values for T and N
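A sketch of the template’s window logic (illustrative; score values, T, N, and W stand for the per-user scores, threshold, mini-alarm count, and window width):

    from collections import deque

    def detect_intrusions(scores, T, N, W):
        # scores: one score per second. Raise a "mini" alarm whenever a
        # score exceeds T; predict an intrusion when more than N mini
        # alarms fall inside the last W seconds.
        window = deque(maxlen=W)
        for t, s in enumerate(scores):
            window.append(s > T)
            if len(window) == W and sum(window) > N:
                yield t            # predict intrusion at second t
                window.clear()     # per slide 19: clear the window after a check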

11. Methodology for Training and Evaluating Learned Models: Replay of User X’s Behavior
[Figure: flowchart. User X’s recorded behavior is replayed. If the model of user X raises an alarm, count a false alarm; if the model of user Y raises an alarm on user X’s behavior, count an “intrusion” detected.]

12. Learning to Score Windows 2000 Measurements (done for each user)
• Initialize the weight on each feature to 1
• For each training example:
  • Set weightedVotesFOR = 0 and weightedVotesAGAINST = 0
  • If measurement i is “unlikely” (i.e., low probability), then add weight_i to weightedVotesFOR; else add weight_i to weightedVotesAGAINST
  • If weightedVotesFOR > weightedVotesAGAINST, then raise a “mini” alarm
  • If the decision about the intrusion was incorrect, multiply the weights on all measurements that voted incorrectly by ½ (the Winnow algorithm)
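A runnable sketch of this Winnow-style procedure (assumed interface, not from the paper: each training example supplies, for every measurement i, whether it looked “unlikely” under the user’s model, plus the correct intrusion label):

    def train_winnow(examples, n_features):
        # examples: list of (unlikely, is_intrusion) pairs, where
        # unlikely[i] is True if measurement i had low probability.
        weights = [1.0] * n_features            # initialize all weights to 1
        for unlikely, is_intrusion in examples:
            votes_for = sum(w for w, u in zip(weights, unlikely) if u)
            votes_against = sum(w for w, u in zip(weights, unlikely) if not u)
            mini_alarm = votes_for > votes_against
            if mini_alarm != is_intrusion:      # decision was incorrect:
                # halve the weight of every measurement that voted wrongly
                for i, voted_for_alarm in enumerate(unlikely):
                    if voted_for_alarm != is_intrusion:
                        weights[i] *= 0.5
        return weights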

13. Choosing Good Parameter Values
• For each user:
  • Use the training data to estimate probabilities and weight individual measurements
  • Try 20 values for T and 20 values for N
  • For each T × N pairing, compute the detection and false-alarm rates on the tuning set
  • Select the T × N pairing that
    • has a false-alarm rate less than the spec (e.g., 1 per day)
    • has the highest detection rate
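A sketch of this tuning loop (illustrative; evaluate is a hypothetical callback that runs the detector on the tuning set for one (T, N) pairing and returns its detection rate and false alarms per day):

    def choose_T_and_N(evaluate, thresholds, alarm_counts, spec=1.0):
        # Try every T x N pairing (e.g., 20 x 20 = 400 candidates) and
        # keep the one with the highest detection rate among those whose
        # false-alarm rate meets the spec (e.g., 1 per day).
        best, best_detection = None, -1.0
        for T in thresholds:
            for N in alarm_counts:
                detection, false_alarms_per_day = evaluate(T, N)
                if false_alarms_per_day <= spec and detection > best_detection:
                    best, best_detection = (T, N), detection
        return best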

14. Experimental Data
• Subjects
  • Insiders: 10 employees at Shavlik Technologies
  • Outsiders: 6 additional Shavlik employees
• Unobtrusively collected data for 6 weeks
  • 7 GBytes archived
• Task: are the current measurements from user X?

15. Training, Tuning, and Testing Sets
• Very important in machine learning to not use the testing data to optimize parameters!
  • Otherwise you can tune to zero false alarms and high detection rates!
• Train set: first two weeks of data
  • Build a (statistical) model
• Tune set: middle two weeks of data
  • Choose good parameter settings
• Test set: last two weeks of data
  • Evaluate the “frozen” model
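For example, a six-week, time-ordered stream splits chronologically (a sketch; the split must not shuffle, since tuning and testing may never see earlier stages’ data):

    def split_six_weeks(stream):
        # stream: per-second measurements in time order, spanning 6 weeks
        third = len(stream) // 3
        train = stream[:third]           # weeks 1-2: build the statistical model
        tune  = stream[third:2 * third]  # weeks 3-4: pick T, N, and feature weights
        test  = stream[2 * third:]       # weeks 5-6: evaluate the frozen model
        return train, tune, test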

16. Experimental Results on the Test Set

17. Highly Weighted Measurements (% of time in the Top Ten, across users and experiments)
• Number of Semaphores (43%)
• Logon Total (43%)
• Print Jobs (41%)
• System Driver Total Bytes (39%)
• CMD: Handle Count (35%)
• Excel: Handle Count (26%)
• Number of Mutexes (25%)
• Errors Access Permissions (24%)
• Files Opened Total (23%)
• TCP Connections Passive (23%)
• Notepad: % Processor Time (21%)
• 73 measurements occur > 10% of the time

18. Confusion Matrix – Detection Rates (3 of 10 Subjects Shown)

                    INTRUDER
                 A      B      C
    OWNER  A     –     100%   100%
           B    25%     –     100%
           C    91%    94%     –

19. Some Questions
• What if user behavior changes? (Called concept drift in machine learning.)
  • One solution: assign a “half life” to the counts used to compute the probabilities
  • Multiply the counts by f < 1 each day (10/20 vs. 1000/2000; see the sketch after this slide)
• Are the CPU and memory demands too large?
  • Measuring features and updating counts takes < 1% of the CPU
  • Tuning of parameters needs to be done off-line
• How often to check for intrusions?
  • Only check when the window is full, then clear the window
  • Else too many false alarms
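A sketch of the half-life idea (illustrative; counts is assumed to map each feature to the (times seen, total observations) pair used for its probability estimate):

    def decay_counts(counts, half_life_days):
        # Multiply all counts by f < 1 once per day. 10/20 and 1000/2000
        # estimate the same probability today, but the decayed (smaller)
        # counts let new behavior shift the estimate much faster.
        f = 0.5 ** (1.0 / half_life_days)
        return {k: (seen * f, total * f) for k, (seen, total) in counts.items()}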

20. Future Directions
• Measure the system while applying various known intrusion techniques
  • Compare to measurements during normal operation
  • Train on known methods 1, …, N − 1
  • Test using data from known method N
• Analyze simultaneous measurements from a network of computers
• Analyze the impact of the intruder’s behavior changing the recorded statistics
  • Current results: probability of detecting an intruder in the first W seconds

21. Some Related Work on Anomaly Detection
• Machine learning for intrusion detection
  • Lane & Brodley (1998)
  • Ghosh et al. (1999)
  • Lee et al. (1999)
  • Warrender et al. (1999)
  • Agarwal & Joshi (2001)
  • Typically Unix-based
  • Streams of programs invoked or network traffic analyzed
• Analysis of keystroke dynamics
  • Monrose & Rubin (1997)
  • For authenticating passwords

22. Conclusions
• Can accurately characterize individual user behavior using simple models based on measuring many system properties
• Such “profiles” can provide protection without too many false alarms
• Separate data into train, tune, and test sets
• “Let the data decide” good parameter settings on a per-user basis (including which measurements to use)

23. Acknowledgements
• DARPA’s Insider Threat Active Profiling (ITAP) program within the ATIAS program
• Mike Fahland, for help with data collection
• The Shavlik, Inc. employees who allowed collection of their usage data

24. Using Relative Probabilities
Alarm based on the ratio:
    Prob( keystrokes | machine owner ) / Prob( keystrokes | population )
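A sketch of this ratio test (illustrative; the probabilities would come from the owner’s and general-population models, and the 0.1 cutoff is made up for the example):

    def relative_score(p_owner, p_population):
        # How much more probable are the keystrokes under the owner's
        # model than under the general population's? A low ratio suggests
        # someone other than the owner is typing.
        return p_owner / max(p_population, 1e-12)

    # e.g., behavior that fits the population far better than the owner:
    if relative_score(p_owner=0.0002, p_population=0.02) < 0.1:
        print("raise alarm")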

25. Value of Relative Probabilities
• Using relative probabilities
  • Separates rare for this user from rare for everyone
• An example of variance reduction
  • Reduce the variance in a measurement by comparing it to another (e.g., paired t-tests)

26. Tradeoff between False Alarms and Detected Intrusions (ROC Curve)
[Figure: ROC curve with the false-alarm spec marked. Note: the left-most value results from ZERO tune-set false alarms.]

27. Conclusions
• Can accurately characterize individual user behavior using simple models based on measuring many system properties
• Such “profiles” can provide protection without too many false alarms
• Separate data into train, tune, and test sets
• “Let the data decide” good parameter settings on a per-user basis (including which measurements to use)
• Normalize probabilities by general-population probabilities
  • Separates rare for this user (or server) from rare for everyone

28. Outline
• Approaches for Building Intrusion-Detection Systems
• A Bit More on What We Measure
• Experiments with Windows 2000 Data
• Wrapup
