Learning to Detect Computer Intrusions with (Extremely) Few False Alarms. Jude Shavlik and Mark Shavlik.
Two Basic Approaches for Intrusion Detection Systems (IDS) • Pattern Matching • If packet contains “site exec” and … then sound alarm • Famous example: SNORT.org • Weakness: Don’t (yet) have patterns for new attacks • Anomaly Detection • Usually based on statistics measured during normal behavior • Weakness: does anomaly = intrusion? • Both approaches often suffer from too many false alarms • Admins ignore an IDS when flooded by false alarms
How to Get Training Examples for Machine Learning? • Ideally, get measurements during • Normal Operation vs. • Intrusions • However, hard to define space of possible intrusions • Instead, learn from “positive examples only” • Learn what’s normal and define all else as anomalous
Behavior-Based Intrusion Detection • Need to go beyond looking solely at external network traffic and log files • File-access patterns • Typing behavior • Choice of programs run • … • Like human immune system, continually monitor and notice “foreign” behavior
Our General Approach • Identify ≈unique characteristics of each user/server’s behavior • Every second, measure hundreds of Windows 2000 properties • in/out network traffic, programs running, keys pressed, kernel usage, etc. • Predict Prob( normal | measurements ) • Raise alarm if recent measurements seem unlikely for this user/server
Goal: Choose a “Feature Space” that Widely Separates the User from the General Population • Choose a separate set of “features” for each user • [Figure: the specific user’s measurements form a cluster far from the general population in the chosen space of possible measurements]
What We’re Measuring (in Windows 2000) • Performance Monitor (Perfmon) data • File bytes written per second • TCP/IP/UDP/ICMP segments sent per second • System calls per second • # of processes, threads, events, … • Event-Log entries • Programs running, CPU usage, working-set size • MS Office, Wordpad, Notepad • Browsers: IE, Netscape • Program development tools, … • Keystroke and mouse events
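The original collection relied on Windows 2000 Perfmon counters, Event-Log entries, and keyboard/mouse events. As a rough, hypothetical illustration of per-second sampling (not the authors’ code), here is a minimal sketch using the cross-platform psutil package; the particular counters chosen are assumptions:

```python
# A rough, cross-platform stand-in for the Perfmon/Event-Log collection
# (the original measured Windows 2000 counters; psutil and the counters
# below are assumptions for illustration only).
import time
import psutil

def sample_measurements():
    """Return one per-second snapshot as a dict of feature name -> value."""
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=None),
        "num_processes": len(psutil.pids()),
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }

if __name__ == "__main__":
    for _ in range(5):          # collect a few one-second samples
        print(sample_measurements())
        time.sleep(1)
```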
Temporal Aggregates • Actual Value Measured • Average of the Previous 10 Values • Average of the Previous 100 Values • Difference between Current Value and Previous Value • Difference between Current Value and Average of Last 10 • Difference between Current Value and Ave of Last 100 • Difference between Averages of Prev 10 and Prev 100
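A minimal sketch of how these seven per-measurement aggregates could be maintained from a once-per-second stream (illustrative Python, not the original implementation):

```python
from collections import deque

class TemporalAggregates:
    """Derive the seven temporal features listed above from a stream of
    raw values for one measurement (a sketch)."""

    def __init__(self):
        self.last_10 = deque(maxlen=10)
        self.last_100 = deque(maxlen=100)
        self.previous = None

    def update(self, value):
        # Averages are over *previous* values, so compute them before appending.
        avg10 = sum(self.last_10) / len(self.last_10) if self.last_10 else value
        avg100 = sum(self.last_100) / len(self.last_100) if self.last_100 else value
        prev = self.previous if self.previous is not None else value
        features = {
            "actual": value,
            "avg_prev_10": avg10,
            "avg_prev_100": avg100,
            "diff_prev": value - prev,
            "diff_avg_10": value - avg10,
            "diff_avg_100": value - avg100,
            "diff_avg_10_vs_100": avg10 - avg100,
        }
        self.last_10.append(value)
        self.last_100.append(value)
        self.previous = value
        return features
```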
Using (Naïve) Bayesian Networks • Learning network structure too CPU-intensive • Plus, naïve Bayes frequently works best • Test-set results • 59.2% of intrusions detected • About 2 false alarms per day per user • This paper’s approach • 93.6% detected • 0.3 false alarms per day per user
Our Intrusion-Detection Template • Consider the last W (window width) once-per-second measurements • If score(current measurements) > T then raise a “mini” alarm • If # “mini” alarms in window > N then predict intrusion • Use tuning set to choose good per-user values for T and N
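A sketch of this window/threshold template in Python (illustrative; the score function and the tuned values of T, N, and W are assumed inputs):

```python
from collections import deque

def detect(measurement_stream, score, T, N, W):
    """Sliding-window alarm template (a sketch).  score() maps one second's
    measurements to a number; T, N, and W come from per-user tuning."""
    window = deque(maxlen=W)
    for m in measurement_stream:
        window.append(1 if score(m) > T else 0)   # "mini" alarm this second?
        if len(window) == W:                      # check only on a full window,
            yield sum(window) > N                 # True => predict intrusion
            window.clear()                        # then start a fresh window
```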
Methodology for Training and Evaluating Learned Models • Replay User X’s recorded behavior through the learned models • Alarm from the model of User X? If yes, count a false alarm • Alarm from the model of User Y? If yes, count a detected “intrusion”
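A hypothetical sketch of this replay evaluation (the models[...] dictionary and the raises_alarm method are assumed interfaces, not the paper’s code):

```python
def replay_evaluation(users, data, models):
    """Sketch of the replay methodology: each user's recorded behavior is
    run through every learned model.  An alarm from that user's own model
    is a false alarm; an alarm from another user's model counts as a
    detected "intrusion".  (models[u].raises_alarm(...) is an assumed API.)"""
    false_alarms, detections, intrusion_trials = 0, 0, 0
    for behaving_user in users:
        for model_user in users:
            alarmed = models[model_user].raises_alarm(data[behaving_user])
            if model_user == behaving_user:
                false_alarms += int(alarmed)
            else:
                intrusion_trials += 1
                detections += int(alarmed)
    detection_rate = detections / max(intrusion_trials, 1)
    return false_alarms, detection_rate
```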
Learning to Score Windows 2000 Measurements (done for each user) • Initialize weights on each feature to 1 • For each training example do • Set weightedVotesFOR = 0 & weightedVotesAGAINST = 0 • If measurement i is “unlikely” (i.e., low prob) then add weight_i to weightedVotesFOR else add weight_i to weightedVotesAGAINST • If weightedVotesFOR > weightedVotesAGAINST then raise “mini alarm” • If decision about intrusion incorrect, multiply weights by ½ on all measurements that voted incorrectly (Winnow algorithm)
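A sketch of this Winnow-style scoring loop (illustrative Python; probs[i] is an assumed per-measurement probability estimator, and the “unlikely” cutoff is a placeholder):

```python
def train_winnow(training_examples, probs, low_prob=0.05, demotion=0.5):
    """Winnow-style weighted voting, following the steps above (a sketch).
    probs[i](x) is an assumed estimator of how likely value x is for
    measurement i; low_prob is a placeholder "unlikely" cutoff.  With
    positive-only training data, every example is labeled is_intrusion=False."""
    weights = [1.0] * len(probs)                     # step 1: all weights = 1
    for measurements, is_intrusion in training_examples:
        votes_for, votes_against, voted = 0.0, 0.0, []
        for i, x in enumerate(measurements):
            said_for = probs[i](x) < low_prob        # "unlikely" => vote FOR alarm
            voted.append(said_for)
            if said_for:
                votes_for += weights[i]
            else:
                votes_against += weights[i]
        raised_alarm = votes_for > votes_against     # "mini" alarm?
        if raised_alarm != is_intrusion:             # decision was wrong:
            for i, said_for in enumerate(voted):     # halve the weights of the
                if said_for != is_intrusion:         # measurements that voted
                    weights[i] *= demotion           # incorrectly (Winnow)
    return weights
```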
Choosing Good Parameter Values • For each user • Use training data to estimate probabilities and weight individual measurements • Try 20 values for T and 20 values for N • For each T × N pairing, compute detection and false-alarm rates on the tuning set • Select the T × N pairing that • has a false-alarm rate less than the spec (e.g., 1 per day) • has the highest detection rate
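The tuning-set grid search might look like this (a sketch; evaluate(T, N) is an assumed function returning tuning-set detection and false-alarm rates for one user):

```python
def choose_T_and_N(evaluate, T_candidates, N_candidates, false_alarm_spec):
    """Grid search over the candidate thresholds (a sketch).  evaluate(T, N)
    is assumed to return (detection_rate, false_alarm_rate) on the tuning
    set for one user; false_alarm_spec is e.g. 1 alarm per day."""
    best = None                                   # (detection_rate, T, N)
    for T in T_candidates:                        # e.g. 20 values
        for N in N_candidates:                    # e.g. 20 values
            detection_rate, false_alarm_rate = evaluate(T, N)
            if false_alarm_rate <= false_alarm_spec:      # respect the spec
                if best is None or detection_rate > best[0]:
                    best = (detection_rate, T, N)
    return best        # None if no pairing meets the false-alarm spec
```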
Experimental Data • Subjects • Insiders: 10 employees at Shavlik Technologies • Outsiders: 6 additional Shavlik employees • Unobtrusively collected data for 6 weeks • 7 GBytes archived • Task: Are current measurements from user X?
Training, Tuning, and Testing Sets • Very important in machine learning not to use testing data to optimize parameters! • Otherwise one can tune to zero false alarms and high detection rates! • Train Set: first two weeks of data • Build a (statistical) model • Tune Set: middle two weeks of data • Choose good parameter settings • Test Set: last two weeks of data • Evaluate “frozen” model
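A sketch of the chronological two-week split (assuming samples is a time-ordered list of per-second measurements):

```python
def split_by_weeks(samples, seconds_per_week=7 * 24 * 3600):
    """Split six weeks of per-second samples into train / tune / test,
    two weeks each, in chronological order (a sketch)."""
    two_weeks = 2 * seconds_per_week
    train = samples[:two_weeks]
    tune = samples[two_weeks:2 * two_weeks]
    test = samples[2 * two_weeks:]
    return train, tune, test
```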
Highly Weighted Measurements (% of time in Top Ten across users & experiments) • Number of Semaphores (43%) • Logon Total (43%) • Print Jobs (41%) • System Driver Total Bytes (39%) • CMD: Handle Count (35%) • Excel: Handle Count (26%) • Number of Mutexes (25%) • Errors Access Permissions (24%) • Files Opened Total (23%) • TCP Connections Passive (23%) • Notepad: % Processor Time (21%) • 73 measurements appear in the Top Ten more than 10% of the time
Confusion Matrix – Detection Rates (3 of 10 Subjects Shown)
Rows: the model's owner; columns: the user acting as intruder

              Intruder A   Intruder B   Intruder C
  Owner A        n/a          100%         100%
  Owner B        25%          n/a          100%
  Owner C        91%          94%          n/a
Some Questions • What if user behavior changes? (Called concept drift in machine learning) • One solution: assign a “half life” to the counts used to compute probabilities • Multiply counts by f < 1 each day (10/20 vs. 1000/2000) • CPU and memory demands too large? • Measuring features and updating counts take < 1% CPU • Tuning of parameters needs to be done off-line • How often to check for intrusions? • Only check when window full, then clear window • Else too many false alarms
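The half-life idea might be implemented as a simple daily decay of the counts behind each probability estimate (a sketch; the slides require only f < 1, so the value 0.9 below is an assumption):

```python
def decay_counts(numerators, denominators, f=0.9):
    """Once per day, shrink the counts behind each probability estimate by a
    factor f < 1 (a sketch).  10/20 and 1000/2000 give the same probability,
    but the smaller counts adapt much faster if the user's behavior drifts."""
    for key in numerators:
        numerators[key] *= f
        denominators[key] *= f
```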
Future Directions • Measure the system while applying various known intrusion techniques • Compare to measurements during normal operation • Train on known methods 1, …, N - 1 • Test using data from known method N • Analyze simultaneous measurements from a network of computers • Analyze the impact of the intruder’s behavior changing the recorded statistics • Current results: probability of detecting the intruder in the first W seconds
Some Related Work on Anomaly Detection • Machine learning for intrusion detection • Lane & Brodley (1998) • Ghosh et al. (1999) • Lee et al. (1999) • Warrender et al. (1999) • Agarwal & Joshi (2001) • Typically Unix-based • Streams of programs invoked or network traffic analyzed • Analysis of keystroke dynamics • Monrose & Rubin (1997) • For authenticating passwords
Conclusions • Can accurately characterize individual user behavior using simple models based on measuring many system properties • Such “profiles” can provide protection without too many false alarms • Separate data into train, tune, and test sets • “Let the data decide” good parameter settings, on a per-user basis (including measurements to use)
Acknowledgements • DARPA’s Insider Threat Active Profiling (ITAP) program within the ATIAS program • Mike Fahland for help with data collection • Shavlik Technologies employees who allowed collection of their usage data
Using Relative Probabilities • Raise alarm based on the ratio: Prob( keystrokes | machine owner ) / Prob( keystrokes | population )
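A minimal sketch of scoring with this ratio (the two probability estimators are assumed to exist; in practice they would come from the per-user model and a general-population model):

```python
def relative_score(measurements, prob_owner, prob_population, eps=1e-9):
    """Relative-probability score (a sketch; the two estimators are assumed).
    A low ratio means the behavior is unusual for this owner even after
    accounting for how unusual it is for everyone."""
    return prob_owner(measurements) / max(prob_population(measurements), eps)
```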
Value of Relative Probabilities • Using relative probabilities • Separates rare for this user from rare for everyone • Example of variance reduction • Reduce variance in a measurement by comparing to another (e.g., paired t-tests)
Tradeoff between False Alarms and Detected Intrusions (ROC Curve) • [Figure: ROC curve with the false-alarm “spec” marked] • Note: the left-most value results from ZERO tune-set false alarms
Conclusions • Can accurately characterize individual user behavior using simple models based on measuring many system properties • Such “profiles” can provide protection without too many false alarms • Separate data into train, tune, and test sets • “Let the data decide” good parameter settings, on a per-user basis (including measurements to use) • Normalize probabilities by general-population probabilities • Separate rare for this user (or server) from rare for everyone
Outline • Approaches for Building Intrusion-Detection Systems • A Bit More on What We Measure • Experiments with Windows 2000 Data • Wrapup