Activity monitoring: Anomaly detection as on-line classification
Tom Fawcett
HP Laboratories
1501 Page Mill Rd., Palo Alto, CA
Symposium on Machine Learning for Anomaly Detection
May 22-23, 2004
Example: Intrusion detection
[Figure: command streams for User 1, User 2, User 3 and User 4. Commands shown, in extraction order: ls ls gcc netscape from cd a.out netscape cd ls vi netscape latex from gcc acroread gv emacs a.out netscape emacs logout vi . su latex gdb . ypcat xfig cd . su docalc gcc su docalcs pwd rlogin from ls finger pwd from . . pwd. One segment is labeled "intrusion".]
Example: Monitoring digital switch health
[Figure: switches S1, S2, S3, ...; switch Si shows abnormal behavior culminating in hard failure]
Example: Monitoring business news
1. May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
2. May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
3. June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
4. June 4 (PR Newswire) - Amazon.com, facing threats of legal action from The New York Times, has asked the U.S. District Court in Seattle to allow Amazon.com to continue advertising.
5. June 5, 1999 - Motorola today announced that its MPC923, MPC950 and MPC960 PowerPC processors have been officially certified by Microsoft Corporation to support the…
6. June 8, 1999 (PR Newswire) - WebTV Networks, Inc. and EchoStar Communications Corp. at CES today announced the Microsoft WebTV Network Plus service for satellite and the EchoStar…
Monitoring business news: VISX
1. May 20, 1999 - VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
2. May 29 (Reuters) - Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
3. June 3 (PR Newswire) - VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
[Figure: VISX stock price]
Commonalities of the domains
• Temporal: data comprise time series
• Large number of entities (users, companies, accounts, devices)
• Large volumes of data (commands, news stories, calls) on entity activity
• General goal is to alert on interesting, rare events (intrusions, fraud, unusual business activity)
[Figure: timeline marking the onset of significant activity]
Detection goals:
• Identify as many interesting events as possible
• Alert as soon as possible
• Minimize false alarms
Activity monitoring problems
• Intrusion detection: Lee, Stolfo, Mok (KDD-99); Lane & Brodley (KDD-98); Ryan et al. (FDRM-97); DuMouchel & Schonlau (KDD-98)
• Network alarm monitoring: Sasisekharan et al. (KDD-94); Weiss & Hirsh (KDD-98); Klemettinen 99
• Hardware fault detection: Dasgupta & Forrest 96; Smyth 92
• Topic detection and tracking: Allan, Papka & Lavrenko (SIGIR-98); Crabtree & Soltysiak (IJCAI-97); Allan (ed.), 2002
• Fraud detection: Chan & Stolfo (KDD-98); Cox et al. (DMKD-97); Fawcett & Provost (KDD-97); Burge & Shawe-Taylor (FDRM-97); Ezawa & Norton (KDD-95)
• News/event alerts: Yang, Pierce & Carbonell; Fawcett & Provost (KDD-99)
• Epidemic/bio-terrorism detection: Wong et al. 2002, 2003; Shmueli 2004
Standard supervised learning approach
[Figure: an event stream passes through window vector extraction to produce instance vectors, each with a class label (- or +), which form a classification problem; the onset of significant activity is marked on the stream]
• Many approaches use |w| = 1
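The window extraction step on this slide can be sketched roughly as follows. This is only an illustration, not the procedure of any cited system: the function name `extract_windows`, the labeling rule (a window is positive when its last item falls at or after the onset), and the toy command stream are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def extract_windows(
    stream: Sequence[str],
    w: int,
    onset: Optional[int] = None,
) -> List[Tuple[Tuple[str, ...], str]]:
    """Slide a window of width w over an event stream.

    Returns (window, label) pairs. Assumed labeling rule: a window is
    "+" if its last item is at or after the onset index, otherwise "-".
    With onset=None every window is labeled "-".
    """
    if w < 1:
        raise ValueError("window width must be at least 1")
    instances = []
    for end in range(w, len(stream) + 1):
        window = tuple(stream[end - w:end])
        last_index = end - 1
        positive = onset is not None and last_index >= onset
        instances.append((window, "+" if positive else "-"))
    return instances


# Hypothetical command stream; positive activity is assumed to start at index 4.
stream = ["ls", "cd", "gcc", "a.out", "su", "ypcat", "rlogin"]
instances = extract_windows(stream, w=2, onset=4)
for window, label in instances:
    print(label, window)
```

With w=1 each event becomes its own instance, which corresponds to the |w| = 1 case mentioned on the slide.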
Challenges for machine learning approaches
• Very skewed class distributions (inherent asymmetry)
• Differing error costs
• Imprecision in class and cost distributions
• Temporal dependencies among alarms
  • Earlier is better than later
  • Several is (usually) no better than one
• Solutions may use different representations
  • Different timescales, different granularity
  • |w| = 1 command, |w| = 1 login session, |w| = 1 process life
[Figure: login sessions labeled user or intruder, with the intrusion marked]
Formalism
• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream Di
• α: alarm time
• τ: onset of positive activity
Each episode has at most one τ
Benefit/cost of alarms:
• s(τ, α, H, Di): benefit of α if true positive
• f(α, H, Di): cost of α if false positive
(H is alarm history; see paper)
[Figure: stream Di drawn as a sequence of data items d, with normal activity followed by positive activity]
Formalism
• D: set of data streams being monitored
• Di = <d1, d2, d3, ..., dn>: sequence of data items in stream i
• α: alarm time
• τ: onset of positive activity
Benefit/cost of alarms:
• s(τ, α, H, Di): benefit of α if true positive
• f(α, H, Di): cost of α if false positive
(H is alarm history; see paper)
[Figure: example plot of the score s as a function of alarm time; the vertical axis runs from 0 to smax]
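As a rough sketch of how the benefit function s and the cost function f might be applied to one episode, the code below writes τ for the onset, α for an alarm time, and H for the alarm history. The particular choices are assumptions for illustration only: a hypothetical window of 50 time units, credit for the first hit only, unit cost per false alarm, and any alarm at or after τ being scored by s rather than f. The stream argument Di is omitted for brevity.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Score = Callable[[float, float, List[float]], float]  # s(tau, alpha, H)
Cost = Callable[[float, List[float]], float]          # f(alpha, H)


def s_first_hit(tau: float, alpha: float, history: List[float],
                window: float = 50.0) -> float:
    """Hypothetical s: 1 for the first alarm within `window` after tau, else 0."""
    already_hit = any(a >= tau for a in history)
    in_window = 0.0 <= alpha - tau <= window
    return 1.0 if in_window and not already_hit else 0.0


def f_unit(alpha: float, history: List[float]) -> float:
    """Hypothetical f: every false alarm costs 1."""
    return 1.0


def score_episode(alarms: Sequence[float], tau: Optional[float],
                  s: Score = s_first_hit, f: Cost = f_unit) -> Tuple[float, float]:
    """Return (total benefit, total cost) of the alarms on one episode.

    tau is the onset of positive activity, or None if the episode has none.
    Assumption: alarms before tau (or with no tau) are false positives.
    """
    history: List[float] = []
    benefit = 0.0
    cost = 0.0
    for alpha in sorted(alarms):
        if tau is not None and alpha >= tau:
            benefit += s(tau, alpha, history)
        else:
            cost += f(alpha, history)
        history.append(alpha)
    return benefit, cost


# Alarm at 3 is false, 12 is the first hit, 20 adds nothing further.
print(score_episode([3.0, 12.0, 20.0], tau=10.0))  # (1.0, 1.0)
```

Passing the history H into s is what lets the score express that several alarms are no better than one.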
Detecting digital switch failures
[Figure: score s, between 0 and smax, plotted against alarm time; the time axis marks the onset of observable switch abnormalities, the minimum advance notice, and the hard failure point]
How is this framework better?
More realistic evaluation of solution methods
• Differing error costs
• Skewed class distributions
  AMOC analysis
• Temporal dependencies among alarms:
  • Earlier is better than later
  • Several is no better than one
• Solutions may use different representations
  • Different timescales, granularities
• Time and alarm history in s and f
• AMOC normalizes with respect to time (no definite notion of false positive max)
AMOC curves
Random alarms with different frequencies (.1/hr, .2/hr, etc.)
s(τ,α) = 1 if 0 ≤ α-τ ≤ 50, 0 otherwise
f = 1
[Figure: ROC curves vs AMOC curves]
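One way to estimate a single point for a random-alarm baseline like the one on this slide is by simulation. The sketch below is not the paper's procedure. It assumes alarms arrive as a Poisson process, that false alarms are normalized by the hours of normal activity before onset, that alarms after onset but outside the window are ignored, and that each stream is a hypothetical (duration, onset) pair with onset greater than zero. The function name `amoc_point` is also invented for the example.

```python
import random
from typing import List, Tuple


def s(tau: float, alpha: float, window: float = 50.0) -> float:
    """s(tau, alpha) = 1 if 0 <= alpha - tau <= window, 0 otherwise."""
    return 1.0 if 0.0 <= alpha - tau <= window else 0.0


def amoc_point(streams: List[Tuple[float, float]], alarm_rate: float,
               rng: random.Random) -> Tuple[float, float]:
    """Return (false alarms per hour, mean score per episode).

    streams: (duration_hours, onset_hours) pairs, one episode per stream.
    alarm_rate: expected random alarms per hour (assumed Poisson process).
    """
    false_alarms = 0
    normal_hours = 0.0
    total_score = 0.0
    for duration, tau in streams:
        t = 0.0
        hit = False
        while alarm_rate > 0:
            t += rng.expovariate(alarm_rate)  # time of the next random alarm
            if t >= duration:
                break
            if t < tau:
                false_alarms += 1             # each false alarm costs f = 1
            elif not hit and s(tau, t) > 0:
                hit = True                    # only the first hit is credited
        normal_hours += tau
        total_score += 1.0 if hit else 0.0
    return false_alarms / normal_hours, total_score / len(streams)


# Hypothetical streams: (duration in hours, onset of positive activity in hours).
streams = [(200.0, 120.0), (150.0, 60.0), (300.0, 250.0)]
rng = random.Random(0)
for rate in (0.0, 0.1, 0.2):
    print(rate, amoc_point(streams, rate, rng))
```

Sweeping the alarm rate and plotting the resulting pairs gives a baseline curve against which a real detector could be compared.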
Activity monitoring: Solution approaches
Fundamental problem characteristics:
• Asymmetry of classes: positive activity is inherently rare
  • Discriminating method: differentiates positive and normal activity
  vs
  • Profiling method: models normal activity without reference to positive (i.e., learning from negative examples only)
• Multi-level representation of data
  • Uniform modeling: models activity uniformly across all monitored entities
  vs
  • Individual modeling: models Di activity individually
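A profiling method with individual modeling might look roughly like the sketch below: one profile per monitored user, built from normal activity only, with no reference to positive examples. The frequency-count profile, the "fraction of unseen commands" score, and the names `build_profile` and `anomaly_score` are assumptions chosen for illustration, not a method taken from the talk.

```python
from collections import Counter
from typing import Dict, Sequence


def build_profile(normal_commands: Sequence[str]) -> Counter:
    """Profile of normal activity: how often each command was seen."""
    return Counter(normal_commands)


def anomaly_score(profile: Counter, window: Sequence[str]) -> float:
    """Fraction of commands in the window never seen in the normal profile."""
    if not window:
        return 0.0
    unseen = sum(1 for command in window if profile[command] == 0)
    return unseen / len(window)


# Individual modeling: a separate profile for each (hypothetical) user.
history: Dict[str, Sequence[str]] = {
    "user1": ["ls", "cd", "latex", "gv", "emacs", "ls"],
    "user3": ["gcc", "a.out", "vi", "gdb", "gcc"],
}
profiles = {user: build_profile(cmds) for user, cmds in history.items()}

window = ["ls", "su", "ypcat", "cd"]
print(anomaly_score(profiles["user1"], window))  # 0.5
print(anomaly_score(profiles["user3"], window))  # 1.0
```

A uniform-modeling variant would build a single profile from all users' commands, and a discriminating method would also need labeled positive activity.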
Example: Monitoring business news
• Goal: Scan news stories associated with businesses, alarm on stories that correlate with “interesting” behavior.
• Interesting = 10% change in stock price (up or down) within 34.5 hours
• Data: Yahoo stories and stock prices from 6000 companies over 3 months
• DC-1 system
  • Developed for cell phone fraud detection
  • Performs discriminating, individual modeling
[Figure: data stream DIntel with stories 1 and 2 marked]
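The "interesting" criterion could be checked along the lines of the sketch below. The slide does not say which price the 10% change is measured against, so this example assumes the most recent price at or before the story time as the baseline. The function name `is_interesting` and the price series are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def is_interesting(prices: List[Tuple[float, float]], story_time: float,
                   threshold: float = 0.10, horizon: float = 34.5) -> bool:
    """True if the price moves by at least `threshold` (up or down)
    within `horizon` hours after the story.

    prices: (time_hours, price) pairs sorted by time.
    Assumption: the baseline is the last price at or before story_time.
    """
    times = [t for t, _ in prices]
    i = bisect_right(times, story_time) - 1
    if i < 0:
        return False  # no price known at story time
    base = prices[i][1]
    for t, price in prices[i + 1:]:
        if t - story_time > horizon:
            break
        if abs(price - base) / base >= threshold:
            return True
    return False


# Hypothetical price series: (hours since start, price).
prices = [(0.0, 100.0), (10.0, 101.0), (30.0, 112.0), (80.0, 111.0)]
print(is_interesting(prices, story_time=5.0))   # True: 12% rise 25 hours later
print(is_interesting(prices, story_time=35.0))  # False: next price is 45 hours later
```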
Example: Monitoring business news
Textual indicators for price spikes: said [it] expects same period revenues increase over per share fourth compare[d] income quarter fiscal earnings per diluted fiscal quarter ended expenses months ended today reported consensus quarter earnings year ended repurchase lower than shortfall Q[1234] fourth-quarter first call below analyst for quarter research [and] development
[Figure: AMOC curve]
s(τ,α) = 1 if 0 ≤ α-τ ≤ 34.5 hours, 0 otherwise
f = 1
Pitfalls in evaluation
Why performance may look better than it should
• Evaluating too locally
  • Windows shouldn’t overlap
  • Behavior may be episodic or local (“bull market behavior”)
  • Need out-of-time sampling
[Figure: stream Di split in time into a Train segment followed by a Test segment]
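Out-of-time sampling can be sketched as a split on a time cutoff. In the example below, the `gap` parameter (events between the cutoff and cutoff + gap are discarded so that windows near the boundary cannot overlap), the function name, and the event format are assumptions made for illustration.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, payload)


def out_of_time_split(events: List[Event], cutoff: float,
                      gap: float = 0.0) -> Tuple[List[Event], List[Event]]:
    """Train on events strictly before `cutoff`; test on events at or
    after `cutoff + gap`. Events inside the gap are dropped."""
    train = [e for e in events if e[0] < cutoff]
    test = [e for e in events if e[0] >= cutoff + gap]
    return train, test


# Hypothetical stream of ten timestamped events.
events = [(float(t), f"event-{t}") for t in range(10)]
train, test = out_of_time_split(events, cutoff=6.0, gap=2.0)
print(len(train), len(test))  # 6 2
```

Setting the gap to at least the window width is one simple way to keep a test window from sharing events with a training window.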
Pitfalls in evaluation
• Mixing events from a single account between train and test sets
  • Goal of evaluation is to determine how well system will work on new, unseen accounts.
  • Events within an account may be much more similar to each other than to events in other accounts
  • Mixing one account’s examples between train and test sets may leak test info into training
  • Need out-of-account sampling
[Figure: whole accounts assigned to either Train or Test]
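Out-of-account sampling amounts to splitting by account rather than by event, so that no account contributes to both sets. The following is a minimal sketch; the function name, the event format, and the default 25% test fraction are assumptions for the example.

```python
import random
from typing import List, Tuple

Event = Tuple[str, str]  # (account_id, payload)


def out_of_account_split(events: List[Event], test_fraction: float = 0.25,
                         seed: int = 0) -> Tuple[List[Event], List[Event]]:
    """Assign whole accounts to train or test so that no account's
    events appear in both sets."""
    accounts = sorted({account for account, _ in events})
    rng = random.Random(seed)
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = [e for e in events if e[0] not in test_accounts]
    test = [e for e in events if e[0] in test_accounts]
    return train, test


# Hypothetical events: four accounts with two events each.
events = [(f"acct-{a}", f"event-{i}") for a in "abcd" for i in range(2)]
train, test = out_of_account_split(events)
train_accounts = {a for a, _ in train}
test_accounts = {a for a, _ in test}
print(sorted(train_accounts), sorted(test_accounts))
```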
Conclusions
• This form of anomaly detection is inherently classification
  • Alarms → true positives, false positives, etc.
  • Classification methods can be brought to bear
  • But temporal aspects make standard classification metrics inappropriate
• Activity monitoring domains are common in machine learning. Solution methods & strategies can be shared and adapted.
Activity monitoring: Learning methods
[Figure: data streams D1, D2, D3, D4, D5, ...]
Problem characteristics
• Class asymmetry: Discriminating method vs Profiling method
• Multi-level representation: Uniform modeling vs Individual modeling
Transforming tau: Circuit failure
[Figure: a stream of data items d with labels marking the beginning of positive visible activity, a period of degradation, the hard failure (end of episode), and the implicit lookahead interval]